Mixtures of Probabilistic Principal Component Analysers
1998
Cited by 532 (6 self)
Principal component analysis (PCA) is one of the most popular techniques for processing, compressing and visualising data, although its effectiveness is limited by its global linearity. While nonlinear variants of PCA have been proposed, an alternative paradigm is to capture data complexity by a combination of local linear PCA projections. However, conventional PCA does not correspond to a probability density, and so there is no unique way to combine PCA models. Previous attempts to formulate mixture models for PCA have therefore to some extent been ad hoc. In this paper, PCA is formulated within a maximum-likelihood framework, based on a specific form of Gaussian latent variable model. This leads to a well-defined mixture model for probabilistic principal component analysers, whose parameters can be determined using an EM algorithm. We discuss the advantages of this model in the context of clustering, density modelling and local dimensionality reduction, and we demonstrate its applicat...
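As a rough illustration of the maximum-likelihood formulation mentioned in this abstract, the sketch below fits a single probabilistic PCA model in closed form: the noise variance is the average of the discarded eigenvalues of the sample covariance, and the loading matrix is built from the leading eigenvectors. It does not implement the mixture model or its EM algorithm. The function names `fit_ppca` and `ppca_covariance` are hypothetical, and the code assumes the latent dimension `q` is smaller than the data dimension.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood fit of a single PPCA model.

    X has shape (n, d) with one observation per row; q < d is the latent
    dimension. Returns the mean, the (d, q) loading matrix W (up to an
    arbitrary rotation), and the isotropic noise variance sigma2.
    """
    mu = X.mean(axis=0)
    # Maximum-likelihood (1/n) sample covariance.
    S = np.cov(X - mu, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average variance in the discarded directions.
    sigma2 = eigvals[q:].mean()
    # Loadings: leading eigenvectors scaled by sqrt(eigenvalue - sigma2).
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale
    return mu, W, sigma2


def ppca_covariance(W, sigma2):
    """Model covariance implied by the fitted parameters: W W^T + sigma2 I."""
    return W @ W.T + sigma2 * np.eye(W.shape[0])
```

Because the fitted model defines a proper Gaussian density with covariance `W @ W.T + sigma2 * I`, several such models can be combined as mixture components, which is the property the abstract relies on.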
GTM: The generative topographic mapping
Neural Computation, 1998
Cited by 361 (6 self)
Latent variable models represent the probability density of data in a space of several dimensions in terms of a smaller number of latent, or hidden, variables. A familiar example is factor analysis, which is based on a linear transformation between the latent space and the data space. In this paper we introduce a form of nonlinear latent variable model called the Generative Topographic Mapping, for which the parameters of the model can be determined using the EM algorithm. GTM provides a principled alternative to the widely used Self-Organizing Map (SOM) of Kohonen (1982), and overcomes most of the significant limitations of the SOM. We demonstrate the performance of the GTM algorithm on a toy problem and on simulated data from flow diagnostics for a multiphase oil pipeline. Copyright © MIT Press (1998).
A Survey of Dimension Reduction Techniques
2002
Cited by 141 (0 self)
In this paper, we assume that we have $n$ observations, each being a realization of the $p$-dimensional random variable $x = (x_1, \ldots, x_p)^T$ with mean $E(x) = \mu = (\mu_1, \ldots, \mu_p)^T$ and covariance matrix $E\{(x - \mu)(x - \mu)^T\} = \Sigma_{p \times p}$. We denote such an observation matrix by $X = \{x_{i,j} : 1 \le i \le p,\ 1 \le j \le n\}$. If $\mu_i$ and $\sigma_i = \sqrt{\Sigma_{(i,i)}}$ denote the mean and the standard deviation of the $i$th random variable, respectively, then we will often standardize the observations $x_{i,j}$ by $(x_{i,j} - \hat{\mu}_i)/\hat{\sigma}_i$, where $\hat{\mu}_i = \bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{i,j}$, and $\hat{\sigma}_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_{i,j} - \bar{x}_i)^2}$.
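The standardization described in this excerpt can be sketched in a few lines. The layout follows the excerpt's convention of a p-by-n observation matrix (variables in rows, observations in columns). The function name `standardize` is hypothetical, and the 1/n normalization with a square root is an assumption based on the excerpt calling the quantity a standard deviation.

```python
import numpy as np


def standardize(X):
    """Standardize each variable (row) of a (p, n) observation matrix.

    Each row is shifted by its sample mean and divided by its sample
    standard deviation, computed with 1/n normalization (ddof=0).
    Assumes no row is constant, so no standard deviation is zero.
    """
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)  # ddof=0 by default, i.e. 1/n
    return (X - mean) / std
```

After this step every variable has sample mean 0 and sample standard deviation 1, so variables measured on different scales contribute comparably to methods such as PCA.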
Learning and Design of Principal Curves
2000
Cited by 105 (4 self)
Principal curves have been defined as "self-consistent" smooth curves which pass through the "middle" of a $d$-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition makes it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction. Our theoretical learning scheme chooses a curve from a class of polygonal lines with $k$ segments and with a given total length, to minimize the average squared distance over $n$ training points drawn independently. Convergence properties of this learning scheme are analyzed and a practical version of this theoretical algorithm is implemented. In each iteration of the algorithm a new vertex is added to the polygonal line and the positions of the vertices are updated so that they minimize a penalized squared distance criterion. Simulation results demonstrate that the new algorithm compares favorably with previous methods both in terms of performance and computational complexity, and is more robust to varying data models.
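The quantity this abstract minimizes, the average squared distance from the training points to a polygonal line, can be computed as in the sketch below. This covers only the unpenalized objective; the length constraint, the penalty term, and the vertex-insertion and update steps of the authors' algorithm are not reproduced. The function names are hypothetical.

```python
import numpy as np


def squared_distance_to_segment(points, a, b):
    """Squared distance from each row of points to the segment from a to b."""
    ab = b - a
    denom = ab @ ab
    if denom == 0:
        # Degenerate segment: both endpoints coincide.
        t = np.zeros(len(points))
    else:
        # Projection parameter along the segment, clipped to [0, 1].
        t = np.clip((points - a) @ ab / denom, 0.0, 1.0)
    nearest = a + t[:, None] * ab
    return ((points - nearest) ** 2).sum(axis=1)


def average_squared_distance(points, vertices):
    """Mean squared distance from points to a polygonal line.

    points has shape (n, d); vertices has shape (k + 1, d) and defines
    k consecutive segments. Each point is scored against its nearest segment.
    """
    per_segment = np.stack([
        squared_distance_to_segment(points, vertices[i], vertices[i + 1])
        for i in range(len(vertices) - 1)
    ])
    return per_segment.min(axis=0).mean()
```

An optimizer for the polygonal line would evaluate this objective repeatedly while moving the vertices, with additional terms to keep the curve from growing arbitrarily long or wiggly.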
A unified model for probabilistic principal surfaces
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
Cited by 61 (6 self)
Principal curves and surfaces are nonlinear generalizations of principal components and subspaces, respectively. They can provide insightful summary of high-dimensional data not typically attainable by classical linear methods. Solutions to several problems, such as proof of existence and convergence, faced by the original principal curve formulation have been proposed in the past few years. Nevertheless, these solutions are not generally extensible to principal surfaces, the mere computation of which presents a formidable obstacle. Consequently, relatively few studies of principal surfaces are available. Recently, we proposed the probabilistic principal surface (PPS) to address a number of issues associated with current principal surface algorithms. PPS uses a manifold-oriented covariance noise model, based on the generative topographical mapping (GTM), which can be viewed as a parametric formulation of Kohonen's self-organizing map. Building on the PPS, we introduce a unified covariance model that implements PPS ($0 < \alpha < 1$), GTM ($\alpha = 1$), and the manifold-aligned GTM ($\alpha > 1$) by varying the clamping parameter $\alpha$. Then, we comprehensively evaluate the empirical performance (reconstruction error) of PPS, GTM, and the manifold-aligned GTM on three popular benchmark data sets. It is shown in two different comparisons that the PPS outperforms the GTM under identical parameter settings. Convergence of the PPS is found to be identical to that of the GTM and the computational overhead incurred by the PPS decreases to 40 percent or less for more complex manifolds. These results show that the generalized PPS provides a flexible and effective way of obtaining principal surfaces. Index Terms: principal curve, principal surface, probabilistic, dimensionality reduction, nonlinear manifold, generative topographic mapping.
Continuous latent variable models for dimensionality reduction and sequential data reconstruction
2001
Developments of the generative topographic mapping
Neurocomputing, 1998
Cited by 25 (1 self)
1 Introduction. Probability theory provides a powerful, consistent framework for dealing quantitatively with uncertainty (10). It is therefore ideally suited as a theoretical foundation for pattern recognition. Recently, the self-organizing map (SOM) of (19) was reformulated within a probabilistic setting (7) to give the GTM (Generative Topographic Mapping). In going to a probabilistic formulation, several limitations of the SOM were overcome, including the absence of a cost function and the lack of a convergence proof.
A k-Segments Algorithm for Finding Principal Curves
Pattern Recognition Letters, 2000
Cited by 24 (2 self)
We propose an incremental method to find principal curves. Line segments are fitted and connected to form polygonal lines. New segments are inserted until a performance criterion is met. Experimental results illustrate the performance of the method compared to other existing approaches.
Principal curve clustering with noise
1997
Cited by 21 (4 self)
... was supported by ONR grants N00014-96-1-0192 and N00014-96-1-0330. The authors are ...
Clustering on principal curves combines parametric modeling of noise with nonparametric modeling of feature shape. This is useful for detecting curvilinear features in spatial point patterns, with or without background noise. Applications of this include the detection of curvilinear minefields from reconnaissance images, some of the points in which represent false detections, and the detection of seismic faults from earthquake catalogs. Our algorithm for principal curve clustering is in two steps: the first is hierarchical and agglomerative (HPCC), and the second consists of iterative relocation based on the Classification EM algorithm (CEM-PCC). HPCC is used to combine potential feature clusters, while CEM-PCC refines the results and deals with background noise. It is important to have a good starting point for the algorithm: this can be found manually or automatically using, for example, nearest neighbor clutter removal or model-based clustering. We choose the number of features and the amount of smoothing simultaneously using approximate Bayes