Mixtures of Probabilistic Principal Component Analyzers (1999)
by M. E. Tipping and C. M. Bishop
Venue: Neural Computation

Results 1 - 10 of 532

Unsupervised learning of finite mixture models

by Mario A. T. Figueiredo, Anil K. Jain - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE , 2002
"... This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) alg ..."
Abstract - Cited by 418 (22 self) - Add to MetaCart
This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify to the good performance of our approach.
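The abstract above builds on the standard EM procedure for mixture fitting. For reference, a minimal, hedged sketch of plain EM for a Gaussian mixture follows (numpy only, illustrative names; it does not implement the paper's component-selection criterion or its initialization-free behaviour):

import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """Plain EM for a k-component Gaussian mixture (baseline, no model selection)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]   # random data points as initial means
    covs = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = p(component j | x_i)
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            inv = np.linalg.inv(covs[j])
            _, logdet = np.linalg.slogdet(covs[j])
            maha = np.sum(diff @ inv * diff, axis=1)
            log_r[:, j] = np.log(weights[j]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, covariances
        nk = r.sum(axis=0) + 1e-10
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs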

Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds

by Lawrence K. Saul, Sam T. Roweis, Yoram Singer - Journal of Machine Learning Research , 2003
"... The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. ..."
Abstract - Cited by 385 (10 self) - Add to MetaCart
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.

A Unifying Review of Linear Gaussian Models

by Sam Roweis, Zoubin Ghahramani , 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract - Cited by 351 (18 self) - Add to MetaCart
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
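A minimal sketch of the single generative model the review refers to, under the usual parameterisation x ~ N(0, I), y = Cx + v, v ~ N(0, R); factor analysis takes R diagonal, while a spherical R recovers the (probabilistic) PCA-style special case. Names and dimensions below are illustrative:

import numpy as np

rng = np.random.default_rng(0)
d_latent, d_obs, n = 2, 5, 5000
C = rng.normal(size=(d_obs, d_latent))            # loading / observation matrix
R = 0.1 * np.eye(d_obs)                           # spherical noise covariance

x = rng.normal(size=(n, d_latent))                # latent states x ~ N(0, I)
v = rng.multivariate_normal(np.zeros(d_obs), R, size=n)
y = x @ C.T + v                                   # observations y = C x + v

# The marginal covariance of y is C C^T + R; compare against the sample estimate.
print(np.abs(np.cov(y.T) - (C @ C.T + R)).max())  # small for large n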

Citation Context

...ents of y the model could not be easily corrected since v has spherical covariance (R = εI). The SPCA model is very similar to the independently proposed probabilistic principal component analysis (Tipping and Bishop, 1997). If we go even further and take the limit R = lim_{ε→0} εI (while keeping the diagonal elements of Q finite) then we obtain the standard principal component analysis or PCA model. The direction...
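The probabilistic PCA model referenced in this context has a closed-form maximum-likelihood solution (Tipping and Bishop): the noise variance is the mean of the discarded eigenvalues of the sample covariance, and the loadings are scaled leading eigenvectors. A hedged sketch with illustrative names:

import numpy as np

def ppca_ml(X, q):
    """Closed-form ML fit of probabilistic PCA with q latent dimensions."""
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc.T)                              # sample covariance
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]               # sort eigenvalues descending
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                     # noise variance = mean of discarded eigenvalues
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2                              # W is identified only up to a rotation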

Sparse subspace clustering

by Ehsan Elhamifar, René Vidal - In CVPR , 2009
"... We propose a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all oth ..."
Abstract - Cited by 241 (14 self) - Add to MetaCart
We propose a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the fact that each point in a union of subspaces has an SR with respect to a dictionary formed by all other data points. In general, finding such an SR is NP hard. Our key contribution is to show that, under mild assumptions, the SR can be obtained 'exactly' by using ℓ1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this SR. Our method can handle noise, outliers, as well as missing data. We apply our subspace clustering algorithm to the problem of segmenting multiple motions in video. Experiments on 167 video sequences show that our approach significantly outperforms state-of-the-art methods.
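A hedged sketch of the pipeline the abstract describes, with scikit-learn's Lasso standing in for the exact ℓ1 program and spectral clustering applied to the resulting affinity; parameter values and names are illustrative:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def sparse_subspace_clustering(X, n_clusters, alpha=0.01):
    """X: (n_points, dim) array; returns a cluster label per point."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # sparse self-representation: x_i ~ sum_j c_j x_j over all other points j
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
        lasso.fit(X[others].T, X[i])
        C[i, others] = lasso.coef_
    W = np.abs(C) + np.abs(C).T                   # symmetric affinity from the sparse codes
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed').fit_predict(W)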

Citation Context

...ve approaches, such as K-subspaces [14], alternate between assigning points to subspaces, and fitting a subspace to each cluster. Statistical approaches, such as Mixtures of Probabilistic PCA (MPPCA) [24], Multi-Stage Learning (MSL) [22], or [13], assume that the distribution of the data inside each subspace is Gaussian and alternate between data clustering and subspace estimation by applying Expectat...

Probabilistic Independent Component Analysis

by Christian F. Beckmann, Stephen M. Smith , 2003
"... Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'model-free' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ..."
Abstract - Cited by 208 (13 self) - Add to MetaCart
Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'model-free' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ability to quantify statistical significance. We present an integrated approach to Probabilistic ICA for FMRI data that allows for non-square mixing in the presence of Gaussian noise. We employ an objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e. the number of activation and non-Gaussian noise sources. Reduction of the data to this 'true' subspace before the ICA decomposition automatically results in an estimate of the noise, leading to the ability to assign significance to voxels in ICA spatial maps. Estimation of the number of intrinsic sources not only enables us to carry out probabilistic modelling, but also achieves an asymptotically unique decomposition of the data. This reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal pre-whitening and variance normalisation of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternative-hypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and complex artificial FMRI data, and compared to the spatio-temporal accuracy of results obtained...
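A hedged sketch of the two-stage structure described above (subspace reduction followed by ICA), with scikit-learn's PCA and FastICA standing in for the paper's probabilistic components; the model order is fixed here rather than estimated by Bayesian model order selection:

import numpy as np
from sklearn.decomposition import PCA, FastICA

def reduce_then_ica(X, n_sources):
    """X: (observations, variables) data matrix; returns estimated independent components."""
    X_red = PCA(n_components=n_sources, whiten=True).fit_transform(X)   # project to a low-dim subspace
    return FastICA(n_components=n_sources, random_state=0).fit_transform(X_red)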

Citation Context

...endent components in the signal + noise sub-space and (iii) assessing the statistical significance of estimated sources. At the first stage we employ probabilistic Principal Component Analysis (PPCA, [49]) in order to find an appropriate linear sub-space which contains the sources. The choice of the number of components to extract is a problem of model order selection. Underestimation of the dimension...

Generalized principal component analysis (GPCA)

by René Vidal, Shankar Sastry - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE , 2003
"... This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a ..."
Abstract - Cited by 206 (36 self) - Add to MetaCart
This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing a certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high-dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as K-subspaces and Expectation Maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
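A hedged sketch of the core algebraic step for the simplest case of two hyperplanes in R^d: points on either hyperplane satisfy the homogeneous quadratic p(x) = (b1.x)(b2.x) = 0, whose coefficients lie in the null space of the Veronese-embedded data and whose gradient at a point gives a normal to the hyperplane through it. Function names are illustrative, and the paper's noise handling and subspace-number estimation are omitted:

import numpy as np
from itertools import combinations_with_replacement

def veronese2(X):
    """Degree-2 Veronese map: one column per monomial x_i * x_j (i <= j)."""
    idx = list(combinations_with_replacement(range(X.shape[1]), 2))
    return np.stack([X[:, i] * X[:, j] for i, j in idx], axis=1), idx

def gpca_two_hyperplanes(X):
    """Return a unit normal vector for the hyperplane passing through each point."""
    V, idx = veronese2(X)
    _, _, vt = np.linalg.svd(V)
    c = vt[-1]                                    # polynomial coefficients (null direction of V)
    normals = np.zeros(X.shape)
    for k, (i, j) in enumerate(idx):              # accumulate the gradient of p at every point
        normals[:, i] += c[k] * X[:, j]
        normals[:, j] += c[k] * X[:, i]
    return normals / (np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12)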

Citation Context

.... This can be done using, e.g., K-subspaces [10], an extension of K-means to the case of subspaces, subspace growing and subspace selection [15], or Expectation Maximization (EM) for mixtures of PCAs [19]. Unfortunately, most iterative methods are, in general, very sensitive to initialization; hence, they may not converge to the global optimum [21]. The need for initialization methods has motivated th...

Learning with Labeled and Unlabeled Data

by Matthias Seeger , 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as we ..."
Abstract - Cited by 202 (3 self) - Add to MetaCart
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of input-dependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...

Citation Context

...s come from simple families such as Gaussians with structurally restricted covariance matrices. Combinations of mixture and latent subspace models have also been considered in numerous variants (e.g. [88], [34], [93], [5]). Finally note how, within all these models, complexity can be regulated at various levels. The relations between latent and observable variables are kept simple by choosing relatively...

Variational Inference for Bayesian Mixtures of Factor Analysers

by Zoubin Ghahramani, Matthew J. Beal - In Advances in Neural Information Processing Systems 12 , 2000
"... We present an algorithm that infers the model structure of a mixture of factor analysers using an ecient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimension ..."
Abstract - Cited by 191 (22 self) - Add to MetaCart
We present an algorithm that infers the model structure of a mixture of factor analysers using an efficient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimensionality of each component (i.e. the number of factors in each factor analyser). Alternatively it can be used to infer posterior distributions over number of components and dimensionalities. Since all parameters are integrated out, the method is not prone to overfitting. Using a stochastic procedure for adding components it is possible to perform the variational optimisation incrementally and to avoid local maxima. Results show that the method works very well in practice and correctly infers the number and dimensionality of nontrivial synthetic examples. By importance sampling from the variational approximation we show how to obtain unbiased estimates of the true evidence, the exa...

Citation Context

...iple of the identity the model becomes a mixture of probabilistic PCAs. Tractable maximum likelihood procedures for fitting MFA and MPCA models can be derived from the Expectation Maximisation algorithm [4, 11]. The maximum likelihood (ML) approach to MFA can easily get caught in local maxima. Ueda et al. [12] provide an effective deterministic procedure for avoiding local maxima by considering splitting a...
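For reference, a hedged sketch of the standard EM updates for a single factor analyser, the per-component computation inside the MFA and MPCA mixtures mentioned above; this is the plain maximum-likelihood procedure, not the paper's variational Bayesian treatment, and variable names are illustrative:

import numpy as np

def em_factor_analyser(X, q, n_iter=200, seed=0):
    """Model: x = W z + noise, z ~ N(0, I), noise ~ N(0, diag(psi))."""
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(d, q))
    psi = Xc.var(axis=0) + 1e-6
    for _ in range(n_iter):
        # E-step: posterior over the latent factors z for every data point
        Wp = W / psi[:, None]                     # Psi^{-1} W
        G = np.linalg.inv(np.eye(q) + W.T @ Wp)   # posterior covariance of z
        Ez = Xc @ Wp @ G                          # posterior means, shape (n, q)
        Ezz = n * G + Ez.T @ Ez                   # sum_i E[z_i z_i^T]
        # M-step: update the loading matrix and the diagonal noise variances
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        psi = np.maximum((Xc ** 2).sum(axis=0) - (W * (Xc.T @ Ez)).sum(axis=1), 1e-6 * n) / n
    return W, psi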

Multilinear Analysis of Image Ensembles: TensorFaces

by M. Alex O. Vasilescu, Demetri Terzopoulos - IN PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION , 2002
"... Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the d ..."
Abstract - Cited by 188 (7 self) - Add to MetaCart
Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higher-order tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes. Our multilinear modeling technique employs a tensor extension of the conventional matrix singular value decomposition (SVD), known as the N-mode SVD. As a concrete example, we consider the multilinear analysis of ensembles of facial images that combine several modes, including different facial geometries (people), expressions, head poses, and lighting conditions. Our resulting "TensorFaces" representation has several advantages over conventional eigenfaces. More generally, multilinear analysis shows promise as a unifying framework for a variety of computer vision problems.
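A hedged numpy sketch of the N-mode SVD the abstract refers to: each mode's orthonormal basis comes from the SVD of that mode's unfolding, and the core tensor is obtained by projecting onto all the mode bases. Function names and the toy tensor shape are illustrative:

import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the chosen axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    rest = [s for i, s in enumerate(T.shape) if i != mode]
    out = (M @ unfold(T, mode)).reshape([M.shape[0]] + rest)
    return np.moveaxis(out, 0, mode)

def n_mode_svd(T):
    """Mode matrices U_n from each unfolding's SVD; core Z = T x_1 U1^T x_2 U2^T ..."""
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0] for m in range(T.ndim)]
    Z = T
    for m, Um in enumerate(U):
        Z = mode_multiply(Z, Um.T, m)
    return Z, U

# Reconstruction check on a small random tensor (e.g. people x expressions x pixels).
T = np.random.default_rng(0).normal(size=(4, 3, 5))
Z, U = n_mode_svd(T)
R = Z
for m, Um in enumerate(U):
    R = mode_multiply(R, Um, m)
print(np.allclose(T, R))                          # exact reconstruction at full ranks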

Feature selection for unsupervised learning

by Jennifer G. Dy, Carla E. Brodley - Journal of Machine Learning Research , 2004
"... In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dime ..."
Abstract - Cited by 146 (4 self) - Add to MetaCart
In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using Expectation-Maximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs on the dimensionality biases of these feature criteria, and present a cross-projection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need for addressing these two issues, and the effectiveness of our proposed solutions.
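A hedged sketch of a wrapper-style forward search in the spirit of FSSEM: each candidate feature subset is scored by the EM (Gaussian mixture) clustering it induces. The raw per-sample log-likelihood used below is exactly the kind of dimension-biased criterion the paper analyses and would need its cross-projection normalization in practice; names and parameters are illustrative:

import numpy as np
from sklearn.mixture import GaussianMixture

def forward_feature_search(X, n_clusters, n_features):
    """Greedy forward selection of features for EM clustering."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        def score(f):
            cols = selected + [f]
            gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(X[:, cols])
            return gmm.score(X[:, cols])          # mean log-likelihood per sample
        best = max(remaining, key=score)          # all candidates share the same dimension here
        selected.append(best)
        remaining.remove(best)
    return selected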

Citation Context

...e each feature has equal cost. Other interesting and current directions in feature selection involving feature transformations are mixtures of principal component analyzers (Kambhatla and Leen, 1997; Tipping and Bishop, 1999) and mixtures of factor analyzers (Ghahramani and Beal, 2000; Ghahramani and Hinton, 1996; Ueda et al., 1999). We consider these mixture algorithms as wrapper approaches. In recent years, more attent...
