Results 1  10
of
532
Unsupervised learning of finite mixture models
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2002
"... This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) alg ..."
Abstract

Cited by 418 (22 self)
 Add to MetaCart
This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.
Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
 Journal of Machine Learning Research
, 2003
"... The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. ..."
Abstract

Cited by 385 (10 self)
 Add to MetaCart
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 351 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
Sparse subspace clustering
 In CVPR
, 2009
"... We propose a method based on sparse representation (SR) to cluster data drawn from multiple lowdimensional linear or affine subspaces embedded in a highdimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all oth ..."
Abstract

Cited by 241 (14 self)
 Add to MetaCart
(Show Context)
We propose a method based on sparse representation (SR) to cluster data drawn from multiple lowdimensional linear or affine subspaces embedded in a highdimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all other data points. In general, finding such a SR is NP hard. Our key contribution is to show that, under mild assumptions, the SR can be obtained ’exactly ’ by using ℓ1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this SR. Our method can handle noise, outliers as well as missing data. We apply our subspace clustering algorithm to the problem of segmenting multiple motions in video. Experiments on 167 video sequences show that our approach significantly outperforms stateoftheart methods. 1.
Probabilistic Independent Component Analysis
, 2003
"... Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ..."
Abstract

Cited by 208 (13 self)
 Add to MetaCart
(Show Context)
Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'modelfree' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ability to quantify statistical significance. We present an integrated approach to Probabilistic ICA for FMRI data that allows for nonsquare mixing in the presence of Gaussian noise. We employ an objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e. the number of activation and nonGaussian noise sources. Reduction of the data to this 'true' subspace before the ICA decomposition automatically results in an estimate of the noise, leading to the ability to assign significance to voxels in ICA spatial maps. Estimation of the number of intrinsic sources not only enables us to carry out probabilistic modelling, but also achieves an asymptotically unique decomposition of the data. This reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal prewhitening and variance normafisation of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternativehypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and complex artificial FMRI data, and compared to the spatiotemporal accuracy of restfits obtaine...
Generalized principal component analysis (GPCA)
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2003
"... This paper presents an algebrogeometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a ..."
Abstract

Cited by 206 (36 self)
 Add to MetaCart
(Show Context)
This paper presents an algebrogeometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a highdimensional space and with an unknown number of subspaces are also presented. Our experiments on lowdimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as Ksubspaces and Expectation Maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as we ..."
Abstract

Cited by 202 (3 self)
 Add to MetaCart
(Show Context)
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
Variational Inference for Bayesian Mixtures of Factor Analysers
 In Advances in Neural Information Processing Systems 12
, 2000
"... We present an algorithm that infers the model structure of a mixture of factor analysers using an ecient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimension ..."
Abstract

Cited by 191 (22 self)
 Add to MetaCart
(Show Context)
We present an algorithm that infers the model structure of a mixture of factor analysers using an ecient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimensionality of each component (i.e. the number of factors in each factor analyser). Alternatively it can be used to infer posterior distributions over number of components and dimensionalities. Since all parameters are integrated out the method is not prone to over tting. Using a stochastic procedure for adding components it is possible to perform the variational optimisation incrementally and to avoid local maxima. Results show that the method works very well in practice and correctly infers the number and dimensionality of nontrivial synthetic examples. By importance sampling from the variational approximation we show how to obtain unbiased estimates of the true evidence, the exa...
Multilinear Analysis of Image Ensembles: TensorFaces
 IN PROCEEDINGS OF THE EUROPEAN CONFERENCE ON COMPUTER VISION
, 2002
"... Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higherorder tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the d ..."
Abstract

Cited by 188 (7 self)
 Add to MetaCart
Natural images are the composite consequence of multiple factors related to scene structure, illumination, and imaging. Multilinear algebra, the algebra of higherorder tensors, offers a potent mathematical framework for analyzing the multifactor structure of image ensembles and for addressing the difficult problem of disentangling the constituent factors or modes. Our multilinear modeling technique employs a tensor extension of the conventional matrix singular value decomposition (SVD), known as the Nmode SVD.As a concrete example, we consider the multilinear analysis of ensembles of facial images that combine several modes, including different facial geometries (people), expressions, head poses, and lighting conditions. Our resulting "TensorFaces" representation has several advantages over conventional eigenfaces. More generally, multilinear analysis shows promise as a unifying framework for a variety of computer vision problems.
Feature selection for unsupervised learning
 Journal of Machine Learning Research
, 2004
"... In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dime ..."
Abstract

Cited by 146 (4 self)
 Add to MetaCart
(Show Context)
In this paper, we identify two issues involved in developing an automated feature subset selection algorithm for unlabeled data: the need for finding the number of clusters in conjunction with feature selection, and the need for normalizing the bias of feature selection criteria with respect to dimension. We explore the feature selection problem and these issues through FSSEM (Feature Subset Selection using ExpectationMaximization (EM) clustering) and through two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. We present proofs on the dimensionality biases of these feature criteria, and present a crossprojection normalization scheme that can be applied to any criterion to ameliorate these biases. Our experiments show the need for feature selection, the need for addressing these two issues, and the effectiveness of our proposed solutions.