Results 1 - 10 of 80
Tensor decompositions for learning latent variable models
, 2014
"... This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable mo ..."
Abstract
-
Cited by 83 (7 self)
- Add to MetaCart
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
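The core routine this abstract alludes to is easy to sketch. Below is a minimal, illustrative tensor power iteration with deflation for a symmetric third-order tensor admitting an orthogonal decomposition T = sum_i w_i v_i ⊗ v_i ⊗ v_i; the function names, restart counts, and tolerances are my own, and this omits the robustness analysis that is the paper's actual contribution.

```python
# Illustrative (non-robust) tensor power method for an odeco 3-way tensor.
import numpy as np

def tensor_apply(T, u):
    # Contract the k x k x k tensor T with u along two modes: T(I, u, u).
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_restarts=10, n_iters=100, tol=1e-10):
    """Recover one (eigenvector, eigenvalue) pair of an odeco tensor."""
    k = T.shape[0]
    best_u, best_lam = None, -np.inf
    for _ in range(n_restarts):
        u = np.random.randn(k)
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u_new = tensor_apply(T, u)
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue T(u, u, u)
        if lam > best_lam:
            best_u, best_lam = u, lam
    return best_u, best_lam

def deflate(T, u, lam):
    # Subtract the recovered rank-one component lam * (u outer u outer u).
    return T - lam * np.einsum('i,j,k->ijk', u, u, u)
```

Alternating the power iteration with deflation recovers the remaining components one at a time; the paper's perturbation analysis addresses what happens when T is only an empirical moment estimate.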
Latent Variable Graphical Model Selection via Convex Optimization
, 2010
"... Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistic ..."
Abstract
-
Cited by 76 (4 self)
- Add to MetaCart
Suppose we have samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor of the relationship between the latent and observed variables. Is it possible to discover the number of hidden components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the ℓ1 norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of hidden components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
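The convex program described in this abstract can be written down directly. Here is a minimal sketch using cvxpy (assumed available), with hypothetical tuning parameters lam and gamma; since L is constrained positive semidefinite, its nuclear norm reduces to its trace, and this sketch assumes that standard form.

```python
# Sketch of the l1 + nuclear-norm regularized maximum-likelihood program.
import cvxpy as cp

def latent_ggm_select(Sigma_hat, lam=0.1, gamma=1.0):
    p = Sigma_hat.shape[0]
    S = cp.Variable((p, p), symmetric=True)  # sparse conditional precision
    L = cp.Variable((p, p), symmetric=True)  # low-rank effect of latents
    # Negative Gaussian log-likelihood of the marginal precision S - L,
    # an l1 penalty on S (sparsity), and a trace penalty on L (low rank).
    obj = cp.Minimize(cp.trace((S - L) @ Sigma_hat) - cp.log_det(S - L)
                      + lam * (gamma * cp.sum(cp.abs(S)) + cp.trace(L)))
    constraints = [L >> 0, S - L >> 0]
    cp.Problem(obj, constraints).solve()
    return S.value, L.value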
Supplement A to “Overlapping stochastic block models with application to the French blogosphere network.” DOI: 10.1214/10-AOAS382SUPP
, 2010
"... ar ..."
Spectral methods for learning multivariate latent tree structure
- In: Advances in Neural Information Processing Systems 24
, 2011
"... Abstract This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and M ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
(Show Context)
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
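For intuition about the quartet test, here is a simplified scalar-variable version based on the four-point condition for tree metrics: along a tree, correlations multiply over paths, so the true pairing maximizes the product of within-pair correlations. The paper's actual test uses singular values of cross-covariance matrices of vector-valued variables with finite-sample confidence bounds; this toy version and its names are my own.

```python
# Toy quartet test for scalar variables in a linear tree model.
import numpy as np

def quartet_topology(X, a, b, c, d):
    """Infer the pairing of variables {a, b, c, d} from sample correlations.
    X is an (n_samples, n_variables) data matrix."""
    R = np.corrcoef(X, rowvar=False)
    scores = {
        ((a, b), (c, d)): abs(R[a, b] * R[c, d]),
        ((a, c), (b, d)): abs(R[a, c] * R[b, d]),
        ((a, d), (b, c)): abs(R[a, d] * R[b, c]),
    }
    # In a tree, the true pairing has the largest product of
    # within-pair correlations (the four-point condition).
    return max(scores, key=scores.get)
```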
On generic identifiability of 3-tensors of small rank
- SIAM Journal on Matrix Analysis and Applications
"... ar ..."
A concise proof of Kruskal’s theorem on tensor decomposition
- Linear Algebra and its Applications
, 2010
"... ..."
Smoothed likelihood for multivariate mixtures
, 2010
"... We introduce an algorithm for estimating the parameters in a finite mixture of completely unspecified multivariate components in at least three dimensions under the assumption of conditionally independent coordinate dimensions. We prove that this algorithm, based on a majorization-minimization idea, ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
We introduce an algorithm for estimating the parameters in a finite mixture of completely unspecified multivariate components in at least three dimensions under the assumption of conditionally independent coordinate dimensions. We prove that this algorithm, based on a majorization-minimization idea, possesses a desirable descent property just as any EM algorithm does. We discuss the similarities between our algorithm and a related one—the so-called nonlinearly smoothed EM, or NEMS, algorithm for the non-mixture setting. We also demonstrate via simulation studies that the new algorithm gives very similar results to another algorithm that does not satisfy any descent property, thus validating the latter algorithm, which can be simpler to program. We provide code for implementing the new algorithm in a publicly-available R package.
Keywords: mixtures
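The descent property claimed here is a generic feature of majorization-minimization. A schematic of the pattern (all names illustrative, not the paper's API): the surrogate g(.|theta_t) majorizes f and touches it at theta_t, so minimizing the surrogate cannot increase f.

```python
# Schematic majorization-minimization loop exhibiting the descent property.
def mm_minimize(f, majorizer, minimize_surrogate, theta0,
                n_iters=100, tol=1e-8):
    theta = theta0
    fval = f(theta)
    for _ in range(n_iters):
        g = majorizer(theta)               # surrogate built at the iterate
        theta_new = minimize_surrogate(g)  # minimize (or just decrease) g
        f_new = f(theta_new)
        # g >= f everywhere and g(theta)=f(theta) imply f cannot increase.
        assert f_new <= fval + 1e-12, "descent property violated"
        if fval - f_new < tol:
            return theta_new
        theta, fval = theta_new, f_new
    return theta
```

EM is the special case in which the surrogate comes from the expected complete-data log-likelihood.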
Nonparametric Identification and Estimation of Random Coefficients in Nonlinear Economic Models
, 2010
"... We show how to nonparametrically identify and estimate the distribution of random coefficients that characterizes the heterogeneity among agents in a general class of economic choice models. We introduce an axiom that we term separability and prove that separability of a structural model ensures ide ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
We show how to nonparametrically identify and estimate the distribution of random coefficients that characterizes the heterogeneity among agents in a general class of economic choice models. We introduce an axiom that we term separability and prove that separability of a structural model ensures identification. Identification naturally gives rise to a nonparametric minimum distance estimator. We prove identification of distributions of utility functions in multinomial choice, distributions of labor supply responses to tax changes, and distributions of wage functions in the Roy selection model. We also reconsider the problem of endogeneity in economic choice models, leading to new results on the two-stage least squares model.
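One common way to turn identification of this kind into an estimator, and plausibly what "nonparametric minimum distance" refers to, is to discretize the coefficient distribution on a grid so that predicted choice probabilities become linear in the unknown weights. The sketch below is an assumption-laden illustration of that idea, not the paper's exact construction; the grid, model function, and sum-to-one penalty are mine.

```python
# Grid-based minimum distance sketch for a random-coefficients model.
import numpy as np
from scipy.optimize import nnls

def estimate_rc_distribution(choice_freq, covariates, choice_prob, beta_grid):
    """choice_freq: (n_obs,) observed choice frequencies;
    choice_prob(x, beta): model probability given covariates x, coefficient beta;
    beta_grid: iterable of candidate coefficient vectors.
    Returns nonnegative weights on beta_grid summing to 1."""
    # Design matrix: column r holds model probabilities at grid point beta_r.
    G = np.array([[choice_prob(x, b) for b in beta_grid] for x in covariates])
    # Append a heavily weighted row softly enforcing sum(weights) = 1.
    penalty = 1e3
    A = np.vstack([G, penalty * np.ones(G.shape[1])])
    y = np.append(choice_freq, penalty)
    w, _ = nnls(A, y)          # nonnegativity keeps w a valid (sub)distribution
    return w / w.sum()
```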
When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity
, 2013
"... Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete reg ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of “higher order” expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity that is contained in the class of Tucker decompositions but is more general than the Candecomp/Parafac (CP) decomposition.
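To make the matching condition concrete, here is a toy check that, under persistence of order 2, the support of a topic-word matrix admits a perfect matching from topics to observed word pairs. This captures only the combinatorial flavor of the paper's expansion conditions, not their precise statement; the construction and threshold are my own.

```python
# Toy perfect-matching check from topics to word pairs.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def has_topic_matching(A, thresh=0.0):
    """A: (n_topics, n_words) topic-word matrix. Build the support of word
    pairs per topic and test for a matching that saturates all topics."""
    support = (np.abs(A) > thresh).astype(np.int8)
    # Topic i touches word pair (j, k) iff it touches both j and k.
    pair_support = np.einsum('ij,ik->ijk', support, support)
    n_topics = A.shape[0]
    B = csr_matrix(pair_support.reshape(n_topics, -1))
    match = maximum_bipartite_matching(B, perm_type='column')
    return bool(np.all(match != -1))  # every topic matched to a distinct pair
```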