Results 1–10 of 17
Tensor decompositions for learning latent variable models, 2014
Abstract

Cited by 72 (5 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
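The power iteration for symmetric tensors that this abstract refers to repeatedly applies the map u ↦ T(I, u, u) and renormalizes. A minimal NumPy sketch, under the assumption that T has an (approximately) orthogonal decomposition; the function name and toy tensor are illustrative, not the authors' code:

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """One run of the symmetric tensor power method.

    Assumes T is a symmetric third-order tensor with an (approximately)
    orthogonal decomposition T = sum_i lambda_i v_i^{(x)3}; returns the
    (eigenvalue, eigenvector) pair the iteration converges to.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, u, u)   # the map u -> T(I, u, u)
        u /= np.linalg.norm(u)
    lam = np.einsum('ijk,i,j,k->', T, u, u, u)  # lambda = T(u, u, u)
    return lam, u

# Toy orthogonal tensor with components e_1, e_2 and weights 3 and 1.
v1, v2 = np.eye(2)
T = 3.0 * np.einsum('i,j,k->ijk', v1, v1, v1) \
  + 1.0 * np.einsum('i,j,k->ijk', v2, v2, v2)
lam, u = tensor_power_iteration(T)
```

In the full method, recovered components are deflated (subtracted from T) and the iteration is restarted to extract the remaining pairs.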
The Euclidean distance degree of an algebraic variety, 2013
Abstract

Cited by 14 (2 self)
The nearest point map of a real algebraic variety with respect to Euclidean distance is an algebraic function. For instance, for varieties of low-rank matrices, the Eckart-Young Theorem states that this map is given by the singular value decomposition. This article develops a theory of such nearest point maps from the perspective of computational algebraic geometry. The Euclidean distance degree of a variety is the number of critical points of the squared distance to a generic point outside the variety. Focusing on varieties seen in applications, we present numerous tools for exact computations.
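The Eckart-Young step mentioned above is easy to state concretely: truncating the singular value decomposition yields the nearest rank-k matrix in Frobenius (and spectral) norm. A small sketch; the helper name and example matrix are illustrative:

```python
import numpy as np

def nearest_rank_k(A, k):
    """Eckart-Young: the closest rank-k matrix to A is obtained by
    keeping only the k leading singular triples of its SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
B = nearest_rank_k(A, 1)   # drops the smaller singular value
```

For tensors of order three and higher no such closed-form nearest-point map exists in general, which is part of what motivates the Euclidean distance degree as a complexity measure.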
Detection of crossing white matter fibers with high-order tensors and rank-k decompositions
On the global convergence of the alternating least squares method for rank-one approximation to generic tensors, 2013
Abstract

Cited by 4 (1 self)
Tensor decomposition has important applications in various disciplines, but it remains an extremely challenging task even today. A slightly more manageable endeavor has been to find a low-rank approximation in place of the decomposition. Even for this less stringent undertaking, it is an established fact that tensors beyond matrices can fail to have best low-rank approximations, with the notable exception that the best rank-one approximation always exists for tensors of any order. Toward the latter, the most popular approach is alternating least squares, whose numerical scheme appears as a variant of the power method. Though the limiting behavior of the objective values is well understood, a proof of global convergence for the iterates themselves has been elusive. This paper partially addresses the missing piece by showing that for almost all tensors, the iterates generated by the alternating least squares method for the rank-one approximation converge globally. The underlying technique is an eclectic mix of knowledge from algebraic geometry and dynamical systems.
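The alternating least squares scheme in question cycles through the three factors of a candidate rank-one term, solving a closed-form least-squares problem for one factor while the other two are fixed. A minimal sketch, assuming a real third-order tensor; names and the toy example are illustrative:

```python
import numpy as np

def als_rank_one(T, n_iter=200, seed=0):
    """ALS for the best rank-one approximation lambda * x (x) y (x) z of a
    third-order tensor T.  Fixing two factors, the least-squares update of
    the third is the contraction below followed by normalization, which is
    why the scheme looks like a power method."""
    rng = np.random.default_rng(seed)
    x, y, z = (rng.standard_normal(n) for n in T.shape)
    for _ in range(n_iter):
        x = np.einsum('ijk,j,k->i', T, y, z); x /= np.linalg.norm(x)
        y = np.einsum('ijk,i,k->j', T, x, z); y /= np.linalg.norm(y)
        z = np.einsum('ijk,i,j->k', T, x, y); z /= np.linalg.norm(z)
    lam = np.einsum('ijk,i,j,k->', T, x, y, z)
    return lam, x, y, z

# Toy rank-one tensor 2 * e1 (x) e1 (x) e1.
e1 = np.array([1.0, 0.0])
T = 2.0 * np.einsum('i,j,k->ijk', e1, e1, e1)
lam, x, y, z = als_rank_one(T)
```

The paper's contribution concerns whether the iterates (x, y, z) themselves converge for generic T, not merely the objective value lam.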
FACTORIZATION STRATEGIES FOR THIRD-ORDER TENSORS
Abstract

Cited by 3 (0 self)
Operations with tensors, or multiway arrays, have become increasingly prevalent in recent years. Traditionally, tensors are represented or decomposed as a sum of rank-1 outer products using either the CANDECOMP/PARAFAC (CP) or the Tucker models, or some variation thereof. Such decompositions are motivated by specific applications where the goal is to find an approximate such representation for a given multiway array. The specifics of the approximate representation (such as how many terms to use in the sum, orthogonality constraints, etc.) depend on the application. In this paper, we explore an alternate representation of tensors which shows promise with respect to the tensor approximation problem. Reminiscent of matrix factorizations, we present a new factorization of a tensor as a product of tensors. To derive the new factorization, we define a closed multiplication operation between tensors. A major motivation for considering this new type of tensor multiplication is to devise new types of factorizations for tensors which can then be used in applications. Specifically, this new multiplication allows us to introduce concepts such as tensor transpose, inverse, and identity, which lead to the notion of an orthogonal tensor. The multiplication also gives rise to a linear operator, and the null space of the resulting operator is identified. We extend the concept of outer products of vectors to outer products of matrices. All derivations are presented for third-order tensors; however, they can be easily extended to the order-p (p > 3) case. We conclude with an application in image deblurring.
Key words: multilinear algebra, tensor decomposition, singular value decomposition, multidimensional arrays
AMS subject classifications: 15A69, 65F30
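A closed multiplication of this kind (widely known as the t-product) can be sketched through its FFT-based implementation: transform along the third mode, multiply the frontal slices as ordinary matrices, and transform back. The helper below is a minimal illustrative version, not the paper's code:

```python
import numpy as np

def t_product(A, B):
    """Closed multiplication of third-order tensors: FFT along the third
    mode, facewise matrix products, inverse FFT.  Equivalent to multiplying
    the block-circulant matrices built from the frontal slices."""
    Ah = np.fft.fft(A, axis=2)
    Bh = np.fft.fft(B, axis=2)
    Ch = np.einsum('ijn,jkn->ikn', Ah, Bh)   # one matrix product per face
    return np.fft.ifft(Ch, axis=2).real

# The identity tensor under this product: the identity matrix in the
# first frontal slice, zeros elsewhere.
n = 3
I = np.zeros((n, n, 4)); I[:, :, 0] = np.eye(n)
A = np.random.default_rng(1).standard_normal((n, n, 4))
```

Having an identity (and hence transposes, inverses, and orthogonal tensors) is exactly what makes matrix-like factorizations of third-order tensors possible in this framework.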
Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability, arXiv:1304.8087, 2013
Abstract

Cited by 3 (1 self)
We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error. Kruskal's theorem has found many applications in proving the identifiability of parameters for various latent variable models and mixture models, such as hidden Markov models and topic models. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models. Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently in special cases, as long as they satisfy certain "non-degeneracy" properties. Our methods give a way to go beyond this non-degeneracy barrier and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect that this robust version will have several applications beyond the settings we explore in this work.
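Kruskal's rank condition is stated in terms of the Kruskal rank (k-rank) of the factor matrices: the largest k such that every k columns are linearly independent; uniqueness of a rank-R decomposition with factors A, B, C holds when k_A + k_B + k_C ≥ 2R + 2. A brute-force sketch for small examples (the function is illustrative, not from the paper):

```python
import numpy as np
from itertools import combinations

def kruskal_rank(M, tol=1e-10):
    """Kruskal rank of M: the largest k such that EVERY subset of k
    columns is linearly independent (exhaustive check, small cases only)."""
    n_cols = M.shape[1]
    k = 0
    for r in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(M[:, list(c)], tol=tol) == r
               for c in combinations(range(n_cols), r)):
            k = r
        else:
            break
    return k

# Every subset of columns here is independent, so the k-rank equals 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
```

The paper's robust condition replaces exact linear independence with a quantitative (well-conditioned) version of the same subset requirement.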
CAYLEY’S HYPERDETERMINANT: A COMBINATORIAL APPROACH VIA REPRESENTATION THEORY
Abstract

Cited by 2 (0 self)
Cayley's hyperdeterminant is a homogeneous polynomial of degree 4 in the 8 entries of a 2×2×2 array. It is the simplest (non-constant) polynomial which is invariant under changes of basis in three directions. We use elementary facts about representations of the 3-dimensional simple Lie algebra sl2(C) to reduce the problem of finding the invariant polynomials for a 2 × 2 × 2 array to a combinatorial problem on the enumeration of 2 × 2 × 2 arrays with nonnegative integer entries. We then apply results from linear algebra to obtain a new proof that Cayley's hyperdeterminant generates all the invariants.
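The degree-4 polynomial in question can be written out explicitly in the eight entries. A sketch using the standard expansion of the 2×2×2 hyperdeterminant (the function and the invariance demo below are my own illustration):

```python
import numpy as np

def hyperdeterminant(a):
    """Cayley's 2x2x2 hyperdeterminant: a degree-4 polynomial in the
    eight entries a[i, j, k], written out term by term."""
    return (a[0,0,0]**2 * a[1,1,1]**2 + a[0,0,1]**2 * a[1,1,0]**2
            + a[0,1,0]**2 * a[1,0,1]**2 + a[0,1,1]**2 * a[1,0,0]**2
            - 2 * (a[0,0,0]*a[0,0,1]*a[1,1,0]*a[1,1,1]
                 + a[0,0,0]*a[0,1,0]*a[1,0,1]*a[1,1,1]
                 + a[0,0,0]*a[0,1,1]*a[1,0,0]*a[1,1,1]
                 + a[0,0,1]*a[0,1,0]*a[1,0,1]*a[1,1,0]
                 + a[0,0,1]*a[0,1,1]*a[1,0,0]*a[1,1,0]
                 + a[0,1,0]*a[0,1,1]*a[1,0,0]*a[1,0,1])
            + 4 * (a[0,0,0]*a[0,1,1]*a[1,0,1]*a[1,1,0]
                 + a[0,0,1]*a[0,1,0]*a[1,0,0]*a[1,1,1]))

# Invariance under a determinant-1 change of basis in one direction.
g = np.array([[1.0, 2.0],
              [0.0, 1.0]])                    # det(g) = 1
a = np.random.default_rng(2).standard_normal((2, 2, 2))
a_g = np.einsum('il,ljk->ijk', g, a)          # act on the first direction
```

A quick sanity check: a rank-one array (all entries products x_i y_j z_k) always has vanishing hyperdeterminant, while the array with a[0,0,0] = a[1,1,1] = 1 and zeros elsewhere evaluates to 1.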
Provable Algorithms for Machine Learning Problems, 2013
Abstract

Cited by 2 (0 self)
Modern machine learning algorithms can extract useful information from text, images, and videos. All these applications involve solving NP-hard problems in the average case using heuristics. What properties of the input allow it to be solved efficiently? Theoretically analyzing these heuristics is often very challenging, and few results were known. This thesis takes a different approach: we identify natural properties of the input, then design new algorithms that provably work assuming the input has these properties. We are able to give new, provable, and sometimes practical algorithms for learning tasks related to text corpora, images, and social networks. The first part of the thesis presents new algorithms for learning thematic structure in documents. We show that, under a reasonable assumption, it is possible to provably learn many topic models, including the famous Latent Dirichlet Allocation. Ours is the first provable algorithm for topic modeling; an implementation runs 50 times faster than the latest MCMC implementation and produces comparable results. The second part of the thesis provides ideas for provably learning deep, sparse representations. We start with sparse linear representations and give the first algorithm for the dictionary learning problem with provable guarantees. Then we apply similar ideas to deep learning: under reasonable assumptions, our algorithms can learn a deep network built by denoising autoencoders. The final part of the thesis develops a framework for learning latent variable models. We demonstrate how various latent variable models can be reduced to orthogonal tensor decomposition and then solved using the tensor power method. We give a tight perturbation analysis for the tensor power method, which reduces the number of samples required to learn many latent variable models.
In theory, the assumptions in this thesis help us understand why intractable problems in machine learning can often be solved; in practice, the results suggest inherently new approaches for machine learning. We hope the assumptions and algorithms inspire new research problems and learning algorithms.
A GREEDY ALGORITHM FOR MODEL SELECTION OF TENSOR DECOMPOSITIONS
Abstract

Cited by 1 (1 self)
Various tensor decompositions use different arrangements of factors to explain multiway data. Components from different decompositions can vary in the number of parameters. Allowing a model to contain components from different decompositions results in a combinatorial number of possible models. We consider model selection to balance approximation error with the number of parameters, but due to the number of possibilities, post-hoc model selection is infeasible. Instead, we incrementally build a model. This approach is analogous to sparse coding with a union of dictionaries. The proposed greedy approach can estimate a model consisting of a combination of tensor decompositions.
Index Terms: tensor decompositions, greedy algorithm, model selection
ON THE GLOBAL CONVERGENCE OF THE HIGH-ORDER POWER METHOD FOR RANK-ONE TENSOR APPROXIMATION, 2013
Abstract

Cited by 1 (1 self)
Tensor decomposition has consequential applications in various disciplines, but it remains an extremely challenging task even today. A slightly more manageable endeavor has been to find a low-rank approximation in place of the decomposition. Even for this less stringent task, it is an established fact that tensors beyond matrices can fail to have best low-rank approximations, with the notable exception that the best rank-one approximation always exists for tensors of any order. Toward the latter, the most popular approach is via the notion of alternating projection; the specific numerical scheme appears as a variant of the power method. The so-called proofs of convergence in the literature, however, have addressed only the limiting behavior of the objective values. This is not enough, as the dynamics of the iterates themselves should also be analyzed. The purpose of this paper is to fill the gap by presenting a rigorous convergence analysis of the power method for the rank-one approximation. Also briefly discussed is a comparison of the power method with global optimization.