Tensor decompositions for learning latent variable models
, 2014
Abstract

Cited by 83 (7 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
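The robust tensor power method analyzed in this abstract can be illustrated with a minimal NumPy sketch: build an orthogonally decomposable symmetric 3-tensor, then recover its components by power iteration with random restarts and deflation. Function names and the restart/iteration counts here are illustrative choices, not the paper's.

```python
import numpy as np

def tensor_apply(T, theta):
    # contract the symmetric 3-tensor along two modes: T(I, theta, theta)
    return np.einsum('ijk,j,k->i', T, theta, theta)

def tensor_power_method(T, k, n_restarts=10, n_iters=100, rng=None):
    """Recover an orthogonal decomposition T ≈ Σ λ_r v_r⊗v_r⊗v_r by
    power iteration with random restarts and deflation."""
    rng = np.random.default_rng(rng)
    T = T.copy()
    d = T.shape[0]
    lams, vecs = [], []
    for _ in range(k):
        best_lam, best_v = -np.inf, None
        for _ in range(n_restarts):
            theta = rng.standard_normal(d)
            theta /= np.linalg.norm(theta)
            for _ in range(n_iters):
                theta = tensor_apply(T, theta)
                theta /= np.linalg.norm(theta)
            lam = float(np.einsum('ijk,i,j,k->', T, theta, theta, theta))
            if lam > best_lam:
                best_lam, best_v = lam, theta
        lams.append(best_lam)
        vecs.append(best_v)
        # deflation: remove the recovered rank-one term before the next sweep
        T = T - best_lam * np.einsum('i,j,k->ijk', best_v, best_v, best_v)
    return np.array(lams), np.array(vecs)
```

Power iteration on such a tensor converges quadratically to one of the components, so a modest number of restarts per deflation step typically suffices.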
Fast Approximation of Matrix Coherence and Statistical Leverage
Abstract

Cited by 53 (11 self)
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson–Lindenstrauss type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
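The algorithmic idea—sketch A to obtain an approximate R factor, then estimate the row norms of A R⁻¹ with a second projection—can be sketched in NumPy. A Gaussian sketch stands in here for the fast subsampled randomized Hadamard transform that the O(nd log n) running time requires, and the sketch sizes r1, r2 are illustrative defaults, not the paper's settings.

```python
import numpy as np

def exact_leverage_scores(A):
    # the naive O(nd^2) route: orthogonal basis, then squared row norms
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

def approx_leverage_scores(A, r1=None, r2=None, rng=None):
    """Randomized leverage-score approximation in the spirit of the
    abstract: sketch A, take R from the sketch's QR, then estimate the
    row norms of A @ inv(R) with a second (JL-style) projection."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    r1 = r1 or 4 * d          # rows in the first sketch (illustrative)
    r2 = r2 or 2 * d          # JL target dimension (illustrative)
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
    _, R = np.linalg.qr(Pi1 @ A)            # R approximates A's R factor
    Omega = rng.standard_normal((d, r2)) / np.sqrt(r2)
    B = A @ np.linalg.solve(R, Omega)       # n×r2, so row norms are cheap
    return (B ** 2).sum(axis=1)
```

Since A R⁻¹ has approximately orthonormal columns, its squared row norms approximate the leverage scores, and the JL projection makes computing them cost O(n d r2) instead of requiring the full product.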
Randomized Matrix Computations
, 2012
Abstract

Cited by 49 (5 self)
We propose new effective randomized algorithms for some fundamental matrix computations such as preconditioning of an ill-conditioned matrix that has a small numerical nullity or rank, its 2-by-2 block triangulation, numerical stabilization of Gaussian elimination with no pivoting, and approximation of a matrix by low-rank matrices and by structured matrices. Our technical advances include estimating the condition number of a random Toeplitz matrix, novel techniques of randomized preprocessing, a proof of their preconditioning power, and a dual version of the Sherman–Morrison–Woodbury formula. According to both our formal study and numerical tests, we significantly accelerate the known algorithms and improve their output accuracy.
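One ingredient named above, the Sherman–Morrison–Woodbury formula, reduces solving with A + UVᵀ to a small k×k "capacitance" system plus solves with A. The sketch below shows the classical identity in NumPy (the paper's dual version is not reproduced here; `Ainv_apply` is an assumed wrapper for a fast solver with A).

```python
import numpy as np

def smw_solve(Ainv_apply, U, V, b):
    """Solve (A + U V^T) x = b via Sherman–Morrison–Woodbury:
    x = A⁻¹b − A⁻¹U (I + Vᵀ A⁻¹ U)⁻¹ Vᵀ A⁻¹ b,
    given Ainv_apply(z) that applies A⁻¹ to a vector or matrix z."""
    Ainv_b = Ainv_apply(b)
    Ainv_U = Ainv_apply(U)
    k = U.shape[1]
    cap = np.eye(k) + V.T @ Ainv_U          # small k×k capacitance matrix
    return Ainv_b - Ainv_U @ np.linalg.solve(cap, V.T @ Ainv_b)
```

When A admits a fast solver (diagonal, Toeplitz, already preconditioned) and k is small, this costs a handful of A-solves plus O(k³), instead of refactoring A + UVᵀ from scratch.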
Multiview learning of word embeddings via CCA
 In Proc. of NIPS
, 2011
Abstract

Cited by 41 (8 self)
Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning method, Low Rank Multi-View Learning (LR-MVL), which uses a fast spectral method to estimate low-dimensional context-specific word representations from unlabeled data. These representation features can then be used with any supervised learner. LR-MVL is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.
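The CCA computation underlying such spectral methods fits in a few NumPy lines: whiten each view, then take the SVD of the whitened cross-covariance. This is a generic CCA-via-SVD sketch, not the paper's LR-MVL algorithm; the regularizer `reg` is an illustrative stabilizer.

```python
import numpy as np

def cca_via_svd(X, Y, k, reg=1e-8):
    """Minimal CCA: whiten each view, SVD the cross-covariance, return
    k projection directions per view and the canonical correlations."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx)).T   # whitener: Wxᵀ Cxx Wx = I
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)       # singular values = correlations
    return Wx @ U[:, :k], Wy @ Vt[:k].T, s[:k]
```

Because the whole computation is eigen/SVD-based, it inherits the "guaranteed convergence to a global optimum" property the abstract emphasizes, in contrast to non-convex embedding training.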
Revisiting the Nyström method for improved large-scale machine learning
Abstract

Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and non-uniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds—e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
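Two ingredients compared above can be sketched in NumPy: the Nyström extension built from a column subset, and rank-k leverage-score probabilities as the non-uniform alternative to uniform column sampling. Function names are illustrative; this is a sketch of the standard constructions, not the paper's evaluation code.

```python
import numpy as np

def nystrom(A, idx):
    """Nyström approximation of a PSD matrix A from the column subset
    idx: A ≈ C W⁺ Cᵀ with C = A[:, idx] and W = A[idx][:, idx]."""
    C = A[:, idx]
    W = A[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def rank_k_leverage_probs(A, k):
    # sampling probabilities proportional to rank-k leverage scores,
    # the non-uniform scheme contrasted with uniform sampling
    _, V = np.linalg.eigh(A)
    lev = (V[:, -k:] ** 2).sum(axis=1)
    return lev / lev.sum()
```

A useful sanity property: if A is PSD with rank r and the sampled W also has rank r, the Nyström extension reproduces A exactly, which is why sampling enough informative columns matters.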
Near-optimal Column-based Matrix Reconstruction
, 2011
Abstract

Cited by 34 (4 self)
We consider low-rank reconstruction of a matrix using a subset of its columns, and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column-based matrix reconstruction, and (ii) two deterministic algorithms for selecting rows from matrices with orthonormal columns, building upon the sparse representation theorem for decompositions of the identity that appeared in [1].
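Column-based reconstruction can be illustrated with a classic deterministic heuristic—greedy column-pivoted Gram–Schmidt on the top-k right singular vectors—which works in the same spirit but is not the paper's near-optimal algorithm. A NumPy sketch:

```python
import numpy as np

def select_columns(A, k):
    """Choose k column indices by greedy pivoting on the top-k right
    singular vectors (a standard heuristic, not the paper's method)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    B = Vt[:k].copy()
    idx = []
    for _ in range(k):
        j = int(np.argmax((B ** 2).sum(axis=0)))   # largest residual column
        idx.append(j)
        q = B[:, j] / np.linalg.norm(B[:, j])
        B -= np.outer(q, q @ B)                    # deflate chosen direction
    return idx

def column_reconstruction(A, idx):
    C = A[:, idx]
    return C @ np.linalg.pinv(C) @ A               # project A onto span(C)
```

If A has rank exactly k, the selected columns generically span its column space and the projection reconstructs A exactly; the interesting regime, which the paper's bounds address, is how close C C⁺ A gets to the best rank-k approximation for general A.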
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings
, 2012
Abstract

Cited by 32 (7 self)
An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R^(m×n) such that for any linear subspace W ⊆ R^n with dim(W) = d it holds that P_(Π∼D)(∀x ∈ W: ‖Πx‖_2 ∈ (1 ± ε)‖x‖_2) > 2/3. We show an OSE exists with m = O(d²/ε²) and where every Π in the support of D has exactly s = 1 nonzero entries per column. This improves the previously best known bound in [Clarkson–Woodruff, arXiv abs/1207.6365]. Our quadratic dependence on d is optimal for any OSE with s = 1 [Nelson–Nguyễn, 2012]. We also give two OSEs, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/ε²) and s = polylog(d)/ε, or m = O(d^(1+γ)/ε²) and s = O(1/ε) for any constant γ > 0. This m is nearly optimal since m ≥ d is required simply to ensure no nonzero vector of W lands in the kernel of Π. These are the first constructions with m = o(d²) to have s = o(d). In fact, our OSNAPs are nothing more than the sparse Johnson–Lindenstrauss matrices of [Kane–Nelson, SODA 2012]. Our analyses all yield OSEs that are sampled using either O(1)-wise or O(log d)-wise …
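The s = 1 construction is easy to instantiate: each column of Π carries a single ±1 entry in a uniformly random row (the CountSketch pattern). A NumPy sketch, including the streaming-friendly application of Π to A without materializing it (for simplicity the hashes are fully random here rather than the limited-wise families the analysis uses):

```python
import numpy as np

def countsketch(n, m, rng):
    """Draw an m×n sketch with exactly one nonzero (±1) per column:
    hash h(j) picks the row, sigma(j) the sign, for each column j."""
    h = rng.integers(0, m, size=n)
    sigma = rng.choice([-1.0, 1.0], size=n)
    Pi = np.zeros((m, n))
    Pi[h, np.arange(n)] = sigma
    return Pi, h, sigma

def apply_sketch(A, h, sigma, m):
    # compute Pi @ A in time proportional to nnz(A), without forming Pi
    S = np.zeros((m, A.shape[1]))
    np.add.at(S, h, sigma[:, None] * A)
    return S
```

The subspace-embedding property can be checked empirically: for an orthonormal basis Q of a d-dimensional subspace, all singular values of ΠQ should lie in [1 − ε, 1 + ε] with good probability once m is on the order of d²/ε².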
The spectral norm error of the naive Nyström extension
, 2011
Abstract

Cited by 22 (2 self)
The naïve Nyström extension forms a low-rank approximation to a positive semidefinite matrix by uniformly randomly sampling from its columns. This paper provides the first relative-error bound on the spectral norm error incurred in this process. This bound follows from a natural connection between the Nyström extension and the column subset selection problem. The main tool is a matrix Chernoff bound for sampling without replacement.
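The quantity the bound controls—the spectral-norm error of the uniform-sampling Nyström extension relative to the best rank-k error—can be measured directly. A NumPy sketch; note the extension built from c columns has rank at most c, so with k = c the ratio is always at least 1, and the paper's result bounds how much larger it can be.

```python
import numpy as np

def naive_nystrom(A, c, rng):
    """Uniformly sample c columns without replacement and form the
    Nyström extension C W⁺ Cᵀ of the PSD matrix A."""
    idx = rng.choice(A.shape[0], size=c, replace=False)
    C = A[:, idx]
    W = A[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def relative_spectral_error(A, A_hat, k):
    # ‖A − Â‖₂ / ‖A − A_k‖₂, where A_k is the best rank-k approximation;
    # for PSD A the denominator is the (k+1)-st eigenvalue
    w = np.sort(np.linalg.eigvalsh(A))[::-1]
    return np.linalg.norm(A - A_hat, 2) / w[k]
```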
Large scale Bayesian inference and experimental design for sparse linear models
 Journal of Physics: Conference Series
Abstract

Cited by 22 (2 self)
Many problems of low-level computer vision and image processing, such as denoising, deconvolution, tomographic reconstruction or super-resolution, can be addressed by maximizing the posterior distribution of a sparse linear model (SLM). We show how higher-order Bayesian decision-making problems, such as optimizing image acquisition in magnetic resonance scanners, can be addressed by querying the SLM posterior covariance, unrelated to the density’s mode. We propose a scalable algorithmic framework, with which SLM posteriors over full, high-resolution images can be approximated for the first time, solving a variational optimization problem which is convex if and only if posterior mode finding is convex. These methods successfully drive the optimization of sampling trajectories for real-world magnetic resonance imaging through Bayesian experimental design, which has not been attempted before. Our methodology provides new insight into similarities and differences between sparse reconstruction and approximate Bayesian inference, and has important implications for compressive sensing of real-world images. Parts of this work have been presented at …
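The covariance-querying design loop can be illustrated on a plain Gaussian linear model, which stands in for the sparse-linear-model posterior the paper actually approximates. The noise/prior variances `sigma2`, `tau2` and the variance-maximizing selection rule are illustrative simplifications of Bayesian experimental design, not the paper's algorithm.

```python
import numpy as np

def gaussian_posterior(X, y, sigma2=1.0, tau2=1.0):
    """Posterior of w for y = Xw + N(0, sigma2 I) with prior
    w ~ N(0, tau2 I): a Gaussian stand-in for the SLM posterior."""
    d = X.shape[1]
    cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
    mean = cov @ X.T @ y / sigma2
    return mean, cov

def pick_next_measurement(cov, candidates):
    # design rule: choose the candidate row x maximizing the predictive
    # variance xᵀ Σ x, i.e. the measurement the posterior is least sure about
    scores = np.einsum('ij,jk,ik->i', candidates, cov, candidates)
    return int(np.argmax(scores)), scores
```

The point the abstract makes is that this loop needs the posterior *covariance*, not just its mode: adding the selected measurement shrinks the posterior (its trace strictly decreases), which is the signal driving acquisition optimization.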