Tensor decompositions for learning latent variable models
, 2014
Cited by 72 (5 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) that exploits a certain tensor structure in their low-order observable moments (typically of second and third order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
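As a rough illustration of the power iterations mentioned in this abstract, the sketch below runs the basic map u -> T(I, u, u) / ||T(I, u, u)|| on a small symmetric tensor with an exact orthogonal decomposition. It is not the robust method analyzed in the paper: there are no random restarts, no deflation, and no noise. The helper names (`build_tensor`, `tensor_apply`, `power_iteration`) and the toy weights and vectors are assumptions made only for this example.

```python
import random

def build_tensor(weights, vectors):
    """Build T = sum_i w_i * (v_i ⊗ v_i ⊗ v_i) as nested lists (toy sizes only)."""
    n = len(vectors[0])
    T = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for w, v in zip(weights, vectors):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    T[i][j][k] += w * v[i] * v[j] * v[k]
    return T

def tensor_apply(T, u):
    """Return the vector T(I, u, u)."""
    n = len(u)
    return [sum(T[i][j][k] * u[j] * u[k] for j in range(n) for k in range(n))
            for i in range(n)]

def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def power_iteration(T, iters=50, seed=0):
    """Plain tensor power iteration from one random start (no restarts, no deflation)."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(iters):
        u = normalize(tensor_apply(T, u))
    # Eigenvalue estimate T(u, u, u).
    lam = sum(a * b for a, b in zip(tensor_apply(T, u), u))
    return lam, u

# Hypothetical toy input: two orthonormal components in R^2 with weights 3 and 1.
T = build_tensor([3.0, 1.0], [[0.6, 0.8], [-0.8, 0.6]])
lam, u = power_iteration(T)
```

Which of the two components the iteration reaches depends on the random starting point; recovering all components would need deflation or repeated restarts, which this sketch omits.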
Randomized Matrix Computations
, 2012
Cited by 52 (6 self)
We propose new effective randomized algorithms for some fundamental matrix computations such as preconditioning of an ill-conditioned matrix that has a small numerical nullity or rank, its 2-by-2 block triangulation, numerical stabilization of Gaussian elimination with no pivoting, and approximation of a matrix by low-rank matrices and by structured matrices. Our technical advances include estimating the condition number of a random Toeplitz matrix, novel techniques of randomized preprocessing, a proof of their preconditioning power, and a dual version of the Sherman–Morrison–Woodbury formula. According to both our formal study and numerical tests, we significantly accelerate the known algorithms and improve their output accuracy.
Fast Approximation of Matrix Coherence and Statistical Leverage
Cited by 48 (11 self)
The statistical leverage scores of a matrix A are the squared row norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss-type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n≈d, and the extension to streaming environments.
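To make the definitions concrete, here is a minimal sketch of the naïve baseline this abstract refers to: compute an orthonormal basis for the range of A and take squared row norms. It is not the paper's fast randomized algorithm. The function name `leverage_scores`, the use of modified Gram-Schmidt, the full-column-rank assumption, and the toy matrix are all assumptions made for illustration.

```python
def leverage_scores(A):
    """Squared row norms of an orthonormal basis for the column space of A.

    Naive baseline via modified Gram-Schmidt; assumes A (a list of rows)
    has full column rank. Any orthonormal basis of the range gives the
    same row norms as the left singular vectors.
    """
    n, d = len(A), len(A[0])
    basis = []
    for j in range(d):
        v = [A[i][j] for i in range(n)]          # j-th column of A
        for q in basis:                          # remove components along earlier basis vectors
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return [sum(q[i] ** 2 for q in basis) for i in range(n)]

# Hypothetical toy input: the last row is an outlier and should dominate.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 10.0]]
scores = leverage_scores(A)
coherence = max(scores)   # coherence is the largest leverage score
```

The scores sum to the rank of A (here 2), which gives a quick sanity check on any implementation.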
Multi-view learning of word embeddings via CCA
 In Proc. of NIPS
, 2011
Cited by 40 (7 self)
Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning method, Low Rank Multi-View Learning (LR-MVL), which uses a fast spectral method to estimate low-dimensional context-specific word representations from unlabeled data. These representation features can then be used with any supervised learner. LR-MVL is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.
1 Introduction and Related Work
Over the past decade there has been increased interest in using unlabeled data to supplement the labeled data in semi-supervised learning settings to overcome the inherent data sparsity and get improved generalization accuracies in high-dimensional domains like NLP. Approaches like [1, 2]
Near-optimal Column-based Matrix Reconstruction
, 2011
Cited by 32 (3 self)
We consider low-rank reconstruction of a matrix using a subset of its columns and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column-based matrix reconstruction, and (ii) two deterministic algorithms for selecting rows from matrices with orthonormal columns, building upon the sparse representation theorem for decompositions of the identity that appeared in [1].
Revisiting the Nyström method for improved large-scale machine learning
Cited by 30 (4 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds, e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings
, 2012
Cited by 29 (7 self)
An oblivious subspace embedding (OSE) given some parameters $\varepsilon, d$ is a distribution $D$ over matrices $\Pi \in \mathbb{R}^{m \times n}$ such that for any linear subspace $W \subseteq \mathbb{R}^n$ with $\dim(W) = d$ it holds that $\mathbb{P}_{\Pi \sim D}(\forall x \in W,\ \|\Pi x\|_2 \in (1 \pm \varepsilon)\|x\|_2) > 2/3$. We show an OSE exists with $m = O(d^2/\varepsilon^2)$ and where every $\Pi$ in the support of $D$ has exactly $s = 1$ nonzero entries per column. This improves the previously best known bound in [Clarkson-Woodruff, arXiv abs/1207.6365]. Our quadratic dependence on $d$ is optimal for any OSE with $s = 1$ [Nelson-Nguyễn, 2012]. We also give two OSEs, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings $m = \tilde{O}(d/\varepsilon^2)$ and $s = \mathrm{polylog}(d)/\varepsilon$, or $m = O(d^{1+\gamma}/\varepsilon^2)$ and $s = O(1/\varepsilon)$ for any constant $\gamma > 0$. This $m$ is nearly optimal since $m \ge d$ is required simply to ensure no nonzero vector of $W$ lands in the kernel of $\Pi$. These are the first constructions with $m = o(d^2)$ to have $s = o(d)$. In fact, our OSNAPs are nothing more than the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson, SODA 2012]. Our analyses all yield OSEs that are sampled using either $O(1)$-wise or $O(\log d)$-wise
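The s = 1 structure described in this abstract (exactly one nonzero entry per column) can be sketched as follows. This is only a structural illustration under loud assumptions: the nonzero entries are taken to be random signs, the row positions and signs come from a fully random generator rather than the limited-independence hashing the abstract mentions, and the names `sparse_embedding` and `apply_embedding` and the sizes m = 64, n = 1000 are hypothetical. The sketch makes no claim about the embedding guarantee, which depends on choosing m as stated in the abstract.

```python
import random

def sparse_embedding(m, n, seed=0):
    """Describe an m x n matrix with exactly one nonzero (+1 or -1) per column.

    Returns (rows, signs): column j has its single nonzero at row rows[j]
    with value signs[j]. Uses fully random choices (an assumption).
    """
    rng = random.Random(seed)
    rows = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return rows, signs

def apply_embedding(rows, signs, m, x):
    """Compute y = Pi x in time proportional to len(x), never forming Pi."""
    y = [0.0] * m
    for j, xj in enumerate(x):
        y[rows[j]] += signs[j] * xj
    return y

m, n = 64, 1000
rows, signs = sparse_embedding(m, n)
x = [0.0] * n
x[5] = 2.0                        # a 1-sparse test vector
y = apply_embedding(rows, signs, m, x)
```

Because each column has a single nonzero, applying the matrix costs one update per input coordinate, which is the source of the speedups such sparse embeddings aim for.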
Paved with good intentions: Analysis of a randomized Kaczmarz method
, 2012
Cited by 21 (6 self)
The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares problems. At each step, the algorithm projects the current iterate onto the solution space of a subset of the constraints. This paper describes a block Kaczmarz algorithm that uses a randomized control scheme to choose the subset at each step. This algorithm is the first block Kaczmarz method with an (expected) linear rate of convergence that can be expressed in terms of the geometric properties of the matrix and its submatrices. The analysis reveals that the algorithm is most effective when it is given a good row paving of the matrix, a partition of the rows into well-conditioned blocks. The operator theory literature provides detailed information about the existence and construction of good row pavings. Together, these results yield an efficient block Kaczmarz scheme that applies to many overdetermined least-squares problems.
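The projection step described in this abstract can be sketched in its simplest form, where every block is a single row chosen uniformly at random. This is a deliberately reduced illustration, not the paper's paving-based block algorithm: it assumes a consistent system (an exact solution exists), and the function name `randomized_kaczmarz`, the iteration count, and the toy data are hypothetical.

```python
import random

def randomized_kaczmarz(A, b, iters=500, seed=0):
    """Randomized Kaczmarz with single-row blocks and uniform row selection.

    Each step projects the iterate onto the hyperplane {x : a_i . x = b_i}
    of one randomly chosen constraint. Assumes every row of A is nonzero.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        i = rng.randrange(len(A))
        row = A[i]
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        scale = residual / sum(a * a for a in row)
        x = [xj + scale * a for xj, a in zip(x, row)]
    return x

# Hypothetical overdetermined but consistent system: b is built from x_true.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0], [3.0, 0.0]]
x_true = [1.0, -2.0]
b = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

A block variant would replace the single-row projection with a projection onto the solution space of several rows at once, which requires solving a small least-squares problem per step.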
The spectral norm error of the naive Nyström extension
, 2011
Cited by 21 (2 self)
The naïve Nyström extension forms a low-rank approximation to a positive-semidefinite matrix by sampling its columns uniformly at random. This paper provides the first relative-error bound on the spectral norm error incurred in this process. This bound follows from a natural connection between the Nyström extension and the column subset selection problem. The main tool is a matrix Chernoff bound for sampling without replacement.