Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions (2011)

by N. Halko, P. G. Martinsson, J. A. Tropp
Venue: SIAM Review
Results 1 - 10 of 256 citing documents

Tensor decompositions for learning latent variable models

by Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, Matus Telgarsky, 2014
Abstract - Cited by 83 (7 self)
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models—including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation—which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin’s perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
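The moment-based estimation described above reduces to extracting an orthogonal decomposition of a symmetric third-order tensor with a tensor power method. A minimal NumPy sketch of one plain (non-robust) power iteration, assuming T is a k×k×k symmetric tensor that already admits an orthogonal decomposition; the function name and stopping rule are illustrative and are not the robust method analyzed in the paper:

import numpy as np

def tensor_power_iteration(T, n_iters=100, tol=1e-8, rng=None):
    # Estimate one (eigenvalue, eigenvector) pair of the orthogonal decomposition
    # of the symmetric tensor T via the power update v <- T(I, v, v) / ||T(I, v, v)||.
    rng = np.random.default_rng() if rng is None else rng
    k = T.shape[0]
    v = rng.standard_normal(k)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        w = np.einsum('ijk,j,k->i', T, v, v)   # contract T against v in two modes
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            v = w
            break
        v = w
    lam = np.einsum('ijk,i,j,k->', T, v, v, v)  # eigenvalue T(v, v, v)
    return lam, v

Deflating (subtracting lam times v⊗v⊗v) and repeating recovers the remaining components; the cited paper adds random restarts and a perturbation analysis to make this procedure robust.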

Randomized Algorithms for Matrices and Data

by Michael W. Mahoney, 2010
Abstract - Cited by 60 (2 self)
Abstract not found

Fast Approximation of Matrix Coherence and Statistical Leverage

by Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff, Mehryar Mohri
Abstract - Cited by 53 (11 self)
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd²) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.

Citation Context

... useful features or data points in downstream data analysis applications [36, 30], or we might be seeking to develop high-quality numerical implementations of low-rank matrix approximation algorithms [24], etc. In all these cases, we only care that the estimated leverage scores are a good approximation to the leverage scores of some “good” low-rank approximation to A. The following definition captures...
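For orientation, a rough NumPy sketch of the sketching idea the abstract describes: exact leverage scores need an orthonormal basis for the range of A (the O(nd²) route), while the fast route first compresses A with a random projection and reads the scores off a much smaller matrix. Plain Gaussian sketches stand in here for the structured fast transforms used in the paper, and the sketch sizes are illustrative rather than the paper's parameter choices:

import numpy as np

def exact_leverage_scores(A):
    # Squared row norms of an orthonormal basis for range(A); costs O(n d^2).
    Q, _ = np.linalg.qr(A)
    return np.sum(Q**2, axis=1)

def approx_leverage_scores(A, eps=0.5, rng=None):
    # Assumes A is n x d with n >> d and full column rank.
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    r1 = int(np.ceil(d * np.log(n) / eps**2))        # size of the first sketch (illustrative)
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
    _, R = np.linalg.qr(Pi1 @ A)                     # R plays the role of the R-factor of A
    r2 = int(np.ceil(np.log(n) / eps**2))            # size of the second (JL) sketch
    Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
    X = A @ np.linalg.solve(R, Pi2)                  # rows approximate rows of A R^{-1}
    return np.sum(X**2, axis=1)

The row norms of A R^{-1} approximate the leverage scores because A R^{-1} has nearly orthonormal columns when Pi1 A embeds range(A); the second projection Pi2 only shortens the rows so that all n norms can be read off cheaply.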

Randomized Matrix Computations

by Victor Y. Pan, Guoliang Qian, Ai-long Zheng, 2012
Abstract - Cited by 49 (5 self)
We propose new effective randomized algorithms for some fundamental matrix computations such as preconditioning of an ill conditioned matrix that has a small numerical nullity or rank, its 2-by-2 block triangulation, numerical stabilization of Gaussian elimination with no pivoting, and approximation of a matrix by low-rank matrices and by structured matrices. Our technical advances include estimating the condition number of a random Toeplitz matrix, novel techniques of randomized preprocessing, a proof of their preconditioning power, and a dual version of the Sherman–Morrison–Woodbury formula. According to both our formal study and numerical tests we significantly accelerate the known algorithms and improve their output accuracy.

Citation Context

...is expected to be well conditioned if the ratio σ/||A||2 is neither large nor small, e.g., if 1/100 ≤ σ/||A||2 ≤ 100. 1.3 Randomized algorithms We recall some known randomized matrix algorithms [D83], [HMT11] and study some new ones. Suppose we are given a normalized nonsingular n × n matrix A, such that ||A|| = 1, and its small positive numerical nullity r = nnul(A) (cf. Section 8.2 on computing numerica...
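The abstract above mentions a dual version of the Sherman–Morrison–Woodbury formula. That dual variant is the paper's contribution and is not reproduced here; as background, a small NumPy check of the classical identity (the dimensions and the random test matrices below are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)     # a comfortably nonsingular base matrix
U = rng.standard_normal((n, k))
C = rng.standard_normal((k, k)) + k * np.eye(k)
V = rng.standard_normal((k, n))

# Classical SMW: (A + U C V)^{-1} = A^{-1} - A^{-1} U (C^{-1} + V A^{-1} U)^{-1} V A^{-1}
Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ C @ V)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv
print(np.allclose(lhs, rhs))                         # True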

Multiview learning of word embeddings via CCA

by Paramveer S. Dhillon, Dean Foster, Lyle Ungar - In Proc. of NIPS, 2011
Abstract - Cited by 41 (8 self)
Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning method, Low Rank Multi-View Learning (LR-MVL) which uses a fast spectral method to estimate low dimensional context-specific word representations from unlabeled data. These representation features can then be used with any supervised learner. LR-MVL is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.

Citation Context

... A few iterations (∼ 5) of the above algorithm are sufficient to converge to the solution. (Since the problem is convex, there is a single solution, so there is no issue of local minima.) As [14] show for PCA, one can start with a random matrix that is only slightly larger than the true rank k of the correlation matrix, and with extremely high likelihood converge in a few iterations to within...
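The behaviour referred to here, that a random start only slightly wider than the target rank k converges in a few iterations, is the randomized range finder with power iterations from the reviewed paper [HMT11]. A compact NumPy sketch; the oversampling and iteration counts are illustrative defaults, not prescriptions from either paper:

import numpy as np

def randomized_pca(A, k, oversample=10, n_power_iters=5, rng=None):
    # Randomized SVD/PCA in the spirit of [HMT11]: random start of width
    # k + oversample, a few power iterations, then a small exact SVD.
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))   # random test matrix
    Y = A @ Omega
    for _ in range(n_power_iters):
        Y, _ = np.linalg.qr(Y)                         # re-orthonormalize for stability
        Y = A @ (A.T @ Y)                              # power iteration sharpens the spectrum
    Q, _ = np.linalg.qr(Y)                             # orthonormal basis for the approximate range of A
    B = Q.T @ A                                        # small (k + oversample) x n matrix
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat
    return U[:, :k], s[:k], Vt[:k, :]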

Revisiting the Nyström method for improved large-scale machine learning

by Alex Gittens, Michael W. Mahoney
Abstract - Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds—e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.

Citation Context

... A large part of the recent body of this work on randomized matrix algorithms has been summarized in the recent monograph of Mahoney [46] and the recent review article of Halko, Martinsson, and Tropp [32]. Here, we note that, on the empirical side, both random projection methods (e.g., [12, 26, 61] and [6]) and random sampling methods (e.g., [54, 48]) have been used in applications for clustering and ...
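A minimal NumPy sketch of the basic Nyström construction that the sampling methods above build on; the helper name is mine, and idx would come from either uniform or leverage-score sampling, the two regimes the abstract contrasts:

import numpy as np

def nystrom_approximation(K, idx):
    # Nystrom extension of an SPSD matrix K from the columns indexed by idx:
    # K is approximated by C W^+ C^T with C = K[:, idx] and W = K[idx][:, idx].
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Example with uniform sampling of l columns (nonuniform, leverage-score-based
# sampling would instead draw idx with probabilities proportional to the scores):
# rng = np.random.default_rng(0)
# idx = rng.choice(K.shape[0], size=l, replace=False)
# K_approx = nystrom_approximation(K, idx)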

Near-optimal Column-based Matrix Reconstruction

by Christos Boutsidis, Petros Drineas, Malik Magdon-Ismail, 2011
Abstract - Cited by 34 (4 self)
We consider low-rank reconstruction of a matrix using a subset of its columns and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column-based matrix reconstruction, and (ii) two deterministic algorithms for selecting rows from matrices with orthonormal columns, building upon the sparse representation theorem for decompositions of the identity that appeared in [1].

Citation Context

...tion (SVD) of A in O(mn min{m,n}) time.) It is well known that Ak optimally approximates A among all rank-k matrices, with respect to any unitarily invariant norm. There is considerable interest (e.g., [4, 6, 9, 7, 11, 15, 19, 20, 21]) in determining a minimum set of r ≪ n columns of A which is approximately as good as Ak at reconstructing A. Such columns are important for interpreting data [21], building robust machine learning a...
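A small NumPy sketch of the quantity being compared in this line of work: the error of reconstructing A from a chosen column subset (projection onto the span of those columns) versus the error of the best rank-k approximation Ak. This is only the evaluation step; the column-selection algorithms of the cited paper are not reproduced, and the function name is mine:

import numpy as np

def column_reconstruction_vs_rank_k(A, idx, k):
    # Error of projecting A onto the span of the columns A[:, idx],
    # compared with the optimal rank-k error attained by A_k from the SVD.
    C = A[:, idx]
    A_from_C = C @ np.linalg.pinv(C) @ A                 # best reconstruction from span(C)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                 # best rank-k approximation
    err_C = np.linalg.norm(A - A_from_C, 'fro')
    err_k = np.linalg.norm(A - A_k, 'fro')
    return err_C, err_k                                   # column-based vs optimal error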

OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

by Jelani Nelson, Huy L. Nguyễn, 2012
Abstract - Cited by 32 (7 self)
An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R^(m×n) such that for any linear subspace W ⊆ R^n with dim(W) = d it holds that P_{Π∼D}(∀x ∈ W: ‖Πx‖2 ∈ (1 ± ε)‖x‖2) > 2/3. We show an OSE exists with m = O(d²/ε²) and where every Π in the support of D has exactly s = 1 non-zero entries per column. This improves the previously best known bound in [Clarkson-Woodruff, arXiv abs/1207.6365]. Our quadratic dependence on d is optimal for any OSE with s = 1 [Nelson-Nguyễn, 2012]. We also give two OSE’s, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/ε²) and s = polylog(d)/ε, or m = O(d^(1+γ)/ε²) and s = O(1/ε) for any constant γ > 0. (This m is nearly optimal since m ≥ d is required simply to ensure no non-zero vector of W lands in the kernel of Π.) These are the first constructions with m = o(d²) to have s = o(d). In fact, our OSNAPs are nothing more than the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson, SODA 2012]. Our analyses all yield OSE’s that are sampled using either O(1)-wise or O(log d)-wise ...
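The s = 1 construction above can be pictured concretely: each column of Π gets a single ±1 placed in a uniformly random row (a CountSketch-style matrix). A minimal NumPy sketch, with the function name mine and the row count m left to the caller per the bounds quoted above:

import numpy as np

def sparse_embedding_matrix(m, n, rng=None):
    # Each of the n columns has exactly one nonzero entry (s = 1): a random
    # sign placed in a uniformly random one of the m rows.
    rng = np.random.default_rng() if rng is None else rng
    Pi = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)            # target row for each column
    signs = rng.choice([-1.0, 1.0], size=n)      # independent random signs
    Pi[rows, np.arange(n)] = signs
    return Pi

# Applying Pi to an n x d matrix A touches each row of A exactly once, so the
# product Pi @ A can be formed in time proportional to the number of nonzeros of A.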

The spectral norm error of the naive Nyström extension

by Alex Gittens, 2011
Abstract - Cited by 22 (2 self)
Abstract. The naïve Nyström extension forms a low-rank approximation to a positive-semidefinite matrix by uniformly randomly sampling from its columns. This paper provides the first relative-error bound on the spectral norm error incurred in this process. This bound follows from a natural connection between the Nyström extension and the column subset selection problem. The main tool is a matrix Chernoff bound for sampling without replacement.

Citation Context

... Nyström extension allows a larger number of column samples to be drawn and leads to smaller empirical approximation errors. The approximation W̃ is constructed using the randomized methodology espoused in [HMT11]. The analysis of the error combines bounds provided in [HMT11] with a matrix sparsification argument. In addition to ℓ and k, the Nyström algorithm presented in [LKL10] depends on two additional par...

Large scale Bayesian inference and experimental design for sparse linear models

by Matthias W. Seeger, Hannes Nickisch - Journal of Physics: Conference Series
Abstract - Cited by 22 (2 self)
Abstract. Many problems of low-level computer vision and image processing, such as denoising, deconvolution, tomographic reconstruction or superresolution, can be addressed by maximizing the posterior distribution of a sparse linear model (SLM). We show how higher-order Bayesian decision-making problems, such as optimizing image acquisition in magnetic resonance scanners, can be addressed by querying the SLM posterior covariance, unrelated to the density’s mode. We propose a scalable algorithmic framework, with which SLM posteriors over full, high-resolution images can be approximated for the first time, solving a variational optimization problem which is convex if and only if posterior mode finding is convex. These methods successfully drive the optimization of sampling trajectories for real-world magnetic resonance imaging through Bayesian experimental design, which has not been attempted before. Our methodology provides new insight into similarities and differences between sparse reconstruction and approximate Bayesian inference, and has important implications for compressive sensing of real-world images. Parts of this work have been presented at

Citation Context

... Contributions from undesired parts of the spectrum (A’s largest eigenvalues in our case) are filtered out by replacing A with polynomials of itself. Recently, randomized PCA approaches have become popular [15], although their relevance for variance approximation is unclear. Nevertheless, for large scale problems of interest, standard Lanczos can be run for k ≪ n iterations only, at which point most of the ...
