Results 1 – 8 of 8
Randomized block Krylov methods for stronger and faster approximate singular value decomposition.
 In Advances in Neural Information Processing Systems 28 (NIPS), 2015
Abstract

Cited by 2 (1 self)
Since being analyzed by Rokhlin, Szlam, and Tygert [1] and popularized by Halko, Martinsson, and Tropp [2], randomized Simultaneous Power Iteration has become the method of choice for approximate singular value decomposition. It is more accurate than simpler sketching algorithms, yet still converges quickly for any matrix, independently of singular value gaps. After Õ(1/ε) iterations, it gives a low-rank approximation within (1 + ε) of optimal for spectral norm error. We give the first provable runtime improvement on Simultaneous Iteration: a simple randomized block Krylov method, closely related to the classic Block Lanczos algorithm, gives the same guarantees in just Õ(1/√ε) iterations and performs substantially better experimentally. Despite their long history, our analysis is the first of a Krylov subspace method that does not depend on singular value gaps, which are unreliable in practice. Furthermore, while it is a simple accuracy benchmark, even (1 + ε) error for spectral norm low-rank approximation does not imply that an algorithm returns high-quality principal components, a major issue for data applications. We address this problem for the first time by showing that both Block Krylov Iteration and a minor modification of Simultaneous Iteration give nearly optimal PCA for any matrix. This result further justifies their strength over non-iterative sketching methods. Finally, we give insight beyond the worst case, justifying why both algorithms can run much faster in practice than predicted. We clarify how simple techniques can take advantage of common matrix properties to significantly improve runtime.
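The two algorithms the abstract compares can be sketched in a few lines of NumPy. This is an illustrative simplification, not the paper's exact procedure (in particular, the paper's final step extracts the top-k directions from the Krylov basis, and real implementations re-orthogonalize for numerical stability):

```python
import numpy as np

def simultaneous_iteration(A, k, iters=10, seed=0):
    """Randomized Simultaneous (subspace) Power Iteration: returns an
    orthonormal basis Q whose span approximates the top-k left singular
    subspace of A, so that A is approximated by Q @ Q.T @ A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # start from a random sketch A @ G and orthonormalize
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))
    for _ in range(iters):
        # one application of A A^T followed by re-orthonormalization
        Q, _ = np.linalg.qr(A @ (A.T @ Q))
    return Q

def block_krylov_iteration(A, k, iters=5, seed=0):
    """Block Krylov variant: orthonormalizes the stacked Krylov blocks
    [AG, (AA^T)AG, ..., (AA^T)^iters AG]. The abstract's point is that this
    richer subspace needs roughly the square root as many iterations."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    B = A @ rng.standard_normal((n, k))
    blocks = [B]
    for _ in range(iters):
        B = A @ (A.T @ B)
        blocks.append(B)
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q  # a rank-k approximation is then read off from Q.T @ A
```

For an exactly rank-k matrix both bases capture the range after very few iterations; the (1 + ε) guarantees quoted above concern general matrices.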
Fast stochastic algorithms for SVD and PCA: Convergence properties and convexity
 CoRR, 2015
Abstract

Cited by 1 (0 self)
We study the convergence properties of the VR-PCA algorithm introduced by …
Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition
, 2016
Abstract
Symmetric positive semidefinite (SPSD) matrix approximation methods have been extensively used to speed up large-scale eigenvalue computation and kernel learning methods. The standard sketch-based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nyström method is highly efficient, but can only achieve low accuracy. In this paper we propose a novel model that we call the fast SPSD matrix approximation model. The fast model is nearly as efficient as the Nyström method and as accurate as the prototype model. We show that the fast model can potentially solve eigenvalue problems and kernel learning problems in linear time with respect to the matrix size n to achieve 1 + ε relative error, whereas both the prototype model and the Nyström method cost at least quadratic time to attain a comparable error bound. Empirical comparisons among the prototype model, the Nyström method, and our fast model demonstrate the superiority of the fast model. We also contribute new understandings of the Nyström method: it is a special instance of our fast model and an approximation to the prototype model. Our technique can be straightforwardly applied to make the CUR matrix decomposition more efficient to compute without much affecting the accuracy.
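The Nyström method discussed above admits a very short sketch: approximate an SPSD matrix K from a sampled column subset. This is a minimal illustration (uniform column selection; practical variants use leverage-score or adaptive sampling):

```python
import numpy as np

def nystrom(K, idx):
    """Nyström approximation of an SPSD matrix K: with C = K[:, idx] the
    sampled columns and W = K[idx][:, idx] the sampled principal submatrix,
    approximate K by C @ pinv(W) @ C.T. Only the sampled columns of K are
    ever touched, which is the source of the method's efficiency."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T
```

When the sampled columns span the range of an exactly low-rank K, the approximation is exact; for general K its accuracy is limited, which is the gap the paper's fast model aims to close.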
and let A_i denote the i-th column of A. Then by decoupling
, 2015
Abstract
1 Gordon’s theorem. Let T be a finite subset of some normed vector space with norm ‖·‖_X. We say that a sequence T_0 ⊆ T_1 ⊆ ... ⊆ T is admissible if |T_0| = 1 and |T_r| ≤ 2^(2^r) for all r ≥ 1, and T_r = T for all r ≥ r_0 for some r_0. We define the γ_2-functional γ_2(T, ‖·‖_X) = inf sup_{x∈T} Σ_{r≥1} 2^(r/2) · d_X(x, T_r), where the inf is taken over all admissible sequences. We also let d_X(T) denote the diameter of T with respect to the norm ‖·‖_X. For the remainder of this section we make the definitions π_r x = argmin_{y∈T_r} ‖y − x‖_X and Δ_r x = π_r x − π_{r−1} x. Throughout this section we let ‖·‖ denote the ℓ_2→ℓ_2 operator norm in the case of matrix arguments, and the ℓ_2 norm in the case of vector arguments. Krahmer, Mendelson, and Rauhut showed the following theorem [KMR14]. Theorem 1. Let A ⊂ R^(m×n) be arbitrary. Let ε_1, ..., ε_n be independent ±1 random variables. Then
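The γ_2-functional in the snippet can be written in display form (the absolute-value bars and nested exponents are restored here per the standard generic-chaining convention, since PDF extraction flattened them):

```latex
\gamma_2(T, \|\cdot\|_X)
  = \inf_{(T_r)} \; \sup_{x \in T} \; \sum_{r \ge 1} 2^{r/2}\, d_X(x, T_r),
\qquad |T_0| = 1, \quad |T_r| \le 2^{2^r},
```

where the infimum ranges over all admissible sequences (T_r).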
General Terms
Abstract
A fundamental challenge in processing the massive quantities of information generated by modern applications is extracting suitable representations of the data that can be stored, manipulated, and interrogated on a single machine. A promising approach is the design and analysis of compact summaries: data structures which capture key features of the data, and which can be created effectively over distributed data sets. Popular summary structures include count-distinct algorithms, which compactly approximate item-set cardinalities, and sketches, which allow vector norms and products to be estimated. These are very attractive, since they can be computed in parallel and combined to yield a single, compact summary of the data. This tutorial introduces the concepts and examples of compact summaries.
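The "compute in parallel, then combine" property the abstract highlights can be shown with the classic AMS (tug-of-war) sketch, which estimates the squared ℓ2 norm of a stream's count vector. This is an illustrative toy, not code from the tutorial; in particular, the SHA-256 sign hash stands in for the 4-wise independent hash families the formal analysis assumes:

```python
import hashlib

def sign(item, j):
    # deterministic ±1 hash for sketch row j (illustrative stand-in for a
    # 4-wise independent hash family)
    h = hashlib.sha256(f"{j}|{item}".encode()).digest()
    return 1 if h[0] & 1 else -1

class AMSSketch:
    """Tug-of-war (AMS) sketch: a small mergeable summary estimating the
    second frequency moment F2 = sum of squared item counts."""
    def __init__(self, k=256):
        self.k = k
        self.z = [0] * k

    def update(self, item, weight=1):
        for j in range(self.k):
            self.z[j] += sign(item, j) * weight

    def merge(self, other):
        # sketches built over disjoint data combine by coordinate-wise
        # addition -- this is what makes them usable over distributed data
        for j in range(self.k):
            self.z[j] += other.z[j]

    def estimate_f2(self):
        return sum(v * v for v in self.z) / self.k
```

Two machines can each sketch their own shard, ship the k counters, and add them; the merged sketch is identical to one built over the concatenated stream, with estimation error shrinking like 1/√k.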