Results 1  10
of
59
Fast Approximation of Matrix Coherence and Statistical Leverage
"... The statistical leverage scores of a matrix A are the squared rownorms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recentlypopular problems such as matrix completion and Nyströmbased lowrank matrix ..."
Abstract

Cited by 52 (11 self)
 Add to MetaCart
(Show Context)
The statistical leverage scores of a matrix A are the squared rownorms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recentlypopular problems such as matrix completion and Nyströmbased lowrank matrix approximation as well as in largescale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n ≫ d, and that returns as output relativeerror approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd logn) time, as opposed to the O(nd 2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relativeerror approximation to an underconstrained leastsquares approximation problem, or, relatedly, it may be viewed as an application of JohnsonLindenstrauss type ideas. Several practicallyimportant extensions of our basic result are also described, including the approximation of socalled crossleverage scores, the extension of these ideas to matrices with n≈d, and the extension to streaming environments.
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings
, 2012
"... An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R m×n such that for any linear subspace W ⊆ R n with dim(W) = d it holds that PΠ∼D(∀x ∈ W ‖Πx‖2 ∈ (1 ± ε)‖x‖2)> 2/3. We show an OSE exists with m = O(d 2 /ε 2) and where every Π in the support ..."
Abstract

Cited by 32 (7 self)
 Add to MetaCart
An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R m×n such that for any linear subspace W ⊆ R n with dim(W) = d it holds that PΠ∼D(∀x ∈ W ‖Πx‖2 ∈ (1 ± ε)‖x‖2)> 2/3. We show an OSE exists with m = O(d 2 /ε 2) and where every Π in the support of D has exactly s = 1 nonzero entries per column. This improves the previously best known bound in [ClarksonWoodruff, arXiv abs/1207.6365]. Our quadratic dependence on d is optimal for any OSE with s = 1 [NelsonNguy ˜ ên, 2012]. We also give two OSE’s, which we call Oblivious Sparse NormApproximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/ε2) and s = polylog(d)/ε, or m = O(d1+γ /ε2) and s = O(1/ε) for any constant γ> 0. 1 This m is nearly optimal since m ≥ d is required simply to ensure no nonzero vector of W lands in the kernel of Π. These are the first constructions with m = o(d 2) to have s = o(d). In fact, our OSNAPs are nothing more than the sparse JohnsonLindenstrauss matrices of [KaneNelson, SODA 2012]. Our analyses all yield OSE’s that are sampled using either O(1)wise or O(log d)wise
Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling
"... The CUR matrix decomposition and the Nyström approximation are two important lowrank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number ..."
Abstract

Cited by 17 (4 self)
 Add to MetaCart
The CUR matrix decomposition and the Nyström approximation are two important lowrank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relativeerror bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
Tail bounds for all eigenvalues of a sum of random matrices
, 2011
"... This work introduces the minimax Laplace transform method, a modification of the cumulantbased matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random selfadjoint matrices. This machinery is used to derive eigenvalue ana ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
This work introduces the minimax Laplace transform method, a modification of the cumulantbased matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random selfadjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherencelike quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε−2κ2` ` log p) samples, where κ ` = λ1(C)/λ`(C), are sufficient to ensure that the dominant ` eigenvalues of the covariance matrix of a N (0,C) random vector are estimated to within a factor of 1 ± ε with high probability.
Sharp analysis of lowrank kernel matrix approximations
 JMLR: WORKSHOP AND CONFERENCE PROCEEDINGS VOL 30 (2013) 1–25
, 2013
"... We consider supervised learning problems within the positivedefinite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinitedimensional feature spaces, a common practical limiting difficulty is the necessity of c ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
We consider supervised learning problems within the positivedefinite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinitedimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n 2). Lowrank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p 2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of nonparametric estimators. This result enables simple algorithms that have subquadratic running time complexity, but provably exhibit the same predictive performance than existing algorithms, for any given problem instance, and not only for worstcase situations.
Sketching as a tool for numerical linear algebra
 Foundations and Trends in Theoretical Computer Science
"... ar ..."
(Show Context)
Sketched SVD: Recovering spectral features from compressive measurements. ArXiv eprints
, 2012
"... ar ..."
(Show Context)
Efficient Algorithms and Error Analysis for the Modified Nyström Method
"... Many kernel methods suffer from high time and space complexities and are thus prohibitive in bigdata applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speedups ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Many kernel methods suffer from high time and space complexities and are thus prohibitive in bigdata applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speedups computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than and nearly as accurate as the stateoftheart algorithm. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method. 1
Subspace embeddings for the polynomial kernel
 In NIPS
, 2014
"... Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the socalled oblivious subspace embedding, can only be applied to data spaces with an explicit repres ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
(Show Context)
Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the socalled oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a highdimensional space implicitly defined by the data matrix via a kernel transformation. We propose the first fast oblivious subspace embeddings that are able to embed a space induced by a nonlinear kernel without explicitly mapping the data to the highdimensional space. In particular, we propose an embedding for mappings induced by the polynomial kernel. Using the subspace embeddings, we obtain the fastest known algorithms for computing an implicit low rank approximation of the higherdimension mapping of the data matrix, and for computing an approximate kernel PCA of the data, as well as doing approximate kernel principal component regression. 1