Results 1–10 of 11
Revisiting the Nyström method for improved large-scale machine learning
"... We reconsider randomized algorithms for the lowrank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and pro ..."
Abstract

Cited by 34 (5 self)
 Add to MetaCart
(Show Context)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds, e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
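The leverage-score-based sampling that this abstract contrasts with uniform sampling can be sketched as follows. This is a minimal illustration, not the paper's implementation; `leverage_score_sample` is a hypothetical helper that computes rank-k leverage scores from the top-k eigenvectors of an SPSD matrix and samples column indices proportionally:

```python
import numpy as np

def leverage_score_sample(K, k, l, rng=None):
    """Sample l distinct column indices of the SPSD matrix K with
    probability proportional to its rank-k leverage scores.
    Hypothetical helper for illustration only."""
    rng = np.random.default_rng(rng)
    # Eigenvectors of the symmetric matrix K, ascending eigenvalue order.
    _, vecs = np.linalg.eigh(K)
    U = vecs[:, -k:]                     # n x k, orthonormal columns
    scores = (U ** 2).sum(axis=1)        # leverage scores; they sum to k
    probs = scores / scores.sum()
    return rng.choice(K.shape[0], size=l, replace=False, p=probs)
```

Uniform sampling corresponds to replacing `probs` with the constant 1/n; the paper's empirical comparison is between these two regimes.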
Sampling Methods for the Nyström Method
 Journal of Machine Learning Research
"... The Nyström method is an efficient technique to generate lowrank matrix approximations and is used in several largescale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a v ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.
Nyström approximation of Wishart matrices
 in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, 2010
"... Spectral methods requiring the computation of eigenvalues and eigenvectors of a positive definite matrix are an essential part of signal processing. However, for sufficiently highdimensional data sets, the eigenvalue problem cannot be solved without approximate methods. We examine a technique for a ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Spectral methods requiring the computation of eigenvalues and eigenvectors of a positive definite matrix are an essential part of signal processing. However, for sufficiently high-dimensional data sets, the eigenvalue problem cannot be solved without approximate methods. We examine a technique for approximate spectral analysis and low-rank matrix reconstruction known as the Nyström method, which recasts the eigendecomposition of large matrices as a subset selection problem. In particular, we focus on the performance of the Nyström method when used to approximate random matrices from the Wishart ensemble. We provide statistical results for the approximation error, as well as an experimental analysis of various subset sampling techniques. Index Terms: high-dimensional data analysis, kernel methods, Nyström extension, Wishart distribution
Estimating principal components of large covariance matrices using the Nyström method
 in Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, 2011
"... Covariance matrix estimates are an essential part of many signal processing algorithms, and are often used to determine a lowdimensional principal subspace via their spectral decomposition. However, for sufficiently highdimensional matrices exact eigenanalysis is computationally intractable, and ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Covariance matrix estimates are an essential part of many signal processing algorithms, and are often used to determine a low-dimensional principal subspace via their spectral decomposition. However, for sufficiently high-dimensional matrices exact eigenanalysis is computationally intractable, and in the case of limited data, sample eigenvalues and eigenvectors are known to be poor estimators of their true counterparts. To address these issues, we propose a covariance estimator that is computationally efficient while also performing shrinkage on the sample eigenvalues. Our approach is based on the Nyström method, which uses a data-dependent orthogonal projection to obtain a fast low-rank approximation of a large positive semidefinite matrix. We provide a theoretical analysis of the error properties of our estimator as well as empirical results. Index Terms: approximate spectral analysis, high-dimensional data, low-rank approximation, Nyström extension, shrinkage estimators
Approximation of Positive Semidefinite Matrices Using the Nyström Method
2011
"... Positive semidefinite matrices arise in a variety of fields, including statistics, signal processing, and machine learning. Unfortunately, when these matrices are highdimensional and/or must be operated upon many times, expensive calculations such as the spectral decomposition quickly become a comp ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Positive semidefinite matrices arise in a variety of fields, including statistics, signal processing, and machine learning. Unfortunately, when these matrices are high-dimensional and/or must be operated upon many times, expensive calculations such as the spectral decomposition quickly become a computational bottleneck. A common alternative is to replace the original positive semidefinite matrices with low-rank approximations whose spectral decompositions can be more easily computed. In this thesis, we develop approaches based on the Nyström method, which approximates a positive semidefinite matrix using a data-dependent orthogonal projection. As the Nyström approximation is conditioned on a given principal submatrix of its argument, it essentially recasts low-rank approximation as a subset selection problem. We begin by deriving the Nyström approximation and developing a number of fundamental results, including new characterizations of its spectral properties and approximation error. We then address the problem of subset selection through a study of randomized sampling algorithms. We provide new bounds for the approximation error under uniformly random ...
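The Nyström approximation this abstract describes, conditioning on a principal submatrix, has a compact closed form: for an index set I, with C the sampled columns and W the corresponding principal submatrix, K ≈ C W⁺ Cᵀ. A minimal NumPy sketch (illustrative only, not the thesis code; `nystrom_approx` is a hypothetical name):

```python
import numpy as np

def nystrom_approx(K, idx):
    """Nyström approximation of a positive semidefinite matrix K,
    conditioned on the principal submatrix indexed by idx:
    K_hat = C @ pinv(W) @ C.T. Illustrative sketch only."""
    C = K[:, idx]                # sampled columns of K
    W = K[np.ix_(idx, idx)]      # principal submatrix K[idx, idx]
    return C @ np.linalg.pinv(W) @ C.T
```

When the sampled columns span the range of K (e.g., rank(K) ≤ |idx| and the subset is generic), the approximation is exact; otherwise the error depends on the subset chosen, which is why subset selection is the central problem studied.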
Sampling from Determinantal Point Processes for Scalable Manifold Learning
"... Abstract. High computational costs of manifold learning prohibit its application for large datasets. A common strategy to overcome this problem is to perform dimensionality reduction on selected landmarks and to successively embed the entire dataset with the Nyström method. The two main challenges ..."
Abstract
 Add to MetaCart
(Show Context)
High computational costs of manifold learning prohibit its application for large datasets. A common strategy to overcome this problem is to perform dimensionality reduction on selected landmarks and to successively embed the entire dataset with the Nyström method. The two main challenges that arise are: (i) the landmarks selected in non-Euclidean geometries must result in a low reconstruction error, (ii) the graph constructed from sparsely sampled landmarks must approximate the manifold well. We propose to sample the landmarks from determinantal distributions on non-Euclidean spaces. Since current determinantal sampling algorithms have the same complexity as those for manifold learning, we present an efficient approximation with linear complexity. Further, we recover the local geometry after the sparsification by assigning each landmark a local covariance matrix, estimated from the original point set. The resulting neighborhood selection based on the Bhattacharyya distance improves the embedding of sparsely sampled manifolds. Our experiments show a significant performance improvement compared to state-of-the-art landmark selection techniques on synthetic and medical data.
Ensemble Nyström
"... A common problem in many areas of largescale machine learning involves manipulation of a large matrix. This matrix may be a kernel matrix arising in Support Vector Machines [9, 15], Kernel Principal Component Analysis [47] or manifold learning [43,51]. Large matrices also naturally arise in other a ..."
Abstract
 Add to MetaCart
(Show Context)
A common problem in many areas of large-scale machine learning involves manipulation of a large matrix. This matrix may be a kernel matrix arising in Support Vector Machines [9, 15], Kernel Principal Component Analysis [47], or manifold learning [43, 51]. Large matrices also naturally arise in other applications, e.g., clustering, collaborative filtering, matrix completion, and robust PCA. For these large-scale problems, the number of matrix entries can easily be on the order of billions or more, making them hard to process or even store.

An attractive solution to this problem involves the Nyström method, in which one samples a small number of columns from the original matrix and generates its low-rank approximation using the sampled columns [53]. The accuracy of the Nyström method depends on the number of columns sampled from the original matrix: the larger the number of samples, the higher the accuracy, but the slower the method. In the Nyström method, one needs to perform SVD on an l × l matrix, where l is the number of columns sampled from the original matrix. This SVD operation is typically carried out on a single machine, so the maximum value of l used for an application is limited by the capacity of the machine. That is why, in practice, one restricts l to be less than 20K or 30K, even when the size of the matrix is in the millions. This restricts the accuracy of the Nyström method in very large-scale settings.

This chapter describes a family of algorithms based on mixtures of Nyström approximations, called Ensemble Nyström algorithms, which yield more accurate low-rank approximations than the standard Nyström method. The core idea of Ensemble Nyström is to sample many subsets of columns from the original matrix, each containing a relatively small number of columns. Then, the Nyström method is ...
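The mixture idea described above can be sketched as follows, using uniform ensemble weights: build p independent Nyström approximations from small column subsets and average them. This is an illustrative sketch only, not the chapter's code (`ensemble_nystrom` is a hypothetical name, and the chapter also considers non-uniform mixture weights):

```python
import numpy as np

def ensemble_nystrom(K, l, p, rng=None):
    """Average p standard Nyström approximations of the PSD matrix K,
    each built from l uniformly sampled columns. Uniform-weight
    ensemble; illustrative sketch only."""
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    approx = np.zeros_like(K, dtype=float)
    for _ in range(p):
        idx = rng.choice(n, size=l, replace=False)
        C = K[:, idx]                 # sampled columns
        W = K[np.ix_(idx, idx)]      # principal submatrix
        approx += C @ np.linalg.pinv(W) @ C.T
    return approx / p
```

Each experts' SVD involves only an l × l matrix, so the ensemble sidesteps the single-machine limit on l while averaging down the approximation error.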