Results 1  10
of
17
Revisiting the Nyström method for improved largescale machine learning
"... We reconsider randomized algorithms for the lowrank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and pro ..."
Abstract

Cited by 34 (5 self)
 Add to MetaCart
(Show Context)
We reconsider randomized algorithms for the lowrank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worstcase theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds—e.g., improved additiveerror bounds for spectral and Frobenius norm error and relativeerror bounds for trace norm error. 1.
Sketching as a tool for numerical linear algebra
 Foundations and Trends in Theoretical Computer Science
"... ar ..."
(Show Context)
Optimal CUR matrix decompositions
 In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC
, 2014
"... ar ..."
(Show Context)
Efficient Algorithms and Error Analysis for the Modified Nyström Method
"... Many kernel methods suffer from high time and space complexities and are thus prohibitive in bigdata applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speedups ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Many kernel methods suffer from high time and space complexities and are thus prohibitive in bigdata applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speedups computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than and nearly as accurate as the stateoftheart algorithm. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method. 1
Provably Correct Active Sampling Algorithms for Matrix Column Subset Selection with Missing Data ∗
, 2015
"... We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous realworld data applications such as population gen ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous realworld data applications such as population genetics summarization, electronic circuits testing and recommendation systems. In many applications the complete data matrix is unavailable and one needs to select representative columns by inspecting only a small portion of the input matrix. In this paper we propose the first provably correct column subset selection algorithms for partially observed data matrices. Our proposed algorithms exhibit different merits and drawbacks in terms of statistical accuracy, computational efficiency, sample complexity and sampling schemes, which provides a nice exploration of the tradeoff between these desired properties for column subset selection. The proposed methods employ the idea of feedback driven sampling and are inspired by several sampling schemes previously introduced for lowrank matrix approximation tasks [DMM08, FKV04, DV06, KS14]. Our analysis shows that two of the proposed algorithms enjoy a relative error bound, which is preferred for column subset selection and matrix approximation purposes. We also demonstrate through both theoretical and empirical analysis the power of feedback driven sampling compared to uniform random sampling on input matrices with highly correlated columns. 1
CUR Algorithm for Partially Observed Matrices
"... CUR matrix decomposition computes the low rank approximation of a given matrix by using the actual rows and columns of the matrix. It has been a very useful tool for handling large matrices. One limitation with the existing algorithms for CURmatrix decomposition is that they cannot deal with entrie ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
CUR matrix decomposition computes the low rank approximation of a given matrix by using the actual rows and columns of the matrix. It has been a very useful tool for handling large matrices. One limitation with the existing algorithms for CURmatrix decomposition is that they cannot deal with entries in a partially observed matrix, while incomplete matrices are found in many real world applications. In this work, we alleviate this limitation by developing a CUR decomposition algorithm for partially observed matrices. In particular, the proposed algorithm computes the low rank approximation of the target matrix based on (i) the randomly sampled rows and columns, and (ii) a subset of observed entries that are randomly sampled from the matrix. Our analysis shows the error bound, measured by spectral norm, for the proposed algorithm when the target matrix is of full rank. We also show that only O(nr ln r) observed entries are needed by the proposed algorithm to perfectly recover a rank r matrix of size n × n, which improves the sample complexity of the existing algorithms for matrix completion. Empirical studies on both synthetic and realworld datasets verify our theoretical claims and demonstrate the effectiveness of the proposed algorithm.
Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition
, 2016
"... Abstract Symmetric positive semidefinite (SPSD) matrix approximation methods have been extensively used to speed up largescale eigenvalue computation and kernel learning methods. The standard sketch based method, which we call the prototype model, produces relatively accurate approximations, but ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract Symmetric positive semidefinite (SPSD) matrix approximation methods have been extensively used to speed up largescale eigenvalue computation and kernel learning methods. The standard sketch based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nyström method is highly efficient, but can only achieve low accuracy. In this paper we propose a novel model that we call the fast SPSD matrix approximation model. The fast model is nearly as efficient as the Nyström method and as accurate as the prototype model. We show that the fast model can potentially solve eigenvalue problems and kernel learning problems in linear time with respect to the matrix size n to achieve 1 + relativeerror, whereas both the prototype model and the Nyström method cost at least quadratic time to attain comparable error bound. Empirical comparisons among the prototype model, the Nyström method, and our fast model demonstrate the superiority of the fast model. We also contribute new understandings of the Nyström method. The Nyström method is a special instance of our fast model and is approximation to the prototype model. Our technique can be straightforwardly applied to make the CUR matrix decomposition more efficiently computed without much affecting the accuracy.
Column Subset Selection with Missing Data via Active Sampling
"... Column subset selection of massive data matrices has found numerous applications in realworld data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms ..."
Abstract
 Add to MetaCart
(Show Context)
Column subset selection of massive data matrices has found numerous applications in realworld data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy lowrank structure, one algorithm enjoys a relative error bound. 1