Results 1-10 of 11
Revisiting the Nyström method for improved large-scale machine learning
Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform sampling and nonuniform sampling based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds, e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
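The sampling methods compared above can be made concrete with a small example. What follows is a minimal NumPy sketch, our own illustration and not the authors' code, of the Nyström approximation $K \approx C W^{+} C^{T}$. Here $C$ holds $l$ sampled columns of $K$ and $W$ is the corresponding $l \times l$ block. Columns are drawn either uniformly or with probabilities proportional to rank-$k$ leverage scores.

Several choices are assumptions of this sketch: the function names, the RBF test kernel, and the eigenvalue cutoff `tol`. The sketch also computes exact leverage scores from a full eigendecomposition, which is only sensible for illustration because it costs as much as the original problem.

```python
import numpy as np

def nystrom(K, idx, tol=1e-10):
    """Nystrom approximation C W^+ C^T of an SPSD matrix K built from the columns in idx.

    Eigenvalues of W below tol * max eigenvalue are treated as zero (assumed cutoff),
    which keeps the computation numerically stable and the result PSD.
    """
    C = K[:, idx]                                   # n x l block of sampled columns
    W = K[np.ix_(idx, idx)]                         # l x l intersection block
    vals, vecs = np.linalg.eigh(W)
    keep = vals > tol * vals.max()
    B = C @ (vecs[:, keep] / np.sqrt(vals[keep]))   # n x r factor with B B^T = C W_r^+ C^T
    return B @ B.T

def leverage_scores(K, k):
    """Rank-k leverage scores: squared row norms of the top-k eigenvectors of K."""
    _, vecs = np.linalg.eigh(K)                     # eigenvalues come back in ascending order
    return np.sum(vecs[:, -k:] ** 2, axis=1)

def sample_columns(K, l, k=None, seed=None):
    """Draw l distinct column indices: uniformly if k is None, else by rank-k leverage."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    if k is None:
        probs = np.full(n, 1.0 / n)
    else:
        scores = leverage_scores(K, k)
        probs = scores / scores.sum()
    return rng.choice(n, size=l, replace=False, p=probs)

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = rbf_kernel(rng.normal(size=(300, 5)), gamma=0.1)
    for label, k in [("uniform", None), ("leverage", 10)]:
        idx = sample_columns(K, l=30, k=k, seed=1)
        rel_err = np.linalg.norm(K - nystrom(K, idx), 2) / np.linalg.norm(K, 2)
        print(f"{label:>8}: relative spectral-norm error {rel_err:.3e}")
```

The factored form $B B^{T}$ is used instead of forming `pinv(W)` directly. The residual $K - C W^{+} C^{T}$ is a Schur complement and hence PSD, and the factored form keeps that property intact in floating point.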
Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
Cited by 6 (1 self)
We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations ($p \gg n$). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of $O(n^2 p)$. Our algorithm, Subsampled Randomized Hadamard Transform-Dual Ridge Regression (SRHT-DRR), runs in time $O(np \log(n))$ and works by preconditioning the design matrix with a Randomized Walsh-Hadamard Transform followed by subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets.
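The identity behind SRHT-DRR is that a randomized Walsh-Hadamard rotation leaves the Gram matrix $XX^T$ unchanged. Subsampling the rotated features and rescaling therefore gives an unbiased estimate of it. The NumPy sketch below is our illustration, not the authors' implementation, and rests on several assumptions:

- $p$ is a power of two.
- The row-wise fast Walsh-Hadamard transform is written with a Python loop, so it does not attain the stated running time.
- The names `fwht`, `dual_ridge_fit`, `srht_dual_ridge_fit` and `p_subs` are hypothetical.

It returns fitted values at the training points, in line with the fixed-design setting mentioned above.

```python
import numpy as np

def fwht(A):
    """Unnormalized fast Walsh-Hadamard transform of each row of A (row length a power of 2).

    Returns A @ H, where H is the Sylvester-ordered Hadamard matrix (symmetric, H @ H = p * I).
    """
    A = np.array(A, dtype=float)                    # work on a copy
    p = A.shape[1]
    h = 1
    while h < p:
        for start in range(0, p, 2 * h):
            left = A[:, start:start + h].copy()
            right = A[:, start + h:start + 2 * h].copy()
            A[:, start:start + h] = left + right
            A[:, start + h:start + 2 * h] = left - right
        h *= 2
    return A

def dual_ridge_fit(X, y, lam):
    """Exact dual ridge regression: fitted values G (G + lam I)^{-1} y with G = X X^T."""
    G = X @ X.T                                     # n x n Gram matrix, O(n^2 p) to form
    alpha = np.linalg.solve(G + lam * np.eye(len(y)), y)
    return G @ alpha

def srht_dual_ridge_fit(X, y, lam, p_subs, seed=None):
    """Dual ridge on an SRHT-compressed design (hypothetical helper, p a power of two)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two; zero-pad the features otherwise")
    signs = rng.choice([-1.0, 1.0], size=p)          # random sign flips (diagonal matrix D)
    XH = fwht(X * signs) / np.sqrt(p)                 # X D H with H scaled to be orthonormal
    cols = rng.choice(p, size=p_subs, replace=False)  # uniform subsample of p_subs features
    Xs = XH[:, cols] * np.sqrt(p / p_subs)            # rescale so that E[Xs Xs^T] = X X^T
    return dual_ridge_fit(Xs, y, lam)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 1024
    X = rng.normal(size=(n, p))
    y = X @ rng.normal(size=p) / np.sqrt(p) + 0.1 * rng.normal(size=n)
    exact = dual_ridge_fit(X, y, lam=1.0)
    approx = srht_dual_ridge_fit(X, y, lam=1.0, p_subs=256, seed=1)
    print("max |exact - approx| over fitted values:", np.max(np.abs(exact - approx)))
```

With `p_subs = p` no features are dropped, and the sketch reproduces exact dual ridge regression. The only source of approximation is therefore the subsampling step.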
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. ICML, 2014
Cited by 4 (1 self)
We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large data sets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead, where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.
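The contrast between Monte Carlo and quasi-Monte Carlo feature maps can be illustrated for the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$. In the NumPy sketch below, our illustration rather than the paper's code, the frequencies $w_j$ come from one of two sources:

- drawn i.i.d. from $N(0, \sigma^{-2} I)$ (Monte Carlo), or
- taken from a Halton sequence mapped through the inverse normal CDF (QMC).

The Halton sequence is one classical low-discrepancy choice. The learned, box-discrepancy-adapted sequences proposed in the paper are not implemented. All function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]   # Halton bases, one per dimension

def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton(s, d, skip=1):
    """First s points of the d-dimensional Halton sequence, skipping index 0 (the origin)."""
    if d > len(PRIMES):
        raise ValueError("this sketch only tabulates the first 10 prime bases")
    return np.array([[radical_inverse(i, b) for b in PRIMES[:d]]
                     for i in range(skip, skip + s)])

def gaussian_frequencies(U, sigma):
    """Map points in (0,1)^d to N(0, sigma^-2 I) via the coordinate-wise inverse normal CDF."""
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(U) / sigma

def fourier_features(X, W):
    """z(x) = [cos(Wx), sin(Wx)] / sqrt(s), so z(x).z(y) = mean_j cos(w_j . (x - y))."""
    P = X @ W.T
    return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(W.shape[0])

def gaussian_kernel(X, sigma):
    """Exact Gaussian kernel matrix, used as the reference."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    sigma, s = 2.0, 256
    K = gaussian_kernel(X, sigma)
    W_mc = rng.normal(size=(s, 3)) / sigma                # Monte Carlo frequencies
    W_qmc = gaussian_frequencies(halton(s, 3), sigma)     # Halton (QMC) frequencies
    for label, W in [("MC", W_mc), ("QMC", W_qmc)]:
        Z = fourier_features(X, W)
        print(label, "mean |K - Z Z^T| =", np.mean(np.abs(K - Z @ Z.T)))
```

Only the point set changes between the two runs, and the feature map is identical. This mirrors the paper's framing of QMC as a drop-in replacement for the random point set.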
Improved Bounds for the Nyström Method with Application to Kernel Classification
Cited by 2 (1 self)
We develop two approaches for analyzing the approximation error bound for the Nyström method, which approximates a positive semidefinite (PSD) matrix by sampling a small set of columns: one based on the concentration inequality of integral operators, and one based on random matrix theory. We show that the approximation error, measured in the spectral norm, can be improved from $O(N/\sqrt{m})$ to $O(N/m^{1-\rho})$ in the case of a large eigengap, where $N$ is the total number of data points, $m$ is the number of sampled data points, and $\rho \in (0, 1/2)$ is a positive constant that characterizes the eigengap. When the eigenvalues of the kernel matrix follow a $p$-power law, our analysis based on random matrix theory further improves the bound to $O(N/m^{p-1})$ under an incoherence assumption. We present a kernel classification approach based on the Nyström method and derive its generalization performance using the improved bound. We show that when the eigenvalues of the kernel matrix follow a $p$-power law, we can reduce the number of support vectors to $N^{2p/(p^2-1)}$, which is sublinear in $N$ when $p > 1 + \sqrt{2}$, without seriously sacrificing its generalization performance. Index Terms: Nyström method, approximation error, concentration inequality, kernel methods, random matrix theory.
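The condition $p > 1 + \sqrt{2}$, under which the support-vector count $N^{2p/(p^2-1)}$ is sublinear in $N$, follows from elementary algebra. The derivation assumes a power-law exponent $p > 1$, so that $p^2 - 1 > 0$ and the root $1 - \sqrt{2} < 0$ can be discarded:

```latex
\frac{2p}{p^{2}-1} < 1
\;\iff\; 2p < p^{2} - 1 \qquad (p > 1 \Rightarrow p^{2} - 1 > 0)
\;\iff\; p^{2} - 2p - 1 > 0
\;\iff\; (p - 1)^{2} > 2
\;\iff\; p > 1 + \sqrt{2} \approx 2.414.
```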
regression. Submitted to the Electronic Journal of Statistics, 2013. <hal-00846715>
between multi-task and single-task oracle risks in kernel ridge regression
Stat260/CS294: Randomized Algorithms for Matrices and Data
Major research project: An important component of the class will be a major research project. The goal will be to drill down in much more detail on some topic related to what was covered in the lectures. Important dates: Please email a PS or PDF of the following reports to the TA by 5PM on the date specified; do not be late.
• Wed, Oct 23, 2013: A brief statement of (1) who you will be working with and (2) the project you plan to address. (One or two sentences of plain text is fine; we want to make sure people are working on a range of projects and will get back to you very soon if there is any problem.)
• Wed, Oct 30, 2013: Initial proposal, consisting of a summary of the proposed project of no more than one page. (This is mostly to make sure you are on the right track; we will get back to you within a few days if there are any issues, and we can also discuss any concerns you might have.)
• Wed, Nov 20, 2013: Midterm report, consisting of approximately three to four pages with a brief summary of relevant literature, a summary of proposed directions, and any questions or problems encountered.
cal Methods
5.2. SiGMa (Simple Greedy Matching): a tool for aligning large knowledge bases
Fast Randomized Kernel Methods With Statistical Guarantees
One approach to improving the running time of kernel-based machine learning methods is to build a small sketch of the input and use it in lieu of the full kernel matrix in the machine learning task of interest. Here, we describe a version of this approach that comes with running time guarantees as well as improved guarantees on its statistical performance. By extending the notion of statistical leverage scores to the setting of kernel ridge regression, our main statistical result is to identify an importance sampling distribution that reduces the size of the sketch (i.e., the required number of columns to be sampled) to the effective dimensionality of the problem. This quantity is often much smaller than previous bounds that depend on the maximal degrees of freedom. Our main algorithmic result is a fast algorithm to compute approximations to these scores. This algorithm runs in time that is linear in the number of samples (more precisely, the running time is $O(np^2)$, where the parameter $p$ depends only on the trace of the kernel matrix and the regularization parameter), and it can be applied to the matrix of feature vectors without having to form the full kernel matrix. This is obtained via a variant of length-squared sampling that we adapt to the kernel setting in a way that is of independent interest. Lastly, we provide empirical results illustrating our theory, and we discuss how this new notion of the statistical leverage of a data point captures, in a fine-grained way, the difficulty of the original statistical learning problem.
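To make these statistical quantities concrete, the following minimal NumPy sketch computes ridge leverage scores for kernel ridge regression and the effective dimensionality they sum to. It assumes the normalization $l_i(\lambda) = [K(K + n\lambda I)^{-1}]_{ii}$ and $d_{\mathrm{eff}} = \operatorname{tr}(K(K + n\lambda I)^{-1})$; the paper's exact conventions may differ.

Computing the scores exactly, as done here, costs $O(n^3)$ and requires the full kernel matrix. That is precisely what the fast approximation algorithm described above avoids, and that algorithm is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def ridge_leverage_scores(K, lam):
    """Ridge leverage scores l_i = [K (K + n*lam*I)^{-1}]_{ii}, computed exactly in O(n^3)."""
    n = K.shape[0]
    # K and (K + n*lam*I)^{-1} commute, so (K + n*lam*I)^{-1} K has the same diagonal.
    M = np.linalg.solve(K + n * lam * np.eye(n), K)
    return np.diag(M).copy()

def effective_dimension(K, lam):
    """d_eff = trace(K (K + n*lam*I)^{-1}), i.e. the sum of the ridge leverage scores."""
    return float(ridge_leverage_scores(K, lam).sum())

def sample_by_leverage(K, lam, m, seed=None):
    """Draw m column indices i.i.d. with probability proportional to the ridge leverage scores."""
    rng = np.random.default_rng(seed)
    scores = np.clip(ridge_leverage_scores(K, lam), 0.0, None)   # guard against round-off
    return rng.choice(K.shape[0], size=m, replace=True, p=scores / scores.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0) / 2.0)
    for lam in [1e-1, 1e-2, 1e-3]:
        print(f"lam={lam:g}: d_eff = {effective_dimension(K, lam):.1f} (n = {K.shape[0]})")
    idx = sample_by_leverage(K, lam=1e-2, m=50, seed=1)
    print("sampled columns:", np.sort(idx)[:10], "...")
```

In the eigenbasis of $K$, each score equals $\sum_j U_{ij}^2 \, \lambda_j / (\lambda_j + n\lambda)$. This shows why $d_{\mathrm{eff}}$ shrinks as the regularization grows: eigen-directions with $\lambda_j \ll n\lambda$ contribute almost nothing.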