Results 1-10 of 34
Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling
Cited by 17 (4 self)
The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
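To make the two decompositions named in this abstract concrete, here is a minimal NumPy sketch. It assumes uniformly chosen column and row indices and the textbook forms K ≈ C W⁺ Cᵀ (standard Nyström) and A ≈ C U R with U = C⁺ A R⁺ (one common choice of U). It is not the adaptive sampling algorithm the paper proposes; the function names and the `rcond` cutoff are illustrative assumptions.

```python
import numpy as np

def nystrom(K, cols, rcond=1e-10):
    """Standard Nystrom approximation K ~ C W^+ C^T built from the columns in `cols`."""
    C = K[:, cols]                      # selected columns of the PSD matrix
    W = K[np.ix_(cols, cols)]           # intersection block of selected rows and columns
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

def cur(A, cols, rows, rcond=1e-10):
    """CUR approximation A ~ C U R with U = C^+ A R^+ (one common choice, assumed here)."""
    C = A[:, cols]                      # selected columns
    R = A[rows, :]                      # selected rows
    U = np.linalg.pinv(C, rcond=rcond) @ A @ np.linalg.pinv(R, rcond=rcond)
    return C, U, R

# Demo on exactly rank-3 matrices, where a few generic columns/rows suffice.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
K = X @ X.T                                             # PSD, rank 3
K_hat = nystrom(K, rng.choice(50, size=6, replace=False))

A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))   # rank 3
C, U, R = cur(A,
              rng.choice(30, size=6, replace=False),
              rng.choice(40, size=6, replace=False))
print(np.linalg.norm(K - K_hat), np.linalg.norm(A - C @ U @ R))
```

On these exactly low-rank inputs both reconstructions are exact up to rounding. On general matrices the error depends heavily on which columns and rows are selected, which is the question the paper's sampling analysis addresses.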
A statistical perspective on algorithmic leveraging, 2013
Cited by 11 (2 self)
One popular method for dealing with large-scale data sets is sampling. Using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales data matrices to reduce the data size before performing computations on the subproblem. Existing work has focused on algorithmic issues, but none of it addresses statistical aspects of this method. Here, we provide an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model. In particular, for several versions of leverage-based sampling, we derive results for the bias and variance. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results, when compared with uniform sampling. Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinked” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). The empirical results indicate that our theory is a good predictor of practical performance of existing and new leverage-based algorithms and that the new algorithms achieve improved performance.
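The sampling schemes described in this abstract can be sketched in a few lines of NumPy. The sketch assumes that leverage scores are the squared row norms of the left singular vectors, that "shrinked" scores are a convex mixture of the leverage distribution and the uniform one (the mixing weight `alpha` is a hypothetical parameter name), and that the unweighted variant simply skips the rescaling step. It illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def leverage_scores(X):
    """Leverage scores: squared row norms of the thin left singular vectors of X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leveraged_lstsq(X, y, r, alpha=1.0, weighted=True, seed=None):
    """Solve least squares on r rows sampled with replacement.

    alpha=1.0 samples by leverage; alpha<1 mixes in the uniform distribution
    (assumed form of the "shrinked" scores); weighted=False skips the rescaling.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = leverage_scores(X)
    pi = alpha * h / h.sum() + (1.0 - alpha) / n        # sampling distribution
    idx = rng.choice(n, size=r, replace=True, p=pi)
    # Rescale each sampled row by 1/sqrt(r * pi_i) in the weighted case.
    w = 1.0 / np.sqrt(r * pi[idx]) if weighted else np.ones(r)
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

# Demo: compare the three variants on a noisy linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(500)
for name, kw in [("leverage", {}), ("shrinked", {"alpha": 0.9}),
                 ("unweighted", {"weighted": False})]:
    print(name, leveraged_lstsq(X, y, r=100, seed=1, **kw))
```

With noise-free responses every variant recovers the coefficients exactly whenever the sampled rows have full column rank; the variants differ in bias and variance once noise is present, which is what the paper analyzes.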
Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs
Cited by 10 (3 self)
We present a scalable tensor-based approach for detecting hidden overlapping communities under the mixed membership stochastic block model. We employ stochastic gradient descent for performing tensor decompositions, which provides flexibility to trade off node subsampling against accuracy. Our GPU implementation of the tensor-based approach is extremely fast and scalable, and involves a careful optimization of GPU-CPU storage and communication. We validate our results on datasets from popular social networks (Facebook, Yelp and DBLP), where ground truth is available, using notions of p-values and false discovery rates, and obtain high accuracy for membership recovery. We compare our results, both in terms of execution time and accuracy, to state-of-the-art algorithms such as the variational method, and report better performance. For instance, on the Yelp network consisting of about 40,000 nodes and 500 communities, we recover the latent communities in under 30 minutes, and on the DBLP network consisting of about 120,000 nodes and 500 communities, we recover the latent communities in about 2.8 hours. In comparison, the variational method takes more than an order of magnitude higher execution time on the same datasets.
Sketching as a tool for numerical linear algebra
Foundations and Trends in Theoretical Computer Science
Optimal CUR matrix decompositions
In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), 2014
A Tensor Approach to Learning Mixed Membership Community Models
Cited by 6 (0 self)
Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remove this restriction, and provide guaranteed community detection for a family of probabilistic network models with overlapping communities, termed the mixed membership Dirichlet model, first introduced by Airoldi et al. (2008). This model allows for nodes to have fractional memberships in multiple communities and assumes that the community memberships are drawn from a Dirichlet distribution. Moreover, it contains the stochastic block model as a special case. We propose a unified approach to learning these models via a tensor spectral decomposition method. Our estimator is based on a low-order moment tensor of the observed network, consisting of 3-star counts. Our learning method is fast and is based on simple linear algebraic operations, e.g., singular value decomposition and tensor power iterations. We provide guaranteed recovery of community memberships and model parameters and present a careful finite sample analysis of our learning method. As an important special case, our results match the best known scaling requirements for the (homogeneous) stochastic block model.
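The tensor power iteration mentioned in this abstract can be illustrated on a small synthetic example. The sketch below assumes a symmetric third-order tensor that is an exact sum of rank-one terms with orthonormal components and positive weights; it covers only the generic power update, not the moment estimation from 3-star counts or the whitening step, and the function name is an assumption.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=None):
    """Power iteration v <- T(I, v, v) / ||T(I, v, v)|| for a symmetric 3-tensor T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = np.einsum("ijk,j,k->i", T, v, v)      # contract T with v along two modes
        v /= np.linalg.norm(v)
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)    # weight of the recovered component
    return lam, v

# Build T = sum_i lam_i * q_i (x) q_i (x) q_i with orthonormal q_i.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
weights = np.array([3.0, 2.0, 1.0])
T = sum(w * np.einsum("i,j,k->ijk", Q[:, i], Q[:, i], Q[:, i])
        for i, w in enumerate(weights))

lam, v = tensor_power_iteration(T, seed=1)
print(lam, np.abs(Q[:, :3].T @ v))   # one overlap is close to 1, the others close to 0
```

Which component the iteration converges to depends on the random start. Recovering all components typically requires deflation, that is, subtracting the recovered rank-one term and repeating.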
Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels. ICML, 2014
Cited by 5 (1 self)
We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large data sets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead, where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.
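As a rough illustration of the classical QMC variant described in this abstract, the sketch below builds Fourier features for a Gaussian kernel from a Halton sequence pushed through the inverse normal CDF, in place of i.i.d. Gaussian frequencies. The hand-written `halton` helper, the function names, and the bandwidth parameter `sigma` are assumptions made for the example; the learned, box-discrepancy-minimizing sequences of the paper are not implemented.

```python
import numpy as np
from statistics import NormalDist

_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

def halton(n, d):
    """First n points of the d-dimensional Halton sequence (indices start at 1)."""
    if d > len(_PRIMES):
        raise ValueError("this sketch supports at most 10 dimensions")
    pts = np.empty((n, d))
    for j in range(d):
        b = _PRIMES[j]
        for i in range(n):
            k, f, r = i + 1, 1.0, 0.0
            while k > 0:                 # radical inverse of k in base b
                f /= b
                r += f * (k % b)
                k //= b
            pts[i, j] = r
    return pts

def qmc_gaussian_frequencies(n_features, d, sigma=1.0):
    """Map Halton points to N(0, sigma^-2 I) frequencies via the inverse normal CDF."""
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(halton(n_features, d)) / sigma

def fourier_features(X, W):
    """Feature map whose inner products average cos(w . (x - y)) over the rows of W."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

# Demo: approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) with sigma = 1.
rng = np.random.default_rng(0)
X = 0.5 * rng.standard_normal((20, 3))
Phi = fourier_features(X, qmc_gaussian_frequencies(512, 3))
K_hat = Phi @ Phi.T
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_true = np.exp(-sq / 2.0)
print(np.max(np.abs(K_hat - K_true)))
```

Swapping `qmc_gaussian_frequencies` for `rng.standard_normal((512, 3))` gives the plain Monte Carlo feature map, so the two approximation errors can be compared on the same data.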
Efficient Algorithms and Error Analysis for the Modified Nyström Method
Cited by 5 (2 self)
Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speeds up computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than the state-of-the-art algorithm, and nearly as accurate. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method.
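The difference between the standard and modified Nyström methods can be seen in a short NumPy comparison. The abstract does not state the formulas, so the sketch assumes the usual definitions from the literature: both approximate K by C U Cᵀ, with U = W⁺ in the standard method and U = C⁺ K (C⁺)ᵀ in the modified one. Columns are chosen uniformly at random here, not by the selection algorithm the paper proposes.

```python
import numpy as np

def standard_nystrom(K, cols):
    """K ~ C W^+ C^T, where W is the block of K indexed by the selected columns."""
    C = K[:, cols]
    W = K[np.ix_(cols, cols)]
    return C @ np.linalg.pinv(W) @ C.T

def modified_nystrom(K, cols):
    """K ~ C U C^T with U = C^+ K (C^+)^T (assumed definition of the modified method)."""
    C = K[:, cols]
    C_pinv = np.linalg.pinv(C)
    U = C_pinv @ K @ C_pinv.T        # uses every entry of K, not only the sampled block
    return C @ U @ C.T

# Demo on a Gaussian kernel matrix built from random 2-D points.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq)
cols = rng.choice(60, size=10, replace=False)

err_std = np.linalg.norm(K - standard_nystrom(K, cols))   # Frobenius norm
err_mod = np.linalg.norm(K - modified_nystrom(K, cols))
print(err_std, err_mod)
```

For a fixed set of columns, this choice of U minimizes the Frobenius error over all matrices U, so the modified error is never larger than the standard one. The price is that forming U touches the whole of K, which is the computational cost the paper sets out to reduce.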
Memory Efficient Kernel Approximation, 2014
Cited by 3 (1 self)
The scalability of kernel machines is a big challenge when facing millions of samples due to storage and computation issues for large kernel matrices, which are usually dense. Recently, many papers have suggested tackling this problem by using a low-rank approximation of the kernel matrix. In this paper, we first make the observation that the structure of shift-invariant kernels changes from low-rank to block-diagonal (without any low-rank structure) when varying the scale parameter. Based on this observation, we propose a new kernel approximation algorithm, Memory Efficient Kernel Approximation (MEKA), which considers both the low-rank and the clustering structure of the kernel matrix. We show that the resulting algorithm outperforms state-of-the-art low-rank kernel approximation methods in terms of speed, approximation error, and memory usage. As an example, on the mnist2m dataset with two million samples, our method takes 550 seconds on a single machine using less than 500 MBytes memory to achieve 0.2313 test RMSE for kernel ridge regression, while standard Nyström approximation takes more than 2700 seconds and uses more than 2 GBytes memory on the same problem to achieve 0.2318 test RMSE.
Improved Bounds for the Nyström Method with Application to Kernel Classification
Cited by 2 (1 self)
We develop two approaches for analyzing the approximation error bound for the Nyström method, which approximates a positive semidefinite (PSD) matrix by sampling a small set of columns: one based on a concentration inequality for integral operators, and one based on random matrix theory. We show that the approximation error, measured in the spectral norm, can be improved from $O(N/\sqrt{m})$ to $O(N/m^{1-\rho})$ in the case of a large eigengap, where $N$ is the total number of data points, $m$ is the number of sampled data points, and $\rho \in (0, 1/2)$ is a positive constant that characterizes the eigengap. When the eigenvalues of the kernel matrix follow a $p$-power law, our analysis based on random matrix theory further improves the bound to $O(N/m^{p-1})$ under an incoherence assumption. We present a kernel classification approach based on the Nyström method and derive its generalization performance using the improved bound. We show that when the eigenvalues of the kernel matrix follow a $p$-power law, we can reduce the number of support vectors to $N^{2p/(p^2-1)}$, which is sublinear in $N$ when $p > 1 + \sqrt{2}$, without seriously sacrificing its generalization performance. Index Terms: Nyström method, approximation error, concentration inequality, kernel methods, random matrix theory.
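The sublinearity threshold quoted in this abstract follows from a short calculation. The derivation below is a worked check that assumes only the stated support-vector count $N^{2p/(p^2-1)}$ and $p > 1$:

```latex
\begin{align*}
N^{2p/(p^2-1)} \text{ is sublinear in } N
  &\iff \frac{2p}{p^2-1} < 1 \\
  &\iff 2p < p^2 - 1 \qquad (\text{since } p^2 - 1 > 0 \text{ for } p > 1) \\
  &\iff (p-1)^2 > 2 \\
  &\iff p > 1 + \sqrt{2} \approx 2.414 .
\end{align*}
```

For example, $p = 3$ gives an exponent of $6/8 = 0.75$, so the number of support vectors grows like $N^{0.75}$.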