Results 11–20 of 34
Multiresolution Matrix Factorization
Abstract

Cited by 1 (1 self)
The types of large matrices that appear in modern Machine Learning problems often have complex hierarchical structures that go beyond what can be found by traditional linear algebra tools, such as eigendecompositions. Inspired by ideas from multiresolution analysis, this paper introduces a new notion of matrix factorization that can capture structure in matrices at multiple different scales. The resulting Multiresolution Matrix Factorizations (MMFs) not only provide a wavelet basis for sparse approximation, but can also be used for matrix compression (similar to Nyström approximations) and as a prior for matrix completion.
Semi-supervised eigenvectors for large-scale locally-biased learning
 Journal of Machine Learning Research
Abstract

Cited by 1 (0 self)
In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks “nearby” that pre-specified target region. For example, one might be interested in the clustering structure of a data graph near a pre-specified “seed set” of nodes, or one might be interested in finding partitions in an image that are near a pre-specified “ground truth” set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, condi…
ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly
Abstract

Cited by 1 (1 self)
Matrix completion and approximation are popular tools to capture a user’s preferences for recommendation and to approximate missing data. Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. This allows us to build a concise model that, per bit of model learned, significantly beats all factorization approaches in matrix completion. Even more surprisingly, we find that summing over small co-clusterings is more effective in modeling matrices than classic co-clustering, which uses just one large partitioning of the matrix. Following Occam’s razor principle, the fact that our model is more concise and yet just as accurate as more complex models suggests that it better captures the latent preferences and decision making processes present in the real world. We provide an iterative minimization algorithm, a collapsed Gibbs sampler, theoretical guarantees for matrix approximation, and excellent empirical evidence for the efficacy of our approach. We achieve state-of-the-art results for matrix completion on Netflix at a fraction of the model complexity.
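The core idea of the abstract above, approximating a matrix as a sum of block-constant co-clusterings fitted to successive residuals, can be illustrated with a deliberately simplified NumPy sketch (plain k-means backfitting rather than the paper's collapsed Gibbs sampler; all names and parameters are illustrative):

```python
import numpy as np

def kmeans_labels(V, k, rng, iters=20):
    """Plain k-means on the rows of V; returns cluster labels."""
    C = V[rng.choice(len(V), size=k, replace=False)].copy()
    labels = np.zeros(len(V), dtype=int)
    for _ in range(iters):
        d = ((V[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = V[labels == j].mean(axis=0)
    return labels

def additive_cocluster(M, stages, k, rng):
    """Fit a sum of simple co-clusterings to M: at each stage,
    cluster the rows and columns of the residual, replace each
    (row-cluster, column-cluster) block by its mean, and subtract."""
    R = M.copy()
    A = np.zeros_like(M)
    for _ in range(stages):
        rl = kmeans_labels(R, k, rng)
        cl = kmeans_labels(R.T, k, rng)
        B = np.zeros_like(M)
        for i in range(k):
            for j in range(k):
                sel = np.ix_(rl == i, cl == j)
                if sel[0].size and sel[1].size:
                    B[sel] = R[sel].mean()
        A += B          # accumulate the additive model
        R -= B          # shrink the residual for the next stage
    return A

rng = np.random.default_rng(0)
M = rng.standard_normal((80, 8)) @ rng.standard_normal((8, 80))  # low-rank test matrix
A3 = additive_cocluster(M, stages=3, k=6, rng=rng)
err = np.linalg.norm(M - A3, "fro") / np.linalg.norm(M, "fro")
```

Because each stage subtracts an orthogonal projection of the residual onto block-constant matrices, the residual norm can only decrease from stage to stage.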
FAR-FIELD COMPRESSION FOR FAST KERNEL SUMMATION METHODS IN HIGH DIMENSIONS
Abstract

Cited by 1 (1 self)
We consider fast kernel summations in high dimensions: given a large set of points in d dimensions (with d ≥ 3) and a pair-potential function (the kernel function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kernel summation algorithms reduce this cost to log-linear or linear complexity. Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by constructing approximate representations of interactions of points that are far from each other. In algebraic terms, these representations correspond to low-rank approximations of blocks of the overall interaction matrix. Existing approaches require an excessive number of kernel evaluations with increasing d and number of points in the dataset. To address this issue, we use a randomized algebraic approach in which we first sample the rows of a block and then construct its approximate, low-rank interpolative decomposition. We examine the feasibility of this approach theoretically and experimentally. We provide a new theoretical result showing a tighter bound on the reconstruction error from uniformly sampling rows than the existing state-of-the-art. We demonstrate that our sampling approach is competitive with existing (but prohibitively expensive) methods from the literature. We also construct kernel matrices for the Laplacian, Gaussian, and polynomial kernels – all commonly used in physics and data analysis. We explore the numerical properties of blocks of these matrices, and show that they are amenable to our approach. Depending on the data set, our randomized algorithm can successfully compute low-rank approximations in high dimensions. We report results for data sets from four dimensions up to 1000 dimensions.
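The row-sampling idea in this abstract can be sketched in a simplified form: uniformly sample rows of a far-field kernel block and project the block onto the sampled row space (a stand-in for the paper's interpolative decomposition; the function name, cluster geometry, and sample sizes are illustrative assumptions):

```python
import numpy as np

def sampled_rowspace_approx(B, s, k, rng):
    """Low-rank approximation of a far-field block B: uniformly
    sample s rows, take the top-k right singular subspace of the
    sample, and project B onto it."""
    idx = rng.choice(B.shape[0], size=s, replace=False)
    _, _, Vt = np.linalg.svd(B[idx], full_matrices=False)
    Vk = Vt[:k].T                     # (n, k) basis for the sampled row space
    return (B @ Vk) @ Vk.T

rng = np.random.default_rng(2)
# Two well-separated point clusters give a numerically low-rank Gaussian block.
Xs = rng.standard_normal((300, 3))
Xt = rng.standard_normal((300, 3)) + 6.0
B = np.exp(-0.5 * ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1))

B_hat = sampled_rowspace_approx(B, s=100, k=30, rng=rng)
rel = np.linalg.norm(B - B_hat, "fro") / np.linalg.norm(B, "fro")
```

The separation of the two clusters is what makes the block compressible; the same sampling on a near-field (diagonal) block would not yield a good low-rank fit.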
Out-of-sample Extension for Latent Position Graphs
, 2013
Abstract

Cited by 1 (0 self)
We consider the problem of vertex classification for graphs constructed from the latent position model. It was shown previously that the approach of embedding the graphs into some Euclidean space, followed by classification in that space, can yield a universally consistent vertex classifier. However, a major technical difficulty of the approach arises when classifying unlabeled out-of-sample vertices without including them in the embedding stage. In this paper, we study the out-of-sample extension for the graph embedding step and its impact on the subsequent inference tasks. We show that, under the latent position graph model and for sufficiently large n, each out-of-sample vertex is mapped close to its true latent position. We then demonstrate that successful inference for the out-of-sample vertices is possible.
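The out-of-sample idea, embed the graph once and then place a new vertex by least squares against the fixed embedding, can be sketched with an adjacency spectral embedding (the embedding choice, the SBM test graph, and all names are illustrative assumptions, not the paper's exact estimator):

```python
import numpy as np

def ase(A, k):
    """Adjacency spectral embedding: rows of U |S|^{1/2} for the
    top-k eigenpairs (by magnitude) of the adjacency matrix A."""
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:k]
    S, U = vals[order], vecs[:, order]
    return U * np.sqrt(np.abs(S)), U, S

def out_of_sample(a, U, S):
    """Map a new vertex from its adjacency vector a to the in-sample
    vertices, without recomputing the embedding: the least-squares
    solution of X x ~= a reduces to x = |S|^{-1/2} U^T a."""
    return (U.T @ a) / np.sqrt(np.abs(S))

rng = np.random.default_rng(3)
# A small 2-block stochastic block model graph; the last vertex is held out.
n, k = 200, 2
z = rng.integers(0, 2, size=n + 1)
P = np.where(z[:, None] == z[None, :], 0.5, 0.1)
A_full = (rng.random((n + 1, n + 1)) < P).astype(float)
A_full = np.triu(A_full, 1)
A_full = A_full + A_full.T

A = A_full[:n, :n]                 # in-sample graph
a_new = A_full[:n, n]              # held-out vertex's edges to in-sample vertices

X, U, S = ase(A, k)
x_new = out_of_sample(a_new, U, S)
```

With well-separated blocks, the out-of-sample position should land near the embedded vertices of its own block.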
Randomized Low-Memory Singular Value Projection
, 2013
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Affine rank minimization algorithms typically rely on calculating the gradient of a data error followed by a singular value decomposition at every iteration. Because these two steps are expensive, heuristic approximations are often used to reduce computational burden. To this end, we propose a recovery scheme that merges the two steps with randomized approximations, and as a result, operates on space proportional to the degrees of freedom in the problem. We theoretically establish the estimation guarantees of the algorithm as a function of approximation tolerance. While the theoretical approximation requirements are overly pessimistic, we demonstrate that in practice the algorithm performs well on the quantum tomography recovery problem.
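The gradient-step-plus-SVD loop described above can be sketched for the matrix completion instance of affine rank minimization, with a randomized truncated SVD replacing the exact one at each iteration (step size, iteration count, and all names are illustrative, not the paper's low-memory scheme):

```python
import numpy as np

def rand_svd_topr(X, r, rng, p=5):
    """Randomized top-r SVD via a Gaussian range finder."""
    G = rng.standard_normal((X.shape[1], r + p))
    Q, _ = np.linalg.qr(X @ G)
    U, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U)[:, :r], s[:r], Vt[:r]

def svp_complete(M_obs, mask, r, rng, step=1.0, iters=200):
    """Singular value projection for matrix completion: gradient
    step on the observed entries, then projection onto rank-r
    matrices with a randomized SVD."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        G = mask * (X - M_obs)        # gradient of 0.5*||mask*(X - M)||_F^2
        Y = X - step * G
        U, s, Vt = rand_svd_topr(Y, r, rng)
        X = U @ np.diag(s) @ Vt       # rank-r projection
    return X

rng = np.random.default_rng(4)
n, r = 60, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-3 target
mask = (rng.random((n, n)) < 0.5).astype(float)                  # 50% observed
X_hat = svp_complete(mask * M, mask, r, rng)
rel = np.linalg.norm(X_hat - M, "fro") / np.linalg.norm(M, "fro")
```

With half the entries observed and a well-conditioned rank-3 target, the iteration recovers M to small relative error.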
Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition
, 2016
Abstract
Symmetric positive semidefinite (SPSD) matrix approximation methods have been extensively used to speed up large-scale eigenvalue computation and kernel learning methods. The standard sketch-based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nyström method is highly efficient, but can only achieve low accuracy. In this paper we propose a novel model that we call the fast SPSD matrix approximation model. The fast model is nearly as efficient as the Nyström method and as accurate as the prototype model. We show that the fast model can potentially solve eigenvalue problems and kernel learning problems in linear time with respect to the matrix size n to achieve a 1 + ε relative-error bound, whereas both the prototype model and the Nyström method cost at least quadratic time to attain a comparable error bound. Empirical comparisons among the prototype model, the Nyström method, and our fast model demonstrate the superiority of the fast model. We also contribute new understandings of the Nyström method: it is a special instance of our fast model and an approximation to the prototype model. Our technique can be applied straightforwardly to make the CUR matrix decomposition more efficient to compute without much affecting its accuracy.
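As a point of reference for the comparison above, here is a minimal NumPy sketch of the standard Nyström method, the baseline the abstract improves on (the function name, RBF kernel, and sample size are illustrative choices, not the paper's fast model):

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation of an SPSD matrix K from sampled columns:
    K_approx = C pinv(W) C^T with C = K[:, idx], W = K[idx, idx]."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Example: an RBF kernel matrix on random points.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)

idx = rng.choice(200, size=50, replace=False)
K_hat = nystrom(K, idx)
rel_err = np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro")
```

A useful sanity check: the Nyström approximation reproduces the sampled-by-sampled block W exactly, and only approximates the rest of the matrix.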
Leveraging for big data regression
Abstract
Rapid advances in science and technology in the past decade have brought an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex research challenges. The opportunity, however, has not yet been fully utilized, because effective and efficient statistical tools for analyzing super-large datasets are still lacking. One major challenge is that the advance of computing resources still lags far behind the exponential growth of databases. To facilitate scientific discoveries using current computing resources, one may use an emerging family of statistical methods, called leveraging. Leveraging methods are designed under a subsampling framework, in which one samples a small proportion of the data (a subsample) from the full sample, and then performs the intended computations for the full sample using the small subsample as a surrogate. The key to the success of leveraging methods is to construct non-uniform sampling probabilities so that influential data points are sampled with high probability. These methods stand as a unique development of their type in big data analytics and allow pervasive access to massive amounts of information without resorting to high-performance computing and cloud computing.
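The subsampling framework described above can be sketched for least-squares regression: compute row leverage scores, sample rows with probability proportional to them, reweight, and solve on the subsample (function names and the noise/sample sizes are illustrative assumptions):

```python
import numpy as np

def leverage_scores(X):
    """Row leverage scores from a thin QR: h_i = ||Q_i||^2."""
    Q, _ = np.linalg.qr(X)
    return (Q ** 2).sum(axis=1)

def leveraged_lstsq(X, y, r, rng):
    """Solve least squares on a weighted subsample of r rows drawn
    with probabilities proportional to the leverage scores."""
    p = leverage_scores(X)
    p = p / p.sum()
    idx = rng.choice(len(y), size=r, replace=True, p=p)
    w = 1.0 / np.sqrt(r * p[idx])      # reweight to keep the estimator unbiased
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return beta

rng = np.random.default_rng(1)
n, d = 5000, 10
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)       # full-sample solution
beta_sub = leveraged_lstsq(X, y, r=500, rng=rng)        # surrogate from 10% of rows
```

The subsampled estimate should track the full-sample solution closely while touching only a fraction of the rows.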
Sparse Kernel Clustering of Massive High-Dimensional Data Sets with Large Number of Clusters
Abstract
In clustering applications involving documents and images, in addition to the large number of data points (N) and their high dimensionality (d), the number of clusters (C) into which the data need to be partitioned is also large. Kernel-based clustering algorithms, which have been shown to perform better than linear clustering algorithms, have high running-time complexity in terms of N, d and C. We propose an efficient sparse kernel k-means clustering algorithm, which incrementally samples the most informative points from the data set using importance sampling, and constructs a sparse kernel matrix using these sampled points. Each row in this matrix corresponds to a data point's similarity with its p nearest neighbors among the sampled points (p ≪ N). This sparse kernel matrix is used to perform clustering and obtain the cluster labels. This combination of sampling and sparsity reduces both the running time and memory complexity of kernel clustering. In order to further enhance its efficiency, the proposed algorithm projects the data onto the top C eigenvectors of the sparse kernel matrix and clusters these eigenvectors using a modified k-means algorithm. The running time of the proposed sparse kernel k-means algorithm is linear in N and d, and logarithmic in C. We show analytically that only a small number of points need to be sampled from the data set, and that the resulting approximation error is well-bounded. We demonstrate, using several large high-dimensional text and image data sets, that the proposed algorithm is significantly faster than classical kernel-based clustering algorithms, while maintaining clustering quality.
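The sample-then-sparsify pipeline in this abstract can be sketched in simplified form: sample landmark points (uniformly here, rather than by importance sampling), keep each point's similarity to its p nearest landmarks, and extract the top-C spectral features for a subsequent k-means step (omitted). All names and parameters (`sparse_kernel_features`, `gamma`, landmark count `m`) are illustrative, and a dense array stands in for a real sparse matrix:

```python
import numpy as np

def sparse_kernel_features(X, m, p, C, gamma, rng):
    """Sample m landmarks, keep each point's p largest RBF
    similarities to them, and return the top-C left singular
    directions of the resulting sparse similarity matrix."""
    idx = rng.choice(len(X), size=m, replace=False)
    L = X[idx]
    d2 = ((X[:, None, :] - L[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    # Zero out all but the p largest similarities in each row.
    keep = np.argsort(-K, axis=1)[:, :p]
    S = np.zeros_like(K)
    rows = np.arange(len(X))[:, None]
    S[rows, keep] = K[rows, keep]
    U, _, _ = np.linalg.svd(S, full_matrices=False)
    return U[:, :C]            # spectral features to feed a k-means step

rng = np.random.default_rng(5)
# Two well-separated Gaussian blobs, so C = 2 clusters.
X = np.vstack([rng.standard_normal((100, 4)),
               rng.standard_normal((100, 4)) + 5.0])
F = sparse_kernel_features(X, m=40, p=5, C=2, gamma=0.5, rng=rng)
```

On this toy data the sparse similarity matrix is nearly block structured, so the two spectral feature columns already separate the blobs before any k-means refinement.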