On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
- Journal of Machine Learning Research
, 2005
Cited by 56 (6 self)
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G̃_k = C W_k^+ C^T, where C is a matrix consisting of a small number c of columns of G and W_k is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let ‖·‖_2 and ‖·‖_F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let G_k be the best rank-k approximation to G. We prove that by choosing O(k/ε⁴) columns, ‖G − C W_k^+ C^T‖_ξ ≤ ‖G − G_k‖_ξ + ε ∑_{i=1}^{n} G_ii², both in expectation and with high probability, for both ξ = 2, F, and for all k: 0 ≤ k ≤ rank(W). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
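The construction described in this abstract (sample c columns of G, take the intersection block W, and combine C with a rank-k pseudo-inverse of W) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `nystrom_approx`, the choice of sampling probabilities proportional to G_ii², and the 1/sqrt(c·p_i) rescaling are assumptions made here for concreteness.

```python
import numpy as np

def nystrom_approx(G, c, k, rng=None):
    """Sketch of a rank-k Nystrom-style approximation C @ pinv(W_k) @ C.T.

    G is assumed symmetric positive semidefinite (a Gram matrix).
    """
    rng = np.random.default_rng(rng)
    n = G.shape[0]
    # Assumed data-dependent distribution: p_i proportional to G_ii^2.
    d = np.diag(G) ** 2
    p = d / d.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Assumed rescaling of each sampled column by 1 / sqrt(c * p_i).
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale                                  # n x c
    W = G[np.ix_(idx, idx)] * scale[None, :] * scale[:, None]  # c x c
    # Best rank-k approximation of the symmetric matrix W via its
    # eigendecomposition, pseudo-inverted on the retained eigenpairs.
    evals, evecs = np.linalg.eigh(W)
    top = np.argsort(evals)[::-1][:k]
    keep = top[evals[top] > 1e-10 * evals.max()]
    Wk_pinv = (evecs[:, keep] / evals[keep]) @ evecs[:, keep].T
    return C @ Wk_pinv @ C.T
```

When G has exact rank k and the sampled columns span its column space, this sketch reproduces G up to rounding; for general G it only approximates it, with the quality depending on c and k.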
Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication
- SIAM Journal on Computing
, 2004
Matrix approximation and projective clustering via volume sampling
- In SODA
, 2006
Cited by 46 (2 self)
We present two new results for the problem of approximating a given real m × n matrix A by a rank-k matrix D, where k < min{m, n}, so as to minimize ‖A − D‖_F^2. It is known that by sampling O(k/ɛ) rows of the matrix, one can find a low-rank approximation with additive error ɛ‖A‖_F^2. Our first result shows that with adaptive sampling in t rounds and O(k/ɛ) samples in each round, the additive error drops exponentially as ɛ^t; the computation time is nearly linear in the number of nonzero entries. This demonstrates that multiple passes can be highly beneficial for a natural (and widely studied) algorithmic problem. Our second result is that there exists a subset of O(k^2/ɛ) rows such that their span contains a rank-k approximation with multiplicative (1 + ɛ) error (i.e., the sum of squares distance has a small “core-set” whose span determines a good approximation). This existence theorem leads to a PTAS for the following projective clustering problem: Given a set of points P in R^d, and integers k, j, find a set of j subspaces F_1, ..., F_j, each of dimension at most k, that minimize ∑_{p∈P} min_i d(p, F_i)^2.
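The adaptive sampling idea in this abstract (each round samples rows in proportion to how poorly the rows chosen so far explain them) can be illustrated roughly as follows. This is a hedged sketch, not the paper's algorithm: the function name `adaptive_row_sample`, the per-round sample count `s`, and the use of squared residual row norms as sampling weights are assumptions for illustration, and the final step of extracting a rank-k matrix from the span is omitted.

```python
import numpy as np

def adaptive_row_sample(A, s, t, rng=None):
    """Sample s rows per round for t rounds, weighting by residual row norms.

    Returns the chosen row indices and the residual of A after projecting
    its rows onto the span of the chosen rows.
    """
    rng = np.random.default_rng(rng)
    A = np.asarray(A, dtype=float)
    total = np.sum(A ** 2)
    chosen = []
    E = A.copy()                      # residual; initially all of A
    for _ in range(t):
        w = np.sum(E ** 2, axis=1)    # squared residual norm of each row
        if w.sum() <= 1e-18 * total:  # rows already (numerically) captured
            break
        picks = rng.choice(A.shape[0], size=s, replace=True, p=w / w.sum())
        chosen.extend(int(i) for i in picks)
        # Orthonormal basis of the span of the chosen rows (SVD handles
        # repeated or linearly dependent rows).
        _, sv, Vt = np.linalg.svd(A[chosen], full_matrices=False)
        basis = Vt[sv > 1e-10 * sv[0]]
        E = A - (A @ basis.T) @ basis
    return chosen, E
```

Because the span only grows from round to round, the Frobenius norm of the residual never increases; the abstract's claim concerns how fast it shrinks.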
Sampling from large matrices: an approach through geometric functional analysis
- Journal of the ACM
, 2006
Cited by 43 (4 self)
We study random submatrices of a large matrix A. We show how to approximately compute A from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r = ‖A‖_F^2 / ‖A‖_2^2 is the numerical rank of A. The numerical rank is always bounded by, and is a stable relaxation of, the rank of A. This yields an asymptotically optimal guarantee in an algorithm for computing low-rank approximations of A. We also prove asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems. We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.
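The numerical rank defined in this abstract, the squared Frobenius norm divided by the squared spectral norm, is easy to compute directly. The short sketch below (the helper name `numerical_rank` is chosen here for illustration) shows that it equals the ordinary rank for the identity and falls below it when the singular values are uneven.

```python
import numpy as np

def numerical_rank(A):
    """Return ||A||_F^2 / ||A||_2^2 for a nonzero matrix A."""
    fro_sq = np.linalg.norm(A, "fro") ** 2   # sum of squared singular values
    spec_sq = np.linalg.norm(A, 2) ** 2      # largest singular value, squared
    return fro_sq / spec_sq

# Singular values 3, 4, 0: (9 + 16) / 16 = 1.5625, while the rank is 2.
example = numerical_rank(np.diag([3.0, 4.0, 0.0]))
```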
On graph problems in a semi-streaming model
- In 31st International Colloquium on Automata, Languages and Programming
, 2004
Cited by 38 (4 self)
We formalize a potentially rich new streaming model, the semi-streaming model, that we believe is necessary for the fruitful study of efficient algorithms for solving problems on massive graphs whose edge sets cannot be stored in memory. In this model, the input graph, G = (V, E), is presented as a stream of edges (in adversarial order), and the storage space of an algorithm is bounded by O(n · polylog n), where n = |V|. We are particularly interested in algorithms that use only one pass over the input, but, for problems where this is provably insufficient, we also look at algorithms using constant or, in some cases, logarithmically many passes. In the course of this general study, we give semi-streaming constant approximation algorithms for the unweighted and weighted matching problems, along with a further algorithm improvement for the bipartite case. We also exhibit log n / log log n semi-streaming approximations to the diameter and the problem of computing the distance between specified vertices in a weighted graph. These are complemented by Ω(log^(1−ɛ) n) lower bounds.
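As a concrete picture of the model, the standard one-pass greedy maximal matching fits the semi-streaming constraints: it stores only the set of matched vertices (O(n) words) and is well known to return a matching of at least half the maximum size in an unweighted graph. The abstract does not spell out the paper's own algorithms, so the sketch below, including the name `greedy_streaming_matching`, is an assumed illustration rather than the authors' method.

```python
def greedy_streaming_matching(edge_stream):
    """One pass over the edges; keep an edge if both endpoints are still free."""
    matched = set()      # vertices already covered by the matching
    matching = []
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

# On the path 1-2-3-4 the result depends on the (adversarial) arrival order.
bad_order = greedy_streaming_matching([(2, 3), (1, 2), (3, 4)])
good_order = greedy_streaming_matching([(1, 2), (2, 3), (3, 4)])
```

Here `bad_order` contains one edge and `good_order` two, which matches the factor-2 guarantee of greedy maximal matching.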
Graph distances in the streaming model: the value of space
- In ACM-SIAM Symposium on Discrete Algorithms
, 2005
Cited by 38 (8 self)
We investigate the importance of space when solving problems based on graph distance in the streaming model. In this model, the input graph is presented as a stream of edges in an arbitrary order. The main computational restriction of the model is that we have limited space and therefore cannot store all the streamed data; we are forced to make space-efficient summaries of the data as we go along. For a graph of n vertices and m edges, we show that testing many graph properties, including connectivity (ergo any reasonable decision problem about distances) and bipartiteness, requires Ω(n) bits of space. Given this, we then investigate how the power of the model increases as we relax our space restriction. Our main result is an efficient randomized algorithm that constructs a (2t + 1)-spanner in one pass. With high probability, it uses O(t · n^(1+1/t) log^2 n) bits of space and processes each edge in the stream in O(t^2 · n^(1/t) log n) time. We find approximations to diameter and girth via the constructed spanner. For t = Ω(log n / log log n), the space requirement of the algorithm is O(n · polylog n), and the per-edge processing time is O(polylog n). We also show a corresponding lower bound of t for the approximation ratio achievable when the space restriction is O(t · n^(1+1/t) log^2 n). We then consider the scenario in which we are allowed multiple passes over the input stream. Here, we investigate whether allowing these extra passes will compensate for a given space restriction. We show that
∗ This work was supported by the DoD University Research Initiative (URI) administered by the Office of Naval Research
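To make the notion of a one-pass (2t + 1)-spanner concrete, here is a simple deterministic greedy construction for unweighted graphs: an arriving edge is kept only if its endpoints are currently more than 2t + 1 hops apart in the spanner built so far. This is a different and slower technique than the randomized algorithm the abstract describes (each edge triggers a bounded breadth-first search), and the names `within_distance` and `greedy_stream_spanner` are illustrative assumptions.

```python
from collections import deque

def within_distance(adj, src, dst, limit):
    """Return True if dst is reachable from src in at most `limit` hops."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == limit:
            continue                  # do not expand past the hop limit
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return False

def greedy_stream_spanner(edge_stream, t):
    """Keep an edge only if its endpoints are more than 2t + 1 hops apart."""
    adj = {}
    kept = []
    for u, v in edge_stream:
        if not within_distance(adj, u, v, 2 * t + 1):
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            kept.append((u, v))
    return kept
```

Every discarded edge has a path of at most 2t + 1 hops in the kept subgraph, so distances stretch by at most that factor. With t = 1, a triangle loses its last edge while a 5-cycle keeps all five.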
Indexing Hierarchical Structures Using Graph Spectra
, 2005
Cited by 33 (9 self)
Hierarchical image structures are abundant in computer vision and have been used to encode part structure, scale spaces, and a variety of multiresolution features. In this paper, we describe a framework for indexing such representations that embeds the topological structure of a directed acyclic graph (DAG) into a low-dimensional vector space. Based on a novel spectral characterization of a DAG, this topological signature allows us to efficiently retrieve a promising set of candidates from a database of models using a simple nearest-neighbor search. We establish the insensitivity of the signature to minor perturbation of graph structure due to noise, occlusion, or node split/merge. To accommodate large-scale occlusion, the DAG rooted at each nonleaf node of the query "votes" for model objects that share that "part," effectively accumulating local evidence in a model DAG's topological subspaces. We demonstrate the approach with a series of indexing experiments in the domain of view-based 3D object recognition using shock graphs.
A randomized algorithm for a tensor-based generalization of the Singular Value Decomposition
- In Linear
, 2005
Relative-Error CUR Matrix Decompositions
- SIAM J. Matrix Anal. Appl.
, 2008
Cited by 21 (7 self)
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = C C^+ A, where C^+ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖_F ≤ (1 + ɛ) ‖A − A_k‖_F, where A_k is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖_F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ɛ, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
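The first construction in this abstract, A′ = C C^+ A, projects A onto the span of a few of its own columns. A minimal NumPy sketch follows. The abstract does not say how the columns are chosen, so sampling them with probabilities derived from the top-k right singular vectors is an assumption made here, as are the function name `column_projection` and the parameter `c` (number of draws).

```python
import numpy as np

def column_projection(A, k, c, rng=None):
    """Pick columns of A at random and return (indices, C @ pinv(C) @ A)."""
    rng = np.random.default_rng(rng)
    # Assumed sampling weights: squared column norms of the top-k right
    # singular vectors (they sum to k when k <= rank(A)).
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    weights = np.sum(Vt[:k] ** 2, axis=0)
    p = weights / weights.sum()
    draws = rng.choice(A.shape[1], size=c, replace=True, p=p)
    cols = np.unique(draws)           # distinct actual columns of A
    C = A[:, cols]
    # C @ pinv(C) is the orthogonal projector onto the column space of C.
    return cols, C @ (np.linalg.pinv(C) @ A)
```

Because the result is an orthogonal projection of A onto the span of the chosen columns, those columns are reproduced exactly and the error never exceeds ‖A‖_F; how close it comes to the best rank-k error depends on the sampling scheme and on c.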

