An Improved Approximation Algorithm for the Column Subset Selection Problem
Cited by 11 (0 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn^2, m^2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖_2 ≤ O(k^{3/4} log ...
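The two-stage structure described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the sampling distribution here is an assumed mix of leverage scores of the top-k right singular subspace with a uniform term, the constant 4 in the sample size is arbitrary, and the deterministic stage is an assumed greedy pivoting rule. The names `two_stage_css` and `greedy_pivots` are hypothetical.

```python
import numpy as np

def greedy_pivots(M, k):
    """Pick k distinct column indices of M, largest residual norm first."""
    R = np.array(M, dtype=float)
    available = np.ones(R.shape[1], dtype=bool)
    chosen = []
    for _ in range(k):
        norms = np.sum(R * R, axis=0)
        norms[~available] = -1.0          # never pick the same column twice
        j = int(np.argmax(norms))
        chosen.append(j)
        available[j] = False
        if norms[j] > 0:
            q = R[:, j] / np.sqrt(norms[j])
            R -= np.outer(q, q @ R)       # remove the chosen direction
    return chosen

def two_stage_css(A, k, c=None, seed=0):
    """Return indices of exactly k columns of A (requires k <= min(A.shape))."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    if c is None:
        c = int(np.ceil(4 * k * np.log(max(k, 2))))   # O(k log k), constant assumed
    c = min(n, max(k, c))

    # Top-k right singular subspace of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]

    # Stage 1 (randomized): sample c columns. The uniform term is an
    # assumption that keeps every probability strictly positive.
    leverage = np.sum(Vk_t ** 2, axis=0) / k
    p = 0.5 * leverage + 0.5 / n
    p = p / p.sum()
    sampled = rng.choice(n, size=c, replace=False, p=p)

    # Stage 2 (deterministic): keep exactly k of the sampled columns,
    # pivoting on the rescaled rows of the singular subspace.
    M = Vk_t[:, sampled] / np.sqrt(c * p[sampled])
    return [int(sampled[j]) for j in greedy_pivots(M, k)]
```

Given the returned indices `cols`, the residual in the abstract's bound can be checked numerically as `np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, 2)` with `C = A[:, cols]`.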
Colibri: Fast Mining of Large Static and Dynamic Graphs
Cited by 10 (4 self)
Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a nonorthogonal basis by sampling the columns and/or rows of the sparse matrix. However, these approaches will typically produce overcomplete bases, which wastes both space and time. In this paper we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a nonredundant basis and we prove that it has no loss of accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).
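The idea of keeping only a nonredundant basis among sampled columns can be illustrated with a short sketch. This is not the Colibri-S procedure itself; it is an assumed Gram-Schmidt style filter that drops any sampled column already in the span of the columns kept so far. The function name `nonredundant_columns` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def nonredundant_columns(A, candidates, tol=1e-8):
    """Keep candidate columns of A that are not in the span of earlier kept ones."""
    kept = []
    Q = np.zeros((A.shape[0], 0))          # orthonormal basis of the kept columns
    for j in candidates:
        v = np.asarray(A[:, j], dtype=float)
        r = v - Q @ (Q.T @ v)              # residual after projecting onto the basis
        r = r - Q @ (Q.T @ r)              # second pass for numerical stability
        norm_v = np.linalg.norm(v)
        norm_r = np.linalg.norm(r)
        if norm_v > 0 and norm_r > tol * norm_v:
            kept.append(j)
            Q = np.hstack([Q, (r / norm_r)[:, None]])
    return kept
```

Duplicate or linearly dependent samples are discarded, so the kept columns span the same space as all the candidates while storing fewer of them.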
Dense fast random projections and Lean Walsh transforms
- In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM), 2008
Cited by 8 (1 self)
Random projection methods give distributions over k × d matrices such that if a matrix Ψ (chosen according to the distribution) is applied to a finite set of vectors x_i ∈ R^d, the resulting vectors Ψx_i ∈ R^k approximately preserve the original metric with constant probability. First, we show that any matrix (composed with a random ±1 diagonal matrix) is a good random projector for a subset of vectors in R^d. Second, we describe a family of tensor product matrices which we term Lean Walsh. We show that using Lean Walsh matrices as random projections outperforms, in terms of running time, the best known current result (due to Matousek) under comparable assumptions.
Faster least squares approximation
- Numerische Mathematik
Cited by 8 (2 self)
Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. Methods dating back to Gauss and Legendre find a solution in O(nd^2) time, where n is the number of constraints and d is the number of variables. We present two randomized algorithms that provide very accurate relative-error approximations to the solution of a least squares approximation problem more rapidly than existing exact algorithms. Both of our algorithms preprocess the data with a randomized Hadamard transform. One then uniformly randomly samples constraints and solves the smaller problem on those constraints, and the other performs a sparse random projection and solves the smaller problem on those projected coordinates. In both cases, the solution to the smaller problem provides a relative-error approximation to the exact solution and can be computed in o(nd^2) time.
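A minimal sketch of the first variant (randomized Hadamard preprocessing followed by uniform sampling of constraints) might look like this. It assumes the number of rows n is a power of two, uses a plain loop-based Walsh-Hadamard transform rather than an optimized one, and takes the sample size `r` as a user-supplied parameter instead of deriving it from an accuracy target. The names `fwht` and `sampled_lstsq` are hypothetical.

```python
import numpy as np

def fwht(X):
    """Orthonormal Walsh-Hadamard transform along axis 0 (length must be a power of 2)."""
    X = np.array(X, dtype=float)
    n = X.shape[0]
    assert n > 0 and n & (n - 1) == 0, "number of rows must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = X[i:i + h].copy()
            b = X[i + h:i + 2 * h].copy()
            X[i:i + h] = a + b
            X[i + h:i + 2 * h] = a - b
        h *= 2
    return X / np.sqrt(n)

def sampled_lstsq(A, b, r, seed=0):
    """Approximate argmin_x ||Ax - b||_2 from r uniformly sampled mixed rows."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)      # random diagonal of +/-1
    HA = fwht(signs[:, None] * A)                # mix the constraints
    Hb = fwht(signs * b)
    rows = rng.choice(n, size=r, replace=False)  # uniform sample of constraints
    x, *_ = np.linalg.lstsq(HA[rows], Hb[rows], rcond=None)
    return x
```

The mixing step spreads the information in each constraint across all rows, which is what makes uniform sampling reasonable afterwards. The sampled problem has only r rows, so for r much smaller than n the final solve is cheap.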
Unsupervised Feature Selection for Principal Components Analysis [Extended Abstract]
Cited by 8 (1 self)
Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the “best” subset of exactly k columns from an m × n data matrix A, and has been extensively studied in the Numerical Linear Algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously-existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or landmark features from an object-feature matrix in an unsupervised manner. In all three application domains, we are able to identify k landmark features, i.e., columns of the data matrix, that capture nearly the same amount of information as does the subspace that is spanned by the top k “eigenfeatures.”
Interpretable nonnegative matrix decompositions
- In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008
Cited by 4 (0 self)
A matrix decomposition expresses a matrix as a product of at least two factor matrices. Equivalently, it expresses each column of the input matrix as a linear combination of the columns in the first factor matrix. The interpretability of the decompositions is a key issue in data-analysis tasks. We propose two new matrix-decomposition problems, the nonnegative CX and nonnegative CUR problems, that give naturally interpretable factors. They extend the recently proposed column and column-row based decompositions, and are intended for use with nonnegative matrices. Our decompositions represent the input matrix as a nonnegative linear combination of a subset of columns (or columns and rows) from the input matrix. We present two algorithms to solve these problems and provide an extensive experimental evaluation where we assess the quality of our algorithms' results as well as the intuitiveness of nonnegative CX and CUR decompositions. We show that our algorithms return intuitive answers with smaller reconstruction errors than the previously proposed methods for column and column-row decompositions.
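To make the nonnegative CX formulation concrete, the sketch below fixes a set of columns C of a nonnegative matrix A and fits a nonnegative coefficient matrix X so that CX approximates A. It does not reproduce either of the paper's algorithms: column selection is left to the caller, and X is fitted with multiplicative updates in the style of Lee and Seung as an assumed stand-in. The name `nonnegative_x` and its parameters are hypothetical.

```python
import numpy as np

def nonnegative_x(A, C, iters=200, eps=1e-12, seed=0):
    """Fit X >= 0 so that C @ X approximates A (A and C must be nonnegative)."""
    rng = np.random.default_rng(seed)
    X = rng.random((C.shape[1], A.shape[1])) + 0.1   # strictly positive start
    CtA = C.T @ A
    CtC = C.T @ C
    for _ in range(iters):
        # Multiplicative update: entries stay nonnegative by construction.
        X *= CtA / (CtC @ X + eps)
    return X
```

For example, with `C = A[:, [0, 2, 4]]` each column of `A` is expressed as a nonnegative combination of three actual data columns, which is what makes the factors directly interpretable.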
Accelerated dense random projections
- 2009
Cited by 4 (0 self)
In dimensionality reduction, a set of points in R^d is mapped into R^k, with the target dimension k smaller than the original dimension d, while distances between all pairs of points are approximately preserved. Currently popular methods for achieving this involve random projection, or choosing a linear mapping (a k × d matrix) from a distribution that is independent of the input points. Applying the mapping (chosen according to this distribution) is shown to give the desired property with at least constant probability. The contributions in this thesis are twofold. First, we provide a framework for designing such distributions. Second, we derive efficient random projection algorithms using this framework. Our results achieve performance exceeding other existing approaches. When the target dimension is significantly smaller than the original dimension we gain significant improvement by designing efficient algorithms for applying certain linear algebraic transforms.
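As a baseline illustration of the random projection setting described here, the sketch below draws a dense Gaussian k × d matrix independently of the data and compares pairwise squared distances before and after projection. It shows the generic technique only, not the accelerated constructions of the thesis. The helper names `gaussian_projection` and `pairwise_sq_dists` are hypothetical.

```python
import numpy as np

def gaussian_projection(k, d, seed=0):
    """Dense k x d projection with i.i.d. N(0, 1/k) entries, drawn independently of the data."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, d)) / np.sqrt(k)

def pairwise_sq_dists(P):
    """Squared Euclidean distances between all rows of P."""
    G = P @ P.T
    sq = np.diag(G)
    return sq[:, None] + sq[None, :] - 2.0 * G

# Usage: project 10 points from R^1000 down to R^256.
points = np.random.default_rng(1).standard_normal((10, 1000))
Psi = gaussian_projection(256, 1000)
projected = points @ Psi.T
off_diagonal = ~np.eye(10, dtype=bool)
ratios = pairwise_sq_dists(projected)[off_diagonal] / pairwise_sq_dists(points)[off_diagonal]
```

The entries of `ratios` concentrate around 1 as k grows. Applying a dense matrix like this costs O(kd) per point, which is the cost that faster transforms aim to reduce.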
Random projections for the nonnegative least-squares problem
- Linear Algebra and its Applications, 2009
BLENDENPIK: SUPERCHARGING LAPACK'S LEAST-SQUARES SOLVER
Cited by 3 (1 self)
Several innovative random-sampling and random-mixing techniques for solving problems in linear algebra have been proposed in the last decade, but they have not yet made a significant impact on numerical linear algebra. We show that by using a high-quality implementation of one of these techniques we obtain a solver that performs extremely well in the traditional yardsticks of numerical linear algebra: it is significantly faster than high-performance implementations of existing state-of-the-art algorithms, and it is numerically backward stable. More specifically, we describe a least-squares solver for dense highly overdetermined systems that achieves residuals similar to those of direct QR factorization based solvers (LAPACK), outperforms LAPACK by large factors, and scales significantly better than any QR-based solver.
Unsupervised Feature Selection for the k-means Clustering Problem
Cited by 2 (1 self)
We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ^2) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability.

