A Derandomized Sparse Johnson-Lindenstrauss Transform
Cited by 5 (1 self)
Recent work of [Dasgupta-Kumar-Sarlós, STOC 2010] gave a sparse Johnson-Lindenstrauss transform and left as a main open question whether their construction could be efficiently derandomized. We answer their question affirmatively by giving an alternative proof of their result requiring only bounded independence hash functions. Furthermore, the sparsity bound obtained in our proof is improved. The main ingredient in our proof is a spectral moment bound for quadratic forms that was recently used in [Diakonikolas-Kane-Nelson, FOCS 2010].
Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma
, 2010
Cited by 2 (0 self)
The Kaczmarz method is an algorithm for finding the solution to an overdetermined system of linear equations Ax = b by iteratively projecting onto the solution spaces. The randomized version put forth by Strohmer and Vershynin yields provably exponential convergence in expectation, which for highly overdetermined systems even outperforms the conjugate gradient method. In this article we present a modified version of the randomized Kaczmarz method which at each iteration selects the optimal projection from a randomly chosen set, which in most cases significantly improves the convergence rate. We utilize a Johnson-Lindenstrauss dimension reduction technique to keep the runtime on the same order as the original randomized version, adding only extra preprocessing time. We present a series of empirical studies which demonstrate the remarkable acceleration in convergence to the solution using this modified approach.
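For orientation, the baseline randomized Kaczmarz iteration mentioned in this abstract can be sketched as follows. This is a minimal illustration of the Strohmer-Vershynin row-sampling rule (rows chosen with probability proportional to their squared norm), not the modified method of the paper; the function name, the small test system, and the fixed iteration count are assumptions made for the example.

```python
import random

def randomized_kaczmarz(A, b, iterations=2000, seed=0):
    """Approximate a solution of the consistent system A x = b.

    Each step picks row i with probability proportional to ||a_i||^2 and
    projects the current iterate onto the hyperplane <a_i, x> = b_i.
    """
    rng = random.Random(seed)
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        row = A[i]
        residual = b[i] - sum(a * xi for a, xi in zip(row, x))
        step = residual / row_norms_sq[i]
        x = [xi + step * a for xi, a in zip(x, row)]
    return x

# Hypothetical overdetermined but consistent system (6 equations, 3 unknowns).
A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
x_true = [1.0, -2.0, 3.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
```

The modified method described in the abstract would replace the single sampled row with the best row from a randomly chosen set; that selection step is not shown here.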
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
Cited by 2 (0 self)
The Johnson-Lindenstrauss transform is a dimensionality reduction technique with a wide range of applications to theoretical computer science. It is specified by a distribution over projection matrices from R^d → R^k where k ≪ d and states that k = O(ε^{-2} log 1/δ) dimensions suffice to approximate the norm of any fixed vector in R^d to within a factor of 1 ± ε with probability at least 1 − δ. In this paper we show that this bound on k is optimal up to a constant factor, improving upon a previous Ω((ε^{-2} log 1/δ) / log(1/ε)) dimension bound of Alon. Our techniques are based on lower bounding the information cost of a novel one-way communication game and yield the first space lower bounds in a data stream model that depend on the error probability δ. For many streaming problems, the most naïve way of achieving error probability δ is to first achieve constant probability, then take the median of O(log 1/δ) independent repetitions. Our techniques show that for a wide range of problems this is in fact optimal! As an example, we show that estimating the ℓ_p-distance for any p ∈ [0, 2] requires Ω(ε^{-2} log n log 1/δ) space, even for vectors in {0, 1}^n. This is optimal in all parameters and closes a long line of work on this problem. We also show the number of distinct elements requires Ω(ε^{-2} log 1/δ + log n) space, which is optimal if ε^{-2} = Ω(log n). We also improve previous lower bounds for entropy in the strict turnstile and general turnstile models by a multiplicative factor of Ω(log 1/δ). Finally, we give an application to one-way communication complexity under product distributions, showing that unlike in the case of constant δ, the VC-dimension does not characterize the complexity when δ = o(1).
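The "constant probability, then median of repetitions" recipe mentioned in this abstract can be illustrated with a small sketch. It assumes a dense Gaussian projection (the abstract does not fix a particular distribution), and the function names, the vector x, and the parameters k = 64 and 9 repetitions are hypothetical choices for the example.

```python
import random
import statistics

def jl_norm_sq_estimate(x, k, rng):
    """Estimate ||x||^2 by ||S x||^2, where S is k x d with i.i.d. N(0, 1/k) entries."""
    total = 0.0
    for _ in range(k):
        # One row of S applied to x; the row is generated on the fly.
        dot = sum(rng.gauss(0.0, 1.0) * xi for xi in x)
        total += dot * dot
    return total / k

def median_of_repetitions(x, k, repetitions, seed=0):
    """Median of independent estimates, the naive way to reduce failure probability."""
    rng = random.Random(seed)
    return statistics.median(
        jl_norm_sq_estimate(x, k, rng) for _ in range(repetitions)
    )

# Hypothetical input: a fixed vector in R^20.
x = [float(i) for i in range(1, 21)]
true_norm_sq = sum(v * v for v in x)
estimate = median_of_repetitions(x, k=64, repetitions=9)
```

Each single estimate is within a constant factor of the true squared norm with good probability; the median fails only if at least half of the repetitions fail, which is why O(log 1/δ) repetitions reach error probability δ.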
IMPROVED ANALYSIS OF THE SUBSAMPLED RANDOMIZED HADAMARD TRANSFORM
Cited by 1 (0 self)
This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers, for the first time, optimal constants in the estimate on the number of dimensions required for the embedding.
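As a rough illustration of the map this abstract analyzes, the sketch below applies a subsampled randomized Hadamard transform to a single vector. It assumes the usual form sqrt(n/k) · R · H · D, with D a random diagonal sign matrix, H the orthonormal Walsh-Hadamard matrix, and R a uniform sample of k coordinates without replacement; the abstract itself does not spell out the definition, and the function names are hypothetical.

```python
import math
import random

def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of two."""
    v = list(v)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def srht(x, k, seed=0):
    """Apply sqrt(n/k) * R * H * D to x (assumed definition, see lead-in)."""
    rng = random.Random(seed)
    n = len(x)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]        # diagonal of D
    mixed = fwht([s * xi for s, xi in zip(signs, x)])
    mixed = [m / math.sqrt(n) for m in mixed]                  # makes H orthonormal
    rows = rng.sample(range(n), k)                             # R: k coordinates
    scale = math.sqrt(n / k)
    return [scale * mixed[i] for i in rows]

# Hypothetical input of length 8 (a power of two), reduced to 4 coordinates.
x = [float(i) for i in range(1, 9)]
y = srht(x, 4)
```

With k = n the map is an exact isometry; for k < n the squared norm of the output is an unbiased estimate of the squared norm of x.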
LOW-RANK MATRIX RECOVERY VIA ITERATIVELY REWEIGHTED LEAST SQUARES MINIMIZATION
We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximately low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property known in the context of compressed sensing, the algorithm is guaranteed to iteratively recover any matrix with an error of the order of the best rank-k approximation. In certain relevant cases, for instance for the matrix completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which expedites the solution of the least squares problems required at each iteration. We present numerical experiments which confirm the robustness of the algorithm for the solution of matrix completion problems, and demonstrate its competitiveness with respect to other techniques proposed recently in the literature. AMS subject classification: 65J22, 65K10, 52A41, 49M30. Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion.
A Randomized Approximate Nearest Neighbors Algorithm
, 2010
We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x_j} in R^d, the algorithm attempts to find k nearest neighbors for each of the x_j, where k is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to T · N · (d · (log d) + k · (log k) · (log N)) + N · k^2 · (d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N · (d + k). A byproduct of the scheme is a data structure permitting a rapid search for the k nearest neighbors among {x_j} for an arbitrary point x ∈ R^d. The cost of each such query is proportional to T · (d · (log d) + log(N/k) + k^2 · (d + log k)), and the memory requirements for the requisite data structure are of the order N · (d + k) + T · (d + N · k). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme’s behavior for certain types of distributions
Compressed Sensing with Coherent and Redundant Dictionaries
, 2010
This article presents novel results concerning the recovery of signals from undersampled data in the common situation where such signals are not sparse in an orthonormal basis or incoherent dictionary, but in a truly redundant dictionary. This work thus bridges a gap in the literature and shows not only that compressed sensing is viable in this context, but also that accurate recovery is possible via an ℓ1-analysis optimization problem. We introduce a condition on the measurement/sensing matrix, which is a natural generalization of the now well-known restricted isometry property, and which guarantees accurate recovery of signals that are nearly sparse in (possibly) highly overcomplete and coherent dictionaries. This condition imposes no incoherence restriction on the dictionary and our results may be the first of this kind. We discuss practical examples and the implications of our results on those applications, and complement our study by demonstrating the potential of ℓ1-analysis for such problems.
The Johnson-Lindenstrauss Transform: An Empirical Study
The Johnson-Lindenstrauss Lemma states that a set of n points may be embedded in a space of dimension O(log n / ε^2) while preserving all pairwise distances within a factor of (1 + ε) with high probability. It has inspired a number of proofs that extend the result, simplify it, and improve the efficiency of computing the resulting embedding. The lemma is a critical tool in the realm of dimensionality reduction and high dimensional approximate computational geometry. It is also employed for data mining in domains that analyze intrinsically high dimensional objects such as images and text. However, while algorithms performing the dimensionality reduction have become increasingly sophisticated, there is little understanding of the behavior of these embeddings in practice. In this paper, we present the first comprehensive study of the empirical behavior of algorithms for dimensionality reduction based on the JL Lemma. Our study answers a number of important questions about the quality of the embeddings and the performance of algorithms used to compute them. Among our key results: (i) Determining a likely range for the big-Oh constant in practice for the dimension of the target space, and demonstrating the accuracy of the predicted bounds. (ii) Finding 'best in class' algorithms over wide ranges of data size and source dimensionality, and showing that these depend heavily on parameters of the data as well as its sparsity. (iii) Developing the best implementation for each method, making use of non-standard optimized code for key subroutines. (iv) Identifying critical computational bottlenecks that can spur further theoretical study of efficient algorithms.
BERTINORO WORKSHOP PARTICIPANTS:
, 2011
ABSTRACT. This document contains a list of open problems and research directions that have been suggested
Sketching and Streaming High-Dimensional Vectors
, 2011
A sketch of a dataset is a small-space data structure supporting some prespecified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-called streaming algorithm. Sketching and streaming have found numerous applications in network traffic monitoring, data mining, trend detection, sensor networks, and databases. In this thesis, I describe several new contributions in the area of sketching and streaming algorithms.
• The first space-optimal streaming algorithm for the distinct elements problem. Our algorithm also achieves O(1) update and reporting times.
• A streaming algorithm for Hamming norm estimation in the turnstile model which achieves the best known space complexity.
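To make the distinct elements problem mentioned above concrete, here is a minimal one-pass sketch using the classical k-minimum-values estimator. This is a simpler, well-known technique swapped in for illustration; it is not the space-optimal algorithm of the thesis and does not achieve its O(1) update time. The function name, the choice of SHA-256 as the hash, and k = 256 are assumptions.

```python
import hashlib
import heapq

def kmv_distinct_estimate(stream, k=256):
    """Estimate the number of distinct items with a k-minimum-values sketch."""
    heap = []        # negated values: a max-heap of the k smallest hashes seen
    members = set()  # hash values currently in the heap, to skip duplicates
    for item in stream:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64   # pseudo-uniform in [0, 1)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))     # fewer than k distinct hashes: count is exact
    return (k - 1) / -heap[0]       # -heap[0] is the k-th smallest hash value

# Hypothetical stream: 5000 distinct items, each appearing twice.
stream = list(range(5000)) * 2
estimate = kmv_distinct_estimate(stream)
```

The sketch stores only k hash values regardless of the stream length, which is the sense in which its space is sublinear in the data; the relative error shrinks roughly like 1/sqrt(k).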

