An Almost Optimal Unrestricted Fast Johnson-Lindenstrauss Transform (2011)

by Nir Ailon, Edo Liberty
Venue: SODA

Results 1 - 10 of 13

A Derandomized Sparse Johnson-Lindenstrauss Transform

by Daniel M. Kane, et al.
Cited by 5 (1 self)
Recent work of [Dasgupta-Kumar-Sarlós, STOC 2010] gave a sparse Johnson-Lindenstrauss transform and left as a main open question whether their construction could be efficiently derandomized. We answer their question affirmatively by giving an alternative proof of their result requiring only bounded independence hash functions. Furthermore, the sparsity bound obtained in our proof is improved. The main ingredient in our proof is a spectral moment bound for quadratic forms that was recently used in [Diakonikolas-Kane-Nelson, FOCS 2010].

Acceleration of Randomized Kaczmarz Method via the Johnson-Lindenstrauss Lemma

by Yonina C. Eldar, Deanna Needell, 2010
Cited by 2 (0 self)
The Kaczmarz method is an algorithm for finding the solution to an overdetermined system of linear equations Ax = b by iteratively projecting onto the solution spaces. The randomized version put forth by Strohmer and Vershynin yields provably exponential convergence in expectation, which for highly overdetermined systems even outperforms the conjugate gradient method. In this article we present a modified version of the randomized Kaczmarz method which at each iteration selects the optimal projection from a randomly chosen set, which in most cases significantly improves the convergence rate. We utilize a Johnson-Lindenstrauss dimension reduction technique to keep the runtime on the same order as the original randomized version, adding only extra preprocessing time. We present a series of empirical studies which demonstrate the remarkable acceleration in convergence to the solution using this modified approach.
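As a rough illustration of the baseline this abstract builds on, the following is a minimal sketch of the original randomized Kaczmarz iteration (rows sampled with probability proportional to their squared norm, as in Strohmer and Vershynin). It does not implement the modified method of the paper, which selects the best projection from a random candidate set with the help of a Johnson-Lindenstrauss projection. The function name, the iteration count, and the small test system are assumptions made for this example.

```python
import random


def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximate a solution of the consistent linear system Ax = b.

    Each step samples row i with probability proportional to ||a_i||^2
    and projects the current iterate onto the hyperplane <a_i, x> = b_i.
    """
    rng = random.Random(seed)
    n = len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * n
    for _ in range(iters):
        # Sample a row index, weighted by squared row norm.
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        # Signed distance to the hyperplane, scaled by the squared row norm.
        residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
        step = residual / row_norms_sq[i]
        # Orthogonal projection onto the hyperplane of equation i.
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


# Hypothetical overdetermined but consistent system (4 equations, 2 unknowns).
A = [[1.0, 2.0], [3.0, 1.0], [1.0, -1.0], [2.0, 2.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b)
print(x_hat)
```

Each iteration touches a single row, so the per-step cost is linear in the number of unknowns; that per-step cost is what the abstract says the modified method keeps on the same order.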

Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

by T. S. Jayram, David Woodruff
Cited by 2 (0 self)
The Johnson-Lindenstrauss transform is a dimensionality reduction technique with a wide range of applications to theoretical computer science. It is specified by a distribution over projection matrices from $\mathbb{R}^d \to \mathbb{R}^k$ where $k \ll d$ and states that $k = O(\varepsilon^{-2} \log 1/\delta)$ dimensions suffice to approximate the norm of any fixed vector in $\mathbb{R}^d$ to within a factor of $1 \pm \varepsilon$ with probability at least $1 - \delta$. In this paper we show that this bound on $k$ is optimal up to a constant factor, improving upon a previous $\Omega((\varepsilon^{-2} \log 1/\delta) / \log(1/\varepsilon))$ dimension bound of Alon. Our techniques are based on lower bounding the information cost of a novel one-way communication game and yield the first space lower bounds in a data stream model that depend on the error probability $\delta$. For many streaming problems, the most naïve way of achieving error probability $\delta$ is to first achieve constant probability, then take the median of $O(\log 1/\delta)$ independent repetitions. Our techniques show that for a wide range of problems this is in fact optimal! As an example, we show that estimating the $\ell_p$-distance for any $p \in [0, 2]$ requires $\Omega(\varepsilon^{-2} \log n \log 1/\delta)$ space, even for vectors in $\{0, 1\}^n$. This is optimal in all parameters and closes a long line of work on this problem. We also show the number of distinct elements requires $\Omega(\varepsilon^{-2} \log 1/\delta + \log n)$ space, which is optimal if $\varepsilon^{-2} = \Omega(\log n)$. We also improve previous lower bounds for entropy in the strict turnstile and general turnstile models by a multiplicative factor of $\Omega(\log 1/\delta)$. Finally, we give an application to one-way communication complexity under product distributions, showing that unlike in the case of constant $\delta$, the VC-dimension does not characterize the complexity when $\delta = o(1)$.
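To make the upper bound in this abstract concrete, here is a minimal sketch of one standard Johnson-Lindenstrauss map, a dense matrix of independent Gaussian entries, applied to a single fixed vector. The paper itself proves a lower bound and does not prescribe this construction. The constant `c` in `target_dimension` is an illustrative assumption, not a value taken from the paper, and the function names are hypothetical.

```python
import math
import random


def target_dimension(eps, delta, c=8.0):
    """Return k = ceil(c * eps^-2 * ln(1/delta)); c is an assumed constant."""
    return math.ceil(c * math.log(1.0 / delta) / (eps * eps))


def jl_project(x, k, seed=0):
    """Map x in R^d to R^k with a k x d matrix of i.i.d. N(0, 1/k) entries.

    The matrix is generated row by row and never stored.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [scale * sum(rng.gauss(0.0, 1.0) * xi for xi in x) for _ in range(k)]


eps, delta = 0.25, 0.01
k = target_dimension(eps, delta)
x = [math.sin(i) for i in range(200)]   # an arbitrary fixed vector in R^200
y = jl_project(x, k)
# Ratio of the projected norm to the original norm; expected to be near 1.
ratio = math.sqrt(sum(v * v for v in y) / sum(v * v for v in x))
print(k, ratio)
```

Note that `k` depends only on the accuracy and failure probability, not on the input dimension; the example vector is small, so here `k` exceeds the original dimension and the run only illustrates norm preservation, not an actual reduction.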

Improved Analysis of the Subsampled Randomized Hadamard Transform

by Joel A. Tropp
Cited by 1 (0 self)
This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers, for the first time, optimal constants in the estimate on the number of dimensions required for the embedding.
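The abstract does not spell out the construction, so the following is a minimal sketch under a common formulation of the subsampled randomized Hadamard transform: flip the sign of each coordinate at random (D), apply the orthonormal Walsh-Hadamard matrix (H), keep k coordinates chosen uniformly without replacement (R), and rescale by sqrt(n/k). The function names and the sample input are assumptions made for this example.

```python
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for j in range(start, start + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b   # butterfly step
        h *= 2
    return v


def srht(x, k, seed=0):
    """Apply sqrt(n/k) * R * H * D to x, with H the orthonormal Hadamard matrix."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of 2")
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # D: random sign flips
    hx = fwht([s * xi for s, xi in zip(signs, x)])        # H, still unnormalized
    rows = rng.sample(range(n), k)                        # R: k distinct coordinates
    scale = 1.0 / math.sqrt(k)                            # sqrt(n/k) * (1/sqrt(n))
    return [scale * hx[i] for i in rows]


x = [float(i % 7) - 3.0 for i in range(64)]
y_full = srht(x, 64)    # keeping every coordinate preserves the norm
y_small = srht(x, 16)   # a 16-dimensional sketch of x
print(sum(v * v for v in x), sum(v * v for v in y_full), sum(v * v for v in y_small))
```

The transform costs O(n log n) per vector because of the fast Hadamard step, compared with O(nk) for a dense random matrix.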

Low-Rank Matrix Recovery via Iteratively Reweighted Least Squares Minimization

by Massimo Fornasier, Holger Rauhut, Rachel Ward
We present and analyze an efficient implementation of an iteratively reweighted least squares algorithm for recovering a matrix from a small number of linear measurements. The algorithm is designed for the simultaneous promotion of both a minimal nuclear norm and an approximately low-rank solution. Under the assumption that the linear measurements fulfill a suitable generalization of the Null Space Property known in the context of compressed sensing, the algorithm is guaranteed to iteratively recover any matrix with an error of the order of the best rank-k approximation. In certain relevant cases, for instance for the matrix completion problem, our version of this algorithm can take advantage of the Woodbury matrix identity, which allows us to expedite the solution of the least squares problems required at each iteration. We present numerical experiments which confirm the robustness of the algorithm for the solution of matrix completion problems, and demonstrate its competitiveness with respect to other techniques proposed recently in the literature. AMS subject classification: 65J22, 65K10, 52A41, 49M30. Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion.
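For reference, the Woodbury matrix identity named in this abstract reads as follows in its standard form. The symbols are generic and are not taken from the paper: $A$ is an invertible $n \times n$ matrix, $C$ an invertible $k \times k$ matrix, $U$ is $n \times k$, $V$ is $k \times n$, and $C^{-1} + V A^{-1} U$ is assumed invertible. How the paper applies it to the least squares step is not stated in the abstract.

```latex
\[
  (A + U C V)^{-1}
  = A^{-1} - A^{-1} U \left( C^{-1} + V A^{-1} U \right)^{-1} V A^{-1}
\]
```

When $k \ll n$ and $A^{-1}$ is cheap to apply, the right-hand side replaces an $n \times n$ inversion with a $k \times k$ one.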

A Randomized Approximate Nearest Neighbors Algorithm

by Peter W. Jones, Andrei Osipov, Vladimir Rokhlin, 2010
We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given $N$ points $\{x_j\}$ in $\mathbb{R}^d$, the algorithm attempts to find $k$ nearest neighbors for each $x_j$, where $k$ is a user-specified integer parameter. The algorithm is iterative, and its CPU time requirements are proportional to $T \cdot N \cdot (d \cdot (\log d) + k \cdot (\log k) \cdot (\log N)) + N \cdot k^2 \cdot (d + \log k)$, with $T$ the number of iterations performed. The memory requirements of the procedure are of the order $N \cdot (d + k)$. A byproduct of the scheme is a data structure, permitting a rapid search for the $k$ nearest neighbors among $\{x_j\}$ for an arbitrary point $x \in \mathbb{R}^d$. The cost of each such query is proportional to $T \cdot (d \cdot (\log d) + \log(N/k) + k^2 \cdot (d + \log k))$, and the memory requirements for the requisite data structure are of the order $N \cdot (d + k) + T \cdot (d + N \cdot k)$. The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions

Compressed Sensing with Coherent and Redundant Dictionaries

by Emmanuel J. C, Yonina C. Eldar, Deanna Needell, Paige R, 2010
This article presents novel results concerning the recovery of signals from undersampled data in the common situation where such signals are not sparse in an orthonormal basis or incoherent dictionary, but in a truly redundant dictionary. This work thus bridges a gap in the literature and shows not only that compressed sensing is viable in this context, but also that accurate recovery is possible via an $\ell_1$-analysis optimization problem. We introduce a condition on the measurement/sensing matrix, which is a natural generalization of the now well-known restricted isometry property, and which guarantees accurate recovery of signals that are nearly sparse in (possibly) highly overcomplete and coherent dictionaries. This condition imposes no incoherence restriction on the dictionary and our results may be the first of this kind. We discuss practical examples and the implications of our results on those applications, and complement our study by demonstrating the potential of $\ell_1$-analysis for such problems.

The Johnson-Lindenstrauss Transform: An Empirical Study

by Suresh Venkatasubramanian, Qiushi Wang
The Johnson-Lindenstrauss Lemma states that a set of $n$ points may be embedded in a space of dimension $O(\log n / \varepsilon^2)$ while preserving all pairwise distances within a factor of $(1 + \varepsilon)$ with high probability. It has inspired a number of proofs that extend the result, simplify it, and improve the efficiency of computing the resulting embedding. The lemma is a critical tool in the realm of dimensionality reduction and high dimensional approximate computational geometry. It is also employed for data mining in domains that analyze intrinsically high dimensional objects such as images and text. However, while algorithms performing the dimensionality reduction have become increasingly sophisticated, there is little understanding of the behavior of these embeddings in practice. In this paper, we present the first comprehensive study of the empirical behavior of algorithms for dimensionality reduction based on the JL Lemma. Our study answers a number of important questions about the quality of the embeddings and the performance of algorithms used to compute them. Among our key results: (i) Determining a likely range for the big-Oh constant in practice for the dimension of the target space, and demonstrating the accuracy of the predicted bounds. (ii) Finding 'best in class' algorithms over wide ranges of data size and source dimensionality, and showing that these depend heavily on parameters of the data as well as its sparsity. (iii) Developing the best implementation for each method, making use of non-standard optimized code for key subroutines. (iv) Identifying critical computational bottlenecks that can spur further theoretical study of efficient algorithms.

BERTINORO WORKSHOP PARTICIPANTS:

by Lists Compiled Piotr Indyk, Andrew McGregor, Ilan Newman, Krzysztof Onak, Christian Sohler, Gilad Tsur, Paul Valiant, Roger Wattenhofer, David Woodruff, Ning Xie, Yuichi Yoshida, Sudipto Guha, Piotr Indyk, T. S. Jayram, Christiane Lammersen, Michael Mahoney, 2011
This document contains a list of open problems and research directions that have been suggested

Sketching and Streaming High-Dimensional Vectors

by Jelani Nelson, Erik D. Demaine, 2011
A sketch of a dataset is a small-space data structure supporting some prespecified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-called streaming algorithm. Sketching and streaming have found numerous applications in network traffic monitoring, data mining, trend detection, sensor networks, and databases. In this thesis, I describe several new contributions in the area of sketching and streaming algorithms.
  • The first space-optimal streaming algorithm for the distinct elements problem. Our algorithm also achieves O(1) update and reporting times.
  • A streaming algorithm for Hamming norm estimation in the turnstile model which achieves the best known space complexity.