Results 1–6 of 6
Toward a unified theory of sparse dimensionality reduction in Euclidean space
"... Let Φ ∈ Rm×n be a sparse JohnsonLindenstrauss transform [KN14] with s nonzeroes per column. For a subset T of the unit sphere, ε ∈ (0, 1/2) given, we study settings for m, s required to ensure E Φ sup x∈T ∣∣‖Φx‖22 − 1∣ ∣ < ε, i.e. so that Φ preserves the norm of every x ∈ T simultaneously and ..."
Abstract

Cited by 7 (2 self)
Let Φ ∈ R^{m×n} be a sparse Johnson–Lindenstrauss transform [KN14] with s nonzeroes per column. For a given subset T of the unit sphere and ε ∈ (0, 1/2), we study the settings of m and s required to ensure E_Φ sup_{x∈T} |‖Φx‖₂² − 1| < ε, i.e. so that Φ preserves the norm of every x ∈ T simultaneously and multiplicatively up to 1 + ε. We introduce a new complexity parameter, which depends on the geometry of T, and show that it suffices to choose s and m such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense Φ having i.i.d. Gaussian entries. We qualitatively unify several results related to the Johnson–Lindenstrauss lemma, subspace embeddings, and Fourier-based restricted isometries. Our work also ...
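For intuition, the sketch below builds one standard sparse JL distribution in numpy (s nonzero entries of ±1/√s per column, in the spirit of the Kane–Nelson construction cited as [KN14]). Function names and parameters are illustrative, and the empirical check probes only a single random vector rather than a whole set T.

```python
import numpy as np

def sparse_jl(m, n, s, seed=None):
    """Build a sparse JL matrix with exactly s nonzeros per column,
    each nonzero equal to +/- 1/sqrt(s). Every column then has unit
    Euclidean norm by construction."""
    rng = np.random.default_rng(seed)
    phi = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)   # s distinct rows
        signs = rng.choice([-1.0, 1.0], size=s)
        phi[rows, j] = signs / np.sqrt(s)
    return phi

# Empirically check norm preservation on one random unit vector.
phi = sparse_jl(m=256, n=1000, s=8, seed=0)
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
x /= np.linalg.norm(x)
distortion = abs(np.linalg.norm(phi @ x) ** 2 - 1.0)
```

Because only s entries per column are nonzero, computing Φx costs O(s · nnz(x)) rather than O(m · nnz(x)), which is the point of sparse embeddings.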
Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition
In Advances in Neural Information Processing Systems 28 (NIPS), 2015
"... Abstract Since being analyzed by Rokhlin, Szlam, and Tygert [1] and popularized by Halko, Martinsson, and Tropp [2], randomized Simultaneous Power Iteration has become the method of choice for approximate singular value decomposition. It is more accurate than simpler sketching algorithms, yet stil ..."
Abstract

Cited by 2 (1 self)
Since being analyzed by Rokhlin, Szlam, and Tygert [1] and popularized by Halko, Martinsson, and Tropp [2], randomized Simultaneous Power Iteration has become the method of choice for approximate singular value decomposition. It is more accurate than simpler sketching algorithms, yet still converges quickly for any matrix, independently of singular value gaps. After Õ(1/ε) iterations, it gives a low-rank approximation within (1 + ε) of optimal for spectral norm error. We give the first provable runtime improvement on Simultaneous Iteration: a simple randomized block Krylov method, closely related to the classic Block Lanczos algorithm, gives the same guarantees in just Õ(1/√ε) iterations and performs substantially better experimentally. Despite their long history, our analysis is the first of a Krylov subspace method that does not depend on singular value gaps, which are unreliable in practice. Furthermore, while it is a simple accuracy benchmark, even (1 + ε) error for spectral norm low-rank approximation does not imply that an algorithm returns high-quality principal components, a major issue for data applications. We address this problem for the first time by showing that both Block Krylov Iteration and a minor modification of Simultaneous Iteration give nearly optimal PCA for any matrix. This result further justifies their strength over non-iterative sketching methods. Finally, we give insight beyond the worst case, justifying why both algorithms can run much faster in practice than predicted. We clarify how simple techniques can take advantage of common matrix properties to significantly improve runtime.
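A minimal numpy sketch of the block Krylov idea described above, assuming a dense input matrix: build the Krylov block [AG, (AAᵀ)AG, ..., (AAᵀ)ᑫAG] for a random Gaussian start block G, orthonormalize it, and extract the top-k factors by Rayleigh–Ritz. This illustrates the general pattern, not the paper's exact algorithm; a production version would re-orthonormalize between powers for numerical stability.

```python
import numpy as np

def block_krylov_svd(A, k, iters, seed=None):
    """Approximate rank-k SVD of A via a randomized block Krylov
    subspace followed by Rayleigh-Ritz on the projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((n, k))            # random start block
    block = A @ G
    blocks = [block]
    for _ in range(iters):                     # powers of A A^T
        block = A @ (A.T @ block)
        blocks.append(block)
    K, _ = np.linalg.qr(np.hstack(blocks))     # orthonormal Krylov basis
    # SVD of the small projected matrix K^T A, lifted back by K
    U_small, S, Vt = np.linalg.svd(K.T @ A, full_matrices=False)
    return (K @ U_small)[:, :k], S[:k], Vt[:k]
```

The expensive operations are the block matrix products; the final SVD is on a matrix with only k·(iters + 1) rows, which is what makes the method cheap relative to a full decomposition.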
Approximation and Streaming Algorithms for Projective Clustering via Random Projections
"... Abstract Let P be a set of n points in R d . In the projective clustering problem, given k, q and norm ρ ∈ [1, ∞], we have to compute a set F of k qdimensional flats such that represents the (Euclidean) distance of p to the closest flat in F. We let f q k (P, ρ) denote the minimal value and interp ..."
Abstract

Cited by 1 (0 self)
Let P be a set of n points in R^d. In the projective clustering problem, given k, q and a norm ρ ∈ [1, ∞], we have to compute a set F of k q-dimensional flats minimizing (Σ_{p∈P} d(p, F)^ρ)^{1/ρ}, where d(p, F) represents the (Euclidean) distance of p to the closest flat in F. We let f_k^q(P, ρ) denote the minimal value and interpret f_k^q(P, ∞) to be max_{r∈P} d(r, F). When ρ = 1, 2 and ∞ and q = 0, the problem corresponds to the k-median, k-means and the k-center clustering problems respectively. For every 0 < ε < 1, S ⊂ P and ρ ≥ 1, we show that the orthogonal projection of P onto a randomly chosen flat of dimension O(((q + 1)² log(1/ε)/ε³) log n) will ε-approximate f_1^q(S, ρ). This result combines the concepts of geometric coresets and subspace embeddings based on the Johnson–Lindenstrauss Lemma. As a consequence, an orthogonal projection of P to an O(((q + 1)² log((q + 1)/ε)/ε³) log n)-dimensional randomly chosen subspace ε-approximates projective clusterings for every k and ρ simultaneously. Note that the dimension of this subspace is independent of the number of clusters k. Using this dimension reduction result, we obtain new approximation and streaming algorithms for projective clustering problems. For example, given a stream of n points, we show how to compute an ε-approximate projective clustering for every k and ρ simultaneously using only O((n + d)((q + 1)² log((q + 1)/ε))/ε³ · log n) space. Compared to standard streaming algorithms with Ω(kd) space requirements, our approach is a significant improvement when the number of input points and their dimensions are of the same order of magnitude.
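The dimension reduction step can be illustrated with a toy sketch: orthogonally project points onto a random t-dimensional subspace, rescale by √(d/t) so squared distances are preserved in expectation, and compare the simplest cost (q = 0, k = 1, ρ = 2: sum of squared distances to the centroid) before and after. The fixed t used here is illustrative and far smaller than the bound in the abstract.

```python
import numpy as np

def one_means_cost(P):
    """Sum of squared distances of the rows of P to their centroid:
    the q = 0, k = 1, rho = 2 projective clustering cost."""
    c = P.mean(axis=0)
    return np.sum((P - c) ** 2)

def random_orthogonal_projection(P, t, seed=None):
    """Orthogonally project the rows of P onto a random t-dimensional
    subspace, rescaled by sqrt(d / t) so that squared distances are
    preserved in expectation (JL-style)."""
    rng = np.random.default_rng(seed)
    d = P.shape[1]
    Q, _ = np.linalg.qr(rng.standard_normal((d, t)))  # orthonormal basis
    return (P @ Q) * np.sqrt(d / t)
```

The same projected point set serves every k and ρ at once, which is the key property the abstract emphasizes.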
Dimensionality Reduction of Massive Sparse Datasets Using Coresets
"... Abstract In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principle Component Analysis (PCA) of any n × d matrix, using one pass over the s ..."
Abstract
In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the Principal Component Analysis (PCA) of any n × d matrix, using one pass over the stream of its rows. Our solution uses coresets: a scaled subset of the n rows that approximates their sum of squared distances to every k-dimensional affine subspace. An open theoretical problem has been to compute such a coreset whose size is independent of both n and d. An open practical problem has been to compute a non-trivial approximation to the PCA of very large but sparse databases, such as the Wikipedia document-term matrix, in a reasonable time. We answer both of these questions affirmatively. Our main technical result is a new framework for deterministic coreset constructions based on a reduction to the problem of counting items in a stream.
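For intuition only, here is a generic randomized row-sampling sketch with the coreset property the abstract describes (a scaled subset of rows approximating sums of squared distances to subspaces). It is emphatically not the paper's deterministic construction, which is based on a reduction to stream counting; squared-norm importance sampling is simply a well-known randomized baseline.

```python
import numpy as np

def norm_sampled_coreset(A, size, seed=None):
    """Sample rows of A with probability proportional to their squared
    norm and rescale, so that C^T C is an unbiased estimate of A^T A
    (and hence distances to subspaces are approximately preserved)."""
    rng = np.random.default_rng(seed)
    norms = np.sum(A ** 2, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[0], size=size, p=p)
    scale = 1.0 / np.sqrt(size * p[idx])       # importance-sampling rescale
    return A[idx] * scale[:, None]

def cost_to_subspace(A, V):
    """Sum of squared distances of the rows of A to the column span of V,
    where V has orthonormal columns (Pythagoras: ||A||_F^2 - ||AV||_F^2)."""
    return np.sum(A ** 2) - np.sum((A @ V) ** 2)
```

A nice side effect of norm-proportional weights is that the coreset preserves the total squared norm ‖A‖_F² exactly; only the projected term is estimated.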
Latent Fault Detection With Unbalanced Workloads
"... Big data means big datacenters, comprised of hundreds or thousands of machines. With so many machines, failures are commonplace. Failure detection is crucial: undetected failures may lead to data loss and outages. Recent health monitoring approaches use anomaly detection to forecast failures – anom ..."
Abstract
Big data means big datacenters, composed of hundreds or thousands of machines. With so many machines, failures are commonplace. Failure detection is crucial: undetected failures may lead to data loss and outages. Recent health monitoring approaches use anomaly detection to forecast failures: anomalous machines are considered to be at risk of future failures. Our previous work focused on detecting latent faults in large web services, which are often characterized by a scale-out architecture where load is dynamically balanced. We proposed a robust and unsupervised latent fault detector for such systems, with statistical bounds on the rate of false positives. That detector, however, is unsuitable for applications without dynamic load balancing, such as statically balanced key-value stores, Hadoop jobs, and supercomputer applications. We describe an improved latent fault detection method for unbalanced workloads. It retains the advantages of our previous methods: it is unsupervised, robust to changes, and statistically sound. Moreover, the statistical bounds for the new method scale better with the number of machines, dramatically reducing the number of measurements needed. Preliminary evaluation on supercomputer logs shows that the new method correctly predicts some failures, while our previous methods completely fail in this setting.
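As a toy illustration of the peer-comparison idea behind latent fault detection (an anomalous machine stands out among supposedly identical peers), one might score machines as below. This sketch carries none of the paper's statistical guarantees and does not address the unbalanced-workload setting that is the paper's contribution; all names are hypothetical.

```python
import numpy as np

def latent_fault_scores(X):
    """Score each machine (row of X, one column per metric) by the
    Euclidean distance between its standardized metric vector and the
    per-metric median across machines. Captures only the general idea
    of comparing a machine to its peers; the paper's detector adds
    rigorous bounds on the false positive rate."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12        # avoid division by zero
    Z = (X - mu) / sigma                 # standardize each metric
    med = np.median(Z, axis=0)           # robust per-metric center
    return np.linalg.norm(Z - med, axis=1)
```

Machines with the largest scores would be flagged as potentially harboring latent faults; in practice one would threshold the scores with a principled statistical test rather than by eye.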