Results 1–10 of 22
Relative Errors for Deterministic Low-Rank Matrix Approximations
In SODA, 2014
Abstract

Cited by 14 (2 self)
We consider processing an n × d matrix A in a stream with row-wise updates according to a recent algorithm called Frequent Directions (Liberty, KDD 2013). This algorithm maintains an ℓ × d matrix Q deterministically, processing each row in O(dℓ²) time; the processing time can be decreased to O(dℓ) with a slight modification in the algorithm and a constant increase in space. Then for any unit vector x, the matrix Q satisfies 0 ≤ ‖Ax‖² − ‖Qx‖² ≤ ‖A‖²_F / ℓ. We show that if one sets ℓ = k + k/ε and returns Q_k, a k × d matrix that is simply the top k rows of Q, then we achieve the following properties: ...
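The maintenance step described above can be pictured with a minimal NumPy sketch of the Frequent Directions idea (the simple per-row-SVD variant, assuming ℓ ≤ d; this is an illustration, not the authors' optimized code):

```python
import numpy as np

def frequent_directions(A, ell):
    """Deterministic streaming sketch: returns an ell x d matrix B with
    B^T B close to A^T A. Assumes ell <= d."""
    n, d = A.shape
    B = np.zeros((ell, d))
    for a in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # B is full: shrink all squared singular values by the
            # smallest one, which frees at least one row of B.
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = a  # insert the new stream row
    return B
```

On any input this sketch satisfies the deterministic guarantee quoted above: for every unit vector x, ‖Ax‖² − ‖Bx‖² lies between 0 and ‖A‖²_F / ℓ.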
Composable Coresets for Diversity and Coverage Maximization (Extended Abstract)
2014
Abstract

Cited by 5 (0 self)
In this paper we consider efficient construction of “composable coresets” for basic diversity and coverage maximization problems. A coreset for a point set in a metric space is a subset of the point set with the property that an approximate solution to the whole point set can be obtained given the coreset alone. A composable coreset has the property that, for a collection of sets, the approximate solution to the union of the sets in the collection can be obtained given the union of the composable coresets for the point sets in the collection. Using composable coresets one can obtain efficient solutions to a wide variety of massive data processing applications, including nearest neighbor search, streaming algorithms, and MapReduce computation. Our main results are algorithms for constructing composable coresets ...
Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans
Abstract

Cited by 4 (0 self)
In distributed data-parallel computing, a user program is compiled into an execution plan graph (EPG), typically a directed acyclic graph. This EPG is the core data structure used by modern distributed execution engines for task distribution, job management, and fault tolerance. Once submitted for execution, the EPG remains largely unchanged at runtime except for some limited modifications. This makes it difficult to employ dynamic optimization techniques that could substantially improve the distributed execution based on runtime information. This paper presents Optimus, a framework for dynamically rewriting an EPG at runtime. Optimus extends dynamic rewrite mechanisms present in systems such as Dryad and CIEL by integrating rewrite policy with a high-level ...
Aggregation and Degradation in JetStream: Streaming analytics in the wide area
In NSDI, 2014
Abstract

Cited by 4 (0 self)
We present JetStream, a system that allows real-time analysis of large, widely-distributed, changing data sets. Traditional approaches to distributed analytics require users to specify in advance which data is to be backhauled to a central location for analysis. This is a poor match for domains where available bandwidth is scarce and it is infeasible to collect all potentially useful data. JetStream addresses bandwidth limits in two ways, both of which are explicit in the programming model. The system incorporates structured storage in the form of OLAP data cubes, so data can be stored for analysis near where it is generated. Using cubes, queries can aggregate data in ways and locations of their choosing. The system also includes adaptive filtering and other transformations that adjust data quality to match available bandwidth. Many bandwidth-saving transformations are possible; we discuss which are appropriate for which data and how they can best be combined. We implemented a range of analytic queries on web request logs and image data. Queries could be expressed in a few lines of code. Using structured storage on source nodes conserved network bandwidth by allowing data to be collected only when needed to fulfill queries. Our adaptive control mechanisms are responsive enough to keep end-to-end latency within a few seconds, even when available bandwidth drops by a factor of two, and are flexible enough to express practical policies.
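The cube-plus-degradation idea can be sketched in miniature (hypothetical keys and bucket sizes, not JetStream's actual API): counts are kept at the source keyed by (timestamp, url), and under a tight bandwidth budget the source rolls timestamps up to a coarser bucket before shipping, trading resolution for fewer cells.

```python
from collections import Counter

def roll_up(cube, bucket_secs):
    # Degrade time resolution: merge cells whose timestamps fall in the
    # same bucket_secs-wide window. Counts are preserved, cells shrink.
    coarse = Counter()
    for (ts, url), count in cube.items():
        coarse[(ts - ts % bucket_secs, url)] += count
    return coarse

# Toy source-side cube of web-request counts.
cube = Counter()
for ts, url in [(1, "/a"), (2, "/a"), (61, "/a"), (62, "/b")]:
    cube[(ts, url)] += 1

fine = roll_up(cube, 1)     # full resolution: 4 cells to ship
coarse = roll_up(cube, 60)  # degraded: 3 cells, same total count
```

The roll-up never loses counts, only time granularity, which is why aggregation composes cleanly with adaptive degradation.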
Beating the Direct Sum Theorem in Communication Complexity with Implications for Sketching
Abstract

Cited by 3 (0 self)
A direct sum theorem for two parties and a function f states that the communication cost of solving k copies of f simultaneously with error probability 1/3 is at least k · R_{1/3}(f), where R_{1/3}(f) is the communication required to solve a single copy of f with error probability 1/3. We improve this for a natural family of functions f, showing that the one-way communication required to solve k copies of f simultaneously with probability 2/3 is Ω(k · R_{1/k}(f)). Since R_{1/k}(f) may be as large as Ω(R_{1/3}(f) · log k), we asymptotically beat the direct sum bound for such functions, showing that the trivial upper bound of solving each of the k copies of f with probability 1 − O(1/k) and taking a union bound is optimal! In order to achieve this, our direct sum involves a novel measure of information cost which allows a protocol to abort with constant probability, and otherwise must be correct with very high probability. Moreover, for the functions considered, we show strong lower bounds on the communication cost of protocols with these relaxed guarantees; indeed, our lower bounds match those for protocols that are not allowed to abort. In the distributed and streaming models, where one wants to be correct not only on a single query, but simultaneously on a sequence of n queries, we obtain optimal lower bounds on the communication or space complexity. Lower bounds obtained from our direct sum result show that a number of techniques in the sketching literature are optimal, including the following:
• (JL transform) Lower bound of Ω((1/ε²) log(n/δ)) on the dimension of (oblivious) Johnson-Lindenstrauss transforms.
• (ℓp-estimation) Lower bound of Ω(nε⁻² log(n/δ) (log d + log M)) on the size of encodings of n vectors in [±M]^d that allow ℓ1- or ℓ2-estimation.
• (Matrix sketching) Lower bound of Ω((1/ε²) log(n/δ)) on the dimension of a matrix sketch S satisfying the entrywise guarantee |(AS S^T B)_{i,j} − (AB)_{i,j}| ≤ ε‖A_i‖₂‖B^j‖₂.
• (Database joins) Lower bound of Ω(n (1/ε²) log(n/δ) log M) for sketching frequency vectors of n tables in a database, each with M records, in order to allow join size estimation.
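The JL bullet says Ω(ε⁻² log(n/δ)) rows are necessary; the matching upper bound is the classical dense Gaussian construction, sketched here with hypothetical parameters (names like `jl_transform` are ours, not the paper's):

```python
import numpy as np

def jl_transform(d, k, rng):
    # Oblivious Gaussian JL map: a k x d matrix with N(0, 1/k) entries,
    # drawn independently of the data.
    return rng.standard_normal((k, d)) / np.sqrt(k)

rng = np.random.default_rng(0)
d, n, eps = 500, 100, 0.5
k = int(8 / eps**2 * np.log(n))  # O(eps^-2 log n) rows suffice w.h.p.
S = jl_transform(d, k, rng)
X = rng.standard_normal((n, d))  # n vectors whose norms we preserve
ratios = np.linalg.norm(X @ S.T, axis=1) / np.linalg.norm(X, axis=1)
```

With high probability every ratio lies in (1 − ε, 1 + ε); the lower bound above says the k used here cannot be improved beyond constants for oblivious maps.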
Homomorphic Fingerprints under Misalignments: Sketching Edit and Shift Distances
2013
Abstract

Cited by 3 (1 self)
Fingerprinting is a widely-used technique for efficiently verifying that two files are identical. More generally, linear sketching is a form of lossy compression (based on random projections) that also enables the “dissimilarity” of non-identical files to be estimated. Many sketches have been proposed for dissimilarity measures that decompose coordinate-wise, such as the Hamming distance between alphanumeric strings or the Euclidean distance between vectors. However, virtually nothing is known about sketches that would accommodate alignment errors. With such errors, Hamming or Euclidean distances are rendered useless: a small misalignment may result in a file that looks very dissimilar to the original file according to such measures. In this paper, we present the first linear sketch that is robust to a small number of alignment errors. Specifically, the sketch can be used to determine whether two files are within a small Hamming distance of being a cyclic shift of each other. Furthermore, the sketch is homomorphic with respect to rotations: it is possible to construct the sketch of a cyclic shift of a file given only the sketch of the original file. The relevant dissimilarity measure, known as the shift distance, arises in the context of embedding edit distance, and our result addresses an open problem [26, Question 13] with a rather surprising outcome. Our sketch projects a length-n file into D(n) · polylog n dimensions, where D(n) ≪ n is the number of divisors of n. The striking fact is that this is near-optimal, i.e., the D(n) dependence is inherent to a problem that is ostensibly about lossy compression. In contrast, we then show that any sketch for estimating the edit distance between two files, even when small, requires sketches whose size is nearly linear in n. This lower bound addresses a long-standing open problem on the low-distortion ...
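The rotation-homomorphism property has a simple algebraic prototype (a toy illustration under our own assumptions, not the paper's Hamming-robust construction): view the file as coefficients of a polynomial and evaluate it at an n-th root of unity mod p. A cyclic shift by s then multiplies the fingerprint by rˢ mod p, so the sketch of a shifted file is computable from the sketch alone.

```python
def fingerprint(a, r, p):
    # Evaluate the file's polynomial sum(a[i] * r^i) mod p.
    return sum(v * pow(r, i, p) for i, v in enumerate(a)) % p

n, p = 8, 17                   # n divides p - 1, so an n-th root exists
r = pow(3, (p - 1) // n, p)    # 3 generates Z_17*, so r has order n

a = [5, 1, 4, 1, 5, 9, 2, 6]   # the "file"
s = 3
shifted = a[-s:] + a[:-s]      # cyclic shift by s positions
```

Because rⁿ ≡ 1 (mod p), the wraparound indices line up and fingerprint(shifted) = rˢ · fingerprint(a) mod p, with no access to the original file needed.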
Output-sensitive skyline algorithms in external memory
2013
Abstract

Cited by 2 (1 self)
This paper presents new results in external memory for finding the skyline (a.k.a. maxima) of N points in d-dimensional space. The state of the art uses O((N/B) log_{M/B}^{d−2}(N/B)) I/Os for fixed d ≥ 3, and O((N/B) log_{M/B}(N/B)) I/Os for d = 2, where M and B are the sizes (in words) of memory and a disk block, respectively. We give algorithms whose running time depends on the number K of points in the skyline. Specifically, we achieve O((N/B) log_{M/B}^{d−2}(K/B)) expected cost for fixed d ≥ 3, and O((N/B) log_{M/B}(K/B)) worst-case cost for d = 2. As a side product, we solve two problems, both of independent interest. The first one, the M-skyline problem, aims at reporting M arbitrary skyline points, or the entire skyline if its size is at most M. We settle this problem in O(N/B) expected time in any fixed dimensionality d. The second one, the M-pivot problem, is more fundamental: given a set S of N elements drawn from an ordered domain, it outputs M evenly scattered elements (called pivots) from S, namely, S has asymptotically the same number of elements between each pair of consecutive pivots. We give a deterministic algorithm for solving the problem in O(N/B) I/Os.
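For intuition about the problem itself, the d = 2 case (in memory, ignoring the I/O model the paper works in) reduces to a sort plus a linear scan; a minimal sketch:

```python
def skyline_2d(points):
    # A point is in the skyline (maxima) if no other point is >= in both
    # coordinates and > in at least one. Sort by x descending (ties: y
    # descending); keep points whose y strictly beats the best y so far.
    result, best_y = [], float("-inf")
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result
```

The scan visits each point once, so the cost is dominated by the sort, mirroring the O((N/B) log_{M/B}(N/B)) external-memory bound for d = 2.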
Diversity Maximization via Composable Coresets
Abstract

Cited by 1 (0 self)
Given a set S of points in a metric space, and a diversity measure div(·) defined over subsets of S, the goal of the diversity maximization problem is to find a subset T ⊆ S of size k that maximizes div(T). Motivated by applications in massive data processing, we consider the composable coreset framework, in which a coreset for a diversity measure is called α-composable if, for any collection of sets and their corresponding coresets, the maximum diversity of the union of the coresets α-approximates the maximum diversity of the union of the sets. We present composable coresets with near-optimal approximation factors for several notions of diversity, including remote-clique, remote-cycle, and remote-tree. We also prove a general lower bound on the approximation factor of composable coresets for a large class of diversity maximization problems.
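One standard coreset in this setting, shown here as an assumption-laden sketch rather than the paper's exact construction, is the Gonzalez farthest-point traversal: each machine keeps k greedily-spread points, and the diversity objective (here remote-clique, the sum of pairwise distances) is re-optimized over the union of the keepers.

```python
import math

def farthest_point_coreset(points, k):
    # Gonzalez-style traversal: repeatedly keep the point farthest from
    # those kept so far. The k survivors are this set's coreset.
    core = [points[0]]
    while len(core) < k and len(core) < len(points):
        core.append(max(points,
                        key=lambda p: min(math.dist(p, c) for c in core)))
    return core

def remote_clique(points):
    # remote-clique diversity: sum of all pairwise distances
    return sum(math.dist(p, q)
               for i, p in enumerate(points) for q in points[i + 1:])
```

Composability means each machine can run `farthest_point_coreset` locally, ship only k points, and the maximizer over the union of coresets α-approximates the maximizer over all the data.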
HipMer: An Extreme-scale De Novo Genome Assembler
Abstract
De novo whole genome assembly reconstructs genomic sequences from short, overlapping, and potentially erroneous DNA segments, and is one of the most important computations in modern genomics. This work presents HipMer, the first high-quality end-to-end de novo assembler designed for extreme-scale analysis, via efficient parallelization of the Meraculous code. First, we significantly improve the scalability of parallel k-mer analysis for complex repetitive genomes that exhibit skewed frequency distributions. Next, we optimize the traversal of the de Bruijn graph of k-mers by employing a novel communication-avoiding parallel algorithm in a variety of use-case scenarios. Finally, we parallelize the Meraculous scaffolding modules by leveraging the one-sided communication capabilities of Unified Parallel C while effectively mitigating load imbalance. Large-scale results on a Cray XC30 using grand-challenge genomes demonstrate efficient performance and scalability on thousands of cores. Overall, our pipeline accelerates Meraculous performance by orders of magnitude, enabling the complete assembly of the human genome in just 8.4 minutes on 15K cores of the Cray XC30, and creating unprecedented capability for extreme-scale genomic analysis.
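The k-mer analysis stage can be pictured in miniature (toy reads and parameters of our choosing, with none of HipMer's parallelism): build a histogram of length-k substrings, which then become edges of the de Bruijn graph between (k−1)-mer nodes.

```python
from collections import Counter

def count_kmers(reads, k):
    # Histogram of all length-k substrings across the reads; in HipMer
    # this step is parallelized with skew-aware partitioning.
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn_edges(kmers):
    # Each k-mer is an edge from its (k-1)-mer prefix to its suffix.
    return [(m[:-1], m[1:]) for m in kmers]

counts = count_kmers(["ACGTACG", "CGTACGT"], 3)
```

Assembly then amounts to traversing paths through these edges, which is the graph-traversal stage the abstract's communication-avoiding algorithm accelerates.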