Results 1 - 10
of
47
An Information Statistics Approach to Data Stream and Communication Complexity
, 2003
"... We present a new method for proving strong lower bounds in communication complexity. ..."
Abstract
-
Cited by 118 (6 self)
- Add to MetaCart
We present a new method for proving strong lower bounds in communication complexity.
A near-optimal algorithm for computing the entropy of a stream
- In ACM-SIAM Symposium on Discrete Algorithms
, 2007
"... We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε −2 log(δ −1) log m) words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of Ω(ε −2 ..."
Abstract
-
Cited by 36 (17 self)
- Add to MetaCart
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using O(ε −2 log(δ −1) log m) words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of Ω(ε −2 / log(ε −1)), meaning that our algorithm is near optimal in terms of its dependency on ε. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to kth order entropy requires close to linear space for all k ≥ 1, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph. 1
Simpler algorithm for estimating frequency moments of data streams
- PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHM
, 2006
"... The problem of estimating the kth frequency moment Fk over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms have been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodruff [11] have presented the first algorithm for e ..."
Abstract
-
Cited by 28 (2 self)
- Add to MetaCart
The problem of estimating the kth frequency moment Fk over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms have been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodruff [11] have presented the first algorithm for estimating Fk, for k > 2, using space Õ(n1-2/k), matching the space lower bound (up to poly-logarithmic factors) for this problem [1, 2, 3, 4, 13] (n is the number of distinct items occurring in the stream.) In this paper, we present a simpler 1-pass algorithm for estimating Fk.
Robust lower bounds for communication and stream computation
- in Proceedings of the 40th Annual ACM Symposium on Theory of Computing (British
, 2008
"... We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and ..."
Abstract
-
Cited by 16 (5 self)
- Add to MetaCart
We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models [32, 29]. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions. A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are “robust” to the ordering of the data. That is, we prove lower bounds for when the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permuations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems. Our results include the first random-partition communication lower bounds for problems including multi-party set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results [19, 7] for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation.
Estimating the sortedness of a data stream
- In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms
, 2007
"... The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one- ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
The distance to monotonicity of a sequence is the minimum number of edit operations required to transform the sequence into an increasing order; this measure is complementary to the length of the longest increasing subsequence (LIS). We address the question of estimating these quantities in the one-pass data stream model and present the first sub-linear space algorithms for both problems. We first present O ( √ n)-space deterministic algorithms that approximate the distance to monotonicity and the LIS to within a factor that is arbitrarily close to 1. We also show a lower bound of Ω(n) on the space required by any randomized algorithm to compute the LIS (or alternatively the distance from monotonicity) exactly, demonstrating that approximation is necessary for sub-linear space computation; this bound improves upon the existing lower bound of Ω ( √ n) [LNVZ06]. Our main result is a randomized algorithm that uses only O(log 2 n) space and approximates the distance to monotonicity to within a factor that is arbitrarily close to 4. In contrast, we believe that any significant reduction in the space complexity for approximating the length of the LIS is considerably hard. We conjecture that any deterministic (1 + ɛ) approximation algorithm for LIS requires Ω ( √ n) space, and as a step towards this conjecture, prove a space lower bound of Ω ( √ n) for a restricted yet natural class of deterministic algorithms. 1
Declaring Independence via the Sketching of Sketches
"... We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs (i, j)â[n] 2 whose frequencies define a joint distribution (X, Y). In the distributed model, e ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs (i, j)â[n] 2 whose frequencies define a joint distribution (X, Y). In the distributed model, each coordinate of the pair may appear separately in the stream. We present a range of algorithms for approximating to what extent X and Y are independent, i.e., how close the joint distribution is to the product of the marginals. We consider various measures of closeness including â1, â2, and the mutual information between X and Y. Our algorithms are based on âsketching sketchesâ, i.e., composing smallspace linear synopses of the distributions. Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different components of our estimates are sufficiently independent.
Sketching and Streaming Entropy via Approximation Theory
"... We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-opti ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Rényi and Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. The accuracy of our estimates is proven using approximation theory arguments and extremal properties of Chebyshev polynomials, a technique which may be useful for other problems. Our work also yields the best-known and near-optimal additive approximations for entropy, and hence also for conditional entropy and mutual information.
A lower bound for the bounded round quantum communication complexity of set disjointness
"... We show lower bounds in the multi-party quantum communication complexity model. In this model, there are t parties where the ith party has input Xi ⊆ [n]. These parties communicate with each other by transmitting qubits to determine with high probability the value of some function F of their combin ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
We show lower bounds in the multi-party quantum communication complexity model. In this model, there are t parties where the ith party has input Xi ⊆ [n]. These parties communicate with each other by transmitting qubits to determine with high probability the value of some function F of their combined input (X1,...,Xt). We consider the class of Boolean valued functions whose value depends only on X1 ∩...∩ Xt; that is, for each F in this class there is an fF : 2[n] → {0,1}, such that F(X1,...,Xt) = fF(X1 ∩...∩ Xt). We show that the t-party k-round communication complexity of F is Ω(sm(fF)/(k2)), where sm(fF) stands for the monotone sensitivity of fF' and is defined by sm(fF) = ▵ maxS⊆[n] |{i : fF(S ∪ {i}) ≠ fF(S)}|. For two-party quantum communication protocols for the set disjointness problem, this implies that the two parties must exchange Ω(n/k2) qubits. An upper bound of O(n/k) can be derived from the O(√n) upper bound due to S. Aaronson and A. Ambainis (2003). For k = 1, our lower bound matches the Ω(n) lower bound observed by H. Buhrman and R. de Wolf (2001) (based on a result of A. Nayak (1999)), and for 2 ≤ k ≪ n14 /, improves the lower bound of Ω(√n) shown by A. Razborov (2002). For protocols with no restrictions on the number of rounds, we can conclude that the two parties must exchange Ω(n13/) qubits. This, however, falls short of the optimal Ω (√n) lower bound shown by A. Razborov (2002). Our result is obtained by adapting to the quantum setting the elegant information-theoretic arguments of Z. Bar-Yossef et al. (2002). Using this method we can show similar lower bounds for the L∞ function considered in Z. Bar-Yossef et al. (2002).
On the exact space complexity of sketching and streaming small norms
- In SODA
, 2010
"... We settle the 1-pass space complexity of (1 ± ε)approximating the Lp norm, for real p with 1 ≤ p ≤ 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [−M, M]. In particular, we show the space required is Θ(ε −2 log(mM ..."
Abstract
-
Cited by 10 (5 self)
- Add to MetaCart
We settle the 1-pass space complexity of (1 ± ε)approximating the Lp norm, for real p with 1 ≤ p ≤ 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [−M, M]. In particular, we show the space required is Θ(ε −2 log(mM) + log log(n)) bits. Our result also holds for 0 < p < 1; although Lp is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM ’06] and [Li, SODA ’08]. This improvement comes from showing an improved derandomization of the Lp sketch of Indyk by using k-wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan’s PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS ’99] and [Woodruff, SODA ’04], and is based on showing a direct sum property for the 1-way communication of the gap-Hamming problem. 1
The data stream space complexity of cascaded norms
- In FOCS
, 2009
"... Abstract — We consider the problem of estimating cascaded aggregates over a matrix presented as a sequence of updates in a data stream. A cascaded aggregate P ◦ Q is defined by evaluating aggregate Q repeatedly over each row of the matrix, and then evaluating aggregate P over the resulting vector of ..."
Abstract
-
Cited by 9 (6 self)
- Add to MetaCart
Abstract — We consider the problem of estimating cascaded aggregates over a matrix presented as a sequence of updates in a data stream. A cascaded aggregate P ◦ Q is defined by evaluating aggregate Q repeatedly over each row of the matrix, and then evaluating aggregate P over the resulting vector of values. This problem was introduced by Cormode and Muthukrishnan, PODS, 2005 [CM]. We analyze the space complexity of estimating cascaded norms on an n × d matrix to within a small relative error. Let Lp denote the p-th norm, where p is a non-negative integer. We abbreviate the cascaded norm L k ◦ Lp by L k,p. (1) For any constant k ≥ p ≥ 2, we obtain a 1-pass Õ(n1−2/k d 1−2/p)-space algorithm for estimating Lk,p. This is optimal up to polylogarithmic factors and resolves an open question of [CM] regarding the space complexity of L4,2. We also obtain 1-pass space-optimal algorithms for estimating L∞,k and Lk,∞. (2) We prove a space lower bound of Ω(n1−1/k) on estimating Lk,0 and Lk,1, resolving an open question due to Indyk, IITK Data Streams Workshop (Problem 8), 2006. We also resolve two more questions of [CM] concerning Lk,2 estimation and block heavy hitter problems. Ganguly, Bansal and Dube (FAW, 2008) claimed an Õ(1)-space algorithm for estimating Lk,p for any k, p ∈ [0,2]. Our lower bounds show this claim is incorrect. 1.

