Results 1-10 of 90
An Information Statistics Approach to Data Stream and Communication Complexity
, 2003
Cited by 240 (8 self)
We present a new method for proving strong lower bounds in communication complexity.
A near-optimal algorithm for computing the entropy of a stream
 In ACM-SIAM Symposium on Discrete Algorithms
, 2007
Cited by 74 (20 self)
We describe a simple algorithm for approximating the empirical entropy of a stream of m values in a single pass, using $O(\varepsilon^{-2} \log(\delta^{-1}) \log m)$ words of space. Our algorithm is based upon a novel extension of a method introduced by Alon, Matias, and Szegedy [1]. We show a space lower bound of $\Omega(\varepsilon^{-2} / \log(\varepsilon^{-1}))$, meaning that our algorithm is near-optimal in terms of its dependency on ε. This improves over previous work on this problem [8, 13, 17, 5]. We show that generalizing to kth-order entropy requires close to linear space for all k ≥ 1, and give additive approximations using our algorithm. Lastly, we show how to compute a multiplicative approximation to the entropy of a random walk on an undirected graph.
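For reference, the quantity being approximated can be computed exactly when space is not a concern. The sketch below assumes the standard definition of empirical entropy, H = Σ_i (m_i/m) log2(m/m_i), where m_i is the number of occurrences of value i. The function name `empirical_entropy` is hypothetical, and this is only a linear-space baseline, not the small-space algorithm described in the abstract.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def empirical_entropy(stream: Iterable[Hashable]) -> float:
    """Exact empirical entropy, in bits, of a stream of values.

    Assumption: keeps one counter per distinct value, so space is linear
    in the number of distinct values (unlike the algorithm in the abstract).
    """
    counts: Counter = Counter()
    m = 0
    for item in stream:  # single pass over the stream
        counts[item] += 1
        m += 1
    if m == 0:
        return 0.0
    # H = sum_i (m_i / m) * log2(m / m_i)
    return sum((c / m) * math.log2(m / c) for c in counts.values())
```

A stream with two equally frequent values, such as `"aabb"`, has one bit of entropy, while a constant stream has none.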
Simpler algorithm for estimating frequency moments of data streams
 Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
, 2006
Cited by 45 (4 self)
The problem of estimating the kth frequency moment $F_k$ over a data stream by looking at the items exactly once as they arrive was posed in [1, 2]. A succession of algorithms has been proposed for this problem [1, 2, 6, 8, 7]. Recently, Indyk and Woodruff [11] have presented the first algorithm for estimating $F_k$, for k > 2, using space $\tilde{O}(n^{1-2/k})$, matching the space lower bound (up to polylogarithmic factors) for this problem [1, 2, 3, 4, 13] (n is the number of distinct items occurring in the stream). In this paper, we present a simpler one-pass algorithm for estimating $F_k$.
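To make the quantity concrete, the sketch below computes $F_k = \sum_i m_i^k$ exactly (m_i being the frequency of item i) and also shows the classical sampling estimator of Alon, Matias, and Szegedy, which is unbiased for $F_k$. This is neither the algorithm of this paper nor that of Indyk and Woodruff. The function names are hypothetical, and for clarity the stream is assumed to be held in memory as a sequence, which a real streaming implementation would avoid.

```python
from collections import Counter
from typing import Hashable, Sequence


def frequency_moment(stream: Sequence[Hashable], k: int) -> int:
    """Exact F_k: sum over distinct items of (frequency ** k)."""
    return sum(c ** k for c in Counter(stream).values())


def ams_estimate(stream: Sequence[Hashable], k: int, position: int) -> int:
    """Basic AMS estimator for one sampled position (0-indexed).

    r counts occurrences of stream[position] from that position to the end;
    the estimate is m * (r**k - (r-1)**k), where m is the stream length.
    """
    m = len(stream)
    r = sum(1 for item in stream[position:] if item == stream[position])
    return m * (r ** k - (r - 1) ** k)


def mean_ams_estimate(stream: Sequence[Hashable], k: int) -> float:
    """Average of the estimator over every position, i.e. its expectation
    when the position is chosen uniformly at random."""
    m = len(stream)
    return sum(ams_estimate(stream, k, p) for p in range(m)) / m
```

Averaging over all positions recovers $F_k$ exactly because the terms r^k - (r-1)^k telescope for each distinct item; in practice one samples a few positions and combines the estimates to control the variance.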
A lower bound for the bounded round quantum communication complexity of set disjointness
Cited by 36 (11 self)
We show lower bounds in the multiparty quantum communication complexity model. In this model, there are $t$ parties where the $i$th party has input $X_i \subseteq [n]$. These parties communicate with each other by transmitting qubits to determine with high probability the value of some function $F$ of their combined input $(X_1, \ldots, X_t)$. We consider the class of Boolean-valued functions whose value depends only on $X_1 \cap \cdots \cap X_t$; that is, for each $F$ in this class there is an $f_F : 2^{[n]} \to \{0,1\}$ such that $F(X_1, \ldots, X_t) = f_F(X_1 \cap \cdots \cap X_t)$. We show that the $t$-party $k$-round communication complexity of $F$ is $\Omega(s_m(f_F)/k^2)$, where $s_m(f_F)$ stands for the monotone sensitivity of $f_F$ and is defined by $s_m(f_F) \triangleq \max_{S \subseteq [n]} |\{i : f_F(S \cup \{i\}) \neq f_F(S)\}|$. For two-party quantum communication protocols for the set disjointness problem, this implies that the two parties must exchange $\Omega(n/k^2)$ qubits. An upper bound of $O(n/k)$ can be derived from the $O(\sqrt{n})$ upper bound due to S. Aaronson and A. Ambainis (2003). For $k = 1$, our lower bound matches the $\Omega(n)$ lower bound observed by H. Buhrman and R. de Wolf (2001) (based on a result of A. Nayak (1999)), and for $2 \le k \ll n^{1/4}$, improves the lower bound of $\Omega(\sqrt{n})$ shown by A. Razborov (2002). For protocols with no restrictions on the number of rounds, we can conclude that the two parties must exchange $\Omega(n^{1/3})$ qubits. This, however, falls short of the optimal $\Omega(\sqrt{n})$ lower bound shown by A. Razborov (2002). Our result is obtained by adapting to the quantum setting the elegant information-theoretic arguments of Z. Bar-Yossef et al. (2002). Using this method we can show similar lower bounds for the $L_\infty$ function considered in Z. Bar-Yossef et al. (2002).
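The monotone sensitivity defined in the abstract can be checked by brute force on small instances. The sketch below assumes the definition reads as the largest number of elements i whose addition to some set S changes the function value; the names `monotone_sensitivity` and `disjointness` are hypothetical, and the universe is taken to be {0, ..., n-1} rather than [n] for convenience. The running time is exponential in n, so this is an illustration only.

```python
from itertools import combinations
from typing import Callable, FrozenSet


def monotone_sensitivity(f: Callable[[FrozenSet[int]], int], n: int) -> int:
    """Brute-force max over S of |{i : f(S | {i}) != f(S)}|, S within {0..n-1}."""
    universe = range(n)
    best = 0
    for size in range(n + 1):
        for subset in combinations(universe, size):
            s = frozenset(subset)
            # Elements already in S never change the value, matching the definition.
            flips = sum(1 for i in universe if f(s | {i}) != f(s))
            best = max(best, flips)
    return best


def disjointness(intersection: FrozenSet[int]) -> int:
    """Set disjointness as a function of the common intersection:
    1 if the inputs share no element, 0 otherwise."""
    return 1 if not intersection else 0
```

For disjointness, the empty intersection is sensitive to every one of the n elements, so the monotone sensitivity is n, which is how the general bound specializes to the n/k^2 bound quoted for two-party set disjointness.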
On the exact space complexity of sketching and streaming small norms
 In SODA
, 2010
Cited by 35 (11 self)
We settle the one-pass space complexity of (1 ± ε)-approximating the $L_p$ norm, for real p with 1 ≤ p ≤ 2, of a length-n vector updated in a length-m stream with updates to its coordinates. We assume the updates are integers in the range [−M, M]. In particular, we show the space required is $\Theta(\varepsilon^{-2} \log(mM) + \log\log n)$ bits. Our result also holds for 0 < p < 1; although $L_p$ is not a norm in this case, it remains a well-defined function. Our upper bound improves upon previous algorithms of [Indyk, JACM ’06] and [Li, SODA ’08]. This improvement comes from showing an improved derandomization of the $L_p$ sketch of Indyk by using k-wise independence for small k, as opposed to using the heavy hammer of a generic pseudorandom generator against space-bounded computation such as Nisan’s PRG. Our lower bound improves upon previous work of [Alon-Matias-Szegedy, JCSS ’99] and [Woodruff, SODA ’04], and is based on showing a direct sum property for the one-way communication of the gap-Hamming problem.
Sketching and Streaming Entropy via Approximation Theory
Cited by 32 (0 self)
We conclude a sequence of work by giving near-optimal sketching and streaming algorithms for estimating Shannon entropy in the most general streaming model, with arbitrary insertions and deletions. This improves on prior results that obtain suboptimal space bounds in the general model, and near-optimal bounds in the insertion-only model without sketching. Our high-level approach is simple: we give algorithms to estimate Rényi and Tsallis entropy, and use them to extrapolate an estimate of Shannon entropy. The accuracy of our estimates is proven using approximation theory arguments and extremal properties of Chebyshev polynomials, a technique which may be useful for other problems. Our work also yields the best-known and near-optimal additive approximations for entropy, and hence also for conditional entropy and mutual information.
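The extrapolation idea rests on a standard fact: both the Rényi entropy $H_\alpha(p) = \frac{1}{1-\alpha} \log \sum_i p_i^\alpha$ and the Tsallis entropy $T_\alpha(p) = \frac{1 - \sum_i p_i^\alpha}{\alpha - 1}$ converge to the Shannon entropy as α approaches 1. The sketch below only checks this limit numerically on an explicit distribution, using natural logarithms; the function names are hypothetical, and the paper's actual streaming estimators and extrapolation scheme are not reproduced here.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats of a probability vector p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)


def tsallis_entropy(p: Sequence[float], alpha: float) -> float:
    """Tsallis entropy of order alpha (alpha > 0, alpha != 1)."""
    return (1 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1)
```

Evaluating `renyi_entropy(p, 1.001)` or `tsallis_entropy(p, 1.001)` on a small distribution gives values close to `shannon_entropy(p)`, which is the behaviour an extrapolation toward α = 1 relies on.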
Robust lower bounds for communication and stream computation
 in Proceedings of the 40th Annual ACM Symposium on Theory of Computing (British
, 2008
Cited by 28 (7 self)
We study the communication complexity of evaluating functions when the input data is randomly allocated (according to some known distribution) amongst two or more players, possibly with information overlap. This naturally extends previously studied variable partition models such as the best-case and worst-case partition models [32, 29]. We aim to understand whether the hardness of a communication problem holds for almost every allocation of the input, as opposed to holding for perhaps just a few atypical partitions. A key application is to the heavily studied data stream model. There is a strong connection between our communication lower bounds and lower bounds in the data stream model that are “robust” to the ordering of the data. That is, we prove lower bounds for when the order of the items in the stream is chosen not adversarially but rather uniformly (or near-uniformly) from the set of all permutations. This random-order data stream model has attracted recent interest, since lower bounds here give stronger evidence for the inherent hardness of streaming problems. Our results include the first random-partition communication lower bounds for problems including multiparty set disjointness and gap-Hamming-distance. Both are tight. We also extend and improve previous results [19, 7] for a form of pointer jumping that is relevant to the problem of selection (in particular, median finding). Collectively, these results yield lower bounds for a variety of problems in the random-order data stream model, including estimating the number of distinct elements, approximating frequency moments, and quantile estimation.
Fast moment estimation in data streams in optimal space
 In Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC
, 2011
Cited by 27 (6 self)
We give a space-optimal algorithm with update time $O(\log^2(1/\varepsilon) \log\log(1/\varepsilon))$ for (1 ± ε)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous space-optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time $\Omega(1/\varepsilon^2)$.
Asymptotically optimal lower bounds on the NIH-multiparty information complexity of the AND-function and disjointness
 In STACS 2009
, 2009
Cited by 26 (0 self)
Here we prove an asymptotically optimal lower bound on the information complexity of the k-party disjointness function with the unique intersection promise, an important special case of the well-known disjointness problem, and the $\mathrm{AND}_k$-function in the number-in-the-hand model. Our Ω(n/k) bound for disjointness improves on an earlier Ω(n/(k log k)) bound by Chakrabarti et al. (2003), who obtained an asymptotically tight lower bound for one-way protocols, but failed to do so for the general case. Our result eliminates both the gap between the upper and the lower bound for unrestricted protocols and the gap between the lower bounds for one-way protocols and unrestricted protocols.
Declaring Independence via the Sketching of Sketches
Cited by 25 (1 self)
We consider the problem of identifying correlations in data streams. Surprisingly, our work seems to be the first to consider this natural problem. In the centralized model, we consider a stream of pairs $(i, j) \in [n]^2$ whose frequencies define a joint distribution (X, Y). In the distributed model, each coordinate of the pair may appear separately in the stream. We present a range of algorithms for approximating to what extent X and Y are independent, i.e., how close the joint distribution is to the product of the marginals. We consider various measures of closeness including $\ell_1$, $\ell_2$, and the mutual information between X and Y. Our algorithms are based on “sketching sketches”, i.e., composing small-space linear synopses of the distributions. Perhaps ironically, the biggest technical challenges that arise relate to ensuring that different components of our estimates are sufficiently independent.
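As a point of reference for the $\ell_1$ measure, the distance between the empirical joint distribution and the product of its marginals can be computed exactly by storing all pair counts. The sketch below does just that in the centralized setting; the function name `l1_independence_gap` is hypothetical, and this exact computation uses space proportional to the number of distinct pairs, so it is a baseline rather than the sketch-based algorithm of the abstract.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def l1_independence_gap(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Exact l1 distance between the empirical joint distribution of (X, Y)
    and the product of its empirical marginals."""
    joint = Counter(pairs)
    total = sum(joint.values())
    if total == 0:
        return 0.0
    left: Counter = Counter()   # marginal counts of the first coordinate
    right: Counter = Counter()  # marginal counts of the second coordinate
    for (i, j), c in joint.items():
        left[i] += c
        right[j] += c
    # Sum over the full product of observed values, including unseen pairs.
    return sum(
        abs(joint[(i, j)] / total - (left[i] / total) * (right[j] / total))
        for i in left
        for j in right
    )
```

A stream containing each of the four pairs over {0, 1} once gives a gap of 0 (independent coordinates), while a stream of only (0, 0) and (1, 1) gives a gap of 1 (perfectly correlated coordinates).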