## An improved data stream summary: The Count-Min sketch and its applications (2004)

### Other Repositories/Bibliography

Venue: J. Algorithms

Citations: 405 (44 self)

### Citations

2174 | Randomized Algorithms
- Motwani, Raghavan
- 1995
Citation Context: ...ct The space used by Count-Min sketches is the array of wd counts, which takes wd words, and d hash functions, each of which can be stored using 2 words when using the pairwise functions described in [15]. 4 Approximate Query Answering Using CM Sketches For each of the three queries introduced in Section 2: Point, Range, and Inner Product queries, we show how they can be answered using Count-Min sketc...
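The space accounting in this excerpt can be illustrated with a minimal sketch. It assumes non-negative integer items and uses hash functions of the form ((a·x + b) mod p) mod w, a common pairwise-independent construction; the class name `CountMinSketch`, the prime modulus, and the method names are illustrative choices, not taken from the paper.

```python
import random


class CountMinSketch:
    """Illustrative Count-Min sketch: d rows of w counters, one hash per row."""

    PRIME = (1 << 61) - 1  # Mersenne prime used as hash modulus (assumed choice)

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        # The w*d array of counts.
        self.counts = [[0] * w for _ in range(d)]
        # Each hash function is stored as two words (a, b).
        self.hashes = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(d)
        ]

    def _bucket(self, row, item):
        a, b = self.hashes[row]
        return ((a * item + b) % self.PRIME) % self.w

    def update(self, item, c=1):
        # Add c to one counter in every row.
        for r in range(self.d):
            self.counts[r][self._bucket(r, item)] += c

    def estimate(self, item):
        # Point query: the minimum over rows (an overestimate for non-negative updates).
        return min(self.counts[r][self._bucket(r, item)] for r in range(self.d))

    def words_used(self):
        # wd words for the counts plus 2 words per hash function.
        return self.w * self.d + 2 * self.d


sketch = CountMinSketch(w=272, d=5)
sketch.update(42, 3)
print(sketch.estimate(42))  # 3: only item 42 was inserted, so nothing inflates its counters
print(sketch.words_used())  # 272*5 + 2*5 = 1370
```

With further items inserted, `estimate` can only overcount (for non-negative updates), because collisions add to a counter and the minimum over rows picks the least inflated one.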

835 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context: ...ed in the data stream context that allow a number of simple aggregation functions to be approximated. Quantities for which efficient sketches have been designed include the L1 and L2 norms of vectors [2], the number of distinct items in a sequence (i.e., number of non-zero entries in a(t)) [8], join and self-join sizes of relations (representable as inner-products of vectors a(t), b(t)) [2, 1], item and...

764 | Models and issues in data stream systems
- Babcock, Babu, et al.
Citation Context: ...There has been a frenzy of activity recently in the Algorithm, Database and Networking communities on such data stream problems, with multiple surveys, tutorials, workshops and research papers. See [7, 3, 16] for detailed description of the motivations driving this area. In recent years, several different sketches have been proposed in the data stream context that allow a number of simple aggregation func...

528 | Data streams: Algorithms and applications
- Muthukrishnan
Citation Context: ...There has been a frenzy of activity recently in the Algorithm, Database and Networking communities on such data stream problems, with multiple surveys, tutorials, workshops and research papers. See [7, 3, 16] for detailed description of the motivations driving this area. In recent years, several different sketches have been proposed in the data stream context that allow a number of simple aggregation func...

403 | Approximate Frequency Counts Over Data Streams - Manku, Motwani - 2002 |

355 | New directions in traffic measurement and accounting - Estan, Varghese - 2002 |

332 | Finding frequent items in data streams
- Charikar, Chen, et al.
- 2002
Citation Context: ...tinct items in a sequence (i.e., number of non-zero entries in a(t)) [8], join and self-join sizes of relations (representable as inner-products of vectors a(t), b(t)) [2, 1], item and range sum queries [12, 4]. These sketches are of interest not simply because they can be used to directly approximate quantities of interest, but also because they have been used considerably as “black box” devices in order t...

318 | Stable distributions, pseudorandom generators, embeddings, and data stream computation - Indyk |

212 | Surfing wavelets on streams: One-pass summaries for approximate aggregate queries
- Gilbert, Kotidis, et al.
- 2001
Citation Context: ...tinct items in a sequence (i.e., number of non-zero entries in a(t)) [8], join and self-join sizes of relations (representable as inner-products of vectors a(t), b(t)) [2, 1], item and range sum queries [12, 4]. These sketches are of interest not simply because they can be used to directly approximate quantities of interest, but also because they have been used considerably as “black box” devices in order t...

203 | Space-efficient online computation of quantile summaries
- Greenwald, Khanna
- 2001
Citation Context: ... previously best known space bounds for finding approximate quantiles is $O\left(\frac{1}{\varepsilon}\left(\log^2 \frac{1}{\varepsilon} + \log^2 \log \frac{1}{\delta}\right)\right)$ space for a randomized sampling and $O\left(\frac{1}{\varepsilon} \log(\varepsilon \|a\|_1)\right)$ space for a deterministic solution [14]. These bounds are not completely comparable, but our result is the first on the more powerful Turnstile model to be comparable to the Cash Register model bounds in the leading $1/\varepsilon$ term. 5.2 Heavy Hit...

195 | What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically
- Cormode, Muthukrishnan
- 2003
Citation Context: ...t is the first on the more powerful Turnstile model to be comparable to the Cash Register model bounds in the leading 1/ε term. 5.2 Heavy Hitters in the Turnstile Model We adopt the solution given in [5], which describes a divide and conquer procedure to find the heavy hitters. This keeps sketches for computing range sums: log n different sketches, one for each different dyadic range. When an update ...
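The divide and conquer idea over dyadic ranges can be sketched as follows. This is an illustration under stated assumptions, not the procedure of [5]: exact dictionaries stand in for the per-level sketches, items are integers in [0, 2^log_n), net counts are assumed non-negative, and the function names `make_levels`, `update`, and `heavy_hitters` are hypothetical.

```python
from collections import defaultdict


def make_levels(log_n):
    # One counter table per dyadic level 0..log_n.
    # Exact dictionaries are used here in place of sketches (assumption).
    return [defaultdict(int) for _ in range(log_n + 1)]


def update(levels, item, c=1):
    # At level j, the item falls in the dyadic range of width 2**j
    # whose index is item >> j.
    for j, table in enumerate(levels):
        table[item >> j] += c


def heavy_hitters(levels, threshold):
    # Start from the single range covering the whole universe and
    # descend only into child ranges whose count reaches the threshold.
    top = len(levels) - 1
    frontier = [0] if levels[top].get(0, 0) >= threshold else []
    for j in range(top - 1, -1, -1):
        frontier = [
            child
            for r in frontier
            for child in (2 * r, 2 * r + 1)
            if levels[j].get(child, 0) >= threshold
        ]
    return frontier  # level-0 ranges are single items


levels = make_levels(4)  # universe {0, ..., 15}
for item, count in [(3, 5), (9, 4), (12, 1)]:
    update(levels, item, count)
print(heavy_hitters(levels, threshold=4))  # [3, 9]
```

A range can only contain a heavy item if the range itself is heavy, so light ranges are pruned early and each level examines at most twice as many ranges as there are heavy ranges one level up.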

189 | Counting distinct elements in a data stream - Bar-Yossef, Jayram, et al. - 2002 |

179 | Processing complex aggregate queries over data streams - Dobra, Garofalakis, et al. - 2002 |

145 | Computing iceberg queries efficiently - Fang, Shivakumar, et al. - 1998 |

123 | Approximate medians and other quantiles in one pass and with limited memory - Manku, Rajagopalan, et al. - 1998 |

120 | Tracking join and self-join sizes in limited storage
- Alon, Gibbons, et al.
- 1999
Citation Context: ...s of vectors [2], the number of distinct items in a sequence (i.e., number of non-zero entries in a(t)) [8], join and self-join sizes of relations (representable as inner-products of vectors a(t), b(t)) [2, 1], item and range sum queries [12, 4]. These sketches are of interest not simply because they can be used to directly approximate quantities of interest, but also because they have been used considerab...

119 | Extensions of Lipschitz mappings into a Hilbert space - Johnson, Lindenstrauss |

117 | Querying and mining data streams: you only get one look
- Garofalakis, Rastogi
- 2002
Citation Context: ...queries are of interest in summarizing the data distribution approximately; and inner-product queries allow approximation of join size of relations. Fuller discussion of these aspects can be found in [9, 16]. We will also study use of these queries to compute more complex functions on data streams. As examples, we will focus on the two following problems. Recall that $\|a\|_1 = \sum_{i=1}^{n} |a_i(t)|$; more general...

110 | How to summarize the universe: Dynamic maintenance of quantiles
- Gilbert, Kotidis, et al.
- 2002
Citation Context: ...rectly approximate quantities of interest, but also because they have been used considerably as “black box” devices in order to compute more sophisticated aggregates and complex quantities: quantiles [13], wavelets [12], and histograms [11]. Sketches thus far designed are typically linear functions of their input, and can be represented as projections of an underlying vector representing the data with...

109 | Estimating simple functions on the union of data streams - Gibbons, Tirthapura - 2001 |

107 | Fast, small-space algorithms for approximate histogram maintenance. STOC
- Gilbert, Guha, et al.
- 2002
Citation Context: ...erest, but also because they have been used considerably as “black box” devices in order to compute more sophisticated aggregates and complex quantities: quantiles [13], wavelets [12], and histograms [11]. Sketches thus far designed are typically linear functions of their input, and can be represented as projections of an underlying vector representing the data with certain randomly chosen projection ...

107 | An Approximate L1-Difference Algorithm for Massive Data Streams - Feigenbaum, Kannan, et al. - 2002 |

99 | Dynamic multidimensional histograms - Thaper, Guha, et al. - 2002 |

84 | What's new: Finding significant differences in network data streams, INFOCOM04
- Cormode, Muthukrishnan
- 2004
Citation Context: ...h is quite simple, and is likely to find many applications, including in hardware solutions for these problems. We have recently applied these ideas to the problem of change detection on data streams [6], and we also believe that it can be applied to improve the time and space bounds for constructing approximate wavelet and histogram representations of data streams [11]. Also, the CM Sketch can also ...

83 | Probabilistic counting
- Flajolet, Martin
- 1983
Citation Context: ...approximated. Quantities for which efficient sketches have been designed include the L1 and L2 norms of vectors [2], the number of distinct items in a sequence (i.e., number of non-zero entries in a(t)) [8], join and self-join sizes of relations (representable as inner-products of vectors a(t), b(t)) [2, 1], item and range sum queries [12, 4]. These sketches are of interest not simply because they can b...

78 | Optimal space lower bounds for all frequency moments
- Woodruff
- 2004
Citation Context: ...was shown for a number of data stream problems: approximating frequency moments $F_k(t) = \sum_i (a_i(t))^k$, estimating the number of distinct items, and computing the Hamming distance between two strings [17]. It is an interesting contrast that for a number of similar seeming problems (finding Heavy Hitters and Quantiles in the most general data stream model) we are able to give an $O(1/\varepsilon)$ upper bound. Conc...

58 | Finding hierarchical heavy hitters in data streams - Cormode, Korn, et al. - 2003 |

45 | Fast, small-space algorithms for approximate histogram maintenance, STOC 2002 - Gilbert, Guha, et al. |

20 | Comparing data streams using hamming norms - Cormode, Datar, et al. - 2002 |

7 | Data streaming in computer networks
- Estan, Varghese
- 2003
Citation Context: ...There has been a frenzy of activity recently in the Algorithm, Database and Networking communities on such data stream problems, with multiple surveys, tutorials, workshops and research papers. See [7, 3, 16] for detailed description of the motivations driving this area. In recent years, several different sketches have been proposed in the data stream context that allow a number of simple aggregation func...

5 | Synopsis structures for massive data sets
- Gibbons, Matias
- 1999
Citation Context: ...represent a explicitly. Since the space is sublinear in data and input size, the data structures used by the algorithms to represent the input data stream are merely a summary (aka a sketch or synopsis [10]) of it; because of this compression, almost no function that one needs to compute on a can be done precisely, so some approximation is provably needed. Second, processing an update should be fast and...