Tracking join and self-join sizes in limited storage (2002)
Citations: 89 (0 self-citations)
BibTeX
@MISC{Alon02trackingjoin,
author = {Noga Alon and Phillip B. Gibbons and Yossi Matias and Mario Szegedy},
title = {Tracking join and self-join sizes in limited storage},
year = {2002}
}
Abstract
This paper presents algorithms for tracking (approximate) join and self-join sizes in limited storage, in the presence of insertions and deletions to the data set(s). Such algorithms detect changes in join and self-join sizes without an expensive recomputation from the base data, and without the large space overhead required to maintain such sizes exactly. Query optimizers rely on fast, high-quality estimates of join sizes in order to select between various join plans, and estimates of self-join sizes are used to indicate the degree of skew in the data. For self-joins, we consider two approaches proposed in [Alon, Matias, and Szegedy. The Space Complexity of Approximating the Frequency Moments. JCSS, vol. 58, 1999, p.137-147], which we denote tug-of-war and sample-count. We present fast algorithms for implementing these approaches, and extensions to handle deletions as well as insertions. We also report on the first experimental study of the two approaches, on a range of synthetic and real-world data sets. Our study shows that tug-of-war provides more accurate estimates for a given storage limit than sample-count, which in turn is far more accurate than a standard sampling-based approach. For example, tug-of-war needed only 4 to 256 memory words, depending on the data set, in order to estimate the self-join size
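To make the tug-of-war idea concrete, here is a minimal sketch, assuming the textbook form of the estimator from the cited Alon, Matias, and Szegedy paper: each counter holds sum_v f_v * s(v), where f_v is the frequency of value v and s(v) is a 4-wise independent random sign. The squared counter is an unbiased estimate of the self-join size sum_v f_v^2, and the product of two counters that share the same signs estimates the join size sum_v f_v * g_v. Everything else is an illustrative assumption, not the paper's implementation: the class name `TugOfWarSketch`, the group sizes `s1`/`s2` (mean over `s1` counters, median over `s2` groups), and the hash (a random cubic polynomial modulo the Mersenne prime 2^31 - 1, reduced to a sign by its parity bit).

```python
import random
import statistics


class TugOfWarSketch:
    """Hypothetical tug-of-war sketch: s2 groups of s1 counters.

    Estimates are the median over groups of the mean of per-counter products.
    """

    PRIME = 2**31 - 1  # assumed hash field; values are reduced modulo this prime

    def __init__(self, s1: int, s2: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.s1, self.s2 = s1, s2
        # A random degree-3 polynomial per counter yields a 4-wise independent hash.
        self.coeffs = [
            [rng.randrange(self.PRIME) for _ in range(4)] for _ in range(s1 * s2)
        ]
        self.counters = [0] * (s1 * s2)

    def _sign(self, i: int, value: int) -> int:
        a, b, c, d = self.coeffs[i]
        x = value % self.PRIME
        h = (((a * x + b) * x + c) * x + d) % self.PRIME
        # Parity bit -> +1/-1 (bias of about 1/PRIME because PRIME is odd).
        return 1 if h & 1 else -1

    def update(self, value: int, delta: int = 1) -> None:
        """Apply an insertion (delta > 0) or a deletion (delta < 0) of `value`."""
        for i in range(len(self.counters)):
            self.counters[i] += delta * self._sign(i, value)

    def _median_of_means(self, products: list) -> float:
        means = [
            sum(products[g * self.s1:(g + 1) * self.s1]) / self.s1
            for g in range(self.s2)
        ]
        return statistics.median(means)

    def self_join_estimate(self) -> float:
        """Estimate sum_v f_v**2 from the squared counters."""
        return self._median_of_means([z * z for z in self.counters])

    def join_estimate(self, other: "TugOfWarSketch") -> float:
        """Estimate sum_v f_v * g_v; both sketches must share the same hash functions."""
        if (self.s1, self.s2, self.coeffs) != (other.s1, other.s2, other.coeffs):
            raise ValueError("sketches must be built with the same s1, s2, and seed")
        return self._median_of_means(
            [x * y for x, y in zip(self.counters, other.counters)]
        )


if __name__ == "__main__":
    sketch = TugOfWarSketch(s1=100, s2=5, seed=0)
    for v in range(100):
        sketch.update(v, v % 10 + 1)  # exact self-join size is 10 * (1^2 + ... + 10^2)
    print(sketch.self_join_estimate())
```

Because every counter is a linear function of the frequency vector, a deletion is simply an update with a negative `delta`: inserting and then deleting the same tuples returns every counter to zero. This is one reason the tug-of-war approach accommodates deletions more easily than sampling-based schemes. Sharing the seed between two sketches is what makes their counter products estimate the join size rather than an unrelated quantity.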