## Computing on Data Streams (1998)

Citations: | 166 - 3 self |

### BibTeX

@MISC{Henzinger98computingon,

author = {Monika Rauch Henzinger and Prabhakar Raghavan and Sridar Rajagopalan},

title = {Computing on Data Streams},

year = {1998}

}

### Abstract

In this paper we study the space requirement of algorithms that make only one (or a small number of) pass(es) over the input data. We study such algorithms under a model of data streams that we introduce here. We give a number of upper and lower bounds for problems stemming from query processing, invoking in the process tools from the area of communication complexity.

### Citations

754 |
Amortized efficiency of list update and paging rules
- SLEATOR, TARJAN
- 1985
Citation Context ...techniques can be used to prove lower bounds on the space requirements. Our model appears at first sight to be closely related to papers on I/O complexity [HK81], hierarchical memory [AACS87], paging =-=[ST85]-=- and competitive analysis [KMRS88], as well as external memory algorithms [VV96]. However, our model is considerably more stringent: whereas in these papers on memory management one can bring back (... |

714 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context ... kth largest out of n elements using at most P passes over the data. They showed an upper bound of n^(1/P) log n and an almost matching lower bound of n^(1/P) for large enough k. Alon, Matias and Szegedy =-=[AMS96]-=- studied the space complexity of estimating the frequency moments of a sequence of elements in one pass. In this context, they show (almost) tight upper and lower bounds for a large number of frequenc... |

438 |
Syntactic clustering of the web
- Broder, Glassman, et al.
- 1997
Citation Context ... = ∀u1, u2, ... !∃(v1, v2, ...): f(u1, ..., v1, ...) for (u1, u2, ..., v1, v2, ...) ∈ R. The Consistency Verification problem. Verify that R satisfies φ. (3) More traditional graph problems like connectivity arise =-=[BGMZ97]-=-, while analyzing various properties of the web. In database query optimization estimating the size of the transitive closure is important [LN89]. This motivates our study of various traditio... |

396 | Some complexity questions related to distributive computing - Yao |

329 | Lore: A database management system for semistructured data
- McHugh, Abiteboul, et al.
- 1997
Citation Context ...terministic and randomized algorithms, and (iii) between exact and approximation algorithms. We first describe some classes of problems which can be described in this context. (1) Systems such as LORE=-=[MAGQW]-=- and WEBSQL [AMM, MM, MMM] view a database as a graph/hypergraph. For instance, a directed edge might represent a hyperlink on the web, or a citation between scientific papers, or a pair of cities con... |

252 | Querying the World Wide Web - Mendelzon, Mihaila, et al. - 1997 |

251 | Improved histograms for selectivity estimation of range predicates
- Poosala, Haas, et al.
- 1996
Citation Context ...te f. 1.4 Related previous work Estimation of order statistics and outliers [ARS97, AS95, JC85, RML97, Olk93] has received much attention in the context of sorting [DNS91], selectivity estimation =-=[PIHS96]-=-, query optimization [SALP79] and in providing online user feedback [Hel]. The survey by Yannakakis [Yan90] is a comprehensive account of graph-theoretic methods in database theory. Classical work on ... |

204 |
Competitive snoopy caching
- Karlin, Manasse, et al.
- 1988
Citation Context ...lower bounds on the space requirements. Our model appears at first sight to be closely related to papers on I/O complexity [HK81], hierarchical memory [AACS87], paging [ST85] and competitive analysis =-=[KMRS88]-=-, as well as external memory algorithms [VV96]. However, our model is considerably more stringent: whereas in these papers on memory management one can bring back (into fast memory) a data item that... |

177 |
I/O Complexity: The Red-Blue Pebbling Game
- Hong, Kung
- 1981
Citation Context ...oments and show how communication complexity techniques can be used to prove lower bounds on the space requirements. Our model appears at first sight to be closely related to papers on I/O complexity =-=[HK81]-=-, hierarchical memory [AACS87], paging [ST85] and competitive analysis [KMRS88], as well as external memory algorithms [VV96]. However, our model is considerably more stringent: whereas in these paper... |

133 |
A model for hierarchical memory
- Aggarwal, Alpern, et al.
- 1987
Citation Context ...ation complexity techniques can be used to prove lower bounds on the space requirements. Our model appears at first sight to be closely related to papers on I/O complexity [HK81], hierarchical memory =-=[AACS87]-=-, paging [ST85] and competitive analysis [KMRS88], as well as external memory algorithms [VV96]. However, our model is considerably more stringent: whereas in these papers on memory management one c... |

124 |
Selection and sorting with limited storage
- Munro, Paterson
- 1980
Citation Context ...k on time-space tradeoffs [Cob66, Tom80] may be interpreted as lower bounds on workspace for problems such as verifying palindromes, perfect squares and undirected st connectivity. Paterson and Munro =-=[MP80]-=- studied the space required in selecting the kth largest out of n elements using at most P passes over the data. They showed an upper bound of n^(1/P) log n and an almost matching lower bound of n^(1/P) f... |

110 | Random Sampling from Databases - Olken - 1993 |

68 | Applications of a Web query language - Arocena, Mendelzon, et al. - 1997 |

56 | The P² algorithm for dynamic calculation of quantiles and histograms without storing observations - Jain, Chlamtac - 1985 |

56 |
Graph-theoretic methods in database theory
- Yannakakis
- 1990
Citation Context ...3] has received much attention in the context of sorting [DNS91], selectivity estimation [PIHS96], query optimization [SALP79] and in providing online user feedback [Hel]. The survey by Yannakakis =-=[Yan90]-=- is a comprehensive account of graph-theoretic methods in database theory. Classical work on time-space tradeoffs [Cob66, Tom80] may be interpreted as lower bounds on workspace for problems such as ve... |

53 | Making Commitments in the Face of Uncertainty: How to Pick a Winner Almost Every Time - Awerbuch, Azar, et al. - 1996 |

39 | The recognition problem for the set of perfect squares - Cobham - 1966 |

33 | A one-pass algorithm for accurately estimating quantiles for disk-resident data - Alsabti, Ranka, et al. - 1997 |

25 |
Estimating the size of generalized transitive closures
- Lipton, Naughton
- 1989
Citation Context ...e traditional graph problems like connectivity arise [BGMZ97], while analyzing various properties of the web. In database query optimization estimating the size of the transitive closure is important =-=[LN89]-=-. This motivates our study of various traditional graph properties. (4) As pointed out in [SALP79, AMS96] estimates of the frequency moments of a data set can be used effectively for database... |

19 | Time-space tradeoffs for computing functions, using connectivity properties of their circuits - Tompa - 1980 |

18 | A one-pass space-efficient algorithm for finding quantiles - Agrawal, Swami - 1995 |

12 | Online processing redux
- Hellerstein
- 1997
Citation Context ... [ARS97, AS95, JC85, RML97, Olk93] has received much attention in the context of sorting [DNS91], selectivity estimation [PIHS96], query optimization [SALP79] and in providing online user feedback =-=[Hel]-=-. The survey by Yannakakis [Yan90] is a comprehensive account of graph-theoretic methods in database theory. Classical work on time-space tradeoffs [Cob66, Tom80] may be interpreted as lower bounds on... |

6 | Formal models of the Web - Mendelzon, Milo - 1997 |

4 |
I/O-Efficient Algorithms and Environments
- Vengroff, Vitter
- 1996
Citation Context ...l appears at first sight to be closely related to papers on I/O complexity [HK81], hierarchical memory [AACS87], paging [ST85] and competitive analysis [KMRS88], as well as external memory algorithms =-=[VV96]-=-. However, our model is considerably more stringent: whereas in these papers on memory management one can bring back (into fast memory) a data item that was previously evicted (and is required again... |

3 | Quantile Estimation from Grouped Data: The Cell Midpoint - Schmeiser, Deutsch - 1977 |

1 |
Compactly encoding arbitrary files with differential compression
- Ajtai, Burns, et al.
Citation Context ...sustain performance for basic systems operations, core utilities are restricted to read the input only once. For example, storage managers (such as IBM’s ADSM [ADSM]) use one-pass differential backup =-=[ABFLS]-=-. Networks are bringing to the desktop ever-increasing quantities of data in the form of data streams. For data in networked storage, each pass over the data results in an additional, expensive networ... |

1 |
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting
- DeWitt, Naughton, et al.
- 1991
Citation Context ...ceiver needs to be able to compute f. 1.4 Related previous work Estimation of order statistics and outliers [ARS97, AS95, JC85, RML97, Olk93] has received much attention in the context of sorting =-=[DNS91]-=-, selectivity estimation [PIHS96], query optimization [SALP79] and in providing online user feedback [Hel]. The survey by Yannakakis [Yan90] is a comprehensive account of graph-theoretic methods in da... |

1 |
Approximate medians and other order statistics in one pass and with limited memory: Theory and Database applications
- Manku, Rajagopalan, et al.
Citation Context ...problem requires finding a number whose rank is in the interval [m/2 − ɛm, m/2 + ɛm]. It can be solved by a one-pass Monte Carlo algorithm with error probability 1/10 and O(log n (log 1/ɛ)^2 / ɛ) space =-=[RML97]-=-. We give a corresponding lower bound in Section 5. Theorem 5 Any 1-pass Las Vegas algorithm for the approximate median problem requires Ω(1/ɛ) space. Easy one-pass reductions from the communication c... |
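
The approximate median problem quoted in the last citation context can be illustrated with a small one-pass sketch. This is not the algorithm of [RML97] and does not achieve the space bound quoted above; it is plain reservoir sampling, which keeps a uniform random sample of the stream and returns the sample median. The function name `approx_median_one_pass` and the parameters `sample_size` and `seed` are hypothetical names chosen for illustration.

```python
import random
from typing import Iterable, List, Optional


def approx_median_one_pass(
    stream: Iterable[int], sample_size: int, seed: Optional[int] = None
) -> int:
    """Estimate the median of a stream in one pass (Monte Carlo sketch).

    Assumption: uses reservoir sampling, not the method of [RML97].
    Memory use is sample_size items, independent of the stream length.
    """
    rng = random.Random(seed)
    reservoir: List[int] = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < sample_size:
            # Fill the reservoir with the first sample_size items.
            reservoir.append(item)
        else:
            # Keep the new item with probability sample_size / seen,
            # overwriting a uniformly chosen slot.
            j = rng.randrange(seen)
            if j < sample_size:
                reservoir[j] = item
    if not reservoir:
        raise ValueError("empty stream")
    reservoir.sort()
    return reservoir[len(reservoir) // 2]


if __name__ == "__main__":
    data = list(range(1, 10001))
    random.Random(1).shuffle(data)
    # Each element is read exactly once; items are never revisited.
    print(approx_median_one_pass(iter(data), sample_size=1000, seed=7))
```

The returned value has rank in [m/2 − ɛm, m/2 + ɛm] only with some probability that depends on `sample_size`, which is what makes this a Monte Carlo rather than a Las Vegas procedure.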