## Models and issues in data stream systems (2002)

Venue: In PODS

Citations: 786 (19 self)

### Citations

2196 | Randomized algorithms
- Motwani, Raghavan
- 1995
Citation Context ...the data stream model. 6.1 Random Samples: Random samples can be used as a summary structure in many scenarios where a small sample is expected to capture the essential characteristics of the data set [65]. It is perhaps the easiest form of summarization in a DSMS and other synopses can be built from a sample itself. In fact, the join synopsis in the AQUA system [2] is nothing but a uniform sample of t...

845 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1999
Citation Context ...h area in the algorithms community in recent years, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27...

760 | Communication Complexity
- Kushilevitz, Nisan
- 1997
Citation Context ...am model. Henzinger, Raghavan, and Rajagopalan [49] provided space lower bounds for concrete problems in the data stream model. These lower bounds are derived from results in communication complexity [56]. To understand the connection, observe that the memory used by any one-pass algorithm for a function $f$, after seeing a prefix of the data stream, is lower bounded by the one-way ...

584 | NiagaraCQ: A scalable continuous query system for Internet databases
- Chen, DeWitt, et al.
Citation Context ...capability over network packet streams. The Tangram stream query processing system [68, 69] uses stream processing techniques to analyze large quantities of stored data. The OpenCQ [57] and NiagaraCQ [24] systems support continuous queries for monitoring persistent data sets spread over a wide-area network, e.g., web sites over the Internet. OpenCQ uses a query processing algorithm based on incrementa...

533 | Fast subsequence matching in time-series databases.
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context ...eed to refer to the sequencing aspect of streams, particularly in the form of sliding windows over streams. Related work in this category also includes work on temporal [80] and time-series databases [31], where the ordering of tuples implied by time can be used in querying, indexing, and query optimization. The body of work on materialized views relates to continuous queries, since materialized views...

411 | Eddies: Continuously adaptive query processing
- Avnur, Hellerstein
- 2000
Citation Context ...vidual data items, and they do not directly support the continuous queries [84] that are typical of data stream applications. Furthermore, it is recognized that both approximation [13] and adaptivity [8] are key ingredients in executing queries and performing other processing (e.g., data analysis and mining) over rapid data streams, while traditional DBMS’s focus largely on the opposite goal of preci...

402 | Mining high-speed data streams.
- Domingos, Hulten
- 2000
Citation Context ...pourri of algorithmic results for data streams. Data Mining: Decision trees are another form of synopsis used for prediction. Domingos et al. [28, 29] have studied the problem of maintaining decision trees over data streams. Clustering is yet another way to summarize data. Consider the $k$-median formulation for clustering: Given $n$ data points in a ...

381 | Online aggregation
- Hellerstein, Wang
- 1997
Citation Context ...of the data stream rather than over the entire data stream. We obtain an approximate answer, but in some cases one can give confidence bounds on the degree of error introduced by the sampling process [48]. Unfortunately, for many situations (including most queries involving joins [20, 22]), sampling-based approaches cannot give reliable approximation guarantees. Designing sampling-based algorithms tha...

364 | Efficient filtering of XML documents for selective dissemination of information
- Altinel, Franklin
- 2000

349 | External memory algorithms and data structures: dealing with massive data
- Vitter
Citation Context ...ts Since data streams are potentially unbounded in size, the amount of storage required to compute an exact answer to a data stream query may also grow without bound. While external memory algorithms [91] for handling data sets larger than main memory have been studied, such algorithms are not well suited to data stream applications since they do not support continuous queries and are typically too sl...

338 | Mining Time-changing Data Streams.
- Hulten, Spencer, et al.
- 2001
Citation Context ...pourri of algorithmic results for data streams. Data Mining: Decision trees are another form of synopsis used for prediction. Domingos et al. [28, 29] have studied the problem of maintaining decision trees over data streams. Clustering is yet another way to summarize data. Consider the $k$-median formulation for clustering: Given $n$ data points in a ...

335 | Random sampling with a reservoir
- Vitter
- 1985
Citation Context ...s. If so, the larger the windows (stored in available memory), the better the approximation. Other examples include duplicate elimination using limited-size hash tables, and sampling using reservoirs [90]. The Aurora system [16] also proposes adaptivity and approximations, and uses load-shedding techniques based on application-specified measures of quality of service for graceful degradation in the fa...

324 | Stable distributions, pseudorandom generators, embeddings and data stream computation.
- Indyk
- 2000
Citation Context ...o the pertinent bit positions that are set to 1. Feigenbaum et al. [33] showed how to construct such a family of range-summable $\pm 1$-valued hash functions with limited (four-way) independence. Indyk [50] provided a uniform framework to compute the $L_p$ norm (for $p \in (0, 2]$) using the so-called $p$-stable distributions, improving upon the previous paper [33] for estimating the $L_1$ norm, in that it a...

308 | Continuous queries over data streams.
- Babu, Widom
- 2001
Citation Context ...etwork traffic management, which involves monitoring network packet header information across a set of routers to obtain information on traffic flow patterns. Based on a description of Babu and Widom [10], we delve into this example in some detail to help illustrate that continuous queries arise naturally in real applications and that conventional DBMS technology does not adequately support such queri...

295 | Clustering data streams
- Guha, Mishra, et al.
- 2000
Citation Context ... such that the sum of the errors over the $n$ data points is minimized. The “error” for each data point is the distance from that point to the nearest of the $k$ chosen representative points. Guha et al. [44] presented a single-pass algorithm for maintaining approximate $k$-medians (cluster centers) that uses $O(n^{\epsilon})$ space for some $\epsilon < 1$ using $\tilde{O}(\mathrm{poly} \log n)$ amortized time per data element, to...

281 | Fjording the stream: An architecture for queries over streaming sensor data.
- Madden, Franklin
- 2002
Citation Context ...distributed clickstream analyses, e.g., to track heavily accessed web pages as part of their real-time performance monitoring. There are several emerging applications in the area of sensor monitoring [16, 58] where a large number of sensors are distributed in the physical world and generate streams of data that need to be combined, monitored, and analyzed. The application domain that we use for more d...

270 | Continuously adaptive continuous queries over streams
- Madden, Shah, et al.
- 2002
Citation Context ... deal with append-only input data, they may provide approximate rather than exact answers, and their processing strategy may adapt as characteristics of the data streams change. The Telegraph project [8, 47, 58, 59] shares some target applications and basic technical ideas with a DSMS. Telegraph uses an adaptive query engine (based on the Eddy concept [8]) to process queries efficiently in volatile and unpredict...

269 | Maintaining stream statistics over sliding windows
- Datar, Gionis, et al.
Citation Context ...e buffered in memory, there are also theoretical challenges in designing algorithms that can give approximate answers using only the available memory. Some recent results in this vein can be found in [9, 26]. While existing work on sequence and temporal databases has addressed many of the issues involved in time-sensitive queries (a class that includes sliding window queries) in a relational database con...

263 | Multi-query optimization
- Sellis
- 1988
Citation Context ...ciently find the plan that, with the best memory allocation, minimizes approximation? Should plans be modified when conditions change? Even further, since synopses could be shared among query plans [75], how do we optimally consider a set of queries, which may be weighted by importance? In addition to memory management, we are faced with the problem of scheduling multiple query plans in a DSMS. The sch...

248 | Trajectory sampling for direct traffic observation.
- Duffield, Grossglauser
- 2000
Citation Context ...ventional DBMS technology does not adequately support such queries. Consider the network traffic management system of a large network, e.g., the backbone network of an Internet Service Provider (ISP) [30]. Such systems monitor a variety of continuous data streams that may be characterized as unpredictable and arriving at a high rate, including both packet traces and network performance measurements. T...

245 | Wavelet-based histograms for selectivity estimation.
- Matias, Vitter, et al.
- 1998
Citation Context ... the difference between the original signal and the dyadic interval with constant value. Recent papers have demonstrated the efficacy of wavelets for different tasks such as selectivity estimation [63], data cube approximation [93] and computing multi-dimensional aggregates [92]. This body of work indicates that estimates obtained from wavelets were more accurate than those obtained from histograms...

226 | An adaptive query execution system for data integration
- Ives, Florescu, et al.
- 1999
Citation Context ...klin [58] focus on query execution strategies over data streams generated by sensors, and Madden et al. [59] discuss adaptive processing techniques for multiple continuous queries. The Tukwila system [53] also supports adaptive query processing, in order to perform dynamic data integration over autonomous data sources. The Aurora project [16] is building a new data processing system targeted exclusi...

216 | Approximate query processing using wavelets.
- Chakrabarti, Garofalakis, et al.
- 2000
Citation Context ...imensional aggregates [92]. This body of work indicates that estimates obtained from wavelets were more accurate than those obtained from histograms with the same amount of memory. Chakrabarti et al. [17] propose the use of wavelets for general purpose approximate query processing and demonstrate how to compute joins, aggregations, and selections entirely in the wavelet coefficient domain. To extend t...

215 | Surfing wavelets on streams: One-pass summaries for approximate aggregate queries.
- Gilbert, Kotidis, et al.
- 2001
Citation Context ...roximate query answering. For example, recent work [27, 37] develops histogram-based techniques to provide approximate answers for correlated aggregate queries over data streams, and Gilbert et al. [40] present a general approach for building small-space summaries over data streams to provide approximate answers for many classes of aggregate queries. However, research problems abound in the area of a...

215 | Continuous queries over append-only databases
- Terry, Goldberg, et al.
- 1992
Citation Context ...anagement system (DBMS) and operate on it there. Traditional DBMS’s are not designed for rapid and continuous loading of individual data items, and they do not directly support the continuous queries [84] that are typical of data stream applications. Furthermore, it is recognized that both approximation [13] and adaptivity [8] are key ingredients in executing queries and performing other processing (e...

206 | Space-efficient online computation of quantile summaries
- Greenwald, Khanna
- 2001
Citation Context ...arallel database systems employ value range data partitioning that requires generation of quantiles or splitters that partition the data into approximately equal parts. Recently, Greenwald and Khanna [41] presented a single-pass deterministic algorithm for efficient computation of quantiles. Their algorithm needs $O(\frac{1}{\epsilon} \log(\epsilon N))$ space and guarantees a precision of $\epsilon N$. They employ a novel data structure that maintains a sample of the values seen so far (quantiles), along with a range of possible ranks that t...

198 | Approximate computation of multidimensional aggregates of sparse data using wavelets
- Vitter, Wang
- 1999
Citation Context ...ail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-based techniques to provide approximate answers...

186 | Processing complex aggregate queries over data streams.
- Dobra, Gehrke, et al.
- 2002
Citation Context ...35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-based techniques to provide approximate answers for correlated aggregate queries over data streams, and Gilbert et al. [40] present a general approach for building small-space summa...

184 | Updating derived relations: Detecting irrelevant and autonomously computable updates
- Blakeley, Coburn, et al.
- 1989
Citation Context ...ous queries, since materialized views are effectively queries that need to be reevaluated or incrementally updated whenever the base data changes. Of particular importance is work on self-maintenance [15, 45, 71], ensuring that enough data has been saved to maintain a view even when the base data is unavailable, and the related problem of data expiration [36], determining when certain base data can be discarded...

179 | Continual queries for internet scale event-driven information delivery
- Liu, Pu, et al.
- 1999
Citation Context ...estricted querying capability over network packet streams. The Tangram stream query processing system [68, 69] uses stream processing techniques to analyze large quantities of stored data. The OpenCQ [57] and NiagaraCQ [24] systems support continuous queries for monitoring persistent data sets spread over a wide-area network, e.g., web sites over the Internet. OpenCQ uses a query processing algorithm ...

173 | Computing on data streams.
- Henzinger, Raghavan, et al.
- 1998
Citation Context ...gorithm maintains a data structure which can be used to compute the value of the function on demand, and then the time required to process each such query also becomes of interest. Henzinger et al. [49] defined a similar model but also allowed the algorithm to make multiple passes over the stream data, making the number of passes itself a complexity measure. We will restrict our attention to algorit...

171 | Optimal histograms with quality guarantees
- Jagadish, Poosala, et al.
- 1998
Citation Context ...nt of such frequent items is related to Iceberg queries [32]. We give an overview of recent work on computing such histograms over data streams. V-Optimal Histograms over Data Streams: Jagadish et al. [54] showed how to compute optimal V-Optimal Histograms for a given data set using dynamic programming. The algorithm uses $O(N)$ space and requires $O(N^2 B)$ time, where $N$ is the size of the data se...

169 | Join synopses for approximate query answering.
- Acharya, Gibbons, et al.
- 1999
Citation Context ...community in recent years, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-bas...

157 | On random sampling over joins
- Chaudhuri, Motwani, et al.
- 1999
Citation Context ...community in recent years, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-bas...

155 | An efficient cost-driven index selection tool for Microsoft SQL Server.
- Chaudhuri, Narasayya
- 1997
Citation Context ...e good approximate answers to a broad range of possible future queries. The problem is similar in some ways to problems in physical database design such as selection of indexes and materialized views [23]. However, there is an important difference: in a traditional database system, when an index or view is lacking, it is possible to go to the underlying relation, albeit at an increased cost. In the da...

155 | On computing correlated aggregates over continual data streams.
- Gehrke, Korn, et al.
- 2001
Citation Context ...35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-based techniques to provide approximate answers for correlated aggregate queries over data streams, and Gilbert et al. [40] present a general approach for building small-space summa...

150 | Computing iceberg queries efficiently.
- Fang, Shivakumar, et al.
- 1998
Citation Context ... counts of items that occur with frequency above a threshold, and approximate the other counts by a uniform distribution. Maintaining the count of such frequent items is related to Iceberg queries [32]. We give an overview of recent work on computing such histograms over data streams. V-Optimal Histograms over Data Streams: Jagadish et al. [54] showed how to compute optimal V-Optimal Histograms for ...

150 | A taxonomy of time in databases.
- Snodgrass, Ahn
- 1985
Citation Context ...lications, continuous queries need to refer to the sequencing aspect of streams, particularly in the form of sliding windows over streams. Related work in this category also includes work on temporal [80] and time-series databases [31], where the ordering of tuples implied by time can be used in querying, indexing, and query optimization. The body of work on materialized views relates to continuous qu...

149 | Reductions in streaming algorithms, with an application to counting triangles in graphs.
- Bar-Yossef, Kumar, et al.
- 2002
Citation Context ... unary representation of the vector. It has … bit positions (elements), where … is the dimension of the underlying vector. A … in the … unary [Footnote 2] As discussed in Section 6.7, recently Bar-Yossef et al. [12] and Gibbons and Tirthapura [38] have devised algorithms which, under certain conditions, provide arbitrarily small approximation factors without recourse to perfect hash functions. [Footnote 3] Hash functions with four-way independence can be...

143 | Data-streams and histograms
- Guha, Koudas, et al.
- 2001
Citation Context ...rogramming. The algorithm uses $O(N)$ space and requires $O(N^2 B)$ time, where $N$ is the size of the data set and $B$ is the number of buckets. This is prohibitive for data streams. Guha, Koudas and Shim [43] adapted this algorithm to sorted data streams. Their algorithm constructs an arbitrarily-close V-Optimal Histogram (i.e., with error arbitrarily close to that of the optimal histogr...

139 | Rate-based query optimization for streaming information sources.
- Viglas, Naughton
- 2002
Citation Context ...ies for efficient evaluation. Within the NiagaraCQ project, Shanmugasundaram et al. [79] discuss the problem of supporting blocking operators in query plans over data streams, and Viglas and Naughton [89] propose rate-based optimization for queries over data streams, a new optimization methodology that is based on stream-arrival and data-processing rates. The Chronicle data model [55] introduced appen...

138 | Selection and sorting with limited storage.
- Munro, Paterson
- 1980
Citation Context ...o circumvent the negative results. Saks and Sun [73] provide space lower bounds for distance approximation between two vectors under the $L_p$ norm, for $p \geq 3$, in the data stream model. Munro and Paterson [66] showed that any algorithm that computes quantiles exactly in $p$ passes requires $\Omega(N^{1/p})$ space. Space lower bounds for maintaining simple statistics like count, sum, min/max, and number of distinct values under the sliding windows mo...

138 | Xjoin: A reactively-scheduled pipelined join operator.
- Urhan, Franklin
- 2000
Citation Context ... be able to handle the average stream rate quite comfortably by buffering the streams when their rate is high and catching up during the slow periods. This is the approach used in the XJoin algorithm [88]. Sampling: In the second scenario, computeAnswer may be fast, but the update operation is slow: it takes longer than the average inter-arrival time of the data elements. It is futile to attempt to make...

132 | A First Course in Database Systems
- Ullman, Widom
- 1997
Citation Context ...nal example, $Q_3$, is a continuous query for monitoring the source-destination pairs in the top 5 percent in terms of backbone traffic. For ease of exposition, we employ the WITH construct from SQL99 [87]. $Q_3$: WITH Load AS (SELECT src, dest, sum(len) AS traffic FROM … GROUP BY src, dest) SELECT src, dest, traffic FROM Load AS … WHERE (SELECT count(*) FROM Load AS … WHERE ….t...

131 | Making views self-maintainable for data warehousing
- Quass, Gupta, et al.
- 1996
Citation Context ...ous queries, since materialized views are effectively queries that need to be reevaluated or incrementally updated whenever the base data changes. Of particular importance is work on self-maintenance [15, 45, 71], ensuring that enough data has been saved to maintain a view even when the base data is unavailable, and the related problem of data expiration [36], determining when certain base data can be discarded...

128 | Sampling-based estimation of the number of distinct values of an attribute.
- Haas, Naughton, et al.
- 1995
Citation Context ...fficiently estimating the number of distinct values ($F_0$) has received particular attention in the database literature, particularly in the context of using single pass or random sampling algorithms [18, 46]. A sketching technique to compute $F_0$ was presented earlier by Flajolet and Martin [35]; however, this had the drawback of requiring explicit families of hash functions with very strong independence ...

126 | Random sampling for histogram construction: How much is enough?
- Chaudhuri, Motwani, et al.
- 1998
Citation Context ...as the error for the combined quantile does not exceed $\epsilon N$. This algorithm improves upon the previous set of results by Manku, Rajagopalan, and Lindsay [61, 62] and Chaudhuri, Motwani, and Narasayya [21]. End-Biased Histograms and Iceberg Queries: Many applications maintain simple aggregate...

124 | Approximate medians and other quantiles in one pass and with limited memory
- Manku, Rajagopalan, et al.
- 1998
Citation Context ... merge quantiles with “similar” errors so long as the error for the combined quantile does not exceed $\epsilon N$. This algorithm improves upon the previous set of results by Manku, Rajagopalan, and Lindsay [61, 62] and Chaudhuri, Motwani, and Narasayya [21]. End-Biased Histograms and Iceberg Queries ...

123 | Tracking join and self-join sizes in limited storage.
- Alon, Gibbons, et al.
- 1999
Citation Context ...uses only $O(\log N)$ space and provides arbitrarily small approximation factors. This technique has found many applications in the database literature, including join size estimation [4], estimating $L_1$ norm of vectors [33], and processing complex aggregate queries over multiple streams [27, 37]. It remains an open problem to come up with techniques to maintain correlated aggregates [37]...

122 | Sampling from a moving window over streaming data.
- Babcock, Datar, et al.
- 2002
Citation Context ...ion of the sketching-based algorithms to the sliding windows model. They also provide space lower bounds for various problems in the sliding windows model. In another paper, Babcock, Datar and Motwani [9] adapt the reservoir sampling algorithm to the sliding windows case. In their paper for computing Iceberg queries over data streams, Manku and Motwani [60] also present techniques to adapt their algor...

110 | Estimating simple functions on the union of data streams.
- Gibbons, Tirthapura
- 2001
Citation Context ...tor. It has … bit positions (elements), where … is the dimension of the underlying vector. A … in the … unary [Footnote 2] As discussed in Section 6.7, recently Bar-Yossef et al. [12] and Gibbons and Tirthapura [38] have devised algorithms which, under certain conditions, provide arbitrarily small approximation factors without recourse to perfect hash functions. [Footnote 3] Hash functions with four-way independence can be...

107 | Fast, small-space algorithms for approximate histogram maintenance (STOC)
- Gilbert, Guha, et al.
- 2002
Citation Context ...e V-Optimal Histogram (i.e., with error arbitrarily close to that of the optimal histogram), using $O(B^2 \log N)$ space and $O(B^2 \log N)$ time per data element. In a recent paper, Gilbert et al. [39] removed the restriction that the data stream be sorted, providing algorithms based on the sketching technique described earlier for computing $L_2$ norms. The idea is to view each data element as an u...

106 | An approximate L1-difference algorithm for massive data streams
- Feigenbaum, Kannan, et al.
Citation Context ...e and provides arbitrarily small approximation factors. This technique has found many applications in the database literature, including join size estimation [4], estimating $L_1$ norm of vectors [33], and processing complex aggregate queries over multiple streams [27, 37]. It remains an open problem to come up with techniques to maintain correlated aggregates [37] that have provable guarantees. T...

106 | Data cube approximation and histograms via wavelets.
- Vitter, Wang, et al.
- 1998
Citation Context ...iginal signal and the dyadic interval with constant value. Recent papers have demonstrated the efficacy of wavelets for different tasks such as selectivity estimation [63], data cube approximation [93] and computing multi-dimensional aggregates [92]. This body of work indicates that estimates obtained from wavelets were more accurate than those obtained from histograms with the same amount of memor...

105 | The design and implementation of a sequence database system.
- Seshadri, Livny, et al.
- 1996
Citation Context ...aintained incrementally without storing any of the chronicles. An algebra and a declarative query language for querying ordered relations (sequences) was proposed by Seshadri, Livny, and Ramakrishnan [76, 77, 78]. In many applications, continuous queries need to refer to the sequencing aspect of streams, particularly in the form of sliding windows over streams. Related work in this category also includes work...

101 | Random sampling techniques for space efficient online computation of order statistics of large datasets.
- Manku, Rajagopalan, et al.
- 1999
Citation Context ... merge quantiles with “similar” errors so long as the error for the combined quantile does not exceed $\epsilon N$. This algorithm improves upon the previous set of results by Manku, Rajagopalan, and Lindsay [61, 62] and Chaudhuri, Motwani, and Narasayya [21]. End-Biased Histograms and Iceberg Queries ...

99 | Characterizing memory requirements for queries over continuous data streams.
- Arasu, Babcock, et al.
- 2002
Citation Context ... algorithm will not be able to keep pace with the data stream. For this reason, we are interested in algorithms that are able to confine themselves to main memory without accessing disk. Arasu et al. [7] took some initial steps towards distinguishing between queries that can be answered exactly using a given bounded amount of memory and queries that must be approximated unless disk accesses are allow...

98 | Congressional samples for approximate answering of group-by queries.
- Acharya, Gibbons, et al.
- 2000
Citation Context ...community in recent years, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-bas...

97 | Adaptive query processing: Technology in evolution.
- Hellerstein, Franklin
- 2000

96 | Histogram-based approximation of set-valued query-answers.
- Ioannidis, Poosala
- 1999
Citation Context ...rs, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-based techniques to prov...

92 | Dynamic maintenance of wavelet-based histograms.
- Matias, Vitter, et al.
- 2000
(Show Context)
Citation Context ...icient domain. To extend this body of work to data streams, it becomes important to devise techniques for computing wavelets in the streaming model. In a related development, Matias, Vitter, and Wang =-=[64]-=- show how to dynamically maintain the top wavelet coefficients efficiently as the underlying data distribution is updated. There has been recent work in computing the top wavelet coefficients in the d... |

88 | Data integration using self-maintainable views.
- Gupta, Jagadish, et al.
- 1996
(Show Context)
Citation Context ...ous queries, since materialized views are effectively queries that need to be reevaluated or incrementally updated whenever the base data changes. Of particular importance is work on self-maintenance =-=[15, 45, 71]-=-—ensuring that enough data has been saved to maintain a view even when the base data is unavailable—and the related problem of data expiration [36]— determining when certain base data can be discarded... |

86 | Towards estimation error guarantees for distinct values.
- Charikar, Chaudhuri, et al.
- 2000
(Show Context)
Citation Context ...fficiently estimating the number of distinct values ($F_0$) has received particular attention in the database literature, particularly in the context of using single pass or random sampling algorithms =-=[18, 46]-=-. A sketching technique to compute $F_0$ was presented earlier by Flajolet and Martin [35]; however, this had the drawback of requiring explicit families of hash functions with very strong independence ... |

83 | Probabilistic counting.
- Flajolet, Martin
- 1983
(Show Context)
Citation Context ...h area in the algorithms community in recent years, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches =-=[5, 35]-=-, random sampling [1, 2, 22], histograms [51, 70], and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27... |

70 | Alert: An architecture for transforming a passive DBMS into an active DBMS.
- Schreier, Pirahesh, et al.
- 1991
(Show Context)
Citation Context ... email and bulletin board messages. A restricted subset of SQL was used as the query language in order to provide guarantees about efficient evaluation and append-only query results. The Alert system =-=[74]-=- provides a mechanism for implementing event-condition-action style triggers in a conventional SQL database, by using continuous queries defined over special append-only active tables. The XFilter con... |

69 | Monitoring XML data on the web.
- Nguyen, Abiteboul, et al.
- 2001
(Show Context)
Citation Context ...y active tables. The XFilter content-based filtering system [6] performs efficient filtering of XML documents based on user profiles expressed as continuous queries in the XPath language [94]. Xyleme =-=[67]-=- is a similar content-based filtering system that enables very high throughput with a restricted query language. The Tribeca stream database manager [83] provides restricted querying capability over n... |

66 | Hancock: a language for extracting signatures from data streams.
- Cortes, Fisher, et al.
- 2000
(Show Context)
Citation Context ...from the query processing architecture, user and application interfaces need to be reinvestigated in a DSMS given the dynamic environment in which it operates. Systems such as Aurora [16] and Hancock =-=[25]-=- completely eliminate declarative querying and provide only procedural mechanisms for querying. In contrast, we will provide a declarative language for continuous queries, similar to SQL but extended ... |

66 | Expiring data in a warehouse
- Garcia-Molina, Labio, et al.
- 1998
(Show Context)
Citation Context ...cular importance is work on self-maintenance [15, 45, 71]—ensuring that enough data has been saved to maintain a view even when the base data is unavailable—and the related problem of data expiration =-=[36]-=-— determining when certain base data can be discarded without compromising the ability to maintain a view. Nevertheless, several differences exist between materialized views and continuous queries in ... |

66 | View maintenance issues for the Chronicle data model.
- Jagadish, Mumick, et al.
- 1995
(Show Context)
Citation Context ...las and Naughton [89] propose rate-based optimization for queries over data streams, a new optimization methodology that is based on stream-arrival and data-processing rates. The Chronicle data model =-=[55]-=- introduced append-only ordered sequences of tuples (chronicles), a form of data streams. They defined a restricted view definition language and algebra (chronicle algebra) that operates over chronicl... |

66 | Seq: A model for sequence databases.
- Seshadri, Livny, et al.
- 1995
(Show Context)
Citation Context ...aintained incrementally without storing any of the chronicles. An algebra and a declarative query language for querying ordered relations (sequences) was proposed by Seshadri, Livny, and Ramakrishnan =-=[76, 77, 78]-=-. In many applications, continuous queries need to refer to the sequencing aspect of streams, particularly in the form of sliding windows over streams. Related work in this category also includes work... |

65 | Space lower bounds for distance approximation in the data stream model.
- Saks, Sun
- 2002
(Show Context)
Citation Context ...eminder that while it may be possible to prove strong space lower bounds for stream computations, considerations from applications sometimes enable us to circumvent the negative results. Saks and Sun =-=[73]-=- provide space lower bounds for distance approximation between two vectors under � � the norm, for � , in the data stream model. Munro and Paterson [66] showed that any algorithm that ��� computes qua... |

65 | Sequence query processing.
- Seshadri, Livny, et al.
- 1994
(Show Context)
Citation Context ...aintained incrementally without storing any of the chronicles. An algebra and a declarative query language for querying ordered relations (sequences) was proposed by Seshadri, Livny, and Ramakrishnan =-=[76, 77, 78]-=-. In many applications, continuous queries need to refer to the sequencing aspect of streams, particularly in the form of sliding windows over streams. Related work in this category also includes work... |

58 | Sampling algorithms: Lower bounds and applications.
- Bar-Yossef, Kumar, et al.
- 2001
(Show Context)
Citation Context ...mber of distinct values under the sliding windows model can be found in the work of Datar et al. [26]. A general lower bound technique for sampling-based algorithms was presented by Bar-Yossef et al. =-=[11]-=-. It is useful for deriving space lower bounds for data stream algorithms that resort to oblivious sampling. It remains an interesting open problem to obtain similar general lower bound techniques for... |

58 | A robust, optimization-based approach for approximate answering of aggregate queries.
- Chaudhuri, Das, et al.
- 2001
(Show Context)
Citation Context ...le of the base relation. Recently stratified sampling has been proposed as an alternative to uniform sampling to reduce error due to the variance in data and also to reduce error for group-by queries =-=[1, 19]-=-. To actually compute a random sample over a data stream is relatively easy. The reservoir sampling algorithm of Vitter [90] makes one pass over the data set and is well suited for the data stream mod... |
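The one-pass sampling idea mentioned in this context can be sketched as follows. This is a minimal illustration of the classic reservoir technique (often called Algorithm R), not code from the cited paper; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to k items from a one-pass stream.

    Hypothetical helper: the name and signature are assumptions made for
    this illustration.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The sample uses memory proportional to `k` regardless of the stream length, which is why the technique fits the data stream setting: the stream never has to be stored or revisited.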

57 | Approximating a data stream for querying and estimation: Algorithms and performance evaluation.
- Guha, Koudas
- 2002
(Show Context)
Citation Context ...ing windows case. In their paper for computing Iceberg queries over data streams, Manku and Motwani [60] also present techniques to adapt their algorithms to the sliding window model. Guha and Koudas =-=[42]-=- have adapted their earlier paper [43], to provide a technique for maintaining V-Optimal Histograms over sorted data streams for the sliding window model; however, they require the buffering of all th... |

42 | Tribeca: A stream database manager for network traffic analysis.
- Sullivan
- 1996
(Show Context)
Citation Context ...ous queries in the XPath language [94]. Xyleme [67] is a similar content-based filtering system that enables very high throughput with a restricted query language. The Tribeca stream database manager =-=[83]-=- provides restricted querying capability over network packet streams. The Tangram stream query processing system [68, 69] uses stream processing techniques to analyze large quantities of stored data. ... |

41 | Online dynamic reordering for interactive data processing.
- Raman, Raman, et al.
- 1999
(Show Context)
Citation Context ...handling blocking operators as interior nodes in a query tree is to replace them with non-blocking analogs that perform approximately the same task. An example of this approach is the juggle operator =-=[72]-=-, which is a non-blocking version of sort: it aims to locally reorder a data stream so that tuples that come earlier in the desired sort order are produced before tuples that come later in the sort or... |

40 | Fast approximate answers to aggregate queries on a data cube.
- Poosala, Ganti
- 1999
(Show Context)
Citation Context ...rs, as discussed in detail in Section 6. This work has led to some general techniques for data reduction and synopsis construction, including: sketches [5, 35], random sampling [1, 2, 22], histograms =-=[51, 70]-=-, and wavelets [17, 92]. Based on these summarization techniques, we have seen some work on approximate query answering. For example, recent work [27, 37] develops histogram-based techniques to prov... |

35 | Counting inversions in a data stream.
- Ajtai, Jayram, et al.
- 2001
(Show Context)
Citation Context ...tedness Measuring the “sortedness” of a data stream could be useful in some applications; for example, it is useful in determining the choice of a sort algorithm for the underlying data. Ajtai et al. =-=[3]-=- have studied the problem of estimating the number of inversions (a measure of sortedness) in a permutation to � within a factor ������� , ����������� where the permutation is presented in a data stre... |

34 | Architecting a network query engine for producing partial results.
- Shanmugasundaram, Tufte, et al.
- 2000
(Show Context)
Citation Context ...nce, while NiagaraCQ addresses scalability in number of queries by proposing techniques for grouping continuous queries for efficient evaluation. Within the NiagaraCQ project, Shanmugasundaram et al. =-=[79]-=- discuss the problem of supporting blocking operators in query plans over data streams, and Viglas and Naughton [89] propose rate-based optimization for queries over data streams, a new optimization m... |

31 | Testing and spot checking of data streams.
- Feigenbaum, Kannan, et al.
- 2000
(Show Context)
Citation Context ...seful for analyzing large graphical structures such as the web graph. Property Testing Feigenbaum et al. =-=[34]-=- introduced the concept of streaming property testers and streaming spot checkers. These are programs that make one pass over the data and using small space verify if the data satisfies certain proper... |

22 | Adaptive query processing: Technology in evolution.
- Hellerstein, Franklin, et al.
- 2000
(Show Context)
Citation Context ... deal with append-only input data, they may provide approximate rather than exact answers, and their processing strategy may adapt as characteristics of the data streams change. The Telegraph project =-=[8, 47, 58, 59]-=- shares some target applications and basic technical ideas with a DSMS. Telegraph uses an adaptive query engine (based on the Eddy concept [8]) to process queries efficiently in volatile and unpredict... |

16 | The New Jersey data reduction report.
- Barbara, et al.
- 1997
(Show Context)
Citation Context ...uous loading of individual data items, and they do not directly support the continuous queries [84] that are typical of data stream applications. Furthermore, it is recognized that both approximation =-=[13]-=- and adaptivity [8] are key ingredients in executing queries and performing other processing (e.g., data analysis and mining) over rapid data streams, while traditional DBMS’s focus largely on the opp... |

15 | Fast, small-space algorithms for approximate histogram maintenance.
- Gilbert, Guha, et al.
- 2002
(Show Context)
Citation Context ...a Streams Jagadish et al. [54] showed how to compute optimal V-Optimal Histograms for a given data set using dynamic programming. The algorithm uses $O(N)$ space and requires $O(N^2 B)$ time, where $N$ is the size of the data set and $B$ is the number of buckets. This is prohibitive for data streams. Guha, Koudas and Shim [43] adapted this algorithm to sorted data streams. Their algorithm constructs an arbitrarily-close V-Optimal Histogram (i.e., with error arbitrarily close to that of the optimal histogram), using $O(B^2 \log N)$ space and $O(B^2 \log N)$ time per data element. In a recent paper, Gilbert et al. [39] removed the restriction that the data stream be sorted, providing algorithms based on the sketching technique described earlier for computing $L_2$ norms. The idea is to view each data element as an update to an underlying vector of length $N$ that we are trying to approximate using the best $B$-bucket histogram. The time to process a data element, the time to reconstruct the histogram, and the size of the sketch are each bounded by $\mathrm{poly}(B, \log N, 1/\epsilon)$, where $\epsilon$ is the relative error we are willing to tolerate. Their algorithm proceeds by first constructing a robust approximation to the underlying “si... |
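The offline dynamic program referred to in this context can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' code: the function name `v_optimal_error` is hypothetical, each bucket is represented by the mean of its values, and the quantity minimized is the total squared error. It returns only the optimal error, which lets it keep two rows of the table (linear space) while spending quadratic time per bucket.

```python
from typing import List, Sequence


def v_optimal_error(values: Sequence[float], num_buckets: int) -> float:
    """Minimum total squared error of a histogram with at most num_buckets buckets.

    Hypothetical helper for illustration; buckets are contiguous runs of
    `values`, each approximated by its mean.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    n = len(values)
    if n == 0:
        return 0.0

    # Prefix sums of values and squared values give O(1) bucket error lookups.
    prefix: List[float] = [0.0] * (n + 1)
    prefix_sq: List[float] = [0.0] * (n + 1)
    for i, v in enumerate(values):
        prefix[i + 1] = prefix[i] + v
        prefix_sq[i + 1] = prefix_sq[i] + v * v

    def sse(i: int, j: int) -> float:
        """Squared error of one bucket covering values[i:j], with j > i."""
        s = prefix[j] - prefix[i]
        return max(0.0, (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i))

    inf = float("inf")
    # prev[j]: best error for the first j values using the buckets allowed so far.
    prev = [0.0] + [inf] * n
    for _ in range(num_buckets):
        cur = [0.0] + [inf] * n
        for j in range(1, n + 1):
            # The last bucket covers values[i:j]; earlier values use prev[i].
            cur[j] = min(prev[i] + sse(i, j) for i in range(j))
        prev = cur
    return prev[n]
```

Recovering the bucket boundaries as well would require storing a back-pointer per table cell, which raises the space used by this sketch to the order of $N B$.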

11 | Enhancing relational operators for querying over punctuated data streams.
- Tucker, Maier, et al.
- 2002
(Show Context)
Citation Context ... problem is how to extend this work to other types of blocking operators, as well as to quantify the error that is introduced by approximating blocking operators with non-blocking ones. Tucker et al. =-=[86]-=- have proposed a different approach to blocking operators. They suggest augmenting data streams with assertions about what can and cannot appear in the remainder of the data stream. These assertions, ... |

10 | Approximate frequency counts over streaming data.
- Manku, Motwani
- 2002
(Show Context)
Citation Context ...ve an efficient algorithm to compute Iceberg queries over disk-resident data. Their algorithm requires multiple passes which is not suited to the streaming model. In a recent paper, Manku and Motwani =-=[60]-=- presented randomized and deterministic algorithms for frequency counting and iceberg queries over data streams. The randomized algorithm uses adaptive sampling and the main idea is that any item whic... |
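A deterministic one-pass frequency counter in the spirit of this context can be sketched as follows. This follows the scheme commonly known as lossy counting as I understand it; the function names `lossy_count` and `frequent_items`, the parameter names, and the dictionary layout are assumptions made for this illustration rather than details taken from the cited paper.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple


def lossy_count(
    stream: Iterable[Hashable], epsilon: float
) -> Tuple[Dict[Hashable, List[int]], int]:
    """One-pass approximate counts; each count is low by at most epsilon * n.

    Hypothetical helper. Returns (entries, n), where entries maps an item to
    [count, max_undercount] and n is the number of items seen.
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    width = math.ceil(1 / epsilon)  # items per bucket
    entries: Dict[Hashable, List[int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = (n + width - 1) // width  # id of the current bucket, from 1
        if item in entries:
            entries[item][0] += 1
        else:
            # A new item may have been seen and pruned in earlier buckets.
            entries[item] = [1, bucket - 1]
        if n % width == 0:
            # At a bucket boundary, drop items that cannot be frequent.
            stale = [k for k, (f, d) in entries.items() if f + d <= bucket]
            for k in stale:
                del entries[k]
    return entries, n


def frequent_items(
    entries: Dict[Hashable, List[int]], n: int, support: float, epsilon: float
) -> List[Hashable]:
    """Items whose recorded count reaches (support - epsilon) * n."""
    return [k for k, (f, _) in entries.items() if f >= (support - epsilon) * n]
```

For example, `frequent_items(*lossy_count(stream, 0.01), support=0.05, epsilon=0.01)` would report every item occurring in at least 5% of the stream, possibly along with some items whose true frequency lies between 4% and 5%.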

10 | The Tangram stream query processing system
- Parker, Muntz, et al.
- 1989
(Show Context)
Citation Context ...h throughput with a restricted query language. The Tribeca stream database manager [83] provides restricted querying capability over network packet streams. The Tangram stream query processing system =-=[68, 69]-=- uses stream processing techniques to analyze large quantities of stored data. The OpenCQ [57] and NiagaraCQ [24] systems support continuous queries for monitoring persistent data sets spread over a w... |

8 | Mining time-changing data streams.
- Domingos, Hulten, et al.
- 2001
(Show Context)
Citation Context ...s context. 6.7 Miscellaneous In this section, we give a potpourri of algorithmic results for data streams. Data Mining Decision trees are another form of synopsis used for prediction. Domingos et al. =-=[28, 29]-=- have studied the problem of maintaining decision trees over data streams. Clustering is yet another way to summarize data. Consider the $k$-median formulation for clustering: Given $n$ data points in a ... |

6 | Monitoring streams – a new class of DBMS applications.
- Carney, Cetintemel, et al.
(Show Context)
Citation Context ...distributed clickstream analyses, e.g., to track heavily accessed web pages as part of their real-time performance monitoring. There are several emerging applications in the area of sensor monitoring =-=[16, 58]-=- where a large number of sensors are distributed in the physical world and generate streams of data that need to be combined, monitored, and analyzed. The application domain that we use for more d... |

5 | On sampling and relational operators.
- Chaudhuri, Motwani
- 1999
(Show Context)
Citation Context ...ate answer, but in some cases one can give confidence bounds on the degree of error introduced by the sampling process [48]. Unfortunately, for many situations (including most queries involving joins =-=[20, 22]-=-), sampling-based approaches cannot give reliable approximation guarantees. Designing sampling-based algorithms that can produce approximate answers that are provably close to the exact answer is an i... |

4 | SVP: A model capturing sets, lists, streams, and parallelism
- Parker, Simon, et al.
- 1992
(Show Context)
Citation Context ...h throughput with a restricted query language. The Tribeca stream database manager [83] provides restricted querying capability over network packet streams. The Tangram stream query processing system =-=[68, 69]-=- uses stream processing techniques to analyze large quantities of stored data. The OpenCQ [57] and NiagaraCQ [24] systems support continuous queries for monitoring persistent data sets spread over a w... |

4 | A First Course in Database Systems. Prentice-Hall International
- Ullman, Widom
- 1997 |

2 | Analytic functions in Oracle 8i. Available at http://www-db.stanford.edu/dbseminar/Archive/SpringY2000/speakers/agupta/paper.pdf
- Bellamkonda, Bozkaya, et al.
(Show Context)
Citation Context ...nd the expressiveness of the query language for sliding windows. It is possible to formulate sliding window queries in SQL by referring to timestamps explicitly, but it is often quite awkward. SQL-99 =-=[14, 81]-=- introduces analytical functions that partially address the shortcomings of SQL for expressing sliding window queries by allowing the specification of moving averages and other aggregation operations ... |
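The moving-average style of sliding-window query mentioned in this context can be sketched as follows. This is a minimal illustration that assumes a count-based window (the last `window` tuples) rather than a timestamp-based one; the function name `moving_average` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the most recent `window` values after each arrival.

    Hypothetical helper for illustration; the window is count-based.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buffer: Deque[float] = deque()
    total = 0.0
    for value in stream:
        buffer.append(value)
        total += value
        if len(buffer) > window:
            # Expire the oldest value once the window is full.
            total -= buffer.popleft()
        yield total / len(buffer)
```

In SQL, a comparable result is usually written with a window frame clause such as `AVG(x) OVER (ORDER BY ts ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)`, where `x` and `ts` stand for assumed column names; the generator above corresponds to `window=3` in that case.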

2 | On-line analytical processing (SQL/OLAP). Available from http://www.ansi.org/, document #ISO/IEC
- Standard
(Show Context)
Citation Context ...nd the expressiveness of the query language for sliding windows. It is possible to formulate sliding window queries in SQL by referring to timestamps explicitly, but it is often quite awkward. SQL-99 =-=[14, 81]-=- introduces analytical functions that partially address the shortcomings of SQL for expressing sliding window queries by allowing the specification of moving averages and other aggregation operations ... |

1 | The New Jersey data reduction report.
- B
- 1997 |