  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time (2002) [90 citations — 7 self]

by Yunyue Zhu, Dennis Shasha
In VLDB
http://www.cs.nyu.edu/csweb/Research/TechReports/TR2002-827/TR2002-827.ps.gz

Abstract:

Consider the problem of monitoring tens of thousands of time series data streams in an online fashion and making decisions based on them. In addition to single-stream statistics such as average and standard deviation, we also want to find high correlations among all pairs of streams. A stock market trader might use such a tool to spot arbitrage opportunities. This paper proposes efficient methods for solving this problem based on Discrete Fourier Transforms and a three-level time interval hierarchy. Extensive experiments on synthetic data and real-world financial trading data show that our algorithm beats the direct
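
The link between pairwise correlation and the Discrete Fourier Transform mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names (`normalize`, `correlation`, `dft_coefficients`, `correlation_upper_bound`) and the parameter `k` are hypothetical, the DFT is computed naively, and the three-level time interval hierarchy is not modeled. The sketch relies only on two standard facts: for windows normalized to zero mean and unit norm, correlation equals 1 - d^2/2 where d is their Euclidean distance, and an orthonormal DFT preserves that distance, so a few coefficients give a lower bound on d and therefore an upper bound on the correlation.

```python
import cmath
import math


def normalize(window):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("constant window has undefined correlation")
    return [c / norm for c in centered]


def correlation(x, y):
    """Pearson correlation of two equal-length windows (direct computation)."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def dft_coefficients(values, k):
    """Coefficients 1..k of the orthonormal DFT, computed naively in O(n*k)."""
    n = len(values)
    scale = 1.0 / math.sqrt(n)
    return [
        scale * sum(v * cmath.exp(-2j * math.pi * f * t / n)
                    for t, v in enumerate(values))
        for f in range(1, k + 1)
    ]


def correlation_upper_bound(x, y, k):
    """Upper bound on correlation(x, y) using only k DFT coefficients each."""
    n = len(x)
    if len(y) != n or 2 * k >= n:
        raise ValueError("need equal-length windows and 2*k < window length")
    fx = dft_coefficients(normalize(x), k)
    fy = dft_coefficients(normalize(y), k)
    # For real input, coefficient n-f is the conjugate of coefficient f,
    # so each kept squared difference contributes twice to d^2.
    partial_sq_dist = 2.0 * sum(abs(a - b) ** 2 for a, b in zip(fx, fy))
    # partial_sq_dist <= d^2, hence 1 - partial/2 >= 1 - d^2/2 = correlation.
    return 1.0 - partial_sq_dist / 2.0


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 1, 4, 3, 6, 5, 8, 7]
    print(correlation(x, y), correlation_upper_bound(x, y, 2))
```

In a screening setting, a pair whose upper bound falls below the chosen correlation threshold can be discarded without computing the exact correlation; pairs that pass still need the direct check.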

Citations

311 Efficient similarity search in sequence databases – Agrawal, Faloutsos, et al. - 1993
309 Fast subsequence matching in time-series databases – Faloutsos, et al. - 1994
172 Clustering Data Streams – Guha, Motwani, et al. - 2000
161 Mining high-speed data streams – Domingos, Hulten - 2000
158 Continuous queries over data streams – Babu, Widom
140 Maintaining stream statistics over sliding windows – Datar, Gionis, et al. - 2002
135 Surfing wavelets on streams: One-pass summaries for approximate aggregate queries – Gilbert, Kotidis, et al. - 2001
130 Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases – Keogh, Chakrabarti, et al. - 2001
127 Efficient time-series matching by wavelets – Chan, Fu - 1999
109 On similarity-based queries for time-series data – Rafiei - 1999
108 On computing correlated aggregates over continual data streams – Gehrke, Korn, et al. - 2001
105 Space-efficient online computation of quantile summaries – Greenwald, Khanna - 2001
94 Fast time sequence indexing for arbitrary Lp norms – Yi, Faloutsos - 2000
84 Approximate medians and other quantiles in one pass and with limited memory – Manku, Rajagopalan, et al. - 1998
79 On similarity queries for time-series data: constraint specification and implementation – Goldin, Kanellakis - 1995
74 Efficiently supporting ad hoc queries in large datasets of time sequences – Korn, Jagadish, et al. - 1997
73 A probabilistic approach to fast pattern matching in time series databases – Keogh, Smyth - 1997
72 Optimal expected-time algorithms for closest-point problems – Bentley, Weide, et al. - 1980
68 Online association rule mining – Hidber - 1999
66 Random sampling techniques for space efficient online computation of order statistics of large datasets – Manku, Rajagopalan, et al. - 1999
44 Hancock: a language for extracting signatures from data streams – Cortes, Fisher, et al. - 2000
39 Similarity search over time-series data using wavelets – Popivanov, Miller - 2002
39 Online data mining for co-evolving time sequences – Yi, Sidiropoulos, et al. - 2000
38 Efficient retrieval of similar time sequences using DFT – Rafiei, Mendelzon - 1998
37 A comparison of dft and dwt based similarity search in time-series databases – Wu, Agrawal, et al. - 2000
16 HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences – Li, Yu, et al. - 1996
10 Efficiently supporting ad hoc queries in large datasets of time sequences – Korn, Jagadish, et al. - 1997
9 Random sampling techniques for space efficient online computation of order statistics of large datasets – Manku, Rajagopalan, et al. - 1999
5 Demon: Data evolution and monitoring – Ganti, Gehrke, et al. - 2000
4 Managing financial time series data: Object-relational and object database systems – Molesky, Caruso - 1998
4 Similarity-based queries for time series data – Rafiei, Mendelzon - 1997
2 Space-efficient online computation of quantile summaries – Greenwald, Khanna - 2001
1 Our method for computing correlation coefficients can be applied to the interpolation of values missing from one data stream. We exploit the fact that high-valued correlation coefficients imply high-valued regression coefficients. We can choose streams that are hi – com
1 Managing time series data: Object-relational and object database systems – Molesky, Caruso