Consider the problem of monitoring tens of thousands of time series data streams in an online fashion and making decisions based on them. In addition to single-stream statistics such as average and standard deviation, we also want to find high correlations among all pairs of streams. A stock market trader might use such a tool to spot arbitrage opportunities. This paper proposes efficient methods for solving this problem based on Discrete Fourier Transforms and a three-level time interval hierarchy. Extensive experiments on synthetic data and real world financial trading data show that our algorithm beats the direct
|
311
|
Efficient similarity search in sequence databases
– Agrawal, Faloutsos, et al.
- 1993
|
|
309
|
Fast subsequence matching in time-series databases
– Faloutsos, et al.
- 1994
|
|
172
|
Clustering Data Streams
– Guha, Motwani, et al.
- 2000
|
|
161
|
Mining high-speed data streams
– Domingos, Hulten
- 2000
|
|
158
|
Continuous queries over data streams
– Babu, Widom
|
|
140
|
Maintaining stream statistics over sliding windows
– Datar, Gionis, et al.
- 2002
|
|
135
|
Surfing wavelets on streams: One-pass summaries for approximate aggregate queries
– Gilbert, Kotidis, et al.
- 2001
|
|
130
|
Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
– Keogh, Chakrabarti, et al.
- 2001
|
|
127
|
Efficient time-series matching by wavelets
– Chan, Fu
- 1999
|
|
109
|
On similarity-based queries for time-series data
– Rafiei
- 1999
|
|
108
|
On computing correlated aggregates over continual data streams
– Gehrke, Korn, et al.
- 2001
|
|
105
|
Space-efficient online computation of quantile summaries
– Greenwald, Khanna
- 2001
|
|
94
|
Fast time sequence indexing for arbitrary Lp norms
– Yi, Faloutsos
- 2000
|
|
84
|
Approximate medians and other quantiles in one pass and with limited memory
– Manku, Rajagopalan, et al.
- 1998
|
|
79
|
On similarity queries for time-series data: constraint specification and implementation
– Goldin, Kanellakis
- 1995
|
|
74
|
Efficiently supporting ad hoc queries in large datasets of time sequences
– Korn, Jagadish, et al.
- 1997
|
|
73
|
A probabilistic approach to fast pattern matching in time series databases
– Keogh, Smyth
- 1997
|
|
72
|
Optimal expected-time algorithms for closest-point problems
– Bentley, Weide, et al.
- 1980
|
|
68
|
Online association rule mining
– Hidber
- 1999
|
|
66
|
Random sampling techniques for space efficient online computation of order statistics of large datasets
– Manku, Rajagopalan, et al.
- 1999
|
|
44
|
Hancock: a language for extracting signatures from data streams
– Cortes, Fisher, et al.
- 2000
|
|
39
|
Similarity search over time-series data using wavelets
– Popivanov, Miller
- 2002
|
|
39
|
Online data mining for co-evolving time sequences
– Yi, Sidiropoulos, et al.
- 2000
|
|
38
|
Efficient retrieval of similar time sequences using DFT
– Rafiei, Mendelzon
- 1998
|
|
37
|
A comparison of DFT and DWT based similarity search in time-series databases
– Wu, Agrawal, et al.
- 2000
|
|
16
|
HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences
– Li, Yu, et al.
- 1996
|
|
10
|
Efficiently supporting ad hoc queries in large datasets of time sequences
– Korn, Jagadish, et al.
- 1997
|
|
9
|
Random sampling techniques for space efficient online computation of order statistics of large datasets
– Manku, Rajagopalan, et al.
- 1999
|
|
5
|
Demon: Data evolution and monitoring
– Ganti, Gehrke, et al.
- 2000
|
|
4
|
Managing financial time series data: Object-relational and object database systems
– Molesky, Caruso
- 1998
|
|
4
|
Similarity-based queries for time series data
– Rafiei, Mendelzon
- 1997
|
|
2
|
Space-efficient online computation of quantile summaries
– Greenwald, Khanna
- 2001
|
|
1
|
Our method for computing correlation coefficients can be applied to the interpolation of values missing from one data stream. We exploit the fact that high-valued correlation coefficients imply high-valued regression coefficients. We can choose streams that are hi
– com
|
|
1
|
Managing time series data: Object-relational and object database systems
– Molesky, Caruso
|