Download:
by Jessica Lin, Michail Vlachos, Eamonn Keogh, Dimitrios Gunopulos
In EDBT
http://www.cs.ucr.edu/~mvlachos/pubs/edbt04.pdf
Abstract:
We present a novel anytime version of partitional clustering algorithms, such as k-Means and EM, for time series. The algorithm works by leveraging the multi-resolution property of wavelets. The dilemma of choosing the initial centers is mitigated by initializing the centers at each approximation level using the final centers returned by the coarser representations. In addition to casting the clustering algorithms as anytime algorithms, this approach has two other very desirable properties. By working at lower dimensionalities we can efficiently avoid local minima, so the quality of the clustering is usually better than that of the batch algorithm. In addition, even if the algorithm is run to completion, our approach is much faster than its batch counterpart. We explain and empirically demonstrate these surprising and desirable properties with comprehensive experiments on several publicly available real data sets. We further demonstrate that our approach can be generalized to a framework covering a much broader range of algorithms and data mining problems.
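The coarse-to-fine initialization described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses pairwise averages as the coarse representations (these match Haar approximation coefficients only up to a scale factor), assumes the series length is a power of two, and carries centers to the next finer level by duplicating each coordinate. The function names `haar_approximations`, `kmeans`, and `multires_kmeans` are hypothetical.

```python
import numpy as np


def haar_approximations(X):
    """Return views of X from coarsest (2 columns) to full resolution.

    Assumption: each coarser level is the average of adjacent pairs,
    which equals the Haar approximation up to a scale factor.
    """
    n = X.shape[1]
    if n < 2 or (n & (n - 1)) != 0:
        raise ValueError("series length must be a power of two (>= 2)")
    levels = [np.asarray(X, dtype=float)]
    while levels[-1].shape[1] > 2:
        cur = levels[-1]
        levels.append((cur[:, 0::2] + cur[:, 1::2]) / 2.0)
    return levels[::-1]


def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(X, centers, max_iter=100):
    """Plain k-means (Lloyd iterations) starting from the given centers."""
    for _ in range(max_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(X, centers)


def multires_kmeans(X, k, seed=0):
    """Yield (dimensionality, centers, labels) after each resolution level.

    Each yielded result is usable on its own, which is what makes the
    procedure "anytime": the caller may stop consuming at any level.
    """
    rng = np.random.default_rng(seed)
    levels = haar_approximations(X)
    # Random initial centers are only needed at the coarsest level.
    centers = levels[0][rng.choice(len(X), size=k, replace=False)]
    for i, level in enumerate(levels):
        if i > 0:
            # Reuse the coarser level's final centers at double resolution.
            centers = np.repeat(centers, 2, axis=1)
        centers, labels = kmeans(level, centers)
        yield level.shape[1], centers, labels
```

A caller would iterate over `multires_kmeans(X, k)` and keep the most recent result, stopping early when time runs out or consuming every level to obtain the full-resolution clustering.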
Citations
4344 | Maximum likelihood from incomplete data via the EM algorithm – Dempster, Laird, et al. - 1977
311 | Efficient similarity search in sequence databases – Agrawal, Faloutsos, et al. - 1993
309 | Fast subsequence matching in time-series databases – Faloutsos, et al. - 1994
180 | Scaling Clustering Algorithms to Large Databases – Bradley, Fayyad, et al. - 1998
130 | Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases – Keogh, Chakrabarti, et al. - 2001
127 | Efficient time-series matching by wavelets – Chan, Fu - 1999
104 | On the need for time series data mining benchmarks: A survey and empirical demonstration – Keogh, Kasetty
101 | An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback – Keogh, Pazzani - 1998
96 | Some methods for classification and analysis of multivariate observations – McQueen
94 | Fast time sequence indexing for arbitrary Lp norms – Yi, Faloutsos - 2000
74 | Efficiently supporting ad hoc queries in large datasets of time sequences – Korn, Jagadish, et al. - 1997
39 | Similarity search over time-series data using wavelets – Popivanov, Miller - 2002
38 | Efficient retrieval of similar time sequences using DFT – Rafiei, Mendelzon - 1998
37 | A comparison of DFT and DWT based similarity search in time-series databases – Wu, Agrawal, et al. - 2000
36 | Adaptive dimension reduction for clustering high dimensional data – Ding, He, et al. - 2002
27 | The UCR Time Series Data Mining Archive. http://www.cs.ucr.edu/~eamonn/TSDMA/index.html – Keogh, Folias - 2002
18 | Iterative deepening dynamic time warping for time series – Chu, Hart, et al. - 2002
16 | The Haar Wavelet Transform in the Time Series Similarity Paradigm – Struzik - 1999
13 | Anytime algorithm development tools – Grass, Zilberstein - 1996
10 | Anytime Exploratory data analysis for massive data sets – Smyth, Wolpert - 1997
10 | A wavelet-based anytime algorithm for k-means clustering of time series – Vlachos, Lin, et al. - 2003
5 | Ten Lectures on Wavelets. Number 61 – Daubechies - 1992
4 | Initialization of Iterative Refinement Clustering Algorithms – unknown authors - 1998
3 | An Expectation Maximization (EM) Algorithm for the Identification and – Lawrence - 1990
3 | TSA-tree: a Wavelet Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries – Shahabi, Tian - 2000