Results 1–10 of 29
Searching and mining trillions of time series subsequences under dynamic time warping
In SIGKDD, 2012
"... Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time ..."
Abstract

Cited by 43 (3 self)
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
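For readers unfamiliar with the distance measure at the core of this paper, the following is a minimal sketch of plain dynamic-programming DTW (squared-difference cost assumed; the paper's contribution is making exact search under this measure fast, not the recurrence itself):

```python
def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic-programming DTW distance
    with squared-difference cost (a common convention, assumed here)."""
    n, m = len(a), len(b)
    INF = float("inf")
    # d[i][j] = cost of the best warping path aligning a[:i] with b[:j]
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j],      # stretch a
                                 d[i][j - 1],      # stretch b
                                 d[i - 1][j - 1])  # advance both
    return d[n][m]
```

Note how warping absorbs local stretching: `dtw([1, 2, 3], [1, 2, 2, 3])` is exactly 0, whereas Euclidean distance is undefined for unequal lengths.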
Stream Monitoring under the Time Warping Distance
"... Data stream processing has recently attracted an increasing amount of interest. The goal of this paper is to monitor numerical streams, and to find subsequences that are similar to a given query sequence, under the DTW (Dynamic Time Warping) distance. Applications include word spotting, sensor patte ..."
Abstract

Cited by 30 (3 self)
Data stream processing has recently attracted an increasing amount of interest. The goal of this paper is to monitor numerical streams, and to find subsequences that are similar to a given query sequence, under the DTW (Dynamic Time Warping) distance. Applications include word spotting, sensor pattern matching, monitoring of biomedical signals (e.g., EKG, ECG), and monitoring of environmental (seismic and volcanic) signals. DTW is a very popular distance measure, permitting accelerations and decelerations, and it has been studied for finite, stored sequence sets. However, in many applications such as network analysis and sensor monitoring, massive amounts of data arrive continuously and it is infeasible to save all the historical data. We propose SPRING, a novel algorithm that solves this problem. We provide a theoretical analysis and prove that SPRING does not sacrifice accuracy, while it requires constant space and time per time-tick. These are dramatic improvements over the naive method. Our experiments on real and realistic data illustrate that SPRING does indeed detect the qualifying subsequences correctly and that it can offer dramatic improvements in speed (up to 650,000 times) over the naive implementation.
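A minimal sketch of a SPRING-style per-tick update (assuming squared-difference cost and a fixed threshold `eps`; the published algorithm additionally tracks the start position of each match, which is omitted here):

```python
class SpringMatcher:
    """Streaming subsequence DTW in the style of SPRING: one distance
    column of length len(query)+1 is kept, so space and time per
    tick are O(len(query)), matching the constant-per-tick claim."""

    def __init__(self, query, eps):
        self.q = query
        self.eps = eps
        # d[i] = best cost of warping q[:i] against a subsequence
        # ending at the current tick; d[0] = 0 lets a match start anywhere
        self.d = [0.0] + [float("inf")] * len(query)

    def update(self, x):
        """Consume one stream value; return the distance of the best
        subsequence ending here if it is within eps, else None."""
        prev = self.d
        d = [0.0] + [float("inf")] * len(self.q)
        for i in range(1, len(self.q) + 1):
            cost = (x - self.q[i - 1]) ** 2
            d[i] = cost + min(prev[i], d[i - 1], prev[i - 1])
        self.d = d
        return d[-1] if d[-1] <= self.eps else None
```

Feeding the stream `[5, 5, 1, 2, 3]` to a matcher for query `[1, 2, 3]` with `eps=0` reports a hit only at the final tick, where the query appears verbatim.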
An Efficient and Accurate Method for Evaluating Time Series Similarity
, 2007
"... A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criteria is bounded by a specified ǫ threshold value, such as the LCSS and the EDR techniques, have been shown to be robust in the presence of noise, tim ..."
Abstract

Cited by 25 (1 self)
A variety of techniques currently exist for measuring the similarity between time series datasets. Of these techniques, the methods whose matching criterion is bounded by a specified ε threshold value, such as the LCSS and EDR techniques, have been shown to be robust in the presence of noise, time shifts, and data scaling. Our work proposes a new algorithm, called the Fast Time Series Evaluation (FTSE) method, which can be used to evaluate such threshold-value techniques, including LCSS and EDR. Using FTSE, we show that these techniques can be evaluated faster than using either traditional dynamic programming or even warp-restricting methods such as the Sakoe-Chiba band and the Itakura parallelogram. We also show that FTSE can be used in a framework that can evaluate a richer range of ε-threshold-based scoring techniques, of which EDR and LCSS are just two examples. This framework, called Swale, extends the ε-threshold-based scoring techniques to include arbitrary match rewards and gap penalties. Through extensive empirical evaluation, we show that Swale can obtain greater accuracy than existing methods.
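For reference, the ε-threshold matching that FTSE accelerates can be illustrated with the standard LCSS dynamic program (this is the traditional quadratic formulation the paper improves upon, not FTSE itself):

```python
def lcss(a, b, eps):
    """Longest Common SubSequence similarity for real-valued series:
    two elements match when they are within eps of each other
    (the classic O(n*m) dynamic program)."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1   # elements match
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip one side
    return L[n][m]
```

The ε band is what gives the measure its robustness to noise: a wild outlier simply fails to match and is skipped, rather than contributing a huge squared error as it would under Euclidean distance or DTW.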
Approximate embedding-based subsequence matching of time series
In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008
"... A method for approximate subsequence matching is introduced, that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for EmbeddingBased Subsequence Matching. The key ..."
Abstract

Cited by 21 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
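A toy illustration of the embedding idea, simplified to whole-sequence matching (the actual EBSM embeds every step of every database series for subsequence search; the function names and the squared-difference DTW cost here are assumptions for illustration):

```python
import math

def dtw(a, b):
    """Plain quadratic DTW with squared-difference cost."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = (a[i - 1] - b[j - 1]) ** 2 + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def embed(series, refs):
    """Map a series to its vector of DTW distances to reference objects."""
    return [dtw(series, r) for r in refs]

def filter_candidates(query, database, refs, k):
    """Rank database series by Euclidean distance between embeddings.
    The k survivors would then be verified with exact DTW (the cheap
    filter / expensive refine pattern the abstract describes)."""
    qv = embed(query, refs)
    order = sorted(range(len(database)),
                   key=lambda i: math.dist(qv, embed(database[i], refs)))
    return order[:k]
```

The filter step replaces many expensive DTW computations with cheap vector comparisons; exactness is recovered only for the few surviving candidates, which is why the method is approximate overall.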
Time series knowledge mining
, 2006
"... An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of loca ..."
Abstract

Cited by 20 (2 self)
An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge. The patterns have a hierarchical structure, where each level corresponds to a single temporal concept. On the lowest level, intervals are used to represent duration. Overlapping parts of intervals represent coincidence on the next level. Several such blocks of intervals are connected with a partial order relation on the highest level. Each pattern element consists of a semiotic triple to connect syntactic and semantic information with pragmatics. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. Efficient algorithms for the discovery of the patterns are proposed. The search for coincidence as well as partial order can be formulated as variants of the well-known frequent itemset problem. One of the best-known algorithms for this problem is therefore adapted for our purposes. Human interaction is used during the mining to analyze and validate partial results as early as possible and to guide further processing steps. The efficacy of the methods is demonstrated using several data sets. In an application to sports medicine the results were recognized as valid and useful by an expert in the field.
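The "coincidence" level of the hierarchy can be illustrated in a toy form as the overlapping parts of two interval sets (intervals as `(start, end)` pairs; this is an illustration of the concept, not the paper's mining algorithm):

```python
def coincidences(intervals_a, intervals_b):
    """Return the overlapping parts of two lists of half-open
    intervals — the 'coincidence' concept: durations during which
    both underlying conditions hold simultaneously."""
    out = []
    for s1, e1 in intervals_a:
        for s2, e2 in intervals_b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:          # non-empty overlap
                out.append((s, e))
    return out
```

In TSKR terms, the output intervals would feed the next level of the hierarchy, where blocks of coincidences are ordered by a partial order relation.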
Experimental comparison of representation methods and distance measures for time series data
 Data Mining and Knowledge Discovery
"... ar ..."
(Show Context)
Index-based Most Similar Trajectory Search
, 2006
"... The problem of trajectory similarity in moving object databases is a relatively new topic in the spatial and spatiotemporal database literature. Existing work focuses on the spatial notion of similarity ignoring the temporal dimension of trajectories and disregarding the presence of a generalpurpos ..."
Abstract

Cited by 18 (1 self)
The problem of trajectory similarity in moving object databases is a relatively new topic in the spatial and spatiotemporal database literature. Existing work focuses on the spatial notion of similarity, ignoring the temporal dimension of trajectories and disregarding the presence of a general-purpose spatiotemporal index. In this work, we address the issue of spatiotemporal trajectory similarity search by defining a similarity metric, proposing an efficient approximation method to reduce its calculation cost, and developing novel metrics and heuristics to support k-most-similar-trajectory search in spatiotemporal databases, exploiting the existing R-tree-like structures already found there to support more traditional queries. Our experimental study, based on real and synthetic datasets, verifies that the proposed similarity metric efficiently retrieves spatiotemporally similar trajectories in cases where related work fails, while the proposed algorithm is shown to be efficient and highly scalable.
Shapes based Trajectory Queries for Moving Objects
 Proceedings of ACM GIS
, 2005
"... An interesting issue in moving objects databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with time dimension) of moving objects, rather than spatial shapes (trajectories without time dimension) of their trajectories. I ..."
Abstract

Cited by 17 (0 self)
An interesting issue in moving objects databases is to find similar trajectories of moving objects. Previous work on this topic focuses on movement patterns (trajectories with a time dimension) of moving objects, rather than the spatial shapes (trajectories without a time dimension) of their trajectories. In this paper we propose a simple and effective way to compare the spatial shapes of moving object trajectories. We introduce a new distance function based on “one-way distance” (OWD). Algorithms for evaluating OWD in both the continuous (piecewise linear) and discrete (grid representation) cases are developed. An index structure for OWD in grid representation, which guarantees no false dismissals, is also given to improve the efficiency of similarity search. Empirical studies show that OWD outperforms existing methods not only in precision, but also in efficiency, and that the results of OWD in the continuous case can be approximated efficiently by the discrete case.
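A sketch of a one-way distance in the discrete case, assuming the common point-set formulation (average distance from each point of one trajectory to the nearest point of the other, symmetrized by averaging both directions; the paper's exact definition may differ in details):

```python
import math

def one_way(t1, t2):
    """Average, over points of t1, of the Euclidean distance to the
    nearest point of t2 (discrete point-set form; the paper also
    handles piecewise-linear trajectories)."""
    return sum(min(math.dist(p, q) for q in t2) for p in t1) / len(t1)

def owd(t1, t2):
    """Symmetrized one-way distance: averaging both directions, since
    one_way alone is not symmetric."""
    return 0.5 * (one_way(t1, t2) + one_way(t2, t1))
```

Because only nearest-point distances enter, the measure depends on the spatial shape of the trajectories and is insensitive to how fast each object traverses them, which is exactly the time-free comparison the abstract argues for.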
Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound
, 2009
"... The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensi ..."
Abstract

Cited by 13 (0 self)
Dynamic Time Warping (DTW) is a popular similarity measure between time series. DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB Keogh). We compare LB Keogh with a tighter lower bound (LB Improved). We find that LB Improved-based search is faster. As an example, our approach is 2–3 times faster on random-walk and shape time series.
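A minimal sketch of the LB Keogh envelope bound the abstract refers to (equal-length series, squared cost, and warping window `r` assumed; LB Improved, the paper's tighter two-pass bound, is not shown):

```python
def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound on DTW with warping window r: sum the
    squared deviation of each candidate point that falls outside the
    [min, max] envelope of the query around that position. Series are
    assumed to have equal length. Never exceeds the true DTW cost,
    so candidates with a bound above the best-so-far can be pruned
    without computing DTW at all."""
    total = 0.0
    for i, c in enumerate(candidate):
        window = query[max(0, i - r): i + r + 1]
        lo, hi = min(window), max(window)
        if c > hi:
            total += (c - hi) ** 2
        elif c < lo:
            total += (lo - c) ** 2
    return total
```

The bound costs O(n) per candidate versus O(n²) for DTW, which is why envelope-based pruning dominates nearest-neighbor search under DTW.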
Mining approximate top-k subspace anomalies in multidimensional time-series data
 In VLDB
, 2007
"... Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form timeseries data. Moreover, the time series data correspond to market segments, which are described by a set of a ..."
Abstract

Cited by 12 (3 self)
Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series data. Moreover, the time-series data correspond to market segments, which are described by a set of attributes, such as age, gender, education, income level, and product category, that form a multidimensional structure. To better understand market dynamics and predict future trends, it is crucial to study the dynamics of time series in multidimensional market segments. This is a topic that has been largely ignored in time series and data cube research. In this study, we examine the issues of anomaly detection in multidimensional time-series data. We propose a time-series data cube to capture the multidimensional space formed by the attribute structure. This facilitates the detection of anomalies based on expected values derived from higher-level, “more general” time series. Anomaly detection in a time-series data cube poses computational challenges, especially for high-dimensional, large data sets. To this end, we also propose an efficient search algorithm to iteratively select subspaces in the original high-dimensional space and detect anomalies within each one. Our experiments with both synthetic and real-world data demonstrate the effectiveness and efficiency of the proposed solution.
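A toy illustration of scoring a segment's series against an expectation derived from its more general parent series (the simple proportional-scaling model here is an assumption for illustration only, not the paper's expectation model):

```python
def anomaly_score(child, parent):
    """Score a market segment's time series against the expectation
    derived from its parent ('more general') series in the cube:
    scale the parent down to the child's overall level, then sum the
    squared residuals. A score of 0 means the segment moves exactly
    in proportion to its parent; large scores flag segments whose
    dynamics deviate from the general trend."""
    scale = sum(child) / sum(parent)
    expected = [scale * p for p in parent]
    return sum((c, e)[0] ** 0 and (c - e) ** 2 for c, e in zip(child, expected)) if False else sum((c - e) ** 2 for c, e in zip(child, expected))
```

A segment proportional to its parent scores 0; a segment with a spike the parent does not share scores high, which is the cube-based detection idea in miniature.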