Fast retrieval of similar subsequences in long sequence databases
In 3rd IEEE Knowledge and Data Engineering Exchange Workshop, 1999
"... shpark,dongwon,wwc¡ Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer to use high-cost distance functions such as the time warping distance and the editing distance for wider applicability. However, if these distance function ..."
Cited by 20 (3 self)
Abstract
Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer high-cost distance functions such as the time warping distance and the editing distance for their wider applicability. However, if these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic in the average length of the data sequences. In this paper, we propose a novel subsequence matching scheme, called aligned subsequence matching, in which the number of subsequences to be compared with a query sequence is reduced to linear in that length. We also present an indexing technique that speeds up aligned subsequence matching under the modified time warping distance. Experiments on synthetic data sequences demonstrate the effectiveness of our approach: it consistently outperformed sequential scanning and achieved up to a 6.5-fold speed-up.
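To make the quadratic growth concrete, here is a minimal Python sketch, not the authors' code. It counts the n(n+1)/2 contiguous subsequences of a length-n sequence and runs a naive search that evaluates the textbook dynamic time warping (DTW) distance against every one of them. The textbook DTW recurrence is a stand-in: the paper's modified time warping distance and its aligned matching scheme are not reproduced, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def num_subsequences(n: int) -> int:
    """Number of contiguous subsequences of a length-n sequence: n*(n+1)/2."""
    return n * (n + 1) // 2


def dtw_distance(s: Sequence[float], q: Sequence[float]) -> float:
    """Textbook DTW with |a - b| as the local cost.

    Not the paper's modified time warping distance; used here only to
    illustrate a high-cost distance function.
    """
    n, m = len(s), len(q)
    # d[i][j] = DTW distance between the prefixes s[:i] and q[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def naive_subsequence_search(
    data: Sequence[float], query: Sequence[float], eps: float
) -> List[Tuple[int, int]]:
    """Compare the query with every subsequence data[start:end].

    The two loops visit num_subsequences(len(data)) candidates, i.e. a
    number quadratic in the data length, each costing one DTW evaluation.
    """
    matches = []
    for start in range(len(data)):
        for end in range(start + 1, len(data) + 1):
            if dtw_distance(data[start:end], query) <= eps:
                matches.append((start, end))
    return matches
```

A scan of this kind is the sequential baseline that a scheme reducing the candidate count to linear is meant to beat.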
AIMS: An Immersidata Management System
2003
"... Weintroduce a system to address the challenges involved in managing the multidimensional sensor data streams generated within immersiveenvironments. We call this data type, immersidata,which is defined as the data acquired from a user's interactions with an immersiveenvironment. Managementof ..."
Cited by 11 (6 self)
Abstract
We introduce a system to address the challenges involved in managing the multidimensional sensor data streams generated within immersive environments. We call this data type immersidata, defined as the data acquired from a user's interactions with an immersive environment. Managing immersidata is challenging because it is: 1) multidimensional, 2) spatio-temporal, 3) a continuous data stream (CDS), 4) large in size and bandwidth requirements, and 5) noisy.
Parallel Algorithms for High-dimensional Proximity Joins
"... We consider the problem of parallelizing highdimensional proximity joins. We present a parallel multidimensional join algorithm based on an the epsilon-kdB tree and compare it with the more common approach of space partitioning. An evaluation of the algorithms on an IBM SP2 shared-nothing multiproce ..."
Abstract
We consider the problem of parallelizing high-dimensional proximity joins. We present a parallel multidimensional join algorithm based on the epsilon-kdB tree and compare it with the more common approach of space partitioning. We evaluate the algorithms on an IBM SP2 shared-nothing multiprocessor using both synthetic and real-life datasets. We also examine the effectiveness of the algorithms in the context of a specific data-mining problem, that of finding similar time series. The empirical results show that our algorithm exhibits good performance and scalability, as well as an ability to handle data skew.
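A minimal Python sketch of a single-level slab partition for an epsilon (proximity) join, assuming Euclidean distance. Points are bucketed into slabs of width eps along the first dimension, so any qualifying pair lies in the same slab or in adjacent slabs. This is a simplification in the spirit of the epsilon-kdB tree (which, as we understand it, keeps splitting on further dimensions); it is not the paper's parallel algorithm, and the name `stripe_epsilon_join` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def stripe_epsilon_join(points: Sequence[Point], eps: float) -> List[Tuple[int, int]]:
    """Return all index pairs (i, j), i < j, with Euclidean distance <= eps."""
    # Bucket point indices into slabs of width eps along dimension 0.
    stripes: Dict[int, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        stripes[math.floor(p[0] / eps)].append(idx)

    result = []
    # Each (slab, slab + 1) unit of work is independent of the others,
    # so the slab keys could be distributed across processors.
    for key in sorted(stripes):
        own = stripes[key]
        nxt = stripes.get(key + 1, [])  # .get does not insert into the defaultdict
        # Pairs inside the same slab.
        for a in range(len(own)):
            for b in range(a + 1, len(own)):
                i, j = own[a], own[b]
                if math.dist(points[i], points[j]) <= eps:
                    result.append((min(i, j), max(i, j)))
        # Pairs between this slab and the next one; farther slabs are
        # skipped because their first coordinates differ by more than eps.
        for i in own:
            for j in nxt:
                if math.dist(points[i], points[j]) <= eps:
                    result.append((min(i, j), max(i, j)))
    return sorted(result)
```

Pruning only on one dimension keeps the example short; in high dimensions most of the filtering work would come from partitioning on additional dimensions.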
Index Interpolation: An Approach to Subsequence Matching Supporting Normalization Transform in Time-Series Databases
"... In this paper, w epropose a subsequence matching algorithm that supports normalization transform in timeseries databases. Normalization transform enables nding sequences with similar uctuation patterns although they are not close to each other before the normalization transform. Application of the e ..."
Abstract
In this paper, we propose a subsequence matching algorithm that supports normalization transform in time-series databases. Normalization transform enables finding sequences with similar fluctuation patterns even though they are not close to each other before the transform. Applying the existing whole-matching algorithm that supports normalization transform to subsequence matching is feasible, but it requires an index for every possible query-sequence length, which causes serious overhead in both storage space and update time. The proposed algorithm builds indexes for only a small number of different query-sequence lengths and, for subsequence matching, selects the most appropriate index among them. Using more indexes yields better search performance. We call our approach index interpolation. We formally prove that the proposed algorithm does not cause false dismissal. For performance evaluation, we conducted experiments using indexes for only five different lengths out of the query-sequence lengths 256 to 512. The results show that the proposed algorithm outperforms sequential scan by up to 14.6 times on average when the query selectivity is 10^-5.
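A minimal Python sketch of why normalization matters, assuming the normalization transform is the usual one that subtracts a sequence's mean and divides by its standard deviation. Two sequences that are far apart in raw Euclidean distance become identical after normalization. The hypothetical `scan_normalized_matches` is a sequential-scan baseline that normalizes every window of the query's length; the index-interpolation method itself is not shown.

```python
import math
import statistics
from typing import List, Sequence


def normalize(seq: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(seq)
    std = statistics.pstdev(seq)
    if std == 0:
        # A constant sequence has no fluctuation; map it to all zeros.
        return [0.0] * len(seq)
    return [(x - mean) / std for x in seq]


def scan_normalized_matches(
    data: Sequence[float], query: Sequence[float], eps: float
) -> List[int]:
    """Return start offsets of windows whose normalized form is within eps
    (Euclidean distance) of the normalized query."""
    w = len(query)
    qn = normalize(query)
    return [
        start
        for start in range(len(data) - w + 1)
        if math.dist(normalize(data[start:start + w]), qn) <= eps
    ]


a = [1.0, 2.0, 3.0, 2.0, 1.0]
b = [100.0 + 10.0 * x for x in a]  # same shape, shifted and scaled
print(math.dist(a, b) > 100)                          # raw sequences are far apart
print(math.dist(normalize(a), normalize(b)) < 1e-9)   # identical after normalization
```

The mean and standard deviation are taken over each window, so the window length enters the transform; this is consistent with the abstract's point that the whole-matching approach needs an index for every possible query length.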

