Results 1 - 5 of 5
Simple Space-Time Trade-Offs for AESA
Cited by 2 (1 self)
Abstract. We consider indexing and range searching in metric spaces. The best method known is AESA, which in practice requires the fewest distance evaluations to answer range queries. The problem with AESA is its space complexity: it requires storage for Θ(n^2) distance values to index n objects. We give several methods to reduce this cost. The main observation is that exact distance values are not needed; lower and upper bounds suffice. The simplest of our methods needs only Θ(n^2) bits (as opposed to words) of storage, but the price to pay is more distance evaluations than AESA, the exact cost depending on the dimension. To reduce this efficiency gap we extend our method to use b distance bounds, requiring Θ(n^2 log_2 b) bits of storage. The scheme also uses Θ(b) or Θ(bn) words of auxiliary space. We experimentally show that using b ∈ {1, ..., 16} (depending on the problem instance) gives good results. Our preprocessing and side computation costs are the same as for AESA. We propose several improvements, achieving e.g. O(n^(1+α)) construction cost for some 0 < α < 1, and a variant using even less space.
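The idea of replacing exact distances with lower and upper bounds can be illustrated with a minimal Python sketch. It is not the paper's implementation: the threshold list, the function names, and the simple "smallest index first" pivot choice are all assumptions made for illustration. Each pair of objects stores only a small bucket index k, meaning thresholds[k] <= d(u, v) < thresholds[k + 1], and a range query prunes with those bounds.

```python
import bisect


def bucket_index(d, thresholds):
    """Return k such that thresholds[k] <= d < thresholds[k + 1]."""
    return bisect.bisect_right(thresholds, d) - 1


def build_bucket_matrix(points, dist, thresholds):
    """Store one small bucket index per pair instead of an exact distance."""
    return [[bucket_index(dist(a, b), thresholds) for b in points]
            for a in points]


def lower_bound(d_qp, k, thresholds):
    """Lower bound on d(q, u) given exact d(q, p) and the bucket of d(p, u)."""
    lo, hi = thresholds[k], thresholds[k + 1]
    # Triangle inequality with lo <= d(p, u) < hi.
    return max(d_qp - hi, lo - d_qp, 0.0)


def range_search(points, dist, buckets, thresholds, q, r):
    """Return (indices within distance r of q, number of distance evaluations)."""
    alive = set(range(len(points)))
    result, evaluations = [], 0
    while alive:
        p = min(alive)  # hypothetical pivot choice: smallest index still alive
        alive.remove(p)
        d_qp = dist(q, points[p])
        evaluations += 1
        if d_qp <= r:
            result.append(p)
        # Keep only candidates that the stored bounds cannot rule out.
        alive = {u for u in alive
                 if lower_bound(d_qp, buckets[p][u], thresholds) <= r}
    return sorted(result), evaluations
```

The threshold list is assumed to start at 0 and end with infinity so that every distance falls in some bucket. With b buckets, each matrix entry needs about log_2 b bits, which matches the storage figure quoted in the abstract.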
On Tighter Inequalities for Efficient Similarity Search in Metric Spaces
Abstract. Similarity search consists of the efficient retrieval of relevant information satisfying user-formulated query conditions from a database with prebuilt indexing structures. Since the evaluation of the distance functions between queries and indexed objects is often computationally expensive, there have been many attempts to build indexing structures that use as few distance computations as possible to answer queries. Among these methods, for 20 years the Approximating and Eliminating Search Algorithm (AESA) has been the baseline in terms of the required distance computations. By storing a pre-computed inter-object distance matrix, AESA is able to extensively apply the triangle-inequality-based pruning rules to avoid unnecessary distance computations. In this paper, to further improve the performance of AESA, we introduce a novel group of pruning rules that are proven to be tighter than the triangle-inequality-based rules and hence can further reduce the number of distance computations during the search. The new pruning rules require the assumption of positive semi-definite metric space models and can be used in most modern applications. With some slight modification, they can be easily extended to search algorithms in general metric spaces. In the simulations, when incorporated with the proposed pruning rules, AESA showed a significant improvement in distance-computation reduction. For low-dimensional problems, applying the new pruning rules cut the distance computations in half, and for high-dimensional problems, the reduction was sometimes more than 90%. The pruning rules were also applied to LAESA, a variant of AESA which imposes a linear storage requirement. For this algorithm, they not only helped to save more distance computations, but considerably reduced the storage requirement as well.
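For reference, the standard triangle-inequality pruning rule that the abstract takes as its starting point can be written out as below. The notation (query q, pivot p, candidate u, search radius r) is chosen here for illustration, and the block shows only the classical rule, not the tighter rules the paper proposes.

```latex
% Bounds on the unknown distance d(q,u) from the computed d(q,p)
% and the stored d(p,u):
\[
  \lvert d(q,p) - d(p,u) \rvert \;\le\; d(q,u) \;\le\; d(q,p) + d(p,u)
\]
% Pruning rule for a range query of radius r:
\[
  \lvert d(q,p) - d(p,u) \rvert > r
  \;\Longrightarrow\; d(q,u) > r ,
\]
% so u can be discarded without evaluating d(q,u).
```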
Speeding up Spatial Approximation Search in Metric Spaces
Proximity searching consists in retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, the baseline for performance for about twenty years has been AESA. This index uses an iterative procedure, where at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards database elements that can be proved not relevant to the query using the pivot. The next pivot in AESA is chosen as the one minimizing the sum of lower bounds to the distance to the query proved by previous pivots. In this paper we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. The difference from AESA is the method for selecting the next pivot. In iAESA, each candidate sorts previous pivots by closeness to it, and chooses the next pivot as the candidate whose order is most similar to that of the query. We also propose a modification to AESA-like algorithms to turn them into probabilistic algorithms. Our empirical results confirm a consistent improvement in query performance. For example, we perform as few as 60% of the distance evaluations of AESA in a database of documents, a
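The AESA iteration described in this abstract (pick the candidate with the smallest sum of lower bounds, compare it to the query, discard what the triangle inequality rules out) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the tie-breaking by index are hypothetical, and the sketch covers plain AESA range search only, not the iAESA pivot-ordering rule or the probabilistic variant.

```python
import math


def build_distance_matrix(points, dist):
    """Precompute all pairwise distances (the Theta(n^2) index AESA stores)."""
    return [[dist(a, b) for b in points] for a in points]


def aesa_range_search(points, dist, matrix, q, r):
    """Return (indices within distance r of q, number of distance evaluations)."""
    alive = set(range(len(points)))
    sum_lb = [0.0] * len(points)  # sum of lower bounds proved by earlier pivots
    result, evaluations = [], 0
    while alive:
        # Next pivot: candidate minimizing the sum of lower bounds
        # (ties broken by index, an arbitrary choice made here).
        p = min(alive, key=lambda u: (sum_lb[u], u))
        alive.remove(p)
        d_qp = dist(q, points[p])
        evaluations += 1
        if d_qp <= r:
            result.append(p)
        survivors = set()
        for u in alive:
            lb = abs(d_qp - matrix[p][u])  # lower bound on d(q, u)
            if lb <= r:
                sum_lb[u] += lb
                survivors.add(u)
        alive = survivors
    return sorted(result), evaluations


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 10)]
    m = build_distance_matrix(pts, math.dist)
    print(aesa_range_search(pts, math.dist, m, (0.5, 0.2), 1.0))
```

Only the evaluations of dist against the query are counted, since the matrix is built once at indexing time.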
Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound
The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a tighter lower bound (LB_Improved). We find that LB_Improved-based search is faster. As an example, our approach is 2–3 times faster over random-walk and shape time series. Key words: time series, very large databases, indexing, classification
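How a cheap lower bound avoids full DTW computations can be sketched in Python as follows. The sketch makes several assumptions not stated in the abstract: equal-length series, an absolute-difference (l1) cost, and a warping window w that limits how far indices may drift apart. It implements only an LB_Keogh-style envelope bound; the two-pass LB_Improved bound is not reproduced here, and all function names are hypothetical.

```python
def dtw(x, y, w):
    """Windowed DTW with l1 cost for equal-length series (quadratic-time DP)."""
    n = len(x)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][n]


def envelope(y, w):
    """Lower and upper envelope of y over a window of half-width w."""
    n = len(y)
    lower = [min(y[max(0, i - w):i + w + 1]) for i in range(n)]
    upper = [max(y[max(0, i - w):i + w + 1]) for i in range(n)]
    return lower, upper


def lb_keogh(x, y, w):
    """Sum of the distances from each x[i] to the envelope of y; <= dtw(x, y, w)."""
    lower, upper = envelope(y, w)
    total = 0.0
    for xi, lo, hi in zip(x, lower, upper):
        if xi > hi:
            total += xi - hi
        elif xi < lo:
            total += lo - xi
    return total


def nearest_neighbor(query, candidates, w):
    """Return (best index, best DTW distance, number of full DTW computations)."""
    best, best_idx, full = float("inf"), -1, 0
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, w) >= best:
            continue  # the lower bound already rules this candidate out
        full += 1
        d = dtw(query, cand, w)
        if d < best:
            best, best_idx = d, idx
    return best_idx, best, full
```

The bound is valid because, inside the window, every x[i] is matched to some y[j] with |i - j| <= w, so its cost is at least its distance to the envelope at position i.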
Faster Sequential Search with a Two-Pass Dynamic-Time-Warping Lower Bound
The Dynamic Time Warping (DTW) is a popular similarity measure between time series. The DTW fails to satisfy the triangle inequality and its computation requires quadratic time. Hence, to find closest neighbors quickly, we use bounding techniques. We can avoid most DTW computations with an inexpensive lower bound (LB_Keogh). We compare LB_Keogh with a tighter lower bound (LB_Improved). We find that LB_Improved-based search is faster for sequential search. As an example, our approach is 3 times faster over random-walk and shape time series. We also review some of the mathematical properties of the DTW. We derive a tight triangle inequality for the DTW. We show that the DTW becomes the l_1 distance when time series are separated by a constant.
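The last claim can be checked numerically with a small Python sketch. It assumes that "separated by a constant" means every value of one series is at least some constant c and every value of the other is at most c, that the series have equal length, and that DTW uses an absolute-difference cost with no window constraint. The function names are hypothetical.

```python
def dtw_l1(x, y):
    """Unconstrained DTW with absolute-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def l1_distance(x, y):
    """Plain l_1 distance between equal-length series."""
    return sum(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    # Separated series: every value of x is >= 2.5 >= every value of y.
    x, y = [3, 4, 5, 4], [0, 2, 1, 2]
    print(dtw_l1(x, y), l1_distance(x, y))

    # Overlapping series: warping can do better than the l_1 distance.
    u, v = [0, 1, 2], [1, 2, 3]
    print(dtw_l1(u, v), l1_distance(u, v))
```

In the separated case each term x[i] - y[j] splits into (x[i] - c) + (c - y[j]) with both parts non-negative, and every index appears at least once on any warping path, so no path can cost less than the diagonal one.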

