Results 1 - 10 of 34
Index-driven similarity search in metric spaces
ACM Transactions on Database Systems, 2003
Cited by 192 (8 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
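One ingredient that many distance-based indexes share is pruning with the triangle inequality. The sketch below is only an illustration of that idea, not the framework from the article: it assumes a single pivot object and a precomputed table of pivot-to-object distances, and the names `build_pivot_table` and `range_search` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def build_pivot_table(data: Sequence[T], pivot: T, d: Metric) -> List[float]:
    """Precompute d(pivot, o) for every object o (done once, at index time)."""
    return [d(pivot, o) for o in data]


def range_search(data: Sequence[T], pivot: T, table: Sequence[float],
                 q: T, r: float, d: Metric) -> Tuple[List[T], int]:
    """Return all objects within distance r of q, plus the number of
    distance computations performed at query time."""
    dq = d(q, pivot)
    computed = 1  # the query-to-pivot distance
    hits: List[T] = []
    for o, dpo in zip(data, table):
        # Triangle inequality: |d(q, pivot) - d(pivot, o)| <= d(q, o).
        # If this lower bound already exceeds r, o cannot qualify.
        if abs(dq - dpo) > r:
            continue
        computed += 1
        if d(q, o) <= r:
            hits.append(o)
    return hits, computed
```

The pruning is only valid when d satisfies the triangle inequality, which is the assumption the article makes about the distance metric.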
Top-k Query Evaluation with Probabilistic Guarantees
In VLDB, 2004
Cited by 105 (16 self)
Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
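For orientation, here is a minimal sketch of the exact threshold algorithm that the paper takes as its baseline. It is not one of the paper's probabilistic variants. It assumes summation as the aggregation function, non-negative local scores, and a score of 0 for an item missing from a list; the function name `threshold_algorithm` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

IndexList = List[Tuple[str, float]]  # (item, local score), sorted by score descending


def threshold_algorithm(lists: List[IndexList], k: int) -> List[Tuple[str, float]]:
    """Exact top-k by summed score, scanning all lists in lockstep."""
    lookup = [dict(lst) for lst in lists]  # stands in for random access
    totals: Dict[str, float] = {}
    depth = max(len(lst) for lst in lists)
    for i in range(depth):
        threshold = 0.0
        for lst in lists:
            if i >= len(lst):
                continue  # exhausted list: unseen items score 0 here
            item, score = lst[i]  # sorted access
            threshold += score
            if item not in totals:
                # Random access: fetch the item's score in every list.
                totals[item] = sum(tbl.get(item, 0.0) for tbl in lookup)
        top = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # No unseen item can score above the threshold, so we may stop
        # once the k-th best total reaches it.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
```

The approximate variants described in the abstract would replace the hard stopping test with a probabilistic one; that part is not sketched here.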
KLEE: A Framework for Distributed Top-K Query Algorithms
In VLDB, 2005
Cited by 97 (14 self)
This paper addresses the efficient processing of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We present KLEE, a novel algorithmic framework for distributed top-k queries, designed for high performance and flexibility. KLEE makes a strong case for approximate top-k algorithms over widely distributed data sources. It shows how great gains in efficiency can be enjoyed at low result-quality penalties. Further, KLEE affords the query-initiating peer the flexibility to trade off result quality and expected performance and to trade off the number of communication phases engaged during query execution versus network bandwidth performance. We have implemented KLEE and related algorithms and conducted a comprehensive performance evaluation. Our evaluation employed real-world and synthetic large, web-data collections, and query benchmarks. Our experimental results show that KLEE can achieve major performance gains in terms of network bandwidth, query response times, and much lighter peer loads, all with small errors in result precision and other result-quality measures.
LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures
In VLDB, 2006
Cited by 55 (12 self)
The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However, rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of discrimination ability, or in the distance measure, at the expense of efficiency. In this work we show that we can take the slow but accurate approaches and dramatically speed them up. On real-world problems our technique can take current approaches and make them four orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures including Euclidean distance, Dynamic Time Warping and Longest Common Subsequence.
WARP: accurate retrieval of shapes using phase of Fourier descriptors and time warping distance
IEEE Trans. Pattern Anal. Mach. Intell., 2005
Cited by 54 (1 self)
Effective and efficient retrieval of similar shapes from large image databases is still a challenging problem in spite of the high relevance that shape information can have in describing image contents. In this paper, we propose a novel Fourier-based approach, called WARP, for matching and retrieving similar shapes. The unique characteristics of WARP are the exploitation of the phase of Fourier coefficients and the use of the Dynamic Time Warping (DTW) distance to compare shape descriptors. While phase information provides a more accurate description of object boundaries than using only the amplitude of Fourier coefficients, the DTW distance permits us to accurately match images even in the presence of (limited) phase shiftings. In terms of classical precision/recall measures, we experimentally demonstrate that WARP can gain, say, up to 35 percent in precision at a 20 percent recall level with respect to Fourier-based techniques that use neither phase nor DTW distance.
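The DTW distance mentioned above can be sketched with the textbook dynamic program. This is a generic version over real-valued sequences with absolute difference as the local cost, which is an assumption; how WARP builds its descriptors from Fourier phase is not reproduced here, and the function name `dtw` is illustrative.

```python
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic Time Warping distance between two non-empty sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest warping of a[:i] onto b[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]
```

Because one element may be matched against several elements of the other sequence, small shifts between otherwise similar sequences cost little, which is the property the abstract relies on.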
Effective Proximity Retrieval by Ordering Permutations
2007
Cited by 34 (6 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (KNN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the KNN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and nonmetric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
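The anchor-ordering idea can be illustrated roughly as follows. The abstract does not say how the similarity between two orders is measured, so the Spearman footrule used here is an assumption, and the helper names `permutation`, `footrule` and `candidate_order` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def permutation(obj: T, anchors: Sequence[T], d: Metric) -> List[int]:
    """Anchor indices sorted from closest to farthest (ties broken by index)."""
    return sorted(range(len(anchors)), key=lambda i: (d(obj, anchors[i]), i))


def footrule(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Spearman footrule: total displacement of each anchor between two orders."""
    pos2 = {anchor: rank for rank, anchor in enumerate(p2)}
    return sum(abs(rank - pos2[anchor]) for rank, anchor in enumerate(p1))


def candidate_order(data: Sequence[T], anchors: Sequence[T],
                    q: T, d: Metric) -> List[int]:
    """Indices of data objects, most promising first, judged only by how
    similarly they and the query order the anchors."""
    pq = permutation(q, anchors, d)
    perms = [permutation(o, anchors, d) for o in data]  # precomputed in practice
    return sorted(range(len(data)), key=lambda j: footrule(pq, perms[j]))
```

A search would then compare the query against only a prefix of this ordering, which is where the probabilistic (rather than exact) nature of the method comes from.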
Reverse Nearest Neighbor Search in Metric Spaces
TKDE
Cited by 25 (1 self)
Given a set D of objects, a reverse nearest neighbor (RNN) query returns the objects o in D such that o is closer to a query object q than to any other object in D, according to a certain similarity metric. The existing RNN solutions are not sufficient because they either 1) rely on precomputed information that is expensive to maintain in the presence of updates or 2) are applicable only when the data consists of “Euclidean objects” and similarity is measured using the L2 norm. In this paper, we present the first algorithms for efficient RNN search in generic metric spaces. Our techniques require no detailed representations of objects, and can be applied as long as their mutual distances can be computed and the distance metric satisfies the triangle inequality. We confirm the effectiveness of the proposed methods with extensive experiments. Index Terms: Reverse nearest neighbor, metric space.
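The RNN definition can be made concrete with a brute-force check. This quadratic sketch only restates the definition and is not the paper's algorithm; the function name is hypothetical, and ties are treated as "not closer", which is an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def reverse_nearest_neighbors(data: Sequence[T], q: T, d: Metric) -> List[T]:
    """Objects o in data that are strictly closer to q than to any other object."""
    result: List[T] = []
    for i, o in enumerate(data):
        dq = d(o, q)
        # o qualifies only if q beats every other object in the data set.
        if all(dq < d(o, p) for j, p in enumerate(data) if j != i):
            result.append(o)
    return result
```

With 1-D points [1, 2, 10] and q = 4, the nearest neighbor of q is 2, yet the only reverse nearest neighbor is 10, because 1 and 2 are each other's closest objects. The two query types are not symmetric.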
Unified Framework for Fast Exact and Approximate Search in Dissimilarity Spaces
2007
Cited by 16 (5 self)
In multimedia systems we usually need to retrieve DB objects based on their similarity to a query object, while the similarity assessment is provided by a measure which defines a (dis)similarity score for every pair of DB objects. In most existing applications, the similarity measure is required to be a metric, where the triangle inequality is utilized to speed up the search for relevant objects by use of metric access methods (MAMs), e.g. the M-tree. Recent research has shown, however, that nonmetric measures are more appropriate for similarity modeling due to their robustness and the ease of modeling a made-to-measure similarity. Unfortunately, due to the lack of triangle inequality, the nonmetric measures cannot be directly utilized by MAMs. From another point of view, some sophisticated similarity measures could be available in a black-box non-analytic form (e.g. as an algorithm or even a hardware device), where no information about their topological properties is provided, so we have to consider them as nonmetric measures as well. From yet another point of view, the concept of similarity measuring itself is inherently imprecise and we often prefer fast but approximate retrieval over an exact but slower one. To date, the mentioned aspects of similarity retrieval have been solved separately, i.e. exact vs. approximate search or metric vs. nonmetric search. In this paper we introduce a similarity retrieval framework which incorporates both of the aspects into a single unified model. Based on the framework, we show that for any dissimilarity measure (either a metric or nonmetric) we are able to change the “amount” of triangle inequality, and so to obtain an approximate or full metric which can be used for MAM-based retrieval. Due to the varying “amount” of triangle inequality, the measure is modified in a way suitable for either an exact but slower or an approximate but faster retrieval. Additionally, we introduce the TriGen algorithm, which aims to construct the desired modification of any black-box distance automatically, using just a small fraction of the database.
Dynamic Skyline Queries in Metric Spaces
Cited by 12 (1 self)
Skyline query is of great importance in many applications, such as multi-criteria decision making and business planning. In particular, a skyline point is a data object in the database whose attribute vector is not dominated by that of any other objects. Previous methods to retrieve skyline points usually assume static data objects in the database (i.e. their attribute vectors are fixed), whereas several recent works focus on skyline queries with dynamic attributes. In this paper, we propose a novel variant of skyline queries, namely metric skyline, whose dynamic attributes are defined in the metric space (i.e. not limited to the Euclidean space). We illustrate an efficient and effective pruning mechanism to answer metric skyline queries through a metric index. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed pruning techniques over the metric index in answering metric skyline queries.
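The dominance test behind a skyline can be sketched with a brute-force filter. The abstract does not spell out how the dynamic attributes are derived, so this sketch assumes that an object's attribute vector is its list of distances to a set of query objects and that smaller is better. The names `dominates` and `metric_skyline` are hypothetical, and the paper's index-based pruning is not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Metric = Callable[[T, T], float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def metric_skyline(data: Sequence[T], queries: Sequence[T], d: Metric) -> List[T]:
    """Objects whose distance vectors to the query objects are not dominated."""
    vecs: List[Tuple[float, ...]] = [tuple(d(o, q) for q in queries) for o in data]
    return [o for i, o in enumerate(data)
            if not any(dominates(vecs[j], vecs[i])
                       for j in range(len(data)) if j != i)]
```

Only distance computations are needed, so the same filter works for any objects for which a distance function is available, not just points in Euclidean space.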