Results 11  20
of
624
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
 In Proceedings of ICDE’99
, 1999
"... Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search ..."
Abstract

Cited by 117 (13 self)
 Add to MetaCart
(Show Context)
Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 1015 dimensional spaces. This paper introduces the hybrid tree – a multidimensional data structure for indexing high dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (e.g., Rtree, SStree, SRtree) or a pure space partitioning (SP) one (e.g., KDBtree, hBtree); rather, it “combines ” positive aspects of the two types of index structures a single data structure to achieve search performance more scalable to high dimensionalities than either of the above techniques (hence, the name “hybrid”). Furthermore, unlike many data structures (e.g., distance based index structures like SStree, SRtree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on “real” high dimensional large size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DPbased and SPbased index mechanisms as well as linear scan at all dimensionalities for large sized databases. 1.
BMultiProbe LSH: Efficient indexing for highdimensional similarity search
 in Proc. 33rd Int. Conf. Very Large Data Bases
"... Similarity indices for highdimensional data are very desirable for building contentbased search systems for featurerich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate ..."
Abstract

Cited by 111 (3 self)
 Add to MetaCart
(Show Context)
Similarity indices for highdimensional data are very desirable for building contentbased search systems for featurerich data such as audio, images, videos, and other sensor data. Recently, locality sensitive hashing (LSH) and its variations have been proposed as indexing techniques for approximate similarity search. A significant drawback of these approaches is the requirement for a large number of hash tables in order to achieve good search quality. This paper proposes a new indexing scheme called multiprobe LSH that overcomes this drawback. Multiprobe LSH is built on the wellknown LSH technique, but it intelligently probes multiple buckets that are likely to contain query results in a hash table. Our method is inspired by and improves upon recent theoretical work on entropybased LSH designed to reduce the space requirement of the basic LSH method. We have implemented the multiprobe LSH method and evaluated the implementation with two different highdimensional datasets. Our evaluation shows that the multiprobe LSH method substantially improves upon previously proposed methods in both space and time efficiency. To achieve the same search quality, multiprobe LSH has a similar timeefficiency as the basic LSH method while reducing the number of hash tables by an order of magnitude. In comparison with the entropybased LSH method, to achieve the same search quality, multiprobe LSH uses less query time and 5 to 8 times fewer number of hash tables. 1.
Locationbased Spatial Queries
 In SIGMOD
, 2003
"... In this paper we propose an approach that enables mobile clients to determine the validity of previous queries based on their current locations. In order to make this possible, the server returns in addition to the query result, a validity region around the client's location within which the re ..."
Abstract

Cited by 108 (12 self)
 Add to MetaCart
In this paper we propose an approach that enables mobile clients to determine the validity of previous queries based on their current locations. In order to make this possible, the server returns in addition to the query result, a validity region around the client's location within which the result remains the same. We focus on two of the most common spatial query types, namely nearest neighbor and window queries, define the validity region in each case and propose the corresponding query processing algorithms. In addition, we provide analytical models for estimating the expected size of the validity region. Our techniques can significantly reduce the number of queries issued to the server, while introducing minimal computational and network overhead compared to traditional spatial queries.
The Atree: An Index Structure for HighDimensional Spaces Using Relative Approximation
, 2000
"... We propose a novel index structure, Atree (Approximation tree), for similarity search of highdimensional data. The basic idea of the Atree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and ..."
Abstract

Cited by 107 (0 self)
 Add to MetaCart
We propose a novel index structure, Atree (Approximation tree), for similarity search of highdimensional data. The basic idea of the Atree is the introduction of Virtual Bounding Rectangles (VBRs), which contain and approximate MBRs and data objects. VBRs can be represented rather compactly, and thus affect the tree configuration both quantitatively and qualitatively. Firstly, since tree nodes can install large number of entries of VBRs, fanout of nodes becomes large, thus leads to fast search. More importantly, we have a free hand in arranging MBRs and VBRs in tree nodes. In the Atrees, nodes contain entries of an MBR and its children VBRs. Therefore, by fetching a node of an Atree, we can obtain the information of exact position of a parent MBR and approximate position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the Atree outperforms the SRtree and the VAFile in all range of dimensionality up to 64 dimension, which is the highest dimension in our experiments. The Atree achieves 77.3 % (77.7%, resp.) savings in page accesses compared to the SRtree (the VAFile, resp.) for 64dimensional real data.
Nearest Neighbors In HighDimensional Spaces
, 2004
"... In this chapter we consider the following problem: given a set P of points in a highdimensional space, construct a data structure which given any query point q nds the point in P closest to q. This problem, called nearest neighbor search is of significant importance to several areas of computer sci ..."
Abstract

Cited by 95 (3 self)
 Add to MetaCart
In this chapter we consider the following problem: given a set P of points in a highdimensional space, construct a data structure which given any query point q nds the point in P closest to q. This problem, called nearest neighbor search is of significant importance to several areas of computer science, including pattern recognition, searching in multimedial data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., in the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearestneighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
Independent Quantization: An Index Compression Technique for HighDimensional Data Spaces
 IN ICDE
, 1999
"... Two major approaches have been proposed to efficiently process queries in databases: Speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: Indexing techniques are inefficie ..."
Abstract

Cited by 91 (22 self)
 Add to MetaCart
Two major approaches have been proposed to efficiently process queries in databases: Speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: Indexing techniques are inefficient in extreme configurations, such as highdimensional spaces, where even a simple scan may be cheaper than an indexbased search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a highdimensional space. For this purpose, we develop a compressed index, called the IQtree, with a threelevel structure: The first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data. We overcome several engineering challenges in constructing an effective index structure of this type...
iDistance: An Adaptive B+tree Based Indexing Method for Nearest Neighbor Search
"... In this paper, we present an efficient B+tree based indexing method, called iDistance, for Knearest neighbor (KNN) search in a highdimensional metric space. iDistance partitions the data based on a space or datapartitioning strategy, and selects a reference point for each partition. The data po ..."
Abstract

Cited by 90 (10 self)
 Add to MetaCart
In this paper, we present an efficient B+tree based indexing method, called iDistance, for Knearest neighbor (KNN) search in a highdimensional metric space. iDistance partitions the data based on a space or datapartitioning strategy, and selects a reference point for each partition. The data points in each partition are transformed into a single dimensional value based on their similarity with respect to the reference point. This allows the points to be indexed using a B +tree structure and KNN search to be performed using onedimensional range search. The choice of partition and reference point adapt the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its effectiveness. We also present a cost model for iDistance KNN search, which can be exploited in query optimization.
pSearch: Information Retrieval in Structured Overlays
, 2002
"... We describe an efficient peertopeer information retrieval system, pSearch, that supports stateoftheart content and semanticbased fulltext searches. pSearch avoids the scalability problem of existing systems that employ centralized indexing, or index/query flooding. It also avoids the nondete ..."
Abstract

Cited by 89 (6 self)
 Add to MetaCart
(Show Context)
We describe an efficient peertopeer information retrieval system, pSearch, that supports stateoftheart content and semanticbased fulltext searches. pSearch avoids the scalability problem of existing systems that employ centralized indexing, or index/query flooding. It also avoids the nondeterminism that is exhibited by heuristicbased approaches. In pSearch, documents in the network are organized around their vector representations (based on modern document ranking algorithms) such that the search space for a given query is organized around related documents, achieving both eciency and accuracy.
Approximating MultiDimensional Aggregate Range Queries Over Real Attributes
, 2000
"... Finding approximate answers to multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a quer ..."
Abstract

Cited by 86 (9 self)
 Add to MetaCart
Finding approximate answers to multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multidimensional query approxim...
TimeParameterized Queries in SpatioTemporal Databases
, 2002
"... Timeparameterized queries (TP queries for short) retrieve (i) the actual result at the time that the query is issued, (ii) the validity period of the result given the current motion of the query and the database objects, and (iii) the change that causes the expiration of the result. Due to the hi ..."
Abstract

Cited by 84 (4 self)
 Add to MetaCart
Timeparameterized queries (TP queries for short) retrieve (i) the actual result at the time that the query is issued, (ii) the validity period of the result given the current motion of the query and the database objects, and (iii) the change that causes the expiration of the result. Due to the highly dynamic nature of several spatiotemporal applications, TP queries are important both as standalone methods, as well as building blocks of more complex operations. However, little work has been done towards their efficient processing. In this paper, we propose a general framework that covers timeparameterized variations of the most common spatial queries, namely window queries, knearest neighbors and spatial joins. In particular, each of these TP queries is reduced to nearest neighbor search where the distance functions are def'med according to the query type. This reduction allows the application and extension of wellknown branch and bound techniques to the current problem. The proposed methods can be applied with mobile queries, mobile objects or both, given a suitable indexing method. Our experimental evaluation is based on Rtrees and their extensions for dynamic objects.