
## Nearest Neighbor Search in Multidimensional Spaces (1999)

Citations: 11 (0 self)

### Citations

10612 | Introduction to Algorithms.
- Cormen, Leiserson, et al.
- 1990
Citation Context ...aves have pointers to the data in the secondary memory. It can be proven that the number of accessed blocks on average and in the worst case for the B-trees is logarithmic in the size of the database [29, 28]. The problem becomes more complex when we move to spaces of higher dimension. There is a multitude of data structures that deal with the problem of indexing multidimensional spaces. In general, we ca...

8148 | Basic local alignment search tool
- Altschul, Gish, et al.
- 1990
Citation Context ...is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nea...

4844 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...o a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nearest Neighbor Search in this feature space. In most a...

4018 | An Introduction to Modern Information Retrieval
- Salton, McGill
- 1983
Citation Context ...e of applications that involve the notion of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidime...

3778 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...e of applications that involve the notion of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidime...

2750 | R-trees: a dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ...(figure residue; caption: Figure 4: MINDIST and MINMAXDIST. The schedule of the HS and RKV algorithms.) 7.2 Indexes for Nearest Neighbor queries. The R-tree [51] is probably the index most commonly used in practice. The R-tree is a direct generalization of the B-tree for spaces of dimension greater than one. It consists of a tree of nested Minimum Bounding Re...

2132 | Vector Quantization and Signal Compression
- Gersho, Gray
- 1992
Citation Context ... the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nearest Neighbor Search in...

1997 | Improved tools for biological sequence comparison.
- Pearson, Lipman
- 1988
Citation Context ...is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nea...

1971 | The probabilistic method - Alon, Spencer - 2000

1377 | Nearest neighbor pattern classification
- Cover, Hart
- 1967
Citation Context ...o a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nearest Neighbor Search in this feature space. In most a...

1345 | The Design and Analysis of Spatial Data Structures
- Samet
- 1990
Citation Context ...equal hyper-rectangles. An example of the latter are the optimized k-d-trees [47] which recursively partition the data set into two almost equal sets. A survey of such data structures can be found in [78]. More recent results are reported in [48]. We now describe the optimized k-d-trees, a prominent data structure that has motivated a number of similar structures. The k-d-tree is a binary search tree...
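The optimized k-d-tree's recursive median split, and the standard branch-and-bound nearest neighbor search over it, can be sketched in a few lines (a minimal illustration of the usual formulation, not code from the cited work; all names are hypothetical):

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build(points, depth=0):
    """Split the data set into two almost equal halves at the median
    of a cycling coordinate, as in the optimized k-d-tree."""
    if len(points) <= 1:
        return points                              # leaf: 0 or 1 points
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (axis, points[mid][axis],
            build(points[:mid], depth + 1),        # left half
            build(points[mid:], depth + 1))        # right half (with median)

def nearest(node, q, best=None):
    """Depth-first search that visits the far subtree only if the
    splitting hyperplane is closer to q than the best point so far."""
    if isinstance(node, list):                     # leaf
        for p in node:
            if best is None or dist2(p, q) < dist2(best, q):
                best = p
        return best
    axis, split, left, right = node
    near, far = (left, right) if q[axis] < split else (right, left)
    best = nearest(near, q, best)
    if best is None or (q[axis] - split) ** 2 < dist2(best, q):
        best = nearest(far, q, best)
    return best
```

For example, with the six points (2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), the query (9, 2) returns (8, 1).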

1052 | Query by image and video content: the QBIC system
- Flickner, et al.
- 1995
Citation Context ...of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similar...

1030 | Approximate nearest neighbors: Towards removing the curse of dimensionality
- Indyk, Motwani
- 1998
Citation Context ... can achieve query time O(n + d log^3 n). The algorithm requires space Õ(nd), where the Õ(·) notation hides factors that are polynomial in log n. 6.2.3 The Hamming Space. The following two algorithms [61, 56] that we present consider the approximate NNS problem for the Hamming space (H^d, h), where H^d is the d-dimensional Hamming cube, and h is the Hamming distance. Then they show that an instance of th...

984 | An optimal algorithm for approximate nearest neighbor searching
- Arya, Mount, et al.
- 1994
Citation Context ... in the dimension, or require query time that is not much better than a linear scan of the data points. In fact for dimension d > log n, a brute force search is usually the best choice both in theory [27, 6], and in practice [14, 86], a predicament commonly referred to as the curse of dimensionality. In an attempt to circumvent the curse of dimensionality, researchers resorted to approximate nearest...

959 | Analysis of a complex of statistical variables into principal components
- Hotelling
- 1933
Citation Context ...cuments and queries are represented as vectors in the space of terms, a space of thousands of dimensions. Even though there exist techniques such as LSI [32, 19, 13, 38], Principal Component Analysis [54, 38], or the Karhunen-Loève transform [38], that reduce the dimensionality of the space, the resulting dimension is still in the order of a few hundreds. The situation is similar in the case of image retrieva...

776 | Algorithms in Combinatorial Geometry
- Edelsbrunner
- 1987
Citation Context ...technique is called limited node copying persistence [81]. A different approach to the NNS problem is to consider it as a special case of the problem of point location in monotone planar subdivisions [37]. A planar subdivision is a partition of the plane into vertices, edges, and polygonal faces. A monotone subdivision is a planar subdivision whose faces are x-monotone polygons, i.e. the intersection...

764 | An algorithm for finding best matches in logarithmic expected time.
- Friedman, Bentley, et al.
- 1977
Citation Context ...ta set to be stored. An example of the former are the region quadtrees [77] which recursively partition the space into 2^d equal hyper-rectangles. An example of the latter are the optimized k-d-trees [47] which recursively partition the data set into two almost equal sets. A survey of such data structures can be found in [78]. More recent results are reported in [48]. We now describe the optimized k-d...

759 | Communication Complexity
- Kushilevitz, Nisan
- 1996
Citation Context ...the data structure is the number t of cells that are probed in order to answer the query. A useful tool for the study of problems within the cell probe model is asymmetric communication complexity [71]. Asymmetric communication complexity is a special case of the communication complexity model, an elegant model introduced by Yao for the study of distributed applications. We have two players, Alice...

686 | Multidimensional access methods
- Gaede, Gunther
- 1998
Citation Context ...e most commonly used distance measure for R^d is the Euclidean distance L_2. However, other Minkowski metrics L_p, such as the Manhattan distance L_1, or the maximum norm L_∞, are also widely used [55, 10, 48]. Another interesting case is the space Σ^d, the space of all strings of size d over some alphabet Σ. The distance between two strings is usually defined as the number of coordinates in whi...
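The Minkowski family of distances mentioned in this snippet fits in a few lines (an illustrative sketch; the function name is ours):

```python
def minkowski(x, y, p):
    """L_p distance on R^d: L_1 (Manhattan), L_2 (Euclidean),
    and L_inf (maximum norm) as the limiting case."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)
```

For the points (0, 0) and (3, 4): L_1 gives 7, L_2 gives 5, and L_inf gives 4.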

653 | The ubiquitous B-tree
- Comer
- 1979
Citation Context ...he index should be linear in the number of points, with a small multiplicative constant. The problem can be solved optimally for the case of 1-dimensional spaces, using the B-tree and its variants [28]. B-trees impose a hierarchical decomposition of the data. They consist of balanced trees, where each node of the tree corresponds to an interval of points. The children of an internal node are mutual...

641 | Similarity search in high dimensions via hashing
- Gionis, Indyk, et al.
- 1999
Citation Context ...s that points that are close in space have higher probability to be hashed to the same bucket. Locality sensitive hash functions are used to answer approximate nearest neighbor queries. Gionis et al. [50] propose an implementation of the algorithms described in [56], and they show that locality sensitive hashing outperforms the SR-trees. The VA-file. Recent studies [86] have shown that under the unifor...
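For the Hamming cube, the simplest locality-sensitive family is bit sampling: hash a binary string to its values on k random coordinates, so that strings at small Hamming distance collide with higher probability. A toy sketch of that family (not the implementation of [50]; names are hypothetical):

```python
import random

def bit_sampling_hash(d, k, seed=0):
    """Return an LSH function for {0,1}^d that projects a string onto
    k randomly chosen bit positions. Two strings at Hamming distance r
    collide with probability (1 - r/d)^k over the choice of positions."""
    rng = random.Random(seed)
    positions = [rng.randrange(d) for _ in range(k)]
    return lambda x: tuple(x[i] for i in positions)

# Group strings into buckets by their hash value.
h = bit_sampling_hash(d=8, k=3, seed=1)
buckets = {}
for s in ["00000000", "00000001", "11111111"]:
    buckets.setdefault(h(s), []).append(s)
```

In practice several such hash tables are used, and a query inspects only the buckets its own hashes fall into.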

623 | A quantitative analysis and performance study for similarity-search methods in high dimensional spaces
- Weber, Schek, et al.
- 1998
Citation Context ...uire query time that is not much better than a linear scan of the data points. In fact for dimension d > log n, a brute force search is usually the best choice both in theory [27, 6], and in practice [14, 86], a predicament commonly referred to as the curse of dimensionality. In an attempt to circumvent the curse of dimensionality, researchers resorted to approximate nearest... The term "curse of dimens...

600 | The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1987
Citation Context ...technique can be used for the evaluation of nearest neighbor queries. Finally, an interesting theoretical model for the study of indexes is the external memory model introduced by Aggarwal and Vitter [2]. The model is characterized by the following parameters: N, the number of records of the problem instance; M, the number of records that can fit in the main memory; B, the number of re...

592 | Nearest Neighbor Queries
- Roussopoulos, Kelley, et al.
- 1995
Citation Context ..., the distance of the nearest neighbor. In the worst case however, it will access all the blocks of the index. 7.1.2 The RKV algorithm. A variation of the same idea was proposed by Roussopoulos et al. [75]. Given a query point q, the algorithm starts from the root of the tree and performs a depth-first traversal of the tree. At each node the children are sorted according to some criterion, and they are...
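MINDIST, one ordering criterion used by the RKV algorithm, is the distance from the query point to the closest face of a minimum bounding rectangle; it is computed by clamping each coordinate of the query into the rectangle. A sketch using squared distances (hypothetical names, not the cited implementation):

```python
def mindist2(q, lo, hi):
    """Squared MINDIST from point q to the axis-aligned MBR [lo, hi]:
    per coordinate, the gap below lo, above hi, or 0 if inside."""
    return sum(max(l - x, 0, x - h) ** 2 for x, l, h in zip(q, lo, hi))

# A node's children can then be visited in order of increasing MINDIST:
# sorted(children, key=lambda mbr: mindist2(q, mbr[0], mbr[1]))
```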

590 | The X-tree: An index structure for high-dimensional data
- Berchtold
- 1996
Citation Context ...oints in the sphere. The authors implement a variant of the RKV algorithm for nearest neighbor queries. They show that the SR-tree outperforms both the SS-tree and the R*-tree. The X-tree. The X-tree [11] is an index designed for multidimensional data. The X-tree relies on the observation that as the dimension increases the overlap between the bounding rectangles also increases. If the overlap is high...

554 | Rapid and Sensitive Protein Similarity Searches
- Pearson, Lipman
- 1985
Citation Context ...is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nea...

541 | The quadtree and related hierarchical data structures
- Samet
- 1984
Citation Context ...ms into two categories: those that partition the embedding space from which the points are drawn, and those that partition the data set to be stored. An example of the former are the region quadtrees [77] which recursively partition the space into 2^d equal hyper-rectangles. An example of the latter are the optimized k-d-trees [47] which recursively partition the data set into two almost equal sets. A...

524 | Syntactic clustering of the Web
- Broder, Glassman, et al.
- 1997
Citation Context ...nts is usually defined as the dot product of the two vectors, or the cosine of their angle. Recently, an interesting similarity measure for documents used for web search is the set similarity measure [18, 17]. Given two sets of terms A and B, we define the measure of their similarity as |A ∩ B| / |A ∪ B|. Indyk and Motwani consider the NNS problem under this similarity measure using locality sensitive hashing tech...
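The set similarity of two term sets A and B used for web search is |A ∩ B| / |A ∪ B|, the Jaccard coefficient. A trivial sketch (the function name is ours):

```python
def set_similarity(A, B):
    """Jaccard set similarity |A ∩ B| / |A ∪ B| between two term sets.
    Two empty sets are treated as identical (similarity 1)."""
    A, B = set(A), set(B)
    return len(A & B) / len(A | B) if A | B else 1.0
```

For example, {a, b, c} and {b, c, d} share 2 of 4 distinct terms, giving similarity 0.5.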

514 | The R*-tree: an efficient and robust access method for points and rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ... that causes an MBR to overflow, we split the MBR into two MBRs. The disadvantage of splits is that they may cause overlap between different MBRs, which results in higher search time. The R*-tree [7] addresses this problem by forcing the points in the MBR to be reinserted, resulting in fewer splits and less overlap. Despite their popularity, R-trees are not well suited for nearest neighbor queri...

506 | On the resemblance and containment of documents
- Broder
- 1997
Citation Context ...nts is usually defined as the dot product of the two vectors, or the cosine of their angle. Recently, an interesting similarity measure for documents used for web search is the set similarity measure [18, 17]. Given two sets of terms A and B, we define the measure of their similarity as |A ∩ B| / |A ∪ B|. Indyk and Motwani consider the NNS problem under this similarity measure using locality sensitive hashing tech...

503 | Efficient and Effective Querying by Image Content
- Faloutsos, Barber, et al.
- 1994
Citation Context ...of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similar...

438 | The SR-tree: An index structure for high-dimensional nearest neighbor queries
- Katayama, Satoh
- 1997
Citation Context ...meter, but they exhibit high overlap. On the other hand, the R*-tree groups the points into regions that have small overlap, but their diameter increases exponentially with the dimension. The SR-tree [58] attempts to combine the advantages, and overcome the shortcomings, of both these methods, by using both bounding spheres and bounding rectangles. A region in the SR-tree is defined as the intersection...

438 | Photobook: Tools for content-based manipulation of image databases
- Pentland, Picard, et al.
- 1994
Citation Context ...of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similar...

358 | Data structures and algorithms for nearest neighbor search in general metric spaces - Yianilos - 1993

345 | Similarity indexing with the SS-tree
- White, Jain
- 1996
Citation Context ...asily adapted to work as indexes for secondary memory. In this section we present some of the indexes designed for indexing multidimensional data for nearest neighbor queries. The SS-tree. The SS-tree [87] is inspired by the R*-tree, but instead of bounding hyper-rectangles it uses bounding spheres. Given a set of points it encloses them in a sphere that is centered at the centroid of the points, and h...
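The bounding sphere described in this snippet, centered at the centroid with radius reaching the farthest point, can be computed as follows (a simplified sketch, not a reproduction of the SS-tree code):

```python
def bounding_sphere(points):
    """Sphere centered at the centroid of the points, with radius
    equal to the Euclidean distance to the farthest point."""
    d = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(d))
    radius = max(sum((a - b) ** 2 for a, b in zip(p, center)) ** 0.5
                 for p in points)
    return center, radius
```

For example, the points (0, 0) and (2, 0) yield center (1.0, 0.0) and radius 1.0.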

309 | A weighted nearest neighbor algorithm for learning with symbolic features.
- Cost, Salzberg
- 1993
Citation Context ...uch areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidimensional space of features, and similarity search is reduced to a Nearest Neighbor Search in this feature space. In most applications, the resulting...

300 | Computational Geometry: An Introduction through Randomized Algorithms
- Mulmuley
- 1994
Citation Context ... For the remainder of this section we consider the vector space to be R^d, and the distance metric to be the Euclidean norm L_2. The presentation in this section follows closely the presentation in [70]. 3.1 Hyperplanes and half-spaces. We define a hyperplane h in R^d as the set of points in R^d that satisfy a linear equality of the form a_1 x_1 + a_2 x_2 + ... + a_d x_d = a_0, where the x_j's denote t...

243 | Applying parallel computation algorithms in the design of serial algorithms
- Megiddo
- 1983
Citation Context ...e on the boundary of the convex polytope that is hit by a ray emanating from infinity and passing through the point q. Agarwal and Matoušek solve the problem using a technique called parametric search [65, 70]. Parametric search is a powerful technique that reduces search problems to decision problems. We present the technique through its application to the vertical ray shooting problem in appendix B. The...

214 | Efficient search for approximate nearest neighbor in high dimensional spaces
- Kushilevitz, Ostrovsky, et al.
- 2000
Citation Context ... can achieve query time O(n + d log^3 n). The algorithm requires space Õ(nd), where the Õ(·) notation hides factors that are polynomial in log n. 6.2.3 The Hamming Space. The following two algorithms [61, 56] that we present consider the approximate NNS problem for the Hamming space (H^d, h), where H^d is the d-dimensional Hamming cube, and h is the Hamming distance. Then they show that an instance of th...

210 | Ranking in spatial databases.
- Hjaltason, Samet
- 1995
Citation Context ...gorithms for Nearest Neighbor Search. We now present two algorithms for Nearest Neighbor Search. The algorithms can be applied to any hierarchical index that makes use of what we term container blocks [53]. This term is used to denote an area of the space that may itself be decomposed further, depending on the number of points it contains. The resulting index consists of a tree of nested container bloc...

207 | Satisfying general proximity/similarity queries with metric trees
- Uhlmann
- 1991
Citation Context ...nt relative to the R*-tree. One caveat in the above approach is that the design of the X-tree is based on the assumption that data and query points are uniformly distributed. The VP-tree. The VP-trees [82, 90, 16] use a simple and intuitive idea. Given a set of points P, we select a point p to be the vantage point. We compute the distance from p to all other points in P, and we use these distances to compare...

203 | A cost model for nearest neighbor search in high-dimensional data space
- Keim, Berchtold, et al.
- 1997
Citation Context ...ich they are arranged in the disk, which is approximately ten times faster than an unordered retrieval of the same pages. The same conclusion is reached in a more careful analysis by Berchtold et al. [9] that takes into account the boundary effects. Therefore, when evaluating an index method it is also important to compare it with the linear scan. Results of similar flavor are obtained by Beyer et al...

196 | Two algorithms for nearest-neighbor search in high dimension
- Kleinberg
- 1997
Citation Context ...ces of the points in P, and the set of all possible traces. Given a query point q, we use the precomputed information for the trace t(q) to answer the query. 6.2.2 The Kleinberg algorithms. Kleinberg [59] was the first to obtain a result that improves asymptotically over the simple linear search. He presents two algorithms. The first answers queries in time O((d^2 log d)(d + log n)), but requires stor...

177 | Planar Point Location Using Persistent Search Trees
- Sarnak, Tarjan
- 1986
Citation Context ...essary space, however, is O(n^2), since there are O(n) slabs, each containing O(n) edges in the worst case. The storage requirements can be reduced to O(n) by introducing the concept of persistence [80]. A dynamic data structure is called persistent if it allows accesses to any of its previous versions. The main idea is to exploit the fact that the location structures for adjacent slabs are similar...

157 | New Retrieval Approaches Using SMART: TREC 4
- Buckley, Singhal, et al.
- 1995
Citation Context ...e of applications that involve the notion of similarity search, that is, given a set of objects, return the object most similar to a query object. Examples of such areas include information retrieval [76, 74, 32, 19], multimedia applications [39, 41, 73], DNA matching [4, 72, 63, 84], data compression [49], pattern recognition [36, 31], and machine learning [30]. Typically, the objects are mapped into a multidime...

142 | Searching Multimedia Databases by Content
- Faloutsos
- 1996
Citation Context ...nsion. For example in information retrieval, documents and queries are represented as vectors in the space of terms, a space of thousands of dimensions. Even though there exist techniques such as LSI [32, 19, 13, 38], Principal Component Analysis [54, 38], or the Karhunen-Loève transform [38], that reduce the dimensionality of the space, the resulting dimension is still in the order of a few hundreds. The situation i...

138 | Approximate Nearest Neighbor Queries in Fixed Dimensions
- Arya, Mount
- 1996
Citation Context ... the tradeoff proven by Agarwal and Matoušek [1] for the general case of random sampling algorithms. 6 Approximate Nearest Neighbor Search in d-dimensional Spaces. 6.1 The Early Attempts. Arya and Mount [5] were the first to consider the approximate Nearest Neighbor Problem. Their algorithm constructs for each point p ∈ P a set of cones with angular diameter O(ε) that share a common apex p, and cover...

129 | Distance-Based Indexing for High-Dimensional Metric Spaces
- Bozkaya, Ozsoyoglu
- 1997
Citation Context ...nt relative to the R*-tree. One caveat in the above approach is that the design of the X-tree is based on the assumption that data and query points are uniformly distributed. The VP-tree. The VP-trees [82, 90, 16] use a simple and intuitive idea. Given a set of points P, we select a point p to be the vantage point. We compute the distance from p to all other points in P, and we use these distances to compare...

126 | Ray shooting and parametric search.
- Agarwal, Matousek
- 1993
Citation Context ...n's random sampling with Meiser's searching technique. The resulting algorithm uses O*(n^(⌈d/2⌉(1+ε))) space, and has O(d^5 log n) search time. The Agarwal and Matoušek Algorithm. Agarwal and Matoušek [1] view the NNS problem as a special case of ray shooting in a convex polytope. Recall that there is a simple transformation from a d-dimensional Voronoi diagram to a (d + 1)-dimensional upper convex p...

122 | A randomized algorithm for closest-point queries
- Clarkson
- 1988
Citation Context ...space, and has query time O(log^2 n). The best solutions for Nearest Neighbor Search in three dimensions come from the application of algorithms for arbitrary dimensions d to the case d = 3. Clarkson [26], and Meiser [66] give algorithms that have optimal query time O(log n), but require O(n^(2+δ)) space. Yao and Yao [89] give an algorithm that achieves linear space, but requires barely sublinear que...

108 | The Johnson-Lindenstrauss lemma and the sphericity of some graphs
- Frankl, Maehara
- 1988
Citation Context ...e of the algorithm is O(n) × O(1/ε)^d. The query is answered in time O(d), the time to compute the hash function. The authors combine this result with a well known theorem by Frankl and Maehara [42], which states that the projection of the point set P onto a subspace defined by 9ε^(-2) log n lines preserves all inter-point distances within a relative error ε. The resulting algorithm uses space (nd)...

103 | Multidimensional search problems
- Dobkin, Lipton
- 1976
Citation Context ...earch will be an interval that determines the point closest to q. Thus, using O(n) space, and with O(n log n) preprocessing time, we have a data structure with O(log n) search time. Dobkin and Lipton [33] proposed the following natural generalization of the binary search to two dimensions. Given the point set P, create the Voronoi diagram of P. Using the plane sweep algorithm [70] this takes O(n log...
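The one-dimensional base case described here, sort the points and binary-search for the interval containing q, then compare the interval's two endpoints, can be sketched with the standard library (the function name is ours):

```python
import bisect

def nn_1d(sorted_pts, q):
    """O(log n) nearest neighbor on a sorted list of reals: locate the
    insertion point of q, then take the closer of its two neighbors."""
    i = bisect.bisect_left(sorted_pts, q)
    candidates = sorted_pts[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p - q))
```

For example, on the sorted points [1, 4, 9, 16], the query 6 returns 4.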

93 | On data structures and asymmetric communication complexity
- Miltersen, Nisan, et al.
- 1998
Citation Context ... returns no otherwise. Borodin et al. consider the NPM problem for the case that the query vector has exactly log n + 1 exposed coordinates. Using the richness technique developed by Miltersen et al. [69], they prove that if there is an [a, b]-protocol for the NPM problem, then either a = Ω(log n · log d), or b = Ω(n^(1-ε)) for some ε > 0. Theorem 1 implies that any cell probe mo...

89 | On the Analysis of Indexing Schemes.
- Hellerstein, Koutsoupias, et al.
- 1997
Citation Context ...the bucket of the nearest code word. Error-correcting codes seem like an interesting idea to be explored further. 8.4 The indexing model. A model of similar flavor was introduced by Hellerstein et al. [52] for the study of more general indexing schemes. Indexes in this model are evaluated in the context of a specific workload. A workload is defined by a triplet W = (D, I, Q), where D is the domain of...

72 | An algorithm for approximate closest-point queries
- Clarkson
- 1994
Citation Context ... in the dimension, or require query time that is not much better than a linear scan of the data points. In fact for dimension d > log n, a brute force search is usually the best choice both in theory [27, 6], and in practice [14, 86], a predicament commonly referred to as the curse of dimensionality. In an attempt to circumvent the curse of dimensionality, researchers resorted to approximate nearest...

70 | Quasi-optimal upper bounds for simplex range searching and new zone theorems
- Chazelle, Sharir, et al.
- 1990
Citation Context ...ek [64] proves that the half-space emptiness queries can be answered in time O*((n / m^(1/⌊d/2⌋)) log n), using O*(m^(1+δ)) space, where n ≤ m ≤ n^(⌊d/2⌋). The algorithm is based on results by Chazelle et al. [24] on random sampling. A combination of this result with the parametric search technique gives an algorithm for ray shooting queries, and thus for nearest neighbor queries, that uses O*(m^(1+δ)) space,...

62 | Approximate nearest neighbor queries revisited.
- Chan
- 1998
Citation Context ... a priority heap we can locate these points in time O(c_{d,ε} d log n). Therefore, the BBD-tree uses optimal space O(dn), and has query time O(c_{d,ε} d log n). Combining ideas from both [5] and [6], Chan [22] provided an algorithm that reduces the exponential factor ε^(-(d-1)) that appears in the running time of the algorithm in [6] to ε^(-(d-1)/2). The data structure consists of a set of...

61 | A cost model for similarity queries in metric spaces
- Ciaccia, Patella, et al.
- 1998
Citation Context ...Unfortunately, there is no comparative study of the VP-trees with other indexes, with respect to block accesses. An interesting probabilistic analysis under the homogeneity assumption can be found in [25]. Density-based indexing. In some applications, such as pattern matching, we can think of the database and the query points as being generated by a mixture of distributions. The idea behind density-based...

60 | Point location in arrangements of hyperplanes
- Meiser
Citation Context ...ery time O(log^2 n). The best solutions for Nearest Neighbor Search in three dimensions come from the application of algorithms for arbitrary dimensions d to the case d = 3. Clarkson [26], and Meiser [66] give algorithms that have optimal query time O(log n), but require O(n^(2+δ)) space. Yao and Yao [89] give an algorithm that achieves linear space, but requires barely sublinear query time. We inves...

57 | External memory algorithms
- Vitter
- 1998
Citation Context ...eded to solve a problem. All other computation is free. They use this model to prove a Θ((N/B) log_{M/B}(N/B)) bound for the sorting problem. They also derive bounds for various geometrical problems [83]. It would be interesting to investigate the performance of nearest neighbor queries under this model. 8 Lower Bounds for the Nearest Neighbor Search Problem. To date there exists no satisfactory solut...

56 | Locality-preserving hashing in multidimensional spaces - Indyk, Motwani, et al. - 1997

55 | Lower bounds for high dimensional nearest neighbor search and related problems,
- Borodin, Ostrovsky, et al.
- 1999
Citation Context ...ages, and Bob sends at most b bits in each of his messages. Unfortunately, the converse can be proven only for the restricted case when the number of rounds of the protocol is constant. Borodin et al. [15] study the No Partial Match (NPM) decision problem within the framework of the cell probe model. Let D be a database of n vectors in the d-dimensional Hamming space, and let q be a d-dimensional vector...

54 | Lower bounds for union-split-find related problems on random access machines,
- Miltersen
- 1994
Citation Context ...nction f(x, y) is the answer of the query x to the database y. It was recently shown that there is a strong connection between asymmetric communication complexity and the cell probe model. Miltersen [67, 68] proves the following theorem. Theorem 1 If there is a cell probe model solution for some problem that uses a data structure with m cells, b bits per cell, and answers queries with at most t probes, t...

45 |
Approximate closest-point queries in high dimensions
- Bern
- 1993
Citation Context ...tor is possible at the expense of a multiplicative log n factor in the query time. Better results can be obtained if one is willing to settle for a rough approximation. Improving upon results by Bern [12], Chan describes a method using quadtrees that finds an O(d^(3/2))-approximate nearest [footnote: a Delaunay neighbor of a point p is a point p′ such that the Voronoi cells of the two points are adjacent] n...

37 |
When is "nearest neighbor" meaningful
- Beyer, Goldstein, et al.
- 1998
Citation Context ...uire query time that is not much better than a linear scan of the data points. In fact, for dimension d > log n a brute force search is usually the best choice both in theory [27, 6] and in practice [14, 86], a predicament commonly referred to as the curse of dimensionality. In an attempt to circumvent the curse of dimensionality, researchers resorted to approximate nearest [footnote: the term "curse of dimens...
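The distance concentration behind this predicament is easy to reproduce empirically. The experiment below is a hypothetical illustration (the point count and dimensions are arbitrary choices): it draws uniform points in [0,1]^d and measures how the ratio of farthest to nearest distance from a query collapses toward 1 as d grows, which is why index structures stop beating a linear scan.

```python
import math
import random

def distance_contrast(d, n=1000, seed=0):
    """Ratio of farthest to nearest distance from a random query to n
    uniform points in [0,1]^d; values near 1 mean 'nearest' is barely
    more meaningful than 'farthest'."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(d)]
    def dist(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    ds = [dist([rng.random() for _ in range(d)]) for _ in range(n)]
    return max(ds) / min(ds)

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 2))  # the ratio shrinks as d grows
```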

35 | Efficient sequence alignment algorithms.
- Waterman
- 1984

32 | On approximate nearest neighbors in non-Euclidean spaces,
- Indyk
- 1999
Citation Context ...e most commonly used distance measure for R^d is the Euclidean distance L_2. However, other Minkowski metrics L_p, such as the Manhattan distance L_1 or the maximum norm L_∞, are also widely used [55, 10, 48]. Another interesting case is the space Σ^d, the space of all strings of size d over some alphabet Σ. The distance between two strings is usually defined as the number of coordinates in whi...

32 | Non-Expansive Hashing - Linial, Sasson - 1996 |

31 |
A lower bound on the complexity of orthogonal range queries
- Fredman
- 1981
Citation Context ...ate the number of variables per register. Summing over all possible registers, we obtain equation 2. Using this general technique the authors prove lower bounds for a variety of problems. Fredman [43] considers range queries, and spherical queries in the dynamic setting [44]. Fredman and Volper [46] consider the problem of partial match queries. They prove that the complexity of the problem is 1.2...

30 | Information retrieval algorithms: A survey
- RAGHAVAN
- 1997

26 |
How to search in history
- Chazelle
- 1985
Citation Context ...s Voronoi diagrams in three dimensions is bound to use at least Θ(n^2) space. There are only a few algorithms that tackle the NNS problem in three dimensions directly. The most efficient solution [23] uses a persistent dynamic data structure for planar subdivisions to build a data structure that can be used for point location in space subdivisions. The algorithm requires O(n^2) space, and has qu...

26 |
Lower bounds on the complexity of some optimal data structures
- Fredman
- 1981
Citation Context ...point can serve as an approximate nearest neighbor for both instances. 8.2 The semi-group model A well structured model for the study of lower bounds for search problems is Fredman's semi-group model [20, 45, 44]. A semi-group S is defined as a set closed under an associative addition operation. (Sometimes, we may assume that the operation is also commutative.) In this model, a semi-group value is associat...

23 |
Point location
- Snoeyink
- 2004
Citation Context ... and copies of nodes that are updated, we can store a persistent data structure for the slabs in linear space, and keep the time in O(log n). This technique is called limited node copying persistence [81]. A different approach to the NNS problem is to consider it as a special case of the problem of point location in monotone planar subdivisions [37]. A planar subdivision is a partition of the plane in...

21 | An approximation-based data structure for similarity search.
- Weber, Blott
- 1997
Citation Context ...[86] have shown that under the uniformity and independence assumption for the data, as the dimension grows the performance of partitioning index methods degrades to that of a linear scan. The VA-file [86, 85] provides a simple method to accelerate this linear scan. Assuming that the data space is the unit cube, we impose a grid upon the space. For every point in the database the VA-file stores the binary ...
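The grid approximation described in this context can be sketched in a few lines. This is an illustrative reconstruction of the idea, not the authors' VA-file implementation: the signature layout, bit budget, and pruning loop are assumptions, and the scan works on in-memory lists rather than a compact file of bit strings.

```python
import random

def va_signature(point, bits_per_dim=2):
    """Quantize a point in the unit cube: one small cell index per
    dimension (the approximation file stores these signatures,
    not the full vectors)."""
    cells = 1 << bits_per_dim
    return tuple(min(int(x * cells), cells - 1) for x in point)

def va_scan(db, q, bits_per_dim=2):
    """Accelerated linear scan: visit points in order of the minimum
    squared distance their grid cell could have to q, and stop once
    that lower bound exceeds the best exact distance seen so far."""
    cells = 1 << bits_per_dim
    def lower_bound(p):
        s = 0.0
        for qi, c in zip(q, va_signature(p, bits_per_dim)):
            lo, hi = c / cells, (c + 1) / cells
            if not (lo <= qi <= hi):
                s += min(abs(qi - lo), abs(qi - hi)) ** 2
        return s
    best, best_p = float("inf"), None
    for p in sorted(db, key=lower_bound):
        if lower_bound(p) > best:
            break                      # no remaining point can be closer
        d = sum((a - b) ** 2 for a, b in zip(p, q))
        if d < best:
            best, best_p = d, p
    return best_p

rng = random.Random(1)
db = [(rng.random(), rng.random()) for _ in range(200)]
q = (0.3, 0.7)
exact = min(db, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))
assert va_scan(db, q) == exact         # the filter is lossless
```

Because the per-cell bound never overestimates the true distance, the pruning is admissible and the scan returns the exact nearest neighbor.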

19 | A lower bound on the complexity of approximate nearest-neighbor searching on the Hamming Cube
- Chakrabarti, Chazelle, et al.
Citation Context ...th constant number of rounds of communication, then either it uses space Ω(n log d), or it has query time Ω(n^(1-ε)) for some ε > 0. In the same context, Chakrabarti et al. [21] provide lower bounds for deterministic algorithms for the problem of (1 + ε)-approximate nearest neighbor. They show that if we construct a data structure that has poly(nd) number of cells, where eac...

16 | On the cell probe complexity of polynomial evaluation
- Miltersen
- 1995
Citation Context ...nction f(x, y) is the answer of the query x to the database y. It was recently shown that there is a strong connection between asymmetric communication complexity and the cell probe model. Miltersen [67, 68] proves the following theorem. Theorem 1 If there is a cell probe model solution for some problem that uses a data structure with m cells, b bits per cell, and answers queries with at most t probes, t...

16 | A lower bound theorem for indexing schemes and its application to multidimensional range queries.
- Samoladas, Miranker
- 1998
Citation Context ...number of blocks that answer the query Q. The access overhead α of the indexing scheme S is defined as the maximum access overhead over all possible queries. Note that 1 ≤ α ≤ B. Samoladas and Miranker [79] prove that for any workload W, if we fix the access overhead α, then if we can find a set of M queries that have large size (that is, greater than B) and small intersection (that is, O(B/α^2)),...

14 |
Neighborhood preserving hashing and approximate queries
- Dolev, Harari, et al.
- 1994
Citation Context ...or example that this model can be used for the representation of hashing schemes. 8.3 The bucketing model A model oriented towards the study of hashing schemes is the model introduced by Dolev et al. [35, 34], which we shall refer to as the bucketing model. Let D be a set of words (dictionary) of length d over an alphabet Σ. The authors consider algorithms that map the dictionary D onto a set of m bu...

14 | Tight bounds for 2-dimensional indexing schemes.
- Koutsoupias, Taylor
- 1998
Citation Context ...Ω(MB/n). They apply this theorem for the case of d-dimensional range queries, and derive a lower bound Ω(log B / log α) for the storage redundancy. Range queries are also considered in [52, 60]. Another interesting family of workloads is derived from set inclusion queries. For some m > 1, the domain D is the set of all possible subsets of the set {1, 2, ..., m}. The instance I is a subset...

13 |
Beyond uniformity and independence: Analysis of R- trees using the concept of fractal dimension
- Faloutsos, Kamel
- 1994
Citation Context ...most identically distributed. The authors claim that real data follow this assumption. They support their claim by showing that their estimates are validated by experimental data. Faloutsos and Kamel [40] take a different approach: they suggest that real data have properties of mathematical fractals, self-similarity in particular. They propose the fractal dimension as a good tool for determining the p...

9 | A case study of latent semantic indexing
- BERRY, DUMAIS, et al.
- 1992
Citation Context ...nsion. For example, in information retrieval documents and queries are represented as vectors in the space of terms, a space of thousands of dimensions. Even though there exist techniques such as LSI [32, 19, 13, 38], Principal Component Analysis [54, 38], or the Karhunen-Loève transform [38] that reduce the dimensionality of the space, the resulting dimension is still on the order of a few hundred. The situation i...

5 |
F.: A general approach to d-dimension geometric queries
- Yao, Yao
- 1985
Citation Context ...pplication of algorithms for arbitrary dimensions d to the case d = 3. Clarkson [26] and Meiser [66] give algorithms that have optimal query time O(log n), but require O(n^(2+δ)) space. Yao and Yao [89] give an algorithm that achieves linear space, but requires barely sublinear query time. We investigate these algorithms in detail in the following section. 5 Exact Nearest Neighbor Search in d-dime...

4 |
Inherent complexity trade-offs for range query problems,
- Burkhard, Fredman, et al.
- 1981
Citation Context ...point can serve as an approximate nearest neighbor for both instances. 8.2 The semi-group model A well structured model for the study of lower bounds for search problems is Fredman's semi-group model [20, 45, 44]. A semi-group S is defined as a set closed under an associative addition operation. (Sometimes, we may assume that the operation is also commutative.) In this model, a semi-group value is associat...

3 |
Query time versus redundancy trade-offs for range queries
- Fredman, Volper
- 1981
Citation Context ...point can serve as an approximate nearest neighbor for both instances. 8.2 The semi-group model A well structured model for the study of lower bounds for search problems is Fredman's semi-group model [20, 45, 44]. A semi-group S is defined as a set closed under an associative addition operation. (Sometimes, we may assume that the operation is also commutative.) In this model, a semi-group value is associat...

3 |
The complexity of partial match retrieval in a dynamic setting
- Fredman, Volper
- 1982
Citation Context ...n 2. Using this general technique the authors prove lower bounds for a variety of problems. Fredman [43] considers range queries, and spherical queries in the dynamic setting [44]. Fredman and Volper [46] consider the problem of partial match queries. They prove that the complexity of the problem is 1.226^d N, where the complexity of a query is defined as the total number of registers accessed by a s...

2 |
Density-Based Indexing for Nearest-Neighbor Queries. Microsoft Research Tech Report
- Bennett, Fayyad, et al.
- 1998
Citation Context ...based indexing In some applications such as pattern matching we can think of the database and the query points as being generated by a mixture of distributions. The idea behind density-based indexing [8] is to use this fact when constructing an index for nearest neighbor queries. Using EM (Expectation Maximization) techniques we produce a probability distribution that best fits the data, which consis...
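The EM fitting step mentioned in this context can be illustrated on a toy example. The code below is a generic two-component 1D Gaussian-mixture EM, not the actual index construction of [8]; the component count, initialization, and iteration budget are arbitrary choices for the sketch.

```python
import math
import random

def em_1d(xs, iters=50):
    """Fit a 2-component 1D Gaussian mixture to xs with plain EM.
    Returns per-component means, variances, and mixing weights."""
    mu = [min(xs), max(xs)]            # crude but deterministic init
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [w[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(var[k]) for k in (0, 1)]
            s = p[0] + p[1]
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate parameters from the responsibilities
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, xs)) / nk + 1e-9
            w[k] = nk / len(xs)
    return mu, var, w

# Two well-separated clusters; the fitted means should land near 0 and 10,
# after which an index could probe the densest component for a query first.
rng = random.Random(2)
xs = ([rng.gauss(0, 1) for _ in range(200)]
      + [rng.gauss(10, 1) for _ in range(200)])
mu, var, w = em_1d(xs)
print(sorted(round(m, 1) for m in mu))
```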

1 |
High-dimensional index structures -- improving the performance of multimedia databases
- Berchtold, Bohm, et al.
- 1999
Citation Context ...e most commonly used distance measure for R^d is the Euclidean distance L_2. However, other Minkowski metrics L_p, such as the Manhattan distance L_1 or the maximum norm L_∞, are also widely used [55, 10, 48]. Another interesting case is the space Σ^d, the space of all strings of size d over some alphabet Σ. The distance between two strings is usually defined as the number of coordinates in whi...

1 |
Finding the neighbor of a query in a dictionary
- Dolev, Harari, et al.
- 1993
Citation Context ...or example that this model can be used for the representation of hashing schemes. 8.3 The bucketing model A model oriented towards the study of hashing schemes is the model introduced by Dolev et al. [35, 34], which we shall refer to as the bucketing model. Let D be a set of words (dictionary) of length d over an alphabet Σ. The authors consider algorithms that map the dictionary D onto a set of m bu...

1 |
Reporting points in half-spaces
- Matoušek
- 1991
Citation Context ...B), which is in turn reduced [1] to the half-space emptiness problem; given a set of points and a hyperplane, decide if there is a database point in the half-space defined by the hyperplane. Matoušek [64] proves that half-space emptiness queries can be answered in time O((n / m^(1/⌊d/2⌋)) log n), using O(m^(1+δ)) space, where n ≤ m ≤ n^(⌊d/2⌋). The algorithm is based on results by Chazelle et al. [24] on...

1 |
Should tables be sorted? (extended abstract
- Yao - Should tables be sorted? (extended abstract)
- 1978
Citation Context ...the models that can be utilized in this direction. 8.1 The cell probe model One of the strongest theoretical models for the study of data structures is the cell probe model, introduced by Yao [88]. In this model, the data structure consists of a set of m cells, each one holding b bits of information. The query time for a search algorithm for the data structure is the number t of cells that are...