Results 1  10
of
145
An algorithm for finding best matches in logarithmic expected time
 ACM Transactions on Mathematical Software
, 1977
"... An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kNlogN. The expected number of recor ..."
Abstract

Cited by 759 (2 self)
 Add to MetaCart
An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kNlogN. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportionalto 1ogN. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods.
Fast approximate nearest neighbors with automatic algorithm configuration
 In VISAPP International Conference on Computer Vision Theory and Applications
, 2009
"... nearestneighbors search, randomized kdtrees, hierarchical kmeans tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in highdimensional spaces. There are no known exact algorithms for solving these highdimensional problems ..."
Abstract

Cited by 448 (2 self)
 Add to MetaCart
(Show Context)
nearestneighbors search, randomized kdtrees, hierarchical kmeans tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in highdimensional spaces. There are no known exact algorithms for solving these highdimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with only minimal guidance on selecting an algorithm and its parameters for any given problem. In this paper, we describe a system that answers the question, “What is the fastest approximate nearestneighbor algorithm for my data? ” Our system will take any given dataset and desired degree of precision and use these to automatically determine the best algorithm and parameter values. We also describe a new algorithm that applies priority search on hierarchical kmeans trees, which we have found to provide the best known performance on many datasets. After testing a range of alternatives, we have found that multiple randomized kd trees provide the best performance for other datasets. We are releasing public domain code that implements these approaches. This library provides about one order of magnitude improvement in query time over the best previously available software and provides fully automated parameter selection. 1
Distance Browsing in Spatial Databases
, 1999
"... Two different techniques of browsing through a collection of spatial objects stored in an Rtree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a knearest neighbor algorithm where k is kn ..."
Abstract

Cited by 390 (20 self)
 Add to MetaCart
Two different techniques of browsing through a collection of spatial objects stored in an Rtree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a knearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m#kneighbors are needed, the knearest neighbor algorithm needs to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the k +1 st neighbor can be obtained without having to calculate the k +1nearest neighbors from scratch. The incremental approach finds use when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. A general incremental nearest neighbor algorithm is presented that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the Rtree and its performance is compared to an existing knearest neighbor algorithm for Rtrees [45]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the knearest neighbor algorithm for distance browsing queries in a spatial database that uses the Rtree as a spatial index. Moreover, the incremental nearest neighbor algorithm also usually outperforms the knearest neighbor algorithm when applied to the knearest neighbor problem for the Rtree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that, at any step in its execution, the incremental...
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
"... We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdim ..."
Abstract

Cited by 356 (5 self)
 Add to MetaCart
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are highdimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vptree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kdtree performance is compared.
Image Retrieval using Color and Shape
 Pattern Recognition
, 1996
"... This report deals with efficient retrieval of images from large databases based on the color and shape content in images. With the increasing popularity of the use of large volume image databases in various applications, it becomes imperative to build an automatic and efficient retrieval system to b ..."
Abstract

Cited by 221 (9 self)
 Add to MetaCart
(Show Context)
This report deals with efficient retrieval of images from large databases based on the color and shape content in images. With the increasing popularity of the use of large volume image databases in various applications, it becomes imperative to build an automatic and efficient retrieval system to browse through the entire database. Techniques using textual attributes for annotations are limited in their applications. Our approach relies on image features that exploit visual cues such as color and shape. Unlike previous approaches which concentrate on extracting a single concise feature, our technique combines features that represent both the color and shape in images. Experimental results on a database of 400 trademark images show that an integrated color and shapebased feature representation results in 99% of the images being retrieved within the top two positions. Additional results demonstrate that a combination of clustering and a branch and boundbased matching scheme aids in i...
The TVtree  an index structure for highdimensional data
 VLDB Journal
, 1994
"... We propose a file structure to index highdimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree struc ..."
Abstract

Cited by 216 (8 self)
 Add to MetaCart
(Show Context)
We propose a file structure to index highdimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such `varying length' feature vectors. Finally we report simulation results, comparing the proposed structure with the R tree, which is one of the most successful methods for lowdimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses. Type of Contribution: New Index Structure, for highdimensionality feature spaces. Algorithms and performance measurements. Keywords: Spatial Index, Similarity Retrieval, Query by Content 1 Introduction Many applications require enhanced indexing, capable of performing similarity searching on several, nontraditional (`exotic') data types. The targ...
Near neighbor search in large metric spaces
 In Proceedings of the 21th International Conference on Very Large Data Bases
, 1995
"... Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more ge ..."
Abstract

Cited by 216 (0 self)
 Add to MetaCart
(Show Context)
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric spaceand distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT Geometric Nearneighbor Access Tree. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNAT’s outperform previous data structures in a number of applications.
R.: Cityscale location recognition
 In: Computer Vision and Pattern Recognition (CVPR
, 2007
"... We look at the problem of location recognition in a large image dataset using a vocabulary tree. This entails finding the location of a query image in a large dataset containing 3 × 104 streetside images of a city. We investigate how the traditional invariant feature matching approach falls down as ..."
Abstract

Cited by 177 (3 self)
 Add to MetaCart
(Show Context)
We look at the problem of location recognition in a large image dataset using a vocabulary tree. This entails finding the location of a query image in a large dataset containing 3 × 104 streetside images of a city. We investigate how the traditional invariant feature matching approach falls down as the size of the database grows. In particular we show that by carefully selecting the vocabulary using the most informative features, retrieval performance is significantly improved, allowing us to increase the number of database images by a factor of 10. We also introduce a generalization of the traditional vocabulary tree search algorithm which improves performance by effectively increasing the branching factor of a fixed vocabulary tree. 1.
Influence Sets Based on Reverse Nearest Neighbor Queries
 In SIGMOD
, 2000
"... Inherent in the operation of many decision support and continuous referral systems is the notion of the "influence" of a data point on the database. This notion arises in examples such as finding the set of customers affected by the opening of a new store outlet location, notifying the sub ..."
Abstract

Cited by 153 (1 self)
 Add to MetaCart
Inherent in the operation of many decision support and continuous referral systems is the notion of the "influence" of a data point on the database. This notion arises in examples such as finding the set of customers affected by the opening of a new store outlet location, notifying the subset of subscribers to a digital library who will find a newly added document most relevant, etc. Standard approaches to determining the influence set of a data point involve range searching and nearest neighbor queries. In this paper, we formalize a novel notion of influence based on reverse neighbor queries and its variants. Since the nearest neighbor relation is not symmetric, the set of points that are closest to a query point (i.e., the nearest neighbors) differs from the set of points that have the query point as their nearest neighbor (called the reverse nearest neighbors). Influence sets based on reverse nearest neighbor (RNN) queries seem to capture the intuitive notion of influence from our ...