Results 1-10 of 93
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
2008
Cited by 457 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Approximate nearest neighbors and Fast Johnson-Lindenstrauss Transform
Proceedings of the Symposium on Theory of Computing, 2006
Cited by 156 (6 self)
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$) called the fast Johnson-Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the "Heisenberg principle" of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
Nearest-neighbor searching and metric space dimensions
In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, 2006
Cited by 107 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
Fast High-Dimensional Approximation with Sparse Occupancy Trees
2010
Cited by 94 (9 self)
This paper is concerned with scattered data approximation in high dimensions: given a data set $X \subset \mathbb{R}^d$ of $N$ data points $x_i$ along with values $y_i \in \mathbb{R}^d$, $i = 1, \dots, N$, and viewing the $y_i$ as values $y_i = f(x_i)$ of some unknown function $f$, we wish to return for any query point $x \in \mathbb{R}^d$ an approximation $\tilde{f}(x)$ to $y = f(x)$. Here the spatial dimension $d$ should be thought of as large. We wish to emphasize that we do not seek a representation of $\tilde{f}$ in terms of a fixed set of trial functions but define $\tilde{f}$ through recovery schemes which, in the first place, are designed to be fast and to deal efficiently with large data sets. For this purpose we propose new methods based on what we call sparse occupancy trees and piecewise linear schemes based on simplex subdivisions.
Fast k nearest neighbor search using GPU
2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2008
Cited by 73 (5 self)
Statistical measures coming from information theory represent interesting bases for image and video processing tasks such as image retrieval and video object tracking. For example, let us mention the entropy and the Kullback-Leibler divergence. Accurate estimation of these measures requires adapting to the local sample density, especially if the data are high-dimensional. The k nearest neighbor (kNN) framework has been used to define efficient variable-bandwidth kernel-based estimators with such a locally adaptive property. Unfortunately, these estimators are computationally intensive since they rely on searching neighbors among large sets of d-dimensional vectors. This computational burden can be reduced by pre-structuring the data, e.g. using binary trees as proposed by the Approximated Nearest Neighbor (ANN) library. Yet, the recent opening of Graphics Processing Units (GPU) to general-purpose computation by means of the NVIDIA CUDA API offers the image and video processing community a powerful platform with parallel calculation capabilities. In this paper, we propose a CUDA implementation of the “brute force” kNN search and we compare its performance to several CPU-based implementations including an equivalent brute force algorithm and ANN. We show a speed increase on synthetic and real data by up to one or two orders of magnitude depending on the data, with a quasi-linear behavior with respect to the data size in a given, practical range.
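For orientation, the “brute force” kNN search named in this abstract can be sketched on the CPU as below. This is a minimal illustration, not the authors' CUDA implementation; the function name `brute_force_knn`, the choice of squared Euclidean distance, and the sample points are assumptions made for the example.

```python
import heapq

def brute_force_knn(data, query, k):
    """Return the indices of the k points in `data` closest to `query`.

    Assumes squared Euclidean distance. Every point is compared with the
    query, which is the exhaustive ("brute force") strategy.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # nsmallest scans all n distances and returns the k smallest, closest first.
    return heapq.nsmallest(k, range(len(data)),
                           key=lambda i: sq_dist(data[i], query))

# Hypothetical sample data for illustration only.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.2)]
neighbors = brute_force_knn(points, (0.4, 0.1), 2)
```

Each of the n distance computations is independent of the others, which is why this exhaustive scan maps naturally onto parallel hardware.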
The fast Johnson-Lindenstrauss transform and approximate nearest neighbors
SIAM J. Comput., 2009
Cited by 61 (0 self)
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
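The construction described in this abstract, a sparse projection preconditioned by a randomized Fourier-type transform, might be sketched as follows. This is only an illustration under stated assumptions: the Walsh-Hadamard transform stands in for the Fourier transform, the sparse matrix has Gaussian entries kept with probability `q`, normalization of the output is left out, and the names `fwht` and `make_fjlt` are hypothetical.

```python
import math
import random

def fwht(vec):
    """Normalized fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]

def make_fjlt(d, k, q, seed=0):
    """Build a map x -> P H D x from d to k dimensions (d a power of two).

    D: random +/-1 diagonal, H: Walsh-Hadamard, P: sparse k x d matrix whose
    entries are nonzero with probability q (assumed Gaussian, variance 1/q).
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    rows = []
    for _ in range(k):
        # Store each sparse row as (column index, weight) pairs.
        rows.append([(j, rng.gauss(0.0, 1.0 / math.sqrt(q)))
                     for j in range(d) if rng.random() < q])

    def transform(x):
        y = fwht([s * v for s, v in zip(signs, x)])  # preconditioning step H D x
        return [sum(w * y[j] for j, w in row) for row in rows]

    return transform

# Hypothetical usage: embed an 8-dimensional vector into 3 dimensions.
embed = make_fjlt(d=8, k=3, q=0.5, seed=1)
projected = embed([1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5])
```

The same `signs` and `rows` are reused for every input, so all vectors pass through one fixed linear map. The preconditioning step spreads the mass of a sparse input across all coordinates before the sparse projection samples them.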
On the probabilistic foundations of probabilistic roadmap planning
In Proc. Int. Symp. on Robotics Research, 2005
Cited by 61 (11 self)
Probabilistic roadmap (PRM) planners [5, 16] solve apparently difficult motion planning problems where the robot’s configuration space C has dimensionality six or more, and the geometry of the robot and the obstacles is described by hundreds of thousands of triangles. While an algebraic planner would be overwhelmed by the high cost of computing an exact representation of the free space F, defined as the collision-free subset of C, a PRM planner builds only an extremely simplified representation of F, called a probabilistic roadmap. This roadmap is a graph, whose nodes are configurations sampled from F with a suitable probability measure and whose edges are simple collision-free paths, e.g., straight-line segments, between the sampled configurations. PRM planners work surprisingly well in practice, but why? Previous work has partially addressed this question by identifying and formalizing properties of F that guarantee good performance for a PRM planner using the uniform sampling measure (e.g.,
Efficient mean-shift tracking via a new similarity measure
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), 2005
Cited by 52 (4 self)
The mean shift algorithm has achieved considerable success in object tracking due to its simplicity and robustness. It finds local minima of a similarity measure between the color histograms or kernel density estimates of the model and target image. The most typically used similarity measures are the Bhattacharyya coefficient or the Kullback-Leibler divergence. In practice, these approaches face three difficulties. First, the spatial information of the target is lost when the color histogram is employed, which precludes the application of more elaborate motion models. Second, the classical similarity measures are not very discriminative. Third, the sample-based classical similarity measures require a calculation that is quadratic in the number of samples, making real-time performance difficult. To deal with these difficulties we propose a new, simple-to-compute and more discriminative similarity measure in spatial-feature spaces. The new similarity measure allows the mean shift algorithm to track more general motion models in an integrated way. To reduce the complexity of the computation to linear order we employ the recently proposed improved fast Gauss transform. This leads to a very efficient and robust nonparametric spatial-feature tracking algorithm. The algorithm is tested on several image sequences and shown to achieve robust and reliable frame-rate tracking.
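The two classical measures this abstract names as baselines, the Bhattacharyya coefficient and the Kullback-Leibler divergence between histograms, can be illustrated as below. This sketch does not implement the new measure the paper proposes; the helper names, the `eps` smoothing term, and the sample histograms are assumptions for the example.

```python
import math

def normalize(hist):
    """Scale raw bin counts so they sum to 1."""
    total = float(sum(hist))
    return [h / total for h in hist]

def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i); equals 1 for identical distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """Sum of p_i * log(p_i / q_i); eps (an assumption) guards empty bins."""
    return sum(a * math.log((a + eps) / (b + eps))
               for a, b in zip(p, q) if a > 0)

# Hypothetical 4-bin color histograms for a model and a target region.
model = normalize([4, 3, 2, 1])
target = normalize([1, 2, 3, 4])
similarity = bhattacharyya_coefficient(model, target)
divergence = kl_divergence(model, target)
```

Both measures compare bin contents only, which illustrates the abstract's first difficulty: the position of each pixel within the target plays no role.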
Entropy based nearest neighbor search in high dimensions
In Proc. 17th Ann. ACM-SIAM Symposium on Discrete Algorithms, 2006
Cited by 51 (5 self)
In this paper we study the problem of finding the approximate nearest neighbor of a query point in the high dimensional space, focusing on the Euclidean space. The earlier approaches use locality-preserving hash functions (that tend to map nearby points to the same value) to construct several hash tables to ensure that the query point hashes to the same bucket as its nearest neighbor in at least one table. Our approach is different: we use one (or a few) hash table and hash several randomly chosen points in the neighborhood of the query point, showing that at least one of them will hash to the bucket containing its nearest neighbor. We show that the number of randomly chosen points in the neighborhood of the query point $q$ required depends on the entropy of the hash value $h(p)$ of a random point $p$ at the same distance from $q$ as its nearest neighbor, given $q$ and the locality-preserving hash function $h$ chosen randomly from the hash family. Precisely, we show that if the entropy $I(h(p) \mid q, h) = M$ and $g$ is a bound on the probability that two far-off points will hash to the same bucket, then we can find the approximate nearest neighbor in $O(n^{\rho})$ time and near-linear $\tilde{O}(n)$ space where $\rho = M / \log(1/g)$. Alternatively we can build a data structure of size $\tilde{O}(n^{1/(1-\rho)})$ to answer queries in $\tilde{O}(d)$ time. By applying this analysis to the locality-preserving hash functions in [17, 21, 6] and adjusting the parameters we show that the $c$-nearest neighbor can be computed in time $\tilde{O}(n^{\rho})$ and near-linear space where $\rho \approx 2.06/c$ as $c$ becomes large.
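The query strategy described in this abstract, a single hash table probed with several random points near the query, can be sketched roughly as follows. This is not the paper's algorithm or its parameter choices: the quantized random-projection hash family, probing the query point itself in addition to the perturbed points, and all function names are assumptions made for illustration.

```python
import math
import random

def rho(entropy, g):
    """Exponent rho = M / log(1/g); entropy must be in nats to match math.log."""
    return entropy / math.log(1.0 / g)

def make_hash(dim, num_projections, width, rng):
    """Assumed hash family: concatenated quantized random projections."""
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
              for _ in range(num_projections)]

    def h(p):
        return tuple(math.floor((sum(a * x for a, x in zip(vec, p)) + b) / width)
                     for vec, b in params)
    return h

def random_point_at_distance(q, r, rng):
    """Sample a point uniformly from the sphere of radius r centered at q."""
    g = [rng.gauss(0.0, 1.0) for _ in q]
    norm = math.sqrt(sum(v * v for v in g))
    return [x + r * v / norm for x, v in zip(q, g)]

def build_table(data, h):
    """Single hash table mapping bucket id -> list of point indices."""
    table = {}
    for i, p in enumerate(data):
        table.setdefault(h(p), []).append(i)
    return table

def query(q, data, table, h, r, num_probes, rng):
    """Hash q and num_probes perturbed copies; return the closest candidate index or None."""
    probes = [q] + [random_point_at_distance(q, r, rng) for _ in range(num_probes)]
    candidates = set()
    for p in probes:
        candidates.update(table.get(h(p), []))
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(q, data[i]))

# Hypothetical data and parameters for illustration only.
rng = random.Random(0)
data = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1]]
h = make_hash(dim=2, num_projections=2, width=4.0, rng=rng)
table = build_table(data, h)
found = query(data[1], data, table, h, r=1.0, num_probes=5, rng=rng)
```

The trade-off relative to multi-table schemes is that extra work moves from storage to query time: only one table is stored, but each query hashes several points.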
SpotSigs: robust and efficient near duplicate detection in large web collections
In SIGIR ’08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, 2008
Cited by 45 (2 self)
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching signatures for near duplicate detection in large Web crawls. Our spot signatures are designed to favor natural-language portions of Web pages over advertisements and navigational bars. The contributions of SpotSigs are twofold: 1) by combining stopword antecedents with short chains of adjacent content terms, we create robust document signatures with a natural ability to filter out noisy components of Web pages that would otherwise distract pure n-gram-based approaches such as Shingling; 2) we provide an exact and efficient, self-tuning matching algorithm that exploits a novel combination of collection partitioning and inverted index pruning for high-dimensional similarity search. Experiments confirm an increase in combined precision and recall of more than 24 percent over state-of-the-art approaches such as Shingling or I-Match and up to a factor of 3 faster execution times than Locality Sensitive Hashing (LSH), over a demonstrative “Gold Set” of manually assessed near-duplicate news articles as well as the TREC WT10g Web collection.
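The first contribution, signatures built from stopword antecedents followed by short chains of content terms, can be illustrated with a simplified sketch. The tokenization, the chain length, the signature string format, and the use of plain set Jaccard similarity for comparison are assumptions for this example and are not taken from the paper; the matching algorithm of the second contribution is not shown.

```python
def spot_signatures(text, antecedents, stopwords, chain_len=2):
    """Collect one signature per antecedent occurrence.

    A signature is the antecedent plus the next `chain_len` tokens that
    are not stopwords (a simplified reading of "short chains of adjacent
    content terms").
    """
    tokens = text.lower().split()
    sigs = set()
    for i, tok in enumerate(tokens):
        if tok not in antecedents:
            continue
        chain = []
        for nxt in tokens[i + 1:]:
            if nxt in stopwords:
                continue  # skip stopwords inside the chain
            chain.append(nxt)
            if len(chain) == chain_len:
                break
        if len(chain) == chain_len:
            sigs.add(tok + ":" + ":".join(chain))
    return sigs

def jaccard(a, b):
    """Set Jaccard similarity; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical stopword list and documents for illustration only.
stop = {"the", "is", "a"}
doc_a = spot_signatures("the quick brown fox is a lazy dog", stop, stop)
doc_b = spot_signatures("the quick brown fox is a sleepy cat", stop, stop)
similarity = jaccard(doc_a, doc_b)
```

Text with few stopwords, such as navigation bars or advertisement blocks, yields few or no signatures under this scheme, which matches the filtering effect the abstract describes.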