Results 1 - 10 of 13
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Cited by 457 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
Nearest Neighbors in High-Dimensional Spaces
, 2004
Cited by 93 (2 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., on the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
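As a point of reference for the problem stated in this abstract, the following is a minimal sketch of the naive baseline: a linear scan with no preprocessing at all. It assumes Euclidean distance and points given as coordinate tuples; the function name `nearest_neighbor` and the sample data are illustrative, not taken from the chapter.

```python
import math

def nearest_neighbor(points, q):
    """Return the point of `points` closest to `q` by scanning every point.

    Assumes Euclidean distance. One distance computation per stored point,
    so query time grows linearly with the size of the dataset.
    """
    return min(points, key=lambda p: math.dist(p, q))

# Illustrative data set P and query point q.
P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
print(nearest_neighbor(P, q))
```

Each query touches all of P, which is exactly the cost that the data structures discussed in the chapter aim to avoid for very large, high-dimensional data sets.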
Low-Distortion Embeddings of Finite Metric Spaces
 in Handbook of Discrete and Computational Geometry
, 2004
Cited by 66 (1 self)
An n-point metric space (X, D) can be represented by an n × n table specifying the distances. Such tables arise in many diverse areas. For example, consider the following scenario in microbiology: X is a collection of bacterial strains, and for every two strains, one is given their dissimilarity (computed, say, by comparing their DNA). It is difficult to see any structure in a large table of numbers, and so we would like to represent a given metric space in a more comprehensible way. For example, it would be very nice if we could assign to each x ∈ X a point f(x) in the plane in such a way that D(x, y) equals the Euclidean distance between f(x) and f(y). Such a representation would allow us to see the structure of the metric space: tight clusters, isolated points, and so on. Another advantage would be that the metric would now be represented by only 2n real numbers, the coordinates of the n points in the plane, instead of (n choose 2) numbers as before. Moreover, many quantities concern
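The idea of mapping each point of a distance table to a point in the plane can be checked numerically. The sketch below uses the standard notion of distortion (largest stretch divided by smallest stretch over all pairs), which the excerpt itself does not define; the function name, the table layout keyed by `frozenset`, and the 3-point example are assumptions made for illustration.

```python
import math
from itertools import combinations

def distortion(D, f):
    """Distortion of the map f, measured against the distance table D.

    D maps frozenset({x, y}) to the given distance; f maps each point to
    plane coordinates. Assumes f sends distinct points to distinct
    locations. The result is 1.0 exactly when f preserves all distances
    up to one common scaling factor.
    """
    stretch = [math.dist(f[x], f[y]) / D[frozenset((x, y))]
               for x, y in combinations(f, 2)]
    return max(stretch) / min(stretch)

# Hypothetical 3-point metric space: the sides of a 3-4-5 right triangle.
D = {frozenset("ab"): 3.0, frozenset("ac"): 4.0, frozenset("bc"): 5.0}
exact = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
on_a_line = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (4.0, 0.0)}

print(distortion(D, exact))      # distances reproduced exactly
print(distortion(D, on_a_line))  # the pair b, c is squeezed together
```

The first map realizes the table exactly, as in the ideal case the abstract describes; the second shows how forcing the points onto a line compresses one pair relative to the others.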
Algorithms for dynamic geometric problems over data streams
 In STOC ’04: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
, 2004
An improved approximation algorithm for the 0-extension problem
 In 14th Annual ACM-SIAM Symposium on Discrete Algorithms
, 2003
Cited by 33 (5 self)
Given a graph G = (V, E), a set of terminals T ⊆ V, and a metric D on T, the 0-extension problem is to assign vertices in V to terminals, so that the sum, over all edges e, of the distance (under D) between the terminals to which the endpoints of e are assigned, is minimized. This problem was first studied by Karzanov. Calinescu, Karloff and Rabani gave an O(log k)-approximation algorithm based on a linear programming relaxation for the problem, where k is the number of terminals. We improve on this bound, and give an O(log k / log log k)-approximation algorithm for the problem. 1 Introduction. In the 0-extension problem, we are given an undirected graph G = (V, E) with costs c(u, v) on edges, a set of terminals
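To make the objective of the 0-extension problem concrete, here is a minimal brute-force sketch that tries every assignment on a tiny instance. It is not the approximation algorithm of the paper. It assumes, as in the usual formulation, that each terminal is assigned to itself and that each edge's distance is weighted by its cost c(u, v); the function name and the instance are hypothetical.

```python
from itertools import product

def zero_extension_brute_force(vertices, edges, terminals, D):
    """Exhaustively minimize the 0-extension objective.

    edges: {(u, v): cost c(u, v)}
    D:     {(s, t): distance between terminals s and t}, symmetric,
           with D[(t, t)] == 0
    Assumption: every terminal is assigned to itself.
    Tries k ** (number of non-terminals) assignments, so tiny inputs only.
    """
    free = [v for v in vertices if v not in terminals]
    best_cost, best_f = float("inf"), None
    for choice in product(terminals, repeat=len(free)):
        f = {t: t for t in terminals}   # terminals stay fixed
        f.update(zip(free, choice))     # one terminal per free vertex
        cost = sum(c * D[(f[u], f[v])] for (u, v), c in edges.items())
        if cost < best_cost:
            best_cost, best_f = cost, f
    return best_cost, best_f

# Hypothetical instance: path s - x - t with terminals s and t.
vertices = ["s", "x", "t"]
terminals = ["s", "t"]
edges = {("s", "x"): 2, ("x", "t"): 1}
D = {("s", "s"): 0, ("t", "t"): 0, ("s", "t"): 1, ("t", "s"): 1}

cost, assignment = zero_extension_brute_force(vertices, edges, terminals, D)
print(cost, assignment)
```

On this instance, sending x to s leaves only the cheaper edge (x, t) stretched across two different terminals, which is why it beats sending x to t.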
Efficient sketches for earth-mover distance, with applications
 in FOCS
, 2009
Cited by 28 (8 self)
We provide the first sublinear sketching algorithm for estimating the planar Earth-Mover Distance with a constant approximation. For sets living in the two-dimensional grid [∆]^2, we achieve space ∆^ɛ for approximation O(1/ɛ), for any desired 0 < ɛ < 1. Our sketch has immediate applications to the streaming and nearest neighbor search problems.
Earth Mover Distance over High-Dimensional Spaces
, 2007
Cited by 22 (8 self)
The Earth Mover Distance (EMD) between two equal-size sets of points in R^d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, we address computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into ℓ1, and use the algorithms designed for the latter space. However, Khot and Naor [KN06] show that any embedding of EMD over the d-dimensional Hamming cube into ℓ1 must incur a distortion Ω(d), thus practically losing all distance information. We circumvent this roadblock by focusing on sets with cardinalities upper-bounded by a parameter s, and achieve a distortion of only O(log s · log d). Since in applications the feature sets have bounded size, the resulting distortion is much smaller than the Ω(d) lower bound. Our approach is quite general and easily extends to EMD over R^d. We then provide a strong lower bound on the multi-round communication complexity of estimating EMD, which in particular strengthens the known non-embeddability result of [KN06]. Our bound exhibits a smooth tradeoff between approximation and communication, and for example implies that every algorithm that estimates EMD using constant-size sketches can only achieve Ω(log s) approximation.
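The definition of EMD given at the start of this abstract, the minimum cost of a bipartite matching between two equal-size point sets, can be illustrated with a brute-force sketch. It assumes the cost of matching two points is their Euclidean distance (the excerpt does not fix the underlying norm) and tries every matching, so it is only usable for very small sets; the function name and sample points are illustrative.

```python
import math
from itertools import permutations

def emd_brute_force(A, B):
    """Minimum total distance over all perfect matchings between A and B.

    Assumes |A| == |B| and Euclidean ground distance. Enumerates all
    len(B)! matchings, so this is a definition check, not an algorithm
    for real workloads.
    """
    if len(A) != len(B):
        raise ValueError("EMD as defined here needs equal-size sets")
    return min(
        sum(math.dist(a, b) for a, b in zip(A, perm))
        for perm in permutations(B)
    )

# Illustrative point sets: opposite sides of the unit square.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(1.0, 1.0), (0.0, 1.0)]
print(emd_brute_force(A, B))
```

Here the cheapest matching pairs each point with the one directly above it; the alternative matching uses the two diagonals and costs more.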
A near linear time constant factor approximation for Euclidean bichromatic matching (cost)
 In Proc. 18th Symp. on Disc. Alg.
, 2007
Cited by 19 (3 self)
We give an N log^{O(1)} N-time randomized O(1)-approximation algorithm for computing the cost of minimum bichromatic matching between two planar pointsets of size N.
Overcoming the ℓ1 non-embeddability barrier: Algorithms for product metrics
, 2008
Cited by 18 (8 self)
A common approach for solving computational problems over a difficult metric space is to embed the “hard” metric into L1, which admits efficient algorithms and is thus considered an “easy” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below a certain approximation factor for some metrics. We propose a new approach, of embedding the difficult space into richer host spaces, namely iterated products of standard spaces like ℓ1 and ℓ∞. We show that this class is rich since it contains useful metric spaces with only a constant distortion, and, at the same time, it is tractable and admits efficient algorithms. Using this approach, we obtain for example the first nearest neighbor data structure with O(log log d) approximation for edit distance in non-repetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into L1. Furthermore, we give constant factor approximation for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0-1 strings [Andoni and Krauthgamer, ICALP 2008].
K-Median Clustering, Model-Based Compressive Sensing, and Sparse Recovery for Earth Mover Distance
, 2011
Cited by 11 (4 self)
We initiate the study of sparse recovery problems under the Earth-Mover Distance (EMD). Specifically, we design a distribution over m × n matrices A such that for any x, given Ax, we can recover a k-sparse approximation to x under the EMD distance. One construction yields m = O(k log(n/k)) and a 1 + ɛ approximation factor, which matches the best achievable bound for other error measures, such as the ℓ1 norm. Our algorithms are obtained by exploiting novel connections to other problems and areas, such as streaming algorithms for k-median clustering and model-based compressive sensing. We also provide novel algorithms and results for the latter problems.