Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform
STOC'06, 2006
Cited by 156 (6 self)
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$), called the Fast Johnson-Lindenstrauss Transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
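The construction described in this abstract (a random sign flip, a Fourier-type transform, then a sparse random projection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use of the Walsh-Hadamard transform as the Fourier transform, the Gaussian nonzero entries, the sparsity parameter `q`, the `1/sqrt(k)` scaling, and the names `fwht` and `make_fjlt` are all assumptions made for the example.

```python
import math
import random


def fwht(v):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two)."""
    v = list(v)
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [x * scale for x in v]


def make_fjlt(d, k, q, rng):
    """Build a map x -> (1/sqrt(k)) * P * H * D * x from R^d to R^k.

    D: random +/-1 diagonal; H: normalized Walsh-Hadamard matrix;
    P: sparse k x d matrix whose entries are nonzero with probability q
    and then drawn from N(0, 1/q). All of these choices are assumptions
    of this sketch.
    """
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    sigma = math.sqrt(1.0 / q)
    rows = [
        [(j, rng.gauss(0.0, sigma)) for j in range(d) if rng.random() < q]
        for _ in range(k)
    ]

    def apply(x):
        # Precondition: flip signs, then spread the mass with the transform.
        y = fwht([s * xi for s, xi in zip(signs, x)])
        # Sparse projection: only the stored nonzero entries are touched.
        return [sum(val * y[j] for j, val in row) / math.sqrt(k) for row in rows]

    return apply
```

A call such as `make_fjlt(8, 3, 0.5, random.Random(0))` returns a linear map from 8 to 3 dimensions. The preconditioning step is what lets the projection be sparse: after the sign flip and transform, no single coordinate of the input carries most of its mass.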
Nearest Neighbors in High-Dimensional Spaces
2004
Cited by 95 (3 self)
In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure which, given any query point q, finds the point in P closest to q. This problem, called nearest neighbor search, is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [GG91], computational statistics [DW82], and data mining. Many of these applications involve data sets which are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., on the order of a few hundred). Therefore, it is crucial to design algorithms which scale well with the database size as well as with the dimension. The nearest-neighbor problem is an example of a large class of proximity problems, which, roughly speaking, are problems whose definitions involve the notion of...
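For reference, the problem statement above has a trivial baseline: scan every point of P for each query. The sketch below assumes Euclidean distance and a hypothetical function name `nearest_neighbor`. It is the O(n·d)-per-query scan that the data structures discussed in the chapter aim to beat, not a method from the chapter itself.

```python
import math


def nearest_neighbor(points, q):
    """Return the point in `points` closest to `q` (Euclidean distance).

    Linear scan: one distance computation per stored point.
    """
    if not points:
        raise ValueError("point set must be non-empty")
    return min(points, key=lambda p: math.dist(p, q))
```

For example, `nearest_neighbor([(0, 0), (3, 4), (1, 1)], (0.9, 1.2))` scans all three points and returns the closest one.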
Low-Distortion Embeddings of Finite Metric Spaces
In Handbook of Discrete and Computational Geometry, 2004
Cited by 65 (2 self)
Introduction. An $n$-point metric space $(X, D)$ can be represented by an $n \times n$ table specifying the distances. Such tables arise in many diverse areas. For example, consider the following scenario in microbiology: $X$ is a collection of bacterial strains, and for every two strains, one is given their dissimilarity (computed, say, by comparing their DNA). It is difficult to see any structure in a large table of numbers, and so we would like to represent a given metric space in a more comprehensible way. For example, it would be very nice if we could assign to each $x \in X$ a point $f(x)$ in the plane in such a way that $D(x, y)$ equals the Euclidean distance of $f(x)$ and $f(y)$. Such a representation would allow us to see the structure of the metric space: tight clusters, isolated points, and so on. Another advantage would be that the metric would now be represented by only $2n$ real numbers, the coordinates of the $n$ points in the plane, instead of $\binom{n}{2}$ numbers as before. Moreover, many quantities concern...
The fast Johnson-Lindenstrauss transform and approximate nearest neighbors
SIAM J. Comput., 2009
Cited by 57 (0 self)
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
Approximate Nearest Neighbors and Sequence Comparison With Block Operations
In STOC, 2000
Cited by 48 (6 self)
We study sequence nearest neighbors (SNN). Let $D$ be a database of $n$ sequences; we would like to preprocess $D$ so that given any online query sequence $Q$ we can quickly find a sequence $S$ in $D$ for which $d(S, Q) \le d(T, Q)$ for any other sequence $T$ in $D$. Here $d(S, Q)$ denotes the distance between sequences $S$ and $Q$, defined to be the minimum number of edit operations needed to transform one to another (all edit operations will be reversible so that $d(S, T) = d(T, S)$ for any two sequences $T$ and $S$). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences with p...
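To make the distance definition concrete, here is a minimal sketch of the simplest special case: character edits only (insert, delete, replace), computed by the textbook dynamic program. The block edits and numerical transformations that the abstract also allows are deliberately not modeled, and the names `edit_distance` and `nearest_sequence` are hypothetical.

```python
def edit_distance(s, t):
    """Minimum number of single-character inserts, deletes, and replacements
    needed to turn s into t (character edits only, no block operations)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # replace a by b, or match
            ))
        prev = cur
    return prev[-1]


def nearest_sequence(database, query):
    """Exact SNN by exhaustive scan under the character-edit distance."""
    if not database:
        raise ValueError("database must be non-empty")
    return min(database, key=lambda s: edit_distance(s, query))
```

The exhaustive scan costs one quadratic-time distance computation per database sequence, which is the cost that an efficient approximate method would try to avoid.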
On Approximate Nearest Neighbors in Non-Euclidean Spaces
In FOCS, 1998
Cited by 34 (8 self)
The nearest neighbor search (NNS) problem is the following: given a set of $n$ points $P = \{p_1, \ldots, p_n\}$ in some metric space $X$, preprocess $P$ so as to efficiently answer queries which require finding a point in $P$ closest to a query point $q \in X$. The approximate nearest neighbor search ($c$-NNS) is a relaxation of NNS which allows returning any point within $c$ times the distance to the nearest neighbor (called a $c$-nearest neighbor). This problem is of major and growing importance to a variety of applications. In this paper, we give an algorithm for $(4\lceil \log_{1+\rho} \log 4d \rceil + 3)$-NNS in $\ell_\infty^d$ with $O(dn^{1+\rho} \log n)$ storage and $O(d \log n)$ query time. In particular, this yields the first algorithm for $O(1)$-NNS for $\ell_\infty$ with subexponential storage. The preprocessing time is close to linear in the size of the data structure. The algorithm can also be used (after simple modifications) to output the exact nearest neighbor in time bounded by $O(d \log n)$ plus the number of $(4\lceil \log_{1+\rho} \log 4d$...
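The relaxation defined above can be stated as a small predicate: a candidate is acceptable if its distance to the query is at most c times the true nearest-neighbor distance. The sketch below only checks that condition by brute force and is not the paper's data structure. The names `linf` and `is_c_nearest` are hypothetical, and the max norm is used purely as an illustrative default metric.

```python
def linf(p, q):
    """Max-norm (Chebyshev) distance between two equal-length points."""
    return max(abs(a - b) for a, b in zip(p, q))


def is_c_nearest(candidate, points, q, c, dist=linf):
    """True if `candidate` is a c-nearest neighbor of `q` among `points`,
    i.e. dist(candidate, q) <= c * (distance from q to its nearest point)."""
    best = min(dist(p, q) for p in points)
    return dist(candidate, q) <= c * best
```

With `c = 1` the predicate accepts only exact nearest neighbors. Larger values of `c` enlarge the set of acceptable answers, which is what gives an approximate algorithm room to trade accuracy for storage and query time.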
Exact Algorithm for Partial Curve Matching via the Fréchet Distance
Proc. 20th ACM-SIAM Symposium on Discrete Algorithms, 2009
Cited by 24 (4 self)
Curve matching is a fundamental problem that occurs in many applications. In this paper, we study the problem of measuring partial similarity between curves. Specifically, given two curves, we wish to maximize the total length of subcurves that are close to each other, where closeness is measured by the Fréchet distance, a common distance measure for curves. The resulting maximal length is called the partial Fréchet similarity between the two input curves. Given two polygonal curves $P$ and $Q$ in $\mathbb{R}^d$ of size $m$ and $n$, respectively, we present the first exact algorithm that runs in polynomial time to compute $F_\delta(P, Q)$, the partial Fréchet similarity between $P$ and $Q$, under the $L_1$ and $L_\infty$ norms. Specifically, we formulate the problem of computing $F_\delta(P, Q)$ as a longest path problem, and solve it in $O(mn(m + n) \log(mn))$ time, under the $L_1$ or $L_\infty$ norm, using a “shortest-path map” type decomposition. To the best of our knowledge, this is the first paper to study this natural definition of partial curve similarity in the continuous setting (with all points in the curve considered), and present a polynomial-time exact algorithm for it.
Four Soviets walk the dog - with an application to Alt’s conjecture
CoRR
Cited by 20 (4 self)
Given two polygonal curves in the plane, there are many ways to define a notion of similarity between them. One measure that is extremely popular is the Fréchet distance. Since it was proposed by Alt and Godau in 1992, many variants and extensions have been studied. Nonetheless, even more than 20 years later, the original $O(n^2 \log n)$ algorithm by Alt and Godau for computing the Fréchet distance remains the state of the art (here $n$ denotes the number of vertices on each curve). This has led Helmut Alt to conjecture that the associated decision problem is 3SUM-hard. In recent work, Agarwal et al. show how to break the quadratic barrier for the discrete version of the Fréchet distance, where one considers sequences of points instead of polygonal curves. Building on their work, we give a randomized algorithm to compute the Fréchet distance between two polygonal curves in time $O(n^2 \sqrt{\log n} (\log \log n)^{3/2})$ on a pointer machine and in time $O(n^2 (\log \log n)^2)$ on a word RAM. Furthermore, we show that there exists an algebraic decision tree for the decision problem of depth $O(n^{2-\varepsilon})$, for some $\varepsilon > 0$. This provides evidence that the decision problem may not be 3SUM-hard after all and reveals an intriguing new aspect of this well-studied problem.
Fréchet distance for curves, revisited
In ESA, 2006
Cited by 20 (6 self)
We revisit the problem of computing the Fréchet distance between polygonal curves, focusing on the discrete Fréchet distance, where only distance between vertices is considered. We develop efficient approximation algorithms for two natural classes of curves: κ-bounded curves and backbone curves, the latter of which are widely used to model molecular structures. We also propose a pseudo-output-sensitive algorithm for computing the discrete Fréchet distance exactly. The complexity of the algorithm is a function of the complexity of the free-space boundary, which is quadratic in the worst case, but tends to be lower in practice.
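As a point of reference for the discrete Fréchet distance mentioned above, the following is a minimal sketch of the standard quadratic dynamic program over vertex pairs. It is the textbook baseline, not the approximation or pseudo-output-sensitive algorithms proposed in the paper. Euclidean distance between vertices and the function name `discrete_frechet` are assumptions of the example.

```python
import math


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between vertex sequences P and Q.

    ca[i][j] is the smallest achievable maximum vertex-to-vertex distance
    over all monotone couplings of P[0..i] with Q[0..j].
    Runs in O(len(P) * len(Q)) time and space.
    """
    m, n = len(P), len(Q)
    if m == 0 or n == 0:
        raise ValueError("curves must have at least one vertex")
    ca = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            d = math.dist(P[i], Q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on P, on Q, or on both; keep the best predecessor.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[m - 1][n - 1]
```

For two parallel three-vertex segments one unit apart, the coupling that advances on both curves in lockstep is optimal, and the function returns that one-unit gap.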
Overcoming the $\ell_1$ non-embeddability barrier: Algorithms for product metrics
2008
Cited by 18 (8 self)
A common approach for solving computational problems over a difficult metric space is to embed the “hard” metric into $L_1$, which admits efficient algorithms and is thus considered an “easy” metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: it is provably impossible to go below certain approximation for some metrics. We propose a new approach, of embedding the difficult space into richer host spaces, namely iterated products of standard spaces like $\ell_1$ and $\ell_\infty$. We show that this class is rich since it contains useful metric spaces with only a constant distortion, and, at the same time, it is tractable and admits efficient algorithms. Using this approach, we obtain for example the first nearest neighbor data structure with $O(\log \log d)$ approximation for edit distance in non-repetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into $L_1$. Furthermore, we give constant factor approximation for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0-1 strings [Andoni and Krauthgamer, ICALP 2008].