Similarity estimation techniques from rounding algorithms
In Proc. of 34th STOC, 2002
Cited by 436 (6 self)
A locality sensitive hashing scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y, $\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x, y)$, where $\mathrm{sim}(x, y) \in [0, 1]$ is some similarity function defined on the collection of objects. Such a scheme leads to a compact representation of objects so that similarity of objects can be estimated from their compact sketches, and also leads to efficient algorithms for approximate nearest neighbor search and clustering. Minwise independent permutations provide an elegant construction of such a locality sensitive hashing scheme for a collection of subsets with the set similarity measure $\mathrm{sim}(A, B) = |A \cap B| / |A \cup B|$. We show that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Based on this insight, we construct new locality sensitive hashing schemes for:
1. A collection of vectors with the distance between $\vec{u}$ and $\vec{v}$ measured by $\theta(\vec{u}, \vec{v})/\pi$, where $\theta(\vec{u}, \vec{v})$ is the angle between $\vec{u}$ and $\vec{v}$. This yields a sketching scheme for estimating the cosine similarity measure between two vectors, as well as a simple alternative to minwise independent permutations for estimating set similarity.
2. A collection of distributions on n points in a metric space, with distance between distributions measured by the Earth Mover Distance (EMD), a popular distance measure in graphics and vision. Our hash functions map distributions to points in the metric space such that, for distributions P and Q,
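The vector scheme in item 1 can be illustrated with sign-of-random-projection hashing: each hash bit records which side of a random hyperplane a vector falls on, so two vectors disagree on a bit with probability θ/π. The following is only a minimal sketch under that assumption; the function names, the Gaussian choice of hyperplane normals, and the sketch length are illustrative and are not taken from the abstract.

```python
import math
import random


def make_hyperplanes(dim, num_bits, rng):
    """Draw num_bits random normals with i.i.d. Gaussian coordinates (assumed choice)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def sketch(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on the non-negative side."""
    return [
        1 if sum(r_i * x_i for r_i, x_i in zip(normal, vector)) >= 0 else 0
        for normal in hyperplanes
    ]


def estimated_angle(sketch_u, sketch_v):
    """The fraction of disagreeing bits estimates theta / pi."""
    disagreements = sum(a != b for a, b in zip(sketch_u, sketch_v))
    return math.pi * disagreements / len(sketch_u)


def true_angle(u, v):
    """Exact angle between u and v, for comparison with the estimate."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


rng = random.Random(0)          # fixed seed so the run is repeatable
planes = make_hyperplanes(dim=3, num_bits=4000, rng=rng)
u, v = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
theta_hat = estimated_angle(sketch(u, planes), sketch(v, planes))
print("estimated angle:", theta_hat, "true angle:", true_angle(u, v))
print("estimated cosine similarity:", math.cos(theta_hat))
```

Longer sketches reduce the variance of the estimate; the 4000 bits used here are chosen only to make the demonstration stable.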
A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics
In Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003
Cited by 317 (8 self)
In this paper, we show that any n-point metric space can be embedded into a distribution over dominating tree metrics such that the expected stretch of any edge is O(log n). This improves upon the result of Bartal, who gave a bound of O(log n log log n). Moreover, our result is existentially tight; there exist metric spaces where any tree embedding must have distortion Ω(log n). This problem lies at the heart of numerous approximation and online algorithms, including ones for group Steiner tree, metric labeling, buy-at-bulk network design, and metrical task system. Our result improves the performance guarantees for all of these problems.
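Spelled out, the embedding guarantee in this abstract can be read as the pair of conditions below. The notation is introduced here only for illustration and is an assumption, not the authors': $\mathcal{D}$ denotes the distribution over tree metrics, $T$ a tree drawn from it, and $d_T$ the distance in $T$.

```latex
% Illustrative notation (not from the abstract):
%   (X, d)       the original n-point metric space
%   \mathcal{D}  the distribution over tree metrics
%   d_T          the distance in a tree T drawn from \mathcal{D}
\[
  d_T(u, v) \ge d(u, v)
  \quad \text{for every } T \text{ in the support of } \mathcal{D},
  \qquad
  \mathbb{E}_{T \sim \mathcal{D}}\bigl[ d_T(u, v) \bigr] \le O(\log n) \cdot d(u, v)
  \quad \text{for all } u, v \in X .
\]
```

The first condition is what "dominating" means; the second is the bound on expected stretch.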
Bounded geometries, fractals, and low-distortion embeddings
Cited by 211 (42 self)
The doubling constant of a metric space (X, d) is the smallest value λ such that every ball in X can be covered by λ balls of half the radius. The doubling dimension of X is then defined as $\dim(X) = \log_2 \lambda$. A metric (or sequence of metrics) is called doubling precisely when its doubling dimension is bounded. This is a robust class of metric spaces which contains many families of metrics that occur in applied settings. We give tight bounds for embedding doubling metrics into (low-dimensional) normed spaces. We consider both general doubling metrics, as well as more restricted families such as those arising from trees, from graphs excluding a fixed minor, and from snowflaked metrics. Our techniques include decomposition theorems for doubling metrics, and an analysis of a fractal in the plane due to Laakso [21]. Finally, we discuss some applications and point out a central open question regarding dimensionality reduction in $L_2$.
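For a small finite metric, the definition of the doubling constant can be explored directly. The sketch below is an illustration only: it assumes closed balls centered at points of X, and it covers each ball greedily, so the value it returns is an upper bound on the doubling constant rather than the exact value. All function names are hypothetical.

```python
import math
from itertools import combinations


def ball(points, dist, center, radius):
    """Closed ball: all points within the given radius of the center."""
    return {p for p in points if dist(center, p) <= radius}


def greedy_cover_size(points, dist, target, radius):
    """Greedily cover `target` with balls of `radius`; returns how many were used."""
    uncovered = set(target)
    count = 0
    while uncovered:
        # Pick the center whose ball covers the most still-uncovered points.
        best = max(points, key=lambda c: len(ball(points, dist, c, radius) & uncovered))
        uncovered -= ball(points, dist, best, radius)
        count += 1
    return count


def doubling_constant_upper_bound(points, dist):
    """Upper bound on the doubling constant of a finite metric (greedy covers)."""
    points = list(points)
    distances = {dist(p, q) for p, q in combinations(points, 2)}
    # Closed balls only change at r = d or r = 2d for a pairwise distance d.
    radii = distances | {2 * d for d in distances}
    worst = 1
    for r in radii:
        for x in points:
            big_ball = ball(points, dist, x, r)
            worst = max(worst, greedy_cover_size(points, dist, big_ball, r / 2))
    return worst


# Uniform metric on 4 points: a ball of radius 1 needs 4 singleton balls of radius 1/2.
uniform = lambda a, b: 0 if a == b else 1
bound = doubling_constant_upper_bound(range(4), uniform)
print("bound on doubling constant:", bound)
print("bound on doubling dimension:", math.log2(bound))
```

Because the greedy cover can use more balls than an optimal one, the result should be read as an estimate from above.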
Euclidean distortion and the Sparsest Cut
In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, 2005
Cited by 120 (25 self)
Bi-Lipschitz embeddings of finite metric spaces, a topic originally studied in geometric analysis and Banach space theory, became an integral part of theoretical computer science following work of Linial, London, and Rabinovich [29]. They presented an algorithmic version of a result of Bourgain [8] which shows that every
Measured descent: A new embedding method for finite metrics
In Proc. 45th FOCS, 2004
Cited by 98 (32 self)
We devise a new embedding technique, which we call measured descent, based on decomposing a metric space locally, at varying speeds, according to the density of some probability measure. This provides a refined and unified framework for the two primary methods of constructing Fréchet embeddings for finite metrics, due to [Bourgain, 1985] and [Rao, 1999]. We prove that any n-point metric space (X, d) embeds in Hilbert space with distortion $O(\sqrt{\alpha_X \cdot \log n})$, where $\alpha_X$ is a geometric estimate on the decomposability of X. As an immediate corollary, we obtain an $O(\sqrt{(\log \lambda_X) \log n})$ distortion embedding, where $\lambda_X$ is the doubling constant of X. Since $\lambda_X \le n$, this result recovers Bourgain's theorem, but when the metric X is, in a sense, "low-dimensional," improved bounds are achieved. Our embeddings are volume-respecting for subsets of arbitrary size. One consequence is the existence of (k, O(log n)) volume-respecting embeddings for all 1 ≤ k ≤ n, which is the best possible, and answers positively a question posed by U. Feige. Our techniques are also used to answer positively a question of Y. Rabinovich, showing that any weighted n-point planar graph embeds in $\ell_\infty^{O(\log n)}$ with O(1) distortion. The O(log n) bound on the dimension is optimal, and improves upon the previously known bound of $O((\log n)^2)$.
Bypassing the embedding: Algorithms for low-dimensional metrics
In Proceedings of the 36th ACM Symposium on the Theory of Computing (STOC), 2004
Cited by 82 (3 self)
The doubling dimension of a metric is the smallest k such that any ball of radius 2r can be covered using $2^k$ balls of radius r. This concept for abstract metrics has been proposed as a natural analog to the dimension of a Euclidean space. If we could embed metrics with low doubling dimension into low-dimensional Euclidean spaces, they would inherit several algorithmic and structural properties of the Euclidean spaces. Unfortunately, however, such a restriction on dimension does not suffice to guarantee embeddability in a normed space. In this paper we explore the option of bypassing the embedding. In particular, we show the following for low-dimensional metrics:
• Quasipolynomial-time (1+ε)-approximation algorithm for various optimization problems such as TSP, k-median and facility location.
• (1+ε)-approximate distance labeling scheme with optimal label length.
• (1+ε)-stretch polylogarithmic storage routing scheme.
Approximation Algorithms for the Metric Labeling Problem via a New Linear Programming Formulation
2000
Cited by 77 (1 self)
We consider approximation algorithms for the metric labeling problem. Informally speaking, we are given a weighted graph that specifies relations between pairs of objects drawn from a given set of objects. The goal is to find a minimum cost labeling of these objects where the cost of a labeling is determined by the pairwise relations between the objects and a distance function on labels; the distance function is assumed to be a metric. Each object also incurs an assignment cost that is both label- and vertex-dependent. The problem was introduced in a recent paper by Kleinberg and Tardos [19], and captures many classification problems that arise in computer vision and related fields. They gave an O(log k log log k) approximation for the general case, where k is the number of labels, and a 2-approximation for the uniform metric case. More recently, Gupta and Tardos [14] gave a 4-approximation for the truncated linear metric, a natural non-uniform metric motivated by practical applications to image restoration and visual correspondence. In this paper we introduce a new natural integer programming formulation and show that the integrality gap of its linear relaxation either matches or improves the ratios known for several cases of the metric labeling problem studied until now, providing a unified approach to solving them. Specifically, we show that the integrality gap of our LP is bounded by O(log k log log k) for the general metric and 2 for the uniform metric, thus matching the ratios in [19]. We also develop an algorithm based on our LP that achieves a ratio of $2 + \sqrt{2} \approx 3.414$ for the truncated linear metric, improving the ratio provided by [14]. Our algorithm uses the fact that the integrality gap of our LP is 1 on a linear metric. We believe that our formulation h...
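The objective described in this abstract can be made concrete on a toy instance. The sketch below is illustrative only: it assumes the cost of a labeling is the sum of per-object assignment costs plus, for each weighted edge, the weight times the label distance, and it assumes the truncated linear metric has the form min(cap, |i - j|) on integer labels. It finds the optimum by brute force; it does not implement the LP formulation or the approximation algorithm of the paper, and all names are hypothetical.

```python
from itertools import product


def truncated_linear(cap):
    """Assumed form of the truncated linear metric: d(i, j) = min(cap, |i - j|)."""
    return lambda i, j: min(cap, abs(i - j))


def labeling_cost(labeling, assign_cost, edges, label_dist):
    """Assignment costs plus weighted label distances across edges."""
    assignment = sum(assign_cost[obj][lab] for obj, lab in labeling.items())
    separation = sum(
        weight * label_dist(labeling[u], labeling[v])
        for (u, v), weight in edges.items()
    )
    return assignment + separation


def brute_force_labeling(objects, labels, assign_cost, edges, label_dist):
    """Try every labeling; only feasible for tiny instances (k^n candidates)."""
    best = None
    for choice in product(labels, repeat=len(objects)):
        labeling = dict(zip(objects, choice))
        cost = labeling_cost(labeling, assign_cost, edges, label_dist)
        if best is None or cost < best[0]:
            best = (cost, labeling)
    return best


# Toy instance: two objects that prefer labels 0 and 2, joined by one edge.
objects = ["a", "b"]
labels = [0, 1, 2]
assign_cost = {"a": {0: 0, 1: 5, 2: 5}, "b": {0: 5, 1: 5, 2: 0}}
edges = {("a", "b"): 2}
print(brute_force_labeling(objects, labels, assign_cost, edges, truncated_linear(1)))
```

Raising the cap makes distant labels more expensive to separate, which shifts the balance between assignment cost and edge cost.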
A linear programming formulation and approximation algorithms for the metric labeling problem
SIAM J. Discrete Math
Cited by 43 (1 self)
We consider approximation algorithms for the metric labeling problem. This problem was introduced in a paper by Kleinberg and Tardos [J. ACM, 49 (2002), pp. 616–630] and captures many classification problems that arise in computer vision and related fields. They gave an O(log k log log k) approximation for the general case, where k is the number of labels, and a 2-approximation for the uniform metric case. (In fact, the bound for general metrics can be improved to O(log k) by the work of Fakcharoenphol, Rao, and Talwar [Proceedings