Results 1-10 of 64
A Tight Bound on Approximating Arbitrary Metrics by Tree Metrics
 In Proceedings of the 35th Annual ACM Symposium on Theory of Computing
, 2003
Abstract

Cited by 317 (8 self)
In this paper, we show that any n-point metric space can be embedded into a distribution over dominating tree metrics such that the expected stretch of any edge is O(log n). This improves upon the result of Bartal who gave a bound of O(log n log log n). Moreover, our result is existentially tight; there exist metric spaces where any tree embedding must have distortion Ω(log n). This problem lies at the heart of numerous approximation and online algorithms including ones for group Steiner tree, metric labeling, buy-at-bulk network design and metrical task system. Our result improves the performance guarantees for all of these problems.
Approximate Nearest Neighbors and the Fast Johnson-Lindenstrauss Transform
 STOC'06
, 2006
Abstract

Cited by 156 (6 self)
We introduce a new low-distortion embedding of ℓ_2^d into ℓ_p^{O(log n)} (p = 1, 2), called the Fast-Johnson-Lindenstrauss-Transform. The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in ℓ_1 and ℓ_2. We consider the case of approximate nearest neighbors in ℓ_2^d. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
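The construction described in this abstract (a sparse projection applied after a randomized Fourier-type transform) can be illustrated with a minimal sketch. It is not the authors' code. The assumptions are: the Walsh-Hadamard transform stands in for the Fourier transform, the input dimension d is a power of two, and the sparsity parameter q is chosen by the caller rather than by the paper's analysis. The names fwht and make_fjlt are hypothetical.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform (len(x) must be a power of two)."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def make_fjlt(d, k, q, seed=0):
    """Return a function mapping a length-d vector to a length-k vector.

    Assumed structure: x -> P * H * D * x, where D is a random +/-1 diagonal,
    H is the normalized Hadamard matrix, and P is a sparse k-by-d matrix whose
    entries are nonzero with probability q and then drawn from N(0, 1/q).
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    sigma = math.sqrt(1.0 / q)
    # Each sparse row is stored as {column index: weight}.
    rows = [
        {j: rng.gauss(0.0, sigma) for j in range(d) if rng.random() < q}
        for _ in range(k)
    ]

    def apply(x):
        # Precondition: random sign flips followed by the Hadamard transform.
        y = fwht([s * v for s, v in zip(signs, x)])
        # Sparse projection, scaled so squared norms are preserved in expectation.
        return [sum(w * y[j] for j, w in row.items()) / math.sqrt(k) for row in rows]

    return apply
```

A call such as `make_fjlt(d=8, k=3, q=0.5)` returns a linear map from 8 to 3 dimensions; the preconditioning step spreads the mass of sparse inputs across coordinates before the sparse projection samples them.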
Nearest-neighbor searching and metric space dimensions
 In Nearest-Neighbor Methods for Learning and Vision: Theory and Practice
, 2006
Abstract

Cited by 106 (0 self)
Given a set S of n sites (points), and a distance measure d, the nearest neighbor searching problem is to build a data structure so that given a query point q, the site nearest to q can be found quickly. This paper gives a data structure for this problem; the data structure is built using the distance function as a “black box”. The structure is able to speed up nearest neighbor searching in a variety of settings, for example: points in low-dimensional or structured Euclidean space, strings under Hamming and edit distance, and bit vector data from an OCR application. The data structures are observed to need linear space, with a modest constant factor. The preprocessing time needed per site is observed to match the query time. The data structure can be viewed as an application of a “kd-tree” approach in the metric space setting, using Voronoi regions of a subset in place of axis-aligned boxes.
Triangulation and Embedding using Small Sets of Beacons
, 2008
Abstract

Cited by 98 (11 self)
Concurrent with recent theoretical interest in the problem of metric embedding, a growing body of research in the networking community has studied the distance matrix defined by node-to-node latencies in the Internet, resulting in a number of recent approaches that approximately embed this distance matrix into low-dimensional Euclidean space. There is a fundamental distinction, however, between the theoretical approaches to the embedding problem and this recent Internet-related work: in addition to computational limitations, Internet measurement algorithms operate under the constraint that it is only feasible to measure distances for a linear (or near-linear) number of node pairs, and typically in a highly structured way. Indeed, the most common framework for Internet measurements of this type is a beacon-based approach: one chooses uniformly at random a constant number of nodes (‘beacons’) in the network, each node measures its distance to all beacons, and one then has access to only these measurements for the remainder of the algorithm. Moreover, beacon-based algorithms are often designed not for embedding but for the more basic problem of triangulation, in which one uses the triangle inequality to infer the distances that have not been measured. Here we give algorithms with provable performance guarantees for beacon-based triangulation and
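The triangulation step mentioned in this abstract, inferring an unmeasured distance from beacon measurements via the triangle inequality, can be sketched as follows. This is only an illustration of the basic bounds, not the paper's algorithm or its guarantees; the function name triangulate is hypothetical.

```python
def triangulate(dist_u, dist_v):
    """Bound the unmeasured distance d(u, v) from beacon measurements.

    dist_u[i] and dist_v[i] are the measured distances from nodes u and v
    to beacon i. For every beacon b the triangle inequality gives
        |d(u, b) - d(v, b)| <= d(u, v) <= d(u, b) + d(v, b),
    so the tightest bounds take the max and the min over all beacons.
    """
    if not dist_u or len(dist_u) != len(dist_v):
        raise ValueError("need one measurement per beacon for both nodes")
    lower = max(abs(a - b) for a, b in zip(dist_u, dist_v))
    upper = min(a + b for a, b in zip(dist_u, dist_v))
    return lower, upper


# Points on a line: beacons at 0 and 10, u at 3, v at 7 (true distance 4).
print(triangulate([3, 7], [7, 3]))  # (4, 10)
```

In this toy example the lower bound happens to equal the true distance; in general the gap between the two bounds depends on where the beacons sit relative to the pair.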
Video Suggestion and Discovery for YouTube: Taking Random Walks Through the View Graph
Abstract

Cited by 93 (6 self)
The rapid growth of the number of videos in YouTube provides enormous potential for users to find content of interest to them. Unfortunately, given the difficulty of searching videos, the size of the video repository also makes the discovery of new content a daunting task. In this paper, we present a novel method based upon the analysis of the entire user–video graph to provide personalized video suggestions for users. The resulting algorithm, termed Adsorption, provides a simple method to efficiently propagate preference information through a variety of graphs. We extensively test the results of the recommendations on a three-month snapshot of live data from YouTube.
The fast Johnson-Lindenstrauss transform and approximate nearest neighbors
 SIAM J. Comput
, 2009
Abstract

Cited by 57 (0 self)
Abstract. We introduce a new low-distortion embedding of ℓ_2^d into ℓ_p^{O(log n)} (p = 1, 2) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in ℓ_1 and ℓ_2. We consider the case of approximate nearest neighbors in ℓ_2^d. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
Weakly-supervised acquisition of labeled class instances using graph random walks
 In Proc. of EMNLP
, 2008
Abstract

Cited by 46 (3 self)
We present a graph-based semi-supervised label propagation algorithm for acquiring open-domain labeled classes and their instances from a combination of unstructured and structured text sources. This acquisition method significantly improves coverage compared to a previous set of labeled classes and instances derived from free text, while achieving comparable precision.
Metric cotype
, 2005
Abstract

Cited by 44 (22 self)
We introduce the notion of metric cotype, a property of metric spaces related to a property of normed spaces, called Rademacher cotype. Apart from settling a long-standing open problem in metric geometry, this property is used to prove the following dichotomy: A family of metric spaces F is either almost universal (i.e., contains any finite metric space with any distortion > 1), or there exists α > 0, and arbitrarily large n-point metrics whose distortion when embedded in any member of F is at least Ω((log n)^α). The same property is also used to prove strong nonembeddability theorems of Lq into Lp, when q > max{2, p}. Finally we use metric cotype to obtain a new type of isoperimetric inequality on the discrete torus.
Advances in metric embedding theory
 In STOC ’06: Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing
, 2006
Abstract

Cited by 38 (14 self)
Metric Embedding plays an important role in a vast range of application areas such as computer vision, computational biology, machine learning, networking, statistics, and mathematical psychology, to name a few. The theory of metric embedding received much attention in recent years by mathematicians as well as computer scientists and has been applied in many algorithmic applications. A cornerstone of the field is a celebrated theorem of Bourgain which states that every finite metric space on n points embeds in Euclidean space with O(log n) distortion. Bourgain’s result is best possible when considering the worst case distortion over all pairs of points in the metric space. Yet, it is possible that an embedding can do much better in terms of the average distortion. Indeed, in most practical applications of metric embedding the main criterion for the quality of an embedding is its average distortion over all pairs. In this paper we provide an embedding with constant average distortion for arbitrary metric spaces, while maintaining the same worst case bound provided by Bourgain’s theorem. In fact, our embedding possesses a much stronger property. We define the ℓ_q-distortion of a uniformly distributed pair of points. Our embedding achieves the best possible ℓ_q-distortion for all 1 ≤ q ≤ ∞ simultaneously. These results have several algorithmic implications, e.g., an O(1) approximation for the unweighted uncapacitated quadratic assignment problem. The results are based on novel embedding methods which improve on previous methods in another important aspect: the dimension. The dimension of an embedding is of very high importance in particular in applications and much effort has been invested in analyzing it. However, no previous result im