Results 1-10 of 36
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
1998

Cited by 220 (9 self)
We address the problem of designing data structures that allow efficient search for approximate nearest neighbors. More specifically, given a database consisting of a set of vectors in some high-dimensional Euclidean space, we want to construct a space-efficient data structure that would allow us to search, given a query vector, for the closest or nearly closest vector in the database. We also address this problem when distances are measured by the $L_1$ norm, and in the Hamming cube. Significantly improving and extending recent results of Kleinberg, we construct data structures whose size is polynomial in the size of the database, and search algorithms that run in time nearly linear or nearly quadratic in the dimension (depending on the case; the extra factors are polylogarithmic in the size of the database).
Computer Science Department, Technion - IIT, Haifa 32000, Israel. Email: eyalk@cs.technion.ac.il. † Bell Communications Research, MCC1C365B, 445 South Street, Morristown, NJ ...
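To make the Hamming-cube case concrete: a common building block in this line of work is a random parity test, in which a random 0/1 mask r is drawn with each bit set independently with some probability β, and two points x and y are compared through their parities ⟨r, x⟩ mod 2. The sketch below is an illustration under that assumption, not the paper's data structure; the helper names and the parameter `beta` are hypothetical. It shows why such tests are useful: the probability that x and y receive different parities grows with their Hamming distance, so many independent tests can separate near points from far ones.

```python
import random


def random_parity_mask(d: int, beta: float, rng: random.Random) -> list[int]:
    """Hypothetical helper: a 0/1 mask whose d bits are 1 independently with probability beta."""
    return [1 if rng.random() < beta else 0 for _ in range(d)]


def parity(mask: list[int], x: list[int]) -> int:
    """Inner product <mask, x> over GF(2)."""
    return sum(m & b for m, b in zip(mask, x)) % 2


def disagreement_probability(dist: int, beta: float) -> float:
    """Exact Pr[parity(r, x) != parity(r, y)] when x and y differ in `dist` coordinates.

    The parities differ iff r has an odd number of 1s on the differing
    coordinates, which happens with probability (1 - (1 - 2*beta)**dist) / 2.
    """
    return (1 - (1 - 2 * beta) ** dist) / 2


def empirical_disagreement(x: list[int], y: list[int], beta: float,
                           trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability using fresh random masks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = random_parity_mask(len(x), beta, rng)
        hits += parity(r, x) != parity(r, y)
    return hits / trials
```

For example, with 16-bit points that differ in 5 positions and β = 0.1, the empirical disagreement rate over many masks should be close to `disagreement_probability(5, 0.1)`, and larger than for points that differ in a single position.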
Two Algorithms for Nearest-Neighbor Search in High Dimensions
1997

Cited by 201 (0 self)
Representing data as points in a high-dimensional space, so as to use geometric methods for indexing, is an algorithmic technique with a wide array of uses. It is central to a number of areas such as information retrieval, pattern recognition, and statistical data analysis; many of the problems arising in these applications can involve several hundred or several thousand dimensions. We consider the nearest-neighbor problem for d-dimensional Euclidean space: we wish to preprocess a database of n points so that, given a query point, one can efficiently determine its nearest neighbors in the database. There is a large literature on algorithms for this problem, in both the exact and approximate cases. The more sophisticated algorithms typically achieve a query time that is logarithmic in n at the expense of an exponential dependence on the dimension d; indeed, even the average-case analysis of heuristics such as k-d trees reveals an exponential dependence on d in the query time. In this wor...
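The problem statement above has a trivial solution that the more sophisticated algorithms are trying to beat: skip preprocessing entirely and scan all n points for every query, at O(nd) time per query. The sketch below is that linear-scan baseline for the exact Euclidean version of the problem, not either of the paper's two algorithms; the function name is hypothetical.

```python
import math
from typing import Sequence


def nearest_neighbor(points: Sequence[Sequence[float]], query: Sequence[float]) -> int:
    """Return the index of the database point closest to `query` in Euclidean distance.

    Linear scan: no preprocessing, O(n * d) time per query.
    """
    if not points:
        raise ValueError("empty database")
    best_index, best_dist = 0, math.inf
    for i, p in enumerate(points):
        dist = math.dist(p, query)  # Euclidean distance; p and query must have equal length
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index
```

For instance, `nearest_neighbor([[0, 0], [3, 4], [1, 1]], [0.9, 1.2])` picks index 2, the point `[1, 1]`.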
Approximate Nearest Neighbors and the Fast Johnson–Lindenstrauss Transform
STOC'06, 2006

Cited by 156 (6 self)
We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$), called the Fast-Johnson-Lindenstrauss-Transform. The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
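The preconditioning idea can be sketched as Φ = P·H·D: D flips the sign of each coordinate at random, H is the normalized Walsh–Hadamard transform (playing the role of the randomized Fourier transform), and P is a sparse k × d matrix whose entries are nonzero with probability q and then drawn from N(0, 1/q). This is a minimal pure-Python sketch of the ℓ2 case under those assumptions, not a reference implementation: `q` and `k` are free parameters here (the paper ties them to n, d and the target distortion, which this sketch does not reproduce), d must be a power of two, the class and function names are hypothetical, and the output is scaled by 1/√k so that E‖Φx‖² = ‖x‖².

```python
import math
import random


def fwht(v: list[float]) -> list[float]:
    """Fast Walsh-Hadamard transform, normalized to be orthonormal.

    Runs in O(d log d) time; len(v) must be a power of two.
    """
    a = list(v)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in a]


class FJLTSketch:
    """Hypothetical class: a sketch of Phi = P H D mapping R^d to R^k (l2 case)."""

    def __init__(self, d: int, k: int, q: float, seed: int = 0):
        if d <= 0 or d & (d - 1):
            raise ValueError("d must be a power of two")
        rng = random.Random(seed)
        self.d, self.k = d, k
        # D: independent random signs on the diagonal.
        self.signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        # P: each entry is nonzero with probability q and then ~ N(0, 1/q);
        # stored row by row as (column, value) pairs to exploit sparsity.
        sigma = 1.0 / math.sqrt(q)
        self.rows = [
            [(j, rng.gauss(0.0, sigma)) for j in range(d) if rng.random() < q]
            for _ in range(k)
        ]

    def apply(self, x: list[float]) -> list[float]:
        if len(x) != self.d:
            raise ValueError("dimension mismatch")
        y = fwht([s * xi for s, xi in zip(self.signs, x)])  # y = H D x
        scale = 1.0 / math.sqrt(self.k)
        return [scale * sum(v * y[j] for j, v in row) for row in self.rows]
```

Because H is orthonormal and dense, HDx spreads the mass of even a very sparse x (such as a basis vector) over all coordinates, which is what makes it safe to follow with a sparse P; without the preconditioning, a sparse P would often miss the few nonzero coordinates of such an x entirely.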
The fast Johnson–Lindenstrauss transform and approximate nearest neighbors
SIAM J. Comput., 2009

Cited by 57 (0 self)
Abstract. We introduce a new low-distortion embedding of $\ell_2^d$ into $\ell_p^{O(\log n)}$ ($p = 1, 2$) called the fast Johnson–Lindenstrauss transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the “Heisenberg principle” of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then speed up further by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
The Communication Complexity of Threshold Gates
In Proceedings of “Combinatorics, Paul Erdős is Eighty”, 1994

Cited by 39 (1 self)
We prove upper bounds on the randomized communication complexity of evaluating a threshold gate (with arbitrary weights). For linear threshold gates this is done in the usual 2-party communication model, and for degree-d threshold gates this is done in the multiparty model. We then use these upper bounds together with known lower bounds for communication complexity in order to give very easy proofs for lower bounds in various models of computation involving threshold gates. This generalizes several known bounds and answers several open problems.
Noisy sorting without resampling
In SODA ’08: Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms, 2008

Cited by 39 (1 self)
In this paper we study noisy sorting without resampling. In this problem there is an unknown order $a_{\pi(1)} < \dots < a_{\pi(n)}$, where $\pi$ is a permutation on $n$ elements. The input is the status of $\binom{n}{2}$ queries of the form $q(a_i, a_j)$, where $q(a_i, a_j) = +$ with probability at least $1/2 + \gamma$ if $\pi(i) > \pi(j)$, for all pairs $i \neq j$, where $\gamma > 0$ is a constant and $q(a_i, a_j) = -q(a_j, a_i)$ for all $i$ and $j$. It is assumed that the errors are independent. Given the status of the queries, the goal is to find the maximum likelihood order. In other words, the goal is to find a permutation $\sigma$ that minimizes the number of pairs $\sigma(i) > \sigma(j)$ where $q(\sigma(i), \sigma(j)) = -$. The problem so defined is the feedback arc set problem on distributions of inputs, each of which is a tournament obtained as a noisy perturbation of a linear order. Note that when $\gamma < 1/2$ and $n$ is large, it is impossible to recover the original order $\pi$. It is known that the weighted feedback arc set problem on tournaments is NP-hard in general. Here we present an algorithm of running time $n^{O(\gamma^{-4})}$ and sampling complexity $O_\gamma(n \log n)$ that with high probability solves the noisy sorting without resampling problem. We also show that if $a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}$ is an optimal solution of the problem, then it is “close” to the original order. More formally, with high probability it holds that $\sum_i |\sigma(i) - \pi(i)| = \Theta(n)$ and $\max_i |\sigma(i) - \pi(i)| = \Theta(\log n)$. Our results are of interest in applications to ranking, such as ranking in sports, or ranking of search items based on comparisons by experts.
C.S. University of Toronto, partially supported by an NSERC CGS scholarship. Part of the work was done while on a visit to
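The objective above, the number of pairs that an order places against the reported comparisons, is easy to compute. The sketch below generates a noisy tournament under a simplified convention (item i has true rank i, each pair is compared exactly once, and the truly larger item is reported as the winner with probability 1/2 + γ), counts disagreements for a candidate order, and orders items by their number of reported wins. The win-count ordering is a simple baseline heuristic, not the paper's maximum-likelihood algorithm; all function names are hypothetical.

```python
import random
from itertools import combinations


def noisy_comparisons(n: int, gamma: float, seed: int = 0) -> dict[tuple[int, int], int]:
    """Map each pair (i, j) with i < j to its reported winner.

    Item i has true rank i, so j is truly larger; the report names j with
    probability 1/2 + gamma and i otherwise, independently for every pair.
    """
    rng = random.Random(seed)
    return {
        (i, j): (j if rng.random() < 0.5 + gamma else i)
        for i, j in combinations(range(n), 2)
    }


def disagreements(order: list[int], results: dict[tuple[int, int], int]) -> int:
    """Pairs where `order` (smallest first) places the reported winner below the loser."""
    position = {item: k for k, item in enumerate(order)}
    count = 0
    for (i, j), winner in results.items():
        loser = i if winner == j else j
        if position[winner] < position[loser]:
            count += 1
    return count


def order_by_wins(n: int, results: dict[tuple[int, int], int]) -> list[int]:
    """Baseline heuristic: sort items by number of reported wins (ties broken by label)."""
    wins = [0] * n
    for winner in results.values():
        wins[winner] += 1
    return sorted(range(n), key=lambda item: (wins[item], item))
```

With γ = 1/2 every report is correct, so the win counts recover the true order exactly and it has zero disagreements; with smaller γ the heuristic's output can disagree with the true order, which is where the paper's maximum-likelihood analysis comes in.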
Computation in Noisy Radio Networks
In Proc. 9th Ann. ACM-SIAM Symp. on Discrete Algorithms

Cited by 35 (0 self)
In this paper we examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability to be flipped. Each processor has some initial input bit, and the goal is to compute a function of the initial inputs. In this model we show a protocol to compute any threshold function using only a linear number of transmissions.
1 Introduction
The influence of noise (or faults) on the complexity of computation has been studied in many contexts. In particular, people were interested in random noise. In a typical such scenario, it is assumed that the outcome of each operation is noisy with some fixed probability p and that all the faults are independent. Usually, if t is the number of operations performed by the computation, then by repeating each operation O(log t) times and taking the majority of the results, one can ensure a constant probability of error at the cost of O(t log t) operations. It is desirable, however, to obtain a cost of O(t) (i.e., increase only by a constant fa...
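The O(t log t) repetition baseline described above can be checked numerically. Assuming each bit is flipped independently with probability p < 1/2, the sketch below sends a bit several times and decodes by majority vote, computes the exact failure probability of that vote, and finds the smallest odd repetition count for which a union bound over t operations keeps the total error below a target; that count grows like log t. It illustrates only the naive repetition scheme, not the paper's linear-transmission protocol; the function names and the `target` parameter are hypothetical.

```python
import math
import random


def noisy_send(bit: int, p: float, rng: random.Random) -> int:
    """Transmit one bit; it is flipped independently with probability p."""
    return bit ^ int(rng.random() < p)


def send_with_repetition(bit: int, p: float, repeats: int, rng: random.Random) -> int:
    """Send `bit` an odd number of times and decode by majority vote."""
    if repeats % 2 == 0:
        raise ValueError("use an odd number of repeats to avoid ties")
    ones = sum(noisy_send(bit, p, rng) for _ in range(repeats))
    return 1 if 2 * ones > repeats else 0


def majority_error(p: float, repeats: int) -> float:
    """Exact probability that more than half of the `repeats` copies are flipped."""
    return sum(
        math.comb(repeats, k) * p**k * (1 - p) ** (repeats - k)
        for k in range(repeats // 2 + 1, repeats + 1)
    )


def repeats_needed(p: float, t: int, target: float = 0.1) -> int:
    """Smallest odd r with t * majority_error(p, r) <= target (union bound over t operations)."""
    if not 0 <= p < 0.5:
        raise ValueError("majority decoding needs p < 1/2")
    r = 1
    while t * majority_error(p, r) > target:
        r += 2
    return r
```

Multiplying `repeats_needed(p, t)` by t gives the total cost of the naive scheme; since the required r grows logarithmically in t, the total is O(t log t), which is the overhead the paper aims to avoid.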
Error Correcting Tournaments
2008

Cited by 26 (4 self)
Abstract. We present a family of adaptive pairwise tournaments that are provably robust against large error fractions when used to determine the largest element in a set. The tournaments use nk pairwise comparisons but have only O(k + log n) depth, where n is the number of players and k is the robustness parameter (for reasonable values of n and k). These tournaments also give a reduction from multiclass to binary classification in machine learning, yielding the best known analysis for the problem.
Mixing times of the biased card shuffling and the asymmetric exclusion process
Trans. Amer. Math. Soc., 2005

Cited by 21 (1 self)
Abstract. Consider the following method of card shuffling. Start with a deck of N cards numbered 1 through N. Fix a parameter p between 0 and 1. In this model a “shuffle” consists of uniformly selecting a pair of adjacent cards and then flipping a coin that is heads with probability p. If the coin comes up heads, then we arrange the two cards so that the lower-numbered card comes before the higher-numbered card. If the coin comes up tails, then we arrange the cards with the higher-numbered card first. In this paper we prove that for all p ≠ 1/2, the mixing time of this card shuffling is $O(N^2)$, as conjectured by Diaconis and Ram (2000). Our result is a rare case of an exact estimate for the convergence rate of the Metropolis algorithm. A novel feature of our proof is that the analysis of an infinite (asymmetric exclusion) process plays an essential role in bounding the mixing time of a finite process.
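One step of the shuffle described above translates directly into code. The sketch below is a plain simulation of the chain, not the paper's proof technique; the function names and the `seed` parameter are hypothetical. With p = 1 every step can only remove an inversion, so the deck drifts to sorted order; with p = 0 it drifts to reverse order; for 0 < p < 1 it keeps moving, biased toward sorted order when p > 1/2.

```python
import random


def biased_shuffle_step(deck: list[int], p: float, rng: random.Random) -> None:
    """Pick a uniformly random adjacent pair; with probability p put the
    lower-numbered card first, otherwise put the higher-numbered card first."""
    i = rng.randrange(len(deck) - 1)
    low, high = sorted((deck[i], deck[i + 1]))
    if rng.random() < p:
        deck[i], deck[i + 1] = low, high
    else:
        deck[i], deck[i + 1] = high, low


def run_shuffle(deck: list[int], p: float, steps: int, seed: int = 0) -> list[int]:
    """Apply `steps` shuffles to a copy of `deck` (at least two cards) and return it."""
    rng = random.Random(seed)
    current = list(deck)
    for _ in range(steps):
        biased_shuffle_step(current, p, rng)
    return current
```

Each step touches only two adjacent cards, so simulating the chain for the O(N²) steps covered by the mixing-time bound costs O(N²) time.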