Results 1-10 of 215
An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions
ACM-SIAM Symposium on Discrete Algorithms, 1994
Cited by 984 (32 self)
Consider a set $S$ of $n$ data points in real $d$-dimensional space, $\mathbb{R}^d$, where distances are measured using any Minkowski metric. In nearest neighbor searching we preprocess $S$ into a data structure, so that given any query point $q \in \mathbb{R}^d$, the closest point of $S$ to $q$ can be reported quickly. Given any positive real $\epsilon$, a data point $p$ is a $(1 + \epsilon)$-approximate nearest neighbor of $q$ if its distance from $q$ is within a factor of $(1 + \epsilon)$ of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of $n$ points in $\mathbb{R}^d$ in $O(dn \log n)$ time and $O(dn)$ space, so that given a query point $q \in \mathbb{R}^d$ and $\epsilon > 0$, a $(1 + \epsilon)$-approximate nearest neighbor of $q$ can be computed in $O(c_{d,\epsilon} \log n)$ time, where $c_{d,\epsilon} \le d \lceil 1 + 6d/\epsilon \rceil^d$ is a factor depending only on dimension and $\epsilon$. In general, we show that given an integer $k \ge 1$, $(1 + \epsilon)$-approximations to the $k$ nearest neighbors of $q$ can be computed in additional $O(kd \log n)$ time.
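To make the definition in this abstract concrete, here is a minimal brute-force sketch that checks whether a candidate point qualifies as a (1 + ε)-approximate nearest neighbor under a Minkowski metric. It is not the paper's data structure. The function names are hypothetical, and the helper for the query-time factor assumes the bound reads d·⌈1 + 6d/ε⌉^d.

```python
import math


def minkowski(p, q, m=2.0):
    """Minkowski (L_m) distance between two equal-length points; m=inf is the max metric."""
    if math.isinf(m):
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** m for a, b in zip(p, q)) ** (1.0 / m)


def is_approx_nn(p, q, points, eps, m=2.0):
    """True if p is a (1 + eps)-approximate nearest neighbor of q among points."""
    nearest = min(minkowski(x, q, m) for x in points)  # exact answer by linear scan
    return minkowski(p, q, m) <= (1.0 + eps) * nearest


def query_constant_bound(d, eps):
    """Assumed reading of the abstract's bound: d * ceil(1 + 6d/eps) ** d."""
    return d * math.ceil(1 + 6 * d / eps) ** d


S = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.0), (5.0, 5.0)]
q = (2.0, 0.0)
# The true nearest neighbor is (1.2, 0.0); (1.0, 0.0) is acceptable only for a loose eps.
loose = is_approx_nn((1.0, 0.0), q, S, eps=0.5)
tight = is_approx_nn((1.0, 0.0), q, S, eps=0.1)
```

The bound grows very quickly with the dimension, which is why the result is stated for fixed dimensions.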
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
2008
Cited by 457 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
FINDING STRUCTURE WITH RANDOMNESS: PROBABILISTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
Cited by 253 (6 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
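The two-stage idea described here (sample randomly to find a subspace, then compress the matrix to it) can be sketched in plain Python. This is an illustrative sketch, not the paper's reference implementation. The Gaussian test matrix, the `oversample` parameter, and all function names are assumptions. The final deterministic factorization of the small matrix B is left out.

```python
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of Y; numerically dependent columns are dropped."""
    basis = []
    for col in transpose(Y):
        v = list(col)
        original = sum(a * a for a in v) ** 0.5
        for u in basis:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > tol * max(original, 1.0):
            basis.append([a / norm for a in v])
    return transpose(basis)


def randomized_range(A, k, oversample=5, seed=0):
    """Return (Q, B) with A approximately equal to Q B, where Q has orthonormal columns."""
    rng = random.Random(seed)
    n = len(A[0])
    # Random sampling: multiply A by a Gaussian test matrix to probe its range.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k + oversample)] for _ in range(n)]
    Q = orthonormal_columns(matmul(A, omega))
    # Compression: B = Q^T A is small and can be factored deterministically afterwards.
    B = matmul(transpose(Q), A)
    return Q, B


# A 6 x 5 matrix of rank at most 2, built as a product of thin random factors.
rng = random.Random(1)
U = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(6)]
V = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(2)]
A = matmul(U, V)
Q, B = randomized_range(A, k=2)
approx = matmul(Q, B)
err = max(abs(a - b) for ra, rb in zip(A, approx) for a, b in zip(ra, rb))
```

For an exactly low-rank input such as this one, `err` should sit at round-off level. For a general matrix the error depends on how fast the singular values decay.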
Approximate nearest neighbors and Fast Johnson-Lindenstrauss Transform.
Proceedings of the Symposium on Theory of Computing, 2006
Cited by 156 (6 self)
We introduce a new low-distortion embedding, the Fast Johnson-Lindenstrauss Transform (FJLT). The FJLT is faster than standard random projections and just as easy to implement. It is based upon the preconditioning of a sparse projection matrix with a randomized Fourier transform. Sparse random projections are unsuitable for low-distortion embeddings. We overcome this handicap by exploiting the "Heisenberg principle" of the Fourier transform, i.e., its local-global duality. The FJLT can be used to speed up search algorithms based on low-distortion embeddings in $\ell_1$ and $\ell_2$. We consider the case of approximate nearest neighbors in $\ell_2^d$. We provide a faster algorithm using classical projections, which we then further speed up by plugging in the FJLT. We also give a faster algorithm for searching over the hypercube.
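The construction described here, a sparse projection applied after a randomized Fourier-type preconditioner, can be sketched as follows. This is a hedged illustration rather than the authors' code. It uses a Walsh-Hadamard transform with random sign flips as the preconditioner. The sparsity parameter `q` and the function names are hypothetical, and no particular choice of `q` is claimed to give the paper's guarantees.

```python
import math
import random


def fwht(x):
    """Normalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def make_fjlt(d, k, q, seed=0):
    """Build x -> (1/sqrt(k)) * P H D x for inputs of dimension d (a power of two)."""
    if d < 1 or d & (d - 1):
        raise ValueError("d must be a power of two")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]  # diagonal matrix D
    sigma = 1.0 / math.sqrt(q)
    # Sparse P: each entry is nonzero with probability q, then drawn from N(0, 1/q).
    P = [[(j, rng.gauss(0.0, sigma)) for j in range(d) if rng.random() < q]
         for _ in range(k)]

    def precondition(x):
        """Apply H D: an orthogonal map that spreads out concentrated vectors."""
        return fwht([s * v for s, v in zip(signs, x)])

    def embed(x):
        u = precondition(x)
        return [sum(val * u[j] for j, val in row) / math.sqrt(k) for row in P]

    return precondition, embed


precondition, embed = make_fjlt(d=8, k=3, q=0.5)
spike = [1.0] + [0.0] * 7          # worst case for a sparse projection on its own
spread = precondition(spike)       # every coordinate now has magnitude 1/sqrt(8)
y = embed(spike)
```

The spike example shows why the preconditioner matters: a sparse matrix alone could miss the single nonzero coordinate, whereas after the transform the mass is spread evenly across all coordinates.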
Efficient Similarity Search and Classification Via Rank Aggregation
In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, 2003
Cited by 152 (3 self)
We propose a novel approach to performing efficient similarity search and classification in high dimensional data. In this framework, the database elements are vectors in a Euclidean space. Given a query vector in the same space, the goal is to find elements of the database that are similar to the query. In our approach, a small number of independent "voters" rank the database elements based on similarity to the query. These rankings are then combined by a highly efficient aggregation algorithm. Our methodology leads both to techniques for computing approximate nearest neighbors and to a conceptually rich alternative to nearest neighbors.
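A rough sketch of the voting idea follows. The abstract does not say what the voters are or how rankings are merged, so both choices here are assumptions: each voter is a random line that ranks database vectors by how close their projection lies to the query's projection, and the rankings are merged by median rank. The merge is done naively by full sorting, not by the efficient aggregation algorithm the abstract mentions. All names are hypothetical.

```python
import random
from statistics import median


def rank_aggregate_search(database, query, num_voters=5, top=1, seed=0):
    """Return indices of the `top` database vectors with the smallest median rank."""
    rng = random.Random(seed)
    dim = len(query)
    ranks = [[] for _ in database]
    for _ in range(num_voters):
        # One voter: a random direction; similarity = closeness of 1-D projections.
        line = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj_q = sum(a * b for a, b in zip(line, query))
        gaps = [abs(sum(a * b for a, b in zip(line, x)) - proj_q) for x in database]
        order = sorted(range(len(database)), key=gaps.__getitem__)
        for rank, idx in enumerate(order):
            ranks[idx].append(rank)
    # Aggregate the voters' rankings by median rank (ties broken by index order).
    winners = sorted(range(len(database)), key=lambda i: median(ranks[i]))
    return winners[:top]


db = [(0.0, 0.0, 0.0), (4.0, 1.0, -2.0), (1.0, 1.0, 1.0), (-3.0, 2.0, 5.0)]
best = rank_aggregate_search(db, query=(1.0, 1.0, 1.0))
```

Because a query that coincides with a database vector is ranked first by every voter, it has median rank zero and is always returned first. For other queries the result is only an approximate nearest neighbor.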
Navigating nets: Simple algorithms for proximity search (Extended Abstract)
2004
Cited by 150 (17 self)
Robert Krauthgamer, James R. Lee
We present a simple deterministic data structure for maintaining a set S of points in a general metric space, while supporting proximity search (nearest neighbor and range queries) and updates to S (insertions and deletions). Our data structure consists of a sequence of progressively finer $\epsilon$-nets of S, with pointers that allow us to navigate easily from one scale to the next.
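The notion of a sequence of nets at different scales can be illustrated with a small static sketch. It builds nets greedily, each coarser net from the next finer one, and omits the navigation pointers and the insert/delete operations that the paper's structure supports. The function names and the doubling of scales are assumptions made for the example.

```python
import math


def greedy_net(points, r, dist):
    """Greedy net at scale r: kept points are pairwise at least r apart,
    and every input point lies within distance r of some kept point."""
    net = []
    for p in points:
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net


def net_hierarchy(points, r_min, levels, dist=math.dist):
    """Nets at scales r_min, 2*r_min, 4*r_min, ...; each is a subset of the previous one."""
    hierarchy = []
    current = list(points)
    r = r_min
    for _ in range(levels):
        current = greedy_net(current, r, dist)
        hierarchy.append((r, current))
        r *= 2
    return hierarchy


line_points = [(float(i),) for i in range(8)]
hierarchy = net_hierarchy(line_points, r_min=1.0, levels=3)
sizes = [len(net) for _, net in hierarchy]
```

A search would start at the coarsest net and move to finer scales, looking only at net points near the current candidate. That navigation step is what the pointers in the paper provide.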
Mean Shift Based Clustering in High Dimensions: A Texture Classification Example
2003
Cited by 137 (3 self)
Feature space analysis is the main module in many computer vision tasks. The most popular technique, k-means clustering, however, has two inherent limitations: the clusters are constrained to be spherically symmetric and their number has to be known a priori. In nonparametric clustering methods, like the one based on mean shift, these limitations are eliminated but the amount of computation becomes prohibitively large as the dimension of the space increases. We exploit a recently proposed approximation technique, locality-sensitive hashing (LSH), to reduce the computational complexity of adaptive mean shift. In our implementation of LSH the optimal parameters of the data structure are determined by a pilot learning procedure, and the partitions are data driven. As an application, the performance of mode-based and k-means-based textons is compared in a texture classification study.
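For orientation, a minimal mean shift mode-seeking loop is sketched below. It uses a fixed bandwidth and a flat kernel, which is a simplification of the adaptive mean shift discussed in the abstract. The neighbor query is an exact linear scan, and this is the step that an LSH index would replace with an approximate lookup. The function name and parameters are hypothetical.

```python
import math


def mean_shift_mode(start, points, bandwidth, max_iter=100, tol=1e-6):
    """Move `start` uphill to a local density mode using a flat kernel."""
    x = tuple(start)
    for _ in range(max_iter):
        # Neighbor query: the costly step in high dimensions (exact scan here).
        neighbors = [p for p in points if math.dist(p, x) <= bandwidth]
        if not neighbors:
            break
        # Mean shift update: jump to the centroid of the neighbors.
        new_x = tuple(sum(c) / len(neighbors) for c in zip(*neighbors))
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
mode_a = mean_shift_mode((0.5, 0.5), data, bandwidth=1.0)
mode_b = mean_shift_mode((4.5, 4.5), data, bandwidth=1.0)
```

Unlike k-means, the number of modes is not fixed in advance: starting points that converge to the same location are grouped into one cluster.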
An investigation of practical approximate nearest neighbor algorithms
2004
Cited by 115 (4 self)
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high-dimensional perception areas such as computer vision, with dozens of publications in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hashing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We also introduce new approximate k-NN search algorithms on this structure. We show why these structures should be able to exploit the same random-projection-based approximations that LSH enjoys, but with a simpler algorithm and perhaps with greater efficiency. We then provide a detailed empirical evaluation on five large, high-dimensional datasets, which shows up to 31-fold accelerations over LSH. This result holds true throughout the spectrum of approximation levels.
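The overlap idea, where points near the split boundary are stored in both children, can be sketched for a single node split. The details are assumptions rather than the paper's exact procedure: pivots come from a farthest-point heuristic, the cut is placed at the median projection, and `tau` is a hypothetical name for the half-width of the overlap band.

```python
import math
from statistics import median


def overlapping_split(points, tau):
    """Split points into (left, right); points within tau of the cut go to both sides."""
    # Pick two far-apart pivots and project every point onto the line joining them.
    a = max(points, key=lambda p: math.dist(p, points[0]))
    b = max(points, key=lambda p: math.dist(p, a))
    direction = [y - x for x, y in zip(a, b)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("all points coincide; nothing to split")
    unit = [c / length for c in direction]
    proj = [sum(u * c for u, c in zip(unit, p)) for p in points]
    cut = median(proj)
    # With tau == 0 this is an ordinary disjoint split.
    left = [p for p, t in zip(points, proj) if t < cut + tau]
    right = [p for p, t in zip(points, proj) if t >= cut - tau]
    return left, right


pts = [(float(i), 0.0) for i in range(6)]
left, right = overlapping_split(pts, tau=1.0)
shared = set(left) & set(right)
```

The point of the overlap is that a query close to the boundary can descend into only one child without backtracking and still have a good chance of finding its neighbors, at the cost of storing some points twice.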
Secure multiparty computation of approximations
2001
Cited by 108 (25 self)
Approximation algorithms can sometimes provide efficient solutions when no efficient exact computation is known. In particular, approximations are often useful in a distributed setting where the inputs are held by different parties and may be extremely large. Furthermore, for some applications, the parties want to compute a function of their inputs securely, without revealing more information than necessary. In this work we study the question of simultaneously addressing the above efficiency and security concerns via what we call secure approximations. We start by extending standard definitions of secure (exact) computation to the setting of secure approximations. Our definitions guarantee that no additional information is revealed by the approximation beyond what follows from the output of the function being approximated. We then study the complexity of specific secure approximation problems. In particular, we obtain a sublinear-communication protocol for securely approximating the Hamming distance and a polynomial-time protocol for securely approximating the permanent and related #P-hard problems.