Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. (2008)

by A Andoni, P Indyk
Venue: Commun. ACM

Results 1 - 10 of 455

Fast approximate nearest neighbors with automatic algorithm configuration

by Marius Muja, David G. Lowe - In VISAPP International Conference on Computer Vision Theory and Applications, 2009
"... nearest-neighbors search, randomized kd-trees, hierarchical k-means tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems ..."
Abstract - Cited by 455 (2 self)
nearest-neighbors search, randomized kd-trees, hierarchical k-means tree, clustering. For many computer vision problems, the most time-consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with only minimal guidance on selecting an algorithm and its parameters for any given problem. In this paper, we describe a system that answers the question, “What is the fastest approximate nearest-neighbor algorithm for my data?” Our system will take any given dataset and desired degree of precision and use these to automatically determine the best algorithm and parameter values. We also describe a new algorithm that applies priority search on hierarchical k-means trees, which we have found to provide the best known performance on many datasets. After testing a range of alternatives, we have found that multiple randomized kd-trees provide the best performance for other datasets. We are releasing public domain code that implements these approaches. This library provides about one order of magnitude improvement in query time over the best previously available software and provides fully automated parameter selection.
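
The priority search procedure at the heart of this paper admits a compact sketch: build a tree by recursive k-means clustering, descend toward the query's closest branch, and keep the bypassed sibling branches in a priority queue keyed by centroid distance so they can be revisited within a fixed budget of point checks. The sketch below is illustrative only, not the authors' FLANN library; the branching factor, leaf size, check budget, and toy data are all assumptions.

```python
# Sketch of priority search over a hierarchical k-means tree, in the spirit of
# Muja & Lowe (2009). Branching factor, leaf size, check budget, and the toy
# data are illustrative assumptions, not the FLANN defaults.
import heapq
import numpy as np

def kmeans(X, k, iters=10, rng=None):
    """Tiny Lloyd's k-means; returns (centroids, point-to-cluster assignment)."""
    rng = rng or np.random.default_rng(0)
    C = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        a = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(a == j):
                C[j] = X[a == j].mean(axis=0)
    return C, a

def build_tree(X, idx, branching=4, leaf_size=8):
    """Recursively cluster the points; interior nodes keep child centroids."""
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    C, a = kmeans(X[idx], min(branching, len(idx)))
    cents, kids = [], []
    for j in range(len(C)):
        members = idx[a == j]
        if len(members):
            cents.append(C[j])
            kids.append(build_tree(X, members, branching, leaf_size))
    return ("node", np.array(cents), kids)

def priority_search(tree, X, q, max_checks=64):
    """Descend to the closest branch first; bypassed siblings wait in a heap
    keyed by centroid distance and are revisited until the budget runs out."""
    best, heap, tick, checks = (np.inf, -1), [(0.0, 0, tree)], 1, 0
    while heap and checks < max_checks:
        _, _, node = heapq.heappop(heap)
        while node[0] == "node":
            _, cents, kids = node
            d = ((cents - q) ** 2).sum(-1)
            order = np.argsort(d)
            for j in order[1:]:                      # enqueue the siblings
                heapq.heappush(heap, (d[j], tick, kids[j]))
                tick += 1
            node = kids[order[0]]                    # follow the closest child
        for i in node[1]:                            # linear scan in the leaf
            checks += 1
            dist = ((X[i] - q) ** 2).sum()
            if dist < best[0]:
                best = (dist, i)
    return best

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 32)).astype(np.float32)
tree = build_tree(X, np.arange(len(X)))
dist, i = priority_search(tree, X, rng.standard_normal(32).astype(np.float32))
```

The check budget is the accuracy/speed knob; the paper's contribution is selecting such parameters (and the choice between this tree and randomized kd-trees) automatically for a target precision.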

Citation Context

...be the best at finding fast approximate nearest neighbors (the multiple randomized kd-trees and the hierarchical k-means tree) with existing approaches, the ANN (Arya et al., 1998) and LSH (Andoni, 2006) algorithms (we have used the publicly available implementations) on the first dataset of 100,000 SIFT features (http://www.vis.uky.edu/~stewe/ukbench/data/). Since the LSH implementation (the E2LSH package)...

Spectral hashing

by Yair Weiss, Antonio Torralba, Rob Fergus, 2009
"... Semantic hashing [1] seeks compact binary codes of data-points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be sho ..."
Abstract - Cited by 284 (4 self)
Semantic hashing [1] seeks compact binary codes of data points so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel data point. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state of the art.
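
For the training set, the spectral relaxation the abstract refers to reduces to thresholding eigenvectors of a graph Laplacian. A minimal sketch under assumed choices (dense Gaussian affinities with bandwidth sigma, the unnormalized Laplacian, zero thresholds); the paper's out-of-sample extension via Laplace-Beltrami eigenfunctions is omitted:

```python
# Minimal sketch of the spectral relaxation behind spectral hashing
# (Weiss et al., 2009): binary codes from thresholded eigenvectors of a graph
# Laplacian. The affinity bandwidth `sigma` and the bit count are illustrative;
# a sparse k-NN graph would replace the dense affinity matrix at scale.
import numpy as np

def spectral_codes(X, n_bits=8, sigma=1.0):
    sq = ((X[:, None] - X[None]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-sq / (2 * sigma ** 2))           # Gaussian affinity graph
    L = np.diag(W.sum(1)) - W                    # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)               # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue ~0); take the next n_bits.
    Y = vecs[:, 1:n_bits + 1]
    return (Y > 0).astype(np.uint8)              # threshold at zero -> bits

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
codes = spectral_codes(X)
print(codes.shape)  # (200, 8)
```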

Citation Context

... then it is easy to design a binary code so that items that are close in Euclidean space will map to similar binary codewords. This is the basis of the popular locality sensitive hashing method E2LSH [8]. As shown in [8], if every bit in the code is calculated by a random linear projection followed by a random threshold, then the Hamming distance between codewords will asymptotically approach the Eucl...
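
The construction this passage describes (one random linear projection plus one random threshold per bit) can be sketched directly. Gaussian projection directions and thresholds drawn over each projection's observed range are illustrative assumptions; the actual E2LSH package uses structured p-stable hash functions:

```python
# Sketch of binary codes built from random linear projections followed by
# random thresholds, as the passage above describes. Gaussian directions and
# range-based thresholds are assumptions made for illustration.
import numpy as np

def lsh_bits(X, n_bits=32, rng=None):
    rng = rng or np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_bits))   # random projection directions
    proj = X @ W
    # One random threshold per bit, drawn over that projection's observed range.
    t = rng.uniform(proj.min(0), proj.max(0))
    return (proj >= t).astype(np.uint8)

def hamming(a, b):
    return int((a != b).sum())

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 64))
B = lsh_bits(X)
# Points that are close in Euclidean space should tend to agree in more bits.
print(hamming(B[0], B[1]), B.shape)
```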

Semantic hashing

by Ruslan Salakhutdinov, Geoffrey Hinton - International Journal of Approximate Reasoning, 2009
"... ..."
Abstract - Cited by 202 (10 self)
Abstract not found

Small codes and large image databases for recognition

by Antonio Torralba, Rob Fergus, Yair Weiss
"... The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, online image search tools. In ..."
Abstract - Cited by 185 (7 self)
The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, and online image search tools. In this paper, our goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices. Our approach uses recently developed machine learning techniques to convert the Gist descriptor (a real-valued vector that describes orientation energies at different scales and orientations within an image) to a compact binary code, with a few hundred bits per image. Using our scheme, it ...

Citation Context

...neighbors. LSH has been used successfully in a number of vision applications [26]. An alternative approach is to use kd-trees [16, 17], although LSH has been reported to work better in high dimensions [1]. Despite the success of LSH, it is important to realize that the theoretical guarantees are asymptotic, holding as the number of random projections grows. In our experience, when the number of bits is fixed...

Iterative quantization: A procrustean approach to learning binary codes

by Yunchao Gong, Svetlana Lazebnik - In Proc. of the IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2011
"... This paper addresses the problem of learning similaritypreserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zerocentered data so as to minimize the quantization error of mapping t ..."
Abstract - Cited by 157 (6 self)
This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other state-of-the-art methods.
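
The alternation itself is a few lines of linear algebra: fix the rotation and quantize, then fix the codes and solve the orthogonal Procrustes problem by SVD. A sketch under assumed settings (random orthogonal initialization, 50 iterations, PCA to the code length beforehand):

```python
# Sketch of the ITQ alternation (Gong & Lazebnik, 2011): given zero-centered,
# PCA-projected data V, alternate between binarizing B = sign(VR) and solving
# the orthogonal Procrustes problem for the rotation R. Iteration count and
# the toy data are illustrative.
import numpy as np

def itq(V, n_iter=50, rng=None):
    rng = rng or np.random.default_rng(0)
    c = V.shape[1]
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                 # fix R: snap to hypercube vertices
        B[B == 0] = 1
        U, _, Vt = np.linalg.svd(B.T @ V)  # fix B: Procrustes solution via SVD
        R = (U @ Vt).T
    return (np.sign(V @ R) > 0).astype(np.uint8), R

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 32))
X -= X.mean(0)                             # zero-center the data
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V = X @ Vt[:16].T                          # PCA down to the code length (16 bits)
codes, R = itq(V)
```

The Procrustes step is where the connection named in the abstract enters: with the codes B fixed, minimizing the quantization error over rotations R is exactly the orthogonal Procrustes problem.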

Citation Context

... et al. [18] have introduced the binary coding problem to the vision community and compared several methods based on boosting, restricted Boltzmann machines [14], and locality sensitive hashing (LSH) [1]. To further improve the performance and scalability, Weiss et al. have proposed Spectral Hashing (SH) [21], a method motivated by spectral graph partitioning. Raginsky and Lazebnik [12] have proposed...

What does classifying more than 10,000 image categories tell us?

by Jia Deng, Alexander C. Berg, Kai Li, Li Fei-Fei
"... Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This pape ..."
Abstract - Cited by 118 (11 self)
Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.

Citation Context

...ethods, we use brute force linear scan. It takes 1 year to run through all testing examples for GIST or BOW features. It is possible to use approximation techniques such as locality sensitive hashing [36], but due to the high feature dimensionality (e.g. 960 for GIST), we have found relatively small speed-up. Thus we choose linear scan to avoid unnecessary approximation. In practice, all algorithms ar...

Building Rome on a Cloudless Day

by Jan-Michael Frahm, Pierre Fite-Georgel, David Gallup, Tim Johnson, Rahul Raguram, Changchang Wu, Yi-Hung Jen, Enrique Dunn, Svetlana Lazebnik, Marc Pollefeys
"... Abstract. This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achie ..."
Abstract - Cited by 90 (13 self)
Abstract. This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches.

Citation Context

... to obtain a set of canonical or iconic views [3]. In order to be able to fit several million gist features in GPU memory, we compress them to compact binary strings using a locality sensitive scheme [10,11,12]. We then cluster them based on Hamming distance using the k-medoids algorithm [13] implemented on the GPU. To our knowledge, this is the first application of small codes in the style of [12] outside ...
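
Clustering small codes by Hamming distance, as this pipeline does, rests on cheap XOR-plus-popcount distance computations over bit-packed codes. A CPU-side sketch (the paper's GPU k-medoids is not reproduced; code length and data are illustrative):

```python
# Sketch of the Hamming-distance machinery such a pipeline needs: binary codes
# packed into bytes, with distance computed as popcount(xor). Illustrative only.
import numpy as np

def pack(bits):
    """Pack an (n, n_bits) 0/1 array into bytes, 8 bits per byte."""
    return np.packbits(bits, axis=1)

def hamming_matrix(P):
    """All-pairs Hamming distances between packed codes."""
    x = P[:, None, :] ^ P[None, :, :]    # XOR every pair of packed codes
    # popcount via an 8-bit lookup table
    table = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    return table[x].sum(-1)

rng = np.random.default_rng(0)
bits = (rng.random((100, 64)) > 0.5).astype(np.uint8)
D = hamming_matrix(pack(bits))
print(D.shape, D[0, 0])   # (100, 100) 0
```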

Robust 1-Bit Compressive Sensing via Binary Stable Embeddings of Sparse Vectors

by Laurent Jacques, Jason N. Laska, Petros T. Boufounos, Richard G. Baraniuk, 2011
"... The Compressive Sensing (CS) framework aims to ease the burden on analog-to-digital converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse signals. Practical ADCs not only sample but also quantize each measurement to a finite number of bits; moreover, there is ..."
Abstract - Cited by 85 (26 self)
The Compressive Sensing (CS) framework aims to ease the burden on analog-to-digital converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse signals. Practical ADCs not only sample but also quantize each measurement to a finite number of bits; moreover, there is an inverse relationship between the achievable sampling rate and the bit depth. In this paper, we investigate an alternative CS approach that shifts the emphasis from the sampling rate to the number of bits per measurement. In particular, we explore the extreme case of 1-bit CS measurements, which capture just their sign. Our results come in two flavors. First, we consider ideal reconstruction from noiseless 1-bit measurements and provide a lower bound on the best achievable reconstruction error. We also demonstrate that a large class of measurement mappings achieve this optimal bound. Second, we consider reconstruction robustness to measurement errors and noise and introduce the Binary ɛ-Stable Embedding (BɛSE) property, which characterizes the robustness of the measurement process to sign changes. We show that the same class of matrices that provides optimal noiseless performance also enables such a robust mapping. On the practical side, we introduce the Binary Iterative Hard Thresholding (BIHT) algorithm for signal reconstruction from 1-bit measurements that offers state-of-the-art performance.
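
The BIHT iteration the abstract introduces is short: take a gradient-like step that pushes measurements toward sign consistency, hard-threshold to the K largest entries, and normalize at the end, since 1-bit measurements destroy amplitude information. A sketch with an assumed step size and toy problem sizes:

```python
# Sketch of Binary Iterative Hard Thresholding (BIHT) for 1-bit compressive
# sensing (Jacques et al., 2011). Step size tau, iteration count, and problem
# sizes are illustrative assumptions.
import numpy as np

def biht(y, A, K, n_iter=100, tau=None):
    m, n = A.shape
    tau = tau or 1.0 / m
    x = np.zeros(n)
    for _ in range(n_iter):
        g = x + tau * A.T @ (y - np.sign(A @ x))  # step toward sign consistency
        keep = np.argsort(np.abs(g))[-K:]          # hard threshold: K largest
        x = np.zeros(n)
        x[keep] = g[keep]
    return x / np.linalg.norm(x)                   # scale is unrecoverable

rng = np.random.default_rng(0)
n, m, K = 256, 128, 8
x0 = np.zeros(n); x0[rng.choice(n, K, replace=False)] = rng.standard_normal(K)
x0 /= np.linalg.norm(x0)
A = rng.standard_normal((m, n))
y = np.sign(A @ x0)                                # keep only the measurement signs
x_hat = biht(y, A, K)
print(np.linalg.norm(x_hat - x0))                  # reconstruction error (small)
```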

Locality-Sensitive Binary Codes from Shift-Invariant Kernels

by Maxim Raginsky, Svetlana Lazebnik - Advances in Neural Information Processing Systems, 2009
"... This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance be ..."
Abstract - Cited by 81 (1 self)
This paper addresses the problem of designing binary codes for high-dimensional data such that vectors that are similar in the original space map to similar binary strings. We introduce a simple distribution-free encoding scheme based on random projections, such that the expected Hamming distance between the binary codes of two vectors is related to the value of a shift-invariant kernel (e.g., a Gaussian kernel) between the vectors. We present a full theoretical analysis of the convergence properties of the proposed scheme, and report favorable experimental performance as compared to a recent state-of-the-art method, spectral hashing.
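
The encoding can be sketched with random Fourier features for a Gaussian (shift-invariant) kernel: each bit thresholds cos(w·x + b) at a random level, so that expected Hamming distance tracks the kernel value between the underlying vectors. The bandwidth and bit count below are illustrative assumptions:

```python
# Sketch of a random-projection binary encoding built on random Fourier
# features for a Gaussian kernel, in the spirit of Raginsky & Lazebnik (2009).
# Kernel bandwidth `gamma` and the bit count are assumptions for illustration.
import numpy as np

def rff_binary_codes(X, n_bits=64, gamma=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    # Frequencies sampled from the Gaussian kernel's spectrum, random phases,
    # and one random threshold per bit.
    W = rng.normal(0.0, np.sqrt(2 * gamma), size=(d, n_bits))
    b = rng.uniform(0, 2 * np.pi, n_bits)
    t = rng.uniform(-1, 1, n_bits)
    return (np.cos(X @ W + b) + t > 0).astype(np.uint8)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 32))
B = rff_binary_codes(X)
# Normalized Hamming distance between two codes should, in expectation,
# decrease as the Gaussian kernel value between the two vectors increases.
print(B.shape)
```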

Citation Context

...ion of the domain. Our scheme is completely distribution-free with respect to the data: its structure depends only on the underlying kernel. In this, it is similar to locality sensitive hashing (LSH) [1], which is a family of methods for deriving low-dimensional discrete representations of the data for sublinear near-neighbor search. However, our scheme differs from LSH in that we obtain both upper a...

Pairwise Document Similarity in Large Collections with MapReduce

by Tamer Elsayed, Jimmy Lin, Douglas W. Oard
"... This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in computing document similarity into separate multiplication and summation stages in ..."
Abstract - Cited by 56 (6 self)
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in computing document similarity into separate multiplication and summation stages in a way that is well matched to efficient disk access patterns across several machines. On a collection consisting of approximately 900,000 newswire articles, our algorithm exhibits linear growth in running time and space in terms of the number of documents.
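
The decomposition the abstract describes can be simulated in memory: group term weights into postings by term, then let each term emit partial products for every pair of documents it connects, and sum per pair. This sketch with toy term vectors stands in for the authors' Hadoop implementation:

```python
# In-memory simulation of the two-stage inner-product decomposition described
# above: an indexing stage groups term weights by term, and a similarity stage
# turns each postings list into partial products summed per document pair.
# Toy data; not the authors' Hadoop code.
from collections import defaultdict
from itertools import combinations

docs = {                                  # toy term-weight vectors
    "d1": {"map": 2, "reduce": 1, "hash": 1},
    "d2": {"map": 1, "hash": 3},
    "d3": {"reduce": 2, "hash": 1},
}

# Stage 1 (map/shuffle): build a postings list per term.
postings = defaultdict(list)
for doc, terms in docs.items():
    for term, w in terms.items():
        postings[term].append((doc, w))

# Stage 2 (reduce): each term contributes w_i * w_j to every co-occurring
# document pair; summing these contributions yields the full inner products.
sims = defaultdict(float)
for term, plist in postings.items():
    for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
        sims[(d1, d2)] += w1 * w2

for pair, s in sorted(sims.items()):
    print(pair, s)   # e.g. ('d1', 'd2') -> 2*1 + 1*3 = 5
```

In the distributed version each stage is a map/reduce pair, with the framework's shuffle grouping the partial products by document pair.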

Citation Context

... efficiency vs. effectiveness tradeoff that is best made in the context of a specific application. Finally, we note that alternative approaches to similar problems based on locality-sensitive hashing (Andoni and Indyk, 2008) face similar tradeoffs in tuning for a particular false positive rate; cf. (Bayardo et al., 2007). 6 Conclusion We present a MapReduce algorithm for efficiently computing pairwise document similarit...
