Results

### Sparse Composite Quantization

Abstract

The quantization techniques have shown competitive performance in approximate nearest neighbor search. The state-of-the-art algorithm, composite quantization, takes advantage of compositionality, i.e., the vector approximation accuracy, as opposed to product quantization and Cartesian k-means. However, we have observed that the runtime cost of computing the distance table in composite quantization, which is used as a lookup table for fast distance computation, becomes nonnegligible in real applications, e.g., reordering the candidates retrieved from the inverted index when handling very large scale databases. To address this problem, we develop a novel approach, called sparse composite quantization, which constructs sparse dictionaries. The benefit is that the distance evaluation between the query and a dictionary element (a sparse vector) is accelerated using efficient sparse vector operations, and thus the cost of distance table computation is substantially reduced. Experiment results on large scale ANN retrieval tasks (1M SIFTs and 1B SIFTs) and applications to object retrieval show that the proposed approach yields competitive performance: superior search accuracy to product quantization and Cartesian k-means with almost the same computing cost, and much faster ANN search than composite quantization with the same level of accuracy.
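The speedup the abstract describes comes from the query-to-element distance computation: with a sparse dictionary element, the inner product only touches the element's nonzero coordinates. A minimal NumPy sketch of the idea (the sparse storage format and dimensions here are illustrative, not the paper's implementation):

```python
import numpy as np

def dense_distance_table(query, dictionaries):
    # One lookup row per dictionary: squared distances from the query to
    # each of the K dictionary elements. Cost is O(K * d) per dictionary.
    return [np.sum((D - query) ** 2, axis=1) for D in dictionaries]

def sparse_distance_row(query, elements):
    # Each element is stored as (indices, values). Using
    #   ||q - c||^2 = ||q||^2 - 2 <q, c> + ||c||^2,
    # the query-dependent term <q, c> only visits the nonzeros of c,
    # so the per-element cost drops from O(d) to O(nnz).
    q_sq = float(np.dot(query, query))
    row = []
    for idx, val in elements:
        dot = float(np.dot(query[idx], val))
        c_sq = float(np.dot(val, val))
        row.append(q_sq - 2.0 * dot + c_sq)
    return np.array(row)
```

Both routines produce the same distance table; the sparse one simply avoids touching zero coordinates, which is where the claimed reduction in table-computation cost comes from.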

### Sparse Projections for High-Dimensional Binary Codes

Abstract

This paper addresses the problem of learning long binary codes from high-dimensional data. We observe that two key challenges arise while learning and using long binary codes: (1) lack of an effective regularizer for the learned high-dimensional mapping and (2) high computational cost for computing long codes. In this paper, we overcome both these problems by introducing a sparsity-encouraging regularizer that reduces the effective number of parameters involved in the learned projection operator. This regularizer not only reduces overfitting but, due to the sparse nature of the projection matrix, also leads to a dramatic reduction in the computational cost. To evaluate the effectiveness of our method, we analyze its performance on the problems of nearest neighbour search, image retrieval and image classification. Experiments on a number of challenging datasets show that our method leads to better accuracy than dense projections (ITQ [11] and LSH [16]) with the same code lengths, and meanwhile is over an order of magnitude faster. Furthermore, our method is also more accurate and faster than other recently proposed methods for speeding up high-dimensional binary encoding.
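The computational claim is that encoding with a sparse projection matrix costs proportionally to its nonzeros rather than to the full dimension. A sketch of that encoding step, where magnitude thresholding stands in for the l1-regularized projection the paper learns (the 5% sparsity level and dimensions are made-up illustrative values):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
d, c = 1024, 256                       # feature dim, code length (illustrative)
W_dense = rng.standard_normal((c, d))  # a dense projection, as in ITQ/LSH

# Stand-in for the learned sparse operator: keep only the top 5% of
# entries by magnitude and store the result in CSR format.
thresh = np.quantile(np.abs(W_dense), 0.95)
W_sparse = sparse.csr_matrix(np.where(np.abs(W_dense) >= thresh, W_dense, 0.0))

x = rng.standard_normal(d)
code = (W_sparse @ x) >= 0             # binary code = sign of sparse projection
```

The sparse matrix-vector product costs O(nnz) instead of O(c * d), which is the source of the order-of-magnitude encoding speedup reported for long codes.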

### Hashing for Similarity Search: A Survey

2014

Abstract

Similarity search (nearest neighbor search) is the problem of finding the data items whose distances to a query item are the smallest in a large database. Various methods have been developed to address this problem, and recently a lot of effort has been devoted to approximate search. In this paper, we present a survey on one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution, and review them from various aspects, including hash function design, distance measure, and search scheme in the hash coding space.

### Bilinear Random Projections for Locality-Sensitive Binary Codes

Abstract

Locality-sensitive hashing (LSH) is a popular data-independent indexing method for approximate similarity search, where random projections followed by quantization hash the points from the database so as to ensure that the probability of collision is much higher for objects that are close to each other than for those that are far apart. Most high-dimensional visual descriptors for images exhibit a natural matrix structure. When visual descriptors are represented by high-dimensional feature vectors and long binary codes are assigned, a random projection matrix incurs expensive space and time complexity. In this paper we analyze a bilinear random projection method where feature matrices are transformed to binary codes by two smaller random projection matrices. We base our theoretical analysis on extending Raginsky and Lazebnik's result, where random Fourier features are composed with random binary quantizers to form locality sensitive binary codes. To this end, we answer the following two questions: (1) whether a bilinear random projection also yields similarity-preserving binary codes; (2) whether a bilinear random projection yields a performance gain or loss compared to a large linear projection. Regarding the first question, we present upper and lower bounds on the expected Hamming distance between binary codes produced by bilinear random projections. In regards to the second question, we analyze the upper and lower bounds on the covariance between two bits of the binary codes, showing that the correlation between two bits is small. Numerical experiments on MNIST and Flickr45K datasets confirm the validity of our method.
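The space saving comes from replacing one large projection over the vectorized descriptor with two small matrices applied on either side of the descriptor matrix. A sketch of that bilinear encoding (the matrix and code dimensions are illustrative, not taken from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 32, 32        # descriptor matrix shape (e.g., a grid of local features)
c1, c2 = 16, 16        # code factors: code length = c1 * c2 = 256
X = rng.standard_normal((d1, d2))

# A linear projection on vec(X) would need a (c1*c2) x (d1*d2) matrix:
# 256 * 1024 = 262,144 parameters. The bilinear form sign(R1^T X R2)
# needs only d1*c1 + d2*c2 = 1,024.
R1 = rng.standard_normal((d1, c1))
R2 = rng.standard_normal((d2, c2))
code = (R1.T @ X @ R2 >= 0).ravel()    # 256-bit binary code
```

Note that sign(R1^T X R2) equals a linear projection of vec(X) by the Kronecker product R2^T (x) R1^T, so the bilinear scheme is a structured (and hence correlated-bit) special case of ordinary random-projection LSH; the bounds in the abstract quantify how much that structure costs.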