Efficient representation of local geometry for large scale object retrieval (2009)

by M Perdoch, O Chum, J Matas
Venue: In CVPR


Results 1 - 10 of 82

Product quantization for nearest neighbor search

by Hervé Jégou, Matthijs Douze, Cordelia Schmid, 2010
Abstract - Cited by 222 (31 self)
This paper introduces a product quantization based approach for approximate nearest neighbor search. The idea is to decompose the space into a Cartesian product of low dimensional subspaces and to quantize each subspace separately. A vector is represented by a short code composed of its subspace quantization indices. The Euclidean distance between two vectors can be efficiently estimated from their codes. An asymmetric version increases precision, as it computes the approximate distance between a vector and a code. Experimental results show that our approach searches for nearest neighbors efficiently, in particular in combination with an inverted file system. Results for SIFT and GIST image descriptors show excellent search accuracy outperforming three state-of-the-art approaches. The scalability of our approach is validated on a dataset of two billion vectors.
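The decomposition and asymmetric distance computation described in this abstract can be sketched as follows. The 4-D vectors, two subspaces, and hand-picked codebooks are toy assumptions for illustration; a real system learns the codebooks with k-means over training descriptors and uses many more centroids per subspace.

```python
# Toy sketch of product quantization (PQ): split a vector into subvectors,
# quantize each subvector independently, and store only the centroid indices.
def encode(vec, codebooks):
    """Quantize each subvector to the index of its nearest centroid."""
    m = len(codebooks)
    d = len(vec) // m
    code = []
    for i, book in enumerate(codebooks):
        sub = vec[i * d:(i + 1) * d]
        idx = min(range(len(book)),
                  key=lambda j: sum((a - b) ** 2 for a, b in zip(sub, book[j])))
        code.append(idx)
    return tuple(code)

def adc_distance(query, code, codebooks):
    """Asymmetric distance: compare the raw query against a quantized vector,
    summing per-subspace squared distances to the stored centroids."""
    m = len(codebooks)
    d = len(query) // m
    dist = 0.0
    for i, idx in enumerate(code):
        sub = query[i * d:(i + 1) * d]
        cent = codebooks[i][idx]
        dist += sum((a - b) ** 2 for a, b in zip(sub, cent))
    return dist

# Hand-picked (hypothetical) codebooks: two centroids per 2-D subspace.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # subspace 1
    [(0.0, 1.0), (1.0, 0.0)],  # subspace 2
]
code = encode([0.9, 1.1, 0.1, 0.9], codebooks)  # -> (1, 0)
```

In a database, only the short codes are stored; at query time the per-subspace distances to all centroids are precomputed once, so each candidate's distance is a handful of table lookups.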

Citation Context

...of the same image have the same identifier. Therefore, a 20-bit field is sufficient to identify an image from a dataset of one million. This memory cost can be reduced further using index compression [27], [28], which may reduce the average cost of storing the identifier to about 8 bits, depending on parameters. Note that some geometrical information can also be inserted in this entry, as proposed ...

Geometric min-Hashing: Finding a (Thick) Needle in a Haystack

by Ondrej Chum, Michal Perdoch, Jiri Matas - In CVPR, 2009
Abstract - Cited by 93 (1 self)
We propose a novel hashing scheme for image retrieval, clustering and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase the discriminability of the description. Each hash key combines visual appearance (visual words) with semi-local geometric information. Compared with the state-of-the-art min-Hash, the proposed method has both higher recall (probability of collision for hashes on the same object) and lower false positive rates (random collisions). The advantages of the Geometric min-Hashing approach are most pronounced in the presence of viewpoint and scale change, significant occlusion or small physical overlap of the viewing fields. We demonstrate the power of the proposed method on small object discovery in a large unordered collection of images and on a large scale image clustering problem.
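For context, the plain min-Hash baseline this abstract compares against can be sketched as below. The toy hash construction and set contents are illustrative assumptions; the geometric variant proposed in the paper additionally folds semi-local geometry into the hash keys.

```python
# Minimal sketch of min-Hash over sets of visual-word ids: each of n hash
# functions keeps the minimum hash value over the set, and two sets agree on
# a sketch coordinate with probability equal to their Jaccard similarity.
import random

def make_hashes(n, seed=42):
    """n independent (toy) hash functions over visual-word ids."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(n)]
    return [lambda w, s=s: hash((s, w)) for s in salts]

def minhash(words, hashes):
    """min-Hash sketch of a set of visual words."""
    return tuple(min(h(w) for w in words) for h in hashes)

def jaccard_estimate(s1, s2):
    """Fraction of agreeing sketch coordinates estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(s1, s2)) / len(s1)
```

Hash keys for retrieval are then built from small tuples of sketch coordinates, so only images sharing several minima collide in the index.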

Location Recognition using Prioritized Feature Matching

by Yunpeng Li, Noah Snavely, Daniel P. Huttenlocher
Abstract - Cited by 71 (5 self)
Abstract. We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that is not available when considering images in isolation. For instance, we can predict which views will be the most common, which feature points in a scene are most reliable, and which features in the scene tend to co-occur in the same image. Based on this information, we devise an adaptive, prioritized algorithm for matching a representative set of SIFT features covering a large scene to a query image for efficient localization. Our approach is based on considering features in the scene database, and matching them to query image features, as opposed to more conventional methods that match image features to visual words or database features. We find this approach results in improved performance, due to the richer knowledge of characteristics of the database features compared to query image features. We present experiments on two large city-scale photo collections, showing that our algorithm compares favorably to image retrieval-style approaches to location recognition.

Image retrieval with geometry-preserving visual phrases

by Yimeng Zhang, Zhaoyin Jia, Tsuhan Chen - in CVPR, 2011
Abstract - Cited by 52 (1 self)
The most popular approach to large scale image retrieval is based on the bag-of-visual-words (BoV) representation of images. The spatial information is usually re-introduced as a post-processing step to re-rank the retrieved images, through a spatial verification like RANSAC. Since the spatial verification techniques are computationally expensive, they can be applied only to the top images in the initial ranking. In this paper, we propose an approach that can encode more spatial information into the BoV representation and that is efficient enough to be applied to large-scale databases. Other works pursuing the same purpose have proposed exploring the word co-occurrences in the neighborhood areas. Our approach encodes more spatial information through the geometry-preserving visual phrases (GVP). In addition to co-occurrences, the GVP method also captures the local and long-range spatial layouts of the words. Our GVP based searching algorithm adds little memory usage or computational time compared to the BoV method. Moreover, we show that our approach can also be integrated with the min-hash method to improve its retrieval accuracy. The experimental results on the Oxford 5K and Flickr 1M datasets show that our approach outperforms the BoV method even following a RANSAC verification.
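The offset-space idea behind geometry-preserving visual phrases can be illustrated with a rough sketch: two images vote not per matched word but per (word, quantized translation) pair, so only co-occurrences in a consistent spatial layout accumulate. The `cell` size and the `(word, x, y)` feature format are assumptions for illustration, not the paper's exact formulation.

```python
# Rough sketch of GVP-style offset voting between two feature sets.
from collections import Counter

def gvp_score(feats_a, feats_b, cell=10):
    """feats_*: lists of (visual_word, x, y). Returns the largest number of
    word matches that agree on a single quantized translation offset."""
    index_b = {}
    for w, x, y in feats_b:
        index_b.setdefault(w, []).append((x, y))
    votes = Counter()
    for w, xa, ya in feats_a:
        for xb, yb in index_b.get(w, []):
            # quantize the translation between the two occurrences
            off = ((xb - xa) // cell, (yb - ya) // cell)
            votes[off] += 1
    return max(votes.values()) if votes else 0
```

A pure translation of the whole object keeps every match in one offset bin, while random word co-occurrences scatter across bins and score low.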

Citation Context

...ng [7] address the hard quantization problem of visual words. Spatial verification methods [15] and query expansion [5] have been proposed for re-ranking at the post-processing step, and many methods [9, 22, 8, 14] have been introduced to decrease the memory usage of the inverted files. In this paper, we are interested in improving the BoV model with spatial information. Despite its simplicity and efficiency, ...

Descriptor Learning for Efficient Retrieval

by James Philbin, Andrew Zisserman
Abstract - Cited by 51 (1 self)
Abstract. Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either harming performance or requiring correction using additional storage or processing. This paper aims to reduce these quantization errors at source, by learning a projection from descriptor space to a new Euclidean space in which standard clustering techniques are more likely to assign matching descriptors to the same cluster, and non-matching descriptors to different clusters. To achieve this, we learn a non-linear transformation model by minimizing a novel margin-based cost function, which aims to separate matching descriptors from two classes of non-matching descriptors. Training data is generated automatically by leveraging geometric consistency. Scalable, stochastic gradient methods are used for the optimization. For the case of particular object retrieval, we demonstrate impressive gains in performance on a ground truth dataset: our learnt 32-D descriptor without spatial re-ranking outperforms a baseline method using 128-D SIFT descriptors with spatial re-ranking.

Citation Context

...nce the descriptors are transformed before quantization, they can easily be used in conjunction with other recent works that have improved performance over a raw bag of visual words approach, such as [27,28]. We have illustrated the method for SIFT and for two types of projection functions, but clearly the framework of automatically generating training data and learning the projecti...

Learning a Fine Vocabulary

by Andrej Mikulík, Michal Perdoch
Abstract - Cited by 49 (1 self)
Abstract. We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. Experimentally we show that the novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 105k dataset/protocol. At the same time, retrieval with the proposed similarity function is faster than the reference method.

Citation Context

...nverted file has to be processed for each of the features separately, as they will have different binary signatures. While the reported bits per feature required in the search index range from 11 bits [8] to 18 bits [11], Hamming embedding adds another 64 bits. The additional information reduces the number of features that can be stored in memory by a factor of 6.8. Summary: All approaches to soft ...

Retrieving landmark and nonlandmark images from community photo collections

by Yannis Avrithis, Yannis Kalantidis, Evaggelos Spyrou, Giorgos Tolias - In ACM Multimedia, 2010
Abstract - Cited by 27 (6 self)
State of the art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of all thousands of images depicting the Parthenon in just a few dozen scene maps and still be able to retrieve any single, isolated, non-landmark image like a house or a graffiti on a wall. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. Our indexing, retrieval and spatial matching scheme then operates directly on scene maps. We evaluate the precision of the proposed method on a challenging one-million urban image dataset.

Spatial-bag-of-features

by Yang Cao, Changhu Wang, Zhiwei Li, Liqing Zhang, Lei Zhang - In IEEE Conference on Computer Vision and Pattern Recognition, 2010
Abstract - Cited by 26 (2 self)
In this paper, we study the problem of large scale image retrieval by developing a new class of bag-of-features to encode geometric information of objects within an image. Beyond existing orderless bag-of-features, local features of an image are first projected to different directions or points to generate a series of ordered bags-of-features, based on which different families of spatial bag-of-features are designed to capture the invariance of object translation, rotation, and scaling. Then the most representative features are selected based on a boosting-like method to generate a new bag-of-features-like vector representation of an image. The proposed retrieval framework works well in the image retrieval task owing to the following three properties: 1) the encoding of geometric information of objects for capturing objects' spatial transformations, 2) the supervised feature selection and combination strategy for enhancing the discriminative power, and 3) the representation of bag-of-features for effective image matching and indexing for large scale image retrieval. Extensive experiments on 5000 Oxford building images and 1 million Panoramio images show the effectiveness and efficiency of the proposed features as well as the retrieval framework.
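One projection-based channel of the kind the abstract describes can be sketched as follows. The single-direction linear projection and uniform binning are simplifying assumptions for illustration; the paper also designs point-based (radial) projections to obtain rotation and scale invariance.

```python
# Toy sketch of one "ordered bag-of-features" channel: features are projected
# onto a direction and binned into ordered segments, giving one word histogram
# per segment instead of a single orderless histogram for the whole image.
import math
from collections import Counter

def spatial_bof(feats, angle, n_bins):
    """feats: list of (visual_word, x, y). Returns n_bins word histograms,
    ordered along the projection direction given by `angle` (radians)."""
    dx, dy = math.cos(angle), math.sin(angle)
    projs = [x * dx + y * dy for _, x, y in feats]
    lo, hi = min(projs), max(projs)
    span = (hi - lo) or 1.0
    hists = [Counter() for _ in range(n_bins)]
    for (w, _x, _y), p in zip(feats, projs):
        b = min(int((p - lo) / span * n_bins), n_bins - 1)
        hists[b][w] += 1
    return hists
```

Concatenating the sub-histograms from several directions and bin counts yields the family of spatial representations from which the most discriminative ones are then selected.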

Citation Context

...ture vectors [12, 13]. Unlike in previous systems, in which an image is often represented by a single histogram and some extra features used for reranking (e.g. spatial information of local features) [10, 12, 14], in our system an image is represented by a set of selected sub-histograms, while no extra features are needed in the ranking process. Therefore, all spatial-bag-of-features, i.e. histograms, can be comp...

Handling Urban Location Recognition as a 2D Homothetic Problem

by Georges Baatz, Kevin Köser, David Chen, Radek Grzeszczuk
Abstract - Cited by 24 (4 self)
Abstract. We address the problem of large scale place-of-interest recognition in cell phone images of urban scenarios. Here, we go beyond what has been shown in earlier approaches by exploiting the nowadays often available 3D building information (e.g. from extruded floor plans) and massive street-view-like image data for database creation. Exploiting vanishing points in query images, and thus fully removing 3D rotation from the recognition problem, then allows us to simplify the feature invariance to a pure homothetic problem, which we show leaves more discriminative power in feature descriptors than classical SIFT. We re-rank visual-word-based document queries using a fast stratified homothetic verification that is tailored for repetitive patterns like window grids on facades and in most cases boosts the correct document to the top positions if it was in the short list. Since we exploit 3D building information, the approach finally outputs the camera pose in real world coordinates, ready for augmenting the cell phone image with virtual 3D information. The whole system is demonstrated to outperform traditional approaches in city scale experiments for different sources of street-view-like image data and a challenging set of cell phone images.

To aggregate or not to aggregate: Selective match kernels for image search

by Giorgos Tolias, Yannis Avrithis, Hervé Jégou - ICCV- INTERNATIONAL CONFERENCE ON COMPUTER VISION , 2013
Abstract - Cited by 20 (7 self)
This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing large scale image search that is both precise and scalable, as shown by our experiments on several benchmarks.
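As a rough illustration of what "selective" means in this family of kernels, a match weight on binary signatures can be made both thresholded and non-linear in the Hamming distance, so that weak matches contribute nothing and strong matches are emphasized. The threshold and exponent values below are hypothetical, not the paper's tuned parameters.

```python
# Hedged sketch of a selective weighting function over Hamming distances
# between binary descriptor signatures (hypothetical parameter values).
def selectivity(h, nbits=64, tau=24, alpha=3.0):
    """Weight of a descriptor match whose signatures differ in h bits:
    zero beyond the threshold tau, non-linearly emphasized below it."""
    if h > tau:
        return 0.0
    return (1.0 - h / nbits) ** alpha
```

Summing such weights over matching descriptors (or over aggregated per-word residuals) gives an image similarity in which a few strong correspondences dominate many weak coincidental ones.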

Citation Context

...ign better systems. In particular, the search is advantageously refined by re-ranking approaches, which operate on an initial short-list. This is done by exploiting additional geometrical information [22, 18, 26] or applying query expansion techniques [6, 27]. This paper focuses on improving the quality of the initial result set. Re-ranking approaches are complementary stages that are subsequently applied. Th...
