Results 11 - 20 of 330
Bundling features for large scale partial-duplicate web image search, 2009
"... In state-of-the-art image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local image descriptors, and scalable schemes inspired by text retrieval are then applied for large scale image indexing and retrieval. Bag-of-words representations, ..."
Abstract
-
Cited by 81 (3 self)
- Add to MetaCart
(Show Context)
In state-of-the-art image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local image descriptors, and scalable schemes inspired by text retrieval are then applied for large scale image indexing and retrieval. Bag-of-words representations, however: 1) reduce the discriminative power of image features due to feature quantization; and 2) ignore geometric relationships among visual words. Exploiting such geometric constraints, by estimating a 2D affine transformation between a query image and each candidate image, has been shown to greatly improve retrieval precision but at high computational cost. In this paper we present a novel scheme where image features are bundled into local groups. Each group of bundled features becomes much more discriminative than a single feature, and within each group simple and robust geometric constraints can be efficiently enforced. Experiments in web image search, with a database of more than one million images, show that our scheme achieves a 49% improvement in average precision over the baseline bag-of-words approach. Retrieval performance is comparable to existing full geometric verification approaches while being much less computationally expensive. When combined with full geometric verification we achieve a 77% precision improvement over the baseline bag-of-words approach, and a 24% improvement over full geometric verification alone.
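To make the bundling idea concrete, here is a minimal Python/NumPy sketch, not the authors' code: descriptors are quantized to visual words, nearby features are grouped into a bundle ordered by x-coordinate, and two bundles are scored by shared words plus a crude relative-order check that stands in for the paper's weak geometric constraints. All function names and the radius parameter are our illustrative choices.

    import numpy as np

    def quantize(descriptors, vocabulary):
        # Assign each descriptor to its nearest visual word (brute force).
        d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
        return d2.argmin(axis=1)

    def bundle(words, positions, center, radius=40.0):
        # Collect the visual words of features near a region center,
        # ordered by x so relative order can be compared across images.
        near = np.linalg.norm(positions - center, axis=1) < radius
        order = positions[near, 0].argsort()
        return words[near][order].tolist()

    def bundle_similarity(b1, b2):
        # Membership score: shared words; weak geometry: same relative order.
        shared = set(b1) & set(b2)
        s1 = [w for w in b1 if w in shared]
        s2 = [w for w in b2 if w in shared]
        return len(shared) + (1 if s1 == s2 else 0)

A bundle match is thus cheap to score, whereas full geometric verification would fit a 2D affine transform per candidate image.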
Minimal Loss Hashing for Compact Binary Codes
"... We propose a method for learning similaritypreserving hash functions that map highdimensional data onto binary codes. The formulation is based on structured prediction with latent variables and a hinge-like loss function. It is efficient to train for large datasets, scales well to large code lengths ..."
Abstract
-
Cited by 81 (3 self)
- Add to MetaCart
(Show Context)
We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. The formulation is based on structured prediction with latent variables and a hinge-like loss function. It is efficient to train for large datasets, scales well to large code lengths, and outperforms state-of-the-art methods.
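As a toy illustration of the ingredients (not the paper's actual optimization, which uses structured prediction with latent variables), the sketch below binarizes data with a linear threshold function and evaluates a hinge-like loss that penalizes similar pairs whose Hamming distance exceeds a threshold rho, and dissimilar pairs that fall within it; rho, the margin, and all names are our assumptions.

    import numpy as np

    def hash_codes(X, W):
        # b = sign(Wx), written as a 0/1 code per row of X.
        return (X @ W.T > 0).astype(np.uint8)

    def hinge_pair_loss(b1, b2, similar, rho=2, margin=1):
        # Hinge-like loss on the Hamming distance of one pair of codes.
        d = int(np.sum(b1 != b2))
        if similar:
            return max(0, d - rho + margin)   # similar pairs should be close
        return max(0, rho + margin - d)       # dissimilar pairs should be far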
LDAHash: Improved matching with smaller descriptors, 2010
"... SIFT-like local feature descriptors are ubiquitously employed in such computer vision applications as content-based retrieval, video analysis, copy detection, object recognition, photo-tourism and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometri ..."
Abstract
-
Cited by 80 (10 self)
- Add to MetaCart
(Show Context)
SIFT-like local feature descriptors are ubiquitously employed in such computer vision applications as content-based retrieval, video analysis, copy detection, object recognition, photo-tourism and 3D reconstruction. Feature descriptors can be designed to be invariant to certain classes of photometric and geometric transformations, in particular, affine and intensity scale transformations. However, real transformations that an image can undergo can only be approximately modeled in this way, and thus most descriptors are only approximately invariant in practice. Secondly, descriptors are usually high-dimensional (e.g. SIFT is represented as a 128-dimensional vector). In large-scale retrieval and matching problems, this can pose challenges in storing and retrieving descriptor data. We map the descriptor vectors into the Hamming space, in which the Hamming metric is used to compare the resulting representations. This way, we reduce the size of the descriptors by representing them as short binary strings and learn descriptor invariance from examples. We show extensive experimental validation, demonstrating the advantage of the proposed approach.
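The mapping into the Hamming space can be pictured as an affine projection followed by thresholding. In the sketch below, a random projection is purely a placeholder where LDAHash would learn the projection and thresholds from matching and non-matching descriptor pairs.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 128))   # stand-in SIFT descriptors
    P = rng.standard_normal((32, 128))     # placeholder: learned in LDAHash
    t = np.zeros(32)                       # placeholder thresholds

    codes = ((X @ P.T + t) > 0).astype(np.uint8)   # 32 bits vs. 128 floats
    dist = np.count_nonzero(codes[0] != codes[1])  # Hamming comparison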
Better matching with fewer features: The selection of useful features in large database recognition problems
"... There has been recent progress on the problem of recognizing specific objects in very large datasets. The most common approach has been based on the bag-of-words (BOW) method, in which local image features are clustered into visual words. This can provide significant savings in memory compared to st ..."
Abstract
-
Cited by 56 (0 self)
- Add to MetaCart
(Show Context)
There has been recent progress on the problem of recognizing specific objects in very large datasets. The most common approach has been based on the bag-of-words (BOW) method, in which local image features are clustered into visual words. This can provide significant savings in memory compared to storing and matching each feature independently. In this paper we take an additional step to reducing memory requirements by selecting only a small subset of the training features to use for recognition. This is based on the observation that many local features are unreliable or represent irrelevant clutter. We are able to select “useful” features, which are both robust and distinctive, by an unsupervised preprocessing step that identifies correctly matching features among the training images. We demonstrate that this selection approach allows an average of 4% of the original features per image to provide matching performance that is as accurate as the full set. In addition, we employ a graph to represent the matching relationships between images. Doing so enables us to effectively augment the feature set for each image through merging of useful features of neighboring images. We demonstrate adjacent and 2-adjacent augmentation, both of which give a substantial boost in performance.
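Schematically, the selection step reduces to ranking training features by how often they re-match across training images and keeping a small top fraction; in this sketch the matching statistics are assumed to come from the unsupervised preprocessing step the abstract describes.

    import numpy as np

    def select_useful(features, match_counts, keep_fraction=0.04):
        # Keep the most frequently re-matched features (~4% in the paper).
        k = max(1, int(len(features) * keep_fraction))
        top = np.argsort(match_counts)[::-1][:k]
        return [features[i] for i in top]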
Packing bag-of-features - in ICCV, 2009
"... One of the main limitations of image search based on bag-of-features is the memory usage per image. Only a few million images can be handled on a single machine in reasonable response time. In this paper, we first evaluate how the memory usage is reduced by using lossless index compression. We then ..."
Abstract
-
Cited by 55 (9 self)
- Add to MetaCart
(Show Context)
One of the main limitations of image search based on bag-of-features is the memory usage per image. Only a few million images can be handled on a single machine in reasonable response time. In this paper, we first evaluate how the memory usage is reduced by using lossless index compression. We then propose an approximate representation of bag-of-features obtained by projecting the corresponding histogram onto a set of pre-defined sparse projection functions, producing several image descriptors. Coupled with a proper indexing structure, an image is represented by a few hundred bytes. A distance expectation criterion is then used to rank the images. Our method is at least one order of magnitude faster than standard bag-of-features while providing excellent search quality.
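A rough sketch of the projection step, with random word groupings standing in for the paper's pre-defined sparse projection functions: each projection sums histogram bins inside its groups, yielding a handful of short descriptors per image.

    import numpy as np

    def sparse_projections(hist, projections):
        # Each projection is a list of index groups; sum the bins per group.
        return np.array([[hist[g].sum() for g in proj] for proj in projections])

    rng = np.random.default_rng(1)
    vocab_size, n_proj, bins_per_proj = 1000, 4, 16
    projections = [np.array_split(rng.permutation(vocab_size), bins_per_proj)
                   for _ in range(n_proj)]
    hist = rng.poisson(0.05, vocab_size).astype(float)   # toy BoF histogram
    mini = sparse_projections(hist, projections)         # 4 descriptors of 16 values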
Descriptor Learning for Efficient Retrieval
"... Abstract. Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either h ..."
Abstract
-
Cited by 51 (1 self)
- Add to MetaCart
(Show Context)
Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either harming performance or requiring correction using additional storage or processing. This paper aims to reduce these quantization errors at source, by learning a projection from descriptor space to a new Euclidean space in which standard clustering techniques are more likely to assign matching descriptors to the same cluster, and non-matching descriptors to different clusters. To achieve this, we learn a non-linear transformation model by minimizing a novel margin-based cost function, which aims to separate matching descriptors from two classes of non-matching descriptors. Training data is generated automatically by leveraging geometric consistency. Scalable, stochastic gradient methods are used for the optimization. For the case of particular object retrieval, we demonstrate impressive gains in performance on a ground truth dataset: our learnt 32-D descriptor without spatial re-ranking outperforms a baseline method using 128-D SIFT descriptors with spatial re-ranking.
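The core learning step can be caricatured as stochastic gradient descent on a contrastive, margin-based objective. The sketch below uses a linear projection W where the paper learns a non-linear model, and the exact loss form is our simplification; training pairs would be generated automatically from geometric consistency, as the abstract notes.

    import numpy as np

    def sgd_step(W, x1, x2, match, margin=1.0, lr=0.01):
        # One update on a descriptor pair in the projected space.
        diff = x1 - x2
        d = W @ diff
        if match:                    # pull matching descriptors together
            W -= lr * 2 * np.outer(d, diff)
        elif d @ d < margin:         # push non-matches past the margin
            W += lr * 2 * np.outer(d, diff)
        return W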
Learning a Fine Vocabulary
"... Abstract. We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming emb ..."
Abstract
-
Cited by 49 (1 self)
- Add to MetaCart
We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming embedding. Experimentally we show that the novel similarity function achieves mean average precision that is superior to any result published in the literature on the standard Oxford 105k dataset/protocol. At the same time, retrieval with the proposed similarity function is faster than the reference method.
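One way to picture an unsupervised, learned word-level similarity: count how often a verified feature match falls on word i in one image and word j in the other, then normalize. This co-occurrence sketch is our illustration, not the paper's construction, and it does not reproduce the paper's no-extra-space property.

    import numpy as np

    def learn_word_similarity(correspondences, vocab_size):
        # correspondences: (word_i, word_j) pairs from verified feature matches.
        C = np.zeros((vocab_size, vocab_size))
        for wi, wj in correspondences:
            C[wi, wj] += 1
            C[wj, wi] += 1
        row = C.sum(axis=1, keepdims=True)
        return np.divide(C, row, out=np.zeros_like(C), where=row > 0)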
Towards life-long visual maps - In IEEE IROS, 2009
"... Abstract — The typical SLAM mapping system assumes a static environment and constructs a map that is then used without regard for ongoing changes. Most SLAM systems, such as FastSLAM, also require a single connected run to create a map. In this paper we present a system of visual mapping, using only ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
(Show Context)
The typical SLAM mapping system assumes a static environment and constructs a map that is then used without regard for ongoing changes. Most SLAM systems, such as FastSLAM, also require a single connected run to create a map. In this paper we present a system of visual mapping, using only input from a stereo camera, that continually updates an optimized metric map in large indoor spaces with movable objects: people, furniture, partitions, etc. The system can be stopped and restarted at arbitrary disconnected points, is robust to occlusion and localization failures, and efficiently maintains alternative views of a dynamic environment. It operates completely online at a 30 Hz frame rate.
View-Based Maps
"... Robotic systems that can create and use visual maps in realtime have obvious advantages in many applications, from automatic driving to mobile manipulation in the home. In this paper we describe a mapping system based on retaining stereo views of the environment that are collected as the robot moves ..."
Abstract
-
Cited by 44 (2 self)
- Add to MetaCart
(Show Context)
Robotic systems that can create and use visual maps in real time have obvious advantages in many applications, from automatic driving to mobile manipulation in the home. In this paper we describe a mapping system based on retaining stereo views of the environment that are collected as the robot moves. Connections among the views are formed by consistent geometric matching of their features. Out-of-sequence matching is the key problem: how to find connections from the current view to other corresponding views in the map. Our approach uses a vocabulary tree to propose candidate views, and a strong geometric filter to eliminate false positives – essentially, the robot continually re-recognizes where it is. We present experiments showing the utility of the approach on video data, including map building in large indoor and outdoor environments, map building without localization, and re-localization when lost.
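The recognition loop the abstract describes has a simple shape: an appearance-based scorer proposes candidate views, and a strict geometric filter keeps only candidates with enough inliers. In the sketch below, vocab_score and count_inliers are placeholders for a vocabulary-tree lookup and a RANSAC-style verification; the thresholds are our assumptions.

    def re_recognize(current_view, map_views, vocab_score, count_inliers,
                     top_k=10, min_inliers=30):
        # Propose candidates by appearance, then verify geometrically.
        candidates = sorted(map_views,
                            key=lambda v: vocab_score(current_view, v),
                            reverse=True)[:top_k]
        return [v for v in candidates
                if count_inliers(current_view, v) >= min_inliers]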
Combining attributes and Fisher vectors for efficient image retrieval - Colorado Springs, United States, 2011
"... Attributes were recently shown to give excellent results for category recognition. In this paper, we demonstrate their performance in the context of image retrieval. First, we show that retrieving images of particular objects based on attribute vectors gives results comparable to the state of the ar ..."
Abstract
-
Cited by 43 (1 self)
- Add to MetaCart
(Show Context)
Attributes were recently shown to give excellent results for category recognition. In this paper, we demonstrate their performance in the context of image retrieval. First, we show that retrieving images of particular objects based on attribute vectors gives results comparable to the state of the art. Second, we demonstrate that combining attribute and Fisher vectors improves performance for retrieval of particular objects as well as categories. Third, we implement an efficient coding technique for compressing the combined descriptor to very small codes. Experimental results on the Holidays dataset show that our approach significantly outperforms the state of the art, even for a very compact representation of 16 bytes per image. Retrieving category images is evaluated on the “web-queries” dataset. We show that attribute features combined with Fisher vectors improve the performance and that combined image features can supplement text features.
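The combination step can be sketched as normalize-and-concatenate followed by aggressive compression. Below, a sign-bit random-projection code stands in purely as an illustration of reaching 16 bytes per image; it is not the coding technique the paper implements, and the mixing weight alpha is our assumption.

    import numpy as np

    def combine(attr_vec, fisher_vec, alpha=0.5):
        # L2-normalize both vectors and concatenate with a mixing weight.
        a = attr_vec / (np.linalg.norm(attr_vec) + 1e-12)
        f = fisher_vec / (np.linalg.norm(fisher_vec) + 1e-12)
        return np.concatenate([alpha * a, (1 - alpha) * f])

    def tiny_code(x, R):
        # 128 sign bits packed into 16 bytes per image.
        return np.packbits((R @ x > 0).astype(np.uint8))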