Results 1 - 10 of 93
Aggregating local descriptors into a compact image representation
"... We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vecto ..."
Cited by 226 (19 self)
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it best preserves the quality of vector comparison. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes. Searching a 10 million image dataset takes about 50 ms.
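The aggregation the abstract describes (published as the VLAD representation) boils down to assigning each local descriptor to its nearest codebook centroid and accumulating the residuals. A minimal numpy sketch under that reading; `centroids` is assumed to be a k-means codebook trained offline, and the paper's joint dimension-reduction/indexing step is not shown.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into one compact image vector.

    descriptors: (n, d) local descriptors for one image (e.g. SIFT).
    centroids:   (k, d) codebook, assumed trained offline with k-means.
    Returns an L2-normalized (k*d,) vector of per-cell residual sums.
    """
    k, d = centroids.shape
    # Nearest-centroid assignment for every descriptor.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :],
                           axis=2)
    nearest = np.argmin(dists, axis=1)
    # Accumulate residuals (descriptor minus its centroid) per cell.
    v = np.zeros((k, d))
    for i, c in enumerate(nearest):
        v[c] += descriptors[i] - centroids[c]
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

In the paper this vector is then reduced by PCA and compressed with a quantization-based index to reach the 20-byte regime quoted above; the sketch stops at the raw aggregated vector.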
Iterative quantization: A procrustean approach to learning binary codes
In Proc. of the IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 2011
"... This paper addresses the problem of learning similaritypreserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zerocentered data so as to minimize the quantization error of mapping t ..."
Cited by 157 (6 self)
This paper addresses the problem of learning similarity-preserving binary codes for efficient retrieval in large-scale image collections. We propose a simple and efficient alternating minimization scheme for finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube. This method, dubbed iterative quantization (ITQ), has connections to multi-class spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). Our experiments show that the resulting binary coding schemes decisively outperform several other state-of-the-art methods.
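The alternating minimization is concrete enough to sketch directly from the abstract: fix the rotation and binarize, then fix the codes and update the rotation by solving an orthogonal Procrustes problem with an SVD. A minimal numpy sketch, assuming the data has already been zero-centered and reduced (e.g. by PCA) to the desired code length; the iteration count is a placeholder.

```python
import numpy as np

def itq(V, n_iter=50, seed=0):
    """Iterative quantization: rotate zero-centered data so that signing
    it loses as little as possible.

    V: (n, c) data, already zero-centered and reduced to c dimensions.
    Returns (B, R): binary codes in {-1, +1} and the learned rotation.
    """
    rng = np.random.default_rng(seed)
    c = V.shape[1]
    # Random orthogonal initialization of the rotation.
    R, _ = np.linalg.qr(rng.standard_normal((c, c)))
    for _ in range(n_iter):
        # Step 1: fix R, binarize the rotated data.
        B = np.where(V @ R >= 0, 1.0, -1.0)
        # Step 2: fix B, update R by orthogonal Procrustes
        # (minimize ||B - V R||_F): SVD of V^T B gives R = U Vt.
        U, _, Vt = np.linalg.svd(V.T @ B)
        R = U @ Vt
    return np.where(V @ R >= 0, 1.0, -1.0), R
```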
Descriptor Learning for Efficient Retrieval
"... Abstract. Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either h ..."
Cited by 51 (1 self)
Many visual search and matching systems represent images using sparse sets of “visual words”: descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate throughout the rest of the system, either harming performance or requiring correction using additional storage or processing. This paper aims to reduce these quantization errors at source, by learning a projection from descriptor space to a new Euclidean space in which standard clustering techniques are more likely to assign matching descriptors to the same cluster, and non-matching descriptors to different clusters. To achieve this, we learn a non-linear transformation model by minimizing a novel margin-based cost function, which aims to separate matching descriptors from two classes of non-matching descriptors. Training data is generated automatically by leveraging geometric consistency. Scalable, stochastic gradient methods are used for the optimization. For the case of particular object retrieval, we demonstrate impressive gains in performance on a ground truth dataset: our learnt 32-D descriptor without spatial re-ranking outperforms a baseline method using 128-D SIFT descriptors with spatial re-ranking.
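As a rough illustration of the margin-based idea (not the paper's actual non-linear model, which also distinguishes two classes of non-matching descriptors), one can learn a linear projection with stochastic gradient steps on a hinge loss: matching pairs are pulled inside a squared-distance margin, non-matching pairs pushed outside it. All hyperparameters below are illustrative.

```python
import random
import numpy as np

def learn_projection(pos_pairs, neg_pairs, d_out,
                     margin=1.0, lr=1e-3, epochs=10, seed=0):
    """SGD on a hinge loss over descriptor pairs: matching pairs should
    fall within `margin` (in squared distance after projection),
    non-matching pairs beyond it.

    pos_pairs / neg_pairs: lists of (x, y) tuples of (d_in,) descriptors.
    A simplified linear stand-in for the paper's non-linear model.
    """
    random.seed(seed)
    rng = np.random.default_rng(seed)
    d_in = pos_pairs[0][0].shape[0]
    W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
    samples = [(p, True) for p in pos_pairs] + [(p, False) for p in neg_pairs]
    for _ in range(epochs):
        random.shuffle(samples)
        for (x, y), is_match in samples:
            z = x - y
            diff = W @ z
            dist2 = float(diff @ diff)
            # Gradient of dist2 w.r.t. W is 2 (W z) z^T.
            if is_match and dist2 > margin:
                W -= lr * 2.0 * np.outer(diff, z)   # pull match together
            elif (not is_match) and dist2 < margin:
                W += lr * 2.0 * np.outer(diff, z)   # push non-match apart
    return W
```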
What Makes Paris Look Like Paris?
2012
"... Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguish ..."
Cited by 49 (8 self)
Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g. windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example the city of Paris. This is a tremendously difficult task as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose to use a discriminative clustering …
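The abstract is cut off mid-sentence, but the discriminative-clustering idea it gestures at can be sketched: alternately train a classifier that separates a candidate cluster of Paris patches from patches of other cities, then re-rank all Paris patches by classifier score to refresh the cluster. A toy scikit-learn sketch under that reading; the features, seed selection, and parameters are all placeholders, not the paper's full pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

def refine_cluster(seed_idx, paris_feats, other_feats, rounds=3, top_k=20):
    """Grow a cluster of patches that is frequent in Paris yet rare
    elsewhere: train a linear classifier (cluster vs. other cities),
    then re-rank all Paris patches by its score and keep the top ones.

    paris_feats / other_feats: (n, d) precomputed patch descriptors
    (e.g. HOG); seed_idx: indices of an initial candidate cluster.
    """
    members = list(seed_idx)
    clf = None
    for _ in range(rounds):
        X = np.vstack([paris_feats[members], other_feats])
        y = np.r_[np.ones(len(members)), np.zeros(len(other_feats))]
        clf = LinearSVC(C=0.1).fit(X, y)
        # Re-rank every Paris patch by detector score; keep the top ones.
        scores = clf.decision_function(paris_feats)
        members = list(np.argsort(-scores)[:top_k])
    return members, clf
```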
Visual-Inertial Navigation, Mapping and Localization: A Scalable Real-Time Causal Approach
2010
"... We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable, and its parameters are identifiable. These include the unknown gravity vector, and the unknown transformation between the c ..."
Cited by 47 (1 self)
We present a model to estimate motion from monocular visual and inertial measurements. We analyze the model and characterize the conditions under which its state is observable and its parameters are identifiable. These include the unknown gravity vector and the unknown transformation between the camera coordinate frame and the inertial unit. We show that it is possible to estimate both state and parameters as part of an on-line procedure, but only provided that the motion sequence is “rich enough,” a condition that we characterize explicitly. We then describe an efficient implementation of a filter to estimate the state and parameters of this model, including gravity and camera-to-inertial calibration. It runs in real time on an embedded platform, and its performance has been tested extensively. We report experiments of continuous operation, without failures, re-initialization, or re-calibration, on paths of length up to 30 km. We also describe an integrated approach to “loop closure,” that is, the recognition of previously-seen locations and the topological re-adjustment of the traveled path. It represents visual features relative to the global orientation reference provided by the gravity vector estimated by the filter, and relative to the scale provided by their known position within the map; these features are organized into “locations” defined by visibility constraints and represented in a topological graph, where loop closure can be performed without the need to re-compute past trajectories or perform bundle adjustment. The software infrastructure as well as the embedded platform is described in detail in a technical report (Jones and Soatto (2009)).
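The filter itself is beyond a short excerpt, but the underlying machinery is the familiar extended Kalman filter predict/update cycle, with inertial measurements driving the process model and visual measurements driving the update. A generic EKF step in numpy, purely illustrative; it is not the paper's model, whose state additionally carries gravity and the camera-to-inertial transform as parameters.

```python
import numpy as np

def ekf_step(x, P, f, F, Q, z, h, H, R):
    """One generic EKF cycle: predict with the (inertial) process model,
    update with a (visual) measurement.

    f, h: process and measurement functions; F, H: their Jacobians,
    evaluated at the current estimate; Q, R: noise covariances.
    """
    # Predict.
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update.
    y = z - h(x_pred)                     # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred
    return x_new, P_new
```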
Avoiding confusing features in place recognition
"... We seek to recognize the place depicted in a query image using a database of “street side” images annotated with geolocation information. This is a challenging task due to changes in scale, viewpoint and lighting between the query and the images in the database. One of the key problems in place re ..."
Cited by 43 (3 self)
We seek to recognize the place depicted in a query image using a database of “street side” images annotated with geolocation information. This is a challenging task due to changes in scale, viewpoint and lighting between the query and the images in the database. One of the key problems in place recognition is the presence of objects such as trees or road markings, which frequently occur in the database and hence cause significant confusion between different places. As the main contribution, we show how to avoid features leading to confusion of particular places by using geotags attached to database images as a form of supervision. We develop a method for automatic detection of image-specific and spatially-localized groups of confusing features, and demonstrate that suppressing them significantly improves place recognition performance while reducing the database size. We show that the method combines well with the state-of-the-art bag-of-features model including query expansion, and demonstrate place recognition that generalizes over a wide range of viewpoints and lighting conditions. Results are shown on a geotagged database of over 17K images of Paris downloaded from Google Street View.
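One schematic reading of the supervision signal: a database feature that matches images from many different geotagged places is probably a confuser (a tree, a road marking) and can be suppressed. The sketch below scores features this way; it is a simplification of the paper's detection procedure, which is image-specific and spatially localized, and the names and threshold are placeholders.

```python
from collections import defaultdict

def confusion_scores(matches, place_of):
    """Count, for every database feature, how many distinct places its
    matches span; features spanning many places are likely confusers.

    matches:  iterable of (feature_id, image_id) match records.
    place_of: dict mapping image_id -> place label (from geotags).
    """
    places = defaultdict(set)
    for feat, img in matches:
        places[feat].add(place_of[img])
    return {feat: len(p) for feat, p in places.items()}

# E.g. suppress features whose matches span more than 3 distinct places:
# kept = {f for f, s in confusion_scores(matches, place_of).items() if s <= 3}
```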
Image Webs: Computing and Exploiting Connectivity in Image Collections
"... The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called I ..."
Cited by 30 (2 self)
The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
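The spectral ingredient can be shown at toy scale: the second-smallest eigenvalue of the graph Laplacian (the Fiedler value) measures how well connected the current web is, and quantities like it can prioritize which image pairs to try matching next. A small dense-matrix sketch; at the scale the abstract describes one would use sparse eigensolvers instead.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian

def fiedler_value(adj):
    """Algebraic connectivity of an image graph: the second-smallest
    eigenvalue of its Laplacian. Zero means the graph is disconnected;
    larger values mean the web is harder to cut apart.

    adj: small dense (n, n) symmetric adjacency/weight matrix.
    """
    L = laplacian(np.asarray(adj, dtype=float))
    vals = eigh(L, eigvals_only=True)   # ascending order
    return vals[1]                      # vals[0] is always ~0
```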
SCRAMSAC: Improving RANSAC’s efficiency with a spatial consistency filter
In ICCV, 2009
"... Geometric verification with RANSAC has become a cru-cial step for many local feature based matching applica-tions. Therefore, the details of its implementation are di-rectly relevant for an application’s run-time and the qual-ity of the estimated results. In this paper, we propose a RANSAC extension ..."
Cited by 28 (4 self)
Geometric verification with RANSAC has become a crucial step for many local feature based matching applications. Therefore, the details of its implementation are directly relevant for an application’s run-time and the quality of the estimated results. In this paper, we propose a RANSAC extension that is several orders of magnitude faster than standard RANSAC, and that is as fast as, and more robust to degenerate configurations than, PROSAC, the currently fastest RANSAC extension from the literature. In addition, our proposed method is simple to implement and does not require parameter tuning. Its main component is a spatial consistency check that results in a reduced correspondence set with a significantly increased inlier ratio, leading to faster convergence of the remaining estimation steps. In addition, we experimentally demonstrate that RANSAC can operate entirely on the reduced set, not only for sampling but also for its consensus step, leading to additional speed-ups. The resulting approach is widely applicable and can be readily combined with other extensions from the literature. We quantitatively evaluate our approach’s robustness on a variety of challenging datasets and compare its performance to the state of the art.
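The spatial consistency check admits a compact sketch: keep a correspondence only if enough of its spatial neighbors in the first image are matched to spatial neighbors of its partner in the second image. A numpy version under that reading; the neighborhood size `k` and threshold `tau` here are illustrative choices, whereas the paper emphasizes that its actual check needs no parameter tuning.

```python
import numpy as np

def spatial_consistency_filter(pts1, pts2, k=10, tau=0.5):
    """Keep correspondence i only if at least a fraction `tau` of its k
    nearest correspondences in image 1 are also among its k nearest in
    image 2. Returns indices of the surviving correspondences.

    pts1, pts2: (n, 2) keypoint locations of n putative matches
    (row i of pts1 corresponds to row i of pts2).
    """
    n = len(pts1)
    d1 = np.linalg.norm(pts1[:, None] - pts1[None, :], axis=2)
    d2 = np.linalg.norm(pts2[:, None] - pts2[None, :], axis=2)
    keep = []
    for i in range(n):
        nb1 = set(np.argsort(d1[i])[1:k + 1])   # index 0 is i itself
        nb2 = set(np.argsort(d2[i])[1:k + 1])
        if len(nb1 & nb2) / k >= tau:
            keep.append(i)
    return np.asarray(keep)
```

RANSAC is then run (sampling and consensus alike) on the surviving, higher-inlier-ratio set.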
Efficient Sequential Correspondence Selection by Cosegmentation
2009
"... In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process coupled with a quasi-optimal sequential decision ..."
Cited by 27 (7 self)
In many retrieval, object recognition and wide baseline stereo methods, correspondences of interest points (distinguished regions) are commonly established by matching compact descriptors such as SIFTs. We show that a subsequent cosegmentation process, coupled with a quasi-optimal sequential decision process, leads to a correspondence verification procedure that (i) has high precision (is highly discriminative), (ii) has good recall, and (iii) is fast. The sequential decision on the correctness of a correspondence is based on simple statistics of a modified dense stereo matching algorithm. The statistics are projected onto a prominent discriminative direction by an SVM. Wald’s sequential probability ratio test is performed on the SVM projection computed on progressively larger cosegmented regions. We show experimentally that the proposed Sequential Correspondence Verification (SCV) algorithm significantly outperforms the standard correspondence selection method based on SIFT distance ratios on challenging matching problems.
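The sequential decision is a textbook Wald test: accumulate the log-likelihood ratio of the per-stage SVM scores under the match and non-match hypotheses, and stop as soon as it crosses an acceptance threshold. A minimal sketch; the score densities `loglik_match`/`loglik_nonmatch` are assumed to have been fit offline on training correspondences.

```python
import numpy as np

def sprt(scores, loglik_match, loglik_nonmatch, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test over a stream of scores
    (here: SVM projections on progressively larger cosegmented regions).

    loglik_match / loglik_nonmatch: log-densities of a score under the
    two hypotheses, assumed fit offline; alpha/beta: target error rates.
    Returns 'match', 'non-match', or 'undecided' if the stream runs out.
    """
    upper = np.log((1 - beta) / alpha)   # accept 'match' at or above
    lower = np.log(beta / (1 - alpha))   # accept 'non-match' at or below
    llr = 0.0
    for s in scores:
        llr += loglik_match(s) - loglik_nonmatch(s)
        if llr >= upper:
            return 'match'
        if llr <= lower:
            return 'non-match'
    return 'undecided'
```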