Results 1 - 10 of 104
Building Rome in a Day
- In Proc. Int. Conf. on Computer Vision, 2009
"... We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and ..."
Abstract
-
Cited by 285 (30 self)
- Add to MetaCart
(Show Context)
We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day.

Introduction. Amateur photography was once largely a personal endeavor. Traditionally, a photographer would capture a moment on film and share it with a small number of friends and family members, perhaps storing a few hundred of them in a shoebox. The advent of digital photography, and the recent growth of photo-sharing Web sites such as Flickr.com, have brought about a seismic change in photography and the use of photo collections. Today, a photograph shared online can potentially be seen by millions of people. As a result, we now have access to a vast, ever-growing collection of photographs the world over, capturing its cities and landmarks innumerable times. For instance, a search for the term "Rome" on Flickr returns nearly 3 million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, façade, interior, fountain, sculpture, painting, and café. Virtually anything that people find interesting in Rome has been captured from thousands of viewpoints and under myriad illumination and weather conditions. For example, the Trevi Fountain appears in over 50,000 of these photographs. How much of the city of Rome can be reconstructed in 3D from this photo collection? In principle, the photos of Rome on Flickr represent an ideal data set for 3D modeling research, as they capture the highlights of the city in exquisite detail and from a broad range of viewpoints. However, extracting high-quality 3D models from such a collection is challenging for several reasons. First, the photos are unstructured: they are taken in no particular order, and we have no control over the distribution of camera viewpoints. Second, they are uncalibrated: the photos are taken by thousands of different photographers, and we know very little about the camera settings. Third, the scale of the problem is ...
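The distributed matching and reconstruction pipeline is not spelled out in this excerpt. As a rough, hypothetical illustration of the kind of parallelism the abstract refers to, the sketch below matches image pairs independently with OpenCV SIFT across a process pool on a single machine (the paper's system distributes this work across a cluster); the "photos/" directory and the match-count threshold are placeholders.

```python
# Hypothetical sketch: pairwise SIFT matching run independently per image
# pair, so the work spreads across a process pool (the paper's system spreads
# it across a cluster instead). "photos/" and the threshold are placeholders.
import glob
import itertools
from multiprocessing import Pool

import cv2

def extract(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, descriptors = cv2.SIFT_create().detectAndCompute(img, None)
    return path, descriptors

def match_pair(pair):
    (path_a, desc_a), (path_b, desc_b) = pair
    if desc_a is None or desc_b is None:
        return path_a, path_b, 0
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(desc_a, desc_b, k=2)
    # Lowe's ratio test keeps only distinctive correspondences.
    good = [m for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]
    return path_a, path_b, len(good)

if __name__ == "__main__":
    paths = glob.glob("photos/*.jpg")
    with Pool() as pool:
        features = pool.map(extract, paths)
        pairs = list(itertools.combinations(features, 2))
        for a, b, n in pool.map(match_pair, pairs):
            if n > 30:                        # crude verification threshold
                print(f"{a} <-> {b}: {n} putative matches")
```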
From structure-from-motion point clouds to fast location recognition
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
"... Efficient view registration with respect to a given 3D reconstruction has many applications like inside-out tracking in indoor and outdoor environments, and geo-locating images from large photo collections. We present a fast location recognition technique based on structure from motion point clouds. ..."
Abstract
-
Cited by 103 (10 self)
- Add to MetaCart
(Show Context)
Efficient view registration with respect to a given 3D reconstruction has many applications, such as inside-out tracking in indoor and outdoor environments and geo-locating images from large photo collections. We present a fast location recognition technique based on structure-from-motion point clouds. Vocabulary-tree-based indexing of features directly returns relevant fragments of 3D models instead of documents from the image database. Additionally, we propose a compressed 3D scene representation which improves recognition rates while simultaneously reducing the computation time and the memory consumption. The design of our method is based on algorithms that efficiently utilize modern graphics processing units to deliver real-time performance for view registration. We demonstrate the approach by matching hand-held outdoor videos to known 3D urban models, and by registering images from online photo collections to the corresponding landmarks.
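A minimal, CPU-only sketch of the general registration idea, assuming a point cloud where each 3D point carries a SIFT descriptor: match query descriptors to point descriptors, then estimate pose with RANSAC PnP. The paper's vocabulary-tree indexing, compressed scene representation, and GPU implementation are not reproduced; points_xyz, point_descriptors, and K are assumed inputs.

```python
# Hedged sketch: 2D-3D matching followed by RANSAC PnP pose estimation.
# points_xyz, point_descriptors, K and the query image path are assumed inputs.
import cv2
import numpy as np

def register_image(query_path, points_xyz, point_descriptors, K):
    """Estimate the pose of a query image against an SfM point cloud.

    points_xyz:        (N, 3) float32 array of reconstructed 3D points
    point_descriptors: (N, 128) float32 array, one SIFT descriptor per point
    K:                 (3, 3) intrinsic matrix of the query camera
    """
    img = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    kps, desc = cv2.SIFT_create().detectAndCompute(img, None)

    # Approximate nearest-neighbour matching, standing in for the paper's
    # vocabulary-tree indexing of the point cloud.
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4), dict(checks=64))
    matches = flann.knnMatch(desc, point_descriptors, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]

    pts_2d = np.float32([kps[m.queryIdx].pt for m in good])
    pts_3d = np.float32([points_xyz[m.trainIdx] for m in good])

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d, pts_2d, K, None, reprojectionError=4.0)
    return (rvec, tvec, inliers) if ok else None
```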
Geometric min-Hashing: Finding a (Thick) Needle in a Haystack
- In CVPR, 2009
"... We propose a novel hashing scheme for image retrieval, clustering and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase th ..."
Abstract
-
Cited by 93 (1 self)
- Add to MetaCart
We propose a novel hashing scheme for image retrieval, clustering and automatic object discovery. Unlike commonly used bag-of-words approaches, the spatial extent of image features is exploited in our method. The geometric information is used both to construct repeatable hash keys and to increase the discriminability of the description. Each hash key combines visual appearance (visual words) with semi-local geometric information. Compared with the state-of-the-art min-Hash, the proposed method has both higher recall (probability of collision for hashes on the same object) and lower false positive rates (random collisions). The advantages of the Geometric min-Hashing approach are most pronounced in the presence of viewpoint and scale change, significant occlusion or small physical overlap of the viewing fields. We demonstrate the power of the proposed method on small object discovery in a large unordered collection of images and on a large-scale image clustering problem.
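As a toy approximation of the hashing idea (not the authors' exact scheme), the sketch below builds small min-hash tuples over an image's visual words and crudely folds in semi-local geometry by restricting the later hashes in each tuple to words near the seed feature; images whose sketch sets intersect become candidate matches. All parameter values are illustrative.

```python
# Toy sketch of min-hash tuples over visual words, with a crude semi-local
# geometric restriction. This is a hypothetical simplification of the paper.
import random
import numpy as np

def build_sketches(word_ids, positions, n_sketches=64, tuple_size=3,
                   radius=100.0, vocab_size=100_000, seed=0):
    """word_ids:  (M,) numpy array of visual word ids, one per feature
       positions: (M, 2) numpy array of feature locations in pixels
    Returns a set of hashable sketch tuples for the image."""
    rng = random.Random(seed)
    # One random linear hash (a*w + b mod V) per min-hash function.
    hashes = [(rng.randrange(1, vocab_size), rng.randrange(vocab_size))
              for _ in range(n_sketches * tuple_size)]

    def min_hash(words, h_idx):
        a, b = hashes[h_idx]
        return min(words, key=lambda w: (a * w + b) % vocab_size)

    sketches = set()
    all_words = list(set(word_ids.tolist()))
    for s in range(n_sketches):
        seed_word = min_hash(all_words, s * tuple_size)
        # Restrict the remaining hashes to words occurring near the seed word,
        # a rough stand-in for the "semi-local geometry" component.
        seed_pos = positions[word_ids == seed_word].mean(axis=0)
        near = np.linalg.norm(positions - seed_pos, axis=1) < radius
        local_words = list(set(word_ids[near].tolist())) or all_words
        rest = tuple(min_hash(local_words, s * tuple_size + k)
                     for k in range(1, tuple_size))
        sketches.add((seed_word,) + rest)
    return sketches

# Two images become candidate matches when their sketch sets intersect:
# if build_sketches(w1, p1) & build_sketches(w2, p2): ...
```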
Building Rome on a Cloudless Day
"... Abstract. This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achie ..."
Abstract
-
Cited by 90 (13 self)
- Add to MetaCart
(Show Context)
This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches.
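The abstract only names the pipeline stages; as a rough stand-in for the first one (appearance-based image clustering, used to partition a huge collection before any matching), the sketch below groups tiny normalized thumbnails with mini-batch k-means. This is a hypothetical simplification, not the paper's GPU pipeline, and the directory and cluster count are placeholders.

```python
# Rough stand-in for the appearance-clustering stage: group images by a tiny
# global descriptor so that later matching only happens within clusters.
import glob
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def global_descriptor(path, size=16):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    thumb = cv2.resize(img, (size, size)).astype(np.float32)
    thumb = (thumb - thumb.mean()) / (thumb.std() + 1e-6)  # illumination-normalise
    return thumb.ravel()

paths = sorted(glob.glob("photos/*.jpg"))                  # assumed image directory
X = np.stack([global_descriptor(p) for p in paths])
labels = MiniBatchKMeans(n_clusters=50, batch_size=1024,
                         random_state=0).fit_predict(X)
for cluster_id in range(labels.max() + 1):
    members = [p for p, l in zip(paths, labels) if l == cluster_id]
    print(f"cluster {cluster_id}: {len(members)} images")
```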
Tour the world: Building a web-scale landmark recognition engine
- In IEEE Conference on Computer Vision and Pattern Recognition, 2009
"... Modeling and recognizing landmarks at world-scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large scale system. This paper ..."
Abstract
-
Cited by 78 (1 self)
- Add to MetaCart
(Show Context)
Modeling and recognizing landmarks at world scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large-scale system. This paper leverages the vast amount of multimedia data on the web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques to address these issues. First, a comprehensive list of landmarks is mined from two sources: (1) ∼20 million GPS-tagged photos and (2) online tour guide web pages. Candidate images for each landmark are then obtained from photo-sharing websites or by querying an image search engine. Second, landmark visual models are built by pruning candidate images using efficient image matching and unsupervised clustering techniques. Finally, the landmarks and their visual models are validated by checking the authorship of their member images. The resulting landmark recognition engine incorporates 5312 landmarks from 1259 cities in 144 countries. The experiments demonstrate that the engine can deliver satisfactory recognition performance with high efficiency.
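A small sketch of the geotag-mining step only, using DBSCAN over GPS coordinates as a stand-in for the paper's clustering and a crude distinct-author count in place of its authorship validation; the photo record schema and the thresholds are assumptions.

```python
# Sketch of the geotag-mining step: dense clusters of GPS-tagged photos are
# treated as candidate landmarks. DBSCAN is a stand-in for the paper's
# clustering; the photo record fields and thresholds are hypothetical.
import numpy as np
from sklearn.cluster import DBSCAN

def mine_landmark_candidates(photos, eps_deg=0.001, min_photos=50):
    """photos: list of dicts with 'lat', 'lon' and 'owner' keys (assumed schema)."""
    coords = np.array([[p["lat"], p["lon"]] for p in photos])
    labels = DBSCAN(eps=eps_deg, min_samples=min_photos).fit_predict(coords)

    candidates = []
    for label in set(labels) - {-1}:          # -1 marks noise points
        members = [p for p, l in zip(photos, labels) if l == label]
        owners = {p["owner"] for p in members}
        # Crude version of the authorship check: require that the cluster was
        # photographed by many distinct users, not by one prolific uploader.
        if len(owners) >= 10:
            centre = coords[labels == label].mean(axis=0)
            candidates.append({"centre": centre, "photos": members})
    return candidates
```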
Location Recognition using Prioritized Feature Matching
"... Abstract. We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that ..."
Abstract
-
Cited by 71 (5 self)
- Add to MetaCart
(Show Context)
We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that is not available when considering images in isolation. For instance, we can predict which views will be the most common, which feature points in a scene are most reliable, and which features in the scene tend to co-occur in the same image. Based on this information, we devise an adaptive, prioritized algorithm for matching a representative set of SIFT features covering a large scene to a query image for efficient localization. Our approach is based on considering features in the scene database, and matching them to query image features, as opposed to more conventional methods that match image features to visual words or database features. We find that this approach results in improved performance, due to the richer knowledge of characteristics of the database features compared to query image features. We present experiments on two large city-scale photo collections, showing that our algorithm compares favorably to image-retrieval-style approaches to location recognition.
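The core loop the abstract describes, matching scene features to the query in priority order and stopping early, might look roughly like the sketch below; here the priority is simply each point's visibility count, and the scene-point representation is an assumed placeholder rather than the authors' data structure.

```python
# Hedged sketch of prioritized scene-to-query matching: scene points are
# visited in decreasing order of how often they were seen during the
# reconstruction, and matching stops once enough correspondences are found.
import cv2
import numpy as np

def prioritized_match(scene_points, query_descriptors, needed=100, ratio=0.7):
    """scene_points: list of (descriptor, xyz, visibility_count) tuples (assumed).
       query_descriptors: (M, 128) float32 SIFT descriptors of the query image."""
    # Index the *query* features once; database features are then matched
    # against that index in priority order (database-to-query, the reverse of
    # retrieval-style matching, as the abstract describes).
    matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4), dict(checks=32))
    matcher.add([query_descriptors])
    matcher.train()

    matched_xyz, matched_query_idx = [], []
    ordered = sorted(scene_points, key=lambda p: p[2], reverse=True)
    for descriptor, xyz, _count in ordered:
        knn = matcher.knnMatch(descriptor.reshape(1, -1).astype(np.float32), k=2)[0]
        if len(knn) == 2 and knn[0].distance < ratio * knn[1].distance:
            matched_xyz.append(xyz)
            matched_query_idx.append(knn[0].trainIdx)
            if len(matched_xyz) >= needed:
                break                          # early termination
    return matched_xyz, matched_query_idx
```

The returned 2D-3D correspondences would then feed a RANSAC PnP step, as in the registration sketch above.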
Discrete-continuous optimization for large-scale structure from motion
- In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2011
"... Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as ..."
Abstract
-
Cited by 61 (3 self)
- Add to MetaCart
(Show Context)
Recent work in structure from motion (SfM) has successfully built 3D models from large unstructured collections of images downloaded from the Internet. Most approaches use incremental algorithms that solve progressively larger bundle adjustment problems. These incremental techniques scale poorly as the number of images grows, and can drift or fall into bad local minima. We present an alternative formulation for SfM based on finding a coarse initial solution using a hybrid discrete-continuous optimization, and then improving that solution using bundle adjustment. The initial optimization step uses a discrete Markov random field (MRF) formulation, coupled with a continuous Levenberg-Marquardt refinement. The formulation naturally incorporates various sources of information about both the cameras and the points, including noisy geotags and vanishing point estimates. We test our method on several large-scale photo collections, including one with measured camera positions, and show that it can produce models that are similar to or better than those produced with incremental bundle adjustment, but more robustly and in a fraction of the time.
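The discrete MRF initialization is too involved for a short snippet; the fragment below only illustrates the continuous side of such a hybrid, refining camera positions with SciPy's Levenberg-Marquardt solver while folding in noisy geotags as soft priors. The pairwise constraints, weights, and array layouts are assumptions for illustration, not the paper's formulation.

```python
# Illustrative fragment (not the paper's MRF stage): refine camera positions
# by least squares, combining pairwise relative-translation constraints with
# noisy geotags used as soft priors. All inputs are assumed/synthetic.
import numpy as np
from scipy.optimize import least_squares

def refine_positions(initial, pairwise, geotags, geotag_weight=0.1):
    """initial:  (N, 3) initial camera positions
       pairwise: list of (i, j, t_ij) with t_ij an estimate of x_j - x_i
       geotags:  (N, 3) noisy GPS-derived positions (NaN rows = no geotag)"""
    n = initial.shape[0]

    def residuals(x):
        cams = x.reshape(n, 3)
        res = []
        for i, j, t_ij in pairwise:
            res.append((cams[j] - cams[i]) - t_ij)   # relative-offset constraints
        prior = geotag_weight * (cams - geotags)     # soft geotag priors
        prior[np.isnan(prior)] = 0.0                 # ignore missing geotags
        res.append(prior)
        return np.concatenate([np.atleast_2d(r) for r in res]).ravel()

    # "lm" selects SciPy's Levenberg-Marquardt implementation.
    result = least_squares(residuals, initial.ravel(), method="lm")
    return result.x.reshape(n, 3)
```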
Towards Linear-time Incremental Structure from Motion
"... The time complexity of incremental structure from motion (SfM) is often known as O(n4) with respect to the number of cameras. As bundle adjustment (BA) being significantly improved recently by preconditioned conjugate gradient (PCG), it is worth revisiting how fast incremental SfM is. We introduce a ..."
Abstract
-
Cited by 52 (2 self)
- Add to MetaCart
(Show Context)
The time complexity of incremental structure from motion (SfM) is often regarded as O(n^4) in the number of cameras. Since bundle adjustment (BA) has recently been improved significantly by preconditioned conjugate gradient (PCG) methods, it is worth revisiting how fast incremental SfM can be. We introduce a novel BA strategy that provides a good balance between speed and accuracy. Through algorithm analysis and extensive experiments, we show that incremental SfM requires only O(n) time on many major steps, including BA. Our method maintains high accuracy by regularly re-triangulating the feature matches that initially fail to triangulate. We test our algorithm on large photo collections and long video sequences with various settings, and show that our method offers state-of-the-art performance for large-scale reconstructions. The presented algorithm is available as part of VisualSFM at ...
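The amortized argument behind the linear-time claim can be illustrated with a toy scheduling loop: full bundle adjustment (with re-triangulation) runs only when the model has grown by a constant factor, while cheap local BA runs at every step, so the total cost of full BA forms a geometric series bounded by a constant times n. The counters below stand in for real solvers and are purely illustrative.

```python
# Toy driver loop for the scheduling idea described in the abstract. The BA
# and re-triangulation calls are replaced by work counters, not real solvers.
def incremental_sfm_schedule(n_cameras, growth=1.25, local_window=10):
    work = 0
    next_full_ba_at = 2
    for size in range(1, n_cameras + 1):
        work += local_window              # local BA over a fixed window: O(1)
        if size >= next_full_ba_at:
            work += size                  # full BA + re-triangulation: O(size)
            next_full_ba_at = int(size * growth) + 1
    return work

# Because full BA runs at geometrically spaced model sizes, total work stays
# roughly proportional to n (constant work per added camera):
for n in (1_000, 10_000, 100_000):
    print(n, incremental_sfm_schedule(n) / n)
```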
Landmark Classification in Large-scale Image Collections
"... With the rise of photo-sharing websites such as Facebook and Flickr has come dramatic growth in the number of photographs online. Recent research in object recognition has used such sites as a source of image data, but the test images have been selected and labeled by hand, yielding relatively small ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
(Show Context)
With the rise of photo-sharing websites such as Facebook and Flickr has come dramatic growth in the number of photographs online. Recent research in object recognition has used such sites as a source of image data, but the test images have been selected and labeled by hand, yielding relatively small validation sets. In this paper we study image classification on a much larger dataset of 30 million images, of which nearly 2 million have been labeled into one of 500 categories. The dataset and categories are formed automatically from geotagged photos from Flickr, by looking for peaks in the spatial geotag distribution corresponding to frequently photographed landmarks. We learn models for these landmarks with a multiclass support vector machine, using vector-quantized interest point descriptors as features. We also explore the non-visual information available on modern photo-sharing sites, showing that using textual tags and temporal constraints leads to significant improvements in classification rate. We find that in some cases image features alone yield classification accuracy comparable to using text tags, as well as to the performance of human observers.
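A compact sketch of the visual half of the described pipeline, assuming OpenCV and scikit-learn: SIFT descriptors are vector-quantized into a bag-of-visual-words histogram per image, which feeds a multiclass linear SVM. Dataset construction, text tags, and temporal constraints are omitted, and all names are placeholders.

```python
# Hedged sketch of the visual pipeline only: vector-quantize SIFT descriptors
# into a bag-of-visual-words histogram per image, then train a multiclass
# linear SVM. Data loading, text tags and temporal constraints are omitted.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def sift_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = cv2.SIFT_create().detectAndCompute(img, None)
    return desc if desc is not None else np.zeros((0, 128), np.float32)

def train_landmark_classifier(image_paths, labels, vocab_size=1000):
    """image_paths, labels: parallel lists of files and landmark ids (assumed)."""
    all_desc = np.vstack([sift_descriptors(p) for p in image_paths])
    vocab = MiniBatchKMeans(n_clusters=vocab_size, batch_size=4096,
                            random_state=0).fit(all_desc)

    def bow_histogram(path):
        desc = sift_descriptors(path)
        if len(desc) == 0:
            return np.zeros(vocab_size, np.float32)
        words = vocab.predict(desc)
        hist = np.bincount(words, minlength=vocab_size).astype(np.float32)
        return hist / hist.sum()

    X = np.stack([bow_histogram(p) for p in image_paths])
    clf = LinearSVC().fit(X, labels)      # one-vs-rest multiclass linear SVM
    return vocab, clf, bow_histogram
```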