Results 1 - 10 of 285
Towards Internet-Scale Multi-view Stereo
- In: Proceedings of IEEE CVPR, 2010
"... This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstr ..."
Cited by 101 (6 self)
Abstract: This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
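
The divide-and-merge idea lends itself to a compact sketch. The toy version below greedily grows bounded-size clusters from a precomputed table of pairwise image overlaps, then duplicates strongly linked border images across clusters so neighboring reconstructions share views and can be merged; the paper instead formulates this as a constrained optimization solved iteratively, and overlapping_clusters, max_size, and min_shared are hypothetical names.

    from collections import defaultdict

    def overlapping_clusters(overlap, max_size, min_shared):
        # adjacency: neighbors[i][j] = overlap score between images i and j
        neighbors = defaultdict(dict)
        for (i, j), s in overlap.items():
            neighbors[i][j] = s
            neighbors[j][i] = s
        clusters, assigned = [], set()
        # grow bounded-size clusters around well-connected seed images
        for seed in sorted(neighbors, key=lambda k: -len(neighbors[k])):
            if seed in assigned:
                continue
            candidates = [seed] + sorted(neighbors[seed],
                                         key=lambda k: -neighbors[seed][k])
            cluster = []
            for img in candidates:
                if img not in assigned and len(cluster) < max_size:
                    cluster.append(img)
            cluster = set(cluster)
            assigned |= cluster
            clusters.append(cluster)
        # duplicate strongly linked border images into each cluster so the
        # per-cluster reconstructions overlap and can be merged afterwards
        for cluster in clusters:
            border = {img for member in cluster
                      for img, s in neighbors[member].items()
                      if img not in cluster and s >= min_shared}
            cluster |= border
        return clusters

Each cluster can then be reconstructed independently in parallel, with the shared images anchoring the subsequent merge.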
Building Rome on a Cloudless Day
"... Abstract. This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achie ..."
Cited by 90 (13 self)
Abstract: This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches.
Location Recognition using Prioritized Feature Matching
"... Abstract. We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that ..."
Cited by 71 (5 self)
Abstract: We present a fast, simple location recognition and image localization method that leverages feature correspondence and geometry estimated from large Internet photo collections. Such recovered structure contains a significant amount of useful information about images and image features that is not available when considering images in isolation. For instance, we can predict which views will be the most common, which feature points in a scene are most reliable, and which features in the scene tend to co-occur in the same image. Based on this information, we devise an adaptive, prioritized algorithm for matching a representative set of SIFT features covering a large scene to a query image for efficient localization. Our approach considers features in the scene database and matches them to query image features, as opposed to more conventional methods that match query image features to visual words or database features. We find that this approach results in improved performance, because far more is known about the characteristics of database features than about query image features. We present experiments on two large city-scale photo collections, showing that our algorithm compares favorably to image-retrieval-style approaches to location recognition.
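
A minimal sketch of this database-to-query matching direction, assuming SIFT descriptors stored as numpy arrays. Priority here is simply each scene point's visibility count; the paper's algorithm updates priorities adaptively as matches come in, so localize, db_visibility, and needed are illustrative stand-ins.

    import numpy as np
    from scipy.spatial import cKDTree

    def localize(query_desc, db_desc, db_visibility, needed=100, ratio=0.7):
        # index the (small) query side once; match scene features against it
        tree = cKDTree(query_desc)
        order = np.argsort(-db_visibility)     # most-visible scene points first
        matches = []
        for idx in order:
            d, nn = tree.query(db_desc[idx], k=2)
            if d[0] < ratio * d[1]:            # Lowe's ratio test
                matches.append((idx, nn[0]))   # (scene point, query feature)
            if len(matches) >= needed:         # enough 2D-3D pairs for pose
                break
        return matches

Stopping early once enough 2D-3D correspondences are found is what makes the scan over the scene database cheap despite its size.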
StereoScan: Dense 3d Reconstruction in Real-time
- In: IEEE Intelligent Vehicles Symposium, 2011
"... Abstract — Accurate 3d perception from video sequences is a core subject in computer vision and robotics, since it forms the basis of subsequent scene analysis. In practice however, online requirements often severely limit the utilizable camera resolution and hence also reconstruction accuracy. Furt ..."
Cited by 69 (10 self)
Abstract: Accurate 3d perception from video sequences is a core subject in computer vision and robotics, since it forms the basis of subsequent scene analysis. In practice, however, online requirements often severely limit the usable camera resolution and hence also reconstruction accuracy. Furthermore, real-time systems often rely on heavy parallelism, which can prevent applications in mobile devices or driver assistance systems, especially in cases where FPGAs cannot be employed. This paper proposes a novel approach to building 3d maps from high-resolution stereo sequences in real time. Inspired by recent progress in stereo matching, we propose a sparse feature matcher in conjunction with an efficient and robust visual odometry algorithm. Our reconstruction pipeline combines both techniques with efficient stereo matching and a multi-view linking scheme for generating consistent 3d point clouds. In our experiments we show that the proposed odometry method achieves state-of-the-art accuracy. Including feature matching, the visual odometry part of our algorithm runs at 25 frames per second, while at the same time we obtain new depth maps at 3-4 fps, sufficient for online 3d reconstruction.
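
The per-frame stereo step can be approximated with off-the-shelf components. The sketch below substitutes OpenCV's semi-global matcher for the paper's own matching pipeline and assumes Q, the 4x4 reprojection matrix from a calibrated stereo rig; depth_points is a hypothetical helper.

    import numpy as np
    import cv2

    def depth_points(left_gray, right_gray, Q):
        # semi-global matching on 8-bit grayscale pairs;
        # numDisparities must be a multiple of 16
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                        blockSize=5)
        # OpenCV returns fixed-point disparities scaled by 16
        disp = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
        points = cv2.reprojectImageTo3D(disp, Q)  # (H, W, 3) camera-frame points
        return points[disp > 0]                   # keep pixels with valid disparity

Each frame's point cloud would then be moved into a common frame by the visual-odometry pose and fused with earlier frames via the paper's multi-view linking.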
Multicore Bundle Adjustment
- In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
"... We present the design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems. We explore the use of multicore CPU as well as multicore GPUs for this purpose. We show that overco ..."
Cited by 61 (4 self)
Abstract: We present the design and implementation of new inexact Newton type bundle adjustment algorithms that exploit hardware parallelism for efficiently solving large-scale 3D scene reconstruction problems. We explore the use of multicore CPUs as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations of current-generation GPUs not only leads to more space-efficient algorithms, but also to surprising savings in runtime. Our CPU-based system is up to ten times and our GPU-based system is up to thirty times faster than the current state-of-the-art methods [1], while maintaining comparable convergence behavior. The code and additional results are available at
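
The inexact-Newton core is easy to illustrate on a dense toy problem: each outer iteration solves the damped normal equations only approximately with preconditioned conjugate gradients. The sketch below uses a plain Jacobi preconditioner on a dense matrix; the paper's systems exploit the sparse camera-point structure and run the same idea on CPU and GPU.

    import numpy as np

    def pcg(A, b, M_inv, tol=1e-4, max_iter=100):
        # preconditioned conjugate gradients for A x = b, with M_inv ~ A^-1
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv @ r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break                        # inexact: stop early on a loose tol
            z = M_inv @ r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    def gauss_newton_step(J, r, lam=1e-3):
        # damped normal equations (J^T J + lam I) dx = -J^T r
        A = J.T @ J + lam * np.eye(J.shape[1])
        M_inv = np.diag(1.0 / np.diag(A))    # Jacobi preconditioner
        return pcg(A, -J.T @ r, M_inv)

Solving the system only to a loose tolerance is what makes the step "inexact" and is where most of the runtime savings come from.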
Towards Linear-time Incremental Structure from Motion
"... The time complexity of incremental structure from motion (SfM) is often known as O(n4) with respect to the number of cameras. As bundle adjustment (BA) being significantly improved recently by preconditioned conjugate gradient (PCG), it is worth revisiting how fast incremental SfM is. We introduce a ..."
Cited by 52 (2 self)
Abstract: The time complexity of incremental structure from motion (SfM) is commonly taken to be O(n^4) in the number of cameras. As bundle adjustment (BA) has recently been significantly accelerated by preconditioned conjugate gradient (PCG) methods, it is worth revisiting how fast incremental SfM can be. We introduce a novel BA strategy that provides a good balance between speed and accuracy. Through algorithm analysis and extensive experiments, we show that incremental SfM requires only O(n) time on many of its major steps, including BA. Our method maintains high accuracy by regularly re-triangulating the feature matches that initially fail to triangulate. We test our algorithm on large photo collections and long video sequences with various settings, and show that our method offers state-of-the-art performance for large-scale reconstructions. The presented algorithm is available as part of VisualSFM at
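
One way to see why the BA cost can stay linear is a toy cost model (a hypothetical reading of the strategy, not the paper's exact policy): if full BA fires only when the model has grown by a fixed ratio, the sizes at which it runs form a geometric sequence, and their sum is O(n).

    def total_full_ba_cost(n_cameras, grow_ratio=1.05):
        total, last = 0, 2                   # start from a two-view seed
        for size in range(3, n_cameras + 1):
            if size >= grow_ratio * last:    # model grew enough: run full BA
                total += size                # one PCG-based BA pass ~ O(size)
                last = size
        return total

    # total_full_ba_cost(10000) / 10000 comes out to a small constant
    # (about grow_ratio / (grow_ratio - 1) = 21), i.e. O(n) overall.

Cheap local BA around each newly added camera covers the steps in between, which is also where the regular re-triangulation slots in.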
Learning a Fine Vocabulary
"... Abstract. We present a novel similarity measure for bag-of-words type large scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method and is more discriminative than both L2-based soft assignment and Hamming emb ..."
Cited by 49 (1 self)
Abstract: We present a novel similarity measure for bag-of-words-type large-scale image retrieval. The similarity function is learned in an unsupervised manner, requires no extra space over the standard bag-of-words method, and is more discriminative than both L2-based soft assignment and Hamming embedding. Experimentally, we show that the novel similarity function achieves mean average precision superior to any result published in the literature on the standard Oxford 105k dataset/protocol. At the same time, retrieval with the proposed similarity function is faster than the reference method.
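
For reference, the baseline this improves on can be sketched as tf-idf-weighted bag-of-words scoring through an inverted index; the learned similarity replaces the hard matched/unmatched word test below with a learned per-word weight at no extra memory cost. build_index and score are illustrative names.

    import math
    from collections import defaultdict

    def build_index(db_words):
        # db_words: one list of visual-word ids per database image
        index, df = defaultdict(list), defaultdict(int)
        for img_id, words in enumerate(db_words):
            for w in set(words):
                index[w].append(img_id)
                df[w] += 1
        n = len(db_words)
        idf = {w: math.log(n / df[w]) for w in df}
        return index, idf

    def score(query_words, index, idf):
        scores = defaultdict(float)
        for w in set(query_words):
            for img_id in index.get(w, ()):
                scores[img_id] += idf.get(w, 0.0) ** 2  # hard 1/0 word match
        return sorted(scores.items(), key=lambda kv: -kv[1])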
Image Webs: Computing and Exploiting Connectivity in Image Collections
"... The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called I ..."
Cited by 30 (2 self)
Abstract: The widespread availability of digital cameras and ubiquitous Internet access have facilitated the creation of massive image collections. These collections can be highly interconnected through implicit links between image pairs viewing the same or similar objects. We propose building graphs called Image Webs to represent such connections. While earlier efforts studied local neighborhoods of such graphs, we are interested in understanding global structure and exploiting connectivity at larger scales. We show how to efficiently construct Image Webs that capture the connectivity in an image collection using spectral graph theory. Our technique can link together tens of thousands of images in a few minutes using a computer cluster. We also demonstrate applications for exploring collections based on global topological analysis.
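
At small scale, the connectivity analysis reduces to spectral properties of the image graph's Laplacian: the multiplicity of eigenvalue zero counts connected components, and the second-smallest eigenvalue (the Fiedler value) measures how tightly the web is connected. A dense toy sketch, with image_web_stats a hypothetical name:

    import numpy as np

    def image_web_stats(n_images, matched_pairs):
        A = np.zeros((n_images, n_images))
        for i, j in matched_pairs:           # one edge per verified image pair
            A[i, j] = A[j, i] = 1.0
        L = np.diag(A.sum(axis=1)) - A       # unnormalized graph Laplacian
        eigvals = np.linalg.eigvalsh(L)      # ascending, real
        components = int(np.sum(eigvals < 1e-9))  # multiplicity of eigenvalue 0
        fiedler = eigvals[1]                 # algebraic connectivity if connected
        return components, fiedler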
Retrieving Landmark and Non-Landmark Images from Community Photo Collections
- In: ACM Multimedia, 2010
"... State of the art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated to Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by gr ..."
Cited by 27 (6 self)
Abstract: State-of-the-art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of all the thousands of images depicting the Parthenon in just a few dozen scene maps, and still be able to retrieve any single, isolated, non-landmark image such as a house or graffiti on a wall. Starting from a geo-tagged dataset, we first group images geographically and then visually, where each visual cluster is assumed to depict different views of the same scene. We align all views to one reference image and construct a 2D scene map by preserving details from all images while discarding repeating visual features. Our indexing, retrieval and spatial matching scheme then operates directly on scene maps. We evaluate the precision of the proposed method on a challenging one-million-image urban dataset.
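
The two-stage grouping reads, in outline, like the sketch below: bucket geo-tagged photos into grid cells, then greedily group visually consistent photos around reference images inside each cell. This is a hypothetical simplification; the paper's clustering additionally guarantees a distortion bound and builds a 2D scene map per cluster, and geo_cells / visual_clusters are illustrative names.

    from collections import defaultdict

    def geo_cells(photos, cell_deg=0.01):
        # photos: (photo_id, lat, lon) triples, bucketed on a ~1 km grid
        cells = defaultdict(list)
        for pid, lat, lon in photos:
            cells[(round(lat / cell_deg), round(lon / cell_deg))].append(pid)
        return cells

    def visual_clusters(photo_ids, similarity, thresh=0.3):
        # greedy grouping: a photo joins the first cluster whose reference
        # image it resembles, otherwise it seeds a new cluster (a new scene)
        clusters = []
        for pid in photo_ids:
            for cluster in clusters:
                if similarity(pid, cluster[0]) >= thresh:
                    cluster.append(pid)
                    break
            else:
                clusters.append([pid])
        return clusters

Indexing and retrieval then operate on one representative scene map per visual cluster rather than on every raw image.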