Results 1 - 10 of 187
Modeling the World from Internet Photo Collections
- INT J COMPUT VIS, 2007
"... There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-fro ..."
Abstract - Cited by 267 (6 self)
There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like “Notre Dame” or “Trevi Fountain.” This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world’s well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.
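This abstract describes a structure-from-motion pipeline over Internet photos. As a rough illustration (not the Photo Tourism system itself), the following Python sketch shows the basic two-view step such pipelines build on: SIFT matching, a ratio test, and essential-matrix pose recovery with OpenCV. The image filenames and the intrinsics matrix K are placeholder assumptions.

import cv2
import numpy as np

def relative_pose(path_a, path_b, K):
    """Match SIFT features between two overlapping photos and recover their relative pose."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    # Detect and describe local features in both images.
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    # Nearest-neighbor matching with Lowe's ratio test to keep distinctive matches.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in matches if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # Robustly estimate the essential matrix, then decompose it into the rotation R
    # and unit-scale translation t of camera B relative to camera A.
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K)
    return R, t, int(np.count_nonzero(mask))

# Placeholder intrinsics and filenames; a real pipeline would estimate K from EXIF data.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]])
# R, t, n_inliers = relative_pose("notre_dame_1.jpg", "notre_dame_2.jpg", K)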
Learning Local Image Descriptors
- Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007
"... Abstract—In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined together and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both li ..."
Abstract - Cited by 174 (2 self)
In this paper, we explore methods for learning local image descriptors from training data. We describe a set of building blocks for constructing descriptors which can be combined and jointly optimized so as to minimize the error of a nearest-neighbor classifier. We consider both linear and nonlinear transforms with dimensionality reduction, and make use of discriminant learning techniques such as Linear Discriminant Analysis (LDA) and Powell minimization to solve for the parameters. Using these techniques, we obtain descriptors that exceed state-of-the-art performance with low dimensionality. In addition to new experiments and recommendations for descriptor learning, we are also making available a new and realistic ground-truth data set based on multiview stereo data. Index Terms: image descriptors, local features, discriminative learning, SIFT.
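The building-block-plus-discriminant-learning recipe lends itself to a compact sketch. The following is not the authors' pipeline, only an illustration of the general idea on synthetic data: learn a low-dimensional linear projection with scikit-learn's LinearDiscriminantAnalysis and score it with a nearest-neighbor classifier. The class count, descriptor dimension, and noise level are all assumptions.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for training data: 20 "3D point" classes of noisy 128-D descriptors.
rng = np.random.default_rng(0)
n_classes, per_class, dim = 20, 50, 128
centers = rng.normal(size=(n_classes, dim))
X = np.vstack([c + 0.3 * rng.normal(size=(per_class, dim)) for c in centers])
y = np.repeat(np.arange(n_classes), per_class)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# LDA yields at most n_classes - 1 discriminative directions; keep a 16-D descriptor.
lda = LinearDiscriminantAnalysis(n_components=16).fit(X_tr, y_tr)

# Judge the learned projection by nearest-neighbor classification, as in descriptor matching.
knn = KNeighborsClassifier(n_neighbors=1).fit(lda.transform(X_tr), y_tr)
print("1-NN accuracy in the learned 16-D space:", knn.score(lda.transform(X_te), y_te))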
Modeling and recognition of landmark image collections using iconic scene graphs
- In ECCV
"... Abstract. This paper presents an approach for modeling landmark sites such as the Statue of Liberty based on large-scale contaminated image collections gathered from the Internet. Our system combines 2D appearance and 3D geometric constraints to efficiently extract scene summaries, build 3D models, ..."
Abstract - Cited by 104 (11 self)
This paper presents an approach for modeling landmark sites such as the Statue of Liberty based on large-scale contaminated image collections gathered from the Internet. Our system combines 2D appearance and 3D geometric constraints to efficiently extract scene summaries, build 3D models, and recognize instances of the landmark in new test images. We start by clustering images using low-dimensional global “gist” descriptors. Next, we perform geometric verification to retain only the clusters whose images share a common 3D structure. Each valid cluster is then represented by a single iconic view, and geometric relationships between iconic views are captured by an iconic scene graph. In addition to serving as a compact scene summary, this graph is used to guide structure from motion to efficiently produce 3D models of the different aspects of the landmark. The set of iconic images is also used for recognition, i.e., determining whether new test images contain the landmark. Results on three data sets consisting of tens of thousands of images demonstrate the potential of the proposed approach.
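As a rough sketch of the cluster-then-verify recipe (not the paper's implementation), the snippet below groups images with a cheap global descriptor, a downsampled grayscale thumbnail standing in for gist, and geometrically verifies matched keypoint sets with a RANSAC fundamental-matrix fit. Function names, thresholds, and the choice of k-means are illustrative assumptions.

import cv2
import numpy as np
from sklearn.cluster import KMeans

def tiny_descriptor(path, size=(16, 16)):
    """Downsampled grayscale thumbnail as a crude stand-in for a gist descriptor."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.resize(img, size).astype(np.float32).ravel() / 255.0

def cluster_images(paths, n_clusters=10):
    """Group photos by global appearance; returns one cluster label per image path."""
    descriptors = np.vstack([tiny_descriptor(p) for p in paths])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(descriptors)

def geometrically_verified(pts_a, pts_b, min_inliers=20):
    """Do matched keypoints (two Nx2 arrays from any feature matcher) support a
    common epipolar geometry? Uses a RANSAC fundamental-matrix fit."""
    if len(pts_a) < 8:
        return False
    _, mask = cv2.findFundamentalMat(np.float32(pts_a), np.float32(pts_b),
                                     cv2.FM_RANSAC, 3.0, 0.99)
    return mask is not None and int(np.count_nonzero(mask)) >= min_inliers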
World-scale Mining of Objects and Events from Community Photo Collections
- CIVR'08, 2008
"... In this paper, we describe an approach for mining images of objects (such as touristic sights) from community photo collections in an unsupervised fashion. Our approach relies on retrieving geotagged photos from those web-sites using a grid of geospatial tiles. The downloaded photos are clustered in ..."
Abstract - Cited by 93 (4 self)
In this paper, we describe an approach for mining images of objects (such as tourist sights) from community photo collections in an unsupervised fashion. Our approach relies on retrieving geotagged photos from such web sites using a grid of geospatial tiles. The downloaded photos are clustered into potentially interesting entities through a processing pipeline over several modalities, including visual, textual and spatial proximity. The resulting clusters are analyzed and automatically classified into objects and events. Using mining techniques, we then find text labels for these clusters, which are in turn used to assign each cluster to a corresponding Wikipedia article in a fully unsupervised manner. A final verification step uses the contents (including images) of the selected Wikipedia article to verify the cluster-article assignment. We demonstrate this approach on several urban areas, densely covering an area of over 700 square kilometers and mining over 200,000 photos, making it probably the largest experiment of its kind to date.
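The geospatial-tile retrieval step can be pictured with a small sketch. Only the tile grid itself is implemented below; fetch_photos is a hypothetical placeholder for a geo-bounded query to a photo-sharing site, and the bounding box and tile size are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Tile:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

def tile_grid(min_lat, min_lon, max_lat, max_lon, step_deg=0.01):
    """Yield tiles of roughly step_deg x step_deg degrees covering the bounding box."""
    lat = min_lat
    while lat < max_lat:
        lon = min_lon
        while lon < max_lon:
            yield Tile(lat, lon, min(lat + step_deg, max_lat),
                       min(lon + step_deg, max_lon))
            lon += step_deg
        lat += step_deg

def fetch_photos(tile):
    """Hypothetical placeholder for one geo-bounded query against a photo-sharing API."""
    return []

if __name__ == "__main__":
    # Rough bounding box around central Paris (illustrative numbers only).
    tiles = list(tile_grid(48.83, 2.25, 48.90, 2.42))
    print(f"{len(tiles)} tiles to query")
    photos = [p for t in tiles for p in fetch_photos(t)]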
Building Rome on a Cloudless Day
"... Abstract. This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achie ..."
Abstract - Cited by 90 (13 self)
This paper introduces an approach for dense 3D reconstruction from unregistered Internet-scale photo collections with about 3 million images within the span of a day on a single PC (“cloudless”). Our method advances image clustering, stereo, stereo fusion and structure from motion to achieve high computational performance. We leverage geometric and appearance constraints to obtain a highly parallel implementation on modern graphics processors and multi-core architectures. This leads to two orders of magnitude higher performance on an order of magnitude larger dataset than competing state-of-the-art approaches.
Tour the world: Building a web-scale landmark recognition engine
- In: IEEE Conference on Computer Vision and Pattern Recognition, Electronic Proceedings, 2009
"... Modeling and recognizing landmarks at world-scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large scale system. This paper ..."
Abstract - Cited by 78 (1 self)
Modeling and recognizing landmarks at world scale is a useful yet challenging task. There exists no readily available list of worldwide landmarks. Obtaining reliable visual models for each landmark can also pose problems, and efficiency is another challenge for such a large-scale system. This paper leverages the vast amount of multimedia data on the web, the availability of an Internet image search engine, and advances in object recognition and clustering techniques, to address these issues. First, a comprehensive list of landmarks is mined from two sources: (1) ∼20 million GPS-tagged photos and (2) online tour guide web pages. Candidate images for each landmark are then obtained from photo sharing websites or by querying an image search engine. Second, landmark visual models are built by pruning candidate images using efficient image matching and unsupervised clustering techniques. Finally, the landmarks and their visual models are validated by checking authorship of their member images. The resulting landmark recognition engine incorporates 5312 landmarks from 1259 cities in 144 countries. The experiments demonstrate that the engine can deliver satisfactory recognition performance with high efficiency.
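One way to picture the GPS-based landmark mining step (not the paper's code) is density clustering of photo coordinates, so that dense hotspots become landmark candidates. The sketch below uses scikit-learn's DBSCAN with a haversine metric on synthetic coordinates; the hotspot locations, radius, and minimum photo count are illustrative assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Synthetic photo coordinates: two dense hotspots plus scattered noise photos.
hotspot_a = rng.normal([48.8584, 2.2945], 0.0005, size=(200, 2))   # near the Eiffel Tower
hotspot_b = rng.normal([48.8530, 2.3499], 0.0005, size=(150, 2))   # near Notre Dame
noise = rng.uniform([48.80, 2.25], [48.90, 2.42], size=(100, 2))
coords = np.vstack([hotspot_a, hotspot_b, noise])

# With the haversine metric on (lat, lon) in radians, eps is an angular distance;
# dividing 50 m by the Earth radius makes it roughly 50 m on the ground.
earth_radius_m = 6_371_000
db = DBSCAN(eps=50 / earth_radius_m, min_samples=30, metric="haversine")
labels = db.fit_predict(np.radians(coords))

# Label -1 is DBSCAN's noise label; every other label is a landmark candidate.
for label in sorted(set(labels) - {-1}):
    members = coords[labels == label]
    print(f"candidate landmark {label}: {len(members)} photos, "
          f"centroid {members.mean(axis=0).round(4)}")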
Finding paths through the world’s photos
- In SIGGRAPH, 2008
"... When a scene is photographed many times by different people, the viewpointsoftenclusteralongcertainpaths. Thesepathsarelargely specifictothescenebeingphotographed,andfollowinterestingregions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based renderi ..."
Abstract - Cited by 77 (9 self)
When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed, and follow interesting regions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3D using these controls or with six degree-of-freedom free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.
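A small ingredient of such path-based controls is moving smoothly between two recovered camera poses. The sketch below is generic pose interpolation with SciPy (linear blend of positions, spherical linear interpolation of orientations), not the paper's control-adaptive rendering; the example poses are made up.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def camera_path(pos_a, rot_a, pos_b, rot_b, n_steps=30):
    """Return n_steps (position, Rotation) pairs moving smoothly from pose A to pose B."""
    times = np.linspace(0.0, 1.0, n_steps)
    # Stack the two key orientations and interpolate on the rotation manifold.
    key_rots = Rotation.from_quat(np.vstack([rot_a.as_quat(), rot_b.as_quat()]))
    rotations = Slerp([0.0, 1.0], key_rots)(times)
    # Positions are blended linearly between the two camera centers.
    positions = (1.0 - times)[:, None] * pos_a + times[:, None] * pos_b
    return [(positions[i], rotations[i]) for i in range(n_steps)]

# Two made-up viewpoints 90 degrees apart around a vertical axis, 5 m from the scene.
path = camera_path(np.array([5.0, 0.0, 1.6]), Rotation.from_euler("z", 0, degrees=True),
                   np.array([0.0, 5.0, 1.6]), Rotation.from_euler("z", 90, degrees=True))
print(len(path), "interpolated camera poses")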
Interactive 3d architectural modeling from unordered photo collections
- Proc. of SIGGRAPH Asia, 2008
"... Figure 1: Our interactive image-based modeling system provides an intuitive sketch-based interface for reconstructing a photorealistic textured piecewise planar 3D model of a building or architectural scene from an unordered collection of photographs. We present an interactive system for generating ..."
Abstract - Cited by 72 (6 self)
Figure 1: Our interactive image-based modeling system provides an intuitive sketch-based interface for reconstructing a photorealistic textured piecewise planar 3D model of a building or architectural scene from an unordered collection of photographs.
We present an interactive system for generating photorealistic, textured, piecewise-planar 3D models of architectural structures and urban scenes from unordered sets of photographs. To reconstruct 3D geometry in our system, the user draws outlines overlaid on 2D photographs. The 3D structure is then automatically computed by combining the 2D interaction with the multi-view geometric information recovered by performing structure from motion analysis on the input photographs. We utilize vanishing point constraints at multiple stages during the reconstruction, which is particularly useful for architectural scenes where parallel lines are abundant. Our approach enables us to accurately model polygonal faces from 2D interactions in a single image. Our system also supports useful operations such as edge snapping and extrusions. Seamless texture maps are automatically generated by combining multiple input photographs using graph cut optimization and Poisson blending. The user can add brush strokes as hints during the texture generation stage to remove artifacts caused by unmodeled geometric structures. We build models for a variety of architectural scenes from collections of up to about a hundred photographs. CR Categories: I.3.7 [Computer Graphics]: Three-dimensional graphics and realism, image-based modeling, texture mapping.
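The vanishing-point constraint can be illustrated in isolation: given image line segments that are parallel in 3D (for instance from a user's sketched outlines), their common vanishing point is the least-squares intersection of the corresponding homogeneous lines. The sketch below uses synthetic segments and is not the paper's full constraint system; the estimator and noise levels are illustrative.

import numpy as np

def vanishing_point(segments):
    """segments: iterable of ((x1, y1), (x2, y2)) endpoints in pixels.

    Each segment defines a homogeneous line l = p1 x p2; the vanishing point v
    minimizes sum (l_i . v)^2, i.e. the smallest right singular vector of the stack.
    """
    lines = []
    for (x1, y1), (x2, y2) in segments:
        l = np.cross([x1, y1, 1.0], [x2, y2, 1.0])
        lines.append(l / np.linalg.norm(l))
    _, _, vt = np.linalg.svd(np.array(lines))
    v = vt[-1]
    return v[:2] / v[2]          # back to inhomogeneous pixel coordinates

# Synthetic segments that all point toward (400, 300), perturbed by a little noise.
rng = np.random.default_rng(0)
target = np.array([400.0, 300.0])
segs = []
for start in rng.uniform(0, 200, size=(6, 2)):
    end = start + 0.5 * (target - start) + rng.normal(0, 0.5, size=2)
    segs.append((tuple(start), tuple(end)))
print("estimated vanishing point:", vanishing_point(segs).round(1))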
Evaluation of Stereo Matching Costs on Images with Radiometric Differences
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009
"... Stereo correspondence methods rely on matching costs for computing the similarity of image locations. We evaluate the insensitivity of different costs for passive binocular stereo methods with respect to radiometric variations of the input images. We consider both pixel-based and window-based varian ..."
Abstract - Cited by 71 (2 self)
Stereo correspondence methods rely on matching costs for computing the similarity of image locations. We evaluate the insensitivity of different costs for passive binocular stereo methods with respect to radiometric variations of the input images. We consider both pixel-based and window-based variants like the absolute difference, the sampling-insensitive absolute difference, and normalized cross correlation, as well as their zero-mean versions. We also consider filters like LoG, mean, and bilateral background subtraction (BilSub) and non-parametric measures like Rank, SoftRank, Census, and Ordinal. Finally, hierarchical mutual information (HMI) is considered as pixelwise cost. Using stereo datasets with ground-truth disparities taken under controlled changes of exposure and lighting, we evaluate the costs with a local, a semi-global, and a global stereo method. We measure the performance of all costs in the presence of simulated and real radiometric differences, including exposure differences, vignetting, varying lighting and noise. Overall, the ranking of methods across all datasets and experiments appears to be consistent. Among the best costs are BilSub, which performs consistently very well for low radiometric differences; HMI, which is slightly better as pixel-wise matching cost in some cases and for strong image noise; and Census, which showed the best and most robust overall performance.
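One of the evaluated nonparametric costs is easy to sketch: a census transform over a small window followed by the pixelwise Hamming distance between the two transformed images at a candidate disparity. The toy implementation below uses a synthetic image pair and illustrative window and disparity values; it is not the evaluation code from the paper.

import numpy as np

def census_transform(img, radius=2):
    """Encode each pixel as a bit string: is each window neighbor darker than the center?"""
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint64)
    padded = np.pad(img, radius, mode="edge")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = padded[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            codes = (codes << np.uint64(1)) | (neighbor < img).astype(np.uint64)
    return codes

def hamming_cost(census_left, census_right, disparity):
    """Per-pixel Hamming distance between left pixels and right pixels shifted by the disparity."""
    h, w = census_left.shape
    cost = np.full((h, w), np.iinfo(np.int32).max, dtype=np.int32)
    if disparity < w:
        diff = census_left[:, disparity:] ^ census_right[:, : w - disparity]
        # Count the set bits of the XOR (popcount) via the byte-level binary expansion.
        bits = np.unpackbits(diff.view(np.uint8).reshape(h, -1, 8), axis=-1)
        cost[:, disparity:] = bits.sum(axis=-1)
    return cost

left = np.random.default_rng(0).integers(0, 256, size=(60, 80)).astype(np.uint8)
right = np.roll(left, -4, axis=1)                      # toy pair: pure 4-pixel shift
cL, cR = census_transform(left), census_transform(right)
print("mean cost at the true disparity:", hamming_cost(cL, cR, 4)[:, 4:-4].mean())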