Results 1 - 10
of
562
An Efficient Solution to the Five-Point Relative Pose Problem
, 2004
"... An efficient algorithmic solution to the classical five-point relative pose problem is presented. The problem is to find the possible solutions for relative camera pose between two calibrated views given five corresponding points. The algorithm consists of computing the coefficients of a tenth degre ..."
Abstract
-
Cited by 484 (13 self)
- Add to MetaCart
An efficient algorithmic solution to the classical five-point relative pose problem is presented. The problem is to find the possible solutions for relative camera pose between two calibrated views given five corresponding points. The algorithm consists of computing the coefficients of a tenth degree polynomial in closed form and subsequently finding its roots. It is the first algorithm well suited for numerical implementation that also corresponds to the inherent complexity of the problem. We investigate the numerical precision of the algorithm. We also study its performance under noise in minimal as well as over-determined cases. The performance is compared to that of the well known 8 and 7-point methods and a 6-point scheme. The algorithm is used in a robust hypothesize-and-test framework to estimate structure and motion in real-time with low delay. The real-time system uses solely visual input and has been demonstrated at major conferences.
Recognising Panoramas
, 2003
"... The problem considered in this paper is the fully automatic construction of panoramas. Fundamentally, this problem requires recognition, as we need to know which parts of the panorama join up. Previous approaches have used human input or restrictions on the image sequence for the matching step. In t ..."
Abstract
-
Cited by 293 (2 self)
- Add to MetaCart
The problem considered in this paper is the fully automatic construction of panoramas. Fundamentally, this problem requires recognition, as we need to know which parts of the panorama join up. Previous approaches have used human input or restrictions on the image sequence for the matching step. In this work we use object recognition techniques based on invariant local features to select matching images, and a probabilistic model for verification. Because of this our method is insensitive to the ordering, orientation, scale and illumination of the images. It is also insensitive to `noise' images which are not part of the panorama at all, that is, it recognises panoramas. This suggests a useful application for photographers: the system takes as input the images on an entire flash card or film, recognises images that form part of a panorama, and stitches them with no user input whatsoever.
Building rome in a day.
- In Proc. Int. Conf. on Computer Vision.
, 2009
"... We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and ..."
Abstract
-
Cited by 285 (30 self)
- Add to MetaCart
(Show Context)
We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day. intRoDuction Amateur photography was once largely a personal endeavor. Traditionally, a photographer would capture a moment on film and share it with a small number of friends and family members, perhaps storing a few hundred of them in a shoebox. The advent of digital photography, and the recent growth of photo-sharing Web sites such as Flickr.com, have brought about a seismic change in photography and the use of photo collections. Today, a photograph shared online can potentially be seen by millions of people. As a result, we now have access to a vast, ever-growing collection of photographs the world over capturing its cities and landmarks innumerable times. For instance, a search for the term "Rome" on Flickr returns nearly 3 million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, façade, interior, fountain, sculpture, painting, and café. Virtually anything that people find interesting in Rome has been captured from thousands of viewpoints and under myriad illumination and weather conditions. For example, the Trevi Fountain appears in over 50,000 of these photographs. How much of the city of Rome can be reconstructed in 3D from this photo collection? In principle, the photos of Rome on Flickr represent an ideal data set for 3D modeling research, as they capture the highlights of the city in exquisite detail and from a broad range of viewpoints. However, extracting high quality 3D models from such a collection is challenging for several reasons. First, the photos are unstructured-they are taken in no particular order and we have no control over the distribution of camera viewpoints. Second, they are uncalibrated-the photos are taken by thousands of different photographers and we know very little about the camera settings. Third, the scale of the problem is
Robust Registration of 2D and 3D Point Sets
, 2001
"... This paper introduces a new method of registering point sets. The registration error is directly minimized using general-purpose nonlinear optimization (the Levenberg-Marquardt algorithm). The surprising conclusion of the paper is that this technique is comparable in speed to the special-purpose ICP ..."
Abstract
-
Cited by 147 (0 self)
- Add to MetaCart
(Show Context)
This paper introduces a new method of registering point sets. The registration error is directly minimized using general-purpose nonlinear optimization (the Levenberg-Marquardt algorithm). The surprising conclusion of the paper is that this technique is comparable in speed to the special-purpose ICP algorithm which is most commonly used for this task. Because the routine directly minimizes an energy function, it is easy to extend it to incorporate robust estimation via a Huber kernel, yielding a basin of convergence that is many times wider than existing techniques. Finally we introduce a data structure for the minimization based on the chamfer distance transform which yields an algorithm which is both faster and more robust than previously described methods.
Square Root SAM: Simultaneous localization and mapping via square root information smoothing
, 2006
"... Solving the SLAM (simultaneous localization and mapping) problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. Smoothing approaches have been investigated as a viable alternative to extended Kalman filter (EKF)based solutions to the problem. In parti ..."
Abstract
-
Cited by 144 (38 self)
- Add to MetaCart
(Show Context)
Solving the SLAM (simultaneous localization and mapping) problem is one way to enable a robot to explore, map, and navigate in a previously unknown environment. Smoothing approaches have been investigated as a viable alternative to extended Kalman filter (EKF)based solutions to the problem. In particular, approaches have been looked at that factorize either the associated information matrix or the measurement Jacobian into square root form. Such techniques have several significant advantages over the EKF: they are faster yet exact; they can be used in either batch or incremental mode; are better equipped to deal with non-linear process and measurement models; and yield the entire robot trajectory, at lower cost for a large class of SLAM problems. In addition, in an indirect but dramatic way, column ordering heuristics automatically exploit the locality inherent in the geographic nature of the SLAM problem. This paper presents the theory underlying these methods, along with an interpretation of factorization in terms of the graphical model associated with the SLAM problem. Both simulation results and actual SLAM experiments in large-scale environments are presented that underscore the potential of these methods as an alternative to EKF-based approaches.
Simultaneous Linear Estimation of Multiple View Geometry and Lens Distortion
, 2001
"... A bugbear of uncalibrated stereo reconstruction is that cameras which deviate from the pinhole model have to be pre-calibrated in order to correct for nonlinear lens distortion. If they are not, and point correspondence is attempted using the uncorrected images, the matching constraints provided by ..."
Abstract
-
Cited by 127 (1 self)
- Add to MetaCart
(Show Context)
A bugbear of uncalibrated stereo reconstruction is that cameras which deviate from the pinhole model have to be pre-calibrated in order to correct for nonlinear lens distortion. If they are not, and point correspondence is attempted using the uncorrected images, the matching constraints provided by the fundamental matrix must be set so loose that point matching is significantly hampered. This paper shows how linear estimation of the fundamental matrix from two-view point correspondences may be augmented to include one term of radial lens distortion. This is achieved by (1) changing from the standard radiallens model to another which (as we show) has equivalent power, but which takes a simpler form in homogeneous coordinates, and (2) expressing fundamental matrix estimation as a Quadratic Eigenvalue Problem (QEP), for which efficient algorithms are well known. I derive the new estimator, and compare its performance against bundle-adjusted calibration-grid data. The new estimator is fast enough to be included in a RANSAC-based matching loop, and we show cases of matching being rendered possible by its use. I show how the same lens can be calibrated in a natural scene where the lack of straight lines precludes most previous techniques. The modification when the multi-view relation is a planar homography or trifocal tensor is described. 1.
Estimating Articulated Human Motion With Covariance Scaled Sampling
- International Journal of Robotics Research
, 2003
"... We present a method for recovering 3D human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D ..."
Abstract
-
Cited by 124 (10 self)
- Add to MetaCart
We present a method for recovering 3D human body motion from monocular video sequences based on a robust image matching metric, incorporation of joint limits and non-self-intersection constraints, and a new sample-and-refine search strategy guided by rescaled cost-function covariances. Monocular 3D body tracking is challenging: besides the difficulty of matching an imperfect, highly flexible, self-occluding model to cluttered image features, realistic body models have at least 30 joint parameters subject to highly nonlinear physical constraints, and at least a third of these degrees of freedom are nearly unobservable in any given monocular image. For image matching we use a carefully designed robust cost metric combining robust optical flow, edge energy, and motion boundaries. The nonlinearities and matching ambiguities make the parameter-space cost surface multi-modal, ill-conditioned and highly nonlinear, so searching it is difficult. We discuss the limitations of CONDENSATION-like samplers, and describe a novel hybrid search algorithm that combines inflated-covariance-scaled sampling and robust continuous optimization subject to physical constraints and model priors. Our experiments on challenging monocular sequences show that robust cost modeling, joint and selfintersection constraints, and informed sampling are all essential for reliable monocular 3D motion estimation.
3D Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints
- International Journal of Computer Vision
, 2006
"... Abstract. This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patche ..."
Abstract
-
Cited by 118 (14 self)
- Add to MetaCart
(Show Context)
Abstract. This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.