Object retrieval with large vocabularies and fast spatial matching
In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2007
Cited by 139 (14 self)
In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site, Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state-of-the-art on an extensive
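The abstract identifies vocabulary construction as the main bottleneck. For reference, here is a minimal sketch of the flat baseline, assuming plain Lloyd's k-means with brute-force nearest-center search. It is not the randomized-tree quantizer the paper introduces, and the function names are hypothetical. Each descriptor is quantized to the index of its nearest center, which serves as its "visual word".

```python
import random
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor: Sequence[float], centers: List[List[float]]) -> int:
    """Map a descriptor to its visual word: the index of the nearest center."""
    return min(range(len(centers)), key=lambda k: sq_dist(descriptor, centers[k]))


def build_vocabulary(descriptors: List[List[float]], k: int,
                     iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain (exact) Lloyd's k-means over the training descriptors."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]  # initialize from data points
    dim = len(descriptors[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        # Assignment step: exact nearest-center search, O(k) per descriptor.
        # This is the cost that tree-based approximate search aims to reduce.
        for d in descriptors:
            w = quantize(d, centers)
            counts[w] += 1
            for j, x in enumerate(d):
                sums[w][j] += x
        # Update step: move each center to the mean of its assigned descriptors.
        for w in range(k):
            if counts[w]:  # an empty cluster keeps its previous center
                centers[w] = [s / counts[w] for s in sums[w]]
    return centers
```

With a vocabulary of size k, each assignment pass costs O(N·k) distance computations for N descriptors, which is why exact search stops scaling at the dataset sizes the abstract describes.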
An Assessment of Information Criteria for Motion Model Selection
In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1997
Cited by 51 (8 self)
Rigid motion imposes constraints on the motion of image points between the two images. The matched points must conform to one of several possible constraints, such as that given by the fundamental matrix or image-image homography, and it is essential to know which model to fit to the data before recovery of structure, matching or segmentation can be performed successfully. This paper compares several model selection methods with a particular emphasis on providing a method that will work fully automatically on real imagery.
1 Introduction
Robotic vision has its basis in geometric modelling of the world, and many vision algorithms attempt to estimate these geometric models from perceived data. Usually only one model is fitted to the data. But what if the data might have arisen from one of several possible models? In this case the fitting procedure needs to fit all the potential models and select which of these fits the data best. This is the task of robust model selection which, in spi...
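The abstract does not list which criteria were compared. As one illustration of an information criterion that trades residual fit against model complexity, the sketch below assumes Kanatani's geometric AIC in normalized form, with the noise level sigma known in advance; the helper names are hypothetical. The dimensions used in the test (homography: manifold dimension d = 2, k = 8 parameters; fundamental matrix: d = 3, k = 7) follow the usual convention in the two-view model-selection literature and are an assumption here.

```python
from typing import Dict, Sequence, Tuple


def geometric_aic(residuals: Sequence[float], sigma: float,
                  manifold_dim: int, num_params: int) -> float:
    """Normalized geometric AIC: sum((e / sigma)^2) + 2 * (d * N + k).

    residuals:    distance of each correspondence to the fitted model
    sigma:        assumed (known) noise standard deviation
    manifold_dim: dimension d of the model's constraint surface
    num_params:   degrees of freedom k of the model
    """
    n = len(residuals)
    fit = sum((e / sigma) ** 2 for e in residuals)
    return fit + 2 * (manifold_dim * n + num_params)


def select_model(fits: Dict[str, Tuple[Sequence[float], int, int]], sigma: float) -> str:
    """fits maps a model name to (residuals, manifold_dim, num_params); lowest score wins."""
    def score(name: str) -> float:
        residuals, d, k = fits[name]
        return geometric_aic(residuals, sigma, d, k)
    return min(fits, key=score)
```

Unlike a plain parameter count, the d·N term charges the lower-dimensional homography less per correspondence, so it is preferred unless the fundamental matrix reduces the residuals by more than that difference.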
Audio-Visual Speaker Localization Using Graphical Models
Cited by 6 (0 self)
In this work we propose an approach to combine audio and video modalities for person tracking using graphical models. We demonstrate a principled and intuitive framework for combining these modalities to obtain robustness against occlusion and change in appearance. We further exploit the temporal correlations that exist for a moving object between adjacent frames to account for the cases where having both modalities might still not be enough, e.g., when the person being tracked is occluded and not speaking. Improvement in tracking results is shown at each step and compared with manually annotated ground truth.
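The graphical model itself is not specified in the abstract. A minimal sketch of the two ideas it names, assuming a 1-D grid of candidate positions and a simple discrete Bayes filter in place of the authors' model: a prediction step that exploits the correlation between adjacent frames, and an update that multiplies the audio and video likelihoods under an assumed conditional-independence model. All names and the `stay` probability are hypothetical.

```python
from typing import List


def predict(belief: List[float], stay: float = 0.8) -> List[float]:
    """Temporal step: keep probability `stay` in place, spread the rest to neighbours."""
    n = len(belief)
    out = [0.0] * n
    for i, p in enumerate(belief):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if not neighbours:  # single-cell grid: nowhere to move
            out[i] += p
            continue
        out[i] += stay * p
        for j in neighbours:
            out[j] += (1.0 - stay) * p / len(neighbours)
    return out


def update(prior: List[float], audio_lik: List[float], video_lik: List[float]) -> List[float]:
    """Fuse modalities, assuming audio and video are independent given the position."""
    post = [p * a * v for p, a, v in zip(prior, audio_lik, video_lik)]
    total = sum(post)
    return [x / total for x in post]


def argmax(xs: List[float]) -> int:
    """Index of the most probable cell."""
    return max(range(len(xs)), key=xs.__getitem__)
```

When the person is occluded and silent, both likelihoods are flat, and the estimate is carried by the temporal prior alone.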
Optimizing Polynomial Solvers for Minimal Geometry Problems
Cited by 3 (0 self)
In recent years polynomial solvers based on algebraic geometry techniques, and specifically the action matrix method, have become popular for solving minimal problems in computer vision. In this paper we develop a new method for reducing the computational time and improving the numerical stability of algorithms using this method. To achieve this, we propose and prove a set of algebraic conditions which allow us to reduce the size of the elimination template (polynomial coefficient matrix), which leads to faster LU or QR decomposition. Our technique is generic and has the potential to improve the performance of many solvers that use the action matrix method. We demonstrate the approach on specific examples, including an image stitching algorithm where computation time is halved and single-precision arithmetic can be used.
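For intuition about the action matrix, the single-variable case reduces to the companion matrix: multiplication by x acts linearly on the quotient ring R[x]/(p) with basis {1, x, ..., x^(n-1)}, and its eigenvalues are the roots of p. The sketch below is an illustrative simplification, not the paper's multivariate elimination-template construction; it builds that matrix and checks that, for each root r, the vector (1, r, ..., r^(n-1)) is an eigenvector of its transpose. The function names are hypothetical; in practice the eigenvalues would come from a numerical eigensolver.

```python
from typing import List, Sequence


def action_matrix(coeffs: Sequence[float]) -> List[List[float]]:
    """Multiplication-by-x matrix on the basis {1, x, ..., x^(n-1)} of R[x]/(p).

    coeffs = [c0, c1, ..., c_{n-1}] for the monic p(x) = x^n + c_{n-1} x^(n-1) + ... + c0.
    Column j holds the coordinates of x * x^j reduced modulo p.
    """
    n = len(coeffs)
    m = [[0.0] * n for _ in range(n)]
    for j in range(n - 1):
        m[j + 1][j] = 1.0         # x * x^j = x^(j+1) is still a basis element
    for i in range(n):
        m[i][n - 1] = -coeffs[i]  # x * x^(n-1) = x^n = -(c0 + c1 x + ... + c_{n-1} x^(n-1))
    return m


def transpose_times(m: List[List[float]], v: Sequence[float]) -> List[float]:
    """Compute M^T v."""
    n = len(v)
    return [sum(m[i][j] * v[i] for i in range(n)) for j in range(n)]
```

Reduction modulo p preserves evaluation at any root of p, which is why the evaluation vector of each root is an eigenvector of the transposed action matrix.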
Automatic alignment of a camera with a line scan lidar system
In: Proc. IEEE Int. Conf. Robot. Autom., 2011
Cited by 2 (0 self)
We propose a new method for extrinsic calibration of a line-scan LIDAR with a perspective projection camera. Our method is a closed-form, minimal solution to the problem. The solution is a symbolic template found via variable elimination and the multi-polynomial Macaulay resultant. It does not require initialization, and can be used in an automatic calibration setting when paired with RANSAC and least-squares refinement. We show the efficacy of our approach through a set of simulations and a real calibration.
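The Macaulay resultant generalizes the two-polynomial Sylvester resultant to larger systems. As a small illustration under that simplification (not the paper's symbolic template), the sketch below forms the Sylvester matrix of two univariate polynomials and computes its determinant exactly with fractions. Provided the leading coefficients are nonzero, the resultant vanishes exactly when the two polynomials share a root. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def sylvester_matrix(a: Sequence[int], b: Sequence[int]) -> List[List[Fraction]]:
    """Sylvester matrix of a(x), b(x); coefficients listed from highest degree down."""
    m, n = len(a) - 1, len(b) - 1  # degrees of a and b
    size = m + n
    rows: List[List[Fraction]] = []
    for i in range(n):  # n shifted copies of a's coefficients
        rows.append([Fraction(0)] * i + [Fraction(c) for c in a] + [Fraction(0)] * (size - m - 1 - i))
    for i in range(m):  # m shifted copies of b's coefficients
        rows.append([Fraction(0)] * i + [Fraction(c) for c in b] + [Fraction(0)] * (size - n - 1 - i))
    return rows


def determinant(mat: List[List[Fraction]]) -> Fraction:
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [row[:] for row in mat]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def resultant(a: Sequence[int], b: Sequence[int]) -> Fraction:
    """Sylvester resultant: zero iff a and b have a common root."""
    return determinant(sylvester_matrix(a, b))
```

For example, x^2 - 1 and x - 1 share the root 1, so their resultant is 0, while x^2 - 1 and x - 2 have resultant (1 - 2)(-1 - 2) = 3.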
Structure from Motion with Directional Correspondence for Visual Odometry
Cited by 2 (1 self)
This report presents two efficient solutions to the two-view, relative pose problem from three image point correspondences and one common reference direction. This three-plus-one problem can be used either as a substitute for the classic five-point algorithm using a vanishing point for the reference direction, or to make use of an inertial measurement unit commonly available on robots and mobile devices, where the gravity vector becomes the reference direction. We provide a simple closed-form solution and a solution based on techniques from algebraic geometry and investigate numerical and computational advantages of each approach. In a set of real experiments, we demonstrate the power of our approach by comparing it to the five-point method in a hypothesize-and-test visual odometry setting.
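A known common reference direction removes two of the three rotational degrees of freedom. One common way to exploit this, sketched below as an assumption rather than the report's exact parameterization, is to rotate each view so that the reference direction (e.g. the gravity vector from an IMU) maps onto the z-axis. Any relative rotation that then maps z to z is a rotation about z, so a single yaw angle remains. The sketch uses Rodrigues' formula; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_aligning_to_z(g: Sequence[float]) -> Matrix:
    """Rotation R (Rodrigues' formula) such that R @ (g / |g|) = (0, 0, 1)."""
    norm = math.sqrt(sum(c * c for c in g))
    gx, gy, gz = (c / norm for c in g)
    # Axis is g_hat x z = (gy, -gx, 0); its length is sin(theta), and cos(theta) = gz.
    s = math.hypot(gx, gy)
    c = gz
    if s < 1e-12:
        if c > 0:  # already along +z
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]  # half turn about x
    kx, ky, kz = gy / s, -gx / s, 0.0
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # cross-product matrix of the axis
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]


def yaw(theta: float) -> Matrix:
    """Rotation about the gravity-aligned z-axis: the single remaining unknown."""
    ct, st = math.cos(theta), math.sin(theta)
    return [[ct, -st, 0.0], [st, ct, 0.0], [0.0, 0.0, 1.0]]
```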
content and type selection from always-on wearable video
17th Int. Conf. on Pattern Recognition (ICPR)
Cited by 1 (0 self)
A system is described for summarizing head-mounted or hand-carried “always-on” video. The example used is a tourist walking around a historic city with friends and family. The summary consists of a mixture of stills, panoramas and video clips. The system identifies both the scenes to appear in the summary and the media type used to represent them. As there are few shot boundaries in this class of video, the decisions are based on the system’s classification of the user’s behaviour, as revealed by the motion of the camera and by motion in the scene.
IBM Research TRECVID-2010 Video Copy Detection and Multimedia Event Detection System
In this paper, we describe the system jointly developed by IBM Research and Columbia University for video copy detection and multimedia event detection applied to the TRECVID-2010 video retrieval benchmark.
A. Content-Based Copy Detection: The focus of our copy detection system this year was fusing three types of complementary fingerprints: a keyframe-based color correlogram, SIFTogram (bag of visual words), and a GIST-based fingerprint. However, in our official submissions, we did not use the color correlogram component, since our best results on the training set came from the GIST and SIFTogram components. A summary of our runs is listed below:
1. IBM.m.nofa.gistG: A run based on the grayscale GIST frame-level feature, with at most 1 result per query, except in the case of ties.
2. IBM.m.balanced.gistG: As in the above run, but including more results per query, though on average still fewer than 2.
3. IBM.m.nofa.gistGC: The result of the nofa.gistG run, fused with results from GIST features extracted from the R, G, B color channels.
4. IBM.m.nofa.gistGCsift: The result of the nofa.gistGC run, fused with a SIFTogram result.
Overall, the grayscale GIST approach performed best. We found it produced excellent results when tested on the
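The abstract does not say how the fingerprint results were fused or how ties were broken. A minimal sketch under assumed choices: per-run min-max score normalization, a weighted sum across runs, and a top-1 selection that keeps every candidate tied with the best score, mirroring the "at most 1 result per query, except in the case of ties" rule. Run contents, weights, and function names are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so runs with different ranges are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {ref: 1.0 for ref in scores}
    return {ref: (s - lo) / (hi - lo) for ref, s in scores.items()}


def fuse(runs: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Weighted sum of normalized scores; a candidate absent from a run adds 0 for it."""
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for ref, s in min_max_normalize(run).items():
            fused[ref] = fused.get(ref, 0.0) + w * s
    return fused


def top_with_ties(scores: Dict[str, float], tol: float = 1e-9) -> List[str]:
    """Return the best candidate, plus any candidates tied with it within tol."""
    best = max(scores.values())
    return sorted(ref for ref, s in scores.items() if best - s <= tol)
```

Min-max normalization is only one option; rank-based or z-score normalization would change which run dominates the fused score.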
Image Processing Onboard Spacecraft for Autonomous Plume Detection
Previous missions have imaged active plumes at Io and Enceladus, as well as outgassing by cometary nuclei. It is often difficult to predict where and when these transient events will occur, so characterizing them requires collecting long image sequences with many redundant frames. This demands a prohibitive fraction of the spacecraft’s limited cache and bandwidth, and precludes sustained surveys of plume activity. Onboard processing could enable long-term plume monitoring campaigns with high imaging rates. Specifically, spacecraft can analyze image sequences onboard to identify plumes, with detected events triggering preferential storage, prioritized transmission, or follow-up with coincident observations by Thermal or Visible Near-Infrared imagers. We propose a detection method based on horizon identification with Random Sample Consensus (RANSAC). The approach shows reliable performance on a test set of plume images from Enceladus and Io.
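The abstract names horizon identification with RANSAC but not the horizon model. Assuming the visible limb of a moon can be approximated by a circular arc, the sketch below fits a circle to candidate edge points with RANSAC: three randomly sampled points define a circle hypothesis, and the hypothesis with the most points within a pixel tolerance wins. A real pipeline would first extract edge points and then refine the winner by least squares; the names, tolerance, and iteration count are assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # centre x, centre y, radius


def circle_from_3(p1: Point, p2: Point, p3: Point) -> Optional[Circle]:
    """Circumscribed circle of three points, or None if they are (nearly) collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def ransac_circle(points: Sequence[Point], iters: int = 200, tol: float = 1.0,
                  seed: int = 0) -> Tuple[Optional[Circle], List[Point]]:
    """Keep the 3-point circle hypothesis with the most points within tol pixels."""
    rng = random.Random(seed)
    best: Optional[Circle] = None
    best_inliers: List[Point] = []
    for _ in range(iters):
        model = circle_from_3(*rng.sample(list(points), 3))
        if model is None:
            continue
        cx, cy, r = model
        inliers = [p for p in points if abs(math.hypot(p[0] - cx, p[1] - cy) - r) < tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = model, inliers
    return best, best_inliers
```

Points that lie well off the fitted limb, such as bright pixels above the horizon, are natural plume candidates, though how the paper scores them is not stated in the abstract.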
Fusion of Optical and Terrestrial Laser Scanner Data
Optical imagery and range data can be registered to create photo-realistic scene models via texture mapping. Presented in this paper is an alternative approach where true colour (RGB) point clouds are generated by automatically fusing a close-range optical (RGB) image acquired with an uncalibrated digital camera with the corresponding high-density 3D lidar point cloud collected with a terrestrial laser scanner (TLS). The alignment of optical pixel colour values and the lidar point cloud is obtained by estimating the position and orientation of the camera with respect to the lidar point cloud reference system. To perform this sensor co-registration, an automated corner feature extraction algorithm, followed by area-based image matching, is applied between the optical data and the lidar intensity image to establish point correspondence. The matching process is solely based on point matches and does not use external control or calibration patterns. The 3D lidar points corresponding to the lidar intensity image corner points are then extracted from the point cloud. As these 3D lidar points correspond to the extracted optical image corner points, a bundle self-calibration adjustment with additional parameters is applied using the extended collinearity equations to estimate the interior and exterior orientation of the camera. The RANSAC robust estimator is used to reduce the influence of outliers in the estimation of the camera parameters. Having established the mathematical relationship between image space and lidar points, a photo-realistic 3D model is generated. Through reverse mapping, each point in the lidar point cloud is assigned the RGB value of the image pixel upon which it is projected. Experiments are performed observing typical urban scenes, particularly building facades. The feasibility and potential of estimating the co-registration parameters using a TLS are evaluated in terms of accuracy of the results. The true calibration
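The reverse-mapping step can be sketched as follows, assuming a distortion-free pinhole camera in a computer-vision convention (camera coordinates X_c = R (X - C), pixel column u = cx + f X_c / Z_c, row v = cy + f Y_c / Z_c) rather than the paper's photogrammetric collinearity equations with additional parameters. Each lidar point is projected into the image and takes the RGB value of the nearest pixel; points behind the camera or outside the frame get no colour. The function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def colorize(points: Sequence[Sequence[float]], image: List[List[RGB]],
             R: Sequence[Sequence[float]], C: Sequence[float],
             f: float, cx: float, cy: float) -> List[Optional[RGB]]:
    """Give each lidar point the colour of the pixel it projects onto (or None)."""
    height, width = len(image), len(image[0])
    colours: List[Optional[RGB]] = []
    for p in points:
        d = [p[i] - C[i] for i in range(3)]  # point relative to the camera centre
        xc, yc, zc = (sum(R[r][k] * d[k] for k in range(3)) for r in range(3))
        if zc <= 0:  # behind the camera: not visible in this image
            colours.append(None)
            continue
        col = round(cx + f * xc / zc)  # nearest pixel column
        row = round(cy + f * yc / zc)  # nearest pixel row
        if 0 <= row < height and 0 <= col < width:
            colours.append(image[row][col])
        else:
            colours.append(None)
    return colours
```

This sketch does not test occlusion, so a lidar point hidden behind a nearer surface would still receive that surface's colour; a depth buffer per pixel would be one way to handle it.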

