Multiple view feature descriptors from image sequences via kernel principal component analysis (2004)

by J Meltzer, M Yang, R Gupta, S Soatto
Venue: In Proc. of ECCV

Results 1 - 10 of 10

Keypoint recognition using randomized trees

by Vincent Lepetit - IEEE Trans. Pattern Anal. Mach. Intell
Abstract - Cited by 87 (15 self)
In many 3–D object-detection and pose-estimation problems, run-time performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the algorithm is robust, accurate, and fast enough for frame-rate performance. This reduction in run-time computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results in general, which usually is very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, non-planar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.

Randomized trees for real-time keypoint recognition

by Vincent Lepetit, Pascal Lagger, Pascal Fua - In CVPR, 2005
Abstract - Cited by 75 (4 self)
In earlier work, we proposed treating wide baseline matching of feature points as a classification problem, in which each class corresponds to the set of all possible views of such a point. We used a K-means plus Nearest Neighbor classifier to validate our approach, mostly because it was simple to implement. It has proved effective but still too slow for real-time use. In this paper, we advocate instead the use of randomized trees as the classification technique. It is both fast enough for real-time performance and more robust. It also gives us a principled way not only to match keypoints but to select during a training phase those that are the most recognizable ones. This results in a real-time system able to detect and position in 3D planar, non-planar, and even deformable objects. It is robust to illumination changes, scale changes and occlusions.

Feature Harvesting for Tracking-by-Detection

by Mustafa Ozuysal, Vincent Lepetit, François Fleuret, Pascal Fua - In European Conference on Computer Vision, 2006
Abstract - Cited by 16 (1 self)
We propose a fast approach to 3-D object detection and pose estimation that owes its robustness to a training phase during which the target object slowly moves with respect to the camera. No additional information is provided to the system, save a very rough initialization in the first frame of the training sequence. It can be used to detect the target object in each video frame independently.

Incremental Kernel PCA for Efficient Non-linear Feature Extraction

by Tat-jun Chin, David Suter - In BMVC, 2006
Abstract - Cited by 6 (0 self)
The Kernel Principal Component Analysis (KPCA) has been effectively applied as an unsupervised non-linear feature extractor in many machine learning applications. However, with a time complexity of O(n^3), the practicality of KPCA on large datasets is minimal. In this paper, we propose an approximate incremental KPCA algorithm which allows efficient processing of large datasets. We extend a linear PCA updating algorithm to the non-linear case by utilizing the kernel trick, and apply a reduced set construction method to compress expressions for the derived KPCA basis at each update. In addition, we show how multiple feature space vectors can be compressed efficiently, and how approximated KPCA bases can be re-orthogonalized using the kernel trick. The proposed method is justified through experimental validations.
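
For orientation, the batch KPCA computation that the abstract above takes as its starting point can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the incremental algorithm the authors propose: the RBF kernel, the gamma parameter, and all function names are hypothetical choices made here, and only the leading component is extracted, by power iteration. A full eigendecomposition of the n x n centered kernel matrix is the step that gives batch KPCA its cubic cost.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; the choice of kernel is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_kernel_matrix(points, gamma):
    """Build the n x n kernel matrix and center it in feature space."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    # K is symmetric, so column means equal row means.
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def matvec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def kpca_first_component(points, gamma=1.0, iters=200):
    """Return (eigenvalue, scores) for the leading kernel principal component.

    scores[i] is the projection of training point i onto that component.
    """
    Kc = centered_kernel_matrix(points, gamma)
    # Deterministic, non-constant start vector (the constant vector lies
    # in the null space of the centered kernel matrix).
    v = [float(i + 1) for i in range(len(points))]
    for _ in range(iters):
        w = matvec(Kc, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data: no variance in feature space
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged vector.
    eigval = sum(a * b for a, b in zip(v, matvec(Kc, v)))
    # Projection of training point i is sqrt(eigval) * v[i].
    scores = [math.sqrt(max(eigval, 0.0)) * x for x in v]
    return eigval, scores
```

Calling kpca_first_component on two well-separated clusters of points should give the clusters scores of opposite sign. The kernel matrix grows as n^2 in memory, which is the scaling problem an incremental method is meant to avoid.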

Fusion of 3D and appearance models for fast object detection and pose estimation

by Hesam Najafi, Yakup Genc, Nassir Navab - In ACCV, 2006
Abstract - Cited by 5 (2 self)
Real-time estimation of a camera’s pose relative to an object is still an open problem. The difficulty stems from the need for fast and robust detection of known objects in the scene given their 3D models, a set of 2D images, or both. This paper proposes a method that conducts a statistical analysis of the appearance of model patches from all possible viewpoints in the scene and incorporates the 3D geometry during both matching and the pose estimation processes. Thereby the appearance information from the 3D model and real images is combined with synthesized images in order to learn the variations in the multiple view feature descriptors using PCA. Furthermore, by analyzing the computed visibility distribution of each patch from different viewpoints, a reliability measure for each patch is estimated. This reliability measure is used to further constrain the classification problem. This results in a more scalable representation reducing the effect of the complexity of the 3D model on the run-time matching performance. Moreover, as required in many real-time applications, this approach can yield a reliability measure for the estimated pose. Experimental results show how the pose of complex objects can be estimated efficiently from a single test image.

Simultaneous localization and mapping using multiple view feature descriptors

by Jason Meltzer - In Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2004
Abstract - Cited by 3 (0 self)
We propose a vision-based SLAM algorithm incorporating feature descriptors derived from multiple views of a scene, incorporating illumination and viewpoint variations. These descriptors are extracted from video and then applied to the challenging task of wide baseline matching across significant viewpoint changes. The system incorporates a single camera on a mobile robot in an extended Kalman filter framework to develop a 3D map of the environment and determine egomotion. At the same time, the feature descriptors are generated from the video sequence, which can be used to localize the robot when it returns to a mapped location. The kidnapped robot problem is addressed by matching descriptors without any estimate of position, then determining the epipolar geometry with respect to a known position in the map.

Categorization in Natural Time-Varying Image Sequences

by Teresa Ko, Stefano Soatto, Deborah Estrin
Abstract - Cited by 1 (0 self)
Approaches to single image categorization do not easily generalize to natural time-varying image sequences. In natural environments, object categories tend to have few features that help to distinguish between each other and the surrounding environment. To better discriminate between categories and the surrounding environment, we propose a multi-view categorization approach that exploits the statistics of image sequences rather than single images. The approach is unbiased towards redundant views – that is, it does not matter how many times an object appears from the same viewpoint. At the same time, the approach does not penalize for missing views, so that we do not have to capture an object at all viewpoints to successfully categorize the object. We first present a data set for studying natural environment monitoring: an image sequence of birds at a feeder station. After manual localization, a baseline bag of features approach was found to perform significantly worse on the proposed data set compared to the standard Caltech 101 data set. We find that our approach increases the categorization accuracy from 48% to 58% on average when compared to an equivalent single view categorization method. Finally, we show how the same metric proposed for the supervised categorization can be used to transform, in an unsupervised manner, an image sequence into a manageable set of categories.

Incremental Kernel SVD for Face Recognition with Image Sets

by unknown authors
Abstract
Non-linear subspaces derived using kernel methods have been found to be superior compared to linear subspaces in modeling or classification tasks of several visual phenomena. Such kernel methods include Kernel PCA, Kernel DA, Kernel SVD and Kernel QR. Since incremental computation algorithms for these methods do not exist yet, the practicality of these methods on large datasets or online video processing is minimal. We propose an approximate incremental Kernel SVD algorithm for computer vision applications that require estimation of non-linear subspaces, specifically face recognition by matching image sets obtained through long-term observations or video recordings. We extend a well-known linear subspace updating algorithm to the nonlinear case by utilizing the kernel trick, and apply a reduced set construction method to produce sparse expressions for the derived subspace basis so as to maintain constant processing speed and memory usage. Experimental results demonstrate the effectiveness of the proposed method.

Bag-of-Features Kernel Eigen Spaces for Classification

by Gaurav Sharma, Santanu Chaudhury, J. B. Srivastava
Abstract
We present a classifier unifying local features based representation and subspace based learning. We also propose a novel method to merge kernel eigen spaces (KES) in feature space. Subspace methods have traditionally been used with the full appearance of the image. Recently local features based bag-of-features (BoF) representation has performed impressively on classification tasks. We use KES with BoF vectors to construct class specific subspaces and use the distance of a query vector from the database KESs as the classification criterion. The use of local features makes our approach invariant to illumination, rotation, scale, small affine transformation and partial occlusions. The system allows hierarchy by merging the KES in the feature space. The classifier performs competitively on the challenging Caltech-101 dataset under normal and simulated occlusion conditions. We show hierarchy on a dataset of videos collected over the internet.

Learning and Matching Multiscale Template Descriptors for Real-Time Detection, Localization and Tracking

by Taehee Lee, Stefano Soatto
Abstract
We describe a system to learn an object template from a video stream, and localize and track the corresponding object in live video. The template is decomposed into a number of local descriptors, thus enabling detection and tracking in spite of partial occlusion. Each local descriptor aggregates contrast invariant statistics (normalized intensity and gradient orientation) across scales, in a way that enables matching under significant scale variations. Low-level tracking during the training video sequence enables capturing object-specific variability due to the shape of the object, which is encapsulated in the descriptor. Salient locations on both the template and the target image are used as hypotheses to expedite matching.