Histograms of Oriented Gradients for Human Detection
- In CVPR, 2005
Cited by 3735 (9 self)
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
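The stages named in this abstract (fine-scale gradients, orientation binning, coarse spatial binning, normalization in overlapping blocks) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hog_sketch`, the 8-pixel cells, 9 unsigned orientation bins, 2x2-cell blocks and plain L2 normalization are assumptions chosen for illustration, and votes go to the nearest bin with no interpolation.

```python
import numpy as np

def hog_sketch(image, cell=8, bins=9, block=2, eps=1e-6):
    """Toy HOG-style descriptor; all parameter values are illustrative assumptions."""
    img = np.asarray(image, dtype=np.float64)

    # Fine-scale gradients: centered [-1, 0, 1] differences, no smoothing.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)

    # Unsigned orientation in [0, 180) degrees, quantized to the nearest-lower bin.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)

    # Coarse spatial binning: one orientation histogram per cell.
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Each pixel votes for its bin, weighted by gradient magnitude.
            hist[cy, cx] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=mag[rows, cols].ravel(),
                minlength=bins,
            )

    # Overlapping blocks (stride of one cell), each L2-normalized separately.
    blocks = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)
```

Under these assumed settings, a 128x64 window gives 16x8 cells and 15x7 overlapping blocks of 36 values each. The resulting vector would then be passed to a linear SVM, which this sketch leaves out.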
Object Tracking: A Survey
- 2006
Cited by 701 (7 self)
The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.
One-shot learning of object categories
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
Cited by 364 (20 self)
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
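The idea of updating a prior in the light of a single observation can be shown with a deliberately simple toy case. This is not the paper's object model: it assumes one scalar parameter (a Gaussian mean) with known observation noise, so the posterior has a closed form. The function name `gaussian_posterior` and all numbers are hypothetical.

```python
def gaussian_posterior(prior_mean, prior_var, observations, noise_var):
    """Conjugate update for the mean of a Gaussian with known noise variance.

    Toy assumption: the prior (e.g. learned from earlier categories) is
    N(prior_mean, prior_var) and each observation is N(mean, noise_var).
    """
    n = len(observations)
    # Precisions (inverse variances) add: prior precision + n * data precision.
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    # Posterior mean is the precision-weighted blend of prior mean and data.
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


# One training example only: ML would return the observation itself (2.0),
# while the posterior is pulled toward the prior and keeps an uncertainty.
mean, var = gaussian_posterior(prior_mean=0.0, prior_var=1.0,
                               observations=[2.0], noise_var=1.0)
```

With equal prior and noise variances, the single observation and the prior are weighted equally, which is the kind of regularization a prior provides when training examples are scarce.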
Human detection using oriented histograms of flow and appearance
- In ECCV, 2006
Cited by 283 (20 self)
Detecting humans in films and videos is a challenging problem owing to the motion of the subjects, the camera and the background and to variations in pose, appearance, clothing, illumination and background clutter. We develop a detector for standing and moving people in videos with possibly moving cameras and backgrounds, testing several different motion coding schemes and showing empirically that oriented histograms of differential optical flow give the best overall performance. These motion-based descriptors are combined with our Histogram of Oriented Gradient appearance descriptors. The resulting detector is tested on several databases including a challenging test set taken from feature films and containing wide ranges of pose, motion and background variations, including moving cameras and backgrounds. We validate our results on two challenging test sets containing more than 4400 human examples. The combined detector reduces the false alarm rate by a factor of 10 relative to the best appearance-based detector, for example giving false alarm rates of 1 per 20,000 windows tested at 8% miss rate on our Test Set 1.
Geometric context from a single image.
- In Proc. Int. Conf. on Computer Vision, 2005
Recovering human body configurations: Combining segmentation and recognition
- In CVPR, 2004
Cited by 215 (8 self)
The goal of this work is to take an image such as the one in Figure 1(a), detect a human figure, and localize his joints and limbs (b) along with their associated pixel masks (c). In this work we attempt to tackle this problem in a general setting. The dataset we use is a collection of sports news photographs of baseball players, varying dramatically in pose and clothing. The approach that we take is to use segmentation to guide our recognition algorithm to salient bits of the image. We use this segmentation approach to build limb and torso detectors, the outputs of which are assembled into human figures. We present quantitative results on torso localization, in addition to shortlisted full body configurations.
Pictorial structures revisited: People detection and articulated pose estimation
- In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009
Cited by 211 (17 self)
Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary, and proposes a generic approach based on the pictorial structures framework. We show that the right selection of components for both appearance and spatial modeling is crucial for general applicability and overall performance of the model. The appearance of body parts is modeled using densely sampled shape context descriptors and discriminatively trained AdaBoost classifiers. Furthermore, we interpret the normalized margin of each classifier as likelihood in a generative model. Non-Gaussian relationships between parts are represented as Gaussians in the coordinate system of the joint between parts. The marginal posterior of each part is inferred using belief propagation. We demonstrate that such a model is equally suitable for both detection and pose estimation tasks, outperforming the state of the art on three recently proposed datasets.
Pedestrian Detection: An Evaluation of the State of the Art
- Submission to IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 174 (10 self)
Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images has grown steadily. However, multiple datasets and widely varying evaluation protocols are used, making direct comparisons difficult. To address these shortcomings, we perform an extensive evaluation of the state of the art in a unified framework. We make three primary contributions: (1) we put together a large, well-annotated and realistic monocular pedestrian detection dataset and study the statistics of the size, position and occlusion patterns of pedestrians in urban scenes, (2) we propose a refined per-frame evaluation methodology that allows us to carry out probing and informative comparisons, including measuring performance in relation to scale and occlusion, and (3) we evaluate the performance of sixteen pre-trained state-of-the-art detectors across six datasets. Our study allows us to assess the state of the art and provides a framework for gauging future efforts. Our experiments show that despite significant progress, performance still has much room for improvement. In particular, detection is disappointing at low resolutions and for partially occluded pedestrians.
Monocular Pedestrian Detection: Survey and Experiments
- 2008
Cited by 153 (13 self)
Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75] and combined shape-texture detection [23]. Experiments are performed on an extensive dataset captured on-board a vehicle driving through an urban environment. The dataset includes many thousands of training samples as well as a 27-minute test sequence involving more than 20000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection on-board a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The dataset (8.5GB) is made public for benchmarking purposes.