Pictorial Structures for Object Recognition
 IJCV
, 2003
Abstract

Cited by 816 (15 self)
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
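The objective this abstract describes, a sum of per-part appearance costs plus spring-like deformation costs between connected parts, can be sketched with a brute-force minimizer over a toy grid. This is only an illustration of the energy being minimized, not the paper's method (the paper's contribution includes minimizing it efficiently with distance transforms); `match_cost`, the part names, the rest offset, and the grid are all invented for the example.

```python
import itertools

# A 4x4 grid of candidate part locations (illustrative only).
GRID = [(x, y) for x in range(4) for y in range(4)]

def match_cost(part, loc):
    # Placeholder appearance cost: distance from a made-up "true" location.
    true = {"head": (1, 0), "torso": (1, 2)}[part]
    return abs(loc[0] - true[0]) + abs(loc[1] - true[1])

def spring_cost(loc_a, loc_b, rest=(0, 2)):
    # Quadratic "spring" penalty on the relative offset of two parts.
    dx = (loc_b[0] - loc_a[0]) - rest[0]
    dy = (loc_b[1] - loc_a[1]) - rest[1]
    return dx * dx + dy * dy

def best_configuration():
    # Jointly minimize appearance + deformation cost over both parts
    # (the pictorial-structures energy, here by exhaustive search).
    return min(
        itertools.product(GRID, GRID),
        key=lambda cfg: match_cost("head", cfg[0])
        + match_cost("torso", cfg[1])
        + spring_cost(cfg[0], cfg[1]),
    )

head, torso = best_configuration()
print(head, torso)
```

On this toy problem the minimum places the head at its best appearance location and the torso one spring rest-offset below it; for tree-structured part graphs the same minimization factors so it can be done without exhaustive enumeration.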
Real-time human pose recognition in parts from single depth images
 In CVPR
, 2011
Abstract

Cited by 568 (17 self)
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
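The per-pixel classifier in this system is built on simple depth-comparison features: the difference of two depth probes around the pixel, with the probe offsets normalized by the depth at the pixel itself so the feature is invariant to the subject's distance from the camera. A minimal sketch of that feature, with an invented toy depth map and offsets:

```python
import numpy as np

def depth_feature(depth, x, y, u, v, big=1e6):
    """Depth-invariant two-probe comparison feature at pixel (x, y).
    u, v are pixel offsets; the toy values below are illustrative."""
    d0 = depth[y, x]
    def probe(off):
        # Scale the offset by 1/depth so nearer people get wider probes.
        px, py = x + int(off[0] / d0), y + int(off[1] / d0)
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return big  # off-image probes return a large constant depth
    return probe(u) - probe(v)

depth = np.full((8, 8), 4.0)   # background at 4 m
depth[2:6, 2:6] = 2.0          # a "body" patch nearer the camera
print(depth_feature(depth, 4, 4, (4, 0), (-4, 0)))
```

Here one probe lands on the body (2.0) and the other on the background (4.0), so the feature is large and positive; a forest of such weak tests is what drives the per-pixel body-part classification.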
Ramanan D., Forsyth D.A., Zisserman A.: Tracking people by learning their appearance.
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2007
Automatic annotation of everyday movements
, 2003
Abstract

Cited by 88 (5 self)
This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The motion capture data is manually annotated using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.
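Step (2) above, matching 2D tracks against annotated motion capture, can be sketched as a nearest-neighbor search over snippets: pick the annotation whose projected 2D joint track is closest to the video track. The distance measure, snippets, and labels below are invented for illustration; the actual system's matching is more sophisticated.

```python
def track_distance(track_a, track_b):
    # Sum of squared distances over corresponding frames of one joint.
    return sum(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in zip(track_a, track_b)
    )

def annotate(video_track, mocap_snippets):
    # mocap_snippets maps annotation label -> projected 2D track;
    # return the label of the best-matching snippet.
    return min(
        mocap_snippets,
        key=lambda label: track_distance(video_track, mocap_snippets[label]),
    )

video = [(0, 0), (1, 0), (2, 0)]          # tracked joint moving rightward
mocap = {
    "walk": [(0, 0), (1, 0), (2, 0)],     # horizontal motion
    "jump": [(0, 0), (0, 2), (0, 4)],     # vertical motion
}
print(annotate(video, mocap))
```

Because the mocap snippets carry composable annotations, the matched snippet's labels transfer directly to the video segment.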
Guiding Model Search Using Segmentation
, 2005
Abstract

Cited by 84 (0 self)
... paradigm can be used to improve the efficiency and accuracy of model search in an image. We operationalize this idea using an over-segmentation of an image into superpixels. The problem domain we explore is human body pose estimation from still images. The superpixels prove useful in two ways. First, we restrict the joint positions in our human body model to lie at centers of superpixels, which reduces the size of the model search space. In addition, accurate support masks for computing features on half-limbs of the body model are obtained by using agglomerations of superpixels as half-limb segments. We present results on a challenging dataset of people in sports news images.
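The first use of superpixels above is a search-space reduction: candidate joint positions go from every pixel to every superpixel center. A toy sketch of the count, using regular 4x4 blocks as stand-in "superpixels" (a real over-segmentation is data-dependent and irregular):

```python
def superpixel_centers(h, w, block=4):
    # Stand-in over-segmentation: regular blocks instead of real
    # superpixels; one candidate joint location per block center.
    centers = []
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            centers.append((x0 + block // 2, y0 + block // 2))
    return centers

h, w = 64, 64
pixel_candidates = h * w              # naive per-joint search space
centers = superpixel_centers(h, w)    # restricted search space
print(pixel_candidates, len(centers))
```

Even this crude 16x reduction per joint compounds across joints, which is why restricting the model to superpixel centers makes search tractable.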
Measure locally, reason globally: Occlusion-sensitive articulated pose estimation
 In CVPR
, 2006
Abstract

Cited by 83 (3 self)
Part-based tree-structured models have been widely used for 2D articulated human pose estimation. These approaches admit efficient inference algorithms while capturing the important kinematic constraints of the human body as a graphical model. These methods often fail, however, when multiple body parts fit the same image region, resulting in global pose estimates that poorly explain the overall image evidence. Attempts to solve this problem have focused on the use of strong prior models that are limited to learned activities such as walking. We argue that the problem actually lies with the image observations and not with the prior. In particular, image evidence for each body part is estimated independently of other parts without regard to self-occlusion. To address this we introduce occlusion-sensitive local likelihoods that approximate the global image likelihood using per-pixel hidden binary variables that encode the occlusion relationships between parts. This occlusion reasoning introduces interactions between non-adjacent body parts, creating loops in the underlying graphical model. We deal with this using an extension of an approximate belief propagation algorithm (PAMPAS). The algorithm recovers the real-valued 2D pose of the body in the presence of occlusions, does not require strong priors over body pose and does a quantitatively better job of explaining image evidence than previous methods.
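The core idea of the occlusion-sensitive likelihood is that per-pixel binary visibility variables gate which pixels a part is allowed to explain, so an occluded limb does not claim evidence already owned by the part in front of it. A toy sketch with invented masks and scores (the paper marginalizes over these variables rather than fixing them):

```python
import numpy as np

def occlusion_sensitive_score(part_mask, occluder_mask, pixel_scores):
    # visible = pixels covered by the part AND not covered by an occluder;
    # this is the effect of setting the per-pixel binary variables.
    visible = part_mask & ~occluder_mask
    # Only visible pixels contribute image evidence for this part.
    return float(pixel_scores[visible].sum())

part = np.zeros((4, 4), bool)
part[:, 1] = True            # a vertical "limb" one pixel wide
occ = np.zeros((4, 4), bool)
occ[2:, :] = True            # lower half of the image is occluded
scores = np.ones((4, 4))     # uniform per-pixel evidence

print(occlusion_sensitive_score(part, occ, scores))
```

An occlusion-blind likelihood would score all 4 limb pixels; the gated version scores only the 2 visible ones, which is what keeps two parts from double-counting the same image region.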
Action Recognition from a Distributed Representation of Pose and Appearance
Abstract

Cited by 79 (5 self)
We present a distributed representation of pose and appearance of people called the “poselet activation vector”. First we show that this representation can be used to estimate the pose of people defined by the 3D orientations of the head and torso in the challenging PASCAL VOC 2010 person detection dataset. Our method is robust to clutter, aspect and viewpoint variation and works even when body parts like faces and limbs are occluded or hard to localize. We combine this representation with other sources of information like interaction with objects and other people in the image and use it for action recognition. We report competitive results on the PASCAL VOC 2010 static image action classification challenge.
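One simple reading of a "poselet activation vector" is a fixed-length feature holding, for each poselet, the strongest detector activation inside the person's bounds. The sketch below assumes that reading; detector scores are invented, and a real system would run trained poselet detectors over the image.

```python
def poselet_activation_vector(detections, num_poselets):
    """detections: list of (poselet_id, score) pairs for one person.
    Returns one activation per poselet (0.0 if the poselet never fired)."""
    pav = [0.0] * num_poselets
    for pid, score in detections:
        pav[pid] = max(pav[pid], score)  # keep the strongest activation
    return pav

# Two firings of poselet 0 and one of poselet 2; poselets 1, 3 silent.
dets = [(0, 0.7), (0, 0.4), (2, 0.9)]
print(poselet_activation_vector(dets, 4))
```

Because the vector has a fixed length regardless of which parts are visible, it can feed a standard classifier for pose or action even when faces or limbs are occluded.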
Visual hand tracking using nonparametric belief propagation
 IEEE Workshop on Generative Model Based Vision
, 2004
Abstract

Cited by 71 (1 self)
This paper develops probabilistic methods for visual tracking of a three-dimensional geometric hand model from monocular image sequences. We consider a redundant representation in which each model component is described by its position and orientation in the world coordinate frame. A prior model is then defined which enforces the kinematic constraints implied by the model’s joints. We show that this prior has a local structure, and is in fact a pairwise Markov random field. Furthermore, our redundant representation allows color and edge-based likelihood measures, such as the Chamfer distance, to be similarly decomposed in cases where there is no self-occlusion. Given this graphical model of hand kinematics, we may track the hand’s motion using the recently proposed nonparametric belief propagation (NBP) algorithm. Like particle filters, NBP approximates the posterior distribution over hand configurations as a collection of samples. However, NBP uses the graphical structure to greatly reduce the dimensionality of these distributions, providing improved robustness. Several methods are used to improve NBP’s computational efficiency, including a novel KD-tree based method for fast Chamfer distance evaluation. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences.
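The NBP update this abstract relies on can be caricatured in one dimension: a message from one part to its neighbor is formed by sampling the sender's particle-based belief, pushing each sample through the pairwise kinematic potential, and weighting by the receiver's local likelihood. Everything below (the 1-D state, offset, noise level, likelihood) is a toy stand-in, not the paper's model.

```python
import random

def nbp_message(neighbor_particles, kinematic_offset, likelihood,
                n=100, sigma=0.1, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.choice(neighbor_particles)              # sample the belief
        y = x + kinematic_offset + rng.gauss(0, sigma)  # pairwise potential
        out.append((y, likelihood(y)))                  # weight by evidence
    return out

palm = [0.0, 0.05, -0.02]      # particles for one palm coordinate
# Toy likelihood: strong evidence for a finger near coordinate 1.0.
lik = lambda y: 1.0 if 0.9 < y < 1.1 else 0.1
msg = nbp_message(palm, kinematic_offset=1.0, likelihood=lik)
print(sum(w for _, w in msg) / len(msg))  # average weight stays high
```

Each message is itself a weighted particle set, so beliefs stay continuous-valued; the graph structure keeps each such set low-dimensional, which is the robustness advantage over a single high-dimensional particle filter.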
Distributed occlusion reasoning for tracking with nonparametric belief propagation
 In NIPS
, 2004
Abstract

Cited by 60 (0 self)
We describe a three-dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self-occlusions created by the imaging process lead to complex interdependencies in color and edge-based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences.
Attractive people: Assembling loose-limbed models using nonparametric belief propagation
 In NIPS, 2004
Abstract

Cited by 60 (2 self)
The detection and pose estimation of people in images and video is made challenging by the variability of human appearance and the high dimensionality of articulated body models. To cope with these problems we exploit rich image likelihood models and represent the 3D human body using a graphical model in which the relationships between the body parts are represented by conditional probability distributions. We formulate the pose estimation problem as one of probabilistic inference over a graphical model where the random variables correspond to the individual limb parameters (position and orientation). Because the limbs are described by 6-dimensional vectors encoding pose in 3-space, discretization is impractical and the random variables in our model must be continuous-valued. To approximate belief propagation in such a graph we exploit a recently introduced generalization of the particle filter. This framework facilitates the automatic initialization of the body model from low-level cues and is robust to occlusion of body parts and scene clutter.
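The "loose-limbed" idea above can be sketched as follows: each limb is a continuous 6-D state (3-D position plus orientation), and connected limbs are related by a conditional distribution rather than a rigid joint, so a child limb is sampled as the parent's state plus a kinematic offset plus slack. The Gaussian conditional, the offset, and the noise level below are all illustrative assumptions.

```python
import random

def sample_child_limb(parent, offset, sigma=0.05, rng=None):
    """parent, offset: 6-vectors (x, y, z, roll, pitch, yaw).
    Child = parent + kinematic offset + Gaussian slack: limbs are
    "loosely" attached rather than rigidly articulated."""
    rng = rng or random.Random(0)
    return [p + o + rng.gauss(0.0, sigma) for p, o in zip(parent, offset)]

torso = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
# Invented offset: upper arm displaced to the side and rotated about yaw.
upper_arm_offset = [0.2, 0.0, 0.0, 0.0, 0.0, 1.57]
arm = sample_child_limb(torso, upper_arm_offset)
print(len(arm))
```

Because the conditionals are sampled rather than enumerated, the continuous 6-D limb states never need discretizing, which is exactly why the particle-based belief propagation generalization fits this model.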