Results 1 - 10 of 36
Articulated Human Detection with Flexible Mixtures-of-Parts
Cited by 64 (2 self)
We describe a method for articulated human detection and human pose estimation in static images based on a new representation of deformable part models. Rather than modeling articulation using a family of warped (rotated and foreshortened) templates, we use a mixture of small, non-oriented parts. We describe a general, flexible mixture model that jointly captures spatial relations between part locations and co-occurrence relations between part mixtures, augmenting standard pictorial structure models that encode just spatial relations. Our models have several notable properties: (1) they efficiently model articulation by sharing computation across similar warps, (2) they efficiently model an exponentially large set of global mixtures through composition of local mixtures, and (3) they capture the dependency of global geometry on local appearance (parts look different at different locations). When relations are tree-structured, our models can be efficiently optimized with dynamic programming. We learn all parameters, including local appearances, spatial relations, and co-occurrence relations (which encode local rigidity), with a structured SVM solver. Because our model is efficient enough to be used as a detector that searches over scales and image locations, we introduce novel criteria for evaluating pose estimation and human detection, both separately and jointly. We show that currently used evaluation criteria may conflate these two issues. Most previous approaches model limbs with rigid and articulated templates that are trained independently of each other, while we present an extensive diagnostic evaluation that suggests that flexible structure and joint training are crucial for strong performance. We present experimental results on standard benchmarks that suggest our approach is the state-of-the-art system for pose estimation, improving past work on the challenging Parse and Buffy datasets, while being orders of magnitude faster.
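The abstract above notes that tree-structured models can be optimized exactly with dynamic programming. The following is a minimal, generic sketch of max-sum dynamic programming on a tree, not the authors' implementation. The function name, the argument layout, and the assumption that nodes are ordered so that every parent precedes its children are all hypothetical choices made for illustration.

```python
def best_tree_configuration(unary, parent, pairwise):
    """Exact max-sum inference on a tree-structured model (illustrative sketch).

    unary[i][s]       : score of node i taking state s
    parent[i]         : index of node i's parent (None for the root, node 0);
                        assumes parent[i] < i for every non-root node
    pairwise[i][s][t] : score of node i in state s with its parent in state t
    Returns (best_score, states), where states[i] is the chosen state of node i.
    """
    n = len(unary)
    # total[i][s]: best score of the subtree rooted at i when i is in state s.
    total = [list(u) for u in unary]
    # argbest[i][t]: best state of node i given that its parent is in state t.
    argbest = [None] * n

    # Leaves-to-root pass. Children have larger indices than their parents,
    # so total[i] is complete by the time node i is processed.
    for i in range(n - 1, 0, -1):
        p = parent[i]
        choice = []
        for t in range(len(unary[p])):
            scores = [total[i][s] + pairwise[i][s][t]
                      for s in range(len(unary[i]))]
            best_s = max(range(len(scores)), key=scores.__getitem__)
            total[p][t] += scores[best_s]
            choice.append(best_s)
        argbest[i] = choice

    # Root-to-leaves pass: read off the maximizing states.
    states = [0] * n
    states[0] = max(range(len(total[0])), key=total[0].__getitem__)
    best_score = total[0][states[0]]
    for i in range(1, n):
        states[i] = argbest[i][states[parent[i]]]
    return best_score, states
```

In a part-based pose model, each state could stand for a (location, mixture type) pair of one part, with the pairwise table holding the spatial and co-occurrence scores between a part and its parent. That mapping is an assumption of this sketch and is not taken from the paper.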
Learning hierarchical poselets for human parsing
- In CVPR’11
Cited by 51 (2 self)
We consider the problem of human parsing with part-based models. Most previous work in part-based models only considers rigid parts (e.g. torso, head, half limbs) guided by human anatomy. We argue that this representation of parts is not necessarily appropriate for human parsing. In this paper, we introduce hierarchical poselets – a new representation for human parsing. Hierarchical poselets can be rigid parts, but they can also be parts that cover large portions of human bodies (e.g. torso + left arm). In the extreme case, they can be the whole bodies. We develop a structured model to organize poselets in a hierarchical way and learn the model parameters in a max-margin framework. We demonstrate the superior performance of our proposed approach on two datasets with aggressive pose variations.
Parsing human motion with stretchable models
- In CVPR, 2011
Cited by 45 (4 self)
We address the problem of articulated human pose estimation in videos using an ensemble of tractable models with rich appearance, shape, contour and motion cues. In previous articulated pose estimation work on unconstrained videos, using temporal coupling of limb positions has made little to no difference in performance over parsing frames individually [8, 28]. One crucial reason for this is that joint parsing of multiple articulated parts over time involves intractable inference and learning problems, and previous work has resorted to approximate inference and simplified models. We overcome these computational and modeling limitations using an ensemble of tractable submodels which couple locations of body joints within and across frames using expressive cues. Each submodel is responsible for tracking a single joint through time (e.g., left elbow) and also models the spatial arrangement of all joints in a single frame. Because of the tree structure of each submodel, we can perform efficient exact inference and use rich temporal features that depend on image appearance, e.g., color tracking and optical flow contours. We propose and experimentally investigate a hierarchy of submodel combination methods, and we find that a highly efficient max-marginal combination method outperforms much slower (by orders of magnitude) approximate inference using dual decomposition. We apply our pose model on a new video dataset of highly varied and articulated poses from TV shows. We show significant quantitative and qualitative improvements over state-of-the-art single-frame pose estimation approaches.
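The abstract above mentions combining tractable submodels through their max-marginals. The sketch below shows one plausible reading of that idea under strong simplifying assumptions and is not taken from the paper. Each submodel is reduced to a chain rather than a tree over joints and frames, all submodels share the same variables and state sets, and the combination rule (sum the max-marginals, then take a per-variable argmax) is an assumption. The function names are hypothetical.

```python
def chain_max_marginals(unary, pairwise):
    """Max-marginals of a chain model (illustrative sketch).

    unary[i][s]       : score of variable i in state s
    pairwise[i][s][t] : score of variable i in state s and variable i+1 in state t
    Returns mm, where mm[i][s] is the best total score over all
    configurations in which variable i is fixed to state s.
    """
    n = len(unary)
    # fwd[i][t]: best score of variables 0..i with variable i in state t.
    fwd = [list(unary[0])]
    for i in range(1, n):
        fwd.append([
            unary[i][t] + max(fwd[i - 1][s] + pairwise[i - 1][s][t]
                              for s in range(len(unary[i - 1])))
            for t in range(len(unary[i]))
        ])
    # bwd[i][s]: best score of variables i+1..n-1 given variable i in state s.
    bwd = [None] * n
    bwd[n - 1] = [0.0] * len(unary[n - 1])
    for i in range(n - 2, -1, -1):
        bwd[i] = [
            max(pairwise[i][s][t] + unary[i + 1][t] + bwd[i + 1][t]
                for t in range(len(unary[i + 1])))
            for s in range(len(unary[i]))
        ]
    return [[fwd[i][s] + bwd[i][s] for s in range(len(unary[i]))]
            for i in range(n)]


def combine_by_max_marginals(models):
    """Pick each variable's state by summing max-marginals over submodels.

    models: list of (unary, pairwise) pairs defined over the same variables.
    Returns a list with one chosen state index per variable.
    """
    per_model = [chain_max_marginals(u, p) for u, p in models]
    chosen = []
    for i in range(len(per_model[0])):
        k = len(per_model[0][i])
        summed = [sum(mm[i][s] for mm in per_model) for s in range(k)]
        chosen.append(max(range(k), key=summed.__getitem__))
    return chosen
```

Each submodel needs only one forward and one backward pass, and the combination step is a per-variable sum and argmax, which is why a scheme of this kind can be much cheaper than iterative approximate inference.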
Learning effective human pose estimation from inaccurate annotation
- In CVPR, 2011
Cited by 42 (1 self)
The task of 2-D articulated human pose estimation in natural images is extremely challenging due to the high level of variation in human appearance. These variations arise from different clothing, anatomy, imaging conditions and the large number of poses it is possible for a human body to take. Recent work has shown state-of-the-art results by partitioning the pose space and using strong nonlinear classifiers such that the pose dependence and multi-modal nature of body part appearance can be captured. We propose to extend these methods to handle much larger quantities of training data, an order of magnitude larger than current datasets, and show how to utilize Amazon Mechanical Turk and a latent annotation update scheme to achieve high quality annotations at low cost. We demonstrate a significant increase in pose estimation accuracy, while simultaneously reducing computational expense by a factor of 10, and contribute a dataset of 10,000 highly articulated poses.
Human Pose Estimation using Body Parts Dependent Joint Regressors
Cited by 31 (6 self)
In this work, we address the problem of estimating 2D human pose from still images. Recent methods that rely on discriminatively trained deformable parts organized in a tree model have been shown to be very successful in solving this task. Within such a pictorial structure framework, we address the problem of obtaining good part templates by proposing novel, non-linear joint regressors. In particular, we employ two-layered random forests as joint regressors. The first layer acts as a discriminative, independent body part classifier. The second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This results in a pose estimation framework that already takes dependencies between body parts into account during joint localization and is thus able to circumvent typical ambiguities of tree structures, such as for legs and arms. In the experiments, we demonstrate that our body parts dependent joint regressors achieve a higher joint localization accuracy than tree-based state-of-the-art methods.
2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images
Cited by 29 (4 self)
We present a technique for estimating the spatial layout of humans in still images – the position of the head, torso and arms. The theme we explore is that once a person is localized using an upper body detector, the search for their body parts can be considerably simplified using weak constraints on position and appearance arising from that detection. Our approach is capable of estimating upper body pose in highly challenging uncontrolled images, without prior knowledge of background, clothing, lighting, or the location and scale of the person in the image. People are only required to be upright and seen from the front or the back (not side). We evaluate the stages of our approach experimentally using ground truth layout annotation on a variety of challenging material, such as images from the PASCAL VOC 2008 challenge and video frames from TV shows and feature films. We also propose and evaluate techniques for searching a video dataset for people in a specific pose. To this end, we develop three new pose descriptors and compare their classification and retrieval performance to two baselines built on state-of-the-art object detection models. Keywords: articulated human pose estimation, search, retrieval
Poselet conditioned pictorial structures
- In CVPR, 2013
Cited by 22 (4 self)
In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks, improving or being on par with the state of the art in each case.
An efficient branch-and-bound algorithm for optimal human pose estimation
Cited by 22 (3 self)
Human pose estimation in a static image is a challenging problem in computer vision in that body part configurations are often subject to severe deformations and occlusions. Moreover, efficient pose estimation is often a desirable requirement in many applications. The trade-off between accuracy and efficiency has been explored in a large number of approaches. On the one hand, models with simple representations (like tree or star models) can be efficiently applied in pose estimation problems. However, these models are often prone to body part misclassification errors. On the other hand, models with rich representations (i.e., loopy graphical models) are theoretically more robust, but their inference complexity may increase dramatically. In this work, we propose an efficient and exact inference algorithm based on branch-and-bound to solve the human pose estimation problem on loopy graphical models. We show that our method is empirically much faster (about 74 times) than the state-of-the-art exact inference algorithm [21]. By extending a state-of-the-art tree model [16] to a loopy graphical model, we show that the estimation accuracy improves for most of the body parts (especially lower arms) on popular datasets such as Buffy [7] and Stickmen [5]. Finally, our method can be used to exactly solve most of the inference problems on Stretchable Models [18] (which contain a few hundred variables) in just a few minutes.
MODEC: Multimodal Decomposable Models for Human Pose Estimation
Cited by 21 (3 self)
We propose a multimodal, decomposable model for articulated human pose estimation in monocular images. A typical approach to this problem is to use a linear structured model, which struggles to capture the wide range of appearance present in realistic, unconstrained images. In this paper, we instead propose a model of human pose that explicitly captures a variety of pose modes. Unlike other multimodal models, our approach includes both global and local pose cues and uses a convex objective and joint training for mode selection and pose estimation. We also employ a cascaded mode selection step which controls the trade-off between speed and accuracy, yielding a 5x speedup in inference and learning. Our model outperforms state-of-the-art approaches across the accuracy-speed trade-off curve for several pose datasets. This includes our newly collected dataset of people in movies, FLIC, which contains an order of magnitude more labeled data for training and testing than existing datasets. The new dataset and code are available online.
Strong appearance and expressive spatial models for human pose estimation
- In: ICCV. IEEE (2013)
Cited by 18 (4 self)
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state of the art in articulated pose estimation in two ways. First, we explore various types of appearance representations aiming to substantially improve the body part hypotheses. Second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structure spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.