DeepPose: Human pose estimation via deep neural networks. In CVPR, 2014.
Cited by 41 (1 self)
Figure 1. Besides extreme variability in articulations, many of the joints are barely visible. We can guess the location of the right arm in the left image only because we see the rest of the pose and anticipate the motion or activity of the person. Similarly, the left half of the body of the person on the right is not visible at all. These are examples of the need for holistic reasoning. We believe that DNNs can naturally provide this type of reasoning.
We propose a method for human pose estimation based on Deep Neural Networks (DNNs). Pose estimation is formulated as a DNN-based regression problem towards body joints. We present a cascade of such DNN regressors, which results in high-precision pose estimates. The approach has the advantage of reasoning about pose in a holistic fashion and has a simple yet powerful formulation that capitalizes on recent advances in deep learning. We present a detailed empirical analysis with state-of-the-art or better performance on four academic benchmarks of diverse real-world images.
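The cascade of regressors described above can be illustrated with a minimal sketch. It assumes joint coordinates are normalized relative to a bounding box (subtract the box centre, divide by its width and height) and that each later stage regresses a normalized displacement from a smaller box centred on the current joint estimate. The callables `initial` and `refine` and the `crop_scale` parameter are hypothetical stand-ins for trained DNN regressors, not the paper's networks or settings.

```python
import numpy as np


def normalize(y, box):
    """Box-relative coordinates: subtract the box centre, divide by box width/height."""
    cx, cy, w, h = box
    return (np.asarray(y, dtype=float) - [cx, cy]) / [w, h]


def denormalize(y_n, box):
    """Inverse of normalize: map box-relative coordinates back to image coordinates."""
    cx, cy, w, h = box
    return np.asarray(y_n, dtype=float) * [w, h] + [cx, cy]


def cascade_predict(image, initial, refiners, box, crop_scale=0.5):
    """Stage 1 regresses all k joints from the full box; each later stage refines
    every joint from a smaller sub-box centred on that joint's current estimate.

    initial(image, box) -> (k, 2) normalized joints               (hypothetical)
    refine(image, sub_box, joint_index) -> (2,) normalized delta  (hypothetical)
    Boxes are (centre_x, centre_y, width, height); crop_scale is an assumed value.
    """
    y = denormalize(initial(image, box), box)  # (k, 2) in image coordinates
    for refine in refiners:
        new_y = y.copy()
        for i in range(len(y)):
            sub_box = (y[i, 0], y[i, 1], box[2] * crop_scale, box[3] * crop_scale)
            # The sub-box is centred on the current estimate, so denormalizing the
            # predicted displacement yields the refined absolute position.
            new_y[i] = denormalize(refine(image, sub_box, i), sub_box)
        y = new_y
    return y
```

With stand-in regressors that always return the same output, each refinement stage moves every joint by the predicted displacement scaled to the sub-box size. The intent of such a design is that later stages look at higher-resolution context around each joint.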
C.: Joint training of a convolutional network and a graphical model for human pose estimation, 2014.
Cited by 30 (2 self)
This paper proposes a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
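The idea of exploiting geometric relationships between joints can be illustrated with a much simplified, hypothetical sketch: one joint's detector heatmap is rescored by convolving a neighbouring joint's heatmap with a displacement prior, similar to a single message-passing step. This is not the paper's spatial model or its joint training; the function names, the kernel offset convention and the `bias` term are assumptions made for illustration.

```python
import numpy as np


def conv2d_same(x, k):
    """'Same'-size 2-D convolution with zero padding; k must have odd side lengths."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding (default mode="constant")
    kf = k[::-1, ::-1]                    # flip the kernel: convolution, not correlation
    out = np.zeros(x.shape, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kf)
    return out


def refine_with_neighbor(unary_a, unary_b, prior_a_given_b, bias=1e-3):
    """Rescore joint A's heatmap using joint B's heatmap.

    Kernel entry [ph + dy, pw + dx] is the assumed plausibility that A sits at
    B + (dy, dx). Convolving B's heatmap with it spreads B's evidence to where A
    is expected; the small bias keeps A's own detections alive without support.
    """
    support = conv2d_same(unary_b, prior_a_given_b) + bias
    refined = unary_a * support
    return refined / refined.sum()  # renormalize to a distribution over pixels
```

For example, if A's detector fires equally at two locations but B's heatmap combined with the prior supports only one of them, the refined heatmap concentrates almost all of its mass on the supported location.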
B.: Strong appearance and expressive spatial models for human pose estimation. In ICCV, IEEE, 2013.
Cited by 18 (4 self)
Typical approaches to articulated pose estimation combine spatial modelling of the human body with appearance modelling of body parts. This paper aims to push the state of the art in articulated pose estimation in two ways. First, we explore various types of appearance representations aiming to substantially improve the body part hypotheses. Second, we draw on and combine several recently proposed powerful ideas such as more flexible spatial models as well as image-conditioned spatial models. In a series of experiments we draw several important conclusions: (1) we show that the proposed appearance representations are complementary; (2) we demonstrate that even a basic tree-structured spatial human body model achieves state-of-the-art performance when augmented with the proper appearance representation; and (3) we show that the combination of the best-performing appearance model with a flexible image-conditioned spatial model achieves the best result, significantly improving over the state of the art, on the “Leeds Sports Poses” and “Parse” benchmarks.
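A basic tree-structured spatial model of this kind is commonly solved exactly by dynamic programming. The sketch below is a generic max-sum inference over discrete candidate locations with a quadratic deformation cost; it is not the paper's model, and the argument layout (parents indexed before children, one expected offset per part, a single deformation weight `w`) is an assumption made for illustration.

```python
import numpy as np


def tree_map_inference(cands, unary, parent, offset, w=1.0):
    """Exact MAP configuration of a tree-structured part model.

    cands[i]:  (n_i, 2) candidate locations for part i
    unary[i]:  (n_i,) appearance scores (higher is better)
    parent[i]: index of part i's parent; part 0 is the root and every parent
               index is smaller than the indices of its children
    offset[i]: expected displacement of part i from its parent (unused for root)
    Returns (candidate index per part, total score).
    """
    cands = [np.asarray(c, dtype=float) for c in cands]
    total = [np.asarray(u, dtype=float).copy() for u in unary]  # unary + messages
    best_child = [None] * len(cands)
    for c in range(len(cands) - 1, 0, -1):        # leaves first, root last
        p = parent[c]
        expected = cands[p][:, None, :] + np.asarray(offset[c], dtype=float)
        diff = cands[c][None, :, :] - expected     # (n_p, n_c, 2) deviations
        score = total[c][None, :] - w * np.sum(diff ** 2, axis=-1)  # (n_p, n_c)
        best_child[c] = np.argmax(score, axis=1)  # best child candidate per parent candidate
        total[p] += np.max(score, axis=1)         # max-sum message to the parent
    choice = [0] * len(cands)
    choice[0] = int(np.argmax(total[0]))
    for c in range(1, len(cands)):                # backtrack from the root downwards
        choice[c] = int(best_child[c][choice[parent[c]]])
    return choice, float(total[0][choice[0]])
```

Without a deformation penalty (`w=0`) each part simply takes its best-scoring candidate; as `w` grows, the parts are pulled towards a configuration consistent with the expected offsets.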
2D Human Pose Estimation: New Benchmark and State of the Art Analysis
Cited by 11 (2 self)
Human pose estimation has made significant progress in recent years. However, current datasets are limited in their coverage of the overall pose estimation challenges. Still, these serve as the common sources to evaluate, train and compare different models on. In this paper we introduce a novel benchmark, “MPII Human Pose”, that makes a significant advance in terms of diversity and difficulty, a contribution that we feel is required for future developments in human body models. This comprehensive dataset was collected using an established taxonomy of over 800 human activities [1]. The collected images cover a wider variety of human activities than previous datasets, including various recreational, occupational and household activities, and capture people from a wider range of viewpoints. We provide a rich set of labels including positions of body joints, full 3D torso and head orientation, occlusion labels for joints and body parts, and activity labels. For each image we provide adjacent video frames to facilitate the use of motion information. Given these rich annotations, we perform a detailed analysis of leading human pose estimation approaches and gain insights into the successes and failures of these methods.
Multi-source deep learning for human pose estimation. In CVPR, 2014.
Cited by 9 (4 self)
Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation. This paper proposes to build a multi-source deep model in order to extract non-linear representations from these different information sources. With the deep model, the global, high-order human body articulation patterns in these information sources are extracted for pose estimation. The task of estimating body locations and the task of human detection are jointly learned using a unified deep model. The proposed approach can be viewed as post-processing of pose estimation results and can flexibly integrate with existing methods by taking their information sources as input. By extracting non-linear representations from multiple information sources, the deep model outperforms the state of the art by up to 8.6 percent on three public benchmark datasets.
Do convnets learn correspondence? In NIPS, 2014.
Cited by 8 (2 self)
Convolutional neural nets (convnets) trained from massive labeled datasets [1] have substantially improved the state of the art in image classification [2] and object detection [3]. However, visual understanding requires establishing correspondence on a finer level than object category. Given their large pooling regions and training from whole-image labels, it is not clear that convnets derive their success from an accurate correspondence model which could be used for precise localization. In this paper, we study the effectiveness of convnet activation features for tasks requiring correspondence. We present evidence that convnet features localize at a much finer scale than their receptive field sizes, that they can be used to perform intraclass alignment as well as conventional hand-engineered features, and that they outperform conventional features in keypoint prediction on objects from PASCAL VOC 2011 [4].
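Keypoint prediction with convnet features can be sketched as nearest-neighbour matching: take the feature vector at a source keypoint and find the most similar feature vector in the target image. This is a generic, hypothetical sketch rather than the paper's pipeline; it assumes both feature maps are already computed on a common grid and uses cosine similarity as the matching criterion.

```python
import numpy as np


def transfer_keypoint(feat_src, feat_tgt, kp):
    """Predict the location of a source keypoint in a target image by
    nearest-neighbour matching of convnet features under cosine similarity.

    feat_src, feat_tgt: (H, W, C) feature maps on the same grid; kp: (row, col).
    Returns the (row, col) cell of feat_tgt most similar to the source descriptor.
    """
    def unit(f):
        # L2-normalize each C-dimensional feature vector (epsilon avoids 0/0)
        return f / (np.linalg.norm(f, axis=-1, keepdims=True) + 1e-8)

    query = unit(feat_src)[tuple(kp)]   # (C,) descriptor at the keypoint
    sim = unit(feat_tgt) @ query        # (H, W) cosine similarities
    r, c = np.unravel_index(np.argmax(sim), sim.shape)
    return int(r), int(c)
```

Mapping the returned grid cell back to pixel coordinates depends on the network's stride and padding, which this sketch leaves out.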
Bregler: MoDeep: A deep learning framework using motion features for human pose estimation. In ACCV, 2014.
Cited by 6 (2 self)
In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion, that extends the FLIC dataset [1] with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.
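As a minimal, hypothetical illustration of feeding color and motion features to one ConvNet, the sketch below stacks an RGB frame with its absolute difference to the previous frame into a single multi-channel input. Frame differencing is an assumption chosen for simplicity; this listing does not specify which motion features MoDeep actually uses.

```python
import numpy as np


def color_motion_input(frame_t, frame_prev):
    """Stack an RGB frame with a simple motion feature (absolute per-channel
    frame difference) into one (H, W, 6) float array for a ConvNet's first layer."""
    # Convert to float before subtracting so uint8 values cannot wrap around.
    cur = frame_t.astype(np.float32) / 255.0
    prev = frame_prev.astype(np.float32) / 255.0
    motion = np.abs(cur - prev)  # large where pixels changed between frames
    return np.concatenate([cur, motion], axis=-1)
```

Static background pixels get zero motion channels, so the network can in principle use those channels to separate moving limbs from clutter.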
Efficient ConvNet-based Marker-less Motion Capture in General Scenes with a Low Number of Cameras
Cited by 3 (0 self)
We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as ConvNet detection is fast and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables tracking of full articulated joint angles at state-of-the-art accuracy and temporal stability with a very low number of cameras.
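The step of extracting pose constraints by weighted sampling can be loosely illustrated per joint: draw pixel positions with probability proportional to a ConvNet detection heatmap and summarize them by their mean and spread. This simplified sketch ignores the kinematic skeleton model that guides the sampling in the abstract; the function name and the mean/standard-deviation summary are assumptions.

```python
import numpy as np


def sample_joint_positions(heatmap, n, rng=None):
    """Draw n pixel positions (row, col) with probability proportional to a
    non-negative detection heatmap; return the sample mean and standard deviation
    as a soft positional constraint for one joint."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.clip(heatmap, 0, None).ravel()   # negative responses get zero weight
    p = p / p.sum()                          # normalize to a probability vector
    idx = rng.choice(p.size, size=n, p=p)    # weighted sampling of flat indices
    rows, cols = np.unravel_index(idx, heatmap.shape)
    samples = np.stack([rows, cols], axis=1).astype(float)
    return samples.mean(axis=0), samples.std(axis=0)
```

A sharp, confident heatmap yields a small spread (a tight constraint), while an ambiguous one yields a large spread that a tracker could down-weight.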
Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images
Cited by 3 (1 self)
In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts-dependent body joint regressors, which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel dataset termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts-dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state of the art in terms of accuracy, while running at a couple of frames per second.
Index Terms: Human pose estimation, fashion, random forest, regression, classification.
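The second layer's use of the first layer's class distributions can be sketched by building, for every pixel, a feature vector from the part-class distributions read at a set of pixel offsets, so that a second-layer model can see which parts co-occur nearby. This is a hypothetical illustration of the two-layer idea only: the offsets, the function name and the zero padding outside the image are assumptions, and the random forests themselves are not shown.

```python
import numpy as np


def context_features(prob_maps, offsets):
    """Build second-layer inputs from first-layer part-class distributions.

    prob_maps: (H, W, K) class distribution per pixel; offsets: list of (dy, dx)
    with |dy| < H and |dx| < W. Returns (H, W, K * len(offsets)) where, for each
    offset, out[y, x] holds prob_maps[y + dy, x + dx] (zeros outside the image).
    """
    h, w, _ = prob_maps.shape
    feats = []
    for dy, dx in offsets:
        shifted = np.zeros_like(prob_maps)
        # Source rows/cols (ys, xs) and target rows/cols (yt, xt) of the valid overlap.
        ys, yt = slice(max(dy, 0), h + min(dy, 0)), slice(max(-dy, 0), h + min(-dy, 0))
        xs, xt = slice(max(dx, 0), w + min(dx, 0)), slice(max(-dx, 0), w + min(-dx, 0))
        shifted[yt, xt] = prob_maps[ys, xs]
        feats.append(shifted)
    return np.concatenate(feats, axis=-1)
```

A forest trained on these features could, for instance, learn that a "leg" response is more likely a left leg when a "right leg" distribution appears at a fixed horizontal offset, which is the kind of part interdependence the abstract describes.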
R-CNNs for Pose Estimation and Action Detection
Cited by 3 (3 self)
We present convolutional neural networks for the tasks of keypoint (pose) prediction and action classification of people in unconstrained images. Our approach involves training an R-CNN detector with loss functions depending on the task being tackled. We evaluate our method on the challenging PASCAL VOC dataset and compare it to previous leading approaches. Our method gives state-of-the-art results for keypoint and action prediction. Additionally, we introduce a new dataset for action detection, the task of simultaneously localizing people and classifying their actions, and present results using our approach.