Results 1 - 10 of 28
T.K.: Real-time articulated hand pose estimation using semi-supervised transductive regression forests
- In: Proc. ICCV (2013)
Cited by 25 (3 self)
This paper presents the first semi-supervised transductive algorithm for real-time articulated hand pose estimation. Noisy data and occlusions are the major challenges of articulated hand pose estimation. In addition, the discrepancies between realistic and synthetic pose data undermine the performance of existing approaches that use synthetic data extensively in training. We therefore propose the Semi-supervised Transductive Regression (STR) forest, which learns the relationship between a small, sparsely labelled realistic dataset and a large synthetic dataset. We also design a novel data-driven, pseudo-kinematic technique to refine noisy or occluded joints. Our contributions include: (i) capturing the benefits of both realistic and synthetic data via transductive learning; (ii) showing that accuracy can be improved by considering unlabelled data; and (iii) introducing a pseudo-kinematic technique to refine articulations efficiently. Experimental results show not only the promising performance of our method with respect to noise and occlusions, but also its superiority over the state of the art in accuracy, robustness and speed.
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
Cited by 11 (3 self)
We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. More precisely, we specify a graphical model for human pose which exploits the fact that local image measurements can be used both to detect parts (or joints) and also to predict the spatial relationships between them (Image Dependent Pairwise Relations). These spatial relationships are represented by a mixture model. We use Deep Convolutional Neural Networks (DCNNs) to learn conditional probabilities for the presence of parts and their spatial relationships within image patches. Hence our model combines the representational flexibility of graphical models with the efficiency and statistical power of DCNNs. Our method significantly outperforms the state-of-the-art methods on the LSP and FLIC datasets and also performs very well on the Buffy dataset without any training.
A.: Discriminative sub-categorization
, 2013
Cited by 10 (2 self)
The objective of this work is to learn sub-categories. Rather than casting this as a problem of unsupervised clustering, we investigate a weakly supervised approach using both positive and negative samples of the category. We make the following contributions: (i) we introduce a new model for discriminative sub-categorization which determines cluster membership for positive samples whilst simultaneously learning a max-margin classifier to separate each cluster from the negative samples; (ii) we show that this model does not suffer from the degenerate cluster problem that afflicts several competing methods (e.g., Latent SVM and Max-Margin Clustering); (iii) we show that the method is able to discover interpretable sub-categories in various datasets. The model is evaluated experimentally over various datasets, and its performance advantages over k-means and Latent SVM are demonstrated. We also stress test the model and show its resilience in discovering sub-categories as the parameters are varied.
Mixing Body-Part Sequences for Human Pose Estimation
Cited by 8 (1 self)
In this paper, we present a method for estimating articulated human poses in videos. We cast this as an optimization problem defined on body parts with spatio-temporal links between them. The resulting formulation is unfortunately intractable and previous approaches only provide approximate solutions. Although such methods perform well on certain body parts, e.g., head, their performance on lower arms, i.e., elbows and wrists, remains poor. We present a new approximate scheme with two steps dedicated to pose estimation. First, our approach takes into account temporal links with subsequent frames for the less-certain parts, namely elbows and wrists. Second, our method decomposes poses into limbs, generates limb sequences across time, and recomposes poses by mixing these body part sequences. We introduce a new dataset “Poses in the Wild”, which is more challenging than the existing ones, with sequences containing background clutter, occlusions, and severe camera motion. We experimentally compare our method with recent approaches on this new dataset as well as on two other benchmark datasets, and show significant improvement.
Estimating human pose with flowing puppets
- In ICCV, 2013
Body Parts Dependent Joint Regressors for Human Pose Estimation in Still Images
Cited by 3 (1 self)
In this work, we address the problem of estimating 2D human pose from still images. Articulated body pose estimation is challenging due to the large variation in body poses and appearances of the different body parts. Recent methods that rely on the pictorial structure framework have been shown to be very successful in solving this task. They model the body part appearances using discriminatively trained, independent part templates and the spatial relations of the body parts using a tree model. Within such a framework, we address the problem of obtaining better part templates which are able to handle a very high variation in appearance. To this end, we introduce parts dependent body joint regressors, which are random forests that operate over two layers. While the first layer acts as an independent body part classifier, the second layer takes the estimated class distributions of the first one into account and is thereby able to predict joint locations by modeling the interdependence and co-occurrence of the parts. This helps to overcome typical ambiguities of tree structures, such as self-similarities of legs and arms. In addition, we introduce a novel dataset termed FashionPose that contains over 7,000 images with a challenging variation of body part appearances due to a large variation of dressing styles. In the experiments, we demonstrate that the proposed parts dependent joint regressors outperform independent classifiers or regressors. The method also performs better than or similar to the state of the art in terms of accuracy, while running at a couple of frames per second.
Index Terms: Human pose estimation, fashion, random forest, regression, classification
Human Pose Classification within the Context of Near-IR Imagery Tracking
Cited by 1 (1 self)
We address the challenge of human behaviour analysis within automated image understanding. Whilst prior work concentrates on this task within visible-band (EO) imagery, by contrast we target basic human pose classification in thermal-band (infrared, IR) imagery. By leveraging the key advantages of limb localization this imagery offers, we target two distinct human pose classification problems of varying complexity: 1) identifying passive or active individuals within the scene and 2) identifying individuals potentially carrying weapons. Both approaches use a discrete set of features capturing body pose characteristics, to which a range of machine learning techniques are then applied for final classification. Significant success is shown on these challenging tasks over a wide range of environmental conditions within the wider context of automated human target tracking in thermal-band (IR) imagery.
Pyramidal Fisher Motion for Multiview Gait Recognition
- in IEEE 22nd International Conference On Pattern Recognition
Cited by 1 (0 self)
The goal of this paper is to identify individuals by analyzing their gait. Instead of using binary silhouettes as input data (as done in many previous works), we propose and evaluate the use of motion descriptors based on densely sampled short-term trajectories. We take advantage of state-of-the-art people detectors to define custom spatial configurations of the descriptors around the target person, thus obtaining a pyramidal representation of the gait motion. The local motion features (described by the Divergence-Curl-Shear descriptor [1]) extracted on the different spatial areas of the person are combined into a single high-level gait descriptor by using the Fisher Vector encoding [2]. The proposed approach, coined Pyramidal Fisher Motion, is experimentally validated on the recent ‘AVA Multiview Gait’ dataset [3]. The results show that this new approach achieves promising results in the problem of gait recognition.
Spatio-temporal Matching for Human Detection in Video
Cited by 1 (0 self)
Detecting and tracking humans in videos have been long-standing problems in computer vision. Most successful approaches (e.g., deformable parts models) heavily rely on discriminative models to build appearance detectors for body joints and generative models to constrain possible body configurations (e.g., trees). While these 2D models have been successfully applied to images (and with less success to videos), a major challenge is to generalize these models to cope with camera views. In order to achieve view-invariance, these 2D models typically require a large amount of training data across views that is difficult to gather and time-consuming to label. Unlike existing 2D models, this paper formulates the problem of human detection in videos as spatio-temporal matching (STM) between a 3D motion capture model and trajectories in videos. Our algorithm estimates the camera view and selects a subset of tracked trajectories that matches the motion of the 3D model. The STM is efficiently solved with linear programming, and it is robust to tracking mismatches, occlusions and outliers. To the best of our knowledge, this is the first paper that solves the correspondence between video and 3D motion capture data for human pose detection. Experiments on the Human3.6M and Berkeley MHAD databases illustrate the benefits of our method over state-of-the-art approaches.
Burgos-Artizzu et al.: Merging Pose Estimates Across Space and Time
Numerous ‘non-maximum suppression’ (NMS) post-processing schemes have been proposed for merging multiple independent object detections. We propose a generalization of NMS beyond bounding boxes to merge multiple pose estimates in a single frame. The final estimates are centroids rather than medoids as in standard NMS, thus being more accurate than any of the individual candidates. Using the same mathematical framework, we extend our approach to the multi-frame setting, merging multiple independent pose estimates across space and time and outputting both the number and pose of the objects present in a scene. Our approach sidesteps many of the inherent challenges associated with full tracking (e.g. objects entering/leaving a scene, extended periods of occlusion, etc.). We show its versatility by applying it to two distinct state-of-the-art pose estimation algorithms in three domains: human bodies, faces and mice. Our approach improves both detection accuracy (by helping disambiguate correspondences) as well as pose estimation quality and is computationally efficient.
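The distinction this abstract draws between centroids and medoids can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the score weighting, and the summed per-joint Euclidean distance are all assumptions chosen for illustration. A medoid must be one of the candidate poses, while a centroid averages them and can therefore land closer to the true pose than any single candidate.

```python
from math import dist

def pose_distance(a, b):
    """Sum of per-joint Euclidean distances between two poses (assumed metric)."""
    return sum(dist(p, q) for p, q in zip(a, b))

def weighted_medoid(poses, scores):
    """Pick the existing candidate with the lowest score-weighted distance to all others."""
    def cost(candidate):
        return sum(s * pose_distance(candidate, other)
                   for other, s in zip(poses, scores))
    return min(poses, key=cost)

def weighted_centroid(poses, scores):
    """Average each joint coordinate over all candidates, weighted by score."""
    total = sum(scores)
    n_joints = len(poses[0])
    return [
        tuple(
            sum(s * pose[j][axis] for pose, s in zip(poses, scores)) / total
            for axis in range(2)
        )
        for j in range(n_joints)
    ]

# Hypothetical data: four noisy candidates scattered around a two-joint pose.
true_pose = [(0.0, 0.0), (1.0, 1.0)]
offsets = [(-1.0, 0.0), (1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
candidates = [[(x + dx, y + dy) for x, y in true_pose] for dx, dy in offsets]
scores = [1.0, 1.0, 1.0, 1.0]

medoid = weighted_medoid(candidates, scores)
centroid = weighted_centroid(candidates, scores)
```

In this toy setup the candidate errors cancel, so the centroid recovers the underlying pose while the medoid inherits the error of whichever candidate it selects. Real detections would need clustering first, so that only candidates belonging to the same object are averaged.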