Real-time human pose recognition in parts from single depth images
In CVPR, 2011
Cited by 550 (19 self)
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
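The abstract's final step, finding local modes among reprojected 3D points, can be illustrated with a small sketch. It assumes weighted Gaussian-kernel mean shift as the mode-seeking procedure, which the abstract does not state; the function name, the bandwidth value, and the toy points are all hypothetical.

```python
import math

def mean_shift_mode(points, weights, start, bandwidth=0.1, iters=50, tol=1e-6):
    """Follow weighted Gaussian-kernel mean shift from `start` to a local mode.

    points  -- 3D points, e.g. reprojected pixels labeled as one body part
    weights -- per-point confidence (assumed to come from the classifier)
    """
    x = list(start)
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            den += k
            for i in range(3):
                num[i] += k * p[i]
        if den == 0.0:  # no point has any influence at this location
            break
        new_x = [n / den for n in num]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:  # converged
            break
    return x

# Toy data: a tight cluster near (0, 0, 1) plus one distant outlier.
pts = [(0.0, 0.0, 1.0), (0.02, 0.0, 1.0), (-0.02, 0.0, 1.0),
       (0.0, 0.02, 1.0), (1.0, 1.0, 2.0)]
conf = [0.9, 0.8, 0.8, 0.7, 0.9]
mode = mean_shift_mode(pts, conf, start=(0.05, 0.0, 1.0))
```

Started near the cluster, the iteration settles on the cluster centre and effectively ignores the outlier, which is the behaviour one would want from a joint proposal.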
Interactive Control of Avatars Animated with Human Motion Data
2002
Cited by 369 (37 self)
Real-time control of three-dimensional avatars is an important problem in the context of computer games and virtual environments. Avatar animation and control is difficult, however, because a large repertoire of avatar behaviors must be made available, and the user must be able to select from this set of behaviors, possibly with a low-dimensional input device. One appealing approach to obtaining a rich set of avatar behaviors is to collect an extended, unlabeled sequence of motion data appropriate to the application. In this paper, we show that such a motion database can be preprocessed for flexibility in behavior and efficient search and exploited for real-time avatar control. Flexibility is created by identifying plausible transitions between motion segments, and efficient search through the resulting graph structure is obtained through clustering. Three interface techniques are demonstrated for controlling avatar motion using this data structure: the user selects from a set of available choices, sketches a path through an environment, or acts out a desired motion in front of a video camera. We demonstrate the flexibility of the approach through four different applications and compare the avatar motion to directly recorded human motion.
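As a rough illustration of "identifying plausible transitions between motion segments", the sketch below marks a jump from frame i to frame j as plausible when frame j resembles the natural successor of frame i. The Euclidean pose distance, the threshold, and the `min_gap` parameter are assumptions made for this example, not details taken from the abstract.

```python
import math

def find_transitions(frames, threshold, min_gap=2):
    """Return (i, j) pairs where playback could jump from frame i to frame j.

    frames    -- list of pose vectors (tuples of floats), one per frame
    threshold -- maximum pose distance for a transition to count as plausible
    min_gap   -- skip candidates this close to the natural successor i + 1
    """
    transitions = []
    for i in range(len(frames) - 1):
        successor = frames[i + 1]
        for j in range(len(frames)):
            if abs(j - (i + 1)) < min_gap:
                continue  # too close to the original continuation
            if math.dist(successor, frames[j]) < threshold:
                transitions.append((i, j))
    return transitions

# Toy one-dimensional "poses": the motion revisits similar poses later on.
frames = [(0.0,), (1.0,), (2.0,), (3.0,), (1.05,), (2.05,)]
edges = find_transitions(frames, threshold=0.1)
```

The resulting pairs are the extra edges of a directed graph over frames; the clustering step mentioned in the abstract would then operate on that graph.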
Recovering 3D Human Pose from Monocular Images
Cited by 261 (0 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture-of-regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54-parameter full body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4–5 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
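Of the regressors listed, ridge regression is the simplest to write down: the weights solve (XᵀX + λI) w = Xᵀy. The sketch below is a minimal pure-Python version for a single output (one pose angle, say) over a linear basis; the helper names and the toy data are hypothetical, and the RVM and SVM variants are not covered.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Weights w minimizing ||Xw - y||^2 + lam * ||w||^2."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(A, b)

# Toy data: rows of X stand in for shape descriptors, y for one pose angle.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
w_plain = ridge_fit(X, y, lam=0.0)  # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)  # shrunk toward zero
```

With `lam=0.0` the fit recovers the exact generating weights; a positive `lam` shrinks them, trading a little bias for stability when descriptors are noisy or correlated.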
Style-Based Inverse Kinematics
2004
Cited by 211 (8 self)
This paper presents an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in real time. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles. Our style-based ...
3D Human Pose from Silhouettes by Relevance Vector Regression
In CVPR, 2004
Cited by 199 (8 self)
We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. For the main regression, we evaluate both regularized least squares and Relevance Vector Machine (RVM) regressors over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. For realism and good generalization with respect to viewpoints, we train the regressors on images resynthesized from real human motion capture data, and test them both quantitatively on similar independent test data, and qualitatively on a real image sequence. Mean angular errors of 6–7 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
Data fusion for visual tracking with particles
Proc. IEEE, 2004
Cited by 169 (2 self)
The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data. Index Terms: visual tracking, data fusion, particle filters, sound, color, motion.
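One simple way to see how several cues can be fused inside a particle filter is the weight update below, which multiplies per-cue likelihoods under the assumption that the cues are conditionally independent given the state. The Gaussian likelihoods, the 1D state, and all names are illustrative assumptions; the paper's use of intermittent cues to build proposal distributions is not shown.

```python
import math

def gaussian_likelihood(z, x, sigma):
    """Unnormalized likelihood of measurement z given state x."""
    return math.exp(-0.5 * ((z - x) / sigma) ** 2)

def fuse_and_reweight(particles, weights, measurements):
    """Multiply each particle's weight by every cue's likelihood, then normalize.

    particles    -- list of 1D states (e.g. horizontal position of a target)
    weights      -- current particle weights
    measurements -- list of (z, sigma) pairs, one per cue
    """
    new_weights = []
    for x, w in zip(particles, weights):
        for z, sigma in measurements:
            w *= gaussian_likelihood(z, x, sigma)
        new_weights.append(w)
    total = sum(new_weights)
    return [w / total for w in new_weights]

# Toy example: two cues (say color and sound) roughly agree on a position near 1.
particles = [0.0, 1.0, 2.0]
weights = [1.0 / 3.0] * 3
fused = fuse_and_reweight(particles, weights, [(1.0, 0.5), (1.2, 0.5)])
```

Because the likelihoods multiply, a particle has to be consistent with every cue to keep substantial weight, which is what makes the fused estimate sharper than either cue alone.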
Gaussian process dynamical models for human motion
IEEE Trans. Pattern Anal. Machine Intell., 2007
Cited by 156 (5 self)
We introduce Gaussian process dynamical models (GPDMs) for nonlinear time series analysis, with applications to learning models of human pose and motion from high-dimensional motion capture data. A GPDM is a latent variable model. It comprises a low-dimensional latent space with associated dynamics, as well as a map from the latent space to an observation space. We marginalize out the model parameters in closed form by using Gaussian process priors for both the dynamical and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach and compare four learning algorithms on human motion capture data, in which each pose is 50-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces. Index Terms: machine learning, motion, tracking, animation, stochastic processes, time series analysis.
Kinematic Jump Processes For Monocular 3D Human Tracking
In Int. Conf. Computer Vision & Pattern Recognition, 2003
Cited by 139 (17 self)
A major difficulty for 3D human body tracking from monocular image sequences is the near non-observability of kinematic degrees of freedom that generate motion in depth. For known link (body segment) lengths, the strict non-observabilities reduce to twofold ‘forwards/backwards flipping’ ambiguities for each link. These imply 2^(#links) formal inverse kinematics solutions for the full model, and hence linked groups of O(2^(#links)) local minima in the model-image matching cost function. Choosing the wrong minimum leads to rapid mistracking, so for reliable tracking, rapid methods of investigating alternative minima within a group are needed. Previous approaches to this have used generic search methods that do not exploit the specific problem structure. Here, we complement these by using simple kinematic reasoning to enumerate the tree of possible forwards/backwards flips, thus greatly speeding the search within each linked group of minima. Our methods can be used either deterministically, or within stochastic ‘jump-diffusion’ style search processes. We give experimental results on some challenging monocular human tracking sequences, showing how the new kinematic-flipping-based sampling method improves and complements existing ones.
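The count of formal solutions follows from each link independently pointing towards or away from the camera. The sketch below enumerates every sign combination for a small chain. It is a simplification made for illustration: each link is reduced to a hypothetical depth offset relative to its parent, and the paper's tree-structured kinematic reasoning is replaced by a flat product.

```python
from itertools import product

def enumerate_flips(link_depth_offsets):
    """Yield every forwards/backwards sign assignment of the per-link depth offsets.

    link_depth_offsets -- magnitude of each link's depth offset relative to its
                          parent (assumed known here from link length and image
                          projection)
    """
    for signs in product((1, -1), repeat=len(link_depth_offsets)):
        yield tuple(s * dz for s, dz in zip(signs, link_depth_offsets))

# Three links give 2**3 = 8 candidate depth configurations.
candidates = list(enumerate_flips([0.1, 0.2, 0.3]))
```

A link lying parallel to the image plane has a zero offset, so its two flips coincide and the number of distinct configurations drops accordingly.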
Performance Animation from Low-dimensional Control Signals
ACM Transactions on Graphics, 2005
Cited by 130 (18 self)
This paper introduces an approach to performance animation that employs video cameras and a small set of retroreflective markers to create a low-cost, easy-to-use system that might someday be practical for home use. The low-dimensional control signals from the user's performance are supplemented by a database of prerecorded human motion. At run time, the system automatically learns a series of local models from a set of motion capture examples that are a close match to the marker locations captured by the cameras. These local models are then used to reconstruct the motion of the user as a full-body animation. We demonstrate the power of this approach with real-time control of six different behaviors using two video cameras and a small set of retroreflective markers. We compare the resulting animation to animation from commercial motion capture equipment with a full set of markers.
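The idea of reconstructing a full-body pose from database examples that closely match the observed markers can be sketched as follows. This is a deliberately simpler stand-in for the local models the abstract mentions: it takes the k nearest database entries and blends their full poses with inverse-distance weights. The database layout, the function name, and the value of k are assumptions.

```python
import math

def reconstruct_pose(markers, database, k=3, eps=1e-9):
    """Blend the full poses of the k database entries closest to `markers`.

    markers  -- observed low-dimensional control signal (tuple of floats)
    database -- list of (marker_vector, full_pose_vector) pairs
    """
    nearest = sorted(database, key=lambda e: math.dist(e[0], markers))[:k]
    w = [1.0 / (math.dist(m, markers) + eps) for m, _ in nearest]
    total = sum(w)
    dim = len(nearest[0][1])
    return [sum(wi * pose[i] for wi, (_, pose) in zip(w, nearest)) / total
            for i in range(dim)]

# Toy database: 1D marker signal mapped to a 2D "full pose".
database = [((0.0,), (0.0, 0.0)),
            ((1.0,), (10.0, 20.0)),
            ((5.0,), (100.0, 100.0))]
pose = reconstruct_pose((0.5,), database, k=2)
```

The query sits midway between the first two examples, so the result is their average; the distant third example is excluded by the neighbour selection.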