Results 1 - 10 of 159
Non-Rigid Structure-From-Motion: Estimating Shape and Motion with Hierarchical Priors
2007
Cited by 91 (1 self)
This paper describes methods for recovering time-varying shape and motion of non-rigid 3D objects from uncalibrated 2D point tracks. For example, given a video recording of a talking person, we would like to estimate the 3D shape of the face at each instant, and learn a model of facial deformation. Time-varying shape is modeled as a rigid transformation combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed, and thus additional assumptions about deformations are required. We first suggest restricting shapes to lie within a low-dimensional subspace, and describe estimation algorithms. However, this restriction alone is insufficient to constrain reconstruction. To address this problem, we propose a reconstruction method using a Probabilistic Principal Components Analysis (PPCA) shape model, and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills in missing data points. We then extend the model to capture temporal dynamics in object shape, allowing the algorithm to robustly handle severe cases of missing data.
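As a rough sketch of the kind of low-dimensional shape model the abstract describes (the exact parameterization below is an assumption, not a quotation from the paper), time-varying shape can be written as a mean shape plus a few learned basis shapes, rigidly transformed and projected at each frame:

    \[
      S_t \;=\; \bar{S} \;+\; \sum_{k=1}^{K} z_{t,k}\, V_k,
      \qquad
      W_t \;=\; R_t\, S_t \;+\; T_t,
      \qquad
      z_t \sim \mathcal{N}(0, I),
    \]

where S_t is the 3D shape at frame t, V_k are learned basis shapes, R_t and T_t describe the rigid motion (with R_t a truncated rotation under an orthographic camera), W_t are the observed 2D point tracks, and the Gaussian prior on the coefficients z_t is what a PPCA shape model adds over a plain subspace constraint.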
A data-driven approach to quantifying natural human motion
ACM Trans. Graph, 2005
Cited by 67 (5 self)
Figure 1: Examples from our test set of motions. The left two images are natural (motion capture data). The two images to the right are unnatural (badly edited and incompletely cleaned motion). Joints that are marked in red-yellow were detected as having unnatural motion. Frames for these images were selected by the method presented in [Assa et al. 2005].

In this paper, we investigate whether it is possible to develop a measure that quantifies the naturalness of human motion (as defined by a large database). Such a measure might prove useful in verifying that a motion editing operation had not destroyed the naturalness of a motion capture clip or that a synthetic motion transition was within the space of those seen in natural human motion. We explore the performance of mixture of Gaussians (MoG), hidden Markov models (HMM), and switching linear dynamic systems (SLDS) on this problem. We use each of these statistical models alone and as part of an ensemble of smaller statistical models. We also implement a Naive Bayes (NB) model for a baseline comparison. We test these techniques on motion capture data held out from a database, keyframed motions, edited motions, motions with noise added, and synthetic motion transitions. We present the results as receiver operating characteristic (ROC) curves and compare the results to the judgments made by subjects in a user study.
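As a hedged illustration of the simplest of the models compared above (a single mixture of Gaussians scored against held-out data; the feature windows, mixture size, and library calls are illustrative assumptions, not the paper's implementation):

    # Sketch: score motion frames with a mixture of Gaussians trained on
    # "natural" motion, then summarize detection quality with an ROC curve.
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.metrics import roc_curve, auc

    rng = np.random.default_rng(0)
    natural_train = rng.normal(size=(2000, 12))            # stand-in pose features
    natural_test = rng.normal(size=(500, 12))
    unnatural_test = rng.normal(loc=1.5, size=(500, 12))   # stand-in edited/noisy motion

    mog = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
    mog.fit(natural_train)

    # Higher log-likelihood means more "natural"; negate so larger scores flag "unnatural".
    scores = -np.concatenate([mog.score_samples(natural_test),
                              mog.score_samples(unnatural_test)])
    labels = np.concatenate([np.zeros(len(natural_test)), np.ones(len(unnatural_test))])

    fpr, tpr, _ = roc_curve(labels, scores)
    print("AUC:", auc(fpr, tpr))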
The Inversion Effect in Biological Motion Perception: Evidence for a “Life Detector”?
"... If biological-motion point-light displays are presented upside down, adequate perception is strongly impaired [1, 2]. Reminiscent of the inversion effect in face recognition, it has been suggested that the inversion effect in biological motion is due to impaired configural processing in a highly tra ..."
Abstract
-
Cited by 49 (8 self)
- Add to MetaCart
(Show Context)
If biological-motion point-light displays are presented upside down, adequate perception is strongly impaired [1, 2]. Reminiscent of the inversion effect in face recognition, it has been suggested that the inversion effect in biological motion is due to impaired configural processing in a highly trained expert system [3–5]. Here, we present data that are incompatible with this view. We show that observers can readily retrieve information about direction from scrambled point-light displays of humans and animals. Even though all configural information is entirely disrupted, perception of these displays is still subject to a significant inversion effect. Inverting only parts of the display reveals that the information about direction, as well as the associated inversion effect, is entirely carried […]
Temporal motion models for monocular and multiview 3D human body tracking
CVIU
Cited by 36 (4 self)
We explore an approach to 3D people tracking with learned motion models and deterministic optimization. The tracking problem is formulated as the minimization of a differentiable criterion whose differential structure is rich enough for optimization to be accomplished via hill-climbing. This avoids the computational expense of Monte Carlo methods, while yielding good results under challenging conditions. To demonstrate the generality of the approach, we show that we can learn and track cyclic motions such as walking and running, as well as acyclic motions such as a golf swing. We also show results from both monocular and multi-camera tracking. Finally, we provide results with a motion model learned from multiple activities, and show how this model might be used for recognition.
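A minimal sketch of the deterministic, hill-climbing style of optimization described above, assuming a toy learned motion basis and a synthetic data-fit objective (the basis, the objective, and the optimizer choice are illustrative assumptions, not the paper's implementation):

    # Sketch: track by minimizing a differentiable fit criterion over the
    # coefficients of a learned low-dimensional motion model (hill-climbing
    # instead of Monte Carlo sampling).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n_joints, n_basis = 20, 4
    mean_pose = rng.normal(size=n_joints)           # stand-in learned mean pose
    basis = rng.normal(size=(n_basis, n_joints))    # stand-in learned motion basis

    def pose_from_coeffs(c):
        return mean_pose + c @ basis

    true_coeffs = np.array([0.5, -0.2, 0.1, 0.0])
    observation = pose_from_coeffs(true_coeffs) + 0.01 * rng.normal(size=n_joints)

    def criterion(c):
        # Differentiable data-fit term; a real tracker would compare projected
        # 3D joints against image measurements here.
        residual = pose_from_coeffs(c) - observation
        return 0.5 * residual @ residual

    result = minimize(criterion, x0=np.zeros(n_basis), method="BFGS")
    print("recovered coefficients:", np.round(result.x, 3))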
An expressive three-mode principal components model of human action style
Image and Vision Computing, 2003
Cited by 25 (1 self)
We present a three-mode expressive-feature model for recognizing gender (female, male) from point-light displays of walking people. Prototype female and male walkers are initially decomposed into a subspace of their three-mode components (posture, time, and gender). We then apply a weight factor to each point-light trajectory in the basis representation to enable adaptive, context-based gender estimation. The weight values are automatically learned from labeled training data. We present experiments using physical (actual) and perceived (from perceptual experiments) gender labels to train and test the system. Results with 40 walkers demonstrate greater than 90% recognition for both physically and perceptually labeled training examples. The approach is more flexible than standard squared-error gender estimation and adapts successfully to different matching contexts.
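A hedged toy version of the weighted squared-error idea mentioned above (the prototype trajectories, per-trajectory weights, and closed-form estimate are illustrative assumptions rather than the paper's three-mode decomposition):

    # Sketch: estimate a gender coefficient g in [0, 1] for a walker by fitting
    # it as a weighted blend of a female prototype (g = 0) and a male
    # prototype (g = 1), with one weight per point-light trajectory.
    import numpy as np

    rng = np.random.default_rng(2)
    n_traj, n_frames = 15, 60
    female_proto = rng.normal(size=(n_traj, n_frames))
    male_proto = rng.normal(size=(n_traj, n_frames))
    weights = rng.uniform(0.5, 1.5, size=n_traj)     # stand-in for weights learned from labels

    walker = 0.7 * male_proto + 0.3 * female_proto   # synthetic test walker

    # Closed-form weighted least squares for g:
    #   g* = sum_i w_i <x_i - f_i, m_i - f_i> / sum_i w_i ||m_i - f_i||^2
    diff = male_proto - female_proto
    num = np.sum(weights * np.sum((walker - female_proto) * diff, axis=1))
    den = np.sum(weights * np.sum(diff * diff, axis=1))
    g = num / den
    print("estimated gender coefficient:", round(float(g), 3))  # close to 0.7, i.e. "male"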
Human carrying status in visual surveillance
Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition
Cited by 21 (2 self)
A person’s gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking become more difficult because the averaged gait image is too simple to represent carrying status. In this paper we therefore first introduce a set of Gabor-based human gait appearance models, motivated by the similarity of Gabor functions to the receptive field profiles of mammalian cortical simple cells. The very high dimensionality of the resulting feature space makes training difficult. To solve this problem we propose a general tensor discriminant analysis (GTDA), which seamlessly incorporates the structure of the object (the Gabor-based gait appearance model) as a natural constraint. GTDA differs from previous tensor-based discriminant analysis methods in that its training converges; existing methods fail to converge during training, which makes them unsuitable for practical tasks. Experiments are carried out on the USF baseline data set to recognize a person’s identity from the gait silhouette. The proposed Gabor gait representation combined with GTDA significantly outperforms existing appearance-based methods.
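As a hedged sketch of the Gabor feature-extraction step described above (the filter-bank parameters and the averaged-silhouette input are assumptions; the tensor discriminant step itself is omitted):

    # Sketch: build a Gabor representation of an averaged gait silhouette by
    # filtering it at several scales and orientations; a tensor discriminant
    # method such as GTDA would then be trained on these responses.
    import numpy as np
    from skimage.filters import gabor

    rng = np.random.default_rng(3)
    gait_energy_image = rng.random((64, 44))   # stand-in averaged silhouette

    frequencies = [0.1, 0.2, 0.3]
    orientations = [k * np.pi / 4 for k in range(4)]

    responses = []
    for f in frequencies:
        for theta in orientations:
            real, imag = gabor(gait_energy_image, frequency=f, theta=theta)
            responses.append(np.sqrt(real**2 + imag**2))  # magnitude response

    gabor_tensor = np.stack(responses, axis=-1)  # H x W x (scales * orientations)
    print("Gabor feature tensor shape:", gabor_tensor.shape)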
A motion-capture library for the study of identity, gender and emotion perception from biological motion
2004
Representing cyclic human motion using functional analysis
Image and Vision Computing, 2005
Cited by 14 (1 self)
We present a robust automatic method for modeling cyclic 3D human motion such as walking using motion-capture data. The pose of the body is represented by a time series of joint angles which are automatically segmented into a sequence of motion cycles. The mean and the principal components of these cycles are computed using a new algorithm that enforces smooth transitions between the cycles by operating in the Fourier domain. Key to this method is its ability to automatically deal with noise and missing data. A learned walking model is then exploited for Bayesian tracking of 3D human motion.
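A minimal sketch of the functional representation described above, assuming the joint-angle series has already been segmented into cycles (the resampling length, Fourier truncation, and PCA step are illustrative assumptions):

    # Sketch: represent each walking cycle by low-order Fourier coefficients of
    # its joint-angle curve, then take the mean and principal components of
    # those coefficient vectors across cycles.
    import numpy as np

    rng = np.random.default_rng(4)
    n_cycles, cycle_len, n_harmonics = 12, 100, 5
    t = np.linspace(0.0, 2.0 * np.pi, cycle_len, endpoint=False)

    # Stand-in cycles: noisy sinusoidal "joint angle" curves on a common length
    # (a real pipeline would resample each detected cycle to this length).
    cycles = np.stack([np.sin(t) + 0.1 * rng.normal(size=cycle_len)
                       for _ in range(n_cycles)])

    # Keep only the first few harmonics of each cycle (real and imaginary parts).
    spectra = np.fft.rfft(cycles, axis=1)[:, :n_harmonics]
    features = np.hstack([spectra.real, spectra.imag])   # n_cycles x 2*n_harmonics

    mean_cycle = features.mean(axis=0)
    centered = features - mean_cycle
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    principal_components = vt[:3]                        # top 3 modes of variation
    print("feature matrix:", features.shape, "PCs:", principal_components.shape)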
Biological motion as a cue for the perception of size.
Journal of Vision, 2003
Cited by 14 (5 self)
Animals as well as humans adjust their gait patterns in order to minimize the energy required for their locomotion. A particularly important factor is the constant force of Earth's gravity. In many dynamic systems, gravity defines a relation between temporal and spatial parameters. The stride frequency of an animal that moves efficiently in terms of energy consumption depends on its size. In two psychophysical experiments, we investigated whether human observers can exploit this relation to retrieve size information from point-light displays of dogs moving with varying stride frequencies across the screen. In Experiment 1, observers had to adjust the apparent size of a walking point-light dog by placing it at different depths in a three-dimensional depiction of a complex landscape. In Experiment 2, the size of the dog could be adjusted directly. Results show that displays with high stride frequencies are perceived to be smaller than displays with low stride frequencies and that this correlation perfectly reflects the predicted inverse quadratic relation between stride frequency and size. We conclude that biological motion can serve as a cue to retrieve the size of an animal and, therefore, to scale the visual environment.
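A hedged way to make the "inverse quadratic relation" concrete: if a swinging limb is treated as a pendulum of length l (a common simplification; the abstract does not spell out the exact biomechanical model), its natural frequency fixes how size scales with stride frequency:

    \[
      f \;\propto\; \sqrt{\frac{g}{l}}
      \qquad\Longrightarrow\qquad
      l \;\propto\; \frac{g}{f^{2}},
    \]

so halving the stride frequency corresponds to a fourfold increase in the implied limb length, and hence in the perceived size of the animal.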
Dynamic information for the recognition of conversational expressions.
Journal of Vision, 2009
Cited by 11 (2 self)
Communication is critical for normal, everyday life. During a conversation, information is conveyed in a number of ways, including through body, head, and facial changes. While much research has examined these latter forms of communication, the majority of it has focused on static representations of a few, supposedly universal expressions. Normal conversations, however, contain a very wide variety of expressions and are rarely, if ever, static. Here, we report several experiments that show that expressions that use head, eye, and internal facial motion are recognized more easily and accurately than static versions of those expressions. Moreover, we demonstrate conclusively that this dynamic advantage is due to information that is only available over time, and that the temporal integration window for this information is at least 100 ms long.