Efficient ConvNet-based Marker-less Motion Capture in General Scenes with a Low Number of Cameras
Cited by 3 (0 self)
We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each joint of a kinematic skeleton model. These unary potentials are used to probabilistically extract pose constraints for tracking by using weighted sampling from a pose posterior guided by the model. In the final energy, these constraints are combined with an appearance-based model-to-image similarity term. Poses can be computed very efficiently using iterative local optimization, as ConvNet detection is fast, and our formulation yields a combined pose estimation energy with analytic derivatives. In combination, this enables tracking of full articulated joint angles at state-of-the-art accuracy and temporal stability with a very low number of cameras.
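The weighted-sampling step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the unary potential for one joint is available as a non-negative 2D detection map, and it samples pixel locations in proportion to that map only. The model-guided pose posterior described in the abstract is omitted, and the function name `sample_joint_locations` and the toy heatmap are hypothetical.

```python
import numpy as np

def sample_joint_locations(heatmap, num_samples, rng):
    """Draw (x, y) pixel locations with probability proportional to the heatmap.

    Assumption: `heatmap` is a 2D array of non-negative detection scores
    for a single joint (a stand-in for a ConvNet unary potential).
    """
    weights = np.clip(heatmap, 0.0, None).ravel()
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("heatmap has no positive mass")
    probs = weights / total
    # Weighted sampling over flattened pixel indices.
    flat_idx = rng.choice(weights.size, size=num_samples, p=probs)
    rows, cols = np.unravel_index(flat_idx, heatmap.shape)
    return np.stack([cols, rows], axis=1)

# Hypothetical 5x5 detection map with a strong peak at (x=3, y=1)
# and a weak neighbouring response at (x=3, y=2).
heatmap = np.zeros((5, 5))
heatmap[1, 3] = 0.9
heatmap[2, 3] = 0.1

rng = np.random.default_rng(0)
samples = sample_joint_locations(heatmap, 200, rng)
```

Samples concentrate on high-scoring pixels, so a tracker that turns them into constraints would be pulled mostly toward confident detections while still occasionally exploring weaker ones.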
Free-viewpoint Video of Human Actors using Multiple Handheld Kinects
- IEEE T-SMC:B SPECIAL ISSUE ON COMPUTER VISION FOR RGB-D SENSORS: KINECT AND ITS APPLICATIONS
Cited by 2 (0 self)
We present an algorithm for creating free-viewpoint video of interacting humans using three hand-held Kinect cameras. Our method reconstructs deforming surface geometry and temporally varying texture of humans through estimation of human poses and camera poses for every time step of the RGBZ video. Skeletal configurations and camera poses are found by solving a joint energy minimization problem which optimizes the alignment of RGBZ data from all cameras, as well as the alignment of human shape templates to the Kinect data. The energy function is based on a combination of geometric correspondence finding, implicit scene segmentation, and correspondence finding using image features. Finally, texture recovery is achieved through joint optimization on spatio-temporal RGB data using matrix completion. As opposed to previous methods, our algorithm succeeds on free-viewpoint video of human actors in general uncontrolled indoor scenes with potentially dynamic background, and it succeeds even if the cameras are moving.
Outdoor Human Motion Capture by Simultaneous Optimization of Pose and Camera Parameters
Cited by 2 (1 self)
Figure 1: Examples of multi-person tracking with moving cameras. (Left two images) Two actors, with two moving and three static cameras (Soccer1). (Right two images) One actor, with three moving and two static cameras (Walk2).
We present a method for capturing the skeletal motions of humans using a sparse set of potentially moving cameras in an uncontrolled environment. Our approach is able to track multiple people even in front of cluttered and non-static backgrounds, and with unsynchronized cameras of varying image quality and frame rate. We rely completely on optical information and do not make use of additional sensor information (e.g. depth images or inertial sensors). Our algorithm simultaneously reconstructs the skeletal pose parameters of multiple performers and the motion of each camera. This is facilitated by a new energy functional that captures the alignment of the model and the camera positions with the input videos in an analytic way. The approach can be adopted in many practical applications to replace complex and expensive motion capture studios with a few consumer-grade cameras, even in uncontrolled outdoor scenes. We demonstrate this on challenging multi-view video sequences that are captured with unsynchronized and moving (e.g. mobile-phone or GoPro) cameras.
Test-time Adaptation for 3D Human Pose Estimation
Abstract. In this paper we consider the task of articulated 3D human pose estimation in challenging scenes with dynamic background and multiple people. Initial progress on this task has been achieved building on discriminatively trained part-based models that deliver a set of 2D body pose candidates that are subsequently refined by reasoning in 3D [1, 4, 5]. The performance of such methods is limited by the performance of the underlying 2D pose estimation approaches. In this paper we explore a way to boost the performance of 2D pose estimation based on the output of the 3D pose reconstruction process, thus closing the loop in the pose estimation pipeline. We build our approach around a component that is able to identify true positive pose estimation hypotheses with high confidence. We then either retrain 2D pose estimation models using such highly confident hypotheses as additional training examples, or we use similarity to these hypotheses as a cue for 2D pose estimation. We consider a number of features that can be used for assessing the confidence of the pose estimation results. The strongest feature in our comparison corresponds to the ensemble agreement on the 3D pose output.
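As a rough illustration of an ensemble-agreement confidence feature, the sketch below scores a set of 3D pose predictions by their mean pairwise per-joint distance. The abstract does not say how agreement is measured, so the distance measure, the threshold, and the names `ensemble_agreement` and `is_confident` are all assumptions made for this example.

```python
import numpy as np

def ensemble_agreement(poses):
    """Mean per-joint Euclidean distance over all pairs of ensemble members.

    Assumption: `poses` has shape (num_models, num_joints, 3).
    Lower values mean the ensemble members agree more closely.
    """
    poses = np.asarray(poses, dtype=float)
    n = poses.shape[0]
    if n < 2:
        raise ValueError("need at least two ensemble members")
    dists = []
    for i in range(n):
        for j in range(i + 1, n):
            per_joint = np.linalg.norm(poses[i] - poses[j], axis=1)
            dists.append(per_joint.mean())
    return float(np.mean(dists))

def is_confident(poses, threshold):
    """Treat a hypothesis as confident when the ensemble disagrees little.

    The threshold is a hypothetical tuning parameter, in the same units
    as the joint coordinates.
    """
    return ensemble_agreement(poses) < threshold

# Toy data: three ensemble members predicting a two-joint skeleton.
base = np.zeros((2, 3))
agree = np.stack([base, base + 0.01, base - 0.01])
disagree = np.stack([base, base + 1.0, base - 1.0])
```

Hypotheses that pass such a check could then serve as additional training examples or as a similarity cue, in the spirit of the two options the abstract describes.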