Results 1 - 10 of 31
Learning Discriminative Representations from RGB-D Video Data
- PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE
, 2013
"... Recently, the low-cost Microsoft Kinect sensor, which can capture real-time high-resolution RGB and depth visual information, has attracted increasing attentions for a wide range of applications in computer vision. Existing techniques extract hand-tuned features from the RGB and the depth data separ ..."
Abstract - Cited by 14 (1 self)
Recently, the low-cost Microsoft Kinect sensor, which can capture real-time high-resolution RGB and depth visual information, has attracted increasing attention for a wide range of applications in computer vision. Existing techniques extract hand-tuned features from the RGB and depth data separately and fuse them heuristically, which does not fully exploit the complementarity of the two data sources. In this paper, we introduce an adaptive learning methodology that automatically extracts (holistic) spatio-temporal features from RGB-D video data, simultaneously fusing the RGB and depth information, for visual recognition tasks. We address this as an optimization problem using our proposed restricted graph-based genetic programming (RGGP) approach, in which a group of primitive 3D operators is first randomly assembled into graph-based combinations and then evolved generation by generation through evaluation on a set of RGB-D video samples. Finally, the best-performing combination is selected as the (near-)optimal representation for a pre-defined task. The proposed method is systematically evaluated on SKIG, a new hand-gesture dataset that we collected ourselves, and on the public MSRDailyActivity3D dataset. Extensive experimental results show that our approach yields significant advantages over state-of-the-art hand-crafted and machine-learned features.
A State of the Art Report on Kinect Sensor Setups in Computer Vision
"... Abstract. During the last three years after the launch of the Microsoft Kinect R ○ in the end-consumer market we have become witnesses of a small revolution in computer vision research towards the use of a standardized consumer-grade RGBD sensor for scene content retrieval. Beside classical localiza ..."
Abstract - Cited by 4 (0 self)
Abstract. In the three years since the launch of the Microsoft Kinect® in the end-consumer market, we have witnessed a small revolution in computer vision research towards the use of a standardized consumer-grade RGB-D sensor for scene content retrieval. Besides classical localization and motion-capturing tasks, the Kinect has successfully been employed for the reconstruction of opaque and transparent objects. This report gives a comprehensive overview of the main publications using the Microsoft Kinect out of its original context as a decision-forest-based motion-capturing tool.
Leveraging Hierarchical Parametric Networks for Skeletal Joints Based Action Segmentation and Recognition
"... Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in develop-ing methods for human gesture and action recognition from 3D skeletal data. A number of approaches have been pro-posed to extract representative features from 3D skeletal data, most commonl ..."
Abstract - Cited by 4 (1 self)
Over the last few years, with the immense popularity of the Kinect, there has been renewed interest in developing methods for human gesture and action recognition from 3D skeletal data. A number of approaches have been proposed to extract representative features from 3D skeletal data, most commonly hard-wired geometric or bio-inspired shape-context features. We propose a hierarchical dynamic framework that first extracts high-level skeletal-joint features and then uses the learned representation to estimate the emission probability needed to infer action sequences. Currently, Gaussian mixture models are the dominant technique for modeling the emission distribution of hidden Markov models. We show that better action recognition using skeletal features can be achieved by replacing Gaussian mixture models with deep neural networks that contain many layers of features to predict probability distributions over the states of hidden Markov models. The framework can easily be extended to include an ergodic state to segment and recognize actions simultaneously.
Real-Time Multiple Human Perception with Color-Depth Cameras on a Mobile Robot
"... Abstract—The ability to perceive humans is an essential re-quirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability ..."
Abstract - Cited by 3 (3 self)
Abstract—The ability to perceive humans is an essential requirement for safe and efficient human-robot interaction. In real-world applications, the need for a robot to interact in real time with multiple humans in a dynamic, 3-D environment presents a significant challenge. The recent availability of commercial color-depth cameras allows for the creation of a system that makes use of the depth dimension, thus enabling a robot to observe its environment and perceive in 3-D space. Here we present a system for 3-D multiple human perception in real time from a moving robot equipped with a color-depth camera and a consumer-grade computer. Our approach reduces computation time to achieve real-time performance through a unique combination of new ideas and established techniques. We remove the ground and ceiling planes from the 3-D point cloud input to separate candidate point clusters. We introduce a novel information concept, depth of interest, which we use to identify candidates for detection and which avoids the computationally expensive scanning-window methods of other approaches. We utilize a cascade of detectors to distinguish humans from objects, in which we make intelligent reuse of intermediary features in successive detectors to reduce computation. Because of the high computational cost of some methods, we represent our candidate tracking algorithm as a decision directed acyclic graph, which allows us to use the most computationally intense techniques only where necessary. We detail the successful implementation of our novel approach on a mobile robot and examine its performance in scenarios with real-world challenges, including occlusion, robot motion, non-upright humans, humans leaving and re-entering the field of view (i.e., the re-identification challenge), and human-object and human-human interaction.
We conclude with the observation that by incorporating depth information, together with the use of modern techniques in new ways, we are able to create an accurate system for real-time 3-D perception of humans by a mobile robot. Index Terms—3-D vision, depth of interest, human detection and tracking, human perception, RGB-D camera application.
3D Reconstruction of Body Parts Using RGB-D Sensors: Challenges from a Biomedical Perspective
- in the Proceedings of the 5th International Conference and Exhibition on 3D Body Scanning Technologies (accepted)
, 2014
"... The patient 3D model reconstruction plays an important role in applications such as surgery planning or computer-aided prosthesis design systems. Common methods use either expensive devices or require expert personnel which are not available in every clinic. Thus to make patient-specific modelling m ..."
Abstract - Cited by 2 (2 self)
Patient 3D model reconstruction plays an important role in applications such as surgery planning or computer-aided prosthesis design systems. Common methods either use expensive devices or require expert personnel who are not available in every clinic. Thus, to make patient-specific modelling more versatile, efficient methods must be developed together with feasible devices. Body parts such as the head and torso present challenges of varying degrees of complexity, especially because of the absence of relevant and abundant features. The Microsoft Kinect is a low-cost and widely available sensor that has been successfully applied in medical applications. Since a single depth map acquired by the Kinect is often incomplete and noisy, different approaches have been proposed to perform the reconstruction by merging multiple depth maps, registering the single-view point clouds generated from each one. As the human body is non-rigid, most previous reconstruction methods using the Kinect fail to perform accurate reconstruction since they do not address non-rigid surfaces. In this paper we present the challenges of using low-cost RGB-D sensors to reconstruct the human body. Additionally, we analysed the coarse registration stage to understand its impact on the quality of
Free-viewpoint Video of Human Actors using Multiple Handheld Kinects
- IEEE T-SMC:B SPECIAL ISSUE ON COMPUTER VISION FOR RGB-D SENSORS: KINECT AND ITS APPLICATIONS
"... We present an algorithm for creating free-viewpoint video of interacting humans using three hand-held Kinect cameras. Our method reconstructs deforming surface geometry and temporal varying texture of humans through estimation of human poses and camera poses for every time step of the RGBZ video. S ..."
Abstract - Cited by 2 (0 self)
We present an algorithm for creating free-viewpoint video of interacting humans using three hand-held Kinect cameras. Our method reconstructs the deforming surface geometry and temporally varying texture of humans by estimating human poses and camera poses for every time step of the RGBZ video. Skeletal configurations and camera poses are found by solving a joint energy minimization problem which optimizes the alignment of RGBZ data from all cameras, as well as the alignment of human shape templates to the Kinect data. The energy function is based on a combination of geometric correspondence finding, implicit scene segmentation, and correspondence finding using image features. Finally, texture recovery is achieved through joint optimization on spatio-temporal RGB data using matrix completion. As opposed to previous methods, our algorithm succeeds on free-viewpoint video of human actors in general uncontrolled indoor scenes with potentially dynamic background, and it succeeds even if the cameras are moving.
Regularity Guaranteed Human Pose Correction
- In Computer Vision—ACCV 2014
, 2014
"... Abstract. Benefited from the advantages provided by depth sensors, 3D human pose estimation has become feasible. However, the curren-t estimation systems usually yield poor results due to severe occlusion and sensor noise in depth data. In this paper, we focus on a post-process step, pose correction ..."
Abstract - Cited by 1 (0 self)
Abstract. Benefiting from the advantages provided by depth sensors, 3D human pose estimation has become feasible. However, current estimation systems usually yield poor results due to severe occlusion and sensor noise in depth data. In this paper, we focus on a post-processing step, pose correction, which takes the initial estimated poses as input and delivers more reliable results. Although the regression-based correction approach [1] has shown its effectiveness in decreasing estimation errors, it cannot guarantee the regularity of corrected poses. To address this issue, we formulate pose correction as an optimization problem, which combines the output of the regression model with a pose prior model learned on a pre-captured motion data set. Considering the complexity and the geometric properties of the pose data, the pose prior is estimated by von Mises-Fisher distributions in subspaces following divide-and-conquer strategies. By introducing the pose prior into our optimization framework, the regularity of the corrected poses is guaranteed. The experimental results on a challenging data set demonstrate that the proposed pose correction approach not only improves accuracy but also outputs more regular poses, compared to the state of the art.
Self-organizing Neural Integration of Pose-motion Features for Human Action Recognition
- Front. Neurorobot. 9:3
, 2015
Abstract - Cited by 1 (1 self)
(2015) Self-organizing neural integration of pose-motion features for human action recognition. Front. Neurorobot. 9:3. doi: 10.3389/fnbot.2015.00003