Results 1 - 10
of
44
Deep Learning for Detecting Robotic Grasps
"... Abstract—We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to solve this problem, which avoids time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a hug ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
(Show Context)
Abstract—We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to solve this problem, which avoids time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. In order to make detection fast, as well as robust, we present a two-step cascaded structure with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs well, for which we present a method to apply structured regularization on the weights based on multimodal group regularization. We demonstrate that our method outperforms the previous state-of-the-art methods in robotic grasp detection. I.
Dense Visual SLAM for RGB-D Cameras
"... Abstract — In this paper, we propose a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels. In contrast to sparse, feature-based methods, this allows us to better exploit the available information in the image data which leads to higher ..."
Abstract
-
Cited by 22 (4 self)
- Add to MetaCart
(Show Context)
Abstract — In this paper, we propose a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels. In contrast to sparse, feature-based methods, this allows us to better exploit the available information in the image data which leads to higher pose accuracy. Furthermore, we propose an entropy-based similarity measure for keyframe selection and loop closure detection. From all successful matches, we build up a graph that we optimize using the g2o framework. We evaluated our approach extensively on publicly available benchmark datasets, and found that it performs well in scenes with low texture as well as low structure. In direct comparison to several stateof-the-art methods, our approach yields a significantly lower trajectory error. We release our software as open-source. I.
Real-Time Camera Tracking and 3D Reconstruction Using Signed Distance Functions
"... Abstract—The ability to quickly acquire 3D models is an essential capability needed in many disciplines including robotics, computer vision, geodesy, and architecture. In this paper we present a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an R ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
(Show Context)
Abstract—The ability to quickly acquire 3D models is an essential capability needed in many disciplines including robotics, computer vision, geodesy, and architecture. In this paper we present a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an RGB-D sensor. We show that by representing the geometry with a signed distance function (SDF), the camera pose can be efficiently estimated by directly minimizing the error of the depth images on the SDF. As the SDF contains the distances to the surface for each voxel, the pose optimization can be carried out extremely fast. By iteratively estimating the camera poses and integrating the RGB-D data in the voxel grid, a detailed reconstruction of an indoor environment can be achieved. We present reconstructions of several rooms using a hand-held sensor and from onboard an autonomous quadrocopter. Our extensive evaluation on publicly available benchmark data shows that our approach is more accurate and robust than the iterated closest point algorithm (ICP) used by KinectFusion, and yields often a comparable accuracy at much higher speed to feature-based bundle adjustment methods such as RGB-D SLAM for up to medium-sized scenes. I.
Deformation-based Loop Closure for Large Scale Dense RGB-D SLAM
"... Abstract — In this paper we present a system for capturing large scale dense maps in an online setting with a low cost RGB-D sensor. Central to this work is the use of an “as-rigid-aspossible” space deformation for efficient dense map correction in a pose graph optimisation framework. By combining p ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
(Show Context)
Abstract — In this paper we present a system for capturing large scale dense maps in an online setting with a low cost RGB-D sensor. Central to this work is the use of an “as-rigid-aspossible” space deformation for efficient dense map correction in a pose graph optimisation framework. By combining pose graph optimisation with non-rigid deformation of a dense map we are able to obtain highly accurate dense maps over large scale trajectories that are both locally and globally consistent. With low latency in mind we derive an incremental method for deformation graph construction, allowing multi-million point maps to be captured over hundreds of metres in real-time. We provide benchmark results on a well established RGB-D SLAM dataset demonstrating the accuracy of the system and also provide a number of our own datasets which cover a wide range of environments, both indoors, outdoors and across multiple floors. I.
3D Mapping with an RGB-D Camera
"... Abstract—In this article we present a novel mapping system that robustly generates highly accurate 3D maps using an RGB-D camera. Our approach does not require any further sensors or odometry. With the availability of low-cost and light-weight RGB-D sensors such as the Microsoft Kinect, our approach ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
Abstract—In this article we present a novel mapping system that robustly generates highly accurate 3D maps using an RGB-D camera. Our approach does not require any further sensors or odometry. With the availability of low-cost and light-weight RGB-D sensors such as the Microsoft Kinect, our approach applies to small domestic robots such as vacuum cleaners as well as flying robots such as quadrocopters. Furthermore, our system can also be used for free-hand reconstruction of detailed 3D models. In addition to the system itself, we present a thorough experimental evaluation on a publicly available benchmark dataset. We analyze and discuss the influence of several parameters such as the choice of the feature descriptor, the number of visual features, and validation methods. The results of the experiments demonstrate that our system can robustly deal with challenging scenarios such as fast cameras motions and feature-poor environments while being fast enough for online operation. Our system is fully available as open-source and has already been widely adopted by the robotics community.
Dense Scene Reconstruction with Points of Interest
"... Figure 1: Rodin’s “The Burghers of Calais, ” reconstructed from a stream of images produced by a handheld commodity range camera. The statues are 2 meters tall. We present an approach to detailed reconstruction of complex realworld scenes with a handheld commodity range sensor. The user moves the se ..."
Abstract
-
Cited by 11 (2 self)
- Add to MetaCart
Figure 1: Rodin’s “The Burghers of Calais, ” reconstructed from a stream of images produced by a handheld commodity range camera. The statues are 2 meters tall. We present an approach to detailed reconstruction of complex realworld scenes with a handheld commodity range sensor. The user moves the sensor freely through the environment and images the scene. An offline registration and integration pipeline produces a detailed scene model. To deal with the complex sensor trajectories required to produce detailed reconstructions with a consumer-grade sensor, our pipeline detects points of interest in the scene and preserves detailed geometry around them while a global optimization distributes residual registration errors through the environment. Our results demonstrate that detailed reconstructions of complex scenes can be obtained with a consumer-grade camera.
A.: Hierarchical semantic labeling for task-relevant rgb-d perception
- In: RSS (2014
"... Abstract—Semantic labeling of RGB-D scenes is very impor-tant in enabling robots to perform mobile manipulation tasks, but different tasks may require entirely different sets of labels. For example, when navigating to an object, we may need only a single label denoting its class, but to manipulate i ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
(Show Context)
Abstract—Semantic labeling of RGB-D scenes is very impor-tant in enabling robots to perform mobile manipulation tasks, but different tasks may require entirely different sets of labels. For example, when navigating to an object, we may need only a single label denoting its class, but to manipulate it, we might need to identify individual parts. In this work, we present an algorithm that produces hierarchical labelings of a scene, following is-part-of and is-type-of relationships. Our model is based on a Conditional Random Field that relates pixel-wise and pair-wise observations to labels. We encode hierarchical labeling constraints into the model while keeping inference tractable. Our model thus predicts different specificities in labeling based on its confidence—if it is not sure whether an object is Pepsi or Sprite, it will predict soda rather than making an arbitrary choice. In extensive experiments, both offline on standard datasets as well as in online robotic experiments, we show that our model outperforms other state-of-the-art methods in labeling performance as well as in success rate for robotic tasks. I.
A Benchmark for RGB-D Visual Odometry, 3D Reconstruction and SLAM
"... for the evaluation of visual odometry, 3D reconstruction and SLAM algorithms that typically use RGB-D data. We present a collection of handheld RGB-D camera sequences within synthetically generated environments. RGB-D sequences with perfect ground truth poses are provided as well as a ground truth s ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
for the evaluation of visual odometry, 3D reconstruction and SLAM algorithms that typically use RGB-D data. We present a collection of handheld RGB-D camera sequences within synthetically generated environments. RGB-D sequences with perfect ground truth poses are provided as well as a ground truth surface model that enables a method of quantitatively evaluating the final map or surface reconstruction accuracy. Care has been taken to simulate typically observed real-world artefacts in the synthetic imagery by modelling sensor noise in both RGB and depth data. While this dataset is useful for the evaluation of visual odometry and SLAM trajectory estimation, our main focus is on providing a method to benchmark the surface reconstruction accuracy which to date has been missing in the RGB-D community despite the plethora of ground truth RGB-D datasets available. I.
Large-Scale Multi-Resolution Surface Reconstruction from RGB-D Sequences
"... We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale o ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
We propose a method to generate highly detailed, textured 3D models of large environments from RGB-D sequences. Our system runs in real-time on a standard desktop PC with a state-of-the-art graphics card. To reduce the memory consumption, we fuse the acquired depth maps and colors in a multi-scale octree representation of a signed distance function. To estimate the camera poses, we construct a pose graph and use dense image alignment to determine the relative pose between pairs of frames. We add edges between nodes when we detect loop-closures and optimize the pose graph to correct for long-term drift. Our implementation is highly parallelized on graphics hardware to achieve real-time performance. More specifically, we can reconstruct, store, and continuously update a colored 3D model of an entire corridor of nine rooms at high levels of detail in real-time on a single GPU with 2.5GB. 1.
A State of the Art Report on Kinect Sensor Setups in Computer Vision
"... Abstract. During the last three years after the launch of the Microsoft Kinect R ○ in the end-consumer market we have become witnesses of a small revolution in computer vision research towards the use of a standardized consumer-grade RGBD sensor for scene content retrieval. Beside classical localiza ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
Abstract. During the last three years after the launch of the Microsoft Kinect R ○ in the end-consumer market we have become witnesses of a small revolution in computer vision research towards the use of a standardized consumer-grade RGBD sensor for scene content retrieval. Beside classical localization and motion capturing tasks the Kinect has successfully been employed for the reconstruction of opaque and transparent objects. This report gives a comprehensive overview over the main publications using the Microsoft Kinect out of its original context as a decision-forest based motion-capturing tool. 1