SUN3D: A database of big spaces reconstructed using SfM and object labels
In: ICCV (2013)
"... Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks ..."
Abstract
-
Cited by 17 (8 self)
- Add to MetaCart
(Show Context)
Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation: hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available.
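The object-to-object constraint can be pictured as an extra residual term in bundle adjustment. Below is a minimal sketch (not the authors' code; all names are illustrative) of how points sharing an object label might be penalized for leaving a fixed-size box parameterized by its rotation and center:

```python
# Minimal sketch of the object-to-object constraint: 3D points that carry
# the same object label, across frames, must lie inside one fixed-size box
# parameterized by its rotation R and center c. Hinge residuals are zero
# inside the box and grow linearly outside it.
import numpy as np

def box_residuals(points, R, c, half_extents):
    """points: (N, 3) world-space points for one object across frames.
    R (3x3) and c (3,) are free parameters of the box; half_extents (3,)
    is its fixed size, e.g. taken from a class-level prior."""
    local = (points - c) @ R                 # express points in box coordinates
    excess = np.abs(local) - half_extents    # positive where a point sticks out
    return np.maximum(excess, 0.0).ravel()   # one residual per coordinate

# In a full system these residuals would be stacked with the standard
# reprojection residuals and minimized jointly over camera poses, points,
# and (R, c), e.g. with scipy.optimize.least_squares.
```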
Playing with duality: An overview of recent primal-dual approaches for . . .
2014
"... Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. For a long time, it has been recognized that looking at the dual of an optimization problem may drastically simplify its solution. Deriving efficient strategies jointly bringing i ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Optimization methods are at the core of many problems in signal/image processing, computer vision, and machine learning. It has long been recognized that looking at the dual of an optimization problem may drastically simplify its solution. Deriving efficient strategies that jointly bring into play the primal and the dual problems is, however, a more recent idea which has generated many important new contributions in recent years. These novel developments are grounded in recent advances in convex analysis, discrete optimization, parallel processing, and nonsmooth optimization with an emphasis on sparsity issues. In this paper, we aim to present the principles of primal-dual approaches while giving an overview of the numerical methods that have been proposed in different contexts. We show the benefits that can be drawn from primal-dual algorithms for solving both large-scale convex optimization problems and discrete ones, and we provide various application examples to illustrate their usefulness.
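A canonical member of this family is the Chambolle-Pock primal-dual algorithm. As a concrete example of the primal-dual interplay the survey describes, here is a short sketch applying it to total-variation (ROF) denoising; the step sizes satisfy the usual condition tau * sigma * ||grad||^2 <= 1:

```python
# Chambolle-Pock primal-dual iteration for TV (ROF) denoising:
#   min_x  0.5 * ||x - f||^2 + lam * TV(x)
# The nonsmooth TV term is handled entirely in the dual via a projection.
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions."""
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[0, :] = px[0, :]; d[1:-1, :] = px[1:-1, :] - px[:-2, :]; d[-1, :] = -px[-2, :]
    d[:, 0] += py[:, 0]; d[:, 1:-1] += py[:, 1:-1] - py[:, :-2]; d[:, -1] += -py[:, -2]
    return d

def tv_denoise(f, lam=0.1, n_iter=200):
    tau = sigma = 1.0 / np.sqrt(8.0)          # tau * sigma * ||grad||^2 <= 1
    x = f.copy(); x_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(x_bar)                  # dual ascent step ...
        px, py = px + sigma * gx, py + sigma * gy
        norm = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px, py = px / norm, py / norm         # ... then project onto |p| <= lam
        x_old = x                             # primal step: prox of 0.5||x - f||^2
        x = (x + tau * div(px, py) + tau * f) / (1.0 + tau)
        x_bar = 2 * x - x_old                 # over-relaxation
    return x
```

The dual variable p absorbs the nonsmooth TV term through a simple pointwise projection, which is exactly the kind of splitting that lets these methods scale.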
Discrete-Continuous Depth Estimation from a Single Image
"... In this paper, we tackle the problem of estimating the depth of a scene from a single image. This is a challeng-ing task, since a single image on its own does not provide any depth cue. To address this, we exploit the availability of a pool of images for which the depth is known. More specifically, ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
In this paper, we tackle the problem of estimating the depth of a scene from a single image. This is a challenging task, since a single image on its own does not provide any depth cue. To address this, we exploit the availability of a pool of images for which the depth is known. More specifically, we formulate monocular depth estimation as a discrete-continuous optimization problem, where the continuous variables encode the depth of the superpixels in the input image, and the discrete ones represent relationships between neighboring superpixels. The solution to this discrete-continuous optimization problem is then obtained by performing inference in a graphical model using particle belief propagation. The unary potentials in this graphical model are computed by making use of the images with known depth. We demonstrate the effectiveness of our model in both indoor and outdoor scenarios. Our experimental evaluation shows that our depth estimates are more accurate than those of existing methods on standard datasets.
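To make the discrete-continuous formulation concrete, here is a rough, hypothetical sketch: each superpixel carries a set of continuous depth candidates (particles), each edge carries a discrete relation, and one scoring round keeps the best particles. This is only a crude stand-in for particle belief propagation, not the paper's algorithm:

```python
# Rough illustration (all names hypothetical) of the discrete-continuous
# idea: continuous depth candidates ("particles") per superpixel, plus a
# discrete relation per edge whose best choice is evaluated analytically.
import numpy as np

RELATIONS = (                                  # discrete edge labels -> pairwise costs
    lambda d1, d2: (d1 - d2) ** 2,             # "coplanar / equal depth"
    lambda d1, d2: max(0.0, d1 - d2) ** 2,     # "i in front of j" (violated if d1 > d2)
    lambda d1, d2: max(0.0, d2 - d1) ** 2,     # "i behind j" (violated if d1 < d2)
)

def edge_cost(d1, d2):
    # the discrete variable is minimized out per candidate depth pair
    return min(rel(d1, d2) for rel in RELATIONS)

def particle_round(particles, unary, edges, n_keep=10):
    """One greedy scoring round, a crude stand-in for resampling in
    particle belief propagation. particles: {node: (P,) depth array};
    unary: {node: callable mapping depths to costs}; edges: directed
    pairs (include both orientations for a symmetric model)."""
    scores = {i: unary[i](ps) for i, ps in particles.items()}
    for i, j in edges:
        for a, d1 in enumerate(particles[i]):
            scores[i][a] += min(edge_cost(d1, d2) for d2 in particles[j])
    return {i: ps[np.argsort(scores[i])[:n_keep]] for i, ps in particles.items()}
```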
SemanticPaint: Interactive 3D Labeling and Learning at your Fingertips
ACM TOG, 2015
"... We present a new interactive and online approach to 3D scene understand-ing. Our system, SemanticPaint, allows users to simultaneously scan their environment, whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns f ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
We present a new interactive and online approach to 3D scene understanding. Our system, SemanticPaint, allows users to simultaneously scan their environment whilst interactively segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations and labels new, unseen parts of the environment. Unlike offline systems, where capture, labeling and batch learning often take hours or even days, our approach is fully online. This provides users with continuous live feedback of the recognition during capture, allowing them to immediately correct errors in the segmentation and/or learning, a feature that has so far been unavailable to batch and offline methods. This leads to models that are tailored or personalized specifically to the user’s environments and object classes of interest, opening up the potential for new applications in augmented reality, interior design, and human/robot navigation. It also provides the ability to capture substantial labeled 3D datasets for training large-scale visual recognition systems.
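The essential mechanism is an incremental learning loop: every user interaction immediately updates a classifier that then labels newly scanned geometry. A minimal sketch, using scikit-learn's SGDClassifier as a stand-in for whatever incremental model the system actually uses (the feature extraction and function names are assumptions):

```python
# Minimal sketch of the online loop (hypothetical API; scikit-learn's
# SGDClassifier stands in for the incremental model actually used).
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1, 2])        # e.g. background, floor, chair (assumed)
clf = SGDClassifier(loss="log_loss")

def on_user_touch(voxel_features, label):
    """A touch labels some surface voxels; they become training data
    immediately, so the model improves during capture."""
    y = np.full(len(voxel_features), label)
    clf.partial_fit(voxel_features, y, classes=CLASSES)

def label_new_geometry(voxel_features):
    """Newly scanned, unlabeled voxels are labeled by the current model,
    giving live feedback the user can correct on the spot."""
    return clf.predict(voxel_features)
```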
Joint Semantic Segmentation and 3D Reconstruction from Monocular Video
"... Abstract. We present an approach for joint inference of 3D scene struc-ture and semantic labeling for monocular video. Starting with monocular image stream, our framework produces a 3D volumetric semantic + occu-pancy map, which is much more useful than a series of 2D semantic label images or a spar ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
(Show Context)
We present an approach for joint inference of 3D scene structure and semantic labeling for monocular video. Starting from a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than the series of 2D semantic label images or the sparse point cloud produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy for each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which would not be possible if the problems were solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multiview constraints are weak. Our model comprises higher-order factors, which help when the depth is unobservable, and we use class-specific semantic cues to reduce the degree of such higher-order factors, or to approximate them with unaries where possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large-scale, forward-moving monocular image sequences.
[Fig. 1 caption: Overview of our system. From a monocular image sequence, we first obtain 2D semantic segmentation, sparse 3D reconstruction and camera poses. We then build a volumetric 3D map which depicts both 3D structure and semantic labels.]
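The joint label space can be made explicit: each voxel takes one label from {free} together with {occupied with class c}, so occupancy and semantics are decided together. A small illustrative sketch of evaluating such an energy (the costs and connectivity are placeholders, not the paper's model):

```python
# Illustrative energy over the joint label space {free} U {occupied, class c}:
# semantics and occupancy are inferred together in one CRF.
import numpy as np

FREE = 0                                   # label 0 = free space
# labels 1..K = occupied with class k (e.g. building, road, vegetation)

def energy(labels, unary, neighbors, w_smooth=1.0):
    """labels: (V,) int per voxel; unary: (V, K+1) costs fused from depth
    and 2D semantic evidence; neighbors: 6-connected (i, j) voxel pairs."""
    e = unary[np.arange(len(labels)), labels].sum()
    for i, j in neighbors:
        e += w_smooth * (labels[i] != labels[j])   # Potts pairwise term
    return e
```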
Class-specific 3D object shape priors using surface normals
In: CVPR (2014)
"... Dense 3D reconstruction of real world objects contain-ing textureless, reflective and specular parts is a challeng-ing task. Using general smoothness priors such as surface area regularization can lead to defects in the form of discon-nected parts or unwanted indentations. We argue that this problem ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
Dense 3D reconstruction of real-world objects containing textureless, reflective and specular parts is a challenging task. Using general smoothness priors such as surface-area regularization can lead to defects in the form of disconnected parts or unwanted indentations. We argue that this problem can be solved by exploiting object-class-specific local surface orientations, e.g., the roof area of a car is always close to horizontal. We therefore formulate an object-class-specific shape prior in the form of spatially varying anisotropic smoothness terms, whose parameters are extracted from training data. We detail how our shape-prior formulation fits directly into recently proposed volumetric multi-label reconstruction approaches, which allows a segmentation between the object and its supporting ground. In our experimental evaluation we show reconstructions using our trained shape prior on several challenging datasets.
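One way to picture a spatially varying anisotropic smoothness term: the cost of creating surface area depends on how well the surface normal agrees with a learned, class-specific orientation prior. A hedged sketch, assuming a mean direction mu and concentration kappa learned from training data (both assumptions):

```python
# Sketch of a spatially varying anisotropic smoothness weight, assuming a
# learned per-class mean normal mu and concentration kappa (both assumed).
import numpy as np

def anisotropic_cost(normal, mu, kappa, base=1.0):
    """Cheap when the surface normal agrees with the class-specific prior
    direction mu; falls back to the isotropic cost `base` otherwise."""
    normal = normal / np.linalg.norm(normal)
    agreement = float(np.dot(normal, mu))      # in [-1, 1]
    return base * np.exp(-kappa * max(agreement, 0.0))
```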
Shape anchors for data-driven multi-view reconstruction
In: ICCV (2013)
"... We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We cal ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation to these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.
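The anchor-selection step can be illustrated with a standard nearest-neighbor ratio test against a database pairing patch descriptors with local 3D shapes; only distinctive matches are trusted. A hypothetical sketch (the database and threshold are assumptions):

```python
# Hypothetical anchor selection: a nearest-neighbor ratio test against a
# database pairing patch descriptors (db_descs) with local shapes (db_shapes).
import numpy as np

def find_anchors(patch_descs, db_descs, db_shapes, ratio=0.7):
    anchors = []
    for i, d in enumerate(patch_descs):
        dists = np.linalg.norm(db_descs - d, axis=1)
        best, second = np.partition(dists, 1)[:2]
        if best < ratio * second:          # distinctive match: trust its shape
            anchors.append((i, db_shapes[int(np.argmin(dists))]))
    return anchors                          # (patch index, transferred 3D shape)
```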
Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction
"... Dense semantic 3D reconstruction is typically formu-lated as a discrete or continuous problem over label assign-ments in a voxel grid, combining semantic and depth like-lihoods in a Markov Random Field framework. The depth and semantic information is incorporated as a unary po-tential, smoothed by a ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Dense semantic 3D reconstruction is typically formulated as a discrete or continuous problem over label assignments in a voxel grid, combining semantic and depth likelihoods in a Markov Random Field framework. The depth and semantic information is incorporated as a unary potential, smoothed by a pairwise regularizer. However, modelling likelihoods as a unary potential does not model the problem correctly, leading to various undesirable visibility artifacts. We propose to formulate an optimization problem that directly optimizes the reprojection error of the 3D model with respect to the image estimates. This corresponds to an optimization over rays, where the cost function depends on the semantic class and depth of the first occupied voxel along each ray. The 2-label formulation is made feasible by transforming it into a graph-representable form under a QPBO relaxation, solvable using graph cuts. The multi-label problem is solved by applying α-expansion, using the same relaxation in each expansion move. Our method proves feasible in practice, running comparably fast to competing methods while not suffering from ray-potential approximation artifacts.
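The key modelling point is that a ray's cost is a function of the first occupied voxel along it, not a sum of independent per-voxel unaries. An illustrative sketch of evaluating one ray's cost given a candidate labeling (costs and data layout are placeholders):

```python
# Illustrative ray potential: the cost depends only on the FIRST occupied
# voxel along the ray, matching the image's depth and semantic estimates.
import numpy as np

def ray_cost(occupied, classes, depths, obs_depth, class_cost, empty_cost):
    """Per-voxel arrays along one ray, ordered from the camera outward.
    occupied: (M,) bool; classes: (M,) int; depths: (M,) float."""
    hits = np.flatnonzero(occupied)
    if len(hits) == 0:
        return empty_cost                  # ray escapes the volume
    k = hits[0]                            # first occupied voxel decides the cost
    return (depths[k] - obs_depth) ** 2 + class_cost[classes[k]]
```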
Towards Unified Depth and Semantic Prediction from a Single Image
"... Depth estimation and semantic segmentation are two fundamental problems in image understanding. While the two tasks are strongly correlated and mutually beneficial, they are usually solved separately or sequentially. Moti-vated by the complementary properties of the two tasks, we propose a unified f ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Depth estimation and semantic segmentation are two fundamental problems in image understanding. While the two tasks are strongly correlated and mutually beneficial, they are usually solved separately or sequentially. Motivated by the complementary properties of the two tasks, we propose a unified framework for joint depth and semantic prediction. Given an image, we first use a trained Convolutional Neural Network (CNN) to jointly predict a global layout composed of pixel-wise depth values and semantic labels. By allowing for interactions between the depth and semantic information, the joint network provides more accurate depth prediction than a state-of-the-art CNN trained solely for depth prediction [6]. To further obtain fine-level details, the image is decomposed into local segments for region-level depth and semantic prediction under the guidance of the global layout. Utilizing the pixel-wise global prediction and region-wise local prediction, we formulate the inference problem in a two-layer Hierarchical Conditional Random Field (HCRF) to produce the final depth and semantic maps. As demonstrated in the experiments, our approach effectively leverages the advantages of both tasks and provides state-of-the-art results.
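The global-layout stage can be pictured as a shared encoder with two prediction heads, so depth and semantic features interact before either output is produced. A minimal PyTorch sketch, not the paper's actual architecture:

```python
# Minimal shared-encoder, two-head network (illustrative; not the paper's
# architecture): depth and semantics interact through common features.
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(64, 1, 1)        # pixel-wise depth
        self.sem_head = nn.Conv2d(64, n_classes, 1)  # pixel-wise class scores

    def forward(self, x):
        f = self.encoder(x)                # shared features drive both outputs
        return self.depth_head(f), self.sem_head(f)
```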
The Semantic Paintbrush: Interactive 3D Mapping and Recognition in Large Outdoor Spaces
"... Figure 1: (a) Our system comprises of an off-the-shelf pair of optical see-through glasses, with additional stereo RGB-Infrared cameras, and an additional handheld infrared/visible light laser. (b) The passive stereo cameras are used for extended range and outdoor depth estimation. (c) The user can ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Figure 1: (a) Our system comprises of an off-the-shelf pair of optical see-through glasses, with additional stereo RGB-Infrared cameras, and an additional handheld infrared/visible light laser. (b) The passive stereo cameras are used for extended range and outdoor depth estimation. (c) The user can see these reconstructions immediately using the heads-up display, and can use a laser pointer to draw onto the 3D world to semantically segment objects (once segmented these labels will propagate to new parts of the scene). (d) The laser pointer can also be triangulated precisely in the stereo infrared images allowing for interactive ‘cleaning up’ of the model during capture. (e) Final output, the semantic map of the scene. We present an augmented reality system for large scale 3D reconstruction and recognition in outdoor scenes. Unlike exist-ing prior work, which tries to reconstruct scenes using active depth cameras, we use a purely passive stereo setup, allowing for outdoor use and extended sensing range. Our system not only produces a map of the 3D environment in real-time, it also allows the user to draw (or ‘paint’) with a laser pointer directly onto the reconstruction to segment the model into ob-