Indoor Segmentation and Support Inference from RGBD Images. In: ECCV, 2012.

by N. Silberman, D. Hoiem, P. Kohli, R. Fergus
Results 1–10 of 159 citing documents

Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images

by Saurabh Gupta, Pablo Arbeláez, Jitendra Malik
"... We address the problems of contour detection, bottomup grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundar ..."
Cited by 48 (3 self)
We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
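The labeling stage the abstract describes (classify each superpixel into one of 40 NYUD2 categories using appearance and geometry features) can be pictured with a minimal sketch. The feature set, the linear SVM, and all names below are illustrative assumptions, not the authors' pipeline:

```python
# Minimal sketch of a superpixel-labeling stage, NOT the authors'
# implementation: the features and classifier choice are assumptions.
import numpy as np
from sklearn.svm import LinearSVC

NUM_CLASSES = 40  # dominant object categories in NYUD2

def superpixel_features(rgb, depth, sp_mask):
    """Hypothetical generic appearance + geometry features for one superpixel.

    rgb: (H, W, 3) image, depth: (H, W) depth map, sp_mask: (H, W) boolean.
    """
    region_rgb = rgb[sp_mask]            # (n_pixels, 3)
    region_depth = depth[sp_mask]        # (n_pixels,)
    return np.concatenate([
        region_rgb.mean(axis=0),         # mean color
        region_rgb.std(axis=0),          # color spread
        [region_depth.mean(),            # mean depth
         region_depth.std(),             # depth spread (rough flatness cue)
         sp_mask.mean()],                # relative region size
    ])

def train_labeler(X, y):
    """X: (n_superpixels, n_features), y: labels in [0, NUM_CLASSES)."""
    clf = LinearSVC(C=1.0)
    clf.fit(X, y)
    return clf
```

The paper's actual features are far richer; the point here is only the shape of the pipeline: per-superpixel features in, a per-superpixel category label out.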

Citation Context

... grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system...

Microsoft COCO: Common Objects in Context

by Tsung-yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick
"... Abstract. We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understand-ing. This is achieved by gathering images of complex everyday scenes containing common obj ..."
Cited by 43 (3 self)
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

Citation Context

... in both object classification and detection research [5,6,7]. The community has also created datasets containing object attributes [8], scene attributes [9], keypoints [10], and 3D scene information [11]. This leads us to the obvious question: what datasets will best continue our advance towards our ultimate goal of scene understanding? We introduce a new large-scale dataset that addresses three core...

Fully convolutional networks for semantic segmentation

by Jonathan Long, Evan Shelhamer, Trevor Darrell, 2014
"... Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolu-tional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmen-tation. Our key insight is to build “fully convolutional” networks that take ..."
Cited by 37 (0 self)
Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [17], the VGG net [28], and GoogLeNet [29]) into fully convolutional networks and transfer their learned representations by fine-tuning [2] to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.
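The two ideas the abstract leans on, arbitrary-size inputs with correspondingly-sized outputs and a skip connection fusing a deep coarse layer with a shallow fine one, fit in a short sketch. This is a toy stand-in (layer sizes, channel counts, and the two-stage backbone are assumptions), not the paper's FCN-8s:

```python
# Toy fully convolutional network with one skip connection; purely
# illustrative, not the published architecture.
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        # Shallow, fine stage (stride 4 overall): keeps appearance detail.
        self.shallow = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Deep, coarse stage (stride 16 overall): carries semantic information.
        self.deep = nn.Sequential(
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 convs score each stage per class; any input size works.
        self.score_shallow = nn.Conv2d(128, num_classes, 1)
        self.score_deep = nn.Conv2d(512, num_classes, 1)
        # Learned upsampling (transposed conv): stride 16 back to stride 4.
        self.up4 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=4)
        # Final upsampling from stride 4 back to input resolution.
        self.up_final = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=4)

    def forward(self, x):
        fine = self.shallow(x)                 # stride-4 features
        coarse = self.deep(fine)               # stride-16 features
        # Skip connection: fine appearance scores + upsampled coarse semantics.
        fused = self.score_shallow(fine) + self.up4(self.score_deep(coarse))
        return self.up_final(fused)            # per-pixel class scores

scores = TinyFCN()(torch.randn(1, 3, 64, 64))  # -> (1, 21, 64, 64)
```

Because every layer is convolutional, the same weights produce a correspondingly larger score map when given a larger image, which is exactly the property the abstract highlights.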

Citation Context

...e PASCAL VOC 2011 and 2012 test sets, and reduces inference time.

              mean IU         mean IU         inference
              VOC2011 test    VOC2012 test    time (s)
  R-CNN [10]  47.9            -               -
  SDS [14]    52.6            51.6            ~50
  FCN-8s      62.7            62.2            ~13

NYUDv2 [27] is an RGB-D dataset collected using the Microsoft Kinect. It has 1449 RGB-D images, with pixelwise labels that have been coalesced into a 40-class semantic segmentation task by Gupta et al. [11]. We ...

Enhanced computer vision with Microsoft Kinect sensor: A review

by Jungong Han, Ling Shao, Dong Xu, Jamie Shotton - IEEE Transactions on Cybernetics, 2013
"... With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in ..."
Cited by 31 (2 self)
With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

Citation Context

...t by matching the extracted features. The general conclusion is that the more information the algorithm uses, the better the discriminative capability of the algorithm. (Fig. 7 of the review shows image samples from the dataset in [44]; from left to right: RGB images, depth images, and annotated object regions with labels.) In other words, increasing th...

Learning Rich Features from RGB-D Images for Object Detection and Segmentation: Supplementary Material

by Saurabh Gupta, Ross Girshick, Pablo Arbeláez, Jitendra Malik
"... In this subsection, we present the Precision Recall curves on the NYUD2 test set, comparing the output from our object detectors with that from RGB DPMs [1], ..."
Cited by 31 (2 self)
In this subsection, we present the Precision Recall curves on the NYUD2 test set, comparing the output from our object detectors with that from RGB DPMs [1],

Citation Context

...mentation system we proposed in [18]. This approach obtains state-of-the-art results for that task, as well. 1.1 Related Work Most prior work on RGB-D perception has focused on semantic segmentation [3, 18, 23, 30, 33], i.e. the task of assigning a category label to each pixel. While this is an interesting problem, many practical applications...

Hallucinated Humans as the Hidden Context for Labeling 3D Scenes

by Yun Jiang, Ashutosh Saxena
"... For scene understanding, one popular approach has been to model the object-object relationships. In this paper, we hypothesize that such relationships are only an artifact of certain hidden factors, such as humans. For example, the objects, monitor and keyboard, are strongly spatially correlated onl ..."
Cited by 30 (15 self)
For scene understanding, one popular approach has been to model the object-object relationships. In this paper, we hypothesize that such relationships are only an artifact of certain hidden factors, such as humans. For example, the objects monitor and keyboard are strongly spatially correlated only because a human types on the keyboard while watching the monitor. Our goal is to learn this hidden human context (i.e., the human-object relationships), and also use it as a cue for labeling the scenes. We present the Infinite Factored Topic Model (IFTM), where we consider a scene as being generated from two types of topics: human configurations and human-object relationships. This enables our algorithm to hallucinate the possible configurations of the humans in the scene parsimoniously. Given only a dataset of scenes containing objects but not humans, we show that our algorithm can recover the human-object relationships. We then test our algorithm on the task of attribute and object labeling in 3D scenes and show consistent improvements over the state-of-the-art.

Citation Context

...bjects [19]. In the past, 3D layout or depths have been used for improving object detection (e.g., [26, 27, 11, 21, 12, 22]), where an approximate 3D geometry is inferred from 2D images. Recent works [31, 19, 2, 28] address the problem of labeling 3D point clouds. Reasoning in 3D allows an algorithm to capture stronger context, such as shape, stability and orientation of the objects [15, 13, 16]. However, none o...

Indoor semantic segmentation using depth information

by Camille Couprie, Clément Farabet, Laurent Najman, Yann LeCun - ICLR, 2013
"... This work addresses multi-class segmentation of indoor scenes with RGB-D in-puts. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and th ..."
Cited by 28 (3 self)
This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art performance on the NYU-v2 depth dataset with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real-time using appropriate hardware such as an FPGA.
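A minimal sketch of what a "multiscale convolutional network" over RGB-D inputs can look like: one shared feature extractor applied to an image pyramid built from the 4-channel RGB-D input, with per-scale features upsampled and concatenated. The layer sizes and scales below are assumptions for illustration, not the authors' architecture:

```python
# Shared-weight feature extraction over an RGB-D image pyramid; illustrative
# only, not the published model.
import torch
import torch.nn as nn
import torch.nn.functional as F

features = nn.Sequential(            # same weights reused at every scale
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

def multiscale_features(rgbd, scales=(1.0, 0.5, 0.25)):
    """rgbd: (B, 4, H, W) tensor of RGB + depth channels."""
    h, w = rgbd.shape[-2:]
    outs = []
    for s in scales:
        x = rgbd if s == 1.0 else F.interpolate(
            rgbd, scale_factor=s, mode='bilinear', align_corners=False)
        f = features(x)              # identical extractor at each scale
        outs.append(F.interpolate(f, size=(h, w), mode='bilinear',
                                  align_corners=False))
    return torch.cat(outs, dim=1)    # (B, 96, H, W) multiscale feature map

feats = multiscale_features(torch.randn(1, 4, 64, 64))
```

The coarser scales give each pixel a wider effective context at no extra parameter cost, which is the usual motivation for this kind of pyramid.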

Citation Context

...sing elaborate kernel descriptors and a post-processing step that employs gPb superpixel MRFs, involving large computation times. A second version of the NYU depth dataset was released more recently [23], and improves the label categorization into 894 different object classes. Furthermore, the size of the dataset also increased: it now contains hundreds of video sequences (407,024 frames) acquired...

Depth map prediction from a single image using a multi-scale deep network

by David Eigen, Christian Puhrsch, Rob Fergus - NIPS
"... Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring in-tegration of both global and local information from various ..."
Cited by 26 (2 self)
Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
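The scale-invariant error the abstract mentions can be written down compactly: residuals are taken in log-depth, and a correction term discounts the component of the error that is a shared scale shift. The form below follows the paper's published formulation as commonly cited; treat the exact constant lambda as an assumption:

```python
# Scale-invariant log-depth error; the lambda-weighted form is as usually
# cited from the paper (lambda = 0.5 blends scale-invariant and L2 log error).
import numpy as np

def scale_invariant_error(pred_depth, gt_depth, lam=0.5):
    d = np.log(pred_depth) - np.log(gt_depth)   # per-pixel log-depth residual
    n = d.size
    # First term: ordinary mean squared log error.
    # Second term: subtracts credit for residuals sharing a common sign,
    # so a prediction that is correct up to a global scale is penalized less.
    return (d ** 2).mean() - lam * (d.sum() ** 2) / (n ** 2)
```

With lam=1 the error is fully scale-invariant (multiplying the prediction by any constant leaves it unchanged); with lam=0 it reduces to the plain mean squared log error.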

Citation Context

...rstanding geometric relations within a scene. In turn, such relations help provide richer representations of objects and their environment, often leading to improvements in existing recognition tasks [18], as well as enabling many further applications such as 3D modeling [16, 6], physics and support models [18], robotics [4, 14], and potentially reasoning about occlusions. While there is much prior wo...

3D-based reasoning with blocks, support, and stability

by Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena, Tsuhan Chen - In CVPR, 2013
"... 3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other o ..."
Cited by 23 (5 self)
3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
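The stability intuition (a good arrangement is one that does not topple) reduces, in the simplest setting, to checking that each object's cumulative center of mass projects inside the footprint of whatever supports it. The toy check below makes that concrete for a stack of axis-aligned boxes; it illustrates the intuition only and is not the paper's joint optimization:

```python
# Toy stability check for a vertical stack of axis-aligned boxes, assuming
# uniform density; purely illustrative.
import numpy as np

def is_stable(boxes):
    """boxes: list of (center_xyz, size_xyz) tuples, ordered bottom to top."""
    for i in range(len(boxes) - 1, 0, -1):
        # Combined center of mass of box i and everything above it
        # (mass taken proportional to volume).
        tops = boxes[i:]
        masses = np.array([np.prod(s) for _, s in tops])
        com = sum(m * np.asarray(c) for m, (c, _) in zip(masses, tops)) / masses.sum()
        below_c, below_s = boxes[i - 1]
        # Footprint of the supporting box in the ground plane (x, y).
        half = np.asarray(below_s[:2]) / 2.0
        if np.any(np.abs(com[:2] - np.asarray(below_c[:2])) > half):
            return False   # COM projects outside the support: would topple
    return True

# A centered two-box stack is stable; shifting the top box far enough is not.
assert is_stable([((0.0, 0, 0.5), (1, 1, 1)), ((0.0, 0, 1.5), (1, 1, 1))])
assert not is_stable([((0.0, 0, 0.5), (1, 1, 1)), ((0.9, 0, 1.5), (1, 1, 1))])
```

The paper reasons jointly over segmentation, block fitting, and support rather than running a check like this in isolation, but this is the physical test such reasoning ultimately encodes.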

Citation Context

...e plausible segmentations and block arrangements (see Fig. 1d). The challenge is that objects can be arranged in complicated configurations. While some recent work considers notions of support (e.g., [9, 14, 23]), they are limited to single support or isolated objects on a flat surface. Thus, these methods do not apply to more complicated stacking arrangements of objects that can occur, for example, on...

SUN3D: A database of big spaces reconstructed using SfM and object labels

by Jianxiong Xiao, Andrew Owens, Antonio Torralba - In: ICCV, 2013
"... Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks ..."
Cited by 17 (8 self)
Existing scene understanding datasets contain only a limited set of views of a place, and they lack representations of complete 3D spaces. In this paper, we introduce SUN3D, a large-scale RGB-D video database with camera pose and object labels, capturing the full 3D extent of many places. The tasks that go into constructing such a dataset are difficult in isolation – hand-labeling videos is painstaking, and structure from motion (SfM) is unreliable for large spaces. But if we combine them together, we make the dataset construction task much easier. First, we introduce an intuitive labeling tool that uses a partial reconstruction to propagate labels from one frame to another. Then we use the object labels to fix errors in the reconstruction. For this, we introduce a generalization of bundle adjustment that incorporates object-to-object correspondences. This algorithm works by constraining points for the same object from different frames to lie inside a fixed-size bounding box, parameterized by its rotation and translation. The SUN3D database, the source code for the generalized bundle adjustment, and the web-based 3D annotation tool are all available at
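One way to picture the bounding-box constraint described above: points labeled as the same object, triangulated from different frames, should all fall inside a single fixed-size 3D box whose rotation and translation are free parameters. The hinge-style residual below is a hedged sketch of that idea; the function names and the exact penalty form are assumptions, not the paper's objective:

```python
# Hedged sketch of a box-membership residual for object-aware bundle
# adjustment; illustrative only.
import numpy as np
from scipy.spatial.transform import Rotation

def box_residuals(points_world, rvec, t, box_size):
    """Penalty for world points sticking out of one object's box.

    points_world: (N, 3) triangulated points labeled as this object
    rvec, t: rotation vector and translation of the box (free parameters)
    box_size: fixed (3,) box dimensions
    """
    R = Rotation.from_rotvec(rvec).as_matrix()
    local = (points_world - t) @ R          # world frame -> box frame
    half = np.asarray(box_size) / 2.0
    # Zero residual inside the box, linear penalty outside (hinge).
    return np.maximum(np.abs(local) - half, 0.0).ravel()
```

Residuals like these could be stacked alongside standard reprojection terms and handed to a nonlinear least-squares solver such as scipy.optimize.least_squares, which is the general mold bundle adjustment fits into.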

Citation Context

... scene recognition datasets to incorporate 3D. For example, [13] is an evolution of popular 2D object datasets such as Caltech 101 [5] to 3D objects captured by an RGB-D camera. The NYU Depth dataset [20] and others [11, 12, 2] go beyond objects by capturing RGB-D videos of scenes and labeling the objects within. However, these 3D datasets inherit many of the limitations of traditional 2D datasets: th...
