Results 1 - 10 of 94
Indoor segmentation and support inference from RGBD images
- ECCV, 2012
"... We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, s ..."
Abstract
-
Cited by 159 (9 self)
- Add to MetaCart
(Show Context)
We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation. We also contribute a novel integer programming formulation to infer physical support relations. We offer a new dataset of 1449 RGBD images, capturing 464 diverse indoor scenes, with detailed annotations. Our experiments demonstrate our ability to infer support relations in complex scenes and verify that our 3D scene cues and inferred support lead to better object segmentation.
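As a rough, hypothetical illustration of the integer-programming idea above (not the paper's actual formulation, which also models support type and hidden supporters), support inference can be posed as a binary assignment problem. The region names and costs below are invented stand-ins for classifier outputs; the sketch uses the PuLP library:

```python
# Minimal sketch: support inference as a 0/1 integer program (toy data).
import pulp

regions = ["floor", "table", "cup"]
# support_cost[(i, j)]: cost of saying region i rests on region j,
# e.g. a negative log-probability from a local support classifier.
support_cost = {
    ("table", "floor"): 0.1, ("table", "cup"): 5.0,
    ("cup", "floor"): 2.0,   ("cup", "table"): 0.2,
}

prob = pulp.LpProblem("support_inference", pulp.LpMinimize)
s = {(i, j): pulp.LpVariable(f"s_{i}_{j}", cat="Binary")
     for (i, j) in support_cost}
prob += pulp.lpSum(support_cost[p] * s[p] for p in s)  # total cost (objective)

# Every region except the floor must have exactly one supporter.
for r in regions:
    if r != "floor":
        prob += pulp.lpSum(s[(r, j)] for j in regions if (r, j) in s) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([p for p in s if s[p].value() == 1])  # -> table on floor, cup on table
```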
Recovering the Spatial Layout of Cluttered Rooms
"... In this paper, we consider the problem of recovering the spatial layout of indoor scenes from monocular images. The presence of clutter is a major problem for existing singleview 3D reconstruction algorithms, most of which rely on finding the ground-wall boundary. In most rooms, this boundary is par ..."
Abstract
-
Cited by 114 (7 self)
- Add to MetaCart
(Show Context)
In this paper, we consider the problem of recovering the spatial layout of indoor scenes from monocular images. The presence of clutter is a major problem for existing single-view 3D reconstruction algorithms, most of which rely on finding the ground-wall boundary. In most rooms, this boundary is partially or entirely occluded. We gain robustness to clutter by modeling the global room space with a parametric 3D “box” and by iteratively localizing clutter and refitting the box. To fit the box, we introduce a structured learning algorithm that chooses the set of parameters to minimize error, based on global perspective cues. On a dataset of 308 images, we demonstrate the ability of our algorithm to recover spatial layout in cluttered rooms and show several examples of estimated free space.
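To make the "choose parameters to minimize error" step concrete, here is a deliberately simplified sketch: candidate room boxes (which the method derives from perspective cues) are described by feature vectors and ranked by a learned linear scoring function. The features and weights are random stand-ins, so treat this as an illustration of the ranking idea only, not the paper's structured learner:

```python
# Toy sketch of the candidate-ranking idea: each candidate room box gets a
# feature vector of perspective cues and is scored by learned weights.
import numpy as np

rng = np.random.default_rng(0)
n_candidates, n_features = 50, 6
features = rng.normal(size=(n_candidates, n_features))  # per-candidate cues
w = rng.normal(size=n_features)                         # learned weights (stand-in)

scores = features @ w
best = int(np.argmax(scores))
print(f"selected box candidate {best} with score {scores[best]:.3f}")
```

In the full method this selection alternates with localizing clutter, so that cues from occluded ground-wall regions stop influencing the fit.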
Thinking inside the box: Using appearance models and context based on room geometry
- In ECCV, 2010
"... Abstract. In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D based object dete ..."
Abstract
-
Cited by 87 (3 self)
- Add to MetaCart
In this paper we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure, can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D-based object detector. This detector is competitive with an image-based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly uses constraints imposed by spatial layout – the locations of walls and floor in the image – to refine the 3D object estimates. We use an existing approach to compute spatial layout [1], and use constraints such as that objects are supported by the floor and cannot stick through the walls. The resulting detector (a) has significantly improved accuracy when compared to state-of-the-art 2D detectors and (b) gives a 3D interpretation of the location of the object, derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.
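As a toy version of the two layout constraints named above (supported by the floor, not sticking through walls), assuming both the room and the object are axis-aligned boxes given as (min corner, max corner); the real model works with perspective-projected cuboids, so this is only a schematic check:

```python
# Toy layout-constraint test for an object cuboid inside a room box.
import numpy as np

def satisfies_layout(room_min, room_max, obj_min, obj_max, tol=0.02):
    room_min, room_max = np.asarray(room_min, float), np.asarray(room_max, float)
    obj_min, obj_max = np.asarray(obj_min, float), np.asarray(obj_max, float)
    # "does not stick through the walls": object stays inside the room box
    inside = np.all(obj_min >= room_min - tol) and np.all(obj_max <= room_max + tol)
    # "supported by the floor": object bottom rests on the floor plane (z = up)
    on_floor = abs(obj_min[2] - room_min[2]) <= tol
    return bool(inside and on_floor)

# A bed-sized cuboid standing on the floor of a 4m x 5m x 2.5m room:
print(satisfies_layout((0, 0, 0), (4, 5, 2.5), (0.5, 0.5, 0.0), (2.0, 2.5, 0.6)))
```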
Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces
"... There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraint ..."
Abstract
-
Cited by 78 (11 self)
- Add to MetaCart
(Show Context)
There has been a recent push toward extracting the 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state of the art.
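A minimal sketch of the kind of volumetric constraint this argues for: two objects modeled as axis-aligned cuboids must not intersect. The paper's formulation is richer (cuboids oriented by the room frame, integrated into structured prediction), so treat this purely as an illustration:

```python
# Two axis-aligned cuboids overlap iff their intervals overlap on every axis.
import numpy as np

def cuboids_overlap(a_min, a_max, b_min, b_max):
    a_min, a_max = np.asarray(a_min, float), np.asarray(a_max, float)
    b_min, b_max = np.asarray(b_min, float), np.asarray(b_max, float)
    return bool(np.all(a_min < b_max) and np.all(b_min < a_max))

# Two cuboids sharing floor space violate the volumetric constraint:
print(cuboids_overlap((0, 0, 0), (2, 2, 1), (1, 1, 0), (3, 3, 1)))  # True
```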
From 3D Scene Geometry to Human Workspace
"... We present a human-centric paradigm for scene understanding. Our approach goes beyond estimating 3D scene geometry and predicts the “workspace ” of a human which is represented by a data-driven vocabulary of human interactions. Our method builds upon the recent work in indoor scene understanding and ..."
Abstract
-
Cited by 67 (8 self)
- Add to MetaCart
We present a human-centric paradigm for scene understanding. Our approach goes beyond estimating 3D scene geometry and predicts the “workspace” of a human, which is represented by a data-driven vocabulary of human interactions. Our method builds upon recent work in indoor scene understanding and the availability of motion capture data to create a joint space of human poses and scene geometry by modeling the physical interactions between the two. This joint space can then be used to predict potential human poses and joint locations from a single image. In a way, this work revisits the principle of Gibsonian affordances, reinterpreting it for the modern, data-driven era.
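Purely as an illustration of "modeling the physical interactions" between poses and geometry, here is a hypothetical validity test for placing a candidate pose into box-shaped free space; the box representation, the floor-contact rule, and the tolerance are all invented for the sketch:

```python
# Toy feasibility test: every joint stays inside free space and at least
# one joint touches the floor so the pose is physically supported.
import numpy as np

def pose_fits(joints, room_min, room_max, floor_z=0.0, tol=0.05):
    """joints: Jx3 world positions of a candidate pose."""
    joints = np.asarray(joints, float)
    inside = np.all(joints >= room_min) and np.all(joints <= room_max)
    supported = np.min(joints[:, 2]) <= floor_z + tol  # something touches floor
    return bool(inside and supported)

pose = [[1.0, 1.0, 1.7], [1.0, 1.0, 0.0]]  # e.g. head and foot of a standing pose
print(pose_fits(pose, room_min=(0, 0, 0), room_max=(4, 5, 2.5)))  # True
```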
Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images
"... We address the problems of contour detection, bottomup grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundar ..."
Abstract
-
Cited by 48 (3 self)
- Add to MetaCart
(Show Context)
We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.
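The final labeling stage described above reduces to multi-class classification of per-superpixel feature vectors. A schematic sketch with random stand-in features, using scikit-learn's LinearSVC as a stand-in for the paper's classifiers:

```python
# Hypothetical sketch: classify superpixels into semantic categories from
# appearance + geometry features. Features and labels are random stand-ins.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_superpixels, n_features, n_classes = 500, 64, 40
X_train = rng.normal(size=(n_superpixels, n_features))    # appearance+geometry
y_train = rng.integers(0, n_classes, size=n_superpixels)  # category labels

clf = LinearSVC(C=1.0, max_iter=5000).fit(X_train, y_train)
X_test = rng.normal(size=(10, n_features))
print(clf.predict(X_test))  # one of the 40 NYUD2 categories per superpixel
```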
Rendering Synthetic Objects into Legacy Photographs
"... Figure 1: With only a small amount of user interaction, our system allows objects to be inserted into legacy images so that perspective, occlusion, and lighting of inserted objects adhere to the physical properties of the scene. Our method works with only a single LDR photograph, and no access to th ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
Figure 1: With only a small amount of user interaction, our system allows objects to be inserted into legacy images so that perspective, occlusion, and lighting of inserted objects adhere to the physical properties of the scene. Our method works with only a single LDR photograph, and no access to the scene is required.

We propose a method to realistically insert synthetic objects into existing photographs without requiring access to the scene or any additional scene measurements. With a single image and a small amount of annotation, our method creates a physical model of the scene that is suitable for realistically rendering synthetic objects with diffuse, specular, and even glowing materials while accounting for lighting interactions between the objects and the scene. We demonstrate in a user study that synthetic images produced by our method are confusable with real scenes, even for people who believe they are good at telling the difference. Further, our study shows that our method is competitive with other insertion methods while requiring less scene information. We also collected new illumination and reflectance datasets; renderings produced by our system compare well to ground truth. Our system has applications in the movie and gaming industry, as well as home decorating and user content creation, among others.
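Systems in this line typically composite the inserted object via differential rendering (Debevec-style): render the estimated scene model with and without the object, then add only the object's effect (shadows, interreflections) to the photograph. A minimal sketch, assuming the renders and the object mask are already available as float images:

```python
# Differential-rendering composite of an inserted object into a photo.
import numpy as np

def composite(photo, render_with, render_without, obj_mask):
    """photo, render_with, render_without: HxWx3 floats in [0,1];
    obj_mask: HxW, 1 where the inserted object is directly visible."""
    m = obj_mask[..., None].astype(float)
    # Object pixels come from the full render; everywhere else, the photo
    # plus the difference the object makes to the rendered scene.
    return m * render_with + (1 - m) * (photo + render_with - render_without)

h, w = 2, 2
photo = np.full((h, w, 3), 0.5)
no_obj = np.full((h, w, 3), 0.5)       # render of the empty scene model
with_obj = no_obj.copy()
with_obj[0, 0] = 0.9                   # object brightens one pixel
mask = np.zeros((h, w))
mask[0, 0] = 1
print(composite(photo, with_obj, no_obj, mask)[0, 0])  # object pixel
```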
Scene semantics from long-term observation of people
- In ECCV, 2012
"... Abstract. Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this pa ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim of recognizing objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube, which provide a rich source of common human-object interactions and minimize the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.
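One simple way to realize "describing objects by associated human poses", sketched hypothetically here, is to histogram detected body joints in an object-centric frame; the binning, normalization, and example numbers are invented for the illustration:

```python
# Toy pose-based object descriptor: where do people's joints land,
# relative to the object's bounding box?
import numpy as np

def pose_histogram(joints, obj_center, obj_size, bins=4):
    """joints: Nx2 image positions of body joints seen near the object."""
    rel = (np.asarray(joints, float) - obj_center) / obj_size  # object frame
    h, _, _ = np.histogram2d(rel[:, 0], rel[:, 1],
                             bins=bins, range=[[-1, 1], [-1, 1]])
    return h.ravel() / max(h.sum(), 1.0)  # L1-normalized descriptor

# Joints of people sitting on a sofa, accumulated over a time-lapse:
desc = pose_histogram([[110, 205], [130, 210], [125, 190]],
                      obj_center=(120, 200), obj_size=80)
print(desc.nonzero()[0])  # occupied bins characterize "sofa-like" interaction
```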
Building Reconstruction using Manhattan-World Grammars
"... Figure 1. System Pipeline. The input to our system consists of one or more calibrated aerial images of a Manhattan-world building. After color segmentation and background/windows removal, our grammar-based algorithm adapts the geometry of the building that produces the façade orientation changes obs ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
(Show Context)
Figure 1. System Pipeline. The input to our system consists of one or more calibrated aerial images of a Manhattan-world building. After color segmentation and background/window removal, our grammar-based algorithm adapts the building geometry to produce the façade orientation changes observed in the photos. The input photos are projected as textures onto the reconstructed model. The result is an automatically generated, complete, closed 3D model of the observed building.

We present a passive computer vision method that exploits existing mapping and navigation databases in order to automatically create 3D building models. Our method defines a grammar for representing changes in building geometry that approximately follow the Manhattan-world assumption, which states that there is a predominance of three mutually orthogonal directions in the scene. By using multiple calibrated aerial images, we extend previous Manhattan-world methods to robustly produce a single, coherent, complete geometric model of a building with partial textures. Our method uses an optimization to discover a 3D building geometry that produces the same set of façade orientation changes observed in the captured images. We have applied our method to several real-world buildings and have analyzed our approach using synthetic buildings.
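The Manhattan-world assumption mentioned above can be illustrated with a tiny snap-to-axes step: assign each noisy surface normal to the nearest of three mutually orthogonal scene directions. The axes and normals below are synthetic; this is not the paper's grammar-based pipeline:

```python
# Toy Manhattan-world step: label each normal with its dominant direction.
import numpy as np

axes = np.eye(3)                         # three mutually orthogonal directions
rng = np.random.default_rng(0)
normals = rng.normal(size=(5, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

# Nearest axis up to sign: maximize |n . a| over the three axes.
labels = np.argmax(np.abs(normals @ axes.T), axis=1)
print(labels)   # 0/1/2: which orthogonal direction each normal follows
```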
Understanding indoor scenes using 3D geometric phrases
- In CVPR, 2013
"... Visual scene understanding is a difficult problem inter-leaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reason-able amount of ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
(Show Context)
Visual scene understanding is a difficult problem interleaving object detection, geometric reasoning and scene classification. We present a hierarchical scene model for learning and reasoning about complex indoor scenes which is computationally tractable, can be learned from a reasonable amount of training data, and avoids oversimplification. At the core of this approach is the 3D Geometric Phrase Model, which captures the semantic and geometric relationships between objects which frequently co-occur in the same 3D spatial configuration. Experiments show that this model effectively explains scene semantics, geometry and object groupings from a single image, while also improving individual object detections.
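A hypothetical reading of a "geometric phrase" score, invented for illustration: sum the member objects' detection scores and penalize deviation of their 3D layout from a learned configuration. The anchor convention, offsets, and weight are all assumptions of this sketch, not the paper's model:

```python
# Toy score for a group of objects in a learned 3D spatial configuration.
import numpy as np

def phrase_score(det_scores, positions, expected_offsets, w_geom=1.0):
    """det_scores: per-object detector scores; positions: Nx3 object centers;
    expected_offsets: learned member offsets from the phrase anchor (Nx3)."""
    positions = np.asarray(positions, float)
    anchor = positions[0]
    geom_err = np.linalg.norm((positions - anchor) - expected_offsets, axis=1)
    return float(np.sum(det_scores) - w_geom * np.sum(geom_err))

# Sofa + coffee table in roughly the learned configuration:
print(phrase_score([1.2, 0.8],
                   [[0, 0, 0], [1.0, 0.1, 0]],
                   np.array([[0, 0, 0], [1.0, 0.0, 0]])))  # 1.9
```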