Results 1 - 10 of 47
OmniTouch: wearable multitouch interaction everywhere
In Proc. ACM UIST '11, 2011
"... Figure 1. OmniTouch is a wearable depth-sensing and projection system that allows everyday surfaces- including a wearer’s own body- to be appropriated for graphical multitouch interaction. OmniTouch is a wearable depth-sensing and projection system that enables interactive multitouch applications on ..."
Abstract
-
Cited by 86 (11 self)
- Add to MetaCart
(Show Context)
OmniTouch is a wearable depth-sensing and projection system that allows everyday surfaces, including a wearer's own body, to be appropriated for graphical multitouch interaction. Beyond the shoulder-worn system, there is no instrumentation of the user or environment. Foremost, the system allows the wearer to use their hands, arms, and legs as graphical, interactive surfaces. Users can also transiently appropriate surfaces from the environment to expand the interactive area (e.g., books, walls, tables). On such surfaces, without any calibration, OmniTouch provides capabilities similar to those of a mouse or touchscreen: X and Y location in 2D interfaces and whether fingers are "clicked" or hovering, enabling a wide variety of interactions. Reliable operation on the hands, for example, requires buttons to be 2.3 cm in diameter. Thus, it is now conceivable that anything one can do on today's mobile devices could be done in the palm of one's hand.
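The click-versus-hover distinction described above reduces, at its core, to thresholding the gap between a tracked fingertip and the underlying surface in the depth map. The sketch below illustrates that idea only; the function names, thresholds, and the median-based surface estimate are our assumptions, not OmniTouch's actual pipeline (which also performs finger tracking and surface segmentation).

import numpy as np

# Illustrative thresholds (assumptions, not values from the paper): a
# fingertip counts as "clicked" when it lies within CLICK_MM of the
# underlying surface, and as "hovering" up to HOVER_MM above it.
CLICK_MM = 8.0
HOVER_MM = 40.0

def classify_finger(depth_map, fingertip_rc, surface_mask):
    """Classify a tracked fingertip as 'click', 'hover', or 'away'.

    depth_map    -- HxW array of depths in millimetres
    fingertip_rc -- (row, col) of the detected fingertip
    surface_mask -- boolean HxW mask of pixels on the target surface
    """
    finger_d = depth_map[fingertip_rc]
    # Median of the surface pixels is a noise-robust surface depth estimate.
    surface_d = np.median(depth_map[surface_mask])
    gap = surface_d - finger_d  # how far the finger floats above the surface
    if gap < CLICK_MM:
        return "click"
    if gap < HOVER_MM:
        return "hover"
    return "away"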
Learning human activities and object affordances from RGB-D videos
IJRR, 2013
"... such as making cereal and arranging objects in a room (see Fig. 9). For example, the making cereal activity consists of around 12 sub-activities on average, which includes reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challeng ..."
Abstract
-
Cited by 59 (16 self)
- Add to MetaCart
(Show Context)
such as making cereal and arranging objects in a room (see Fig. 9). For example, the making-cereal activity consists of around 12 sub-activities on average, which include reaching the pitcher, moving the pitcher to the bowl, and then pouring the milk into the bowl. This proves to be a very challenging task given the variability across individuals in performing each sub-activity and other environment-induced conditions such as cluttered backgrounds and viewpoint changes (see Fig. 2 for some examples). In most previous work, object detection and activity recognition have been addressed as separate tasks. Only recently have some works shown that modeling mutual context is beneficial (Gupta et al., 2009; Yao and Fei-Fei, 2010). The key idea in our work is to note that, in activity detection, it is sometimes more informative to know how an object is being used (its associated affordances; Gibson, 1979) than to know what the object is (i.e., the object category). For example, both a chair and a sofa might be categorized as 'sittable,' and a cup might be categorized as both 'drinkable' and 'pourable.' Note that the affordances of an object change over time depending on its use; e.g., a pitcher may first be reachable, then movable, and finally pourable. In addition to helping activity recognition, recognizing object affordances is important in itself because of their use in robotic applications (e.g., Kormushev et al., 2010; Jiang et al., 2012a; Jiang and Saxena, 2012). We propose a method to learn human activities by modeling ...
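To make the sub-activity decomposition concrete, the toy sketch below estimates first-order transition probabilities between sub-activity labels, in the spirit of the temporal structure described above. The label names and the plain Markov-chain treatment are illustrative assumptions; the paper itself uses a richer structured model over sub-activities and affordances.

from collections import defaultdict

def train_transitions(sequences):
    """Estimate first-order transition probabilities between sub-activities."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }

# Hypothetical label sequences in the spirit of the "making cereal" example;
# the real annotation scheme differs in detail.
seqs = [
    ["reaching", "moving", "pouring", "placing"],
    ["reaching", "moving", "placing"],
]
print(train_transitions(seqs)["moving"])  # {'pouring': 0.5, 'placing': 0.5}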
Enhanced computer vision with Microsoft Kinect sensor: A review
IEEE Transactions on Cybernetics, 2013
"... With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in ..."
Abstract
-
Cited by 31 (2 self)
- Add to MetaCart
(Show Context)
With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.
Learning to Place New Objects in a Scene
"... Abstract—Placing is a necessary skill for a personal robot to have in order to perform tasks such as arranging objects in a disorganized room. The object placements should not only be stable but also be in their semantically preferred placing areas and orientations. This is challenging because an en ..."
Abstract
-
Cited by 20 (10 self)
- Add to MetaCart
Placing is a necessary skill for a personal robot to have in order to perform tasks such as arranging objects in a disorganized room. The object placements should not only be stable but also lie in their semantically preferred placing areas and orientations. This is challenging because an environment can contain a large variety of objects and placing areas that may not have been seen by the robot before. In this paper, we propose a learning approach for placing multiple objects in different placing areas in a scene. Given point clouds of the objects and the scene, we design appropriate features and use a graphical model to encode various properties, such as the stacking of objects, stability, object-area relationships, and common placing constraints. The inference in our model is an integer linear program, which we solve efficiently via an LP relaxation. We extensively evaluate our approach on 98 objects from 16 categories being placed into 40 areas. Our robotic experiments show a success rate of 98% in placing known objects and 82% in placing new objects stably. We use our method on our robots for performing tasks such as loading several dish racks, a bookshelf, and a fridge with multiple items.
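The abstract names an integer linear program solved via LP relaxation. The sketch below shows that core in isolation, assuming a simple objects-to-areas assignment with a given score matrix; the paper's full ILP also encodes stacking, stability, and other constraints that are omitted here, and the function and variable names are ours.

import numpy as np
from scipy.optimize import linprog

def assign_areas(scores):
    """LP relaxation of an objects-to-areas assignment (a simplification).

    scores -- (n_objects, n_areas) matrix; higher means a semantically
    better placement. We maximise the total score subject to each object
    being placed in exactly one area, with x in {0, 1} relaxed to [0, 1].
    """
    n, m = scores.shape
    c = -scores.ravel()  # linprog minimises, so negate the objective
    A_eq = np.zeros((n, n * m))
    for i in range(n):  # one constraint per object: its area weights sum to 1
        A_eq[i, i * m:(i + 1) * m] = 1.0
    res = linprog(c, A_eq=A_eq, b_eq=np.ones(n), bounds=(0, 1))
    x = res.x.reshape(n, m)
    return x.argmax(axis=1)  # round the relaxed solution per object

scores = np.array([[0.9, 0.1], [0.2, 0.8]])
print(assign_areas(scores))  # [0 1]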
Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor
"... Abstract. We propose a feature, the Histogram of Oriented Normal Vectors (HONV), designed specifically to capture local geometric characteristics for object recognition with a depth sensor. Through our derivation, the normal vector orientation represented as an ordered pair of azimuthal angle and ze ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
(Show Context)
We propose a feature, the Histogram of Oriented Normal Vectors (HONV), designed specifically to capture local geometric characteristics for object recognition with a depth sensor. Through our derivation, the normal vector orientation, represented as an ordered pair of azimuthal and zenith angles, can be computed easily from the gradients of the depth image. We form the HONV as a concatenation of local histograms of azimuthal and zenith angles. Since the HONV is inherently the local distribution of the tangent-plane orientation of an object's surface, we use it as a feature for object detection/classification tasks. Object detection experiments on the standard RGB-D dataset [1] and a self-collected Chair-D dataset show that the HONV significantly outperforms traditional features such as HOG on the depth image and HOG on the intensity image, with an improvement of 11.6% in average precision. For object classification, the HONV achieved a 5.0% improvement over state-of-the-art approaches.
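The derivation the abstract refers to follows from the fact that a surface z = d(x, y) has normal proportional to (-∂d/∂x, -∂d/∂y, 1), so azimuth and zenith come straight from the depth gradients. The sketch below computes a single-cell HONV from that observation; the bin counts, normalisation, and single-cell simplification (the paper concatenates histograms over a grid of local cells) are our assumptions.

import numpy as np

def honv_cell(depth, n_azimuth=8, n_zenith=4):
    """HONV for a single depth patch (one histogram cell).

    For a surface z = d(x, y), the normal is proportional to
    (-dd/dx, -dd/dy, 1), so azimuth and zenith follow from the depth
    gradients.
    """
    gy, gx = np.gradient(depth.astype(np.float64))
    azimuth = np.arctan2(-gy, -gx)        # angle of the normal in the xy-plane
    zenith = np.arctan(np.hypot(gx, gy))  # angle away from the camera axis
    hist, _, _ = np.histogram2d(
        azimuth.ravel(), zenith.ravel(),
        bins=[n_azimuth, n_zenith],
        range=[[-np.pi, np.pi], [0, np.pi / 2]],
    )
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)     # L1-normalise the cell

# The full descriptor concatenates such histograms over a grid of cells.
patch = np.random.rand(16, 16) * 0.05     # stand-in depth patch (metres)
print(honv_cell(patch).shape)             # (32,)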
Learning Discriminative Representations from RGB-D Video Data
In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013
"... Recently, the low-cost Microsoft Kinect sensor, which can capture real-time high-resolution RGB and depth visual information, has attracted increasing attentions for a wide range of applications in computer vision. Existing techniques extract hand-tuned features from the RGB and the depth data separ ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
Recently, the low-cost Microsoft Kinect sensor, which can capture real-time high-resolution RGB and depth visual information, has attracted increasing attention for a wide range of applications in computer vision. Existing techniques extract hand-tuned features from the RGB and depth data separately and fuse them heuristically, which does not fully exploit the complementarity of the two data sources. In this paper, we introduce an adaptive learning methodology to automatically extract (holistic) spatio-temporal features from RGB-D video data, simultaneously fusing the RGB and depth information, for visual recognition tasks. We address this as an optimization problem using our proposed restricted graph-based genetic programming (RGGP) approach, in which a group of primitive 3D operators are first randomly assembled into graph-based combinations and then evolved generation by generation by evaluation on a set of RGB-D video samples. Finally, the best-performing combination is selected as the (near-)optimal representation for a pre-defined task. The proposed method is systematically evaluated on SKIG, a new hand-gesture dataset that we collected ourselves, and on the public MSRDailyActivity3D dataset. Extensive experimental results show that our approach leads to significant advantages over state-of-the-art hand-crafted and machine-learned features.
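As a rough illustration of evolving operator combinations by fitness on labelled samples, the toy loop below evolves short pipelines of 1D primitive operators. Everything here (the operator set, the crossover/mutation scheme, and the crude fitness) is a placeholder of our own; the paper's RGGP evolves graph-based combinations of 3D operators on RGB-D video.

import random
import numpy as np

# Placeholder 1D operators standing in for the paper's primitive 3D operators.
OPS = {
    "smooth":   lambda v: np.convolve(v, np.ones(3) / 3, mode="same"),
    "gradient": lambda v: np.gradient(v),
    "absolute": np.abs,
    "square":   np.square,
}

def pipeline_score(pipeline, sample):
    for name in pipeline:
        sample = OPS[name](sample)
    return sample.mean()  # collapse to one scalar response

def fitness(pipeline, samples, labels):
    scores = np.array([pipeline_score(pipeline, s) for s in samples])
    preds = scores > np.median(scores)  # crude two-class decision
    return (preds == labels).mean()

def evolve(samples, labels, pop=20, gens=10, length=3):
    """Evolve operator pipelines by survival of the fittest half."""
    names = list(OPS)
    population = [random.choices(names, k=length) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda p: -fitness(p, samples, labels))
        survivors = population[: pop // 2]
        # Children keep a survivor's head and get a random mutated tail.
        children = [s[:1] + random.choices(names, k=length - 1) for s in survivors]
        population = survivors + children
    return population[0]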
Sliding Shapes for 3D Object Detection in Depth Images
"... Abstract. The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and de-sign a 3D detector to overcome the major difficulties for recognition ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely variations in texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion, and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a ...
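The abstract is truncated just as it introduces the sliding step; from the title and setup, the detector scans a 3D window over the scene and scores it against the exemplar SVMs. The sketch below shows that sliding-and-scoring loop under assumed inputs (a voxelised feature grid and flattened linear-SVM weights); the actual feature extraction and window handling are not specified here.

import numpy as np

def slide_3d(feature_grid, exemplar_ws, exemplar_bs, win=8, step=2):
    """Slide a 3D window over a voxelised scene and score each placement.

    feature_grid -- (X, Y, Z, F) array of per-voxel features (assumed given)
    exemplar_ws  -- list of flattened linear-SVM weights, one per rendered
                    CAD exemplar; exemplar_bs are the matching biases.
    """
    X, Y, Z, _ = feature_grid.shape
    detections = []
    for x in range(0, X - win + 1, step):
        for y in range(0, Y - win + 1, step):
            for z in range(0, Z - win + 1, step):
                feat = feature_grid[x:x + win, y:y + win, z:z + win].ravel()
                # Score the window against every exemplar; keep the best.
                scores = [w @ feat + b for w, b in zip(exemplar_ws, exemplar_bs)]
                best = int(np.argmax(scores))
                if scores[best] > 0:  # positive SVM margin => candidate
                    detections.append(((x, y, z), best, scores[best]))
    return detections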
Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes
"... Abstract. We propose a framework for automatic modeling, detection, and tracking of 3D objects with a Kinect. The detection part is mainly based on the recent template-based LINEMOD approach [1] for object detection. We show how to build the templates automatically from 3D models, and how to estimat ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
(Show Context)
We propose a framework for automatic modeling, detection, and tracking of 3D objects with a Kinect. The detection part is mainly based on the recent template-based LINEMOD approach [1] for object detection. We show how to build the templates automatically from 3D models, and how to estimate the 6-degrees-of-freedom pose accurately and in real time. The pose estimate and the color information allow us to check the detection hypotheses, improving the correct detection rate by 13% with respect to the original LINEMOD. These improvements make our framework suitable for object manipulation in robotics applications. Moreover, we propose a new dataset consisting of 15 registered video sequences of 1,100+ frames each, covering 15 different objects, for the evaluation of future competing methods. (Fig. 1: 15 different texture-less 3D objects are simultaneously detected with our approach under different poses against a heavily cluttered background with partial occlusion. Each detected object is augmented with its 3D model; the corresponding coordinate systems are also shown.)
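Template-based detection of this kind boils down to sliding a quantised-orientation template over the image and counting matching features. The sketch below captures only that spirit; the real LINEMOD uses spread orientations, cosine responses, and depth-normal modalities, and the stride and threshold here are arbitrary assumptions of ours.

import numpy as np

def similarity(img_orients, tmpl_orients, tmpl_mask):
    """Fraction of template features whose quantised orientation matches.

    img_orients/tmpl_orients -- integer orientation bins (e.g. 0..7);
    tmpl_mask marks template pixels that carry a feature.
    """
    hits = (img_orients == tmpl_orients) & tmpl_mask
    return hits.sum() / max(tmpl_mask.sum(), 1)

def detect(img_orients, tmpl_orients, tmpl_mask, stride=4, thresh=0.8):
    """Scan the template over the image and keep placements above threshold."""
    H, W = img_orients.shape
    h, w = tmpl_orients.shape
    found = []
    for y in range(0, H - h + 1, stride):
        for x in range(0, W - w + 1, stride):
            s = similarity(img_orients[y:y + h, x:x + w], tmpl_orients, tmpl_mask)
            if s >= thresh:
                found.append((x, y, s))
    return found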
RGBD object recognition and visual texture classification for indoor semantic mapping
A survey on human motion analysis from depth data
In: Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications, 2013
"... Abstract. Human pose estimation has been actively studied for decades. While traditional approaches rely on 2d data like images or videos, the development of Time-of-Flight cameras and other depth sensors created new opportunities to advance the field. We give an overview of recent approaches that p ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
(Show Context)
Human pose estimation has been actively studied for decades. While traditional approaches rely on 2D data such as images or videos, the development of time-of-flight cameras and other depth sensors has created new opportunities to advance the field. We give an overview of recent approaches that perform human motion analysis, including depth-based and skeleton-based activity recognition, head pose estimation, facial feature detection, facial performance capture, hand pose estimation, and hand gesture recognition. While the focus is on approaches using depth data, we also discuss traditional image-based methods to provide a broad overview of recent developments in these areas.