Results 1 - 10 of 23
This hand is my hand: A probabilistic approach to hand disambiguation in egocentric video
- In CVPR Egovision, 2014
"... Egocentric cameras are becoming more popular, intro-ducing increasing volumes of video in which the biases and framing of traditional photography are replaced with those of natural viewing tendencies. This paradigm enables new applications, including novel studies of social interaction and human dev ..."
Abstract - Cited by 7 (3 self)
Egocentric cameras are becoming more popular, introducing increasing volumes of video in which the biases and framing of traditional photography are replaced with those of natural viewing tendencies. This paradigm enables new applications, including novel studies of social interaction and human development. Recent work has focused on identifying the camera wearer’s hands as a first step towards more complex analysis. In this paper, we study how to disambiguate and track not only the observer’s hands but also those of social partners. We present a probabilistic framework for modeling paired interactions that incorporates the spatial, temporal, and appearance constraints inherent in egocentric video. We test our approach on a dataset of over 30 minutes of video from six pairs of subjects.
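The paired-interaction model itself is not reproduced in this listing. As a rough illustration of the approach the abstract describes, a candidate hand region can be scored against each owner/side label by combining spatial, temporal, and appearance cues; everything below (names, cue definitions, independence of cues) is an illustrative assumption, not the paper's exact formulation.

```python
import numpy as np

# Candidate labels: the wearer's own hands and the social partner's hands.
LABELS = ["own_left", "own_right", "other_left", "other_right"]

def log_score(spatial_p, temporal_p, appearance_p):
    """Combine cue probabilities as a product, computed in log space for stability."""
    return np.log(spatial_p) + np.log(temporal_p) + np.log(appearance_p)

def disambiguate(candidate):
    """Pick the label with the highest combined score for one detected region.

    `candidate` maps each label to a (spatial, temporal, appearance) triple of
    probabilities, e.g. an image-position prior, agreement with the previous
    frame's track, and a skin-appearance likelihood.
    """
    scores = [log_score(*candidate[label]) for label in LABELS]
    return LABELS[int(np.argmax(scores))]
```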
Model Recommendation with Virtual Probes for Egocentric Hand Detection
- in ICCV 2013 (Sydney), IEEE Computer Society, 2013
"... Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures and learn-ing about hand-object manipulation. To enable such tech-nology, we believe that the hands must detected on the pixel-level to gain important information about the shape of the hands an ..."
Abstract - Cited by 7 (0 self)
Egocentric cameras can be used to benefit such tasks as analyzing fine motor skills, recognizing gestures, and learning about hand-object manipulation. To enable such technology, we believe that the hands must be detected on the pixel level to gain important information about the shape of the hands and fingers. We show that the problem of pixel-wise hand detection can be effectively solved by posing the problem as a model recommendation task. As such, the goal of a recommendation system is to recommend the n-best hand detectors based on the probe set, a small amount of labeled data from the test distribution. This requirement of a probe set is a serious limitation in many applications, such as egocentric hand detection, where the test distribution may be continually changing. To address this limitation, we propose the use of virtual probes, which can be automatically extracted from the test distribution. The key idea is that many features, such as the color distribution or relative performance between two detectors, can be used as a proxy to the probe set. In our experiments we show that the recommendation paradigm is well-equipped to handle complex changes in the appearance of the hands in first-person vision. In particular, we show how our system is able to generalize to new scenarios by testing our model across multiple users.
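The abstract's core idea lends itself to a compact sketch: learn to predict each detector's performance from cheap unlabeled proxy statistics ("virtual probes") rather than from a labeled probe set, then recommend the predicted top performers. The ridge-regression proxy and all names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_recommender(proxy_features, detector_scores):
    """Learn to predict detector performance from proxy features.

    proxy_features: (n_scenes, d) unlabeled statistics per training scene,
        e.g. a global color histogram.
    detector_scores: (n_scenes, n_detectors) measured performance (e.g. F-score).
    """
    return Ridge(alpha=1.0).fit(proxy_features, detector_scores)

def recommend(model, test_proxy, n_best=3):
    """Return indices of the n detectors predicted to perform best on a new scene."""
    predicted = model.predict(np.atleast_2d(test_proxy))[0]
    return np.argsort(predicted)[::-1][:n_best]
```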
The Evolution of First Person Vision Methods: A Survey
2015
"... The emergence of new wearable technologies, such as action cameras and smart glasses, has increased the interest of computer vision scientists in the first person perspective. Nowadays, this field is attracting attention and investments of companies aiming to develop commercial devices with first ..."
Abstract - Cited by 6 (4 self)
The emergence of new wearable technologies, such as action cameras and smart glasses, has increased the interest of computer vision scientists in the first person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with first person vision (FPV) recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives like object detection, activity recognition, user-machine interaction, and so on. This paper summarizes the evolution of the state of the art in FPV video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field.
A Sequential Classifier for Hand Detection in the Framework of Egocentric Vision
- in 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014
"... Abstract—Hand detection is one of the most explored areas in Egocentric Vision Video Analysis for wearable devices. Current methods are focused on pixel-by-pixel hand segmentation, with the implicit assumption of hand presence in almost all activities. However, this assumption is false in many appli ..."
Abstract - Cited by 6 (3 self)
Hand detection is one of the most explored areas in egocentric vision video analysis for wearable devices. Current methods focus on pixel-by-pixel hand segmentation, with the implicit assumption that hands are present in almost all activities. However, this assumption is false in many applications for wearable cameras. Ignoring this fact could affect the overall performance of the device, since hand measurements are usually the starting point for higher-level inference, or could lead to inefficient use of computational resources and battery power. In this paper we propose a two-level sequential classifier, in which the first level, a hand-detector, deals with the possible presence of hands from a global perspective, and the second level, a hand-segmentator, delineates the hand regions at the pixel level in the cases indicated by the first block. The performance of the sequential classifier is stated in probabilistic notation as a combination of both classifiers, allowing new hand-detectors to be tested independently of the type of segmentation and the dataset used in the training stage. Experimental results show a considerable improvement in the detection of true negatives without compromising the performance on true positives.
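A minimal sketch of the two-level sequential structure the abstract describes follows; the detector, segmenter, and threshold are placeholders, and the factored output is one plausible reading of the probabilistic combination, not the paper's stated equations.

```python
def detect_then_segment(frame, hand_detector, hand_segmenter, threshold=0.5):
    """Level 1: a cheap frame-level classifier estimates P(hands present | frame).
    Level 2: the expensive pixel-level segmenter runs only when level 1 fires,
    saving computation and suppressing spurious masks on hand-free frames.
    """
    p_present = hand_detector(frame)   # global, frame-level score in [0, 1]
    if p_present < threshold:
        return None                    # frame rejected: no segmentation cost paid
    mask = hand_segmenter(frame)       # per-pixel hand probabilities (array)
    return mask * p_present            # P(pixel is hand) = P(pixel | present) * P(present)
```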
Towards a Unified Framework for Hand-Based Methods in First Person Vision
"... First Person Vision (Egocentric) video analysis stands nowa-days as one of the emerging fields in computer vision. The availability of wearable devices recording exactly what the user is looking at is ineluctable and the opportunities and chal-lenges carried by this kind of devices are broad. Partic ..."
Abstract - Cited by 2 (2 self)
First Person Vision (egocentric) video analysis stands nowadays as one of the emerging fields in computer vision. The availability of wearable devices that record exactly what the user is looking at is inevitable, and the opportunities and challenges carried by this kind of device are broad. In particular, for the first time a device is intimate enough with the user to record the movements of his hands, making hand-based applications one of the most explored areas in the field. This paper explores the most popular processing steps used to develop hand-based applications, and proposes a hierarchical structure that switches between levels to reduce the computational cost of the system and improve its performance.
Pixel-Level Hand Detection with Shape-aware Structured Forests
"... Abstract. Hand detection has many important applications in HCI, yet it is a challenging problem because the appearance of hands can vary greatly in images. In this paper, we propose a novel method for effi-cient pixel-level hand detection. Unlike previous method which assigns a binary label to ever ..."
Abstract - Cited by 1 (0 self)
Hand detection has many important applications in HCI, yet it is a challenging problem because the appearance of hands can vary greatly in images. In this paper, we propose a novel method for efficient pixel-level hand detection. Unlike previous methods, which assign a binary label to every pixel independently, our method estimates a probability shape mask for each pixel using structured forests. This approach can better exploit hand shape information in the training data and enforce shape constraints in the estimation. Aggregation of multiple predictions generated from neighboring pixels further improves the robustness of our method. We evaluate our method on both egocentric videos and unconstrained still images. Experimental results show that our method detects hands efficiently and outperforms other state-of-the-art methods.
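The aggregation step is the most mechanical part of the abstract and is easy to sketch: each sampled location predicts a local probability mask for its neighborhood, and overlapping predictions are averaged. `predict_patch` below stands in for the trained structured forest; the patch size and stride are illustrative assumptions.

```python
import numpy as np

def aggregate_shape_masks(image, predict_patch, patch=16, stride=4):
    """Average overlapping local mask predictions into one per-pixel probability map."""
    h, w = image.shape[:2]
    prob = np.zeros((h, w))
    hits = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            local = predict_patch(image[y:y + patch, x:x + patch])  # (patch, patch) in [0, 1]
            prob[y:y + patch, x:x + patch] += local
            hits[y:y + patch, x:x + patch] += 1
    return prob / np.maximum(hits, 1)  # avoid division by zero at uncovered borders
```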
Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions
- In IEEE International Conference on Computer Vision, 2015
"... Hands appear very often in egocentric video, and their appearance and pose give important cues about what peo-ple are doing and what they are paying attention to. But existing work in hand detection has made strong assump-tions that work well in only simple scenarios, such as with limited interactio ..."
Abstract - Cited by 1 (1 self)
Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work in hand detection has made strong assumptions that work well in only simple scenarios, such as with limited interaction with other people or in lab settings. We develop methods to locate and distinguish between hands in egocentric video using strong appearance models with Convolutional Neural Networks, and introduce a simple candidate region generation approach that outperforms existing techniques at a fraction of the computational cost. We show how these high-quality bounding boxes can be used to create accurate pixelwise hand regions, and as an application, we investigate the extent to which hand segmentation alone can distinguish between different activities. We evaluate these techniques on a new dataset of 48 first-person videos of people interacting in realistic environments, with pixel-level ground truth for over 15,000 hand instances.
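The candidate-generation idea can be illustrated with a hedged sketch: sample boxes from a learned spatial prior over where hands tend to appear in egocentric frames, weight them with a fast appearance cue, and hand only the top-k to the CNN. Every callable below is a placeholder assumption, not the paper's actual proposal mechanism.

```python
import numpy as np

def generate_candidates(spatial_sampler, appearance_score, n_samples=1000, k=100):
    """Cheaply propose k candidate regions for CNN classification."""
    boxes = [spatial_sampler() for _ in range(n_samples)]    # (x, y, w, h) tuples
    scores = np.array([appearance_score(b) for b in boxes])  # e.g. skin-color likelihood
    keep = np.argsort(scores)[::-1][:k]
    return [boxes[i] for i in keep]
```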
Vision and Learning for Deliberative Monocular Cluttered Flight
"... Abstract Cameras provide a rich source of information while being passive, cheap and lightweight for small Unmanned Aerial Vehicles (UAVs). In this work we present the first implementation of receding horizon control, which is widely used in ground vehicles, with monocular vision as the only sensing ..."
Abstract - Cited by 1 (1 self)
Cameras provide a rich source of information while being passive, cheap, and lightweight for small Unmanned Aerial Vehicles (UAVs). In this work we present the first implementation of receding horizon control, which is widely used in ground vehicles, with monocular vision as the only sensing mode for autonomous UAV flight in dense clutter. Two key contributions make this possible: a novel coupling of perception and control via relevant and diverse multiple interpretations of the scene around the robot, and leveraging recent advances in machine learning for anytime budgeted cost-sensitive feature selection and fast non-linear regression for monocular depth prediction. We empirically demonstrate the efficacy of our novel pipeline via real-world experiments of more than 2 km through dense trees with an off-the-shelf quadrotor. Moreover, our pipeline is designed to combine information from other modalities like stereo and lidar.
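Receding horizon control reduces to a simple loop, which may help readers unfamiliar with the term: plan over a short horizon, execute only the first piece of the plan, and re-plan from fresh sensing. All callables below are placeholders; the trajectory-library formulation is a common choice, not necessarily the paper's exact planner.

```python
def fly_receding_horizon(get_image, predict_depth, trajectory_library,
                         collision_cost, execute_segment, n_steps=100):
    """Monocular receding-horizon loop: regress depth, score candidate
    trajectories against it, fly a short segment of the cheapest one, repeat.
    """
    for _ in range(n_steps):
        depth = predict_depth(get_image())    # fast nonlinear depth regression
        costs = [collision_cost(t, depth) for t in trajectory_library]
        best = trajectory_library[costs.index(min(costs))]
        execute_segment(best, horizon_s=0.5)  # execute briefly, then re-plan
```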
Beyond Just Keeping Hands on the Wheel: Towards Visual Interpretation of Driver Hand Motion Patterns
"... Abstract — Observing hand activity in the car provides a rich set of patterns relating to vehicle maneuvering, secondary tasks, driver distraction, and driver intent inference. This work strives to develop a vision-based framework for analyzing such patterns in real-time. First, hands are detected a ..."
Abstract - Cited by 1 (0 self)
Observing hand activity in the car provides a rich set of patterns relating to vehicle maneuvering, secondary tasks, driver distraction, and driver intent inference. This work strives to develop a vision-based framework for analyzing such patterns in real time. First, hands are detected and tracked from a monocular camera. This provides position information for the left and right hands with no intrusion over long, naturalistic drives. Second, the motion trajectories are studied in settings of activity recognition, prediction, and higher-level semantic categorization.
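As a hedged illustration of the second stage, a tracked hand's trajectory can be summarized with simple motion statistics before classification; the feature set below is an assumption for illustration, not the paper's descriptor.

```python
import numpy as np

def trajectory_features(track, fps=30.0):
    """Summarize one hand's tracked (x, y) centers as motion statistics
    suitable for a downstream activity/intent classifier.
    """
    track = np.asarray(track, dtype=float)  # shape (n_frames, 2)
    vel = np.diff(track, axis=0) * fps      # per-second velocity between frames
    speed = np.linalg.norm(vel, axis=1)
    return np.array([speed.mean(), speed.std(), speed.max(),
                     track[:, 0].std(), track[:, 1].std()])
```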
Experiments on an RGB-D Wearable Vision System for Egocentric Activity Recognition
"... tech.cornell.edu This work describes and explores novel steps towards activity recognition from an egocentric point of view. Ac-tivity recognition is a broadly studied topic in computer vi-sion, but the unique characteristics of wearable vision sys-tems present new challenges and opportunities. We e ..."
Abstract - Cited by 1 (0 self)
This work describes and explores novel steps towards activity recognition from an egocentric point of view. Activity recognition is a broadly studied topic in computer vision, but the unique characteristics of wearable vision systems present new challenges and opportunities. We evaluate a challenging new publicly available dataset that includes trajectories of different users across two indoor environments performing a set of more than 20 different activities. The visual features studied include compact and global image descriptors, including GIST and a novel skin segmentation based histogram signature, and state-of-the-art image representations for recognition, including Bag of SIFT words and Convolutional Neural Network (CNN) based features. Our experiments show that simple and compact features provide reasonable accuracy for obtaining basic activity information (in our case, manipulation vs. non-manipulation). However, for finer-grained categories, CNN-based features provide the most promising results. Future steps include integrating depth information with these features and adding temporal consistency to the pipeline.
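The skin-segmentation-based histogram signature can be sketched roughly as follows: threshold skin-colored pixels, then record the skin fraction per cell of a coarse spatial grid as a compact global descriptor. The YCrCb threshold below is a common heuristic and the grid layout is an assumption; neither is taken from the paper.

```python
import cv2
import numpy as np

def skin_histogram_signature(bgr_image, grid=(4, 4)):
    """Fraction of skin-colored pixels in each cell of a coarse spatial grid."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127)) > 0
    h, w = skin.shape
    gy, gx = grid
    return np.array([skin[i * h // gy:(i + 1) * h // gy,
                          j * w // gx:(j + 1) * w // gx].mean()
                     for i in range(gy) for j in range(gx)])
```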