Results 1 - 6 of 6
The Evolution of First Person Vision Methods: A Survey
"... Abstract — The emergence of new wearable technologies, such as action cameras and smart glasses, has increased the interest of computer vision scientists in the first person perspective. Nowadays, this field is attracting attention and investments of companies aiming to develop commercial devices wi ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
(Show Context)
Abstract — The emergence of new wearable technologies, such as action cameras and smart glasses, has increased the interest of computer vision scientists in the first person perspective. Nowadays, this field is attracting attention and investments from companies aiming to develop commercial devices with first person vision (FPV) recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user–machine interaction. This paper summarizes the evolution of the state of the art in FPV video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field. Index Terms — Computer vision, egocentric vision, first person vision (FPV), human–machine interaction, smart glasses, video analytics, wearable devices.
Video Summarization by Learning Submodular Mixtures of Objectives
IEEE Conf. Comput. Vis. Pattern Recognit., 2015
"... We present a novel method for summarizing raw, casu-ally captured videos. The objective is to create a short sum-mary that still conveys the story. It should thus be both, interesting and representative for the input video. Previous methods often used simplified assumptions and only opti-mized for o ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
We present a novel method for summarizing raw, casually captured videos. The objective is to create a short summary that still conveys the story. It should thus be both interesting and representative of the input video. Previous methods often used simplified assumptions and optimized for only one of these goals. Alternatively, they used hand-defined objectives that were optimized sequentially by making consecutive hard decisions, which limits their use to a particular setting. Instead, we introduce a new method that (i) uses a supervised approach to learn the importance of global characteristics of a summary and (ii) jointly optimizes for multiple objectives, thus creating summaries that possess multiple properties of a good summary. Experiments on two challenging and very diverse datasets demonstrate the effectiveness of our method, which outperforms or matches the current state of the art.
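The mixture-of-objectives idea lends itself to a compact illustration: a summary is built by greedily maximizing a weighted sum of submodular objectives. The sketch below is a minimal stand-in, assuming a facility-location term for representativeness and a modular interestingness term with fixed (not learned) weights; the paper learns such weights in a supervised fashion, and the objectives and features here are illustrative assumptions.

```python
# Illustrative sketch only: greedy maximization of a weighted mixture of
# submodular objectives, in the spirit of the paper. Objective choices,
# weights, and features are hypothetical, not the authors' exact setup.
import numpy as np

def facility_location(selected, similarity):
    """Representativeness: each segment is covered by its most similar
    selected segment (a classic monotone submodular objective)."""
    if not selected:
        return 0.0
    return similarity[:, selected].max(axis=1).sum()

def interestingness(selected, scores):
    """Modular objective: sum of per-segment interestingness scores."""
    return scores[list(selected)].sum() if selected else 0.0

def greedy_summary(similarity, scores, weights, budget):
    """Greedily pick segments maximizing the weighted mixture. For
    monotone submodular objectives under a cardinality budget, greedy
    selection carries the usual (1 - 1/e) approximation guarantee."""
    selected, rest = [], set(range(similarity.shape[0]))
    def value(sel):
        return (weights[0] * facility_location(sel, similarity)
                + weights[1] * interestingness(sel, scores))
    while len(selected) < budget and rest:
        gains = {i: value(selected + [i]) - value(selected) for i in rest}
        best = max(gains, key=gains.get)
        selected.append(best)
        rest.remove(best)
    return selected

# Toy usage: 6 video segments with random features and interest scores.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
sim = np.exp(-np.linalg.norm(feats[:, None] - feats[None, :], axis=-1))
print(greedy_summary(sim, rng.random(6), weights=(1.0, 0.5), budget=2))
```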
Convolutional Neural Networks for Detecting and Mapping Crowds in First Person Vision Applications
"... Abstract. There has been an increasing interest on the analysis of First Person Videos in the last few years due to the spread of low-cost wearable devices. Nevertheless, the understanding of the environment surround-ing the wearer is a difficult task with many elements involved. In this work, a met ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract. There has been increasing interest in the analysis of first person videos in the last few years due to the spread of low-cost wearable devices. Nevertheless, understanding the environment surrounding the wearer is a difficult task with many elements involved. In this work, a method for detecting and mapping the presence of people and crowds around the wearer is presented. Features extracted at the crowd level are used to build a robust representation that can handle the variations and occlusions of people's visual characteristics inside a crowd. To this aim, convolutional neural networks are exploited. Results demonstrate that this approach achieves high accuracy in the recognition of crowds, as well as the possibility of a general interpretation of the context through the classification of characteristics of the segmented background.
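As a rough illustration of the patch-level classification this abstract describes, the sketch below defines a tiny CNN in PyTorch that labels fixed-size image patches as crowd or non-crowd; labeling patches over a frame grid would then yield a coarse crowd map. The architecture, patch size, and two-class setup are assumptions for the example, not the network from the paper.

```python
# Minimal sketch of a patch-level crowd classifier; layer sizes are
# illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class CrowdNet(nn.Module):
    """Tiny CNN labeling a 64x64 RGB patch as crowd / no-crowd.
    Per-patch predictions over a frame grid give a coarse crowd map."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Toy usage: classify a batch of four 64x64 patches.
model = CrowdNet()
patches = torch.randn(4, 3, 64, 64)
print(model(patches).argmax(dim=1))  # predicted label per patch
```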
The Evolution of First Person Vision Methods: A Survey
"... Abstract—The emergence of new wearable technologies such as ac-tion cameras and smart-glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting attention and investments of companies aiming to develop commercial devices with ..."
Abstract
- Add to MetaCart
Abstract—The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting attention and investments from companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges, and opportunities within the field.
Egocentric Video Biometrics
"... Egocentric cameras are being worn by an increasing number of users, among them many security forces world-wide. GoPro cameras already penetrated the mass market, and Google Glass may follow soon. As head-worn cam-eras do not capture the face and body of the wearer, it may seem that the anonymity of ..."
Abstract
- Add to MetaCart
(Show Context)
Egocentric cameras are being worn by an increasing number of users, among them many security forces worldwide. GoPro cameras have already penetrated the mass market, and Google Glass may follow soon. As head-worn cameras do not capture the face and body of the wearer, it may seem that the anonymity of the wearer can be preserved even when the video is publicly distributed. We show that motion features in egocentric video provide biometric information, and that the identity of the user can be reliably determined from a few seconds of video captured while the user is walking. The proposed method achieves more than 90% identification accuracy in cases where the random success rate is only 3%. Applications may include theft prevention of wearable cameras by locking the camera when not worn by its lawful owner. This work can also provide the first steps toward searching video sharing services (e.g., YouTube) for egocentric videos shot by a specific person. An important message of this paper is that people should be aware that sharing egocentric video will compromise their anonymity, even when their face is not visible.
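To make the motion-biometrics claim concrete, the hedged sketch below turns a one-dimensional global-motion signal (e.g., mean vertical optical flow per frame) into frequency-domain features and trains an off-the-shelf SVM to tell two synthetic "wearers" apart. The feature design, window size, and classifier are stand-ins chosen for the example, not the paper's actual method.

```python
# Hypothetical pipeline: wearer identity from camera-motion periodicity.
import numpy as np
from sklearn.svm import SVC

def gait_features(motion, win=64):
    """Split a 1-D global-motion signal into windows and keep the
    low-frequency Fourier magnitudes, which capture the periodic
    head bob of walking."""
    windows = [motion[i:i + win] for i in range(0, len(motion) - win + 1, win)]
    return np.array([np.abs(np.fft.rfft(w))[1:9] for w in windows])

# Toy data: two "wearers" with different gait frequency and amplitude.
rng = np.random.default_rng(1)
t = np.arange(2048)
sig_a = 1.0 * np.sin(2 * np.pi * t / 30) + 0.3 * rng.normal(size=t.size)
sig_b = 0.6 * np.sin(2 * np.pi * t / 22) + 0.3 * rng.normal(size=t.size)

Xa, Xb = gait_features(sig_a), gait_features(sig_b)
X = np.vstack([Xa, Xb])
y = np.array([0] * len(Xa) + [1] * len(Xb))

clf = SVC().fit(X[::2], y[::2])                  # train on alternate windows
print("accuracy:", clf.score(X[1::2], y[1::2]))  # test on held-out windows
```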
Storyline Representation of Egocentric Videos with an Application to Story-based Search
"... Egocentric videos are a valuable source of information as a daily log of our lives. However, large fraction of ego-centric video content is typically irrelevant and boring to re-watch. It is an agonizing task, for example, to manually search for the moment when your daughter first met Mickey Mouse f ..."
Abstract
- Add to MetaCart
(Show Context)
Egocentric videos are a valuable source of information as a daily log of our lives. However, a large fraction of egocentric video content is typically irrelevant and boring to re-watch. It is an agonizing task, for example, to manually search for the moment when your daughter first met Mickey Mouse in hours-long egocentric videos taken at Disneyland. Although many summarization methods have been successfully proposed to create concise representations of videos, in practice the value of subshots to users may change according to their immediate preference or mood; thus summaries with fixed criteria may not fully satisfy users' various search intents. To address this, we propose a storyline representation that expresses an egocentric video as a set of story elements, comprising actors, locations, supporting objects, and events, jointly inferred through MRF inference and depicted on a timeline. We construct such a storyline with very limited annotation data (a list of map locations and weak knowledge of what events may be possible at each location) by bootstrapping the process with data obtained through focused Web image and video searches. Our representation supports story-based search with queries in the form of AND-OR graphs, which span any subset of story elements and their spatio-temporal composition. We show the effectiveness of our approach on a set of unconstrained YouTube egocentric videos of visits to Disneyland.
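The AND-OR query mechanism can be illustrated in a few lines: the sketch below evaluates a nested AND-OR tree against the set of story elements inferred for each segment of the timeline. The element vocabulary and the tuple-based query encoding are invented for the example; the paper's queries additionally constrain spatio-temporal composition, which is omitted here.

```python
# Illustrative story-based search with an AND-OR query over story elements.
def matches(query, elements):
    """Recursively evaluate an AND-OR query tree against the set of
    story elements inferred for one video segment."""
    op, args = query[0], query[1:]
    if op == "AND":
        return all(matches(q, elements) for q in args)
    if op == "OR":
        return any(matches(q, elements) for q in args)
    return op in elements  # leaf: one actor / location / object / event

# One storyline = jointly inferred elements per segment on the timeline.
storyline = [
    {"daughter", "Main Street", "parade"},
    {"daughter", "Mickey Mouse", "Town Square", "greeting"},
    {"wearer", "Space Mountain", "ride"},
]

# "Segments where the daughter meets Mickey Mouse or Minnie Mouse."
query = ("AND", ("daughter",), ("OR", ("Mickey Mouse",), ("Minnie Mouse",)))
hits = [i for i, seg in enumerate(storyline) if matches(query, seg)]
print(hits)  # -> [1]
```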