Dynamic Eye Movement Datasets and Learnt Saliency Models for Visual Action Recognition (2012)

by S. Mathe, C. Sminchisescu
Venue: ECCV 2012, Part VII

Results 1 - 10 of 22

Action Recognition with Improved Trajectories

by Heng Wang, Cordelia Schmid - in IEEE International Conference on Computer Vision (ICCV), 2013
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract - Cited by 98 (13 self) - Add to MetaCart
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
(Show Context)

Citation Context

...tions (e.g., pan, tilt and zoom) and only trajectories related to human actions are kept (shown in green in Figure 3). This gives us similar effects as sampling features based on visual saliency maps [23, 39]. The last row of Figure 3 shows two failure cases. The left one is due to severe motion blur, which makes both SURF descriptor matching and optical flow estimation unreliable. Improving motion estima...
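The context above sketches how improved trajectories suppress camera motion: keypoints matched across frames yield a homography, and trajectories consistent with it are discarded as background. Below is a minimal, hedged illustration of that compensation step with OpenCV; ORB stands in for SURF (which is non-free in most OpenCV builds), and the function name compensate_camera_motion is ours, not Wang and Schmid's code.

```python
# Sketch of homography-based camera motion compensation between two video
# frames, in the spirit of improved dense trajectories. ORB replaces SURF;
# this is an illustration under stated assumptions, not the authors' code.
import cv2
import numpy as np

def compensate_camera_motion(prev_gray, curr_gray):
    """Warp curr_gray into prev_gray's coordinates so that background
    (camera) motion is approximately cancelled; residual motion is then
    mostly due to the acting person."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return curr_gray  # not enough texture to estimate motion
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    if len(matches) < 4:
        return curr_gray
    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_curr = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects matches on the moving person, keeping background matches.
    H, _ = cv2.findHomography(pts_curr, pts_prev, cv2.RANSAC, 5.0)
    if H is None:
        return curr_gray
    h, w = curr_gray.shape
    return cv2.warpPerspective(curr_gray, H, (w, h))
```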

Action and Event Recognition with Fisher Vectors on a Compact Feature Set

by Dan Oneata, Jakob Verbeek, Cordelia Schmid, 2014
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract - Cited by 35 (8 self) - Add to MetaCart
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
(Show Context)

Citation Context

... Y Y 90.0 54.8 89.0 82.1 63.3
BT'10 [1]: — — 77.8 — —
LZYN'11 [20]: — — 75.8 — 53.3
KGGHW'12 [14]: 72.7 29.2 — — —
WWQ'12 [44]: — 31.8 — — —
JDXLN'12 [12]: — 40.7 — 80.6 59.5
GHS'12 [7]: — — — 82.7 —
MS'12 [24]: — — — — 61.7
WKSCL'13 [41]: 85.6 48.3 85.4 77.2 59.9
JJB'13 [11]: — 52.1 — 83.2 62.5
Table 2. Comparison to the state of the art of our FV-based results with SFV+T2+H3, K = 1000 for both MBH and SIFT f...

Sampling strategies for real-time action recognition

by Feng Shi, Emil Petriu, Robert Laganière - In: CVPR, 2013
"... Local spatio-temporal features and bag-of-features rep-resentations have become popular for action recognition. A recent trend is to use dense sampling for better perfor-mance. While many methods claimed to use dense feature sets, most of them are just denser than approaches based on sparse interest ..."
Abstract - Cited by 15 (1 self) - Add to MetaCart
Local spatio-temporal features and bag-of-features representations have become popular for action recognition. A recent trend is to use dense sampling for better performance. While many methods claimed to use dense feature sets, most of them are just denser than approaches based on sparse interest point detectors. In this paper, we explore sampling with high density for action recognition. We also investigate the impact of random sampling over a dense grid for computational efficiency. We present a real-time action recognition system which integrates a fast random sampling method with local spatio-temporal features extracted from a Local Part Model. A new method based on the histogram intersection kernel is proposed to combine multiple channels of different descriptors. Our technique shows high accuracy on the simple KTH dataset (93%), and achieves state-of-the-art results on two very challenging real-world datasets: 83.3% on UCF50 and 47.6% on HMDB51.
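The abstract mentions combining descriptor channels with a histogram intersection kernel. As a rough sketch (not the paper's implementation), the kernel between two L1-normalized histograms is the sum of bin-wise minima, and per-channel kernels can be averaged before feeding a precomputed-kernel SVM; the averaging rule here is our assumption.

```python
# Sketch of a histogram intersection kernel (HIK) combined over several
# descriptor channels. Averaging per-channel kernels is one common fusion
# rule; the paper's exact weighting may differ.
import numpy as np

def hik(x, y):
    """Histogram intersection between two (L1-normalized) histograms."""
    return np.minimum(x, y).sum()

def multichannel_gram(channels_a, channels_b):
    """Gram matrix averaging HIK over channels.
    channels_a/b: lists of (n_samples, n_bins) histogram arrays, one array
    per descriptor channel (e.g. HOG, HOF, MBH)."""
    n, m = channels_a[0].shape[0], channels_b[0].shape[0]
    K = np.zeros((n, m))
    for Xa, Xb in zip(channels_a, channels_b):
        for i in range(n):
            for j in range(m):
                K[i, j] += hik(Xa[i], Xb[j])
    return K / len(channels_a)

# Usage with a kernel SVM on a precomputed Gram matrix:
#   from sklearn.svm import SVC
#   clf = SVC(kernel="precomputed").fit(multichannel_gram(train, train), labels)
```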

Citation Context

...e sampling. A recent study [29] shows that action recognition performance can be maintained with as little as 30% of the densely detected features. Mathe and Sminchisescu also show similar results in [19]. Given the effectiveness of the uniform sampling strategy, one can think of using biased random samplers in order to find more discriminant patches. Yang et al. [34] are able to identify more feature...

Action and Event Recognition with Fisher Vectors on a Compact Feature Set

by Dan Oneata, Jakob Verbeek, Cordelia Schmid
"... Action recognition in uncontrolled video is an important and challenging computer vision problem. Recent progress in this area is due to new local features and models that capture spatio-temporal structure between local features, or human-object interactions. Instead of working towards more complex ..."
Abstract - Cited by 10 (5 self) - Add to MetaCart
Action recognition in uncontrolled video is an important and challenging computer vision problem. Recent progress in this area is due to new local features and models that capture spatio-temporal structure between local features, or human-object interactions. Instead of working towards more complex models, we focus on the low-level features and their encoding. We evaluate the use of Fisher vectors as an alternative to bag-of-word histograms to aggregate a small set of state-of-the-art low-level descriptors, in combination with linear classifiers. We present a large and varied set of evaluations, considering (i) classification of short actions in five datasets, (ii) localization of such actions in feature-length movies, and (iii) large-scale recognition of complex events. We find that for basic action recognition and localization, MBH features alone are enough for state-of-the-art performance. For complex events we find that SIFT and MFCC features provide complementary cues. On all three problems we obtain state-of-the-art results, while using fewer features and less complex models.
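For readers unfamiliar with Fisher vectors, the sketch below shows the core of the encoding: local descriptors are soft-assigned to a diagonal-covariance GMM, and gradients with respect to the component means are aggregated, followed by power and L2 normalization. This is a simplified illustration (the full encoding also aggregates variance gradients); names such as train_descs are placeholders.

```python
# Minimal Fisher vector sketch: soft-assign local descriptors to a GMM and
# aggregate gradients w.r.t. the component means. The full encoding also
# uses variance gradients; this is a didactic simplification.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(descriptors, gmm):
    """descriptors: (n, d) local features; gmm: fitted GaussianMixture with
    covariance_type="diag". Returns a (K*d,) encoding."""
    q = gmm.predict_proba(descriptors)               # (n, K) soft assignments
    n = descriptors.shape[0]
    parts = []
    for k in range(gmm.n_components):
        diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
        g = (q[:, k:k + 1] * diff).sum(axis=0)       # weighted mean gradient
        g /= n * np.sqrt(gmm.weights_[k])
        parts.append(g)
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalization

# Typical usage (train_descs is a placeholder for pooled local descriptors):
#   gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_descs)
#   video_encoding = fisher_vector_means(video_descs, gmm)
```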

Citation Context

... Y Y 90.0 54.8 89.0 82.1 63.3
BT'10 [1]: — — 77.8 — —
LZYN'11 [20]: — — 75.8 — 53.3
KGGHW'12 [14]: 72.7 29.2 — — —
WWQ'12 [44]: — 31.8 — — —
JDXLN'12 [12]: — 40.7 — 80.6 59.5
GHS'12 [7]: — — — 82.7 —
MS'12 [24]: — — — — 61.7
WKSCL'13 [41]: 85.6 48.3 85.4 77.2 59.9
JJB'13 [11]: — 52.1 — 83.2 62.5
Table 2. Comparison to the state of the art of our FV-based results with SFV+T2+H3, K = 1000 for both MBH and SIFT f...

Training object class detectors from eye tracking data

by Dim P. Papadopoulos, Alasdair D. F. Clarke, Frank Keller, Vittorio Ferrari
"... Abstract. Training an object class detector typically requires a large set of im-ages annotated with bounding-boxes, which is expensive and time consuming to create. We propose novel approach to annotate object locations which can sub-stantially reduce annotation time. We first track the eye movemen ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
Training an object class detector typically requires a large set of images annotated with bounding-boxes, which is expensive and time consuming to create. We propose a novel approach to annotating object locations which can substantially reduce annotation time. We first track the eye movements of annotators instructed to find the object, and then propose a technique for deriving object bounding-boxes from these fixations. To validate our idea, we collected eye tracking data for the trainval part of 10 object classes of Pascal VOC 2012 (6,270 images, 5 observers). Our technique correctly produces bounding-boxes in 50% of the images, while reducing the total annotation time by a factor of 6.8× compared to drawing bounding-boxes. Any standard object class detector can be trained on the bounding-boxes predicted by our model. Our large scale eye tracking dataset is available at groups.inf.ed.ac.uk/calvin/eyetrackdataset/.
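The paper's technique for deriving boxes from fixations is learned; purely to make the input and output concrete, here is a naive baseline (our assumption, not the authors' method) that boxes the central mass of the fixation cloud after discarding outlying fixations.

```python
# Naive baseline for turning fixations into a bounding box: keep fixations
# near the cloud's center and box them. The paper learns a far better
# mapping; this sketch only makes the task's input/output concrete.
import numpy as np

def box_from_fixations(fix_xy, keep_frac=0.8):
    """fix_xy: (n, 2) fixation coordinates in pixels.
    Returns (x_min, y_min, x_max, y_max)."""
    fix_xy = np.asarray(fix_xy, dtype=float)
    center = np.median(fix_xy, axis=0)
    dist = np.linalg.norm(fix_xy - center, axis=1)
    keep = fix_xy[dist <= np.quantile(dist, keep_frac)]  # drop stray fixations
    (x0, y0), (x1, y1) = keep.min(axis=0), keep.max(axis=0)
    return x0, y0, x1, y1
```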

Citation Context

...then apply standard face and text detectors only there. Several authors have collected eye tracking data for video and shown that saliency maps computed from these data can improve action recognition [31, 46]. Yun et al. [52] collect eye movement data for a 1,000 image subset of Pascal VOC 2008; three observers performed a three second free-viewing task. This data is then used to re-rank the output of ...

Action from still image dataset and inverse optimal control to learn task specific visual scanpaths

by Stefan Mathe, Cristian Sminchisescu - in Advances in Neural Information Processing Systems, 2013
"... Human eye movements provide a rich source of information into the human vi-sual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement pr ..."
Abstract - Cited by 5 (1 self) - Add to MetaCart
Human eye movements provide a rich source of information into human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among eye tracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOIs) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that, given unconstrained real-world stimuli, task instructions have significant influence on human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data, within models that allow us to effectively predict human visual search patterns based on inverse optimal control. The methodology achieves state-of-the-art scanpath modeling results.
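As a hedged sketch of the AOI idea, one can cluster fixation positions with a Gaussian mixture (clusters standing in for AOIs) and estimate an empirical Markov transition matrix over consecutive fixations. The paper's models additionally select the number of AOIs and their spatial support automatically; fixing the number by hand below is our simplification.

```python
# Hedged sketch of AOI discovery: cluster fixation positions with a GMM
# (components play the role of AOIs) and estimate a Markov transition
# matrix from consecutive fixations. Not the paper's actual model.
import numpy as np
from sklearn.mixture import GaussianMixture

def aoi_markov(scanpath_xy, n_aoi=5):
    """scanpath_xy: (n, 2) fixation sequence in viewing order.
    Returns (fitted GMM, row-stochastic transition matrix)."""
    gmm = GaussianMixture(n_components=n_aoi, random_state=0).fit(scanpath_xy)
    states = gmm.predict(scanpath_xy)                 # AOI label per fixation
    T = np.zeros((n_aoi, n_aoi))
    for a, b in zip(states[:-1], states[1:]):         # count AOI transitions
        T[a, b] += 1
    T /= np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalize counts
    return gmm, T
```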

Citation Context

...ntly improved estimates. Section §6.2 gives the model and its assessment. 2 Related Work Human gaze pattern annotations have been collected for both static images [11, 13, 14, 12, 26, 18] and for video [19, 23, 15]; see [24] for a recent overview. Most of the image datasets available have been collected under free-viewing, and the few task controlled ones [14, 7] have been designed for small scale studies. In co...

Overview of Eye Tracking Datasets

by Stefan Winkler, Ramanathan Subramanian
"... Datasets of images or videos annotated with eye tracking data constitute important ground truth for studies on saliency models, which have applications in quality assessment and other areas. Over two dozen such databases are now available in the public domain; they are presented in this paper. Index ..."
Abstract - Cited by 5 (1 self) - Add to MetaCart
Datasets of images or videos annotated with eye tracking data constitute important ground truth for studies on saliency models, which have applications in quality assessment and other areas. Over two dozen such databases are now available in the public domain; they are presented in this paper. Index Terms: eye fixations, saliency, visual attention.

Citation Context

...mages from 3 image quality databases to validate the hypothesis that salient image regions should contribute more to objective image quality metrics. 2.2. Video Databases • Actions in the Eye Dataset [33] was compiled to model human eye movements in the Hollywood-2 and UCF Sports action datasets. Two subject groups were involved in the study – an active group of 12 subjects performed action recognitio...

Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

by Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
"... We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple s ..."
Abstract - Cited by 4 (1 self) - Add to MetaCart
We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths.
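The Max-Path ingredient can be made concrete with a simplified dynamic program: given per-frame detection score maps, find the sequence of window positions maximizing the summed score under a bounded per-frame displacement. The single-path, fixed-scale variant below is our illustration; the paper generalizes it to multiple structured paths with context.

```python
# Simplified Max-Path dynamic program: find the temporal path of window
# positions maximizing summed per-frame scores, allowing at most `step`
# pixels of movement between consecutive frames. The paper's algorithm is
# a generalization of this basic single-path case.
import numpy as np

def max_path(scores, step=1):
    """scores: (T, H, W) per-frame score maps. Returns one (y, x) per frame."""
    T, H, W = scores.shape
    acc = scores[0].astype(float).copy()
    back = np.zeros((T, H, W, 2), dtype=int)       # best (dy, dx) into each cell
    for t in range(1, T):
        best = np.full((H, W), -np.inf)
        for dy in range(-step, step + 1):
            for dx in range(-step, step + 1):
                # shifted[y, x] = acc[y - dy, x - dx] where that index is valid
                shifted = np.full((H, W), -np.inf)
                ys_dst = slice(max(0, dy), H + min(0, dy))
                xs_dst = slice(max(0, dx), W + min(0, dx))
                ys_src = slice(max(0, -dy), H + min(0, -dy))
                xs_src = slice(max(0, -dx), W + min(0, -dx))
                shifted[ys_dst, xs_dst] = acc[ys_src, xs_src]
                better = shifted > best
                best[better] = shifted[better]
                back[t][better] = (dy, dx)
        acc = best + scores[t]
    # Backtrack from the best final position.
    path = [np.unravel_index(acc.argmax(), acc.shape)]
    for t in range(T - 1, 0, -1):
        y, x = path[-1]
        dy, dx = back[t, y, x]
        path.append((y - dy, x - dx))
    return path[::-1]
```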

Citation Context

...rface). Bottom-up perceptual saliency, computed from eye-gaze of observers (obtained using an eye tracker), has recently been introduced as another promising alternative to annotation and supervision [11, 21]. It has been shown that traditional BoW models computed over the salient regions of the video result in superior performance, compared to dense sampling of descriptors. However, this comes at expense...

Exploiting surroundedness for saliency detection: a Boolean map approach (accepted at TPAMI)

by Jianming Zhang, Stan Sclaroff, 2015
"... Abstract—We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image’s feature maps in a whitened fea ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
We demonstrate the usefulness of surroundedness for eye fixation prediction by proposing a Boolean Map based Saliency model (BMS). In our formulation, an image is characterized by a set of binary images, which are generated by randomly thresholding the image's feature maps in a whitened feature space. Based on a Gestalt principle of figure-ground segregation, BMS computes a saliency map by discovering surrounded regions via topological analysis of Boolean maps. Furthermore, we draw a connection between BMS and the Minimum Barrier Distance to provide insight into why and how BMS can properly capture the surroundedness cue via Boolean maps. The strength of BMS is verified by its simplicity, efficiency and superior performance compared with ten state-of-the-art methods on seven eye tracking benchmark datasets.
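A hedged sketch of the surroundedness cue: threshold a feature map at random levels, keep the connected regions of each Boolean map (and its complement) that do not touch the image border, and average over thresholds. BMS proper operates on whitened color feature maps with additional post-processing; using raw grayscale below is our simplification.

```python
# Sketch of the Boolean-map surroundedness cue on a grayscale image: regions
# of each randomly thresholded map that do not touch the border count as
# "surrounded" and accumulate into the saliency map. A simplification of BMS.
import numpy as np
from scipy import ndimage

def boolean_map_saliency(gray, n_thresholds=32, seed=0):
    """gray: (H, W) float array scaled to [0, 1]. Returns a saliency map."""
    rng = np.random.default_rng(seed)
    sal = np.zeros_like(gray, dtype=float)
    for theta in rng.uniform(0.0, 1.0, n_thresholds):
        for bmap in (gray > theta, gray <= theta):    # Boolean map + complement
            labels, _ = ndimage.label(bmap)
            # Components whose label occurs on the border are not surrounded.
            border_labels = np.unique(np.concatenate(
                [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
            surrounded = bmap & ~np.isin(labels, border_labels)
            sal += surrounded
    return sal / sal.max() if sal.max() > 0 else sal
```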

Citation Context

...hown that many computer vision applications can benefit from such saliency models, e.g. image segmentation [2], object recognition [3], visual tracking [4], gaze estimation [5] and action recognition [6], to name a few. In this paper, we focus on developing a saliency detection model for eye fixation prediction [7]. A majority of existent saliency models of this type are heavily based on the contrast...

Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition

by Stefan Mathe, Cristian Sminchisescu
"... Abstract—Systems based on bag-of-words models from image features collected at maxima of sparse interest point operators have been used successfully for both computer visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with v ..."
Abstract - Cited by 3 (1 self) - Add to MetaCart
Systems based on bag-of-words models from image features collected at maxima of sparse interest point operators have been used successfully for both computer vision object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in 'saccade and fixate' regimes, the methodology and emphasis in the human and the computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large scale dynamic computer vision annotated datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge these are the first large human eye tracking datasets to be collected and made publicly available for video, at vision.imar.ro/eyetracking (497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, and (c) task control, as well as free-viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point image sampling strategies and human fixations, as well as their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system leveraging some of the advanced computer vision practice, can lead to state-of-the-art results. Index Terms: visual action recognition, human eye movements, consistency analysis, saliency prediction, large scale learning.
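One concrete use of a learnt saliency model, in the spirit of the pipeline described above (our sketch, not the authors' system): sample interest-point locations with probability proportional to the predicted saliency map, instead of over a uniform dense grid.

```python
# Hedged sketch of saliency-biased sampling: draw interest-point locations
# with probability proportional to a (predicted) saliency map. Assumes the
# map has enough nonzero cells to draw n_points without replacement.
import numpy as np

def sample_locations(saliency, n_points=500, seed=0):
    """saliency: (H, W) non-negative map. Returns (n_points, 2) (y, x) samples."""
    rng = np.random.default_rng(seed)
    p = saliency.ravel() / saliency.sum()
    idx = rng.choice(saliency.size, size=n_points, replace=False, p=p)
    return np.stack(np.unravel_index(idx, saliency.shape), axis=1)
```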

Citation Context

...task. It shows the potential of interest point operators learnt from human fixations for computer vision. Models and experiments appear in §9, results in table 5. This paper extends our prior work in [10]. The paper is organized as follows. In §2 we briefly review existing studies on human visual attention and saliency, as well as the state-of-the-art computational models for automatic action recognit...
