Results 1 - 10 of 24
Analysis of scores, datasets, and models in visual saliency prediction
"... Significant recent progress has been made in developing high-quality saliency models. However, less effort has been undertaken on fair assessment of these models, over large standardized datasets and correctly addressing confound-ing factors. In this study, we pursue a critical and quanti-tative loo ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
(Show Context)
Significant recent progress has been made in developing high-quality saliency models. However, less effort has been devoted to fair assessment of these models over large standardized datasets, with confounding factors correctly addressed. In this study, we pursue a critical and quantitative look at challenges (e.g., center-bias, map smoothing) in saliency modeling and the way they affect model accuracy. We quantitatively compare 32 state-of-the-art models (using the shuffled AUC score to discount center-bias) on 4 benchmark eye movement datasets, for prediction of human fixation locations and scanpath sequence. We also account for the role of map smoothing. We find that, although model rankings vary, some (e.g., AWS, LG, AIM, and HouNIPS) consistently outperform other models over all datasets. Some models work well for prediction of both fixation locations and scanpath sequence (e.g., Judd, GBVS). Our results show low prediction accuracy for models over emotional stimuli from the NUSEF dataset. Our last benchmark, for the first time, gauges the ability of models to decode the stimulus category from statistics of fixations, saccades, and model saliency values at fixated locations. In this test, ITTI and AIM models win over other models. Our benchmark provides a comprehensive high-level picture of the strengths and weaknesses of many popular models, and suggests future research directions in saliency modeling.
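For readers new to the metric, here is a minimal sketch of the shuffled AUC score mentioned above, assuming NumPy arrays and scikit-learn's roc_auc_score; the function name and array layout are illustrative, not the benchmark's actual code:

```python
# Minimal sketch of the shuffled AUC (sAUC) metric. Positive samples are
# saliency values at the test image's fixations; negatives are values at
# fixation locations borrowed from OTHER images, which discounts the
# dataset-wide center bias.
import numpy as np
from sklearn.metrics import roc_auc_score

def shuffled_auc(saliency_map, fixations, other_fixations):
    """saliency_map: 2-D array; fixations / other_fixations: (N, 2) integer
    arrays of (row, col) pixel coordinates."""
    pos = saliency_map[fixations[:, 0], fixations[:, 1]]
    neg = saliency_map[other_fixations[:, 0], other_fixations[:, 1]]
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    scores = np.concatenate([pos, neg])
    return roc_auc_score(labels, scores)
```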
MEMORABILITY OF NATURAL SCENES: THE ROLE OF ATTENTION
"... The image memorability consists in the faculty of an image to be recalled after a period of time. Recently, the memorability of an image database was measured and some factors responsible for this memorability were highlighted. In this paper, we investigate the role of visual attention in image memo ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
Image memorability is the capacity of an image to be recalled after a period of time. Recently, the memorability of an image database was measured and some factors responsible for this memorability were highlighted. In this paper, we investigate the role of visual attention in image memorability along two axes. The first is experimental and uses the results of eye tracking performed on a set of images with different memorability scores. The second is predictive: we show that attention-related features can advantageously replace low-level features in image memorability prediction. Our work suggests that the role of visual attention is important and should be taken into account more fully, alongside other low-level features. Index Terms — Image memorability, Visual attention, Eye tracking, Inter-observer congruency, Saliency
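Inter-observer congruency, listed in the index terms above, is commonly computed with a leave-one-out scheme; the sketch below is a hypothetical illustration (the function name and smoothing bandwidth are assumptions, not the paper's implementation):

```python
# Leave-one-out inter-observer congruency (IOC) sketch: score each
# observer's fixations against a smoothed fixation map built from every
# OTHER observer, then average. Higher values mean observers agree on
# where to look.
import numpy as np
from scipy.ndimage import gaussian_filter

def ioc(fixations_per_observer, shape, sigma=25):
    """fixations_per_observer: list of (N_i, 2) integer (row, col) arrays."""
    scores = []
    for i, fixs in enumerate(fixations_per_observer):
        heat = np.zeros(shape)
        for j, other in enumerate(fixations_per_observer):
            if j != i:
                np.add.at(heat, (other[:, 0], other[:, 1]), 1.0)
        heat = gaussian_filter(heat, sigma)
        heat /= heat.max() + 1e-12
        scores.append(heat[fixs[:, 0], fixs[:, 1]].mean())
    return float(np.mean(scores))
```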
Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition
- University of Bonn
"... Abstract—Systems based on bag-of-words models from image features collected at maxima of sparse interest point operators have been used successfully for both computer visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with v ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
Systems based on bag-of-words models, built from image features collected at maxima of sparse interest point operators, have been used successfully for both visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in ‘saccade and fixate’ regimes, the methodology and emphasis in the human and the computer vision communities remain sharply distinct. Here, we make three contributions aiming to bridge this gap. First, we complement existing state-of-the-art large-scale annotated dynamic computer vision datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye-tracking datasets for video to be collected and made publicly available (vision.imar.ro/eyetracking; 497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic video stimuli, and (c) task control as well as free viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data in order to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest-point sampling strategies and human fixations, and on their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and, when used in an end-to-end automatic system that leverages advanced computer vision practice, can lead to state-of-the-art results. Index Terms—visual action recognition, human eye-movements, consistency analysis, saliency prediction, large scale learning
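The consistency measures introduced in the paper are its own contribution; purely as a rough stand-in, the sketch below computes a standard per-frame inter-subject dispersion (mean pairwise gaze distance), one simple way to quantify the stability of visual search across subjects:

```python
# Per-frame inter-subject gaze dispersion: the mean pairwise Euclidean
# distance between subjects' gaze points on each video frame. Lower
# dispersion indicates more consistent visual search.
import numpy as np
from scipy.spatial.distance import pdist

def gaze_dispersion(gaze):
    """gaze: (n_frames, n_subjects, 2) array of (x, y) gaze positions."""
    return np.array([pdist(frame).mean() for frame in gaze])
```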
Attribute-driven edge bundling for general graphs with applications in trail analysis
- In Proc. IEEE PacificVis
, 2015
"... Edge bundling methods reduce visual clutter of dense and occluded graphs. However, existing bundling techniques either ignore edge properties such as direction and data attributes, or are otherwise com-putationally not scalable, which makes them unsuitable for tasks such as exploration of large traj ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Edge bundling methods reduce visual clutter of dense and occluded graphs. However, existing bundling techniques either ignore edge properties such as direction and data attributes, or are otherwise computationally not scalable, which makes them unsuitable for tasks such as exploration of large trajectory datasets. We present a new framework to generate bundled graph layouts according to any numerical edge attributes such as directions, timestamps or weights. We propose a GPU-based implementation linear in number of edges, which makes our algorithm applicable to large datasets. We demonstrate our method with applications in the analysis of aircraft trajectory datasets and eye-movement traces.
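The paper's implementation is GPU-based; purely as an illustration of the attribute-driven idea, here is a much-simplified CPU sketch of one density-advection bundling step, in which only edges sharing an attribute cluster (e.g., a direction bucket) attract each other. All names and parameter values are assumptions, not the paper's method:

```python
# One kernel-density bundling step per attribute cluster: splat sampled
# edge points into a density map, blur it, then advect each point up the
# density gradient so only attribute-compatible edges bundle together.
import numpy as np
from scipy.ndimage import gaussian_filter

def bundle_step(edges, labels, size=512, sigma=8.0, step=2.0):
    """edges: list of (P, 2) float polylines with (x, y) coords in [0, size);
    labels: per-edge attribute cluster (e.g., quantized direction)."""
    out = [None] * len(edges)
    labels = np.asarray(labels)
    for lab in np.unique(labels):
        idx = np.where(labels == lab)[0]
        density = np.zeros((size, size))
        for i in idx:                          # splat sampled edge points
            ij = np.clip(edges[i].astype(int), 0, size - 1)
            np.add.at(density, (ij[:, 1], ij[:, 0]), 1.0)
        density = gaussian_filter(density, sigma)
        gy, gx = np.gradient(density)          # density gradient (rows, cols)
        for i in idx:                          # advect points uphill
            e = edges[i]
            ij = np.clip(e.astype(int), 0, size - 1)
            g = np.stack([gx[ij[:, 1], ij[:, 0]],
                          gy[ij[:, 1], ij[:, 0]]], axis=1)
            g /= np.linalg.norm(g, axis=1, keepdims=True) + 1e-9
            moved = e + step * g
            moved[[0, -1]] = e[[0, -1]]        # keep edge endpoints anchored
            out[i] = np.clip(moved, 0, size - 1)
    return out
```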
How close are we to understanding image-based saliency?
- arXiv preprint arXiv:1409.7686
, 2014
"... Within the set of the many complex factors driving gaze placement, the properities of an image that are associated with fixations under “free viewing ” conditions have been studied ex-tensively. There is a general impression that the field is close to understanding this particu-lar association. Here ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
Within the set of the many complex factors driving gaze placement, the properties of an image that are associated with fixations under “free viewing” conditions have been studied extensively. There is a general impression that the field is close to understanding this particular association. Here we frame saliency models probabilistically as point processes, allowing the calculation of log-likelihoods and bringing saliency evaluation into the domain of information. We compare the information gain of state-of-the-art models to a gold standard and find that only one third of the explainable spatial information is captured. We additionally provide a principled method to show where and how models fail to capture information in the fixations. Thus, contrary to previous assertions, purely spatial saliency remains a significant challenge.
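The information-gain evaluation described above reduces to a short computation once saliency maps are normalized into probability distributions; a minimal sketch follows (the function name and the choice of baseline are illustrative):

```python
# Point-process framing: treat each normalized saliency map as a spatial
# probability distribution and score it by the average log-likelihood
# gain (bits per fixation) over a baseline such as a center-bias map.
import numpy as np

def information_gain(model_map, baseline_map, fixations, eps=1e-20):
    """Maps are 2-D nonnegative arrays; fixations is an (N, 2) integer
    array of (row, col) fixation locations."""
    p_model = model_map / model_map.sum()
    p_base = baseline_map / baseline_map.sum()
    r, c = fixations[:, 0], fixations[:, 1]
    return np.mean(np.log2(p_model[r, c] + eps) - np.log2(p_base[r, c] + eps))
```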
How saliency, faces, and sound influence gaze in dynamic social scenes
- doi: 10.1167/14.8.5 PMID: 24993019
, 2014
"... Conversation scenes are a typical example in which classical models of visual attention dramatically fail to predict eye positions. Indeed, these models rarely consider faces as particular gaze attractors and never take into account the important auditory information that always accompanies dynamic ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Conversation scenes are a typical example in which classical models of visual attention dramatically fail to predict eye positions. Indeed, these models rarely consider faces as particular gaze attractors and never take into account the important auditory information that always accompanies dynamic social scenes. We recorded the eye movements of participants viewing dynamic conversations taking place in various contexts. Conversations were seen either with their original soundtracks or with unrelated soundtracks (unrelated speech and abrupt or continuous natural sounds). First, we analyze how auditory conditions influence the eye movement parameters of participants. Then, we model the probability distribution of eye positions across each video frame with a statistical method (Expectation-Maximization), allowing the relative contribution of different visual features, such as static low-level visual saliency (based on luminance contrast), dynamic low-level visual saliency (based on motion amplitude), faces, and center bias, to be quantified. Through experimental and modeling results, we show that regardless of the auditory condition, participants look more at faces, and especially at talking faces. Hearing the original soundtrack makes participants follow the speech turn-taking more closely. However, we do not find any difference between the different types of unrelated soundtracks. These eye-tracking results are confirmed by our model, which shows that faces, and particularly talking faces, are the features that best explain the gazes recorded, especially in the original soundtrack condition. Low-level saliency is not a relevant feature to explain eye positions made on social scenes, even dynamic ones. Finally, we propose groundwork for an audiovisual saliency model.
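The Expectation-Maximization step described above can be illustrated compactly: with fixed, normalized feature maps as mixture components, EM only has to estimate the mixture weights. A minimal sketch, with all names assumed:

```python
# Model each frame's eye positions as a mixture of fixed, normalized
# feature maps (static saliency, motion saliency, faces, center bias)
# and estimate the mixture weights, i.e. each feature's contribution.
import numpy as np

def em_feature_weights(feature_maps, fixations, n_iter=100):
    """feature_maps: (K, H, W) array, each map normalized to sum to 1;
    fixations: (N, 2) integer array of (row, col) eye positions."""
    K = feature_maps.shape[0]
    lik = feature_maps[:, fixations[:, 0], fixations[:, 1]] + 1e-20  # (K, N)
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        resp = w[:, None] * lik                  # E-step: responsibilities
        resp /= resp.sum(axis=0, keepdims=True)
        w = resp.mean(axis=1)                    # M-step: update weights
    return w
```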
How visual attention is modified by disparities and textures changes?
"... The 3D image/video quality of experience is a multidimensional concept that depends on 2D image quality, depth quantity and visual comfort. The relationship between these parameters is not yet clearly defined. From this perspective, we aim to understand how texture complexity, depth quantity and vis ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
The 3D image/video quality of experience is a multidimensional concept that depends on 2D image quality, depth quantity, and visual comfort. The relationship between these parameters is not yet clearly defined. From this perspective, we aim to understand how texture complexity, depth quantity, and visual comfort influence the way people observe 3D content in comparison with 2D. Six scenes with different structural parameters were generated using Blender software. For these six scenes, two parameters were modified: texture complexity and the amount of depth, the latter by changing the camera baseline and the convergence distance at the shooting stage. Our study was conducted using an eye-tracker and a 3DTV display. During the eye-tracking experiment, each observer freely examined images with different depth levels and texture complexities. To avoid memory bias, we ensured that each observer saw each scene's content only once. Collected fixation data were used to build saliency maps and to analyze differences between the 2D and 3D conditions. Our results show that the introduction of disparity shortened saccade length; however, fixation durations remained unaffected. An analysis of the saliency maps did not reveal any differences between the 2D and 3D conditions over the full 20-s viewing duration. When the whole period was divided into smaller intervals, we found that for the first 4 s the introduced disparity was conducive to the selection of salient regions. However, this contribution is quite minimal when the correlation between saliency maps is analyzed. Nevertheless, we did not find that discomfort (or comfort) had any influence on visual attention. We believe that existing metrics and methods are depth insensitive and do not reveal such differences. Based on the analysis of heat maps and paired t-tests of inter-observer visual congruency values, we deduced that the selected areas of interest depend on texture complexity.
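A minimal sketch of the kind of saliency-map comparison described above: build smoothed fixation maps per viewing condition and correlate them. The smoothing bandwidth and names are assumptions:

```python
# Build a smoothed fixation map per condition (2D vs. 3D) and compare
# the two maps with the linear correlation coefficient (CC).
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_map(fixations, shape, sigma=30):
    """fixations: (N, 2) integer array of (row, col) fixation locations."""
    m = np.zeros(shape)
    np.add.at(m, (fixations[:, 0], fixations[:, 1]), 1.0)
    return gaussian_filter(m, sigma)

def cc(map_a, map_b):
    return np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1]

# e.g. cc(fixation_map(fix_2d, (1080, 1920)), fixation_map(fix_3d, (1080, 1920)))
```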
Face exploration dynamics differentiate men and women
"... The human face is central to our everyday social interactions. Recent studies have shown that while gazing at faces, each one of us has a particular eyescanning pattern, highly stable across time. Although variables such as culture or personality have been shown to modulate gaze behavior, we still ..."
Abstract
- Add to MetaCart
The human face is central to our everyday social interactions. Recent studies have shown that while gazing at faces, each one of us has a particular eye-scanning pattern that is highly stable across time. Although variables such as culture or personality have been shown to modulate gaze behavior, we still don't know what shapes these idiosyncrasies. Moreover, most previous observations rely on static analyses of small eye-position data sets averaged across time. Here, we probe the temporal dynamics of gaze to explore what information can be extracted about the observers and about what is being observed. Controlling for any stimulus effect, we demonstrate that among many individual characteristics, the gender of both the participant (gazer) and the person being observed (actor) are the factors that most influence gaze patterns during face exploration. We record and exploit the largest set of eye-tracking data (405 participants, 58 nationalities) from participants watching videos of another person. Using novel data-mining techniques, we show that female gazers follow a much more exploratory scanning strategy than males. Moreover, female gazers watching female actresses look more at the eye on the left side. These results have strong implications for every field using gaze-based models, from computer vision to clinical psychology.

Introduction
Our eyes constantly move around to place our high-resolution fovea on the most relevant visual information. Arguably, one of the most important objects of regard is another person's face. Until recently, a majority of face perception studies pointed to a "universal" face exploration pattern: humans systematically follow a triangular scanpath (sequence of fixations) over the eyes and the mouth of the presented face.

Methods and results
Experiment: This data set has been described and used in a pupillometry study.
Participants: We recorded the gaze of 459 visitors to the Science Museum of London, UK. We removed from the analysis the data of participants under age 18 (n = 8) as well as 46 other participants whose eye data exhibited irregularities (loss of signal, obviously shifted positions). The analyses are performed on a final group of 405 participants (203 males, 202 females). Mean age of participants was 30.8 years (SD = 11.5; males: M = 32.3, SD = 12.3; females: M = 29.3, SD = 10.5). The experiment was approved by the UCL research ethics committee and by the London Science Museum ethics board, and the methods were carried out in accordance with the approved guidelines. Signed informed consent was obtained from all participants.
Stimuli: Stimuli consisted of video clips of eight different actors (four females, four males).
Apparatus: The experimental setup consisted of four computers: two for administering the personality questionnaire and two dedicated to the eye-tracking experiment and actor face-rating questionnaire (see Procedure). Each setup consisted of a stimulus presentation PC (Dell Precision T3500 and Dell Precision T3610) hooked up to a 19-in. LCD monitor (both 1280 × 1024 pixels, 49.9° × 39.9° of visual angle) at 60 Hz and an EyeLink 1000 kit (http://www.sr-research.com/). Eye-tracking data were collected at 250 Hz. Participants sat 57 cm from the monitor, their head stabilized with a chin rest, forehead rest, and headband. A protective opaque white screen encased the monitor and part of the participant's head in order to shield the participant from environmental distractions.
Procedure: The study took place at the Live Science Pod in the Who Am I? exhibition of the London Science Museum. The room had no windows, and the ambient luminance was very stable across the experiment. The study consisted of three phases for a total duration of approximately 15 min. Phase 1 was a 10-item personality questionnaire based on the Big Five personality inventory.
The dispersion is the mean Euclidean distance between the eye positions of the same observers for a given clip. Small dispersion values reflect clustered eye positions.
Scanpath modeling (hidden Markov models): To grasp the highly dynamic and individualistic components of gaze behavior, we model participants' scanpaths using hidden Markov models (HMMs).
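A minimal sketch of HMM-based scanpath modeling as described above, here using the hmmlearn package as a stand-in (an assumption; the authors' own tooling may differ):

```python
# Fit a Gaussian HMM to one observer's scanpath: hidden states act as
# regions of interest, transition probabilities capture gaze dynamics.
import numpy as np
from hmmlearn import hmm

def fit_scanpath_hmm(fixations, n_states=3):
    """fixations: (N, 2) array of successive (x, y) fixation positions
    for one observer watching one clip."""
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="full",
                            n_iter=100)
    model.fit(fixations)
    return model  # model.means_ ~ ROI centers, model.transmat_ ~ dynamics

# Per-observer models can then be compared, e.g. via the log-likelihood
# model.score(other_fixations), to classify characteristics such as gender.
```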
Taking a(c)count of eye movements: Multiple mechanisms underlie fixations during enumeration
"... $ We habitually move our eyes when we enumerate sets of objects. It remains unclear whether saccades are directed for numerosity processing as distinct from object-oriented visual processing (e.g., object saliency, scanning heuristics). Here we investigated the extent to which enumeration eye movem ..."
Abstract
- Add to MetaCart
We habitually move our eyes when we enumerate sets of objects. It remains unclear whether saccades are directed for numerosity processing as distinct from object-oriented visual processing (e.g., object saliency, scanning heuristics). Here we investigated the extent to which enumeration eye movements are contingent upon the location of objects in an array, and whether fixation patterns vary with enumeration demands. Twenty adults enumerated random dot arrays twice: first to report the set cardinality and second to judge the perceived number of subsets. We manipulated the spatial location of dots by presenting arrays at 0°, 90°, 180°, and 270° orientations. Participants required a similar time to enumerate the set or the perceived number of subsets in the same array. Fixation patterns were systematically shifted in the direction of array rotation, and distributed across similar locations when the same array was shown on multiple occasions. We modeled fixation patterns and dot saliency using a simple filtering model and show that participants judged groups of dots in close proximity (2°-2.5° of visual angle) as distinct subsets. Modeling results are consistent with the suggestion that enumeration involves visual grouping mechanisms based on object saliency, and that specific enumeration demands affect the spatial distribution of fixations. Our findings highlight the importance of set computation, rather than object processing per se, for models of numerosity processing.
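A minimal sketch of a simple filtering model of perceptual grouping consistent with the description above: blur the dot array at roughly the reported 2°-2.5° grouping scale (converted to pixels) and count the blobs that survive thresholding. The threshold and names are assumptions:

```python
# Predict the number of perceived subsets: smooth a binary dot map with a
# Gaussian at the grouping scale, then count connected regions above a
# threshold. Dots closer than the kernel scale merge into one subset.
import numpy as np
from scipy.ndimage import gaussian_filter, label

def predicted_subsets(dot_positions, shape, sigma_px, thresh=0.2):
    """dot_positions: (N, 2) integer array of (row, col) dot centers;
    sigma_px: grouping scale (~2-2.5 deg of visual angle) in pixels."""
    img = np.zeros(shape)
    img[dot_positions[:, 0], dot_positions[:, 1]] = 1.0
    smooth = gaussian_filter(img, sigma_px)
    smooth /= smooth.max()
    _, n_subsets = label(smooth > thresh)
    return n_subsets
```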