Building the gist of a scene: the role of global image features in recognition
- Progress in Brain Research, 2006
Cited by 66 (4 self)
Humans can recognize the gist of a novel image in a single glance, independent of its complexity. How is this remarkable feat accomplished? Based on behavioral and computational evidence, this paper describes a formal approach to the representation and the mechanism of scene gist understanding, based on scene-centered, rather than object-centered, primitives. We show that the structure of a scene image can be estimated by the mean of global image features, providing a statistical summary of the spatial layout properties (Spatial Envelope representation) of the scene. Global features are based on configurations of spatial scales and are estimated without invoking segmentation or grouping operations. The scene-centered approach is not an alternative to local image analysis but would serve as a feed-forward and parallel pathway of visual processing, able to quickly constrain local feature analysis and enhance object recognition in cluttered natural scenes.
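As a rough illustration of a global summary computed without segmentation or grouping, the sketch below averages a precomputed response map over a coarse grid of cells. The function name `grid_average`, the grid size, and the input map are all hypothetical; this shows only the idea of coarse spatial pooling and is not the feature set described in the paper.

```python
def grid_average(feature_map, grid=2):
    """Average a 2-D response map over a grid x grid layout of equal cells.

    Assumption: feature_map is a list of equal-length rows holding some
    local filter response; how that response is computed is out of scope.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    if rows % grid or cols % grid:
        raise ValueError("map size must be divisible by the grid size")
    cell_h, cell_w = rows // grid, cols // grid
    summary = []
    for gr in range(grid):
        for gc in range(grid):
            # Collect every value that falls inside this cell.
            cell = [
                feature_map[r][c]
                for r in range(gr * cell_h, (gr + 1) * cell_h)
                for c in range(gc * cell_w, (gc + 1) * cell_w)
            ]
            summary.append(sum(cell) / len(cell))
    return summary


# A toy 4x4 response map pooled into a 4-value layout summary.
toy_map = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
layout = grid_average(toy_map, grid=2)
```

The resulting vector keeps where in the image the responses are strong, but says nothing about individual objects, which is the sense in which such a summary is scene-centered.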
Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search
- Psychological Review, 2006
Cited by 58 (4 self)
Many experiments have shown that the human visual system makes extensive use of contextual information for facilitating object search in natural scenes. However, the question of how to formally model contextual influences is still open. On the basis of a Bayesian framework, the authors present an original approach of attentional guidance by global scene context. The model comprises 2 parallel pathways: one pathway computes local features (saliency) and the other computes global (scene-centered) features. The contextual guidance model of attention combines bottom-up saliency, scene context, and top-down mechanisms at an early stage of visual processing and predicts the image regions likely to be fixated by human observers performing natural search tasks in real-world scenes.
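One way to picture the two pathways being combined is a pointwise product of a saliency map and a context prior over locations. The sketch below is a minimal illustration under that assumption: the names `combine_maps` and `most_likely_fixation`, the exponent `gamma`, and the toy maps are hypothetical and are not taken from the paper.

```python
def combine_maps(saliency, context_prior, gamma=0.5):
    """Combine a local saliency map with a global context prior.

    Assumption: each location's priority is saliency ** gamma multiplied
    by the context prior, then normalized so all priorities sum to 1.
    """
    if len(saliency) != len(context_prior):
        raise ValueError("maps must have the same number of rows")
    combined = []
    for s_row, c_row in zip(saliency, context_prior):
        if len(s_row) != len(c_row):
            raise ValueError("maps must have the same number of columns")
        combined.append([(s ** gamma) * c for s, c in zip(s_row, c_row)])
    total = sum(sum(row) for row in combined)
    if total == 0:
        raise ValueError("combined map is zero everywhere")
    return [[value / total for value in row] for row in combined]


def most_likely_fixation(priority):
    """Return the (row, column) of the highest-priority location."""
    cells = [(r, c) for r, row in enumerate(priority) for c in range(len(row))]
    return max(cells, key=lambda rc: priority[rc[0]][rc[1]])


# The top-left cell is the most salient, but the context prior rules it out.
saliency = [[0.9, 0.1], [0.4, 0.4]]
context_prior = [[0.0, 0.1], [0.6, 0.3]]
priority = combine_maps(saliency, context_prior)
target = most_likely_fixation(priority)  # (1, 0)
```

In this toy case the predicted fixation moves away from the most salient cell toward a region that the scene context makes plausible for the target.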
From appearance to context-based recognition: Dense labeling in small images
- 2008
Cited by 14 (5 self)
Traditionally, object recognition is performed based solely on the appearance of the object. However, relevant information also exists in the scene surrounding the object. As supported by our human studies, this contextual information is necessary for accurate recognition in low-resolution images. This scenario with impoverished appearance information, as opposed to using images of higher resolution, provides an appropriate venue for studying the role of context in recognition. In this paper, we explore the role of context for dense scene labeling in small images. Given a segmentation of an image, our algorithm assigns each segment to an object category based on the segment’s appearance and contextual information. We explicitly model context between object categories through the use of relative location and relative scale, in addition to co-occurrence. We perform recognition tests on low- and high-resolution images, which vary significantly in the amount of appearance information present, using just the object appearance information, the combination of appearance and context, as well as just context without object appearance information (blind recognition). We also perform these tests in human studies and analyze our findings to reveal interesting patterns. With the use of our context model, our algorithm achieves state-of-the-art performance on the MSRC and Corel datasets.
An Attention-Driven Model for Grouping Similar Images with Image Retrieval Applications
- 2006
Cited by 6 (4 self)
Recent work in the computational modeling of visual attention has demonstrated that a purely bottom-up approach to identifying salient regions within an image can be successfully applied to diverse and practical problems from target recognition to the placement of advertisement. This paper proposes an application of a combination of computational models of visual attention to the image retrieval problem. We demonstrate that certain shortcomings of existing content-based image retrieval solutions can be addressed by implementing a biologically-motivated, unsupervised way of grouping together images whose salient regions of interest (ROIs) are perceptually similar regardless of the visual contents of other (less relevant) parts of the image. We propose a model in which only the salient regions of an image are encoded as ROIs whose features are then compared against previously seen ROIs and assigned cluster membership accordingly. Experimental results show that the proposed approach works well for several combinations of feature extraction techniques and clustering algorithms, suggesting a promising avenue for future improvements, such as the addition of a top-down component and the inclusion of a relevance feedback mechanism.
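The step of comparing ROI features against previously seen ROIs and assigning cluster membership can be sketched as a simple incremental clustering loop. Everything specific here is an assumption made for illustration: the function name `assign_clusters`, Euclidean distance, the fixed `threshold`, and running-mean centroids. The abstract does not say which clustering algorithms or features were actually used.

```python
import math


def assign_clusters(roi_features, threshold=1.0):
    """Assign each ROI feature vector to the nearest existing cluster,
    or start a new cluster when no centroid is within the threshold."""
    centroids = []  # running mean of each cluster's members
    counts = []     # number of members per cluster
    labels = []     # cluster index assigned to each ROI, in input order
    for vec in roi_features:
        best_idx, best_dist = None, float("inf")
        for idx, centroid in enumerate(centroids):
            dist = math.dist(vec, centroid)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None and best_dist <= threshold:
            # Close enough to a previously seen group: update its mean.
            counts[best_idx] += 1
            n = counts[best_idx]
            centroids[best_idx] = [
                c + (v - c) / n for c, v in zip(centroids[best_idx], vec)
            ]
            labels.append(best_idx)
        else:
            # Nothing similar seen so far: open a new cluster.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Five toy 2-D ROI descriptors forming two well-separated groups.
rois = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.0)]
labels, centroids = assign_clusters(rois, threshold=1.0)
```

Because only the ROI descriptors enter the comparison, two images with similar salient regions end up in the same group regardless of what the rest of each image contains.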
Using Visual Attention to Extract Regions of Interest in the Context of Image Retrieval
- 2006
Cited by 2 (1 self)
Recent research on computational modeling of visual attention has demonstrated that a bottom-up approach to identifying salient regions within an image can be applied to diverse and practical problems for which conventional machine vision techniques have not succeeded in producing robust solutions. This paper proposes a new method for extracting regions of interest (ROIs) from images using models of visual attention. It is presented in the context of improving content-based image retrieval (CBIR) solutions by implementing a biologically-motivated, unsupervised technique of grouping together images whose salient ROIs are perceptually similar. In this paper we focus on the process of extracting the salient regions of an image. The excellent results obtained with the proposed method have demonstrated that the ROIs of the images can be independently indexed for comparison against other regions on the basis of similarity for use in a CBIR solution.
Attention, Consciousness, and Data Display
Recent advances in our understanding of visual perception have shown it to be a far more complex and counterintuitive process than previously believed. Several important consequences follow from this. First, the design of an effective statistical graphics system is unlikely to succeed based on intuition alone; instead, it must rely on a more sophisticated, systematic approach. The basic elements of such an approach are outlined here, along with several design principles. An overview is then given of recent advances in our understanding of visual perception, including rapid perception, visual attention, and scene perception. It then is argued that the mechanisms involved can be successfully harnessed to allow data to be displayed more effectively than at present. Several directions of development are discussed, including effective use of visual attention, the display of dynamic information, and the effective use of nonattentional and nonconscious perceptual systems.
The Modeling and Control of Visual Perception
- 2007
Recent developments in vision science have resulted in several major changes in our understanding of human visual perception. For example, attention no longer appears necessary for “visual intelligence”—a large amount of sophisticated processing can be done without it. Scene perception no longer appears to involve static, general-purpose descriptions, but instead may involve dynamic representations whose content depends on the individual and the task. And vision itself no longer appears to be limited to the production of a conscious “picture”—it may also guide processes outside the conscious awareness of the observer. This chapter surveys some of these new developments and sketches the potential implications they have for the way that vision is modeled and controlled. Emphasis is placed on the emerging view that visual perception involves the sophisticated coordination of several quasi-independent systems, each with its own kind of intelligence. Several consequences of this view will be discussed, including new possibilities for human-machine interaction.
IMAGE RETRIEVAL USING VISUAL ATTENTION
"Let the honor of your student be as dear to you as your own, the honor of your colleague as the reverence for your teacher, and the reverence for your teacher as the fear of Heaven." (Rabbi Elazar ben Shammua, Pirkei Avot)
My mentor and dear friend Dr. Oge Marques deserves special thanks. His genuine dedication to learning had an impact on me from the moment this work began. Our many discussions were both academically challenging and enlightening. His advice and support were essential to the successful completion of this research. The guidance of Dr. Borko Furht, not only during the course of this dissertation, but since the start of my undergraduate studies, has been invaluable. It was his encouragement that first motivated me to pursue this degree, and for that I will always be grateful. Dr. Hari Kalva provided thoughtful insight as well as resources without which many of the results in this dissertation would not have been possible to obtain. I truly appreciate his help and support.
In a blink of an eye and a switch of a transistor: Cortically-coupled computer vision
have resulted in the increasingly problematic issue of information overload, i.e., we have more access to information than we can possibly process. This is nowhere more apparent than in the volume of imagery and video that we can access on a daily basis: for the general public, the availability of YouTube video and Google Images, or for the image analysis professional tasked with searching security video or satellite reconnaissance. Which images to look at, and how to ensure we see the images that are of most interest to us, raises the question of whether there are smart ways to triage this volume of imagery. Over the past decade, computer vision research has focused on the issue of ranking and indexing imagery. However, computer vision is limited in its ability to identify interesting imagery, particularly as “interesting” might be defined by an individual. In this paper we describe our efforts in developing brain computer interfaces (BCIs) which synergistically integrate computer vision and human vision so as to construct a system for image triage. Our approach exploits machine learning for real-time decoding of brain signals which are recorded non-invasively via electroencephalography (EEG). The signals we decode are specific for events related to imagery attracting a user’s attention. We describe two architectures we have developed for this type of cortically-coupled computer vision and discuss potential applications and challenges for the future.
Index Terms: brain computer interface, electroencephalography, computer vision, image triage, image search
Reprints and permission: sagepub.com/journalsPermissions.nav
Observers can store thousands of object images in visual long-term memory with high fidelity, but the fidelity of scene representations in long-term memory is not known. Here, we probed scene-representation fidelity by varying the number of studied exemplars in different scene categories and testing memory using exemplar-level foils. Observers viewed thousands of scenes over 5.5 hr and then completed a series of forced-choice tests. Memory performance was high, even with up to 64 scenes from the same category in memory. Moreover, there was only a 2% decrease in accuracy for each doubling of the number of studied scene exemplars. Surprisingly, this degree of categorical interference was similar to the degree previously demonstrated for object memory. Thus, although scenes have often been defined as a superset of objects, our results suggest that scenes and objects may be entities at a similar level of abstraction in visual long-term memory.
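To make the "2% per doubling" figure concrete, the sketch below computes the cumulative drop implied between two exemplar counts. It assumes the 2% is an absolute drop in percentage points applied uniformly per doubling, and the starting count of 1 exemplar in the example is hypothetical; the abstract states only the per-doubling rate and the maximum of 64.

```python
import math


def predicted_drop(n_from, n_to, drop_per_doubling=2.0):
    """Cumulative accuracy drop (percentage points) between two exemplar
    counts, assuming a constant drop per doubling of studied exemplars."""
    if n_from <= 0 or n_to < n_from:
        raise ValueError("expected 0 < n_from <= n_to")
    doublings = math.log2(n_to / n_from)
    return drop_per_doubling * doublings


# Going from 1 to 64 exemplars is six doublings, so about 12 points.
drop = predicted_drop(1, 64)
```

Under these assumptions, a 64-fold increase in same-category exemplars costs only about 12 percentage points of accuracy, which is why the abstract describes memory performance as high.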

