A Model of Saliency-based Visual Attention for Rapid Scene Analysis
, 1998
Cited by 688 (50 self)
A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail. Index terms: Visual attention, scene analysis, feature extraction, target detection, visual search. I. Introduction. Primates have a remarkable ability to interpret complex scenes in real time, despite the limited speed of the neuronal hardware available for such tasks. Intermediate and higher visual processes appear to select a subset of the available sensory information before further processing [1], most likely to reduce the complexity of scene analysis [2]. This selection appears to be implemented in the ...
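The selection step in this abstract (visiting locations in order of decreasing saliency) can be illustrated with a minimal sketch. It is not the paper's dynamical neural network: it assumes a plain argmax followed by suppression of a square neighborhood, and the names `attention_scanpath`, `radius`, and `saliency_map` are hypothetical.

```python
def attention_scanpath(saliency, n_fixations, radius=1):
    """Return (row, col) locations in order of decreasing saliency.

    Assumed stand-in for the dynamical network: pick the global maximum,
    then suppress a square neighborhood around it so the next pick
    moves elsewhere.
    """
    grid = [row[:] for row in saliency]  # work on a copy of the map
    rows, cols = len(grid), len(grid[0])
    path = []
    for _ in range(n_fixations):
        # Winner: the currently most salient location.
        r, c = max(
            ((i, j) for i in range(rows) for j in range(cols)),
            key=lambda rc: grid[rc[0]][rc[1]],
        )
        path.append((r, c))
        # Suppress the attended neighborhood.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                grid[i][j] = float("-inf")
    return path


# Toy 5x5 saliency map (made-up values for illustration only).
saliency_map = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.3, 0.0, 0.4],
    [0.1, 0.3, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.3],
    [0.5, 0.0, 0.0, 0.0, 0.0],
]
scanpath = attention_scanpath(saliency_map, n_fixations=3)
print(scanpath)
```

The suppression radius controls how far the next fixation must move from the previous one; a larger radius yields a sparser scanpath.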
Hierarchical Models of Object Recognition in Cortex
, 1999
Cited by 345 (67 self)
The classical model of visual processing in cortex is a hierarchy of increasingly sophisticated representations, extending in a natural way the model of simple to complex cells of Hubel and Wiesel. Somewhat surprisingly, little quantitative modeling has been done in the last 15 years to explore the biological feasibility of this class of models to explain higher level visual processing, such as object recognition. We describe a new hierarchical model that accounts well for this complex visual task, is consistent with several recent physiological experiments in inferotemporal cortex and makes testable predictions. The model is based on a novel MAX-like operation on the inputs to certain cortical neurons which may have a general role in cortical function.
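To make the MAX-like operation concrete, the sketch below compares summation with MAX pooling over a small set of afferent responses. The response values and the names `sum_pool` and `max_pool` are illustrative assumptions, not taken from the paper.

```python
def sum_pool(afferents):
    """Linear pooling: the unit's response is the sum of its inputs."""
    return sum(afferents)


def max_pool(afferents):
    """MAX-like pooling: the unit's response follows its strongest input."""
    return max(afferents)


# Hypothetical responses of four afferents tuned to the same feature
# at different positions.
target_alone = [0.0, 0.9, 0.0, 0.0]         # target at position 1
target_shifted = [0.0, 0.0, 0.0, 0.9]       # same target at position 3
target_with_clutter = [0.4, 0.9, 0.0, 0.3]  # target plus weaker distractors

for name, pattern in [
    ("alone", target_alone),
    ("shifted", target_shifted),
    ("clutter", target_with_clutter),
]:
    print(name, "sum:", sum_pool(pattern), "max:", max_pool(pattern))
```

In this toy setting the MAX response is unchanged by shifting the target or adding weaker distractors, while the summed response grows with clutter.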
Separating style and content with bilinear models
- Neural Computation
, 2000
Cited by 119 (3 self)
Perceptual systems routinely separate content from style, classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive. Existing factor models are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. Here we show how perceptual systems may learn to solve these crucial tasks using surprisingly simple bilinear models. We report promising results in three realistic perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
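The abstract does not spell out the model form or the fitting procedure. As a rough illustration of a bilinear style/content factorization, the sketch below assumes each observation is the product of a style-specific matrix and a content vector, stacks the observations into one matrix, and factors it with a truncated SVD. The function name, the dimensions `S`, `K`, `C`, `J`, and the synthetic data are all assumptions; NumPy is required.

```python
import numpy as np


def fit_asymmetric_bilinear(Y, n_factors):
    """Factor stacked observations Y (S*K x C) as A @ B.

    Assumed layout: rows are grouped by style (S styles, K-dimensional
    observations), columns index content classes. A holds the stacked
    style matrices (S*K x J); B holds the content vectors (J x C).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :n_factors] * s[:n_factors]  # scale columns by singular values
    B = Vt[:n_factors, :]
    return A, B


# Synthetic data generated from a rank-J bilinear model (illustration only).
rng = np.random.default_rng(0)
S, K, C, J = 3, 4, 5, 2
true_A = rng.normal(size=(S * K, J))
true_B = rng.normal(size=(J, C))
Y = true_A @ true_B

A, B = fit_asymmetric_bilinear(Y, n_factors=J)
reconstruction_error = float(np.max(np.abs(Y - A @ B)))
print(reconstruction_error)
```

The recovered `A` and `B` match the generating factors only up to an invertible linear transform, so the check here is on the reconstruction `A @ B` and not on the factors themselves.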
Slow Feature Analysis: Unsupervised Learning of Invariances
Cited by 98 (9 self)
Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.
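The two PCA steps described in this abstract can be sketched for the linear case as follows. The sketch omits the nonlinear expansion (which would be applied to the input first), approximates the time derivative with finite differences, and uses a made-up two-channel test signal. The name `linear_sfa` is hypothetical and NumPy is required.

```python
import numpy as np


def linear_sfa(x, n_out=1):
    """Linear slow feature analysis on a signal x of shape (T, d).

    Step 1: PCA on the signal to whiten it (unit variance, decorrelated).
    Step 2: PCA on the time derivative of the whitened signal; the
            directions with the smallest derivative variance are slowest.
    """
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(x)
    evals, evecs = np.linalg.eigh(cov)
    whiten = evecs / np.sqrt(evals)        # scale each eigenvector column
    z = x @ whiten                         # whitened signal
    dz = np.diff(z, axis=0)                # finite-difference derivative
    dcov = dz.T @ dz / len(dz)
    _, devecs = np.linalg.eigh(dcov)       # eigenvalues in ascending order
    return z @ devecs[:, :n_out]           # slowest features first


# Toy input: two linear mixtures of a slow and a fast sinusoid.
t = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
slow = np.sin(t)
fast = np.sin(29.0 * t)
x = np.column_stack([slow + fast, slow - 0.5 * fast])

y = linear_sfa(x, n_out=1)[:, 0]
correlation = abs(float(np.corrcoef(y, slow)[0, 1]))
print(correlation)
```

The sign of an extracted feature is arbitrary, which is why the comparison with the slow source uses the absolute correlation.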
Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: A theory
- J. Neurosci
, 1996
Cited by 94 (1 self)
The head-direction (HD) cells found in the limbic system in freely moving rats represent the instantaneous head direction of the animal in the horizontal plane regardless of the location of the animal. The internal direction represented by these cells uses both self-motion information for inertially based updating and familiar visual landmarks for calibration. Here, a model of the dynamics of the HD cell ensemble is presented. The stability of a localized static activity profile in the network and a dynamic shift mechanism are explained naturally by synaptic weight distribution components with even and odd symmetry, respectively. Under symmetric weights or symmetric reciprocal connections, a stable activity profile close to the known directional tuning curves will emerge. By adding a slight asymmetry to the weights, the activity profile will shift continuously without ...
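The role of even and odd weight components can be illustrated with a single step of recurrent input on a ring of units. This is a toy calculation, not the paper's dynamical model: the cosine (even) and sine (odd) kernels, the rectified-cosine activity bump, and the names `weight`, `recurrent_input`, and `odd_gain` are assumptions chosen for simplicity.

```python
import math

N = 72  # number of units on the ring, one every 5 degrees
angles = [2.0 * math.pi * i / N for i in range(N)]


def weight(delta, odd_gain):
    """Connection strength as a function of the angular offset delta.

    cos(delta) is the even (symmetric) component; sin(delta) is the odd
    (antisymmetric) component, scaled by odd_gain.
    """
    return math.cos(delta) + odd_gain * math.sin(delta)


def recurrent_input(activity, odd_gain):
    """Total synaptic input to each unit from the whole ring."""
    return [
        sum(weight(angles[i] - angles[j], odd_gain) * activity[j] for j in range(N))
        for i in range(N)
    ]


def peak_index(values):
    """Index of the unit receiving the largest input."""
    return max(range(N), key=lambda i: values[i])


# Localized activity bump centered on unit 20 (100 degrees).
preferred = 20
activity = [max(0.0, math.cos(a - angles[preferred])) for a in angles]

peak_symmetric = peak_index(recurrent_input(activity, odd_gain=0.0))
peak_asymmetric = peak_index(recurrent_input(activity, odd_gain=0.2))
print(peak_symmetric, peak_asymmetric)
```

With purely even weights the input peaks where the bump already sits, which favors a static profile; the odd component moves the input peak to one side, and the sign of `odd_gain` sets the direction.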
A Comparison of Feature Combination Strategies for Saliency-Based Visual Attention Systems
- Journal of Electronic Imaging
, 1999
Cited by 81 (15 self)
Bottom-up or saliency-based visual attention allows primates to detect non-specific conspicuous targets in cluttered scenes. A classical metaphor, derived from electrophysiological and psychophysical studies, describes attention as a rapidly shiftable "spotlight". The model described here reproduces the attentional scanpaths of this spotlight: Simple multi-scale "feature maps" detect local spatial discontinuities in intensity, color, orientation or optical flow, and are combined into a unique "master" or "saliency" map. The saliency map is sequentially scanned, in order of decreasing saliency, by the focus of attention. We study the problem of combining feature maps, from different visual modalities and with unrelated dynamic ranges (such as color and motion), into a unique saliency map. Four combination strategies are compared using three databases of natural color images: (1) Simple normalized summation, (2) linear combination with learned weights, (3) global non-linear normalization...
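Strategy (1), simple normalized summation, can be sketched as below. The abstract does not give the normalization details, so the sketch assumes each feature map is rescaled linearly to the range [0, 1] before the maps are added. The function names and the toy `color_map` and `motion_map` values are hypothetical.

```python
def normalize(feature_map):
    """Rescale a 2-D map linearly to [0, 1] (assumed normalization)."""
    flat = [v for row in feature_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat map carries no spatial contrast; map it to zeros.
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - lo) / (hi - lo) for v in row] for row in feature_map]


def normalized_sum(feature_maps):
    """Combine same-sized feature maps by normalizing, then summing."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [
        [sum(m[i][j] for m in normed) for j in range(cols)]
        for i in range(rows)
    ]


# Two modalities with unrelated dynamic ranges (made-up values).
color_map = [[0.0, 0.2], [0.1, 0.4]]
motion_map = [[0.0, 50.0], [100.0, 25.0]]

saliency = normalized_sum([color_map, motion_map])
print(saliency)
```

Without the rescaling step the motion map would dominate the sum purely because of its larger numeric range; after rescaling, both modalities contribute on the same scale.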
Are cortical models really bound by the “Binding Problem”?
- Neuron
, 1999
Cited by 41 (16 self)
Address correspondence to T.P. The usual description of visual processing in cortex is an extension of the simple to complex hierarchy postulated by Hubel and Wiesel: a feedforward sequence of more and more complex and invariant features. The capability of this class of models to perform higher level visual processing such as viewpoint-invariant object recognition in cluttered scenes has been questioned in recent years by several researchers, who in turn proposed an alternative class of models based on the synchronization of large assemblies of cells, within and across cortical areas. The main implicit argument for this novel and controversial view was the assumption that hierarchical models cannot deal with the computational requirements of high level vision and suffer from the so-called “binding problem”. We review the present situation and discuss theoretical and experimental evidence showing that the perceived weaknesses of hierarchical models are not true. In particular, we show that recognition of multiple objects in cluttered scenes, arguably among the most difficult tasks in vision, can be done in a hierarchical feedforward model.
A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex
, 2005
View-Based Object Recognition Using Saliency Maps
, 1998
Cited by 38 (6 self)
We introduce a novel view-based object representation, called the saliency map graph (SMG), which captures the salient regions of an object view at multiple scales using a wavelet transform. This compact representation is highly invariant to translation, rotation (image and depth), and scaling, and offers the locality of representation required for occluded object recognition. To compare two saliency map graphs, we introduce two graph similarity algorithms. The first computes the topological similarity between two SMGs, providing a coarse-level matching of two graphs. The second computes the geometrical similarity between two SMGs, providing a fine-level matching of two graphs. We test and compare these two algorithms on a large database of model object views. Keywords: View-Based Object Recognition, Shape Representation and Recovery, Graph Matching. 1 Introduction. The view-based approach to 3-D object recognition represents an object as a collection of 2-D views, sometimes called...

