Results 1  10
of
821
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
 International Journal of Computer Vision
, 2001
"... In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a se ..."
Abstract

Cited by 1287 (81 self)
 Add to MetaCart
(Show Context)
In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected closed together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Sparse coding with an overcomplete basis set: a strategy employed by V1
 Vision Research
, 1997
"... The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and ban@ass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive f ..."
Abstract

Cited by 954 (12 self)
 Add to MetaCart
(Show Context)
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and ban@ass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcompletei.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are nonorthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the inputoutput function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of nonlinearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in
Feature detection with automatic scale selection
 International Journal of Computer Vision
, 1998
"... The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works ..."
Abstract

Cited by 713 (34 self)
 Add to MetaCart
(Show Context)
The fact that objects in the world appear in different ways depending on the scale of observation has important implications if one aims at describing them. It shows that the notion of scale is of utmost importance when processing unknown measurement data by automatic methods. In their seminal works, Witkin (1983) and Koenderink (1984) proposed to approach this problem by representing image structures at different scales in a socalled scalespace representation. Traditional scalespace theory building on this work, however, does not address the problem of how to select local appropriate scales for further analysis. This article proposes a systematic methodology for dealing with this problem. A framework is proposed for generating hypotheses about interesting scale levels in image data, based on a general principle stating that local extrema over scales of different combinations of γnormalized derivatives are likely candidates to correspond to interesting structures. Specifically, it is shown how this idea can be used as a major mechanism in algorithms for automatic scale selection, which
Distortion invariant object recognition in the dynamic link architecture
 IEEE TRANSACTIONS ON COMPUTERS
, 1993
"... We present an object recognition system based on the Dynamic Link Architecture, which is an extension to classical Artificial Neural Networks. The Dynamic Link Architecture exploits correlations in the finescale temporal structure of cellular signals in order to group neurons dynamically into hig ..."
Abstract

Cited by 636 (81 self)
 Add to MetaCart
We present an object recognition system based on the Dynamic Link Architecture, which is an extension to classical Artificial Neural Networks. The Dynamic Link Architecture exploits correlations in the finescale temporal structure of cellular signals in order to group neurons dynamically into higherorder entities. These entities represent a very rich structure and can code for high level objects. In order to demonstrate the capabilities of the Dynamic Link Architecture we implemented a program that can recognize human faces and other objects from video images. Memorized objects are represented by sparse graphs, whose vertices are labeled by a multiresolution description in terms of a local power spectrum, and whose edges are labeled by geometrical distance vectors. Object recognition can be formulated as elastic graph matching, which is performed here by stochastic optimization of a matching cost function. Our implementation on a transputer network successfully achieves recognition of human faces and office objects from gray level camera images. The performance of the program is evaluated by a statistical analysis of recognition results from a portrait gallery comprising images of 87 persons.
The "Independent Components" of Natural Scenes are Edge Filters
, 1997
"... It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attem ..."
Abstract

Cited by 620 (29 self)
 Add to MetaCart
It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that a new unsupervised learning algorithm based on information maximization, a nonlinear "infomax" network, when applied to an ensemble of natural scenes produces sets of visual filters that are localized and oriented. Some of these filters are Gaborlike and resemble those produced by the sparsenessmaximization network. In addition, the outputs of these filters are as independent as possible, since this infomax network performs Independent Components Analysis or ICA, for sparse (supergaussian) component distributions. We compare the resulting ICA filters and their associated basis functions, with other decorrelating filters produced by Principal Components Analysis (PCA) and zerophase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form a natural, informationtheoretic
Image denoising using a scale mixture of Gaussians in the wavelet domain
 IEEE TRANS IMAGE PROCESSING
, 2003
"... We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vecto ..."
Abstract

Cited by 514 (17 self)
 Add to MetaCart
(Show Context)
We describe a method for removing noise from digital images, based on a statistical model of the coefficients of an overcomplete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales are modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier. The latter modulates the local variance of the coefficients in the neighborhood, and is thus able to account for the empirically observed correlation between the coefficient amplitudes. Under this model, the Bayesian least squares estimate of each coefficient reduces to a weighted average of the local linear estimates over all possible values of the hidden multiplier variable. We demonstrate through simulations with images contaminated by additive white Gaussian noise that the performance of this method substantially surpasses that of previously published methods, both visually and in terms of mean squared error.
Nonnegative matrix factorization with sparseness constraints
 Jour. of
, 2004
"... www.cs.helsinki.fi/patrik.hoyer ..."
A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients
 INTERNATIONAL JOURNAL OF COMPUTER VISION
, 2000
"... We present a universal statistical model for texture images in the context of an overcomplete complex wavelet transform. The model is parameterized by a set of statistics computed on pairs of coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. We de ..."
Abstract

Cited by 409 (13 self)
 Add to MetaCart
(Show Context)
We present a universal statistical model for texture images in the context of an overcomplete complex wavelet transform. The model is parameterized by a set of statistics computed on pairs of coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. We develop an efficient algorithm for synthesizing random images subject to these constraints, by iteratively projecting onto the set of images satisfying each constraint, and we use this to test the perceptual validity of the model. In particular, we demonstrate the necessity of subgroups of the parameter set by showing examples of texture synthesis that fail when those parameters are removed from the set. We also demonstrate the power of our model by successfully synthesizing examples drawn from a diverse collection of artificial and natural textures.
Independent Component Filters Of Natural Images Compared With Simple Cells In Primary Visual Cortex
, 1998
"... this article we investigate to what extent the statistical properties of natural images can be used to understand the variation of receptive field properties of simple cells in the mammalian primary visual cortex. The receptive fields of simple cells have been studied extensively (e.g., Hubel & ..."
Abstract

Cited by 361 (0 self)
 Add to MetaCart
this article we investigate to what extent the statistical properties of natural images can be used to understand the variation of receptive field properties of simple cells in the mammalian primary visual cortex. The receptive fields of simple cells have been studied extensively (e.g., Hubel & Wiesel 1968, DeValois et al. 1982a, DeAngelis et al. 1993): they are localised in space and time, have bandpass characteristics in the spatial and temporal frequency domains, are oriented, and are often sensitive to the direction of motion of a stimulus. Here we will concentrate on the spatial properties of simple cells. Several hypotheses as to the function of these cells have been proposed. As the cells preferentially respond to oriented edges or lines, they can be viewed as edge or line detectors. Their joint localisation in both the spatial domain and the spatial frequency domain has led to the suggestion that they mimic Gabor filters, minimising uncertainty in both domains (Daugman 1980, Marcelja 1980). More recently, the match between the operations performed by simple cells and the wavelet transform has attracted attention (e.g., Field 1993). The approaches based on Gabor filters and wavelets basically consider processing by the visual cortex as a general image processing strategy, relatively independent of detailed assumptions about image statistics. On the other hand, the edge and line detector hypothesis is based on the intuitive notion that edges and lines are both abundant and important in images. This theme of relating simple cell properties with the statistics of natural images was explored extensively by Field (1987, 1994). He proposed that the cells are optimized specifically for coding natural images. He argued that one possibility for such a code, sparse coding...
Face recognition by independent component analysis
 IEEE Transactions on Neural Networks
, 2002
"... Abstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such ..."
Abstract

Cited by 333 (5 self)
 Add to MetaCart
(Show Context)
Abstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the highorder relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these highorder statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance. Index Terms—Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning. I.