Results 1  10
of
615
Sparse coding with an overcomplete basis set: a strategy employed by V1
 Vision Research
, 1997
"... The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and ban@ass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive f ..."
Abstract

Cited by 959 (9 self)
 Add to MetaCart
The spatial receptive fields of simple cells in mammalian striate cortex have been reasonably well described physiologically and can be characterized as being localized, oriented, and ban@ass, comparable with the basis functions of wavelet transforms. Previously, we have shown that these receptive field properties may be accounted for in terms of a strategy for producing a sparse distribution of output activity in response to natural images. Here, in addition to describing this work in a more expansive fashion, we examine the neurobiological implications of sparse coding. Of particular interest is the case when the code is overcompletei.e., when the number of code elements is greater than the effective dimensionality of the input space. Because the basis functions are nonorthogonal and not linearly independent of each other, sparsifying the code will recruit only those basis functions necessary for representing a given input, and so the inputoutput function will deviate from being purely linear. These deviations from linearity provide a potential explanation for the weak forms of nonlinearity observed in the response properties of cortical simple cells, and they further make predictions about the expected interactions among units in
The "Independent Components" of Natural Scenes are Edge Filters
, 1997
"... It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attem ..."
Abstract

Cited by 616 (29 self)
 Add to MetaCart
It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that a new unsupervised learning algorithm based on information maximization, a nonlinear "infomax" network, when applied to an ensemble of natural scenes produces sets of visual filters that are localized and oriented. Some of these filters are Gaborlike and resemble those produced by the sparsenessmaximization network. In addition, the outputs of these filters are as independent as possible, since this infomax network performs Independent Components Analysis or ICA, for sparse (supergaussian) component distributions. We compare the resulting ICA filters and their associated basis functions, with other decorrelating filters produced by Principal Components Analysis (PCA) and zerophase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form a natural, informationtheoretic
Blind Signal Separation: Statistical Principles
, 2003
"... Blind signal separation (BSS) and independent component analysis (ICA) are emerging techniques of array processing and data analysis, aiming at recovering unobserved signals or `sources' from observed mixtures (typically, the output of an array of sensors), exploiting only the assumption of mut ..."
Abstract

Cited by 530 (4 self)
 Add to MetaCart
Blind signal separation (BSS) and independent component analysis (ICA) are emerging techniques of array processing and data analysis, aiming at recovering unobserved signals or `sources' from observed mixtures (typically, the output of an array of sensors), exploiting only the assumption of mutual independence between the signals. The weakness of the assumptions makes it a powerful approach but requires to venture beyond familiar second order statistics. The objective of this paper is to review some of the approaches that have been recently developed to address this exciting problem, to show how they stem from basic principles and how they relate to each other.
Kernel independent component analysis
 Journal of Machine Learning Research
, 2002
"... We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical propert ..."
Abstract

Cited by 463 (24 self)
 Add to MetaCart
We present a class of algorithms for independent component analysis (ICA) which use contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On the one hand, we show that our contrast functions are related to mutual information and have desirable mathematical properties as measures of statistical dependence. On the other hand, building on recent developments in kernel methods, we show that these criteria can be computed efficiently. Minimizing these criteria leads to flexible and robust algorithms for ICA. We illustrate with simulations involving a wide variety of source distributions, showing that our algorithms outperform many of the presently known algorithms. 1.
Learning Overcomplete Representations
, 2000
"... In an overcomplete basis, the number of basis vectors is greater than the dimensionality of the input, and the representation of an input is not a unique combination of basis vectors. Overcomplete representations have been advocated because they have greater robustness in the presence of noise, can ..."
Abstract

Cited by 355 (10 self)
 Add to MetaCart
(Show Context)
In an overcomplete basis, the number of basis vectors is greater than the dimensionality of the input, and the representation of an input is not a unique combination of basis vectors. Overcomplete representations have been advocated because they have greater robustness in the presence of noise, can be sparser, and can have greater flexibility in matching structure in the data. Overcomplete codes have also been proposed as a model of some of the response properties of neurons in primary visual cortex. Previous work has focused on finding the best representation of a signal using a fixed overcomplete basis (or dictionary). We present an algorithm for learning an overcomplete basis by viewing it as probabilistic model of the observed data. We show that overcomplete bases can yield a better approximation of the underlying statistical distribution of the data and can thus lead to greater coding efficiency. This can be viewed as a generalization of the technique of independent component analysis and provides a method for Bayesian reconstruction of signals in the presence of noise and for blind source separation when there are more sources than mixtures.
A Unifying Review of Linear Gaussian Models
, 1999
"... Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observa ..."
Abstract

Cited by 350 (18 self)
 Add to MetaCart
(Show Context)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
Face recognition by independent component analysis
 IEEE Transactions on Neural Networks
, 2002
"... Abstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such ..."
Abstract

Cited by 346 (5 self)
 Add to MetaCart
(Show Context)
Abstract—A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the highorder relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these highorder statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance. Index Terms—Eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning. I.
Analysis of fMRI Data by Blind Separation Into Independent Spatial Components
 HUMAN BRAIN MAPPING 6:160–188(1998)
, 1998
"... Current analytical techniques applied to functional magnetic resonance imaging (fMRI) data require a priori knowledge or specific assumptions about the time courses of processes contributing to the measured signals. Here we describe a new method for analyzing fMRI data based on the independent comp ..."
Abstract

Cited by 316 (18 self)
 Add to MetaCart
Current analytical techniques applied to functional magnetic resonance imaging (fMRI) data require a priori knowledge or specific assumptions about the time courses of processes contributing to the measured signals. Here we describe a new method for analyzing fMRI data based on the independent component analysis (ICA) algorithm of Bell and Sejnowski ([1995]: Neural Comput 7:1129–1159). We decomposed eight fMRI data sets from 4 normal subjects performing Stroop colornaming, the Brown and Peterson word/number task, and control tasks into spatially independent components. Each component consisted of voxel values at fixed threedimensional locations (a component ‘‘map’’), and a unique associated time course of activation. Given data from 144 time points collected during a 6min trial, ICA extracted an equal number of spatially independent components. In all eight trials, ICA derived one and only one component with a time course closely matching the time course of 40sec alternations between experimental and control tasks. The regions of maximum activity in these consistently taskrelated components generally overlapped active regions detected by standard correlational analysis, but included frontal regions not detected by correlation. Time courses of other ICA components were transiently taskrelated, quasiperiodic, or slowly varying. By utilizing higherorder statistics to enforce successively stricter criteria for spatial independence between component maps, both the ICA algorithm and a related
Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources
, 1999
"... An extension of the infomax algorithm of Bell and Sejnowski (1995) is presented that is able blindly to separate mixed signals with sub and supergaussian source distributions. This was achieved by using a simple type of learning rule first derived by Girolami (1997) by choosing negentropy as a proj ..."
Abstract

Cited by 315 (22 self)
 Add to MetaCart
An extension of the infomax algorithm of Bell and Sejnowski (1995) is presented that is able blindly to separate mixed signals with sub and supergaussian source distributions. This was achieved by using a simple type of learning rule first derived by Girolami (1997) by choosing negentropy as a projection pursuit index. Parameterized probability distributions that have sub and supergaussian regimes were used to derive a general learning rule that preserves the simple architecture proposed by Bell and Sejnowski (1995), is optimized using the natural gradient by Amari (1998), and uses the stability analysis of Cardoso and Laheld (1996) to switch between sub and supergaussian regimes. We demonstrate that the extended infomax algorithm is able to separate 20 sources with a variety of source distributions easily. Applied to highdimensional data from electroencephalographic recordings, it is effective at separating artifacts such as eye blinks and line noise from weaker electrical signals that arise from sources in the brain.