Results 1–10 of 110
Efficient learning of sparse representations with an energy-based model
Advances in Neural Information Processing Systems (NIPS 2006), 2006
Abstract

Cited by 219 (15 self)
We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying nonlinearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors” when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps.
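The two-phase learning loop this abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, learning rates, and the soft-threshold sparsifier are assumptions (the paper uses a logistic-style sparsifying nonlinearity and trains on image patches).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 16-dim patches, a 2x overcomplete code.
n_in, n_code = 16, 32
W_enc = rng.normal(scale=0.1, size=(n_code, n_in))   # linear encoder
W_dec = rng.normal(scale=0.1, size=(n_in, n_code))   # linear decoder

def sparsify(z):
    # Sparsifying nonlinearity: a soft threshold standing in for the paper's
    # quasi-binary squashing (an assumption made for simplicity).
    return np.maximum(z - 0.5, 0.0)

def energy(x, z):
    # Energy = reconstruction error + distance of the code from the encoder output.
    return (np.sum((x - W_dec @ sparsify(z)) ** 2)
            + np.sum((z - W_enc @ x) ** 2))

def infer_code(x, lr=0.05, steps=100):
    # Phase 1: find a minimum-energy code by gradient descent on z,
    # starting from the encoder's prediction and keeping the best iterate.
    z = W_enc @ x
    best_z, best_e = z.copy(), energy(x, z)
    for _ in range(steps):
        s = sparsify(z)
        mask = (z > 0.5).astype(float)          # subgradient of the threshold
        grad = 2 * mask * (W_dec.T @ (W_dec @ s - x)) + 2 * (z - W_enc @ x)
        z -= lr * grad
        e = energy(x, z)
        if e < best_e:
            best_z, best_e = z.copy(), e
    return best_z

def train_step(x, lr=0.01):
    # Phase 2: adjust encoder and decoder so the energy at the inferred code falls.
    global W_enc, W_dec
    z = infer_code(x)
    s = sparsify(z)
    W_dec = W_dec - lr * 2 * np.outer(W_dec @ s - x, s)
    W_enc = W_enc - lr * 2 * np.outer(W_enc @ x - z, x)
    return energy(x, z)
```

Iterating `train_step` over many patches alternates code inference and parameter updates, which is the EM-like scheme the abstract summarizes.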
Learning Invariant Features through Topographic Filter Maps
Abstract

Cited by 119 (20 self)
Several recently proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally-invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a nonlinear transform, such as a pointwise squashing function, quantization, or normalization; (3) a spatial pooling operation which combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs together. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally-invariant outputs. The learned feature descriptors give results comparable to SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.
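The three-module first stage (filter bank, pointwise nonlinearity, spatial pooling) can be sketched generically. Everything here is illustrative: random filters stand in for learned oriented edge detectors, tanh for the squashing function, and non-overlapping block averaging for the pooling operation.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Naive 'valid' 2-D correlation of a small kernel with an image.
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def extract_features(img, filters, pool=2):
    # Module 1: filter bank (random stand-ins for oriented edge detectors).
    maps = [conv2d_valid(img, f) for f in filters]
    # Module 2: pointwise squashing nonlinearity.
    maps = [np.tanh(m) for m in maps]
    # Module 3: spatial pooling over non-overlapping pool x pool blocks.
    pooled = []
    for m in maps:
        H, W = m.shape
        H, W = H - H % pool, W - W % pool
        blocks = m[:H, :W].reshape(H // pool, pool, W // pool, pool)
        pooled.append(blocks.mean(axis=(1, 3)))
    return np.stack(pooled)
```

The paper's contribution is learning the filters and the pooling groups jointly and unsupervised; this sketch only fixes the pipeline those learned components plug into.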
Neural correlations, population coding and computation.
Nat. Rev. Neurosci., 2006
Abstract

Cited by 95 (4 self)
... of one neuron around its average are not correlated with the fluctuations of other neurons, population coding is relatively well understood. Specifically, we know which factors control the amount of information a population code contains. For these reasons, it is essential that we gain a thorough understanding of both the correlational structure in the brain and its impact on population coding. Progress has been made on both fronts by adopting two complementary perspectives ...
A theory of object recognition: computations and circuits in the feedforward path of the ventral stream in primate visual cortex
, 2005
Energy-based models for sparse overcomplete representations
Journal of Machine Learning Research, 2003
Abstract

Cited by 69 (15 self)
We present a new way of extending independent components analysis (ICA) to overcomplete representations. In contrast to the causal generative extensions of ICA, which maintain marginal independence of sources, we define features as deterministic (linear) functions of the inputs. This assumption results in marginal dependencies among the features, but conditional independence of the features given the inputs. By assigning energies to the features, a probability distribution over the input states is defined through the Boltzmann distribution. Free parameters of this model are trained using the contrastive divergence objective (Hinton, 2002). When the number of features is equal to the number of input dimensions, this energy-based model reduces to noiseless ICA, and we show experimentally that the proposed learning algorithm is able to perform blind source separation on speech data. In additional experiments we train overcomplete energy-based models to extract features from various standard datasets containing speech, natural images, handwritten digits and faces.
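A rough sketch of the energy-based view: features are deterministic linear functions of the input, each contributes an energy, and the Boltzmann distribution p(x) proportional to exp(-E(x)) is trained by contrastive divergence. The log-cosh energy and the single Langevin step used to draw negative samples are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(X, W):
    # Each row of X is an input; features are the deterministic linear maps
    # X @ W.T, and each feature adds a heavy-tailed log-cosh energy (assumed form).
    return np.log(np.cosh(X @ W.T)).sum(axis=1)

def cd_step(X, W, lr=0.01, step=0.1):
    # One contrastive-divergence update: draw crude negative samples with a
    # single Langevin step, then lower data energy and raise sample energy.
    def dE_dx(Xs):
        return np.tanh(Xs @ W.T) @ W              # gradient of E w.r.t. inputs
    X_neg = (X - 0.5 * step * dE_dx(X)
             + rng.normal(scale=np.sqrt(step), size=X.shape))
    def dE_dW(Xs):
        return np.tanh(Xs @ W.T).T @ Xs / len(Xs) # batch-averaged gradient in W
    return W - lr * (dE_dW(X) - dE_dW(X_neg))
```

With as many features as input dimensions this family reduces to noiseless ICA, as the abstract notes; overcompleteness just means W has more rows than columns.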
Synergy, Redundancy, and Independence in Population Codes
The Journal of Neuroscience, 2003
Abstract

Cited by 59 (0 self)
A key issue in understanding the neural code for an ensemble of neurons is the nature and strength of correlations between neurons and how these correlations are related to the stimulus. The issue is complicated by the fact that there is not a single notion of independence or lack of correlation. We distinguish three kinds: (1) activity independence; (2) conditional independence; and (3) information independence. Each notion is related to an information measure: the information between cells, the information between cells given the stimulus, and the synergy of cells about the stimulus, respectively. We show that these measures form an interrelated framework for evaluating contributions of signal and noise correlations to the joint information conveyed about the stimulus and that at least two of the three measures must be calculated to characterize a population code. This framework is compared with others recently proposed in the literature. In addition, we distinguish questions about how information is encoded by a population of neurons from how that information can be decoded. Although information theory is natural and powerful for questions of encoding, it is not sufficient for characterizing the process of decoding. Decoding fundamentally requires an error measure that quantifies the importance of the deviations of estimated stimuli from actual stimuli. Because there is no a priori choice of error measure, questions about decoding cannot be put on the same level of generality as for encoding.
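The three notions of independence in this abstract map onto three computable quantities: I(R1;R2), I(R1;R2|S), and the synergy I(R1,R2;S) - I(R1;S) - I(R2;S). A small sketch for two discrete responses and a discrete stimulus (function and variable names are mine, not the paper's):

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits of a (possibly multi-dimensional) distribution.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(p_ab):
    # I(A;B) from a 2-D joint table p_ab[a, b].
    return entropy(p_ab.sum(axis=1)) + entropy(p_ab.sum(axis=0)) - entropy(p_ab)

def info_measures(p_srr):
    # p_srr[s, r1, r2]: joint distribution of stimulus S and responses R1, R2.
    p_s = p_srr.sum(axis=(1, 2))
    # (1) activity independence: I(R1;R2), ignoring the stimulus.
    i_activity = mutual_info(p_srr.sum(axis=0))
    # (2) conditional independence: I(R1;R2|S), the noise correlations.
    i_cond = sum(p_s[s] * mutual_info(p_srr[s] / p_s[s])
                 for s in range(len(p_s)) if p_s[s] > 0)
    # (3) information independence: synergy of the cells about the stimulus.
    n_s = p_srr.shape[0]
    synergy = (mutual_info(p_srr.reshape(n_s, -1))
               - mutual_info(p_srr.sum(axis=2))
               - mutual_info(p_srr.sum(axis=1)))
    return i_activity, i_cond, synergy
```

The XOR stimulus (S = R1 xor R2, responses independent fair coins) is a handy sanity check: the cells carry no information alone but one bit jointly, so the synergy is positive even though activity correlations vanish.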
Topographic product models applied to natural scene statistics
Neural Computation, 2005
Abstract

Cited by 58 (7 self)
We present an energy-based model that uses a product of generalised Student-t distributions to capture the statistical structure in datasets. This model is inspired by and particularly applicable to “natural” datasets such as images. We begin by providing the mathematical framework, where we discuss complete and overcomplete models, and provide algorithms for training these models from data. Using patches of natural scenes, we demonstrate that our approach represents a viable alternative to “independent components analysis” as an interpretive model of biological visual systems. Although the two approaches are similar in flavor, there are also important differences, particularly when the representations are overcomplete. By constraining the interactions within our model, we are also able to study the topographic organization of Gabor-like receptive fields that are learned by our model. Finally, we discuss the relation of our new approach to previous work, in particular Gaussian Scale Mixture models and variants of independent components analysis.
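The energy of a product of generalised Student-t experts, and a topographic variant in which squared filter outputs are pooled locally before the heavy-tailed energy is applied, can be sketched as follows. The specific log1p form and the pooling scheme are assumptions matching the usual product-of-Student-t parameterisation, not necessarily the paper's exact equations.

```python
import numpy as np

def pot_energy(x, W, alpha):
    # Product of generalised Student-t experts: each filter output y_i = w_i . x
    # contributes alpha_i * log(1 + y_i^2 / 2) to the energy.
    y = W @ x
    return float(np.sum(alpha * np.log1p(0.5 * y ** 2)))

def topographic_pot_energy(x, W, pools, alpha):
    # Topographic variant: squared filter outputs are summed within local pools
    # before the Student-t energy, so filters in a pool share an energy term
    # and tend to learn related (neighbouring) features.
    y2 = (W @ x) ** 2
    pooled = np.array([y2[list(p)].sum() for p in pools])
    return float(np.sum(alpha * np.log1p(0.5 * pooled)))
```

With singleton pools the topographic energy collapses back to the plain product of experts, which is the sense in which the topographic model generalises the flat one.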
Bubbles: A Unifying Framework for Low-Level Statistical Properties of Natural Image Sequences
, 2003
Abstract

Cited by 50 (7 self)
This paper proposes a unifying framework for several models of the statistical structure of natural image sequences. The framework combines three properties: sparseness, temporal coherence, and energy correlations; these will be reviewed below. It leads to models where the joint activation of the linear filters (simple cells) takes the form of "bubbles," which are regions of activity that are localized both in time and in space, space meaning the cortical surface or a grid on which the filters are arranged. The paper is organized as follows. First, we discuss the principal statistical properties of natural images investigated so far, and we examine how these can be used in the estimation of a linear image model (Section 2). Then we show how sparseness and temporal coherence can be combined in a single model, which is based on the concept of temporal bubbles, and attempt to demonstrate that this gives a better model of the outputs of Gabor-like linear filters than either of the criteria alone (Section 3). We extend the model to include topography as well, leading to the intuitive notion of spatiotemporal bubbles (Section 4). We also discuss the extensions of the framework to spatiotemporal receptive fields (Section 5). Finally, we discuss the utility of our model and its relation to other models (Section 6).
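A toy illustration of the bubble idea: filter outputs are noise whose variance is modulated by envelopes localized jointly in time and in position on the filter grid, which yields sparseness, temporal coherence, and energy correlations at once. All parameters here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def bubble_activations(n_filters=16, n_steps=200, n_bubbles=5, width=3.0):
    # "Bubbles": Gaussian noise whose variance is modulated by envelopes that
    # are localized jointly in time and on the filter grid, giving sparse,
    # temporally coherent, energy-correlated filter outputs.
    f = np.arange(n_filters)[:, None]
    t = np.arange(n_steps)[None, :]
    envelope = np.zeros((n_filters, n_steps))
    for _ in range(n_bubbles):
        f0 = rng.uniform(0, n_filters)       # bubble centre on the filter grid
        t0 = rng.uniform(0, n_steps)         # bubble centre in time
        envelope += np.exp(-((f - f0) ** 2 + ((t - t0) / 4.0) ** 2)
                           / (2 * width ** 2))
    return rng.normal(size=(n_filters, n_steps)) * envelope
```

Because most of the envelope is near zero, the marginal distribution of each output is heavy-tailed (sparse), while neighbouring filters and time steps share the same envelope and hence have correlated energies.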
The co-information lattice
in Proc. 4th Int. Symp. Independent Component Analysis and Blind Source Separation, 2003
Abstract

Cited by 49 (0 self)
In 1955, McGill published a multivariate generalisation of Shannon’s mutual information. Algorithms such as Independent Component Analysis use a different generalisation, the redundancy, or multi-information [13]. McGill’s concept expresses the information shared by all of K random variables, while the multi-information expresses the information shared by any two or more of them. Partly to avoid confusion with the multi-information, I call his concept here the co-information. Co-informations, oddly, can be negative. They form a partially ordered set, or lattice, as do the entropies. Entropies and co-informations are simply and symmetrically related by Möbius inversion [12]. The co-information lattice sheds light on the problem of approximating ...
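For concreteness, the co-information of K discrete variables can be computed directly from the alternating-sign sum of subset entropies given by Möbius inversion, and the three-variable XOR case shows that it can indeed be negative. A sketch; the helper names are mine.

```python
import numpy as np
from itertools import combinations

def entropy(p):
    # Shannon entropy in bits of a (possibly multi-dimensional) distribution.
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def co_information(p_joint):
    # McGill's co-information via Mobius inversion over the entropy lattice:
    # I(X1;...;XK) = sum over nonempty subsets T of (-1)^(|T|+1) * H(T).
    # For K = 2 this is the ordinary mutual information.
    K = p_joint.ndim
    total = 0.0
    for k in range(1, K + 1):
        for subset in combinations(range(K), k):
            marg_axes = tuple(a for a in range(K) if a not in subset)
            total += (-1) ** (k + 1) * entropy(p_joint.sum(axis=marg_axes))
    return total
```

With X and Y independent fair coins and Z = X xor Y, every pair is independent yet the triple is fully determined, and the co-information comes out to -1 bit.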