Results 1–10 of 75
Representation learning: A review and new perspectives.
 In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
, 2013
Abstract

Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines
, 2010
Abstract

Cited by 75 (2 self)
Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods in which the hidden units determine the covariance matrix of a zero-mean Gaussian distribution. In this work, we propose a probabilistic model that combines these two approaches into a single framework. We represent each image using one set of binary latent features that models the image-specific covariance and a separate set that models the mean. We show that this approach provides a probabilistic framework for the widely used simple-cell/complex-cell architecture, that it produces very realistic samples of natural images, and that it extracts features that yield state-of-the-art recognition accuracy on the challenging CIFAR-10 dataset.
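The idea of combining mean and covariance modeling can be sketched with a toy energy function in which one set of binary hiddens shifts pixel means linearly (as in a Gaussian RBM) while a second set gates squared filter responses. This is a minimal illustration, not the paper's actual factorized model; the dimensions and the quadratic containment term are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Hm, Hc = 16, 8, 8              # pixels, mean units, covariance units
W = rng.normal(0, 0.1, (D, Hm))   # mean-unit weights
C = rng.normal(0, 0.1, (D, Hc))   # covariance-unit filters

def energy(v, hm, hc):
    """Energy with two sets of binary hiddens: hm sets pixel means,
    hc gates squared filter responses (pixel covariances)."""
    mean_term = -v @ W @ hm                  # linear, as in a Gaussian RBM
    cov_term = -0.5 * ((v @ C) ** 2) @ hc    # hc modulates (C_k^T v)^2
    return mean_term + cov_term + 0.5 * v @ v  # keeps the energy bounded below

v = rng.normal(size=D)
E = energy(v, np.ones(Hm), np.ones(Hc))
```

Turning a covariance unit off removes its quadratic term from the energy, which is what lets the model express image-specific covariances.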
Convolutional Learning of Spatiotemporal Features
Abstract

Cited by 74 (4 self)
We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent “flow fields” which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.
Learning to combine foveal glimpses with a third-order Boltzmann machine
Abstract

Cited by 34 (3 self)
We describe a model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations. The model uses a retina that only has enough high-resolution pixels to cover a small area of the image, so it must decide on a sequence of fixations, and it must combine the “glimpse” at each fixation with the location of the fixation before integrating the information with information from other glimpses of the same object. We evaluate this model on a synthetic dataset and two image classification datasets, showing that it can perform at least as well as a model trained on whole images.
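A third-order connection means the weight between a retinal pixel and a shape feature is itself modulated by a location unit. Such three-way weight tensors are commonly factorized to keep the parameter count manageable; the sketch below shows that gating with an assumed factorization `w[i,j,k] = sum_f U[i,f] V[j,f] P[k,f]` and toy dimensions, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(5)
R, L, S, F = 12, 4, 8, 5   # retina pixels, location units, shape units, factors

# Factored three-way tensor: w[i,j,k] = sum_f U[i,f] V[j,f] P[k,f]
U = rng.normal(0, 0.1, (R, F))
V = rng.normal(0, 0.1, (L, F))
P = rng.normal(0, 0.1, (S, F))

def shape_input(glimpse, location):
    """Input to each shape unit: the fixation location gates how the
    glimpse drives the shape features (a third-order interaction)."""
    return ((glimpse @ U) * (location @ V)) @ P.T

g = rng.normal(size=R)
one_hot_loc = np.eye(L)[2]          # fixate at location index 2
s_in = shape_input(g, one_hot_loc)  # shape (S,)
```

The same glimpse drives the shape units differently depending on where the fixation landed, which is what allows information from several glimpses to be integrated in a common frame.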
Conditional Restricted Boltzmann Machines for Structured Output Prediction
Abstract

Cited by 22 (2 self)
Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic models that have recently been applied to a wide range of problems, including collaborative filtering, classification, and modeling motion capture data. While much progress has been made in training non-conditional RBMs, these algorithms are not applicable to conditional models and there has been almost no work on training and generating predictions from conditional RBMs for structured output problems. We first argue that standard Contrastive Divergence-based learning may not be suitable for training CRBMs. We then identify two distinct types of structured output prediction problems and propose an improved learning algorithm for each. The first problem type is one where the output space has arbitrary structure but the set of likely output configurations is relatively small, such as in multi-label classification. The second problem is one where the output space is arbitrarily structured but where the output space variability is much greater, such as in image denoising or pixel labeling. We show that the new learning algorithms can work much better than Contrastive Divergence on both types of problems.
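When the set of likely output configurations is small, a conditional RBM can score each candidate output by its free energy given the input and pick the lowest. The sketch below is a generic hedged illustration of that scoring step (random weights, toy dimensions, and an assumed input-to-output bias matrix), not the training algorithms the paper proposes.

```python
import numpy as np

rng = np.random.default_rng(1)
Dx, Dy, H = 5, 4, 6
Wxh = rng.normal(0, 0.1, (Dx, H))   # input -> hidden (conditioning)
Wyh = rng.normal(0, 0.1, (Dy, H))   # output -> hidden (undirected RBM part)
Wxy = rng.normal(0, 0.1, (Dx, Dy))  # input -> output (dynamic bias)
b = np.zeros(Dy)

def free_energy(y, x):
    """F(y | x): lower means more probable under the conditional model."""
    dyn_bias = b + x @ Wxy                 # output bias shifted by the input
    hid_in = x @ Wxh + y @ Wyh             # total input to each hidden unit
    return -y @ dyn_bias - np.sum(np.logaddexp(0, hid_in))  # softplus over hiddens

x = rng.normal(size=Dx)
candidates = [np.array(bits, float) for bits in
              [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 1, 0), (1, 1, 1, 1)]]
best = min(candidates, key=lambda y: free_energy(y, x))
```

Enumerating candidates like this is only feasible in the small-output-set regime the paper identifies as its first problem type.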
Transforming Autoencoders
Abstract

Cited by 22 (2 self)
The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that produce a whole vector of outputs including an explicit representation of the pose of the feature. We show how neural networks can be used to learn features that output a whole vector of instantiation parameters and we argue that this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural networks community. It is also more promising than the hand-engineered features currently used in computer vision because it provides an efficient way of adapting the features to the domain.
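The core idea is a "capsule" whose recognition units output instantiation parameters (here a presence probability plus an x, y pose) and whose generation units reconstruct from the pose after a known transformation is applied to it. The following forward pass is a loose, untrained sketch with assumed layer sizes and parameter names, meant only to show where the explicit pose vector sits in the architecture.

```python
import numpy as np

rng = np.random.default_rng(6)
D, H = 9, 16   # input pixels, recognition units

Wr = rng.normal(0, 0.1, (D, H))   # recognition weights
Wp = rng.normal(0, 0.1, (H, 3))   # hidden -> (presence p, pose x, pose y)
Wg = rng.normal(0, 0.1, (2, D))   # (x, y) pose -> output pixels (generation)

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def capsule_forward(image, dx, dy):
    """One capsule: extract a pose, apply the externally specified shift
    (dx, dy) to it, and generate an output scaled by the presence p."""
    h = np.tanh(image @ Wr)
    p = sigmoid(h @ Wp[:, 0])                 # probability the entity is present
    x, y = h @ Wp[:, 1], h @ Wp[:, 2]         # explicit instantiation parameters
    pose_out = np.array([x + dx, y + dy])     # the known shift acts on the pose
    return p * (pose_out @ Wg)

img = rng.normal(size=D)
out = capsule_forward(img, dx=1.0, dy=0.0)
```

Because the shift is applied to the pose vector rather than to pixels, the output depends linearly on (dx, dy), which is what makes the pose an explicit, manipulable representation.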
LEARNING A BETTER REPRESENTATION OF SPEECH SOUND WAVES USING RESTRICTED BOLTZMANN MACHINES
Abstract

Cited by 21 (5 self)
State-of-the-art speech recognition systems rely on preprocessed speech features such as Mel cepstrum or linear predictive coding coefficients that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks, and also be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state-of-the-art for methods based on Mel cepstrum coefficients.
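As background for modeling raw real-valued frames with an RBM, here is a minimal CD-1 training loop for a plain Gaussian-visible, binary-hidden RBM on synthetic waveform-like frames. This is standard textbook machinery (unit visible variance assumed), not the paper's novel hidden-variable type; sizes, learning rate, and the sinusoid data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, lr = 64, 32, 1e-3          # frame length, hidden units, step size
W = rng.normal(0, 0.01, (D, H))
a, b = np.zeros(D), np.zeros(H)  # visible / hidden biases

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 update for a Gaussian-visible, binary-hidden RBM."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)                    # hidden probs, positive phase
    h0 = (rng.random(H) < ph0).astype(float)     # sample hidden states
    v1 = a + h0 @ W.T                            # Gaussian mean reconstruction
    ph1 = sigmoid(v1 @ W + b)                    # negative phase
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return float(np.mean((v0 - v1) ** 2))

# stand-in for normalized waveform frames: sinusoids with random phase
phases = rng.uniform(0, 2 * np.pi, (200, 1))
frames = np.sin(2 * np.pi * np.arange(D) / D + phases)
errs = [np.mean([cd1_step(f) for f in frames]) for _ in range(5)]
```

The reconstruction error is only a training diagnostic; the learned features, not the reconstructions, would feed a recognizer.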
Gradient-based learning of higher-order image features
 In Proceedings of the International Conference on Computer Vision
, 2011
Abstract

Cited by 20 (10 self)
Recent work on unsupervised feature learning has shown that learning on polynomial expansions of input patches, such as on pairwise products of pixel intensities, can improve the performance of feature learners and extend their applicability to spatiotemporal problems, such as human action recognition or learning of image transformations. Learning of such higher-order features, however, has been much more difficult than standard dictionary learning, because of the high dimensionality and because standard learning criteria are not applicable. Here, we show how one can cast the problem of learning higher-order features as the problem of learning a parametric family of manifolds. This allows us to apply a variant of a denoising autoencoder network to learn higher-order features using simple gradient-based optimization. Our experiments show that the approach can outperform existing higher-order models, while training and inference are exact, fast, and simple.
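The building block here is the denoising autoencoder: corrupt the input, encode it, and train the decoder to reconstruct the clean input. The sketch below is the plain (first-order) version with tied weights and masking noise on toy template data, as a baseline for the paper's higher-order variant; all sizes and the data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
D, H, lr, noise = 8, 4, 0.1, 0.3
W = rng.normal(0, 0.1, (D, H))   # tied weights: encoder W, decoder W.T
bh, bv = np.zeros(H), np.zeros(D)

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def dae_step(x):
    """One gradient step on the denoising objective 0.5 * ||x - g(f(corrupt(x)))||^2."""
    global W, bh, bv
    x_tilde = x * (rng.random(D) > noise)    # masking corruption
    h = sigmoid(x_tilde @ W + bh)            # encoder
    r = h @ W.T + bv                         # linear decoder
    err = r - x                              # gradient of the loss w.r.t. r
    dh = (err @ W) * h * (1 - h)             # backprop through the encoder
    W -= lr * (np.outer(x_tilde, dh) + np.outer(err, h))  # tied-weight gradient
    bv -= lr * err
    bh -= lr * dh
    return float(np.mean(err ** 2))

# toy data: each example is one of two templates plus noise
templates = np.array([[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]], float)
data = templates[rng.integers(0, 2, 500)] + rng.normal(0, 0.1, (500, D))
losses = [np.mean([dae_step(x) for x in data]) for _ in range(10)]
```

The paper's contribution is to apply this kind of training not to raw pixels but to a parametric family of manifolds built from products of inputs; the optimization machinery stays this simple.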
On Autoencoders and Score Matching for Energy Based Models
Abstract

Cited by 17 (2 self)
We consider estimation methods for the class of continuous-data energy-based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.
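The link between denoising-style reconstruction and the score (the gradient of the log density) can be seen in closed form in the Gaussian case via Tweedie's formula: the optimal denoiser equals the noisy input plus the noise variance times the score of the noisy marginal. This is a toy one-dimensional check of that identity, not the paper's mPoT experiments; the means and variances are arbitrary.

```python
import numpy as np

# Tweedie's formula: E[x | x_noisy] = x_noisy + sigma_n^2 * d/dx log p(x_noisy),
# where p is the marginal of the corrupted data.
mu, s_d2, s_n2 = 2.0, 1.5, 0.4     # data mean/variance, noise variance
s = s_d2 + s_n2                    # variance of the corrupted marginal

def optimal_denoiser(x):
    """Posterior mean of the clean value for Gaussian data plus Gaussian noise."""
    return (s_d2 * x + s_n2 * mu) / s

def score_of_noisy_marginal(x):
    """d/dx log N(x; mu, s)."""
    return -(x - mu) / s

xs = np.linspace(-3, 7, 101)
lhs = optimal_denoiser(xs)
rhs = xs + s_n2 * score_of_noisy_marginal(xs)   # identical up to rounding
```

In higher dimensions the same relationship is what ties a score matching estimator to a reconstruction-with-regularization objective.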
Two Distributed-State Models For Generating High-Dimensional Time Series
, 2011
Abstract

Cited by 15 (1 self)
In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. This “conditional” RBM (CRBM) makes online inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing online filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at
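The structural trick in the CRBM is that the directed connections from past frames only shift the biases of the current RBM, so per-step inference stays as cheap as in a static RBM. A rough generation-loop sketch with random (untrained) weights and assumed sizes, using mean-field Gaussian visibles rather than sampling them:

```python
import numpy as np

rng = np.random.default_rng(4)
D, H, order = 10, 6, 3                   # visibles, hiddens, autoregressive order
W = rng.normal(0, 0.1, (D, H))           # undirected visible-hidden weights
A = rng.normal(0, 0.1, (order * D, D))   # past -> visible directed weights
B = rng.normal(0, 0.1, (order * D, H))   # past -> hidden directed weights
a, b = np.zeros(D), np.zeros(H)

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def crbm_sample_step(history):
    """Sample v_t given the last `order` frames: the history only shifts
    the biases, leaving a plain RBM over the current frame."""
    past = history.reshape(-1)           # concatenate the last few frames
    a_dyn = a + past @ A                 # dynamic visible bias
    b_dyn = b + past @ B                 # dynamic hidden bias
    h = (rng.random(H) < sigmoid(a_dyn @ W + b_dyn)).astype(float)
    return a_dyn + h @ W.T               # mean-field Gaussian visibles

history = rng.normal(size=(order, D))
frames = []
for _ in range(20):                      # synthesize a short sequence
    v = crbm_sample_step(history)
    history = np.vstack([history[1:], v])
    frames.append(v)
```

The factored three-way extension described above would replace these fixed weights with weights modulated by the dynamic state, at the cost of one extra factor matrix per interaction.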