Learning to Represent Spatial Transformations with Factored Higher-Order Boltzmann Machines

by R. Memisevic, G. Hinton
Venue: Neural Computation
Results 1 - 10 of 75

Representation learning: A review and new perspectives.

by Yoshua Bengio, Aaron Courville, Pascal Vincent - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013
Abstract - Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

Citation Context

...he RBM does not extend to its partition function, which still involves summing an exponential number of terms. It does imply, however, that we can limit the number of terms to min{2^{d_x}, 2^{d_h}}. Usually, this is still an unmanageable number of terms, and therefore we must resort to approximate methods for its estimation. It is difficult to overstate the impact the RBM has had on the fields of unsupervised feature learning and deep learning. It has been used in a truly impressive variety of applications, including fMRI image classification [180], motion and spatial transformations [201], [144], collaborative filtering [178], and natural image modeling [160], [53]. 6.3 Generalizations of the RBM to Real-Valued Data Important progress has been made in the last few years in defining generalizations of the RBM that better capture real-valued data, in particular real-valued image data, by better modeling the conditional covariance of the input pixels. The standard RBM, as discussed above, is defined with both binary visible variables v ∈ {0,1} and binary latent variables h ∈ {0,1}. The tractability of inference and learning in the RBM has inspired many authors to extend it, via modific...
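The tractability bound mentioned in this passage (summing over the smaller of the two layers while marginalizing the other in closed form) can be sketched in a few lines. The function name and shapes below are illustrative, not from the survey:

```python
import numpy as np
from itertools import product

def rbm_log_partition(W, b, c):
    """Log partition function of a binary RBM with energy
    E(v, h) = -v @ W @ h - b @ v - c @ h.
    Enumerates only the smaller layer (min{2^dv, 2^dh} terms);
    the other layer is summed out in closed form."""
    dv, dh = W.shape
    if dh > dv:                        # enumerate the smaller side
        W, b, c = W.T, c, b
        dv, dh = dh, dv
    terms = []
    for bits in product([0, 1], repeat=dh):
        h = np.asarray(bits, dtype=float)
        # log sum_v exp(v @ (W @ h + b)) factorizes over visible units,
        # giving a sum of log(1 + exp(.)) terms
        terms.append(c @ h + np.sum(np.logaddexp(0.0, W @ h + b)))
    return np.logaddexp.reduce(terms)
```

For the tiny sizes where this is feasible, the result can be checked against a naive double enumeration over both layers; at realistic sizes, approximate estimators such as annealed importance sampling are required, as the passage notes.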

Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines

by Marc'Aurelio Ranzato, Geoffrey E. Hinton, 2010
Abstract - Cited by 75 (2 self)
Learning a generative model of natural images is a useful way of extracting features that capture interesting regularities. Previous work on learning such models has focused on methods in which the latent features are used to determine the mean and variance of each pixel independently, or on methods in which the hidden units determine the covariance matrix of a zero-mean Gaussian distribution. In this work, we propose a probabilistic model that combines these two approaches into a single framework. We represent each image using one set of binary latent features that model the image-specific covariance and a separate set that models the mean. We show that this approach provides a probabilistic framework for the widely used simple-cell/complex-cell architecture, that it produces very realistic samples of natural images, and that it extracts features that yield state-of-the-art recognition accuracy on the challenging CIFAR-10 dataset.

Convolutional Learning of Spatio-temporal Features

by Graham W. Taylor, Rob Fergus, Yann LeCun, Christoph Bregler
Abstract - Cited by 74 (4 self)
Abstract. We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent “flow fields” which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

Learning to combine foveal glimpses with a third-order Boltzmann machine

by Hugo Larochelle, Geoffrey Hinton
Abstract - Cited by 34 (3 self)
We describe a model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations. The model uses a retina that only has enough high-resolution pixels to cover a small area of the image, so it must decide on a sequence of fixations and it must combine the “glimpse” at each fixation with the location of the fixation before integrating the information with information from other glimpses of the same object. We evaluate this model on a synthetic dataset and two image classification datasets, showing that it can perform at least as well as a model trained on whole images.

Citation Context

...”. Learning modules that incorporate multiplicative interactions have recently been developed [3, 4]. These can be viewed as energy-based models with three-way interactions. In this work, we build on [5, 6], who introduced a method of keeping the number of parameters under control when incorporating such high-order interactions in a restricted Boltzmann machine. We start by describing the standard RBM mo...

Conditional Restricted Boltzmann Machines for Structured Output Prediction

by Volodymyr Mnih, Hugo Larochelle, Geoffrey E. Hinton
Abstract - Cited by 22 (2 self)
Conditional Restricted Boltzmann Machines (CRBMs) are rich probabilistic models that have recently been applied to a wide range of problems, including collaborative filtering, classification, and modeling motion capture data. While much progress has been made in training non-conditional RBMs, these algorithms are not applicable to conditional models, and there has been almost no work on training and generating predictions from conditional RBMs for structured output problems. We first argue that standard Contrastive Divergence-based learning may not be suitable for training CRBMs. We then identify two distinct types of structured output prediction problems and propose an improved learning algorithm for each. The first problem type is one where the output space has arbitrary structure but the set of likely output configurations is relatively small, such as in multi-label classification. The second is one where the output space is arbitrarily structured and its variability is much greater, such as in image denoising or pixel labeling. We show that the new learning algorithms can work much better than Contrastive Divergence on both types of problems.

Citation Context

... large output spaces where exact gradients are intractable (Larochelle & Bengio, 2008), CD learning has been the only algorithm used to train CRBMs (Salakhutdinov et al., 2007; Taylor & Hinton, 2009; Memisevic & Hinton, 2010). In this work, we argue that CD learning may not be a very good algorithm for training CRBMs and propose two new algorithms for tackling structured output prediction problems in two different settin...
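For the first problem type in the abstract above, where the set of likely outputs is small, a CRBM prediction can be made by scoring each candidate output with its free energy and picking the minimum. The sketch below assumes a simple conditional energy E(y, h; x) = -y·Wh - x·Rh - b·y - c·h, which is an illustrative form, not necessarily the exact parameterization used in the paper:

```python
import numpy as np

def crbm_free_energy(y, x, W, R, b, c):
    """Free energy F(y | x) of a conditional RBM with binary hidden
    units h, output visibles y, and conditioning input x, assuming
    E(y, h; x) = -y @ W @ h - x @ R @ h - b @ y - c @ h.
    Summing out h gives F = -b@y - sum_j log(1 + exp(act_j))."""
    act = W.T @ y + R.T @ x + c          # hidden pre-activations
    return -(b @ y) - np.sum(np.logaddexp(0.0, act))
```

Prediction over a small candidate set `ys` is then `min(ys, key=lambda y: crbm_free_energy(y, x, W, R, b, c))`, since the free energy is the negative unnormalized log-probability of y given x.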

Transforming Auto-encoders

by G. E. Hinton, A. Krizhevsky, S. D. Wang
Abstract - Cited by 22 (2 self)
Abstract. The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that produce a whole vector of outputs including an explicit representation of the pose of the feature. We show how neural networks can be used to learn features that output a whole vector of instantiation parameters, and we argue that this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural networks community. It is also more promising than the hand-engineered features currently used in computer vision because it provides an efficient way of adapting the features to the domain.

Citation Context

...twork with additional external inputs that specify the way in which the image has been transformed may appear unnecessary because this information could, in principle, be computed from the two images [7]. However, this information is often readily available and it makes the learning much easier, so it is silly not to use it. Specifying a global transformation of the image is much easier than explicit...

Learning a Better Representation of Speech Sound Waves Using Restricted Boltzmann Machines

by Navdeep Jaitly, Geoffrey Hinton
Abstract - Cited by 21 (5 self)
State-of-the-art speech recognition systems rely on preprocessed speech features, such as Mel cepstrum or linear predictive coding coefficients, that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a Restricted Boltzmann Machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.

Citation Context

...C, CIFAR & Microsoft for funding. algorithm [6] has been shown to be very effective in training RBMs to model a variety of high-dimensional data distributions such as images and image transformations [7, 8]. Several RBMs can be stacked on top of each other such that higher-level RBMs learn to model the posterior distributions of the hidden variables of the lower-level RBMs. This stacking process has the...

Gradient-based learning of higher-order image features

by Roland Memisevic - In Proceedings of the International Conference on Computer Vision, 2011
Abstract - Cited by 20 (10 self)
Recent work on unsupervised feature learning has shown that learning on polynomial expansions of input patches, such as on pair-wise products of pixel intensities, can improve the performance of feature learners and extend their applicability to spatio-temporal problems, such as human action recognition or learning of image transformations. Learning of such higher-order features, however, has been much more difficult than standard dictionary learning, because of the high dimensionality and because standard learning criteria are not applicable. Here, we show how one can cast the problem of learning higher-order features as the problem of learning a parametric family of manifolds. This allows us to apply a variant of a denoising autoencoder network to learn higher-order features using simple gradient-based optimization. Our experiments show that the approach can outperform existing higher-order models, while training and inference are exact, fast, and simple.

Citation Context

...An extension of feature learning that has received a lot of attention recently is the learning of relations between pixel intensities, rather than of pixel intensities themselves [18], [12], [25], [19]. To this end, one can extend the bipartite graph of a standard sparse coding model to a tripartite graph that connects hidden variables with two images. Hidden units then turn into “mapping” uni...
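The tripartite idea in the context above (hidden units that see products of projections of two images) can be illustrated with a minimal gated-autoencoder-style forward pass. All names and shapes here are assumptions for illustration, not the exact architecture of the paper:

```python
import numpy as np

def gated_reconstruct(x, y, U, V, Wh):
    """Minimal sketch of a gated ("relational") autoencoder: mapping
    units respond to products of factor projections of two images, and
    the second image is reconstructed conditioned on the first.
    Shapes (illustrative): U, V are (F, d); Wh is (H, F)."""
    fx, fy = U @ x, V @ y                          # per-factor responses
    h = 1.0 / (1.0 + np.exp(-(Wh @ (fx * fy))))    # sigmoid mapping units
    y_hat = V.T @ (fx * (Wh.T @ h))                # conditional reconstruction
    return h, y_hat
```

Training such a model would minimize the reconstruction error of y given x (optionally from corrupted inputs, in the denoising variant the abstract describes) by plain gradient descent on U, V, and Wh.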

On Autoencoders and Score Matching for Energy Based Models

by Kevin Swersky, David Buchman, Benjamin M. Marlin, Nando De Freitas
Abstract - Cited by 17 (2 self)
We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.

Two Distributed-State Models For Generating High-Dimensional Time Series

by Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis, 2011
Abstract - Cited by 15 (1 self)
In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at
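The factoring of the three-way weight tensor mentioned in the abstract above can be made concrete. If W[i, j, k] = Σ_f A[i, f] B[j, f] C[k, f], then the effective pairwise weights modulated by a third variable z come out of small matrix products, without ever forming the full tensor. A sketch with illustrative names:

```python
import numpy as np

def factored_effective_weights(A, B, C, z):
    """Effective pairwise weights between two sets of variables when a
    three-way tensor W[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]
    is modulated by a conditioning vector z:
        W_eff[i, j] = sum_k W[i, j, k] * z[k].
    The full (di, dj, dk) tensor is never materialized; only the three
    factor matrices are needed."""
    gate = C.T @ z           # per-factor gains contributed by z, shape (F,)
    return (A * gate) @ B.T  # broadcast gains over factors, contract them out
```

This is how the effective interaction weight between two variables can be "modulated by the dynamic state of a third variable" with a number of parameters linear, rather than cubic, in the layer sizes.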