Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders (2012)

by Yoshua Bengio, Guillaume Alain, Salah Rifai
Citing documents: results 1 - 3 of 3

Representation learning: A review and new perspectives.

by Yoshua Bengio, Aaron Courville, Pascal Vincent - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013
Abstract - Cited by 173 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

Citation Context

...principle, such as score matching. The score matching connection is discussed in Section 7.2.2 and has been shown for a particular parametrization of DAE and equivalent GRBM [210]. The work in [1] generalizes this idea to a broader class of parameterizations (arbitrary encoders and decoders) and shows that by regularizing the autoencoder so that it be contractive, one obtains that the reconstruction function and its derivative estimate first and second derivatives of the underlying data-generative density. This view can be exploited to successfully sample from autoencoders, as shown in [170], [26]. The proposed sampling algorithms are MCMCs similar to Langevin MCMC, using not just the estimated first derivative of the density, but also the estimated manifold tangents so as to stay close to manifolds of high density. This interpretation connects well with the geometric perspective introduced in Section 8. The regularization effects (e.g., due to a sparsity regularizer, a contractive regularizer, or the denoising criterion) ask the learned representation to be as insensitive as possible to the input, while minimizing reconstruction error on the training examples forces the representation...
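A minimal sketch of the style of sampler this context describes, assuming only that a trained auto-encoder's reconstruction function r(x) is available; the name reconstruct, the step sizes, and the chain length are illustrative assumptions rather than the cited authors' code, and plain isotropic noise is used instead of the manifold-tangent-shaped noise mentioned above:

import numpy as np

def autoencoder_langevin_chain(reconstruct, x0, n_steps=200,
                               step_size=0.1, noise_scale=0.05, rng=None):
    # Illustrative Langevin-style chain driven by a trained auto-encoder.
    # reconstruct(x) returns r(x); r(x) - x is used as a (scaled) estimate
    # of the log-density gradient, per the interpretation discussed above.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    samples = []
    for _ in range(n_steps):
        score_estimate = reconstruct(x) - x                  # points toward higher density
        x = x + step_size * score_estimate                   # drift uphill on log-density
        x = x + noise_scale * rng.standard_normal(x.shape)   # exploration noise
        samples.append(x.copy())
    return np.array(samples)

As the passage notes, the samplers in [170] and [26] additionally use the estimated manifold tangents (derived from the reconstruction's Jacobian) to shape the noise so that the chain stays close to the high-density manifold; the sketch above omits that step.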

Practical recommendations for gradient-based training of deep architectures

by Yoshua Bengio - Neural Networks: Tricks of the Trade, 2013
Abstract - Cited by 28 (2 self)
Abstract not found

Citation Context

...e likely input configurations. The difference between the reconstruction vector and the input vector can be shown to be related to the log-density gradient as estimated by the learner (Vincent, 2011; Bengio et al., 2012) and the Jacobian matrix of the reconstruction with respect to the input gives information about the second derivative of the density, i.e., in which direction the density remains high when you are o...
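Stated as equations (a paraphrase of the result these citations point to, with sigma the corruption or regularization scale; a restatement for clarity, not a quotation from the cited papers):

\[
r(x) - x \;\approx\; \sigma^2 \,\frac{\partial \log p(x)}{\partial x},
\qquad
\frac{\partial r(x)}{\partial x} \;\approx\; I + \sigma^2 \,\frac{\partial^2 \log p(x)}{\partial x^2},
\]

so the reconstruction residual estimates the score, and the Jacobian of the reconstruction carries the second-derivative (curvature) information the context refers to.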

What regularized auto-encoders learn from the data generating distribution

by Guillaume Alain, Yoshua Bengio, 2012
Abstract - Cited by 17 (7 self)
What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with the contraction applied to the whole reconstruction function rather than just the encoder. Similarly to score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
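A rough restatement of the training criterion the abstract refers to (notation paraphrased, with lambda weighting a contractive penalty applied to the whole reconstruction function r rather than just the encoder):

\[
\mathcal{L}(r) \;=\; \mathbb{E}_{x \sim \text{data}}\!\left[\, \| r(x) - x \|^2 \;+\; \lambda \left\| \frac{\partial r(x)}{\partial x} \right\|_F^2 \,\right],
\]

and the paper's generic result is that, given enough capacity and examples and small lambda, the optimal reconstruction satisfies r(x) - x proportional to the score \(\partial \log p(x)/\partial x\), which is what the approximate Metropolis-Hastings sampler then exploits.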