Results 1–10 of 13
Stochastic backpropagation and approximate inference in deep generative models, 2014
Abstract (Cited by 36, 4 self):
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
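The "stochastic backpropagation" the abstract refers to is the trick of rewriting a sample from a parametrised Gaussian so that gradients flow through it. A minimal NumPy sketch (an illustration of the general idea, not the paper's algorithm; the toy objective f(z) = z² is chosen only because its exact gradient is known):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_mu_reparam(f_grad, mu, sigma, n_samples=100_000):
    """Estimate d/dmu E_{z ~ N(mu, sigma^2)}[f(z)] by writing
    z = mu + sigma * eps with eps ~ N(0, 1), so the gradient
    passes straight through the sample: d f(z)/d mu = f'(z)."""
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    return f_grad(z).mean()

# f(z) = z^2, so E[f] = mu^2 + sigma^2 and the true gradient is 2*mu.
est = grad_mu_reparam(lambda z: 2 * z, mu=1.5, sigma=0.7)
print(est)  # close to 3.0
```

The same substitution is what lets the generative and recognition models be trained jointly by ordinary backpropagation.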
DRAW: A recurrent neural network for image generation, CoRR, 2015
Abstract (Cited by 14, 1 self):
This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational autoencoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.
Accurate and Conservative Estimates of MRF Log-likelihood using Reverse Annealing, arXiv:1412.8566, 2014
Abstract (Cited by 5, 1 self):
Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test log-probabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. However, AIS is prone to overestimate the log-likelihood with little indication that anything is wrong. We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. RAISE requires only the same MCMC transition operators as standard AIS. Experimental results indicate that RAISE agrees closely with AIS log-probability estimates for RBMs, DBMs, and DBNs, but typically errs on the side of underestimating, rather than overestimating, the log-likelihood.
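To make the standard AIS procedure concrete, here is a toy 1D sketch (plain AIS on Gaussians with a known answer, not RAISE and not an MRF; all quantities are made up for illustration). It anneals from a tractable start density to the target through geometric intermediates, accumulating importance weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised initial density f0 (standard normal) and target f1.
log_f0 = lambda x: -0.5 * x**2                    # Z0 = sqrt(2*pi)
log_f1 = lambda x: -0.5 * ((x - 1.0) / 0.5)**2    # Z1 = 0.5 * sqrt(2*pi)

def ais_log_ratio(n_chains=2000, n_temps=100):
    """AIS estimate of log(Z1/Z0): anneal through the geometric path
    f_b = f0^(1-b) * f1^b, updating importance weights at each
    temperature and applying one Metropolis transition targeting f_b."""
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.standard_normal(n_chains)      # exact samples from p0
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * (log_f1(x) - log_f0(x))
        log_p = lambda y: (1 - b) * log_f0(y) + b * log_f1(y)
        prop = x + 0.5 * rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_p(prop) - log_p(x)
        x = np.where(accept, prop, x)
    # log of the mean importance weight (log-sum-exp for stability).
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

est = ais_log_ratio()
print(est, np.log(0.5))  # estimate vs. exact log(Z1/Z0)
```

The paper's point is that when the transitions mix poorly, estimates like this one silently err high; RAISE runs the annealing in reverse to obtain a stochastic lower bound instead.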
How autoencoders could provide credit assignment in deep networks via target propagation, 2014
Abstract (Cited by 2, 2 self):
We propose to exploit reconstruction as a layer-local training signal for deep learning. Reconstructions can be propagated in a form of target propagation, playing a role similar to backpropagation but helping to reduce the reliance on derivatives in order to perform credit assignment across many levels of possibly strong nonlinearities (which is difficult for backpropagation). A regularized autoencoder tends to produce a reconstruction that is a more likely version of its input, i.e., a small move in the direction of higher likelihood. By generalizing gradients, target propagation may also allow training deep networks with discrete hidden units. If the autoencoder takes a representation of both the input and the target (or of any side information) as input, then its reconstruction of the input representation provides a target towards a representation that is more likely, conditioned on all the side information. A deep autoencoder decoding path generalizes gradient propagation in a learned way that could thus handle not just infinitesimal changes but larger, discrete changes, hopefully allowing credit assignment through a long chain of nonlinear operations. In addition to each layer being a good autoencoder, the encoder also learns to please the upper layers by transforming the data into a space where it is easier for them to model, flattening manifolds and disentangling factors. The motivations and theoretical justifications for this approach are laid down in this paper, along with conjectures that will have to be verified either mathematically or experimentally, including a hypothesis stating that such autoencoder-mediated target propagation could play, in brains, the role of credit assignment through many nonlinear, noisy and discrete transformations.
MADE: Masked Autoencoder for Distribution Estimation (Mathieu Germain)
Abstract (Cited by 2, 0 self):
There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder’s parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product, the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.
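The masking idea can be sketched in a few lines. This toy construction follows the degree-based recipe the abstract alludes to for a single hidden layer with the natural input ordering (sizes and degrees here are illustrative, not the paper's experimental settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def made_masks(n_in, n_hidden):
    """Build input->hidden and hidden->output masks of a one-hidden-layer
    MADE for the natural ordering 1..n_in: each unit gets a degree, and a
    connection is kept only if it cannot carry information from an input
    forward to an output of equal or lower position."""
    m_in = np.arange(1, n_in + 1)                  # input degrees 1..n_in
    m_hid = rng.integers(1, n_in, size=n_hidden)   # hidden degrees in [1, n_in-1]
    mask_1 = (m_hid[:, None] >= m_in[None, :]).astype(float)  # hidden x in
    mask_2 = (m_in[:, None] > m_hid[None, :]).astype(float)   # out x hidden
    return mask_1, mask_2

mask_1, mask_2 = made_masks(n_in=5, n_hidden=16)
# Paths from input j to output d through the hidden layer:
conn = mask_2 @ mask_1    # (n_out, n_in); zero wherever no path exists
print(conn)
```

Because a path from input j to output d requires a hidden degree between them, `conn` is strictly lower triangular: output d depends only on inputs that precede it in the ordering, which is exactly the autoregressive constraint.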
Generative Image Modeling Using Spatial LSTMs, in Advances in Neural Information Processing Systems 28, 2015
Abstract (Cited by 1, 1 self):
Modeling the distribution of natural images is challenging, partly because of strong statistical dependencies which can extend over hundreds of pixels. Recurrent neural networks have been successful in capturing long-range dependencies in a number of problems but only recently have found their way into generative image models. We here introduce a recurrent image model based on multidimensional long short-term memory units which are particularly suited for image modeling due to their spatial structure. Our model scales to images of arbitrary size and its likelihood is computationally tractable. We find that it outperforms the state of the art in quantitative comparisons on several image datasets and produces promising results when used for texture synthesis and inpainting.
Deep Directed Generative Autoencoders
Abstract (Cited by 1, 1 self):
For discrete data, the likelihood P(x) can be rewritten exactly and parametrized
Weight Uncertainty in Neural Networks
Abstract (Cited by 1, 0 self):
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in nonlinear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
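The two core ingredients of such a scheme, reparameterised weight sampling and the closed-form complexity (KL) term of the variational free energy for a Gaussian posterior and standard-normal prior, can be sketched as follows (a minimal illustration, not the paper's implementation; the softplus parameterisation of sigma is one common choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(rho):
    """sigma = log(1 + exp(rho)) keeps the standard deviation positive."""
    return np.log1p(np.exp(rho))

def sample_weights(mu, rho):
    """Draw weights from the variational posterior N(mu, sigma^2) via the
    reparameterisation w = mu + sigma * eps, keeping the draw
    differentiable with respect to (mu, rho)."""
    sigma = softplus(rho)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, rho):
    """Closed-form KL(q || N(0, I)): the compression cost that
    regularises the weights."""
    sigma = softplus(rho)
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma))

mu = np.zeros(3)
rho = np.full(3, np.log(np.e - 1.0))   # softplus(rho) = 1, so q = N(0, I)
w = sample_weights(mu, rho)
print(kl_to_standard_normal(mu, rho))  # 0.0: q coincides with the prior
```

Training then minimises this KL plus the expected negative log-likelihood of the data under sampled weights, with gradients flowing through the sampled `w`.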
Variational Inference with Normalizing Flows (Danilo Jimenez Rezende)
Abstract:
The choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in performance and applicability of variational inference.
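Each invertible transformation contributes a log-Jacobian term to the transformed density. A minimal 1D sketch of one planar-flow step, with the analytic log-determinant checked against a numerical derivative (illustrative parameter values, not from the paper):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar flow step in 1D: z' = z + u * tanh(w*z + b).
    Returns the transformed point and the analytic log|det Jacobian|,
    log|1 + u*w*(1 - tanh(w*z + b)^2)|, which enters the change-of-
    variables formula for the flow's density."""
    h = np.tanh(w * z + b)
    z_new = z + u * h
    log_det = np.log(np.abs(1.0 + u * w * (1.0 - h**2)))
    return z_new, log_det

z, u, w, b = 0.3, 0.5, 1.2, -0.1
z_new, log_det = planar_flow(z, u, w, b)

# Cross-check against a central-difference estimate of dz'/dz.
eps = 1e-6
num = (planar_flow(z + eps, u, w, b)[0]
       - planar_flow(z - eps, u, w, b)[0]) / (2 * eps)
print(log_det, np.log(abs(num)))  # the two agree closely
```

Composing many such steps, and summing their log-determinants, is what turns the simple initial density into an arbitrarily complex posterior approximation.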
NICE: Non-linear Independent Components Estimation
Abstract:
We propose a deep learning framework for modeling complex high-dimensional densities via Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is one in which the data has a distribution that is easy to model. For this purpose, a non-linear deterministic transformation of the data is learned that maps it to a latent space so as to make the transformed data conform to a factorized distribution, i.e., resulting in independent latent variables. We parametrize this transformation so that computing the determinant of the Jacobian and inverse Jacobian is trivial, yet we maintain the ability to learn complex non-linear transformations, via a composition of simple building blocks, each based on a deep neural network. The training criterion is simply the exact log-likelihood, which is tractable, and unbiased ancestral sampling is also easy. We show that this approach yields good generative models on four image datasets and can be used for inpainting.
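The building block with a trivial Jacobian determinant is the additive coupling layer. A minimal sketch (the fixed `tanh` coupling function here is a stand-in for the deep network the abstract mentions; the split into halves is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed coupling function m standing in for NICE's deep network; any
# function of x1 works, because it never needs to be inverted.
W = rng.standard_normal((2, 2))
m = lambda x1: np.tanh(x1 @ W)

def coupling_forward(x):
    """Additive coupling layer: y1 = x1, y2 = x2 + m(x1).
    The Jacobian is unit lower triangular, so log|det| = 0."""
    x1, x2 = x[:, :2], x[:, 2:]
    return np.concatenate([x1, x2 + m(x1)], axis=1)

def coupling_inverse(y):
    """Exact inverse: x2 = y2 - m(y1), reusing m without inverting it."""
    y1, y2 = y[:, :2], y[:, 2:]
    return np.concatenate([y1, y2 - m(y1)], axis=1)

x = rng.standard_normal((4, 4))
y = coupling_forward(x)
x_rec = coupling_inverse(y)
print(np.allclose(x, x_rec))  # True: the layer is exactly invertible
```

Stacking such layers (alternating which half is left unchanged) gives a deep, invertible map whose exact log-likelihood is tractable, as the abstract claims.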