Neural Variational Inference and Learning in Belief Networks
Abstract

Cited by 27 (2 self)
Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference network gradient is too high-variance to be useful, we make it practical by applying several straightforward model-independent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
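The variance-reduction idea can be illustrated in the simplest possible case, a single Bernoulli latent variable. Below is a minimal sketch (my own toy setup, not the paper's NVIL algorithm; the function names and objective are illustrative) of the score-function gradient estimator with an input-independent baseline, the kind of model-independent control variate the abstract refers to:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score_function_grad(theta, f, n_samples=10000, baseline=0.0):
    """Monte Carlo estimate of d/dtheta E_{z ~ Bern(sigmoid(theta))}[f(z)]
    via the score-function (REINFORCE) estimator; subtracting a baseline
    leaves the estimate unbiased but can reduce its variance."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        z = 1 if random.random() < p else 0
        score = z - p  # d/dtheta log Bern(z | sigmoid(theta))
        total += (f(z) - baseline) * score
    return total / n_samples

# Toy objective f(z) = 2z + 1: the true gradient at theta = 0 is
# 2 * sigmoid'(0) = 0.5. With baseline = E[f] = 2, every per-sample
# estimate here equals exactly 0.5, so the variance drops to zero.
g = score_function_grad(0.0, lambda z: 2 * z + 1, baseline=2.0)
```

Without the baseline the same estimator averages samples of 1.5 and -0.5, illustrating why the naive estimator is too noisy to be useful.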
Doubly stochastic variational Bayes for non-conjugate inference
Abstract

Cited by 11 (1 self)
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
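A concrete illustration of this style of stochastic optimisation (a minimal sketch under my own toy setup, not the paper's algorithm): fit a Gaussian q(θ) = N(μ, σ²) to the unnormalised target log p(θ) = -½(θ - 3)², whose posterior is N(3, 1), using gradients of the log joint evaluated at reparameterised samples θ = μ + σε:

```python
import math
import random

random.seed(1)

# Variational parameters of q(theta) = N(mu, exp(log_sigma)^2).
mu, log_sigma = 0.0, 0.0
lr = 0.01

for _ in range(20000):
    eps = random.gauss(0.0, 1.0)
    sigma = math.exp(log_sigma)
    theta = mu + sigma * eps        # reparameterised sample from q
    dlogp = -(theta - 3.0)          # gradient of the log joint at theta
    # Chain rule through the sample, plus the entropy term's gradient
    # (d entropy / d log_sigma = 1 for a Gaussian).
    mu += lr * dlogp
    log_sigma += lr * (dlogp * eps * sigma + 1.0)

# mu and exp(log_sigma) should now sit near the posterior's 3 and 1.
```

The "doubly stochastic" character is that both the data term (here trivial) and the expectation under q are estimated by sampling.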
Stochastic backpropagation and variational inference in deep latent Gaussian models. arXiv preprint arXiv:1401.4082, 2014
Abstract

Cited by 9 (0 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distributions, which acts as a stochastic encoder of the data. We develop stochastic backpropagation, rules for backpropagation through stochastic variables, and use this to develop an algorithm that allows for joint optimisation of the parameters of both the generative and recognition model. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data and is a useful tool for high-dimensional data visualisation.
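The core stochastic-backpropagation identity can be checked numerically. A minimal sketch (my own toy check, not the paper's recognition-model machinery): for z ~ N(μ, σ²), the gradient of E[f(z)] with respect to μ equals E[f′(z)], which follows from writing z = μ + σε with ε ~ N(0, 1):

```python
import random

random.seed(2)

def grad_mu(f_prime, mu, sigma, n=200000):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[f(z)]
    using the identity d/dmu E[f(z)] = E[f'(z)]."""
    total = 0.0
    for _ in range(n):
        z = mu + sigma * random.gauss(0.0, 1.0)
        total += f_prime(z)
    return total / n

# For f(z) = z^2 we have f'(z) = 2z, so the true gradient is 2*mu = 3.0.
g = grad_mu(lambda z: 2.0 * z, mu=1.5, sigma=0.7)
```

Because the estimator uses the derivative of f rather than a score function, it exploits gradient information from the model and is typically much lower-variance.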
Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
Abstract

Cited by 3 (0 self)
Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference; we show that the two parameterizations are often complementary to each other, clarify when each is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
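The transformation itself is simple to state. A minimal sketch (my own illustration) of the two parameterizations of one Gaussian latent variable: in centered (CP) form, z is drawn directly from N(μ, σ²); in differentiable non-centered (DNCP) form, the noise ε ~ N(0, 1) is the random variable and z = μ + σε is a deterministic, differentiable function of it. Both induce the same distribution over z, but gradients with respect to (μ, σ) flow very differently:

```python
import random

random.seed(3)

mu, sigma = 2.0, 0.5

def sample_centered():
    # CP: the latent variable itself is stochastic.
    return random.gauss(mu, sigma)

def sample_noncentered():
    # DNCP: the stochasticity lives in eps; z is a deterministic,
    # differentiable function of (mu, sigma, eps), so gradients can
    # be backpropagated through it.
    eps = random.gauss(0.0, 1.0)
    return mu + sigma * eps

n = 100000
mean_cp = sum(sample_centered() for _ in range(n)) / n
mean_ncp = sum(sample_noncentered() for _ in range(n)) / n
```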
Stochastic Structured Mean-Field Variational Inference
Abstract

Cited by 1 (1 self)
Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly. The algorithm relies heavily on the use of fully factorized variational distributions. However, this "mean-field" independence approximation introduces bias. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, reducing both bias and sensitivity to local optima.
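The bias of the mean-field approximation has a well-known closed form for a bivariate Gaussian target (a standard textbook example, my own illustration rather than the paper's models): for a unit-variance target with correlation ρ, the optimal factorized Gaussian q(x)q(y) matches the means but its per-coordinate variance equals the conditional variance 1 - ρ², underestimating the true marginal variance of 1:

```python
def mean_field_variance(rho):
    """Optimal per-coordinate variance of the factorized Gaussian
    approximation (minimizing KL(q || p)) to a unit-variance bivariate
    Gaussian with correlation rho: the inverse of the precision-matrix
    diagonal, which works out to 1 - rho^2."""
    return 1.0 - rho * rho

true_marginal_variance = 1.0
mf_var = mean_field_variance(0.9)  # 0.19, far below the true 1.0
```

The stronger the dependence the approximation ignores, the larger the bias, which is what motivates relaxing the factorization.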
The Population Posterior and Bayesian Modeling on Streams
Abstract
Many modern data analysis problems involve inference from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which require conditioning on finite data. We develop population variational Bayes, a new approach for using Bayesian modeling to analyze streams of data. It approximates a new type of distribution, the population posterior, which combines the notion of a population distribution of the data with Bayesian inference in a probabilistic model. We develop the population posterior for latent Dirichlet allocation and Dirichlet process mixtures, and study our method on several large-scale data sets.
Local Expectation Gradients for Black Box Variational Inference
Abstract
We introduce local expectation gradients, a general-purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. The algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller subtasks so that each subtask intelligently explores the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that most correlates with the variational parameter of interest, resulting in a Rao-Blackwellized estimate that has low variance. Our method works efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but, unlike Gibbs sampling, can be trivially parallelized.
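For a single Bernoulli variable, taking the exact expectation amounts to differentiating a two-term sum, which makes the variance reduction concrete. A minimal sketch (function names mine, not the paper's):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def local_expectation_grad(theta, f):
    """Exact d/dtheta E_{z ~ Bern(sigmoid(theta))}[f(z)], obtained by
    summing over the two values of z instead of sampling it:
    E[f] = p*f(1) + (1-p)*f(0) with p = sigmoid(theta), so the
    gradient is p*(1-p)*(f(1) - f(0)) -- a zero-variance,
    Rao-Blackwellized counterpart of the score-function estimator."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (f(1) - f(0))

# For f(z) = 2z + 1 at theta = 0: 0.25 * 2 = 0.5, exactly.
g = local_expectation_grad(0.0, lambda z: 2 * z + 1)
```

With many latent variables, the method integrates out only the one variable most correlated with each parameter and samples the rest, trading a small amount of extra computation per parameter for much lower variance.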
Structured Stochastic Variational Inference
Abstract
Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approximation limits the fidelity of the posterior approximation and introduces local optima. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters.
A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks, 2015
Abstract
In the previous tutorial, I discussed the use of deep networks to classify nonlinear data. In addition to their ability to handle nonlinear data, deep networks also have a special strength in their flexibility, which sets them apart from other traditional machine learning models: we can modify them in many ways to suit our tasks. In the following, I will discuss the three most common modifications:
Learning to Generate Images with Perceptual Similarity Metrics (under review as a conference paper at ICLR 2016)
Abstract
Deep networks are increasingly being applied to problems involving image synthesis, e.g., generating images from textual descriptions, or generating reconstructions of an input image in an autoencoder architecture. Supervised training of image-synthesis networks typically uses a pixel-wise squared error (SE) loss to indicate the mismatch between a generated image and its corresponding target image. We propose to instead use a loss function that is better calibrated to human perceptual judgments of image quality: the structural-similarity (SSIM) score of Wang, Bovik, Sheikh, and Simoncelli (2004). Because the SSIM score is differentiable, it is easily incorporated into gradient-descent learning. We compare the consequences of using SSIM versus SE loss on representations formed in deep autoencoder and recurrent neural network architectures. SSIM-optimized representations yield a superior basis for image classification compared to SE-optimized representations. Further, human observers prefer images generated by the SSIM-optimized networks by nearly a 7:1 ratio. Just as computer vision has advanced through the use of convolutional architectures that mimic the structure of the mammalian visual system, we argue that significant additional advances can be made in modeling images through the use of training objectives that are well aligned to characteristics of human perception.
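The SSIM score the abstract refers to has a simple closed form. Below is a minimal sketch of a global (single-window) SSIM between two small grayscale patches with values in [0, 1], using the standard constants C1 = (0.01)² and C2 = (0.03)²; the full metric is computed over local sliding windows and averaged, so treat this as an illustration only:

```python
def ssim_global(x, y):
    """Single-window SSIM between two equal-length lists of pixel
    values in [0, 1]; higher means more similar, 1.0 for identical
    inputs. Combines luminance (means), contrast (variances) and
    structure (covariance) comparisons."""
    assert len(x) == len(y)
    n = len(x)
    c1, c2 = 0.01 ** 2, 0.03 ** 2  # stabilizing constants for L = 1
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

patch = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]
noisy = [0.2, 0.3, 0.9, 0.8, 0.4, 0.1]
```

Every operation here is a differentiable function of the pixel values, which is what makes SSIM usable as a training loss in place of squared error.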