Results 1 - 10 of 18
Neural Variational Inference and Learning in Belief Networks
"... Highly expressive directed latent variable mod-els, such as sigmoid belief networks, are diffi-cult to train on large datasets because exact in-ference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative appr ..."
Abstract
-
Cited by 27 (2 self)
Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference network gradient is too high-variance to be useful, we make it practical by applying several straightforward model-independent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
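For intuition only, here is a minimal sketch of the kind of estimator the abstract alludes to: a score-function (REINFORCE) gradient for the inference network, with a crude mean baseline subtracted for variance reduction. The toy model, its dimensions, and the input-independent baseline are assumptions for illustration and are much simpler than the actual NVIL estimator.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy, hypothetical setup: K binary latents with a uniform prior, a factorized
# Bernoulli likelihood p(x|z), and a one-layer feedforward inference net q(z|x).
D, K = 6, 4                               # observed / latent dimensionality
theta = rng.normal(0, 0.1, (K, D))        # generative weights for p(x|z)
phi = rng.normal(0, 0.1, (D, K))          # inference-network weights for q(z|x)

def log_p(x, z):
    # log p(z) + log p(x|z), with a uniform Bernoulli(0.5) prior over the K bits.
    px = sigmoid(z @ theta)
    return np.sum(x * np.log(px) + (1 - x) * np.log(1 - px)) + K * np.log(0.5)

def score_function_grad(x, n_samples=20):
    # Score-function estimate of the gradient of the variational lower bound
    # with respect to phi; the mean of the learning signal is used as a baseline.
    q = sigmoid(x @ phi)                                  # q(z_k = 1 | x)
    z = (rng.random((n_samples, K)) < q).astype(float)
    log_q = (z * np.log(q) + (1 - z) * np.log(1 - q)).sum(axis=1)
    signal = np.array([log_p(x, zi) for zi in z]) - log_q  # ELBO integrand
    signal -= signal.mean()                                # crude baseline
    grads = [np.outer(x, zi - q) for zi in z]              # d log q(z|x) / d phi
    return np.mean([s * g for s, g in zip(signal, grads)], axis=0)

x = (rng.random(D) < 0.5).astype(float)
print(score_function_grad(x).shape)                        # (D, K)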
Doubly stochastic variational Bayes for non-conjugate inference
"... We propose a simple and effective variational inference algorithm based on stochastic optimi-sation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic ap-proximation and allows for efficient use of gra-dient informati ..."
Abstract
-
Cited by 11 (1 self)
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
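A loose sketch of the general recipe (not the paper's exact algorithm): maintain a Gaussian variational posterior, draw from it by reparameterisation, and feed the gradient of the log joint density into stochastic-approximation updates. The toy grad_log_joint, the learning rate, and the diagonal covariance below are illustrative assumptions; with only a standard-normal prior standing in for the model, the updates should drive q towards N(0, I).

import numpy as np

rng = np.random.default_rng(1)

def grad_log_joint(w):
    # Hypothetical stand-in for the gradient of log p(data, w); here only a
    # standard-normal prior term, so the exact posterior is N(0, I).
    return -w

D = 5
mu, log_s = np.ones(D), np.full(D, -1.0)  # variational mean and log-scales
lr = 0.01
for _ in range(5000):
    eps = rng.standard_normal(D)
    s = np.exp(log_s)
    w = mu + s * eps                      # reparameterised draw from q
    g = grad_log_joint(w)                 # gradient information from the joint
    mu += lr * g                          # stochastic ascent on the lower bound
    log_s += lr * (g * eps * s + 1.0)     # "+1" is the gradient of q's entropy
print(mu.round(2), np.exp(log_s).round(2))   # should approach 0 and 1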
Stochastic backpropagation and variational inference in deep latent Gaussian models. arXiv preprint arXiv:1401.4082, 2014
"... We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed genera-tive models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distribu-t ..."
Abstract
-
Cited by 9 (0 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model that represents approximate posterior distributions and acts as a stochastic encoder of the data. We develop stochastic back-propagation – rules for back-propagation through stochastic variables – and use this to develop an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data, and is a useful tool for high-dimensional data visualisation.
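In reparameterised form (the paper itself also derives equivalent Gaussian gradient identities; the notation below is assumed), the rule that makes back-propagation through a Gaussian latent variable possible is

\nabla_{\phi}\, \mathbb{E}_{q_{\phi}(z)}\!\left[ f(z) \right]
  \;=\; \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\!\left[ \nabla_{\phi}\, f\!\left( \mu_{\phi} + R_{\phi}\,\epsilon \right) \right],
  \qquad z = \mu_{\phi} + R_{\phi}\,\epsilon, \quad R_{\phi} R_{\phi}^{\top} = \Sigma_{\phi},

so that a single Monte Carlo draw of epsilon yields a gradient estimate for the recognition parameters phi alongside the usual gradient for the generative parameters.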
Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
"... Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered p ..."
Abstract
-
Cited by 3 (0 self)
Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either of these types of models can often be transformed into an instance of the other, by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference; we show that the two are often complementary to each other, clarify when each parameterization is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
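A minimal, hypothetical example of the switch (one Gaussian latent with a single parent; the conditional mean and scale below are made up for illustration): in the centered parameterization the latent z is itself the random variable, while in the differentiable non-centered parameterization the randomness lives in an auxiliary epsilon and z is a deterministic function of it, which is what gradient-based inference can exploit.

import numpy as np

rng = np.random.default_rng(2)

def mu_sigma(parent):
    # Hypothetical conditional: mean and scale of z given its parent.
    return np.tanh(parent), 0.5

parent = rng.standard_normal()
m, s = mu_sigma(parent)

# Centered parameterization (CP): z itself is the random variable.
z_cp = rng.normal(m, s)

# Differentiable non-centered parameterization (DNCP): sample an auxiliary
# eps ~ N(0, 1) and express z as a deterministic, differentiable function of it.
eps = rng.standard_normal()
z_dncp = m + s * eps
print(z_cp, z_dncp)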
Stochastic Structured Mean-Field Variational Inference
"... Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly. The algorithm relies heav-ily on the use of fully factorized variational dis-tributions. However, this “mean-field ” indepen-dence approximation introduces bias. We show how t ..."
Abstract
-
Cited by 1 (1 self)
Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly. The algorithm relies heavily on the use of fully factorized variational distributions. However, this “mean-field” independence approximation introduces bias. We show how to relax the mean-field approximation to allow arbitrary dependences between global parameters and local hidden variables, reducing both bias and sensitivity to local optima.
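In symbols (with notation assumed here: global parameters beta and local hidden variables z_1:N), the relaxation replaces the fully factorized mean-field family with one in which each local variable may depend on the global parameters:

q_{\text{MF}}(\beta, z_{1:N}) \;=\; q(\beta) \prod_{n=1}^{N} q(z_n)
\qquad \longrightarrow \qquad
q_{\text{struct}}(\beta, z_{1:N}) \;=\; q(\beta) \prod_{n=1}^{N} q(z_n \mid \beta).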
The Population Posterior and Bayesian Modeling on Streams
"... Abstract Many modern data analysis problems involve inferences from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which require conditioning on finite data. We develop population variational Bayes, a new approach for using Bayesian ..."
Abstract
- Add to MetaCart
(Show Context)
Many modern data analysis problems involve inferences from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which require conditioning on finite data. We develop population variational Bayes, a new approach for using Bayesian modeling to analyze streams of data. It approximates a new type of distribution, the population posterior, which combines the notion of a population distribution of the data with Bayesian inference in a probabilistic model. We develop the population posterior for latent Dirichlet allocation and Dirichlet process mixtures. We study our method with several large-scale data sets.
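Roughly, and with notation assumed here rather than quoted from the paper: given a population distribution F over data points and a chosen data-set size alpha, the population posterior over the hidden structure (theta, z) is the expected posterior under alpha-point data sets drawn from F, and population variational Bayes fits q to it by maximizing the corresponding expected variational lower bound,

p_{\text{pop}}(\theta, z) \;=\; \mathbb{E}_{X \sim F^{\alpha}}\!\left[\, p(\theta, z \mid X) \,\right],
\qquad
q^{\star} \;=\; \arg\max_{q}\; \mathbb{E}_{X \sim F^{\alpha}}\!\left[\, \mathcal{L}(q; X) \,\right],

where L(q; X) denotes the usual evidence lower bound for the data set X.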
Local Expectation Gradients for Black Box Variational Inference
"... Abstract We introduce local expectation gradients which is a general purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variat ..."
Abstract
- Add to MetaCart
(Show Context)
We introduce local expectation gradients, a general-purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller sub-tasks, so that each sub-task intelligently explores the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that most correlates with the variational parameter of interest, resulting in a Rao-Blackwellized estimate with low variance. Our method works efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but, unlike Gibbs sampling, can be trivially parallelized.
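For a factorized variational distribution over discrete variables (notation assumed here: factors q_{v_i}(z_i), target integrand f, and S samples of the remaining variables), the estimator the abstract describes amounts to replacing the sampling of z_i with an exact sum over its values:

\nabla_{v_i}\, \mathbb{E}_{q(z)}\!\left[ f(z) \right]
  \;\approx\; \frac{1}{S} \sum_{s=1}^{S} \sum_{k} q_{v_i}(z_i = k)\,
  \nabla_{v_i} \log q_{v_i}(z_i = k)\; f\!\left(z_i = k,\, z_{\setminus i}^{(s)}\right),
  \qquad z_{\setminus i}^{(s)} \sim q(z_{\setminus i}),

which is a Rao-Blackwellization of the usual score-function estimator over the single variable z_i.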
Structured Stochastic Variational Inference
"... Abstract Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approx ..."
Abstract
- Add to MetaCart
(Show Context)
Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this "mean-field" independence approximation limits the fidelity of the posterior approximation and introduces local optima. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters.
A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks, 2015
"... In the previous tutorial, I discussed the use of deep networks to classify nonlinear data. In addition to their ability to handle nonlinear data, deep networks also have a special strength in their flexibility which sets them apart from other tranditional machine learning models: we can modify them ..."
Abstract
- Add to MetaCart
(Show Context)
In the previous tutorial, I discussed the use of deep networks to classify nonlinear data. In addition to their ability to handle nonlinear data, deep networks also have a special strength in their flexibility, which sets them apart from other traditional machine learning models: we can modify them in many ways to suit our tasks. In the following, I will discuss the three most common modifications: autoencoders, convolutional neural networks, and recurrent neural networks.
Learning to Generate Images with Perceptual Similarity Metrics (under review as a conference paper at ICLR 2016)
"... Deep networks are increasingly being applied to problems involving image syn-thesis, e.g., generating images from textual descriptions, or generating reconstruc-tions of an input image in an autoencoder architecture. Supervised training of image-synthesis networks typically uses a pixel-wise squared ..."
Abstract
- Add to MetaCart
(Show Context)
Deep networks are increasingly being applied to problems involving image synthesis, e.g., generating images from textual descriptions, or generating reconstructions of an input image in an autoencoder architecture. Supervised training of image-synthesis networks typically uses a pixel-wise squared error (SE) loss to indicate the mismatch between a generated image and its corresponding target image. We propose to instead use a loss function that is better calibrated to human perceptual judgments of image quality: the structural-similarity (SSIM) score of Wang, Bovik, Sheikh, and Simoncelli (2004). Because the SSIM score is differentiable, it is easily incorporated into gradient-descent learning. We compare the consequences of using SSIM versus SE loss on representations formed in deep autoencoder and recurrent neural network architectures. SSIM-optimized representations yield a superior basis for image classification compared to SE-optimized representations. Further, human observers prefer images generated by the SSIM-optimized networks by nearly a 7:1 ratio. Just as computer vision has advanced through the use of convolutional architectures that mimic the structure of the mammalian visual system, we argue that significant additional advances can be made in modeling images through the use of training objectives that are well aligned to characteristics of human perception.
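As a rough illustration of why such a loss drops straight into gradient-descent training, here is a simplified SSIM computed from global image statistics; the paper follows the windowed formulation of Wang et al., the constants below are the conventional ones for images scaled to [0, 1], and the whole-image statistics are a simplifying assumption.

import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Structural similarity from whole-image means, variances and covariance;
    # every operation is smooth in y, so 1 - ssim_global(x, y) works as a loss.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(3)
target = rng.random((32, 32))
print(1.0 - ssim_global(target, target))                 # ~0 for identical images
print(1.0 - ssim_global(target, rng.random((32, 32))))   # larger for a mismatch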