Results 1–10 of 25
Stochastic Variational Inference
 Journal of Machine Learning Research (2013, in press)
, 2013
Abstract

Cited by 131 (27 self)
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
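The core move in stochastic variational inference is to replace the full-data natural-gradient step with a noisy step computed from a single sampled data point, rescaled as if that point were the whole data set. A minimal sketch for the simplest conjugate case follows; the Gaussian mean model, the step-size schedule, and all variable names are illustrative assumptions, not the algorithm from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
data = rng.normal(2.0, 1.0, size=N)  # toy data: x_i ~ N(2, 1)

# Variational natural parameters (a, b) for q(mu) = N(a/b, 1/b).
# The N(0, 1) prior contributes (0, 1); each x_i contributes (x_i, 1),
# so the exact posterior has a = sum(data), b = N + 1.
a, b = 0.0, 1.0
for t in range(1, 5001):
    rho = (t + 10) ** -0.7            # decaying Robbins-Monro step size
    x = data[rng.integers(N)]         # subsample one data point
    a_hat, b_hat = N * x, 1.0 + N     # pretend x is the entire data set
    a = (1 - rho) * a + rho * a_hat   # noisy natural-gradient update
    b = (1 - rho) * b + rho * b_hat

print(a / b)  # posterior-mean estimate, close to sum(data) / (N + 1)
```

Each iteration touches one document-sized piece of data, which is why updates of this form scale to the corpora described in the abstract.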
Stochastic backpropagation and approximate inference in deep generative models
, 2014
Abstract

Cited by 37 (4 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
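"Gradient backpropagation through stochastic variables" rests on reparameterising a random draw as a deterministic function of its parameters plus noise, so the chain rule passes through the sample. A minimal univariate sketch, with the quadratic test function and sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def pathwise_grad_mu(mu, sigma, n=200_000):
    """Monte Carlo estimate of d/dmu E_{z ~ N(mu, sigma^2)}[z^2]
    via the reparameterisation z = mu + sigma * eps, eps ~ N(0, 1)."""
    eps = rng.standard_normal(n)
    z = mu + sigma * eps
    # f(z) = z^2, so df/dz = 2z and dz/dmu = 1, giving df/dmu = 2z
    return (2 * z).mean()

# Analytic check: E[z^2] = mu^2 + sigma^2, so d/dmu = 2 * mu = 3.0
print(pathwise_grad_mu(mu=1.5, sigma=0.5))
```

Because the noise source is fixed, the estimate differentiates through the sampling step itself, which is exactly what lets the generative and recognition models be trained jointly by gradient descent.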
Neural Variational Inference and Learning in Belief Networks
Abstract

Cited by 27 (2 self)
Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference network gradient is too high-variance to be useful, we make it practical by applying several straightforward model-independent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
Black Box Variational Inference
 In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics
, 2014
Abstract

Cited by 19 (6 self)
Variational inference has become a widely used method to approximate posteriors in complex latent variable models. However, deriving a variational inference algorithm generally requires significant model-specific analysis. These efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling-based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
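The "black box" property comes from the score-function (REINFORCE-style) gradient estimator, which needs only log-density evaluations of the model, never its gradients. A toy sketch, fitting a Gaussian mean to a standard-normal target; the target, sample counts, and learning rate are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit q(z) = N(lam, 1) to the target p(z) = N(0, 1) using only log p,
# via the score-function estimator of the ELBO gradient:
#   grad_lam ELBO ~ mean_s[ grad_lam log q(z_s) * (log p(z_s) - log q(z_s)) ]
log_p = lambda z: -0.5 * z**2  # normalising constant cancels in the difference
lam = 3.0
for _ in range(500):
    z = lam + rng.standard_normal(100)          # z_s ~ q(z; lam)
    score = z - lam                             # d/dlam log q(z_s)
    log_q = -0.5 * (z - lam) ** 2
    ghat = (score * (log_p(z) - log_q)).mean()  # noisy ELBO gradient
    lam += 0.1 * ghat                           # stochastic gradient ascent

print(lam)  # drifts toward the target mean, 0
```

Nothing about p beyond pointwise log-density evaluations enters the update, which is why the same loop applies to many models with little extra derivation; the variance-reduction methods the abstract mentions make this raw estimator usable in practice.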
Deep AutoRegressive Networks
, 2014
Abstract

Cited by 15 (4 self)
We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.
Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression. Bayesian Analysis
, 2013
Abstract

Cited by 15 (4 self)
We propose a general algorithm for approximating non-standard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice.
Doubly stochastic variational Bayes for nonconjugate inference
Abstract

Cited by 11 (1 self)
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
Markov Chain Monte Carlo and Variational Inference: Bridging the Gap. arXiv preprint arXiv:1410.6460
, 2014
Abstract

Cited by 7 (0 self)
Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.

1. MCMC and Variational Inference. Bayesian analysis gives us a very simple recipe for learning from data: given a set of unknown parameters or latent variables z that are of interest, we specify a prior distribution p(z) quantifying what we know about z before observing any data. Then we quantify how the observed data x relate to z by specifying a likelihood function p(x|z). Finally, we apply Bayes’ rule, p(z|x) = p(z)p(x|z) / ∫ p(z)p(x|z) dz, to give the posterior distribution, which quantifies what we know about z after seeing the data. Although this recipe is very simple conceptually, the implied computation is often intractable. We therefore need to resort to approximation methods in order to perform Bayesian inference in practice. The two most popular ap ...
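The Bayes' rule recipe above is exactly computable in one dimension by discretising z on a grid, which makes the role of the normalising integral concrete. A toy sketch, with the prior, likelihood, observation, and grid all chosen for illustration:

```python
import numpy as np

# Bayes' rule on a grid: prior p(z) = N(0, 1), likelihood x ~ N(z, 1),
# observed x = 1.0. The exact posterior is N(0.5, 0.5).
z = np.linspace(-6, 6, 2001)
dz = z[1] - z[0]
prior = np.exp(-0.5 * z**2)             # p(z), up to a constant
lik = np.exp(-0.5 * (1.0 - z) ** 2)     # p(x|z) at the observed x
post = prior * lik                      # unnormalised p(z)p(x|z)
post /= post.sum() * dz                 # divide by the evidence ∫ p(z)p(x|z) dz

mean = (z * post).sum() * dz
print(mean)  # posterior mean, approximately 0.5
```

The grid trick breaks down as the dimension of z grows, which is precisely why the approximation methods the abstract surveys, variational inference and MCMC, are needed in practice.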
Variance reduction for stochastic gradient optimization
 Advances in Neural Information Processing Systems 26
Abstract

Cited by 7 (0 self)
Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses the noisy gradient computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower convergence and worse performance. In this paper, we develop a general approach of using control variates for variance reduction in stochastic gradient optimization. Data statistics such as low-order moments (pre-computed or estimated online) are used to form the control variate. We demonstrate how to construct the control variate for two practical problems using stochastic gradient optimization. One is convex—the MAP estimation for logistic regression, and the other is non-convex—stochastic variational inference for latent Dirichlet allocation. On both problems, our approach shows faster convergence and better performance than the classical approach.
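A control variate subtracts a correlated quantity with a known expectation, leaving the mean unchanged while shrinking the variance. A self-contained toy sketch of the mechanism (the estimand, the Taylor-polynomial control variate, and the sample size are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)

f = np.exp(x)                 # estimate E[e^x] = e^0.5 for x ~ N(0, 1)
h = 1 + x + 0.5 * x**2        # control variate with known mean E[h] = 1.5
c = np.cov(f, h)[0, 1] / h.var()   # near-optimal coefficient Cov(f, h)/Var(h)
cv = f - c * (h - 1.5)        # same expectation as f, lower variance

print(f.var(), cv.var())      # the corrected estimator has far lower variance
```

The same idea applies termwise to a noisy gradient: any statistic correlated with the gradient whose expectation is computable (for instance, low-order data moments) can be subtracted off, so each stochastic step wanders less without introducing bias.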
Integrated nonfactorized variational inference
 Advances in Neural Information Processing Systems 26
, 2013
Abstract

Cited by 1 (0 self)
We present a nonfactorized variational method for full posterior inference in Bayesian hierarchical models, with the goal of capturing the posterior variable dependencies via efficient and possibly parallel computation. Our approach unifies the integrated nested Laplace approximation (INLA) under the variational framework. The proposed method is applicable in more challenging scenarios than typically assumed by INLA, such as Bayesian Lasso, which is characterized by the non-differentiability of the ℓ1 norm arising from independent Laplace priors. We derive an upper bound for the Kullback-Leibler divergence, which yields a fast closed-form solution via decoupled optimization. Our method is a reliable analytic alternative to Markov chain Monte Carlo (MCMC), and it results in a tighter evidence lower bound than that of the mean-field variational Bayes (VB) method.