Results 1–10 of 19
Stochastic backpropagation and approximate inference in deep generative models
2014
"... We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distri ..."
Abstract

Cited by 37 (4 self)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
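The "rules for gradient backpropagation through stochastic variables" mentioned in the abstract amount to reparameterizing the latent variable as z = mu + sigma * eps. A minimal NumPy sketch on a toy objective (my illustration, not the paper's models; `f` and `df` are hypothetical test functions):

```python
import numpy as np

def grad_estimate(mu, log_sigma, f, df, n_samples=10_000, seed=0):
    """Monte Carlo gradient of E_{z ~ N(mu, sigma^2)}[f(z)] w.r.t. (mu, log_sigma)
    via the reparameterization z = mu + sigma * eps, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    # Chain rule through the reparameterization:
    # dz/dmu = 1, dz/dlog_sigma = sigma * eps
    g_mu = np.mean(df(z))
    g_log_sigma = np.mean(df(z) * sigma * eps)
    return g_mu, g_log_sigma
```

For f(z) = z^2 we have E[f] = mu^2 + sigma^2, so the true gradients at mu = 1, sigma = 1 are 2 and 2; the estimator recovers them up to Monte Carlo noise.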
Firefly Monte Carlo: Exact MCMC with subsets of data
In 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014
"... Markov chain Monte Carlo (MCMC) is a popular and successful generalpurpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC ..."
Abstract

Cited by 13 (1 self)
Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC), an auxiliary variable MCMC algorithm that only queries the likelihoods of a potentially small subset of the data at each iteration yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms, and only requires a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.
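The auxiliary-variable construction can be sketched directly: each datum gets a "brightness" bit z_n resampled from Bernoulli(1 − B_n/L_n), and the augmented likelihood replaces dim data with their cheap bounds. A toy NumPy sketch (function names are mine; real FlyMC gains speed only when the product of the bounds B_n has a cheap closed form, which this toy does not exploit):

```python
import numpy as np

def resample_brightness(log_L, log_B, rng):
    """Gibbs step for FlyMC brightness variables: z_n ~ Bernoulli(1 - B_n / L_n).
    log_L, log_B: per-datum log-likelihoods and log lower bounds (B_n <= L_n)."""
    p_bright = 1.0 - np.exp(log_B - log_L)
    return rng.random(len(log_L)) < p_bright

def augmented_loglik(z, log_L, log_B):
    """log of prod_{z_n=1} (L_n - B_n) * prod_{z_n=0} B_n.
    Only the bright terms need exact likelihood evaluations."""
    bright = np.log1p(-np.exp(log_B[z] - log_L[z])) + log_L[z]
    return bright.sum() + log_B[~z].sum()
```

Summing the augmented likelihood over both values of z_n recovers the exact per-datum likelihood L_n, which is why the scheme targets the exact posterior.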
Doubly stochastic variational Bayes for non-conjugate inference
"... We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian nonconjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient informati ..."
Abstract

Cited by 11 (1 self)
We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.
Markov chain Monte Carlo and variational inference: Bridging the gap. arXiv preprint arXiv:1410.6460, 2014
"... Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we i ..."
Abstract

Cited by 7 (0 self)
Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results. 1. MCMC and Variational Inference. Bayesian analysis gives us a very simple recipe for learning from data: given a set of unknown parameters or latent variables z that are of interest, we specify a prior distribution p(z) quantifying what we know about z before observing any data. Then we quantify how the observed data x relates to z by specifying a likelihood function p(x|z). Finally, we apply Bayes' rule p(z|x) = p(z)p(x|z) / ∫ p(z)p(x|z) dz to give the posterior distribution, which quantifies what we know about z after seeing the data. Although this recipe is very simple conceptually, the implied computation is often intractable. We therefore need to resort to approximation methods in order to perform Bayesian inference in practice.
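The recipe in the excerpt above becomes concrete on a discrete grid, where the normalizing integral is just a sum. A toy coin-bias example of my own (not from the paper):

```python
import numpy as np

def grid_posterior(heads, tails, n_grid=1001):
    """Posterior over a coin bias z on a grid:
    p(z|x) = p(z) p(x|z) / sum_z p(z) p(x|z),
    with a uniform prior p(z) and Bernoulli likelihood p(x|z) = z^heads (1-z)^tails."""
    z = np.linspace(0.0, 1.0, n_grid)
    prior = np.ones(n_grid) / n_grid
    log_lik = (heads * np.log(np.clip(z, 1e-12, 1.0))
               + tails * np.log(np.clip(1.0 - z, 1e-12, 1.0)))
    unnorm = prior * np.exp(log_lik - log_lik.max())  # prior x likelihood
    return z, unnorm / unnorm.sum()                   # normalize = Bayes' rule
```

With 7 heads and 3 tails the posterior mean approaches the Beta(8, 4) mean 8/12; the intractability the abstract refers to is exactly that this normalizing sum becomes an intractable integral in high dimensions.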
Accurate and Conservative Estimates of MRF Log-likelihood using Reverse Annealing. arXiv:1412.8566, 2014
"... Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test logprobabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. Howe ..."
Abstract

Cited by 5 (1 self)
Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test log-probabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. However, AIS is prone to overestimate the log-likelihood with little indication that anything is wrong. We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. RAISE requires only the same MCMC transition operators as standard AIS. Experimental results indicate that RAISE agrees closely with AIS log-probability estimates for RBMs, DBMs, and DBNs, but typically errs on the side of underestimating, rather than overestimating, the log-likelihood.
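For context, the forward AIS estimator that RAISE reverses can be sketched in one dimension, annealing from a standard normal to an unnormalized wider Gaussian whose true log-normalizer ratio is log σ (a toy of mine, not the paper's RBM/DBM setting):

```python
import numpy as np

def ais_log_ratio(sigma=2.0, n_chains=500, n_steps=200, seed=0):
    """Forward AIS estimate of log(Z_K / Z_0) for the geometric path between
    f_0*(x) = exp(-x^2/2) and f_K*(x) = exp(-x^2 / (2 sigma^2)); true ratio = sigma."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_chains)  # exact samples from the base distribution
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Importance-weight increment: (b - b_prev) * (log f_K* - log f_0*)
        log_w += (b - b_prev) * (-x**2 / (2 * sigma**2) + x**2 / 2)
        # One Metropolis-Hastings step targeting f_b*(x) = exp(-0.5 * prec * x^2)
        prec = (1 - b) + b / sigma**2
        prop = x + rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < -0.5 * prec * (prop**2 - x**2)
        x = np.where(accept, prop, x)
    m = log_w.max()  # log-mean-exp of the importance weights
    return m + np.log(np.mean(np.exp(log_w - m)))
```

RAISE runs this annealing in reverse, starting from the data, which turns the estimator's tendency to overestimate into a conservative lower bound.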
The Survival Filter: Joint Survival Analysis with a Latent Time Series
"... Survival analysis is a core task in applied statistics, which models timetofailure or timetoevent data. In the clinical domain, for example, meaningful events are defined as the onset of different diseases for a given patient. Survival analysis is limited, however, for analyzing modern electron ..."
Abstract

Cited by 2 (0 self)
Survival analysis is a core task in applied statistics, which models time-to-failure or time-to-event data. In the clinical domain, for example, meaningful events are defined as the onset of different diseases for a given patient. Survival analysis is limited, however, for analyzing modern electronic health records. Patients often have a wide range of diseases, and there are complex interactions among the relative risks of different events. To this end, we develop the survival filter model, a time-series model for joint survival analysis that models multiple patients and multiple diseases. We develop a scalable variational inference algorithm and apply our method to a large data set of longitudinal patient records. The survival filter gives good predictive performance when compared to two baselines and identifies clinically meaningful patterns of disease interaction.
Deep Exponential Families
"... We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We per ..."
Abstract

Cited by 2 (1 self)
We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent “black box” variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.
Scalable joint modeling of longitudinal and point process data for disease trajectory prediction and improving management of chronic kidney disease. In UAI, to appear, 2016
"... Abstract A major goal in personalized medicine is the ability to provide individualized predictions about the future trajectory of a disease. Moreover, for many complex chronic diseases, patients simultaneously have additional comorbid conditions. Accurate determination of the risk of developing se ..."
Abstract

Cited by 1 (1 self)
A major goal in personalized medicine is the ability to provide individualized predictions about the future trajectory of a disease. Moreover, for many complex chronic diseases, patients simultaneously have additional comorbid conditions. Accurate determination of the risk of developing serious complications associated with a disease or its comorbidities may be more clinically useful than prediction of future disease trajectory in such cases. We propose a novel probabilistic generative model that can provide individualized predictions of future disease progression while jointly modeling the pattern of related recurrent adverse events. We fit our model using a scalable variational inference algorithm and apply our method to a large dataset of longitudinal electronic patient health records. Our model gives superior performance in terms of both prediction of future disease trajectories and of future serious events when compared to non-joint models. Our predictions are currently being utilized by our local accountable care organization during chart reviews of high-risk patients.
The Population Posterior and Bayesian Modeling on Streams
"... Abstract Many modern data analysis problems involve inferences from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which require conditioning on finite data. We develop population variational Bayes, a new approach for using Bayesian ..."
Abstract
Many modern data analysis problems involve inferences from streaming data. However, streaming data is not easily amenable to the standard probabilistic modeling approaches, which require conditioning on finite data. We develop population variational Bayes, a new approach for using Bayesian modeling to analyze streams of data. It approximates a new type of distribution, the population posterior, which combines the notion of a population distribution of the data with Bayesian inference in a probabilistic model. We develop the population posterior for latent Dirichlet allocation and Dirichlet process mixtures. We study our method with several large-scale data sets.
Local Expectation Gradients for Black Box Variational Inference
"... Abstract We introduce local expectation gradients which is a general purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variat ..."
Abstract
We introduce local expectation gradients, a general-purpose stochastic variational inference algorithm for constructing stochastic gradients by sampling from the variational distribution. This algorithm divides the problem of estimating the stochastic gradients over multiple variational parameters into smaller subtasks so that each subtask intelligently explores the most relevant part of the variational distribution. This is achieved by performing an exact expectation over the single random variable that most correlates with the variational parameter of interest, resulting in a Rao-Blackwellized estimate that has low variance. Our method works efficiently for both continuous and discrete random variables. Furthermore, the proposed algorithm has interesting similarities with Gibbs sampling but, unlike Gibbs sampling, can be trivially parallelized.
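For a fully factorized Bernoulli variational distribution, the "exact expectation over the single random variable" is just a two-term sum per coordinate, with the remaining coordinates sampled. A toy NumPy sketch (my own, not the paper's code):

```python
import numpy as np

def local_expectation_grad(f, pis, n_samples=2000, seed=0):
    """Local-expectation gradient of E_q[f(z)] w.r.t. each Bernoulli parameter pi_i,
    where q(z) = prod_i Bernoulli(z_i; pi_i).  For coordinate i we sum exactly over
    z_i in {0, 1} and Monte-Carlo over the remaining coordinates."""
    rng = np.random.default_rng(seed)
    pis = np.asarray(pis, dtype=float)
    d = len(pis)
    grads = np.zeros(d)
    for _ in range(n_samples):
        z = (rng.random(d) < pis).astype(float)
        for i in range(d):
            z1, z0 = z.copy(), z.copy()
            z1[i], z0[i] = 1.0, 0.0
            # d/dpi_i [ pi_i f(z_i=1, .) + (1 - pi_i) f(z_i=0, .) ] = f(1, .) - f(0, .)
            grads[i] += f(z1) - f(z0)
    return grads / n_samples
```

For an additive objective such as f(z) = z_0 + 2 z_1 the per-coordinate differences are constant, so the estimator is exact with zero variance, illustrating the Rao-Blackwellization argument in the abstract.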