Results 1-10 of 131
Stochastic backpropagation and approximate inference in deep generative models
2014
Cited by 37 (4 self)
Abstract: We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
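The core trick the abstract calls "stochastic backpropagation" can be illustrated in a few lines of NumPy (a toy sketch, not the authors' implementation): writing a Gaussian sample as z = mu + sigma * eps with eps ~ N(0, 1) lets gradients with respect to mu and sigma flow through the sample.

```python
import numpy as np

def reparam_gradients(mu, sigma, f_grad, n_samples=100_000, seed=0):
    """Monte Carlo gradients of E_{z~N(mu,sigma^2)}[f(z)] via z = mu + sigma*eps.
    f_grad gives df/dz at each sample; the chain rule does the rest."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps
    g = f_grad(z)
    d_mu = g.mean()              # dz/dmu = 1
    d_sigma = (g * eps).mean()   # dz/dsigma = eps
    return d_mu, d_sigma

# Toy check: f(z) = z^2 has E[f] = mu^2 + sigma^2, so the true
# gradients are (2*mu, 2*sigma) = (3.0, 1.0) at mu=1.5, sigma=0.5.
d_mu, d_sigma = reparam_gradients(mu=1.5, sigma=0.5, f_grad=lambda z: 2.0 * z)
```

In the paper's setting f would be the (log-)joint of the generative model, with the recognition model producing mu and sigma; the toy quadratic here only serves to make the estimate checkable against a closed form.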
Streaming variational Bayes
In Neural Information Processing Systems (NIPS), 2013
Cited by 31 (0 self)
Abstract: We present SDA-Bayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a user-specified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two large-scale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data—a case where SVI may be applied—and in the streaming setting, where SVI does not apply.
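For a conjugate model the streaming primitive is exact, which makes the framework's key property easy to see in a toy Beta-Bernoulli sketch (illustrative only, not the SDA-Bayes code): each batch update just adds sufficient statistics to the posterior, so updates can be folded in incrementally, in any order, or asynchronously without changing the result.

```python
def batch_update(alpha, beta, xs):
    """Exact conjugate update of a Beta(alpha, beta) posterior
    given a batch of 0/1 observations."""
    heads = sum(xs)
    return alpha + heads, beta + len(xs) - heads

# Stream the batches in as they arrive: 6 heads and 3 tails in total,
# so a Beta(1, 1) prior ends at Beta(7, 4) regardless of batch order.
a, b = 1.0, 1.0
for xs in [[1, 1, 0], [1, 0], [1, 1, 1, 0]]:
    a, b = batch_update(a, b, xs)
```

With an approximate primitive such as VB the per-batch update is no longer exact, but the same accumulate-and-combine structure is what the framework exploits.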
Black box variational inference
In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, 2014
Cited by 19 (6 self)
Abstract: Variational inference has become a widely used method to approximate posteriors in complex latent variable models. However, deriving a variational inference algorithm generally requires significant model-specific analysis. These efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult model-based derivations. We evaluate our method against the corresponding black box sampling-based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data.
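The "noisy gradient from Monte Carlo samples" is a score-function (REINFORCE-style) estimator: it needs only samples from q and the gradient of log q, never the model's internals. A toy sketch (hypothetical names, not the paper's code) for the mean of a Gaussian q, with a simple mean-subtraction baseline as one of the variance-reduction devices the abstract mentions:

```python
import numpy as np

def score_function_grad(mu, sigma, f, n_samples=200_000, seed=0):
    """Noisy gradient of E_{z~N(mu,sigma^2)}[f(z)] w.r.t. mu, using only
    samples z ~ q and the score d/dmu log q(z) = (z - mu) / sigma^2.
    In BBVI, f(z) would be log p(x, z) - log q(z); subtracting the
    sample mean of f is a crude baseline that reduces variance."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, sigma, n_samples)
    score = (z - mu) / sigma ** 2
    fz = f(z)
    return np.mean(score * (fz - fz.mean()))

# Toy check: for f(z) = z^2, d/dmu E[f] = 2*mu = 2.0 at mu=1, sigma=1.
grad = score_function_grad(mu=1.0, sigma=1.0, f=lambda z: z ** 2)
```

Because f is treated as a black box, swapping in a different model changes only the function handed to the estimator, which is the point of the method.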
Scalable Recommendation with Poisson Factorization
2013
Cited by 17 (4 self)
Abstract: We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user’s limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods.
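The "only the observed entries" property follows from the form of the Poisson log-likelihood: the -rate penalty summed over every user/item pair factorises into a product of column sums of the factors, so the zeros never need per-entry work. A small NumPy check (illustrative, with made-up dimensions and factor values):

```python
import math
import numpy as np

def pf_loglik(rows, cols, vals, theta, beta):
    """Poisson log-likelihood sum_ui [y*log(rate_ui) - rate_ui - log(y!)].
    Per-entry work is needed only at the nonzeros (rows, cols, vals):
    the sum of ALL U*I rates collapses to theta.sum(0) @ beta.sum(0),
    costing O(K*(U + I)) instead of O(K*U*I)."""
    rates = np.einsum('ek,ek->e', theta[rows], beta[cols])  # rates at nonzeros
    data_term = np.sum(vals * np.log(rates))
    data_term -= sum(math.lgamma(v + 1) for v in vals)
    total_rate = theta.sum(axis=0) @ beta.sum(axis=0)       # all rates at once
    return data_term - total_rate

rng = np.random.default_rng(0)
theta = rng.gamma(1.0, 1.0, size=(4, 2))   # toy user factors (U=4, K=2)
beta = rng.gamma(1.0, 1.0, size=(5, 2))    # toy item factors (I=5)
rows, cols = np.array([0, 2]), np.array([1, 3])
vals = np.array([3.0, 1.0])                # the only nonzero counts
ll = pf_loglik(rows, cols, vals, theta, beta)
```

The result matches the dense computation over the full 4x5 matrix exactly, which is why the inference algorithm can iterate over observed entries alone.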
Distributed variational inference in sparse Gaussian process regression and latent variable models
In Cortes and Lawrence, 2014
Cited by 15 (1 self)
Abstract: Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and nonlinear dimensionality reduction, and offer desirable properties such as uncertainty estimates, robustness to overfitting, and principled ways for tuning hyperparameters. However, the scalability of these models to big datasets remains an active topic of research. We introduce a novel reparametrisation of variational inference for sparse GP regression and latent variable models that allows for an efficient distributed algorithm. This is done by exploiting the decoupling of the data given the inducing points to reformulate the evidence lower bound in a MapReduce setting. We show that the inference scales well with data and computational resources, while preserving a balanced distribution of the load among the nodes. We further demonstrate the utility in scaling Gaussian processes to big data. We show that GP performance improves with increasing amounts of data in regression (on flight data with 2 million records) and latent variable modelling (on MNIST). The results show that GPs perform better than many common models often used for big data.
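The decoupling the abstract exploits means the data-dependent terms of the bound are sums over individual data points given the inducing inputs, so each node can compute its contribution from its own shard and a reduce step just adds them. A simplified sketch (hypothetical RBF kernel and statistics, not the paper's exact bound):

```python
import numpy as np

def local_stats(X_shard, y_shard, Z, lengthscale=1.0):
    """Per-shard statistics of the kind the sparse-GP bound needs: given
    inducing inputs Z, points decouple, so a node only touches its shard."""
    d = X_shard[:, None, :] - Z[None, :, :]
    Kxz = np.exp(-0.5 * np.sum(d ** 2, axis=-1) / lengthscale ** 2)  # N x M
    return Kxz.T @ Kxz, Kxz.T @ y_shard    # M x M matrix, M-vector

rng = np.random.default_rng(1)
X, y = rng.normal(size=(12, 2)), rng.normal(size=12)
Z = rng.normal(size=(3, 2))                # M = 3 inducing inputs

# "Map" each shard (here 3 shards of 4 points), then "reduce" by summing;
# the totals equal the single-machine computation on the full data set.
parts = [local_stats(X[i:i + 4], y[i:i + 4], Z) for i in range(0, 12, 4)]
A = sum(p[0] for p in parts)
b = sum(p[1] for p in parts)
```

Balanced load follows directly from this structure: each node's cost is proportional to its shard size, independent of the other shards.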
Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation
Cited by 14 (2 self)
Abstract: There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very large-scale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state-of-the-art method. In experiments on large-scale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Human-subject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
Scaling distributed machine learning with the parameter server
In USENIX OSDI, 2014
Cited by 14 (0 self)
Abstract: We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
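The programming model the abstract describes boils down to a push/pull interface over a shared key-value store of parameters. A toy in-process sketch (hypothetical names; a real deployment shards keys across many server nodes and communicates over the network):

```python
from collections import defaultdict

class ParameterServer:
    """Minimal single-process sketch of the push/pull interface:
    workers push (possibly sparse) gradient updates keyed by parameter
    id, and pull current values for the keys they need."""
    def __init__(self, lr=0.5):
        self.lr = lr
        self.params = defaultdict(float)

    def push(self, grads):
        # Apply an SGD step server-side for each pushed key.
        for k, g in grads.items():
            self.params[k] -= self.lr * g

    def pull(self, keys):
        return {k: self.params[k] for k in keys}

ps = ParameterServer(lr=0.5)
ps.push({'w0': 2.0, 'w3': -4.0})   # worker A: sparse update
ps.push({'w0': 2.0})               # worker B: asynchronous, overlapping key
state = ps.pull(['w0', 'w3'])
```

Because pushes touch only the keys they carry, sparse models pay communication cost proportional to the nonzero gradient entries, one of the framework's scalability arguments.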
Firefly Monte Carlo: Exact MCMC with subsets of data
In 30th Conference on Uncertainty in Artificial Intelligence (UAI), 2014
Cited by 13 (1 self)
Abstract: Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC), an auxiliary variable MCMC algorithm that only queries the likelihoods of a potentially small subset of the data at each iteration yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms, and only requires a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.
One-class Collaborative Filtering with Random Graphs
Cited by 12 (6 self)
Abstract: The bane of one-class collaborative filtering is interpreting and modelling the latent signal from the missing class. In this paper we present a novel Bayesian generative model for implicit collaborative filtering. It forms a core component of the Xbox Live architecture, and unlike previous approaches, delineates the odds of a user disliking an item from simply being unaware of it. The latent signal is treated as an unobserved random graph connecting users with items they might have encountered. We demonstrate how large-scale distributed learning can be achieved through a combination of stochastic gradient descent and mean field variational inference over random graph samples. A fine-grained comparison is done against a state-of-the-art baseline on real-world data.
Doubly stochastic variational Bayes for non-conjugate inference
Cited by 11 (1 self)
Abstract: We propose a simple and effective variational inference algorithm based on stochastic optimisation that can be widely applied for Bayesian non-conjugate inference in continuous parameter spaces. This algorithm is based on stochastic approximation and allows for efficient use of gradient information from the model joint density. We demonstrate these properties using illustrative examples as well as in challenging and diverse Bayesian inference problems such as variable selection in logistic regression and fully Bayesian inference over kernel hyperparameters in Gaussian process regression.