Results 1  10
of
94
Streaming variational bayes
 In Neural Information Processing Systems (NIPS
, 2013
"... We present SDABayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a userspecified approximation batch primitive. We demonstrate the usefulness of our framework, with va ..."
Abstract

Cited by 26 (0 self)
 Add to MetaCart
We present SDABayes, a framework for (S)treaming, (D)istributed, (A)synchronous computation of a Bayesian posterior. The framework makes streaming updates to the estimated posterior according to a userspecified approximation batch primitive. We demonstrate the usefulness of our framework, with variational Bayes (VB) as the primitive, by fitting the latent Dirichlet allocation model to two largescale document collections. We demonstrate the advantages of our algorithm over stochastic variational inference (SVI) by comparing the two after a single pass through a known amount of data—a case where SVI may be applied—and in the streaming setting, where SVI does not apply. 1
Stochastic backpropagation and approximate inference in deep generative models
, 2014
"... We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distri ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
(Show Context)
We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation – rules for gradient backpropagation through stochastic variables – and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several realworld data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for highdimensional data visualisation. 1.
Scalable Recommendation with Poisson Factorization
, 1311
"... We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., thro ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
(Show Context)
We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user’s limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large realworld user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms stateoftheart matrix factorization methods. 1.
Black box variational inference
 In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics
, 2014
"... Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant modelspecific analysis. These efforts can hinder and deter us from quickly developing and explorin ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
Variational inference has become a widely used method to approximate posteriors in complex latent variables models. However, deriving a variational inference algorithm generally requires significant modelspecific analysis. These efforts can hinder and deter us from quickly developing and exploring a variety of models for a problem at hand. In this paper, we present a “black box ” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. Our method is based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution. We develop a number of methods to reduce the variance of the gradient, always maintaining the criterion that we want to avoid difficult modelbased derivations. We evaluate our method against the corresponding black box sampling based methods. We find that our method reaches better predictive likelihoods much faster than sampling methods. Finally, we demonstrate that Black Box Variational Inference lets us easily explore a wide space of models by quickly constructing and evaluating several models of longitudinal healthcare data. 1
Oneclass Collaborative Filtering with Random Graphs
"... The bane of oneclass collaborative filtering is interpreting and modelling the latent signal from the missing class. In this paper we present a novel Bayesian generative model for implicit collaborative filtering. It forms a core component of the Xbox Live architecture, and unlike previous approach ..."
Abstract

Cited by 12 (6 self)
 Add to MetaCart
(Show Context)
The bane of oneclass collaborative filtering is interpreting and modelling the latent signal from the missing class. In this paper we present a novel Bayesian generative model for implicit collaborative filtering. It forms a core component of the Xbox Live architecture, and unlike previous approaches, delineates the odds of a user disliking an item from simply being unaware of it. The latent signal is treated as an unobserved random graph connecting users with items they might have encountered. We demonstrate how largescale distributed learning can be achieved through a combination of stochastic gradient descent and mean field variational inference over random graph samples. A finegrained comparison is done against a state of the art baseline on real world data.
Stochastic Collapsed Variational Bayesian Inference for Latent Dirichlet Allocation
"... There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it fea ..."
Abstract

Cited by 11 (2 self)
 Add to MetaCart
(Show Context)
There has been an explosion in the amount of digital text information available in recent years, leading to challenges of scale for traditional inference algorithms for topic models. Recent advances in stochastic variational inference algorithms for latent Dirichlet allocation (LDA) have made it feasible to learn topic models on very largescale corpora, but these methods do not currently take full advantage of the collapsed representation of the model. We propose a stochastic algorithm for collapsed variational Bayesian inference for LDA, which is simpler and more efficient than the state of the art method. In experiments on largescale text corpora, the algorithm was found to converge faster and often to a better solution than previous methods. Humansubject experiments also demonstrated that the method can learn coherent topics in seconds on small corpora, facilitating the use of topic models in interactive document analysis software.
Firefly Monte Carlo: Exact MCMC with subsets of data
 In 30th Conference on Uncertainty in Artificial Intelligence (UAI
, 2014
"... Markov chain Monte Carlo (MCMC) is a popular and successful generalpurpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC ..."
Abstract

Cited by 10 (1 self)
 Add to MetaCart
Markov chain Monte Carlo (MCMC) is a popular and successful generalpurpose tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC) an auxiliary variable MCMC algorithm that only queries the likelihoods of a potentially small subset of the data at each iteration yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms, and only requires a lower bound on the perdatum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible. 1
Reducing the Sampling Complexity of Topic Models
"... Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linear ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics kd in the document. For large document collections and in structured hierarchical models kd k. This yields an order of magnitude speedup. Our method applies to a wide variety of statistical models such as PDP [16, 4] and HDP [19]. At its core is the idea that dense, slowly changing distributions can be approximated efficiently by the combination of a MetropolisHastings step, use of sparsity, and amortized constant time sampling via Walker’s alias method.
Stochastic gradient hamiltonian monte carlo
, 2014
"... Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a MetropolisHastings framework, enabling more efficient exploration of the state space than standard randomwalk proposals. The popularity of such methods has gro ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
(Show Context)
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a MetropolisHastings framework, enabling more efficient exploration of the state space than standard randomwalk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system—such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses secondorder Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization. 1.
Nested hierarchical dirichlet processes
, 2012
"... Abstract—We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP generalizes the nested Chinese restaurant process (nCRP) to allow each word to follow its own path to a topic node according to a perdocument distribution over the paths on a shared tree. Th ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
(Show Context)
Abstract—We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP generalizes the nested Chinese restaurant process (nCRP) to allow each word to follow its own path to a topic node according to a perdocument distribution over the paths on a shared tree. This alleviates the rigid, singlepath formulation assumed by the nCRP, allowing documents to easily express complex thematic borrowings. We derive a stochastic variational inference algorithm for the model, which enables efficient inference for massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 2.7 million documents fromWikipedia. Index Terms—Bayesian nonparametrics, Dirichlet process, topic modeling, stochastic optimization