Propagation algorithms for variational Bayesian learning. (2001)

by Z Ghahramani, M Beal
Citing documents (results 1 - 10 of 139):

Learning in graphical models

by Michael I. Jordan - STATISTICAL SCIENCE, 2004
"... Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for ..."
Abstract - Cited by 806 (10 self) - Add to MetaCart
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing.

Dynamic Bayesian Networks: Representation, Inference and Learning

by Kevin Patrick Murphy, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have bee ..."
Abstract - Cited by 770 (3 self) - Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data. In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
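To make the "factored state space" point concrete, here is a minimal Python sketch (a hypothetical toy example, not one of the thesis's algorithms): a DBN whose hidden state consists of two independent binary factors is equivalent to a flat HMM whose single state variable takes 2 x 2 = 4 values, and filtering can be run on either representation.

import numpy as np

# Two binary hidden factors, each with its own 2x2 transition matrix (toy numbers).
A1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])   # P(s1_t | s1_{t-1})
A2 = np.array([[0.7, 0.3],
               [0.4, 0.6]])   # P(s2_t | s2_{t-1})

# Equivalent flat HMM: the joint state (s1, s2) has 4 values and its
# transition matrix is the Kronecker product of the factor transitions.
A_joint = np.kron(A1, A2)     # 4 x 4, rows sum to 1

# Observation model P(y_t | s1_t, s2_t) over 2 possible observations (toy numbers).
B = np.array([[0.9, 0.1],     # joint state (0, 0)
              [0.6, 0.4],     # joint state (0, 1)
              [0.3, 0.7],     # joint state (1, 0)
              [0.1, 0.9]])    # joint state (1, 1)

def filter_flat(obs, prior):
    """Standard HMM forward (filtering) pass on the flattened 4-state chain."""
    belief = prior.copy()
    for y in obs:
        belief = (belief @ A_joint) * B[:, y]   # predict, then condition on y
        belief /= belief.sum()
    return belief

prior = np.full(4, 0.25)                 # uniform over the 4 joint states
print(filter_flat([0, 1, 1, 0], prior))  # filtered posterior over (s1, s2)

With k binary factors the flat representation needs 2^k states, which is exactly the blow-up that factored DBN inference algorithms try to avoid.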

The Bayes Net Toolbox for MATLAB

by Kevin P. Murphy - Computing Science and Statistics, 2001
"... The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the ..."
Abstract - Cited by 250 (1 self) - Add to MetaCart
The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the web page has received over 28,000 hits since May 2000. In this paper, we discuss a broad spectrum of issues related to graphical models (directed and undirected), and describe, at a high-level, how BNT was designed to cope with them all. We also compare BNT to other software packages for graphical models, and to the nascent OpenBayes effort.
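BNT itself is a MATLAB toolbox; as a language-neutral illustration of the kind of query it answers, here is a small Python sketch (hypothetical numbers and names, not BNT's API) that performs exact inference by enumeration in a three-node directed model Cloudy -> Rain -> WetGrass.

import itertools

# Conditional probability tables (hypothetical numbers).
p_cloudy = {1: 0.5, 0: 0.5}
p_rain_given_cloudy = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # P(rain | cloudy)
p_wet_given_rain    = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # P(wet | rain)

def joint(c, r, w):
    """P(Cloudy=c, Rain=r, WetGrass=w) from the chain factorization."""
    return p_cloudy[c] * p_rain_given_cloudy[c][r] * p_wet_given_rain[r][w]

# Exact query by enumeration: P(Rain = 1 | WetGrass = 1).
num   = sum(joint(c, 1, 1) for c in (0, 1))
denom = sum(joint(c, r, 1) for c, r in itertools.product((0, 1), repeat=2))
print(num / denom)   # posterior probability of rain given wet grass

Enumeration is exponential in the number of variables; the point of a package like BNT is to provide exact algorithms (e.g. the junction tree) and approximate ones that scale beyond toy networks.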

Citation Context

...e, which give tighter lower (and upper) bounds. See [JGJS98] for a tutorial. Recently this technique has been extended to do approximate Bayesian inference, using a technique called Variational Bayes [GB00]. Belief propagation (BP). This entails applying the message passing algorithm to the original graph, even if it has loops (undirected cycles). Originally this was believed to be unsound, but the ou...

Variational inference for Dirichlet process mixtures

by David M. Blei, Michael I. Jordan - Bayesian Analysis, 2005
"... Abstract. Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte-Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis prob ..."
Abstract - Cited by 244 (27 self) - Add to MetaCart
Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Markov chain Monte Carlo (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.

Citation Context

...aad, 2001; Wainwright and Jordan, 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias, 2000; Ghahramani and Beal, 2001; Blei et al., 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gau...

Multi-task learning for classification with dirichlet process priors

by Ya Xue, Xuejun Liao, Lawrence Carin, Balaji Krishnapuram - Journal of Machine Learning Research, 2007
"... Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asy ..."
Abstract - Cited by 140 (12 self) - Add to MetaCart
Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as single-task learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.

Citation Context

... convergence of the DP Gibbs sampler impede its application in many practical situations. In this work, we employ a computationally efficient approach, mean-field variational Bayesian (VB) inference (Ghahramani and Beal, 2001). The VB method approximates the true posterior p(Z | {D_m}_{m=1}^M, Φ) by a variational distribution q(Z). It converts computation of posteriors into an optimization problem of minimizing the Kullback-L...
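For reference, the identity behind this kind of mean-field approximation (standard notation, not specific to the cited formulation) is

\log p(\mathcal{D}) \;=\; \mathbb{E}_{q(Z)}\!\left[\log \frac{p(\mathcal{D}, Z)}{q(Z)}\right] \;+\; \mathrm{KL}\!\left(q(Z) \,\|\, p(Z \mid \mathcal{D})\right),

so minimizing the KL divergence over a tractable (e.g. fully factorized) family of q is equivalent to maximizing the first term, a lower bound on the marginal likelihood.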

Stochastic Variational Inference

by Matt Hoffman, David M. Blei, Chong Wang, John Paisley - JOURNAL OF MACHINE LEARNING RESEARCH (2013, IN PRESS), 2013
"... We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet proce ..."
Abstract - Cited by 131 (27 self) - Add to MetaCart
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
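The core update is easiest to see on a toy conjugate model; below is a minimal Python sketch (a hypothetical Gaussian-mean example, not the paper's topic-model algorithms). At each step one data point is sampled, an "as if the whole data set looked like this point" natural parameter is formed, and the global variational parameter is moved toward it with a decaying step size.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: N observations from a Gaussian with unknown mean, known variance.
N, sigma2 = 10_000, 4.0
x = rng.normal(1.5, np.sqrt(sigma2), size=N)

# Conjugate prior N(mu0, tau0^2) on the mean, in natural parameters
# eta = (mean / var, -1 / (2 * var)).
mu0, tau02 = 0.0, 10.0
eta_prior = np.array([mu0 / tau02, -0.5 / tau02])

# Exact posterior natural parameters (for comparison).
eta_exact = eta_prior + np.array([x.sum() / sigma2, -0.5 * N / sigma2])

# Stochastic variational inference: noisy natural-gradient steps on the
# global variational parameter lam.
lam = eta_prior.copy()
for t in range(1, 5001):
    xi = x[rng.integers(N)]                              # sample one data point
    lam_hat = eta_prior + N * np.array([xi / sigma2, -0.5 / sigma2])
    rho = (t + 10.0) ** -0.7                             # Robbins-Monro step size
    lam = (1.0 - rho) * lam + rho * lam_hat

def to_mean_var(eta):
    var = -0.5 / eta[1]
    return eta[0] * var, var

print("exact posterior:", to_mean_var(eta_exact))
print("SVI estimate:   ", to_mean_var(lam))

The same convex-combination update, applied to the global parameters of a conjugate exponential-family model with local latent variables, is what lets the method stream through corpora of millions of documents.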

Citation Context

...rs began to understand the potential for variational inference in more general settings and developed generic algorithms for conjugate exponential-family models (Attias, 1999, 2000; Wiegerinck, 2000; Ghahramani and Beal, 2001; Xing et al., 2003). These innovations led to automated variational inference, allowing a practitioner to write down a model and immediately use variational inference to estimate its posterior (Bisho...

Bayesian inference and optimal design in the sparse linear model

by Matthias W. Seeger, Martin Wainwright - Workshop on Artificial Intelligence and Statistics
"... The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of ..."
Abstract - Cited by 111 (13 self) - Add to MetaCart
The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems. We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks. Part of this work appeared in Seeger et al. (2007b). The gene network identification application appears in Steinke et al. (2007).
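For orientation, the model in question is the standard sparse linear model (a sketch in generic notation, not necessarily the paper's exact parameterization):

y = Xw + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad p(w_j) \propto \exp(-\tau |w_j|).

The usual MAP estimate, \hat{w}_{\mathrm{MAP}} = \arg\min_w \frac{1}{2\sigma^2}\|y - Xw\|_2^2 + \tau\|w\|_1, is an l1-penalized (lasso-type) point estimate with no uncertainty attached; optimal design instead needs (an approximation to) the full posterior p(w | X, y), which is what the expectation propagation treatment supplies here.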

Citation Context

...ich conclusions can be drawn, is subject to future work. A comparison between approximate inference techniques would be incomplete without including variational mean field Bayes (VMFB) (Attias, 2000; Ghahramani and Beal, 2001), perhaps the best-known variational technique at the moment. It is also simply known as “variational Bayes” (see www.variational-bayes.org), although we understand this term as encompassing other...

Variational Extensions to EM and Multinomial PCA

by Wray Buntine - In ECML 2002, 2002
"... Several authors in recent years have proposed discrete analogues to principle component analysis intended to handle discrete or positive only data, for instance suited to analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and laten ..."
Abstract - Cited by 95 (14 self) - Add to MetaCart
Several authors in recent years have proposed discrete analogues to principal component analysis intended to handle discrete or positive-only data, for instance suited to analyzing sets of documents. Methods include non-negative matrix factorization, probabilistic latent semantic analysis, and latent Dirichlet allocation. This paper begins with a review of the basic theory of the variational extension to the expectation maximization algorithm, and then presents discrete component finding algorithms in that light. Experiments are conducted on both bigram word data and document bag-of-words data to expose some of the subtleties of this new class of algorithms.
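As a concrete instance of one of the methods listed, here is a minimal Python sketch of non-negative matrix factorization with the classical Lee-Seung multiplicative updates (hypothetical data, not this paper's own variational algorithm):

import numpy as np

def nmf(V, k, iters=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (docs x words) as W @ H with W, H >= 0,
    using Lee-Seung multiplicative updates for squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Tiny bag-of-words style example (hypothetical counts).
V = np.array([[3., 1., 0., 0.],
              [2., 2., 0., 1.],
              [0., 0., 4., 2.],
              [0., 1., 3., 3.]])
W, H = nmf(V, k=2)
print(np.round(W @ H, 2))   # reconstruction of V from 2 "components"

The probabilistic analogues discussed in the paper (pLSA, LDA, multinomial PCA) replace this least-squares objective with a likelihood over counts and use variational EM rather than multiplicative updates.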

Citation Context

...ed. The theory of exponential family distributions and Kullback-Leibler approximations is briefly reviewed here. The formulations of Ghahramani and Beal [10] and Buntine [11] are roughly followed. A notation convention used here is that indices i, j, k, l in sums and products always range over 1 to I, J, K, L respectively. i usually denotes a sample index...

An Unsupervised Ensemble Learning Method for Nonlinear Dynamic State-Space Models

by Harri Valpola, Juha Karhunen - Neural Computation, 2001
"... A Bayesian ensemble learning method is introduced for unsupervised extraction of dynamic processes from noisy data. The data are assumed to be generated by an unknown nonlinear mapping from unknown factors. The dynamics of the factors are modeled using a nonlinear statespace model. The nonlinear map ..."
Abstract - Cited by 91 (32 self) - Add to MetaCart
A Bayesian ensemble learning method is introduced for unsupervised extraction of dynamic processes from noisy data. The data are assumed to be generated by an unknown nonlinear mapping from unknown factors. The dynamics of the factors are modeled using a nonlinear state-space model. The nonlinear mappings in the model are represented using multilayer perceptron networks. The proposed method is computationally demanding, but it allows the use of higher dimensional nonlinear latent variable models than other existing approaches. Experiments with chaotic data show that the new method is able to blindly estimate the factors and the dynamic process which have generated the data. It clearly outperforms currently available nonlinear prediction techniques in this very difficult test problem.

Citation Context

...each other. It is also possible that the iterations become unstable. For linear Gaussian models it is possible to derive algorithms similar to Kalman smoothing using ensemble learning as was done in (Ghahramani and Beal, 2001). Our algorithm is designed for learning nonlinear models, and only propagates information one step forward and backward in time in the forward and backward phase. This makes learning stable but does...
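For reference on the linear-Gaussian special case mentioned above, here is a minimal Python sketch of the standard scalar Kalman filter recursions (generic textbook form with hypothetical parameters, not the cited smoothing derivation):

import numpy as np

# Linear-Gaussian state-space model (scalar, hypothetical parameters):
#   s_t = a * s_{t-1} + process noise (variance q),  y_t = c * s_t + obs noise (variance r)
a, c, q, r = 0.95, 1.0, 0.1, 0.5

def kalman_filter(ys, m0=0.0, p0=1.0):
    """Forward pass: returns the filtered means E[s_t | y_1..y_t]."""
    m, p, means = m0, p0, []
    for y in ys:
        # Predict one step ahead.
        m_pred, p_pred = a * m, a * a * p + q
        # Update with the new observation.
        k = p_pred * c / (c * c * p_pred + r)        # Kalman gain
        m = m_pred + k * (y - c * m_pred)
        p = (1.0 - k * c) * p_pred
        means.append(m)
    return np.array(means)

rng = np.random.default_rng(1)
s = np.zeros(50)
for t in range(1, 50):                               # simulate the latent trajectory
    s[t] = a * s[t - 1] + rng.normal(0, np.sqrt(q))
ys = c * s + rng.normal(0, np.sqrt(r), 50)           # noisy observations
print(kalman_filter(ys)[-5:])                        # last few filtered means

In the linear-Gaussian case these recursions (plus the backward smoothing pass) are exact; the cited nonlinear state-space method has to approximate them, which is why it restricts how far information is propagated per iteration.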

A Generalized Mean Field Algorithm for Variational Inference in Exponential Families

by Eric P. Xing, Michael I. Jordan, Stuart Russell, 2003
"... We present a class of generalized mean field (GMF) algorithms for approximate inference in exponential family graphical models which is analogous to the generalized belief propagation (GBP) or cluster variational methods. While those methods are based on... ..."
Abstract - Cited by 82 (18 self) - Add to MetaCart
We present a class of generalized mean field (GMF) algorithms for approximate inference in exponential family graphical models which is analogous to the generalized belief propagation (GBP) or cluster variational methods. While those methods are based on...