Hierarchical Dirichlet processes
Journal of the American Statistical Association, 2004
Cited by 926 (78 self)
We consider problems involving groups of data, where each observation within a group is a draw from a mixture model, and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of
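The atom-sharing argument in the abstract above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' construction or inference procedure: it uses a truncated stick-breaking approximation, a standard normal base distribution H, and hypothetical names (`stick_breaking`, `sample_hdp`, `gamma`, `alpha`, `truncation`) chosen here for illustration.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights; the last stick takes the remainder."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)
    return weights


def sample_hdp(num_groups, gamma, alpha, truncation, seed=0):
    """Draw a truncated G0 ~ DP(gamma, H) and child measures G_j ~ DP(alpha, G0)."""
    rng = random.Random(seed)
    # Top level: atoms come from H (assumed standard normal for illustration).
    base_atoms = [rng.gauss(0.0, 1.0) for _ in range(truncation)]
    base_weights = stick_breaking(gamma, truncation, rng)
    groups = []
    for _ in range(num_groups):
        # Child level: atoms are drawn from the discrete G0, so they repeat
        # across groups; only the weights differ from group to group.
        weights = stick_breaking(alpha, truncation, rng)
        atoms = rng.choices(base_atoms, weights=base_weights, k=truncation)
        groups.append((atoms, weights))
    return base_atoms, groups
```

Because every child atom is a draw from the discrete top-level measure, each group's atom set is a subset of the top-level atoms, which is the sharing property the abstract describes. With a continuous base measure at the child level, the groups would almost surely share no atoms.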
Infinite Latent Feature Models and the Indian Buffet Process
2005
Cited by 272 (45 self)
We define a probability distribution over equivalence classes of binary matrices with a finite number of rows and an unbounded number of columns. This distribution
Variational inference for Dirichlet process mixtures
Bayesian Analysis, 2005
Cited by 240 (27 self)
Dirichlet process (DP) mixture models are the cornerstone of nonparametric Bayesian statistics, and the development of Monte Carlo Markov chain (MCMC) sampling methods for DP mixtures has enabled the application of nonparametric Bayesian methods to a variety of practical data analysis problems. However, MCMC sampling can be prohibitively slow, and it is important to explore alternatives. One class of alternatives is provided by variational methods, a class of deterministic algorithms that convert inference problems into optimization problems (Opper and Saad 2001; Wainwright and Jordan 2003). Thus far, variational methods have mainly been explored in the parametric setting, in particular within the formalism of the exponential family (Attias 2000; Ghahramani and Beal 2001; Blei et al. 2003). In this paper, we present a variational inference algorithm for DP mixtures. We present experiments that compare the algorithm to Gibbs sampling algorithms for DP mixtures of Gaussians and present an application to a large-scale image analysis problem.
Computational Discovery of Gene Modules, Regulatory Networks and Expression Programs
2007
Cited by 232 (17 self)
High-throughput molecular data are revolutionizing biology by providing massive amounts of information about gene expression and regulation. Such information is applicable both to furthering our understanding of fundamental biology and to developing new diagnostic and treatment approaches for diseases. However, novel mathematical methods are needed for extracting biological knowledge from high-dimensional, complex and noisy data sources. In this thesis, I develop and apply three novel computational approaches for this task. The common theme of these approaches is that they seek to discover meaningful groups of genes, which confer robustness to noise and compress complex information into interpretable models. I first present the GRAM algorithm, which fuses information from genome-wide expression and in vivo transcription factor-DNA binding data to discover regulatory networks of
A hierarchical Bayesian language model based on Pitman–Yor processes
In Coling/ACL, 2006
Cited by 143 (10 self)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
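The power-law behaviour mentioned in the abstract above is often illustrated with the Chinese restaurant seating scheme for the Pitman-Yor process. The sketch below is an illustration only, not the paper's language model: the function name `pitman_yor_crp` and the parameter names `discount` and `strength` are assumptions made here. With n customers seated at K tables, the next customer joins table k with probability proportional to (n_k - discount) and opens a new table with probability proportional to (strength + discount * K).

```python
import random


def pitman_yor_crp(num_customers, discount, strength, seed=0):
    """Return table sizes after seating customers by the Pitman-Yor CRP.

    Assumes 0 <= discount < 1 and strength > 0.
    """
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(num_customers):
        # The weight of opening a new table grows with the number of tables.
        new_table = strength + discount * len(table_sizes)
        weights = [size - discount for size in table_sizes] + [new_table]
        choice = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes
```

Setting `discount=0.0` reduces the scheme to the ordinary Dirichlet process seating rule. A positive discount keeps opening new tables at a higher rate, which is what produces the heavier tail of small tables.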
Multi-task learning for classification with Dirichlet process priors
Journal of Machine Learning Research, 2007
Cited by 137 (11 self)
Multi-task learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as single-task learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
Order-based dependent Dirichlet processes
Journal of the American Statistical Association
Cited by 122 (5 self)
In this paper we propose a new framework for Bayesian nonparametric modelling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions which induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values, and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modelling, the modelling of stochastic volatility in time series data and spatial geostatistical modelling.
The infinite PCFG using hierarchical Dirichlet processes
In EMNLP ’07, 2007
Cited by 118 (6 self)
We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data is available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating its effectiveness in learning state-split grammars.
Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models
PROC. IEEE, 2008
Cited by 84 (5 self)
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications.
Generalized weighted Chinese restaurant processes for species sampling mixture models
STATISTICA SINICA, 2003
Cited by 83 (11 self)
The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the case of approximation of mean functionals and additionally their posterior laws in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process and finite dimensional Dirichlet priors.