Results 1–10 of 356
Hierarchical Dirichlet processes. Journal of the American Statistical Association, 2006.
Cited by 942 (78 self)
Abstract:
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
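The "Chinese restaurant franchise" metaphor in this abstract can be sketched as a generative simulation: each group seats customers at tables by a Chinese restaurant process, and each new table orders a dish from a shared franchise-level process, so mixture components are reused across groups. A minimal illustration, not the paper's MCMC inference; the concentration parameters `alpha` and `gamma` are arbitrary:

```python
import random

def crf_seating(group_sizes, alpha=1.0, gamma=1.0, seed=0):
    """Generative Chinese restaurant franchise seating. Returns, per group,
    the dish (component) label assigned to each customer. Dishes are drawn
    from a single franchise-level menu, so groups share components."""
    rng = random.Random(seed)
    dish_counts = []          # franchise-level counts of tables serving each dish
    assignments = []
    for n in group_sizes:
        table_counts, table_dish, labels = [], [], []
        for _ in range(n):
            # existing table w.p. proportional to its size, new table w.p. prop. to alpha
            weights = table_counts + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(table_counts):           # new table: order a dish
                dweights = dish_counts + [gamma]
                d = rng.choices(range(len(dweights)), weights=dweights)[0]
                if d == len(dish_counts):        # brand-new dish on the menu
                    dish_counts.append(0)
                dish_counts[d] += 1
                table_counts.append(0)
                table_dish.append(d)
            table_counts[t] += 1
            labels.append(table_dish[t])
        assignments.append(labels)
    return assignments

groups = crf_seating([50, 50, 50])
```

Because every table's dish comes from the same franchise-level counts, popular dishes tend to reappear in all three groups, which is exactly the atom-sharing the abstract describes.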
Gibbs Sampling Methods for Stick-Breaking Priors
Cited by 388 (19 self)
Abstract:
... In this paper we present two general types of Gibbs samplers that can be used to fit posteriors of Bayesian hierarchical models based on stick-breaking priors. The first type of Gibbs sampler, referred to as a Pólya urn Gibbs sampler, is a generalized version of a widely used Gibbs sampling method currently employed for Dirichlet process computing. This method applies to stick-breaking priors with a known Pólya urn characterization, that is, priors with an explicit and simple prediction rule. Our second method, the blocked Gibbs sampler, is based on an entirely different approach that works by directly sampling values from the posterior of the random measure. The blocked Gibbs sampler can be viewed as a more general approach as it works without requiring an explicit prediction rule. We find that the blocked Gibbs sampler avoids some of the limitations seen with the Pólya urn approach and should be simpler for nonexperts to use.
Coalescents With Multiple Collisions. Ann. Probab., 1999.
"... For each finite measure on [0 ..."
A hierarchical Bayesian language model based on Pitman–Yor processes
In Coling/ACL, 2006.
Cited by 148 (10 self)
Abstract:
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
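The link to interpolated Kneser-Ney comes from the Pitman-Yor predictive rule, which subtracts a constant discount from observed counts and reserves the discounted mass for unseen words. A minimal single-restaurant sketch with hypothetical counts and table assignments (the full model is hierarchical over contexts, which this does not show):

```python
def py_predictive(counts, tables, theta, d):
    """Predictive probabilities under one Pitman-Yor restaurant: an observed
    word w gets (c_w - d * t_w) / (theta + n), and all new words together get
    (theta + d * T) / (theta + n) -- the absolute-discounting form used by
    interpolated Kneser-Ney. counts[w] = customers eating dish w,
    tables[w] = tables serving w, n and T are their totals."""
    n = sum(counts.values())
    T = sum(tables.values())
    probs = {w: (counts[w] - d * tables[w]) / (theta + n) for w in counts}
    p_new = (theta + d * T) / (theta + n)
    return probs, p_new

# hypothetical seating state, for illustration only
probs, p_new = py_predictive({"the": 10, "cat": 2}, {"the": 3, "cat": 1},
                             theta=1.0, d=0.5)
```

The probabilities sum to one, and frequent words keep most of their mass while each loses a fixed discount per table, mirroring Kneser-Ney's count discounting.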
Interpolating between types and tokens by estimating power-law generators
In Advances in Neural Information Processing Systems 18, 2006.
Cited by 123 (19 self)
Abstract:
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process, the Pitman-Yor process, as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
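The power-law pattern of token frequencies the abstract refers to can be observed by simulating the two-parameter (Pitman-Yor) Chinese restaurant process directly (a sketch; the discount `d = 0.5` and strength `theta = 1.0` are arbitrary illustrative values):

```python
import random

def pitman_yor_crp(n, d, theta, seed=0):
    """Sample table (type) sizes from the two-parameter Chinese restaurant
    process: a new customer joins table k with probability proportional to
    (size_k - d) and opens a new table with probability proportional to
    (theta + d * K), K the current number of tables. The resulting size
    distribution has a heavy, power-law-like tail."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        weights = [s - d for s in sizes] + [theta + d * len(sizes)]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(0)   # open a new table (a new word type)
        sizes[k] += 1
    return sizes

sizes = pitman_yor_crp(2000, d=0.5, theta=1.0)
```

Sorting `sizes` in decreasing order gives a rank-frequency curve: a few very large tables and a long tail of small ones, the Zipf-like shape standard multinomial models miss.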
Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models
In Advances in Neural Information Processing Systems 19, 2007.
Cited by 117 (19 self)
Abstract:
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
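An adaptor in this sense can be sketched as a wrapper that caches outputs of a base generator and reuses them with Pitman-Yor probabilities, so repeated structures become cheap to regenerate. A toy illustration, not the paper's general-purpose inference algorithm; the base distribution here is made up:

```python
import random

class PYAdaptor:
    """Pitman-Yor adaptor sketch: sample() either reuses a previously
    generated value (with probability proportional to its discounted
    count) or calls the base generator for a fresh value."""
    def __init__(self, base, d=0.5, theta=1.0, seed=0):
        self.base, self.d, self.theta = base, d, theta
        self.rng = random.Random(seed)
        self.cache, self.counts = [], []
    def sample(self):
        weights = [c - self.d for c in self.counts] + \
                  [self.theta + self.d * len(self.cache)]
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(self.cache):          # fresh draw from the base
            self.cache.append(self.base())
            self.counts.append(0)
        self.counts[k] += 1
        return self.cache[k]

# hypothetical base distribution: three-letter strings over {a, b}
base_rng = random.Random(42)
base = lambda: "".join(base_rng.choice("ab") for _ in range(3))
adaptor = PYAdaptor(base)
draws = [adaptor.sample() for _ in range(200)]
```

In an adaptor grammar, `base` would be the PCFG's distribution over subtrees for a nonterminal, and the cache is what induces dependencies among successive uses.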
Brownian Excursions, Critical Random Graphs and the Multiplicative Coalescent, 1996.
Cited by 106 (8 self)
Abstract:
Let $(B^t(s),\ 0 \le s < \infty)$ be reflecting inhomogeneous Brownian motion with drift $t - s$ at time $s$, started with $B^t(0) = 0$. Consider the random graph $G(n, n^{-1} + t n^{-4/3})$, whose largest components have size of order $n^{2/3}$. Normalizing by $n^{-2/3}$, the asymptotic joint distribution of component sizes is the same as the joint distribution of excursion lengths of $B^t$ (Corollary 2). The dynamics of merging of components as $t$ increases are abstracted to define the multiplicative coalescent process. The states of this process are vectors $x$ of nonnegative real cluster sizes $(x_i)$, and clusters with sizes $x_i$ and $x_j$ merge at rate $x_i x_j$. The multiplicative coalescent is shown to be a Feller process on $\ell^2$. The random graph limit specifies the standard multiplicative coalescent, which starts from infinitesimally small clusters at time $-\infty$; the existence of such a process is not obvious. AMS 1991 subject classifications: 60C05, 60J50. Key words and phras...
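The merge dynamics defined in this abstract (each pair of clusters with masses $x_i, x_j$ merging at rate $x_i x_j$) can be simulated directly for a finite initial state with a Gillespie scheme. A sketch only: the paper's object is the infinite standard version starting from infinitesimal clusters, which this finite simulation does not construct:

```python
import random

def multiplicative_coalescent(masses, t_end, seed=0):
    """Gillespie simulation of the finite multiplicative coalescent up to
    time t_end: exponential waiting times at the total pair rate, then a
    merging pair chosen with probability proportional to x_i * x_j."""
    rng = random.Random(seed)
    x = list(masses)
    t = 0.0
    while len(x) > 1:
        pairs = [(i, j) for i in range(len(x)) for j in range(i + 1, len(x))]
        rates = [x[i] * x[j] for i, j in pairs]
        t += rng.expovariate(sum(rates))   # time until the next merger
        if t > t_end:
            break
        i, j = rng.choices(pairs, weights=rates)[0]
        x[i] += x[j]                        # merge clusters i and j
        del x[j]
    return sorted(x, reverse=True)

final = multiplicative_coalescent([1.0] * 20, t_end=0.05)
```

Mass is conserved at every merger, and larger clusters merge faster, so a giant cluster emerges as `t_end` grows, echoing the giant-component transition in $G(n, p)$.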
The Standard Additive Coalescent, 1997.
Cited by 87 (21 self)
Abstract:
Regard an element of the set $\Delta := \{(x_1, x_2, \ldots) : x_1 \ge x_2 \ge \cdots \ge 0,\ \sum_i x_i = 1\}$ as a fragmentation of unit mass into clusters of masses $x_i$. The additive coalescent of Evans and Pitman (1997) is the $\Delta$-valued Markov process in which pairs of clusters of masses $\{x_i, x_j\}$ merge into a cluster of mass $x_i + x_j$ at rate $x_i + x_j$. They showed that a version $(X^\infty(t),\ -\infty < t < \infty)$ of this process arises as an $n \to \infty$ weak limit of the process started at time $-\tfrac{1}{2}\log n$ with $n$ clusters of mass $1/n$. We show this standard additive coalescent may be constructed from the continuum random tree of Aldous (1991, 1993) by Poisson splitting along the skeleton of the tree. We describe the distribution of $X^\infty(t)$ on $\Delta$ at a fixed time $t$. We show that the size of the cluster containing a given atom, as a process in $t$, has a simple representation in terms of the stable subordinator of index $1/2$. As $t \to -\infty$, we establish a Gaussian limit for (centered and norm...
Generalized weighted Chinese restaurant processes for species sampling mixture models. Statistica Sinica, 2003.
Cited by 86 (11 self)
Abstract:
The class of species sampling mixture models is introduced as an extension of semiparametric models based on the Dirichlet process to models based on the general class of species sampling priors, or equivalently the class of all exchangeable urn distributions. Using Fubini calculus in conjunction with Pitman (1995, 1996), we derive characterizations of the posterior distribution in terms of a posterior partition distribution that extend the results of Lo (1984) for the Dirichlet process. These results provide a better understanding of the models and have both theoretical and practical applications. To facilitate the use of our models we generalize the work in Brunner, Chan, James and Lo (2001) by extending their weighted Chinese restaurant (WCR) Monte Carlo procedure, an i.i.d. sequential importance sampling (SIS) procedure for approximating posterior mean functionals based on the Dirichlet process, to the approximation of mean functionals, and additionally their posterior laws, in species sampling mixture models. We also discuss collapsed Gibbs sampling, Pólya urn Gibbs sampling and a Pólya urn SIS scheme. Our framework allows for numerous applications, including multiplicative counting process models subject to weighted gamma processes, as well as nonparametric and semiparametric hierarchical models based on the Dirichlet process, its two-parameter extension, the Pitman-Yor process, and finite-dimensional Dirichlet priors.
Kernel stick-breaking processes, 2007.
Cited by 74 (17 self)
Abstract:
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
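The predictor-dependent weights described above can be sketched directly from the construction: each stick-breaking probability is a beta weight scaled by a kernel evaluated between the predictor and that stick's random location. A truncated illustration with a Gaussian kernel and arbitrary bandwidth (the paper allows general kernels; all parameter values here are made up):

```python
import math
import random

def ksbp_weights(x, locations, v, bandwidth):
    """Truncated kernel stick-breaking weights at predictor value x: the
    h-th break probability is K(x, location_h) * V_h, so nearby predictor
    values receive similar weight sequences (and hence similar random
    measures). Gaussian kernel used as an illustrative choice."""
    weights, remaining = [], 1.0
    for loc, vh in zip(locations, v):
        k = math.exp(-((x - loc) ** 2) / (2.0 * bandwidth ** 2))
        weights.append(k * vh * remaining)
        remaining *= 1.0 - k * vh    # stick mass left after this break
    return weights                   # leftover mass is lost to truncation

rng = random.Random(3)
locs = [rng.random() for _ in range(30)]
v = [rng.betavariate(1.0, 1.0) for _ in range(30)]
w_near = ksbp_weights(0.2, locs, v, bandwidth=0.1)
w_far = ksbp_weights(0.9, locs, v, bandwidth=0.1)
```

Comparing `w_near` and `w_far` shows how the kernel shifts mass toward sticks whose locations sit close to the predictor, which is the source of the dependence across the collection.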