Results 1–10 of 83
Hierarchical topic models and the nested Chinese restaurant process
 Advances in Neural Information Processing Systems
, 2004
"... We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested ..."
Abstract

Cited by 279 (32 self)
We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts.
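The generative core of the nested Chinese restaurant process translates almost directly into code. The sketch below is a minimal simulation under assumed settings (a fixed tree depth and a concentration parameter gamma); the function name ncrp_path is illustrative, and this is not the authors' implementation.

```python
import random

def ncrp_path(tree, depth, gamma=1.0):
    """Sample one root-to-leaf path through a nested CRP of fixed depth.

    tree maps each node (a tuple of child indices) to a list of
    per-child visit counts. At each level the path follows an existing
    child with probability proportional to its count, or opens a new
    child with probability proportional to gamma.
    """
    path = ()
    for _ in range(depth):
        counts = tree.setdefault(path, [])
        r = random.uniform(0, sum(counts) + gamma)
        for i, c in enumerate(counts):
            r -= c
            if r <= 0:
                counts[i] += 1
                path += (i,)
                break
        else:                              # open a new branch
            counts.append(1)
            path += (len(counts) - 1,)
    return path

tree = {}
print([ncrp_path(tree, depth=3) for _ in range(10)])
# later documents tend to reuse the branches earlier documents made popular
```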
Interpolating between types and tokens by estimating power-law generators
 In Advances in Neural Information Processing Systems 18
, 2006
"... Standard statistical models of language fail to capture one of the most striking properties of natural languages: the powerlaw distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce powerlaws, augmenting standard generative ..."
Abstract

Cited by 121 (18 self)
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
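The power-law claim can be checked by simulating the Pitman-Yor predictive rule directly. The sketch below is illustrative: the discount d, concentration theta, and integer type labels are assumptions, not the paper's setup.

```python
import random
from collections import Counter

def py_tokens(n, d=0.8, theta=10.0):
    """Emit n tokens from a Pitman-Yor adaptor with integer type labels.

    After i tokens, type k (seen count_k times) is reused with
    probability (count_k - d) / (i + theta); a new type is created with
    probability (theta + d * K) / (i + theta), K being the number of
    types so far. A discount d > 0 yields power-law type frequencies.
    """
    counts, tokens = [], []
    for i in range(n):
        r = random.uniform(0, i + theta)
        for k, c in enumerate(counts):
            r -= c - d
            if r <= 0:
                counts[k] += 1
                tokens.append(k)
                break
        else:
            counts.append(1)               # new type, labeled by index
            tokens.append(len(counts) - 1)
    return tokens

freqs = sorted(Counter(py_tokens(5000)).values(), reverse=True)
print(freqs[:10], "singleton types:", sum(f == 1 for f in freqs))
```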
Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models
 In Advances in Neural Information Processing Systems 19
, 2007
"... This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic contextfree grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors ” that can induce dependencies among successive uses. With a particular choice o ..."
Abstract

Cited by 115 (18 self)
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors” that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
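The simplest adaptor, a Dirichlet process (Pitman-Yor with discount 0), caches the yields of an adapted nonterminal and reuses them with Chinese-restaurant probabilities. The toy base distribution base_word and the Adaptor class below are assumptions for illustration; they sketch generation only, not the paper's general-purpose inference algorithm.

```python
import random

def base_word(chars="abcd", stop=0.4):
    """Toy base PCFG: Word -> Char Word | Char (geometric length)."""
    s = random.choice(chars)
    while random.random() > stop:
        s += random.choice(chars)
    return s

class Adaptor:
    """Dirichlet-process adaptor for one nonterminal: reuse cached
    yield w with probability count_w / (n + alpha), else expand the
    base distribution and add the result to the cache."""
    def __init__(self, base, alpha=1.0):
        self.base, self.alpha = base, alpha
        self.cache, self.n = {}, 0

    def sample(self):
        r = random.uniform(0, self.n + self.alpha)
        for w, c in self.cache.items():
            r -= c
            if r <= 0:
                self.cache[w] += 1
                self.n += 1
                return w
        w = self.base()                    # innovate from the base
        self.cache[w] = self.cache.get(w, 0) + 1
        self.n += 1
        return w

word = Adaptor(base_word)
print([word.sample() for _ in range(15)])  # repeats emerge as n grows
```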
Kernel stick-breaking processes
, 2007
"... Summary. This article proposes a class of kernel stickbreaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and betadistributed ..."
Abstract

Cited by 73 (17 self)
Summary. This article proposes a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the KSBP are described, including a covariate-dependent prediction rule. A retrospective MCMC algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiologic application.
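The construction can be sketched as a truncated simulation: at predictor value x, the weights are kernel-discounted stick breaks. The Gaussian kernel, bandwidth psi, truncation level H, and Beta(1, 1) stick weights below are illustrative assumptions, not the paper's recommended settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def ksbp_weights(x, locs, V, psi=4.0):
    """Predictor-dependent KSBP weights, truncated at len(locs) sticks:
    pi_h(x) = V_h * K(x, loc_h) * prod_{l<h} (1 - V_l * K(x, loc_l)),
    with Gaussian kernel K(x, loc) = exp(-psi * (x - loc)^2) in [0, 1].
    """
    K = np.exp(-psi * (x - locs) ** 2)
    broken = V * K
    survived = np.concatenate(([1.0], np.cumprod(1.0 - broken)[:-1]))
    return broken * survived

H = 50
locs = rng.uniform(0, 1, H)                # random stick locations
V = rng.beta(1.0, 1.0, H)                  # beta-distributed weights
for x in (0.2, 0.8):
    w = ksbp_weights(x, locs, V)
    print(x, w[:4].round(3), "mass in truncation:", w.sum().round(3))
```

Nearby predictor values share the same sticks and so receive similar weights, which is what makes the mixing measures dependent.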
Bayesian density regression
 Journal of the Royal Statistical Society, Series B
, 2007
"... This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing acc ..."
Abstract

Cited by 70 (27 self)
This article considers Bayesian methods for density regression, allowing a random probability distribution to change flexibly with multiple predictors. The conditional response distribution is expressed as a nonparametric mixture of parametric densities, with the mixture distribution changing according to location in the predictor space. A new class of priors for dependent random measures is proposed for the collection of random mixing measures at each location. The conditional prior for the random measure at a given location is expressed as a mixture of a Dirichlet process (DP) distributed innovation measure and neighboring random measures. This specification results in a coherent prior for the joint measure, with the marginal random measure at each location being a finite mixture of DP basis measures. Integrating out the infinite-dimensional collection of mixing measures, we obtain a simple expression for the conditional distribution of the subject-specific random variables, which generalizes the Pólya urn scheme. Properties are considered and a simple Gibbs sampling algorithm is developed for posterior computation. The methods are illustrated using simulated data examples and epidemiologic studies.
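The flavor of the generalized Pólya urn can be caricatured in a few lines: a new subject either copies a nearby subject's parameter, weighted by a kernel in predictor space, or innovates from a base measure. The sketch below is a loose illustration under an assumed Gaussian kernel and normal base; it is not the paper's exact prediction rule.

```python
import math
import random

def urn_draw(x_new, subjects, alpha=1.0, psi=4.0):
    """Predictor-weighted urn caricature: copy subject i's parameter
    theta_i with weight exp(-psi * (x_new - x_i)^2), or draw fresh
    from a standard normal base measure with weight alpha."""
    kernel = [math.exp(-psi * (x_new - x) ** 2) for x, _ in subjects]
    r = random.uniform(0, sum(kernel) + alpha)
    for k, (_, theta) in zip(kernel, subjects):
        r -= k
        if r <= 0:
            return theta                   # reuse a nearby atom
    return random.gauss(0, 1)              # innovate from the base

subjects = [(0.10, -1.0), (0.15, -1.1), (0.90, 2.0)]
print(urn_draw(0.12, subjects))            # usually copies a nearby atom
```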
Poisson process partition calculus with an application to Bayesian Lévy moving averages
, 2005
"... This article develops, and describes how to use, results concerning disintegrations of Poisson random measures. These results are fashioned as simple tools that can be tailormade to address inferential questions arising in a wide range of Bayesian nonparametric and spatial statistical models. The P ..."
Abstract

Cited by 54 (13 self)
This article develops, and describes how to use, results concerning disintegrations of Poisson random measures. These results are fashioned as simple tools that can be tailor-made to address inferential questions arising in a wide range of Bayesian nonparametric and spatial statistical models. The Poisson disintegration method is based on the formal statement of two results concerning a Laplace functional change of measure and a Poisson Palm/Fubini calculus in terms of random partitions of the integers {1,...,n}. The techniques are analogous to, but much more general than, techniques for the Dirichlet process and weighted gamma process developed in [Ann. Statist. 12
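For orientation, the calculus starts from the classical Laplace functional of a Poisson random measure N with intensity measure ν, stated here for reference (a standard identity, not quoted from the paper):

```latex
\mathbb{E}\bigl[e^{-N(f)}\bigr]
  = \exp\!\Bigl(-\int \bigl(1 - e^{-f(x)}\bigr)\,\nu(\mathrm{d}x)\Bigr),
\qquad N(f) := \int f(x)\,N(\mathrm{d}x), \quad f \ge 0.
```

The disintegration results manipulate changes of measure applied to this functional.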
Improving nonparametric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars
, 2009
"... One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is because it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparam ..."
Abstract

Cited by 36 (4 self)
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% over the best previously reported results for this corpus.
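The word token f-score quoted here counts a predicted word as correct only when its full character span matches a gold word. A minimal scorer makes the metric concrete; it is a sketch for illustration, not the paper's evaluation code.

```python
def token_fscore(gold, pred):
    """Word-token f-score: gold and pred are the same character string
    segmented with spaces; a predicted token is correct iff its
    character span exactly matches a gold token's span."""
    def spans(seg):
        out, i = set(), 0
        for w in seg.split():
            out.add((i, i + len(w)))
            i += len(w)
        return out
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    prec, rec = correct / len(p), correct / len(g)
    return 2 * prec * rec / (prec + rec)

print(token_fscore("the dog barks", "thedog barks"))  # ~0.4
```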
Bayesian Model Selection in Finite Mixtures by Marginal Density Decompositions
 Journal of the American Statistical Association
, 2001
"... ..."