Hierarchical Dirichlet processes.
Journal of the American Statistical Association, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract

Cited by 942 (78 self)
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
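As a rough illustration of the sharing mechanism described in this abstract, the Python sketch below uses a truncated stick-breaking construction: a global Dirichlet process draw supplies a common set of atoms, and each group's process reweights those same atoms, so mixture components are shared across groups. The truncation level, concentration values, and Gaussian base measure are illustrative assumptions, not values from the paper.

import numpy as np

def stick_breaking(concentration, num_atoms, rng):
    # Truncated stick-breaking weights for a Dirichlet process draw.
    betas = rng.beta(1.0, concentration, size=num_atoms)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
K = 50  # truncation level (assumption)

# Global measure G0 ~ DP(gamma, H) with base measure H = N(0, 1): shared atoms.
global_weights = stick_breaking(concentration=1.0, num_atoms=K, rng=rng)
shared_atoms = rng.normal(0.0, 1.0, size=K)

# Each group's measure Gj ~ DP(alpha, G0) reweights the *same* atoms, so the
# groups necessarily share mixture components, as the abstract describes.
for j in range(3):
    group_w = rng.dirichlet(5.0 * global_weights)
    print("group", j, "dominant shared atoms:", shared_atoms[np.argsort(group_w)[-3:]])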
A hierarchical Bayesian language model based on Pitman–Yor processes
In Coling/ACL, 2006
"... We propose a new hierarchical Bayesian ngram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called PitmanYor processes which produce powerlaw distributions more closely resembling those in natural languages. We show that an approxi ..."
Abstract

Cited by 148 (10 self)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman–Yor processes which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman–Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
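The connection to interpolated Kneser-Ney mentioned here comes from the Pitman–Yor predictive rule, in which counts are discounted and the leftover mass interpolates with a lower-order distribution. A minimal Python sketch of that rule for a single context follows; the discount and strength values, the toy counts, and the one-table-per-word approximation are illustrative assumptions rather than the paper's full hierarchical model.

def pitman_yor_predictive(word, counts, table_counts, backoff_prob,
                          discount=0.75, strength=1.0):
    # Predictive probability of `word` in one context under a Pitman-Yor CRP:
    # a discounted count plus an interpolation weight times the backoff estimate.
    total = sum(counts.values())
    total_tables = sum(table_counts.values())
    c = counts.get(word, 0)
    t = table_counts.get(word, 0)
    discounted = max(c - discount * t, 0.0) / (strength + total)
    interpolation = (strength + discount * total_tables) / (strength + total)
    return discounted + interpolation * backoff_prob

# Hypothetical counts for one context; restricting every word type to a single
# table is the approximation that recovers interpolated Kneser-Ney.
counts = {"cat": 3, "dog": 1}
tables = {w: 1 for w in counts}
print(pitman_yor_predictive("cat", counts, tables, backoff_prob=0.01))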
The nested Chinese restaurant process and Bayesian inference of topic hierarchies
2007
"... We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitelydeep, infinitelybranching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Spe ..."
Abstract

Cited by 128 (15 self)
We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely deep, infinitely branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning: the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
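A small Python sketch of the generative step the abstract describes: each document samples a path down the tree, choosing at every level between previously used branches (with probability proportional to how many earlier documents took them) and a fresh branch. The fixed depth and the value of gamma are assumptions for illustration, not the paper's settings.

import random

def sample_ncrp_path(tree, depth, gamma, rng=random):
    # `tree` maps a path (tuple of branch indices) to the number of documents
    # that have passed through that node; it is updated in place.
    path = ()
    for _ in range(depth):
        children = [p for p in tree if p[:-1] == path and len(p) == len(path) + 1]
        weights = [tree[c] for c in children] + [gamma]  # existing branches, then a new one
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(children):          # preferential attachment: open a new branch
            child = path + (len(children),)
            tree[child] = 0
        else:
            child = children[choice]
        tree[child] += 1
        path = child
    return path

tree = {}
for doc in range(5):
    print("document", doc, "path:", sample_ncrp_path(tree, depth=3, gamma=1.0))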
Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models
In Advances in Neural Information Processing Systems 19, 2007
"... This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic contextfree grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors ” that can induce dependencies among successive uses. With a particular choice o ..."
Abstract

Cited by 117 (19 self)
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors” that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman–Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
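To make the adaptor idea concrete, here is a minimal Python sketch (an illustration under simplifying assumptions, not the paper's general-purpose inference algorithm): a Pitman–Yor adaptor wraps a base generator and caches its outputs, so frequently generated items are reused with boosted probability, inducing the dependencies among successive uses that the abstract mentions.

import random

class PitmanYorAdaptor:
    # Caches draws from a base generator and reuses them Chinese-restaurant style.
    def __init__(self, base_sampler, discount=0.5, strength=1.0):
        self.base_sampler = base_sampler
        self.discount = discount
        self.strength = strength
        self.items = []    # one cached item per "table"
        self.counts = []   # customers seated at each table
        self.total = 0

    def sample(self):
        p_new = (self.strength + self.discount * len(self.items)) / (self.strength + self.total)
        if self.total == 0 or random.random() < p_new:
            self.items.append(self.base_sampler())    # fresh draw from the base generator
            self.counts.append(1)
            item = self.items[-1]
        else:
            weights = [c - self.discount for c in self.counts]
            k = random.choices(range(len(self.items)), weights=weights)[0]
            self.counts[k] += 1                        # reuse a cached derivation
            item = self.items[k]
        self.total += 1
        return item

# Toy base generator standing in for a PCFG: a random two-letter string.
adaptor = PitmanYorAdaptor(lambda: "".join(random.choices("ab", k=2)))
print([adaptor.sample() for _ in range(10)])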
A Bayesian framework for word segmentation: Exploring the effects of context
In 46th Annual Meeting of the ACL, 2009
"... Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of differen ..."
Abstract

Cited by 110 (30 self)
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
Contextual dependencies in unsupervised word segmentation
In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006
"... Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies ..."
Abstract

Cited by 83 (17 self)
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on suboptimal search procedures.
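The unigram model referred to in this and the previous entry can be summarized by a single predictive rule: the next word is either a previously segmented word, with probability roughly proportional to its count, or a novel word generated character by character from a base distribution. A hedged Python sketch is below; the alpha, stopping probability, alphabet size, and toy segmentation are illustrative assumptions, and the bigram model replaces the global counts with context-dependent ones.

from collections import Counter

def base_prob(word, p_stop=0.5, alphabet_size=27):
    # Base distribution over novel words: uniform characters, geometric length.
    return (1.0 / alphabet_size) ** len(word) * (1.0 - p_stop) ** (len(word) - 1) * p_stop

def unigram_word_prob(word, counts, total_tokens, alpha=20.0):
    # Dirichlet-process predictive probability of the next word token.
    return (counts[word] + alpha * base_prob(word)) / (total_tokens + alpha)

segmented_so_far = ["whats", "that", "the", "doggie", "that"]
counts = Counter(segmented_so_far)
for candidate in ["that", "doggie", "whatsthat"]:
    print(candidate, unigram_word_prob(candidate, counts, len(segmented_so_far)))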
Stick-breaking construction for the Indian buffet process
 In Proceedings of the International Conference on Artificial Intelligence and Statistics
"... The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stickbreaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are ef ..."
Abstract

Cited by 79 (13 self)
The Indian buffet process (IBP) is a Bayesian nonparametric distribution whereby objects are modelled using an unbounded number of latent features. In this paper we derive a stick-breaking representation for the IBP. Based on this new representation, we develop slice samplers for the IBP that are efficient, easy to implement and are more generally applicable than the currently available Gibbs sampler. This representation, along with the work of Thibaux and Jordan [17], also illuminates interesting theoretical connections between the IBP, Chinese restaurant processes, Beta processes and Dirichlet processes.
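The stick-breaking representation derived in this paper has a compact form: feature probabilities are obtained by multiplying successive Beta(alpha, 1) variables, and each object then includes each feature independently with that probability. The short Python sketch below illustrates it with an arbitrary truncation level and alpha, both assumptions made only for the example.

import numpy as np

rng = np.random.default_rng(0)
alpha, num_features, num_objects = 2.0, 30, 8   # illustrative values

# Stick-breaking for the IBP: mu_k = nu_1 * ... * nu_k with nu_i ~ Beta(alpha, 1),
# giving a decreasing sequence of feature probabilities.
nu = rng.beta(alpha, 1.0, size=num_features)
mu = np.cumprod(nu)

# Each object possesses feature k independently with probability mu_k.
Z = rng.random((num_objects, num_features)) < mu
print("number of active features per object:", Z.sum(axis=1))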
Poverty of the stimulus? A rational approach
In Proceedings of the 2006 Cognitive Science Conference, 2006
"... The Poverty of the Stimulus (PoS) argument holds that children do not receive enough evidence to infer the existence of core aspects of language, such as the dependence of linguistic rules on hierarchical phrase structure. We reevaluate one version of this argument with a Bayesian model of grammar i ..."
Abstract

Cited by 58 (9 self)
The Poverty of the Stimulus (PoS) argument holds that children do not receive enough evidence to infer the existence of core aspects of language, such as the dependence of linguistic rules on hierarchical phrase structure. We reevaluate one version of this argument with a Bayesian model of grammar induction, and show that a rational learner without any initial language-specific biases could learn this dependency given typical child-directed input. Learning this dependency enables the learner to master aspects of syntax, such as the auxiliary fronting rule in interrogative formation, even without having heard directly relevant data (e.g., interrogatives containing an auxiliary in a relative clause in the subject NP).