Results 1  10
of
148
Hierarchical Dirichlet processes.
 Journal of the American Statistical Association,
, 2006
"... We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this s ..."
Abstract

Cited by 942 (78 self)
 Add to MetaCart
(Show Context)
We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the wellknown clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stickbreaking process, and a generalization of the Chinese restaurant process that we refer to as the "Chinese restaurant franchise." We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
Syntactic Topic Models
"... We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word ..."
Abstract

Cited by 70 (11 self)
 Add to MetaCart
We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines documentspecific topic weights and parsetreespecific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and handparsed documents. 1
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
 In Proceedings of NAACLHLT 2009. Shay
, 2009
"... We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, prov ..."
Abstract

Cited by 67 (13 self)
 Add to MetaCart
(Show Context)
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a nonparallel, multilingual corpus. 1
Distance dependent Chinese restaurant processes
"... We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for nonexchangeability. This class can be used to model dependencies between data in infinite clustering models, including dependencies across time or space. We examine t ..."
Abstract

Cited by 56 (5 self)
 Add to MetaCart
(Show Context)
We develop the distance dependent Chinese restaurant process (CRP), a flexible class of distributions over partitions that allows for nonexchangeability. This class can be used to model dependencies between data in infinite clustering models, including dependencies across time or space. We examine the properties of the distance dependent CRP, discuss its connections to Bayesian nonparametric mixture models, and derive a Gibbs sampler for both observed and mixture settings. We study its performance with timedependent models and three text corpora. We show that relaxing the assumption of exchangeability with distance dependent CRPs can provide a better fit to sequential data. We also show its alternative formulation of the traditional CRP leads to a fastermixing Gibbs sampling algorithm than the one based on the original formulation. 1.
Sampling alignment structure under a Bayesian translation model
 In Empirical Methods in Natural Language Processing (EMNLP
, 2008
"... We describe the first tractable Gibbs sampling procedure for estimating phrase pair frequencies under a probabilistic model of phrase alignment. We propose and evaluate two nonparametric priors that successfully avoid the degenerate behavior noted in previous work, where overly large phrases memoriz ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
(Show Context)
We describe the first tractable Gibbs sampling procedure for estimating phrase pair frequencies under a probabilistic model of phrase alignment. We propose and evaluate two nonparametric priors that successfully avoid the degenerate behavior noted in previous work, where overly large phrases memorize the training data. Phrase table weights learned under our model yield an increase in BLEU score over the wordalignment based heuristic estimates used regularly in phrasebased translation systems. 1
Bayesian nonparametric estimators derived from conditional Gibbs structures
 J. PHYS. A: MATH. GEN
, 2008
"... We consider discrete nonparametric priors which induce Gibbstype exchangeable random partitions and investigate their posterior behavior in detail. In particular, we deduce conditional distributions and the corresponding Bayesian nonparametric estimators, which can be readily exploited for predictin ..."
Abstract

Cited by 32 (8 self)
 Add to MetaCart
(Show Context)
We consider discrete nonparametric priors which induce Gibbstype exchangeable random partitions and investigate their posterior behavior in detail. In particular, we deduce conditional distributions and the corresponding Bayesian nonparametric estimators, which can be readily exploited for predicting various features of additional samples. The results provide useful tools for genomic applications where prediction of future outcomes is required.
Discriminative Clustering by Regularized Information Maximization
"... Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
(Show Context)
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive informationtheoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semisupervised learning. In particular, we instantiate the framework to unsupervised, multiclass kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method. 1
A Stochastic Memoizer for Sequence Data
"... We propose an unboundeddepth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes ..."
Abstract

Cited by 25 (7 self)
 Add to MetaCart
(Show Context)
We propose an unboundeddepth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unboundeddepth hierarchical PitmanYor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving stateoftheart results. 1.
Indian Buffet Processes with Powerlaw Behavior
"... The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a threeparameter generalization of the IBP exhibiting powerlaw behavior. We achieve this by generalizing the beta process (the de Finetti me ..."
Abstract

Cited by 24 (1 self)
 Add to MetaCart
(Show Context)
The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a threeparameter generalization of the IBP exhibiting powerlaw behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stablebeta process and deriving the IBP corresponding to it. We find interesting relationships between the stablebeta process and the PitmanYor process (another stochastic process used in Bayesian nonparametric models with interesting powerlaw properties). We derive a stickbreaking construction for the stablebeta process, and find that our powerlaw IBP is a good model for word occurrences in document corpora. 1
An Unsupervised Model for Joint Phrase Alignment and Extraction
"... We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memori ..."
Abstract

Cited by 23 (5 self)
 Add to MetaCart
(Show Context)
We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also nonterminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrasebased machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional twostep word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size. 1