Results 1  10
of
114
Church: A language for generative models
 In UAI
, 2008
"... Formal languages for probabilistic modeling enable reuse, modularity, and descriptive clarity, and can foster generic inference techniques. We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pu ..."
Abstract

Cited by 129 (26 self)
 Add to MetaCart
(Show Context)
Formal languages for probabilistic modeling enable reuse, modularity, and descriptive clarity, and can foster generic inference techniques. We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset. The semantics of Church is defined in terms of evaluation histories and conditional distributions on such histories. Church also includes a novel language construct, the stochastic memoizer, which enables simple description of many complex nonparametric models. We illustrate language features through several examples, including: a generalized Bayes net in which parameters cluster over trials, infinite PCFGs, planning by inference, and various nonparametric clustering models. Finally, we show how to implement query on any Church program, exactly and approximately, using Monte Carlo techniques. 1
The nested chinese restaurant process and bayesian inference of topic hierarchies
, 2007
"... We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitelydeep, infinitelybranching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Spe ..."
Abstract

Cited by 123 (15 self)
 Add to MetaCart
(Show Context)
We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitelydeep, infinitelybranching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning—the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
A bayesian framework for word segmentation: Exploring the effects of context
 In 46th Annual Meeting of the ACL
, 2009
"... Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of differen ..."
Abstract

Cited by 110 (30 self)
 Add to MetaCart
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed childdirected speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two and threeword sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered. 1
Syntactic Topic Models
"... We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word ..."
Abstract

Cited by 71 (11 self)
 Add to MetaCart
We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines documentspecific topic weights and parsetreespecific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and handparsed documents. 1
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
 In Proceedings of NAACLHLT 2009. Shay
, 2009
"... We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, prov ..."
Abstract

Cited by 65 (11 self)
 Add to MetaCart
(Show Context)
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a nonparallel, multilingual corpus. 1
Collapsed Variational Inference for HDP
"... A wide variety of Dirichletmultinomial ‘topic ’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without ..."
Abstract

Cited by 57 (1 self)
 Add to MetaCart
(Show Context)
A wide variety of Dirichletmultinomial ‘topic ’ models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and sidestepping of issues with topicidentifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy. 1
Online Variational Inference for the Hierarchical Dirichlet Process
"... The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixedmembership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions o ..."
Abstract

Cited by 57 (7 self)
 Add to MetaCart
The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that can be used to model mixedmembership data with a potentially infinite number of components. It has been applied widely in probabilistic topic modeling, where the data are documents and the components are distributions of terms that reflect recurring patterns (or “topics”) in the collection. Given a document collection, posterior inference is used to determine the number of topics needed and to characterize their distributions. One limitation of HDP analysis is that existing posterior inference algorithms require multiple passes through all the data—these algorithms are intractable for very large scale applications. We propose an online variational inference algorithm for the HDP, an algorithm that is easily applicable to massive and streaming data. Our algorithm is significantly faster than traditional inference algorithms for the HDP, and lets us analyze much larger data sets. We illustrate the approach on two large collections of text, showing improved performance over online LDA, the finite counterpart to the HDP topic model. 1
Using Universal Linguistic Knowledge to Guide Grammar Induction
"... We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manuallyspecified languageindependent rules that identify syntactic dependencies between pairs of syntactic categories that comm ..."
Abstract

Cited by 56 (7 self)
 Add to MetaCart
We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manuallyspecified languageindependent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms stateoftheart unsupervised methods by a significant margin. 1 1
Discriminative LogLinear Grammars with Latent Variables
 Advances in Neural Information Processing Systems 20 (NIPS
, 2008
"... We demonstrate that loglinear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure which allows feature expectations to be efficiently approximated in a gradientbased procedure. We ..."
Abstract

Cited by 44 (6 self)
 Add to MetaCart
(Show Context)
We demonstrate that loglinear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure which allows feature expectations to be efficiently approximated in a gradientbased procedure. We compare L1 and L2 regularization and show that L1 regularization is superior, requiring fewer iterations to converge, and yielding sparser solutions. On fullscale treebank parsing experiments, the discriminative latent models outperform both the comparable generative latent models as well as the discriminative nonlatent baselines. 1
A latent variable model for generative dependency parsing
 In Proceedings of IWPT
, 2007
"... We propose a generative dependency parsing model which uses binary latent variables to induce conditioning features. To define this model we use a recently proposed class of Bayesian Networks for structured prediction, Incremental Sigmoid Belief Networks. We demonstrate that the proposed model achie ..."
Abstract

Cited by 43 (8 self)
 Add to MetaCart
(Show Context)
We propose a generative dependency parsing model which uses binary latent variables to induce conditioning features. To define this model we use a recently proposed class of Bayesian Networks for structured prediction, Incremental Sigmoid Belief Networks. We demonstrate that the proposed model achieves stateoftheart results on three different languages. We also demonstrate that the features induced by the ISBN’s latent variables are crucial to this success, and show that the proposed model is particularly good on long dependencies. 1