Results 1–10 of 58
A maximum entropy model of phonotactics and phonotactic learning
2006
"... The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our ..."
Abstract

Cited by 132 (15 self)
The study of phonotactics (e.g., the ability of English speakers to distinguish possible words like blick from impossible words like *bnick) is a central topic in phonology. We propose a theory of phonotactic grammars and a learning algorithm that constructs such grammars from positive evidence. Our grammars consist of constraints that are assigned numerical weights according to the principle of maximum entropy. Possible words are assessed by these grammars based on the weighted sum of their constraint violations. The learning algorithm yields grammars that can capture both categorical and gradient phonotactic patterns. The algorithm is not provided with any constraints in advance, but uses its own resources to form constraints and weight them. A baseline model, in which Universal Grammar is reduced to a feature set and an SPE-style constraint format, suffices to learn many phonotactic phenomena. In order to learn nonlocal phenomena such as stress and vowel harmony, it is necessary to augment the model with autosegmental tiers and metrical grids. Our results thus offer novel, learning-theoretic support for such representations. We apply the model to English syllable onsets, Shona vowel harmony, quantity-insensitive stress typology, and the full phonotactics of Wargamay, showing that the learned grammars capture the distributional generalizations of these languages and accurately predict the findings of a phonotactic experiment.
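To make the scoring scheme described above concrete, here is a minimal sketch of how a maximum entropy phonotactic grammar assigns scores: each candidate's harmony is the weighted sum of its constraint violations, and exp(-harmony), renormalized over a candidate set, gives its probability. The constraints, weights, and violation counts below are hypothetical toy values, not the grammar learned in the paper.

```python
import math

def harmony(violations, weights):
    """Weighted sum of constraint violations; lower is better."""
    return sum(weights[c] * v for c, v in violations.items())

# Hypothetical constraints and weights (not the paper's learned English grammar).
weights = {"*#bn": 4.5, "*ComplexOnset": 0.8}

candidates = {
    "blick": {"*#bn": 0, "*ComplexOnset": 1},
    "bnick": {"*#bn": 1, "*ComplexOnset": 1},
}

# Maxent score exp(-harmony); dividing by the sum over candidates yields probabilities.
scores = {w: math.exp(-harmony(v, weights)) for w, v in candidates.items()}
total = sum(scores.values())
for w in candidates:
    print(w, "harmony =", harmony(candidates[w], weights), " P =", round(scores[w] / total, 3))
```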
A Bayesian framework for word segmentation: Exploring the effects of context
In 46th Annual Meeting of the ACL, 2009
"... Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of differen ..."
Abstract

Cited by 110 (30 self)
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
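As a rough illustration of the kind of ideal-observer lexicon model the abstract refers to, the sketch below computes the predictive probability of a word token under a Dirichlet-process unigram lexicon, which interpolates the word's count in the corpus segmented so far with a phoneme-level base distribution. The concentration parameter, base distribution, and token counts are illustrative assumptions, not the paper's settings; the bigram variant would condition on the preceding word as well.

```python
from collections import Counter

def unigram_dp_prob(word, counts, total, alpha0, base_prob):
    """Predictive probability of `word` under a Dirichlet-process unigram lexicon."""
    return (counts[word] + alpha0 * base_prob(word)) / (total + alpha0)

# Hypothetical previously segmented tokens and a crude length-based base measure.
tokens = ["you", "want", "to", "see", "the", "book", "you", "the"]
counts, total = Counter(tokens), len(tokens)
base = lambda w: (1.0 / 50) ** len(w)   # toy distribution over phoneme strings

print(unigram_dp_prob("the", counts, total, alpha0=20.0, base_prob=base))
print(unigram_dp_prob("wantto", counts, total, alpha0=20.0, base_prob=base))
```

Because the unigram model has no way to encode that "want" predicts "to", it can only capture the collocation by treating "wantto" as a single lexical item, which is the undersegmentation behavior described above.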
Syntactic Topic Models
"... We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word ..."
Abstract

Cited by 71 (11 self)
We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents.
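One way to realize the combination described in the abstract is to take the elementwise product of the document's topic weights and the syntactic transition distribution attached to the parent parse-tree node, then renormalize; the sketch below uses that rule purely as an illustration, with made-up numbers, and should not be read as the STM's exact parameterization.

```python
import numpy as np

def combined_topic_dist(doc_topic_weights, parent_transition):
    """Renormalized elementwise product of document-level and syntactic factors."""
    p = np.asarray(doc_topic_weights) * np.asarray(parent_transition)
    return p / p.sum()

# Toy example with four topics: document proportions and the parent node's
# transition distribution (both hypothetical).
print(combined_topic_dist([0.5, 0.3, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]))
```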
Weighted Constraints in Generative Linguistics
Cognitive Science, 2009
"... Harmonic Grammar (HG) and Optimality Theory (OT) are closely related formal frameworks for the study of language. In both, the structure of a given language is determined by the relative strengths of a set of constraints. They differ in how these strengths are represented: as numerical weights (HG) ..."
Abstract

Cited by 21 (3 self)
Harmonic Grammar (HG) and Optimality Theory (OT) are closely related formal frameworks for the study of language. In both, the structure of a given language is determined by the relative strengths of a set of constraints. They differ in how these strengths are represented: as numerical weights (HG) or as ranks (OT). Weighted constraints have advantages for the construction of accounts of language learning and other cognitive processes, partly because they allow for the adaptation of connectionist and statistical models. HG has been little studied in generative linguistics, however, largely due to influential claims that weighted constraints make incorrect predictions about the typology of natural languages, predictions that are not shared by the more popular OT. This paper makes the case that HG is in fact a promising framework for typological research, and reviews and extends the existing arguments for weighted over ranked constraints.
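The contrast between weights and ranks can be shown with a small sketch: HG picks the candidate with the lowest weighted violation sum, while OT compares violation vectors lexicographically by constraint rank. The constraints, candidates, and weights below are hypothetical; the example simply illustrates a "gang effect" that weights allow and strict ranking forbids.

```python
def hg_winner(candidates, weights):
    """Harmonic Grammar: lowest weighted sum of violations wins."""
    return min(candidates, key=lambda c: sum(w * v for w, v in zip(weights, candidates[c])))

def ot_winner(candidates):
    """Optimality Theory: violation vectors compared lexicographically,
    constraints listed from highest- to lowest-ranked."""
    return min(candidates, key=lambda c: tuple(candidates[c]))

# Hypothetical candidates with violation counts on two constraints (C1 >> C2).
candidates = {"cand_a": [0, 3], "cand_b": [1, 0]}

print(ot_winner(candidates))                       # cand_a: the single C1 violation is fatal under ranking
print(hg_winner(candidates, weights=[2.0, 1.0]))   # cand_b: three C2 violations gang up against one C1 violation
```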
Lexicalized phonotactic word segmentation
Proceedings of the Association for Computational Linguistics (ACL), 2008
"... This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone ngrams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on ..."
Abstract

Cited by 20 (0 self)
This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone n-grams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on morphologically complex words in English and Arabic, and promising results on accurate phonetic transcriptions with extensive pronunciation variation. Expanding training data beyond the traditional miniature datasets pushes performance numbers well above those previously reported. This suggests that WordEnds is a viable model of child language acquisition and might be useful in speech understanding.
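A minimal sketch of the bootstrapping idea described above: utterance-final and utterance-initial phone n-grams are counted (pauses being unambiguous word boundaries), and each word-internal position is then scored by how boundary-like its surrounding n-grams are. The scoring rule, data, and n-gram order are illustrative assumptions, not the actual WordEnds model.

```python
from collections import Counter

def boundary_ngram_counts(utterances, n=2):
    """Count phone n-grams seen just before and just after an utterance boundary."""
    left, right = Counter(), Counter()
    for utt in utterances:                 # each utterance is a list of phones
        left[tuple(utt[-n:])] += 1
        right[tuple(utt[:n])] += 1
    return left, right

def boundary_score(phones, i, left, right, n=2):
    """Crude evidence that a word boundary falls between phones[i-1] and phones[i]."""
    return left[tuple(phones[i - n:i])] + right[tuple(phones[i:i + n])]

# Toy transcribed utterances (hypothetical data).
utts = [list("yuwant"), list("want"), list("yu")]
left, right = boundary_ngram_counts(utts)
phones = list("yuwant")
print([(i, boundary_score(phones, i, left, right)) for i in range(2, len(phones) - 1)])
# The position after "yu" scores highest, suggesting the segmentation yu | want.
```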
Online learning mechanisms for Bayesian models of word segmentation
Research on Language and Computation, 8(2), 107–132 (special issue on computational models of language acquisition), 2011
"... © The Author(s) 2011. This article is published with open access at Springerlink.com Abstract In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal learner Bayesian models assume that cognition can be usefully understood as optimal behavio ..."
Abstract

Cited by 15 (7 self)
In recent years, Bayesian models have become increasingly popular as a way of understanding human cognition. Ideal learner Bayesian models assume that cognition can be usefully understood as optimal behavior under uncertainty, a hypothesis that has been supported by a number of modeling studies across various domains (e.g., Griffiths and Tenenbaum, Cognitive Psychology, 51, 354–384, 2005; Xu and Tenenbaum, Psychological Review, 114, 245–272, 2007). The models in these studies aim to explain why humans behave as they do given the task and data they encounter, but typically avoid some questions addressed by more traditional psychological models, such as how the observed behavior is produced given constraints on memory and processing. Here, we use the task of word segmentation as a case study for investigating these questions within a Bayesian framework. We consider some limitations of the infant learner, and develop several online learning algorithms that take these limitations into account. Each algorithm can be viewed as a different method of approximating the same ideal learner. When tested on corpora of English child-directed speech, we find that the constrained learner’s behavior depends nontrivially on how the learner’s limitations are implemented. Interestingly, sometimes biases that are helpful to an ideal learner hinder a constrained learner, and in a few cases, constrained learners perform as well as or better than the ideal learner. This suggests that the transition from a computational-level solution for acquisition to an algorithmic-level one is not straightforward.
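The sketch below conveys the general shape of such an online approximation: utterances are processed one at a time, each is segmented using only the lexicon statistics accumulated so far, and the counts are then updated before moving on. The greedy Viterbi-style decision rule and the toy unigram probability here are illustrative stand-ins, not any of the paper's specific algorithms.

```python
from collections import Counter

def word_prob(word, counts, total, alpha0=20.0, base=lambda w: (1.0 / 50) ** len(w)):
    """Toy Dirichlet-process unigram predictive probability (illustrative)."""
    return (counts[word] + alpha0 * base(word)) / (total + alpha0)

def best_segmentation(utt, counts, total):
    """Choose the highest-probability segmentation of one utterance, given only
    the counts accumulated from earlier utterances (dynamic programming)."""
    best = {0: (1.0, [])}
    for j in range(1, len(utt) + 1):
        best[j] = max((best[i][0] * word_prob(utt[i:j], counts, total), best[i][1] + [utt[i:j]])
                      for i in range(j))
    return best[len(utt)][1]

# Online loop: commit to a segmentation, then update the lexicon and continue.
counts, total = Counter(), 0
for utt in ["yuwanttu", "wanttusi", "yusi"]:
    seg = best_segmentation(utt, counts, total)
    counts.update(seg)
    total += len(seg)
    print(utt, "->", seg)
```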
Gibbs Sampling for the Uninitiated
2009
"... VERSION 0.3 This document is intended for computer scientists who would like to try out a Markov Chain Monte Carlo (MCMC) technique, particularly in order to do inference with Bayesian models on problems related to text processing. We try to keep theory to the absolute minimum needed, and we work th ..."
Abstract

Cited by 15 (4 self)
This document is intended for computer scientists who would like to try out a Markov Chain Monte Carlo (MCMC) technique, particularly in order to do inference with Bayesian models on problems related to text processing. We try to keep theory to the absolute minimum needed, and we work through the details much more explicitly than you usually see even in “introductory” explanations. That means we’ve attempted to be ridiculously explicit in our exposition and notation. After providing the reasons and reasoning behind Gibbs sampling (and at least nodding our heads in the direction of theory), we work through two applications in detail. The first is the derivation of a Gibbs sampler for Naive Bayes models, which illustrates a simple case where the math works out very cleanly and it’s possible to “integrate out” the model’s continuous parameters to build a more efficient algorithm. The second application derives the Gibbs sampler for a model that is similar to Naive Bayes, but which adds an additional latent variable. Having gone through the two examples, we discuss some practical implementation issues. We conclude with some pointers to literature that we’ve found to be somewhat more friendly to uninitiated readers.
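As a concrete companion to the first application mentioned above, here is a hedged sketch of one Gibbs sweep for a Naive Bayes document-clustering model with the class and word multinomials integrated out (a collapsed sampler). The two-class setup, hyperparameters, and toy corpus are assumptions for illustration; the document itself derives the sampler in full.

```python
import math
import random
from collections import Counter

def log_class_score(doc, c, class_docs, word_counts, class_totals, V, alpha=1.0, beta=0.5):
    """Collapsed conditional log P(z_d = c | z_-d, w): a class-size prior term times
    the predictive probability of the document's words with parameters integrated out."""
    lp = math.log(class_docs[c] + alpha)
    added = 0
    for w, cnt in Counter(doc).items():
        for k in range(cnt):
            lp += math.log(word_counts[c][w] + beta + k)
            lp -= math.log(class_totals[c] + V * beta + added)
            added += 1
    return lp

def gibbs_sweep(docs, labels, K, V):
    # Rebuild sufficient statistics from the current labels.
    class_docs = Counter(labels)
    word_counts = [Counter() for _ in range(K)]
    class_totals = [0] * K
    for doc, z in zip(docs, labels):
        word_counts[z].update(doc)
        class_totals[z] += len(doc)
    for d, doc in enumerate(docs):
        z = labels[d]                                   # remove document d from its class
        class_docs[z] -= 1
        class_totals[z] -= len(doc)
        for w in doc:
            word_counts[z][w] -= 1
        logp = [log_class_score(doc, c, class_docs, word_counts, class_totals, V) for c in range(K)]
        m = max(logp)
        z = random.choices(range(K), weights=[math.exp(x - m) for x in logp])[0]
        labels[d] = z                                   # add it back under the sampled class
        class_docs[z] += 1
        class_totals[z] += len(doc)
        for w in doc:
            word_counts[z][w] += 1
    return labels

# Tiny hypothetical corpus; two latent classes.
docs = [["ball", "game", "score"], ["court", "judge", "law"], ["game", "score"], ["law", "court"]]
labels = [random.randrange(2) for _ in docs]
for _ in range(20):
    labels = gibbs_sweep(docs, labels, K=2, V=len({w for d in docs for w in d}))
print(labels)
```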
Unsupervised and Constrained Dirichlet Process Mixture Models for Verb Clustering
"... In this work, we apply Dirichlet Process Mixture Models (DPMMs) to a learning task in natural language processing (NLP): lexicalsemantic verb clustering. We thoroughly evaluate a method of guiding DPMMs towards a particular clustering solution using pairwise constraints. The quantitative and quali ..."
Abstract

Cited by 13 (6 self)
In this work, we apply Dirichlet Process Mixture Models (DPMMs) to a learning task in natural language processing (NLP): lexical-semantic verb clustering. We thoroughly evaluate a method of guiding DPMMs towards a particular clustering solution using pairwise constraints. The quantitative and qualitative evaluation performed highlights the benefits of both standard and constrained DPMMs compared to previously used approaches. In addition, it sheds light on the use of evaluation measures and their practical application.
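A small sketch of how pairwise constraints might guide cluster assignment: before an item is (re)assigned, clusters that would violate a must-link or cannot-link constraint with an already-assigned item are masked out. The masking scheme and data below are illustrative assumptions rather than the paper's exact mechanism for constraining the DPMM sampler.

```python
def allowed_clusters(item, clusters, assignment, must_link, cannot_link):
    """Clusters `item` may join without violating pairwise constraints.
    `assignment` maps already-assigned items to cluster ids."""
    allowed = set(clusters)
    for a, b in must_link:
        other = b if a == item else a if b == item else None
        if other is not None and other in assignment:
            allowed &= {assignment[other]}          # must share the partner's cluster
    for a, b in cannot_link:
        other = b if a == item else a if b == item else None
        if other is not None and other in assignment:
            allowed.discard(assignment[other])      # may not share the partner's cluster
    return allowed

# Hypothetical verbs and constraints.
assignment = {"give": 0, "send": 0, "eat": 1}
print(allowed_clusters("offer", clusters={0, 1, 2}, assignment=assignment,
                       must_link={("offer", "give")}, cannot_link={("offer", "eat")}))
# -> {0}: "offer" is forced into the give/send cluster and barred from "eat"'s cluster.
```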
Covariance in Unsupervised Learning of Probabilistic Grammars
"... Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of wellunderstood, generalpurpose learn ..."
Abstract

Cited by 13 (5 self)
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar’s parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.
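To illustrate what the logistic normal buys over the Dirichlet, the sketch below draws a multinomial parameter vector by sampling a Gaussian and pushing it through the softmax; off-diagonal covariance lets two parameters rise and fall together, which a Dirichlet cannot express directly. The dimensionality, mean, and covariance values are arbitrary toy choices.

```python
import numpy as np

def sample_logistic_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) and softmax it into a probability vector."""
    eta = rng.multivariate_normal(mu, sigma)
    e = np.exp(eta - eta.max())
    return e / e.sum()

rng = np.random.default_rng(0)
mu = np.zeros(3)
sigma = np.array([[1.0, 0.8, 0.0],     # first two parameters co-vary
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
draws = np.array([sample_logistic_normal(mu, sigma, rng) for _ in range(5000)])
print(np.corrcoef(draws[:, 0], draws[:, 1])[0, 1])   # empirical correlation between the tied parameters
```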
Modeling the contribution of phonotactic cues to the problem of word segmentation
2009
"... doi:10.1017/S030500090999050X ..."