Corpus-based induction of syntactic structure: Models of dependency and constituency. (2004)

by Dan Klein, Christopher D Manning
Venue: In ACL
Results 1 - 10 of 229

Alignment by agreement

by Percy Liang, et al., 2006
"... We present an unsupervised approach to symmetric word alignment in which two simple asymmetric models are trained jointly to maximize a combination of data likelihood and agreement between the models. Compared to the standard practice of intersecting predictions of independently-trained models, join ..."
Abstract - Cited by 216 (22 self) - Add to MetaCart
We present an unsupervised approach to symmetric word alignment in which two simple asymmetric models are trained jointly to maximize a combination of data likelihood and agreement between the models. Compared to the standard practice of intersecting predictions of independently-trained models, joint training provides a 32% reduction in AER. Moreover, a simple and efficient pair of HMM aligners provides a 29% reduction in AER over symmetrized IBM model 4 predictions.
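The joint objective described in this abstract can be sketched, in notation of my own rather than the paper's, as two asymmetric aligners p_1 and p_2 each maximizing their own data likelihood plus a term rewarding agreement between their alignment posteriors:

    \max_{\theta_1,\theta_2} \sum_{(\mathbf{e},\mathbf{f})} \Big[ \log p_1(\mathbf{f} \mid \mathbf{e}; \theta_1) + \log p_2(\mathbf{e} \mid \mathbf{f}; \theta_2) + \log \sum_{\mathbf{a}} p_1(\mathbf{a} \mid \mathbf{e}, \mathbf{f}; \theta_1)\, p_2(\mathbf{a} \mid \mathbf{f}, \mathbf{e}; \theta_2) \Big]

Here e and f are the two sides of a sentence pair and a ranges over alignments; the final term is the agreement bonus that joint training optimizes and that independent training plus intersection lacks.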

Posterior regularization for structured latent variable models

by Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar - Journal of Machine Learning Research, 2010
"... We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model co ..."
Abstract - Cited by 138 (8 self) - Add to MetaCart
We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, ...
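A rough sketch of the framework, using my own notation rather than the paper's exact formulation: the marginal log-likelihood L(θ) is penalized by the divergence from the model posterior to a constraint set Q defined through expectations of constraint features φ:

    J_Q(\theta) = \mathcal{L}(\theta) - \min_{q \in Q} \mathrm{KL}\big(q(\mathbf{y}) \,\|\, p_\theta(\mathbf{y} \mid \mathbf{x})\big), \qquad Q = \{\, q : \mathbb{E}_q[\boldsymbol{\phi}(\mathbf{x}, \mathbf{y})] \le \mathbf{b} \,\}

Learning alternates between projecting the posterior onto Q and updating θ, which is how decomposable constraints can be enforced in expectation without sacrificing the efficiency of the unconstrained model.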

Citation Context

...ide algorithm (Lee and Choi, 1997) performs this computation. Viterbi decoding is done using Eisner’s algorithm (Eisner, 1996). We also used a generative model based on dependency model with valence (Klein and Manning, 2004). Under this model, the probability of a particular parse y and a sentence with part-of-speech tags x is given by

    p_\theta(y, x) = p_{\mathrm{root}}(r(x)) \cdot \prod_{y \in y} p_{\neg\mathrm{stop}}(y_p, y_d, v_y)\, p_{\mathrm{child}}(y_p, y_d, y_c) \cdot \prod_{x \in x} \ldots
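For concreteness, here is a minimal scoring routine matching this head-outward factorization: a root choice, a continue/child choice per dependency, and a stop choice per token and direction. The parameter dictionaries p_root, p_stop, p_child and the adjacency-only notion of valence are illustrative assumptions, not code from either cited paper.

    import math

    def dmv_log_prob(tags, heads, p_root, p_stop, p_child):
        """Score one dependency parse. tags[i] is the POS tag of token i;
        heads[i] is the index of token i's head, or -1 for the root."""
        n = len(tags)
        left = {i: [] for i in range(n)}
        right = {i: [] for i in range(n)}
        logp = 0.0
        for d, h in enumerate(heads):
            if h == -1:
                logp += math.log(p_root[tags[d]])        # choose the root tag
            elif d < h:
                left[h].append(d)
            else:
                right[h].append(d)
        for h in range(n):
            for direction, deps in (("left", sorted(left[h], reverse=True)),
                                    ("right", sorted(right[h]))):
                adjacent = True                          # no dependents yet on this side
                for d in deps:                           # generate dependents head-outward
                    logp += math.log(1.0 - p_stop[(tags[h], direction, adjacent)])
                    logp += math.log(p_child[(tags[h], direction, tags[d])])
                    adjacent = False
                logp += math.log(p_stop[(tags[h], direction, adjacent)])  # then stop
        return logp

For example, a two-token sentence with a single left attachment is scored as one root choice, one non-stop decision, one child choice, and four stop decisions (one per token per direction).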

Generalized expectation criteria for semi-supervised learning of conditional random fields

by Gideon S. Mann, Andrew McCallum - In Proc. ACL, pages 870-878, 2008
"... This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distri ..."
Abstract - Cited by 108 (11 self) - Add to MetaCart
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad-hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.
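A hedged sketch of the criterion described above, in my own notation: the conditional likelihood on labeled data is augmented with a penalty comparing, for each labeled feature j, a target label distribution \hat{p}_j against the model's predicted label distribution \tilde{p}_{j,\theta} at unlabeled positions where that feature fires. A KL penalty and a Gaussian prior are shown as one plausible instantiation, not necessarily the paper's exact choice:

    O(\theta) = \sum_{(\mathbf{x},\mathbf{y}) \in \mathcal{D}_{\mathrm{lab}}} \log p_\theta(\mathbf{y} \mid \mathbf{x}) \;-\; \lambda \sum_{j} \mathrm{KL}\big(\hat{p}_j \,\|\, \tilde{p}_{j,\theta}\big) \;-\; \frac{\lVert\theta\rVert^2}{2\sigma^2}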

Citation Context

...98) which presents a naïve Bayes model for text classification trained using EM and semisupervised data. EM has also been applied to structured classification problems such as part-of-speech tagging (Klein and Manning, 2004), where EM can succeed after very careful and clever initialization. While these models can often be very effective, especially when used with “prototypes” (Haghighi and Klein, 2006b), they cannot ef...

Spectral learning

by Sepandar D. Kamvar, Dan Klein, Christopher D. Manning - In IJCAI, 2003
"... We present a simple, easily implemented spectral learning algorithm which applies equally whether we have no supervisory information, pairwise link constraints, or labeled examples. In the unsupervised case, it performs consistently with other spectral clustering algorithms. In the supervised case, ..."
Abstract - Cited by 106 (6 self) - Add to MetaCart
We present a simple, easily implemented spectral learning algorithm which applies equally whether we have no supervisory information, pairwise link constraints, or labeled examples. In the unsupervised case, it performs consistently with other spectral clustering algorithms. In the supervised case, our approach achieves high accuracy on the categorization of thousands of documents given only a few dozen labeled training documents for the 20 Newsgroups data set. Furthermore, its classification accuracy increases with the addition of unlabeled documents, demonstrating effective use of unlabeled data. By using normalized affinity matrices which are both symmetric and stochastic, we also obtain both a probabilistic interpretation of our method and certain guarantees of performance.
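A rough sketch of the unsupervised case described above: build an affinity matrix, normalize it so that it is both symmetric and row-stochastic, then cluster the leading eigenvectors. The (A + d_max·I - D)/d_max normalization is one way to obtain such a matrix and is used here as an illustrative choice; the Gaussian affinities and k-means step are likewise assumptions, not necessarily the paper's exact recipe.

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_cluster(X, k, sigma=1.0):
        # Gaussian affinities between data points (zero self-affinity)
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        A = np.exp(-sq / (2 * sigma ** 2))
        np.fill_diagonal(A, 0.0)
        degrees = A.sum(1)
        D = np.diag(degrees)
        d_max = degrees.max()
        # symmetric, row-stochastic normalization of the affinity matrix
        N = (A + d_max * np.eye(len(A)) - D) / d_max
        # embed each point by the top-k eigenvectors, then cluster
        vals, vecs = np.linalg.eigh(N)
        emb = vecs[:, -k:]
        return KMeans(n_clusters=k, n_init=10).fit_predict(emb)

Because every row of N sums to one and N is symmetric, it can be read as the transition matrix of a reversible random walk, which is what licenses the probabilistic interpretation the abstract mentions.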

Unsupervised Learning of Natural Languages

by Zach Solan, 2006
"... ..."
Abstract - Cited by 100 (12 self) - Add to MetaCart
Abstract not found

Painless Unsupervised Learning with Features

by Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, Dan Klein
"... We show how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods. In particular, each component multinomial of a generative model can be turned into a miniature logistic regression model if feature locality permits. The ..."
Abstract - Cited by 98 (3 self) - Add to MetaCart
We show how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods. In particular, each component multinomial of a generative model can be turned into a miniature logistic regression model if feature locality permits. The intuitive EM algorithm still applies, but with a gradient-based M-step familiar from discriminative training of logistic regression models. We apply this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation, incorporating a few linguistically-motivated features into the standard generative model for each task. These feature-enhanced models each outperform their basic counterparts by a substantial margin, and even compete with and surpass more complex state-of-the-art models.
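A small sketch of the idea in this abstract: replace a multinomial's free parameters with a log-linear (logistic) parameterization over features, so the M-step becomes gradient ascent on expected counts produced by the E-step. The feature matrix, learning rate, and function names below are illustrative placeholders rather than anything from the paper.

    import numpy as np

    def multinomial_from_features(w, feats):
        """feats[y] is a feature vector for outcome y; returns p(y) ∝ exp(w·feats[y])."""
        scores = feats @ w
        scores -= scores.max()          # numerical stability
        p = np.exp(scores)
        return p / p.sum()

    def m_step_gradient(w, feats, expected_counts, lr=0.1, steps=50):
        """Fit the log-linear multinomial to expected counts from the E-step."""
        n = expected_counts.sum()
        for _ in range(steps):
            p = multinomial_from_features(w, feats)
            # gradient of the expected log-likelihood: observed minus expected features
            grad = feats.T @ expected_counts - n * (feats.T @ p)
            w = w + lr * grad / n
        return w

Setting feats to an identity matrix recovers the ordinary multinomial, which is why the feature-enhanced model strictly generalizes the basic one.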

Citation Context

...supervised POS tagging. [§5 Grammar Induction] We next apply our technique to a grammar induction task: the unsupervised learning of dependency parse trees via the dependency model with valence (DMV) (Klein and Manning, 2004). A dependency parse is a directed tree over tokens in a sentence. Each edge of the tree specifies a directed dependency from a head token to a dependent, or argument token. Thus, the number of depen...

A universal part-of-speech tagset

by Slav Petrov, Dipanjan Das, Ryan McDonald - In arXiv:1104.2086, 2011
"... To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. ..."
Abstract - Cited by 82 (13 self) - Add to MetaCart
To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
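As a concrete illustration, here is a partial mapping from Penn Treebank tags to twelve coarse universal categories (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, '.', X). The released resource covers 25 treebank tagsets, so treat this fragment as a sketch of the idea rather than the paper's full mapping.

    # Partial, illustrative PTB -> universal tag mapping; entries not shown fall back to X.
    PTB_TO_UNIVERSAL = {
        "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
        "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
        "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
        "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
        "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
        "PRP": "PRON", "WP": "PRON",
        "DT": "DET", "WDT": "DET",
        "IN": "ADP",
        "TO": "PRT", "RP": "PRT", "POS": "PRT",
        "CD": "NUM", "CC": "CONJ",
        ".": ".", ",": ".", ":": ".",
        "FW": "X", "SYM": "X",
    }

    def to_universal(tag):
        return PTB_TO_UNIVERSAL.get(tag, "X")  # unknown tags map to the catch-all X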

Citation Context

...a grammar induction experiment. To decouple the challenges of POS tagging and parsing, golden POS tags are typically assumed in unsupervised grammar induction experiments (Carroll and Charniak, 1992; Klein and Manning, 2004). We propose to remove this unrealistic simplification by using POS tags automatically projected from English as the basis of a grammar induction model. Das and Petrov (2011) describe a cross-lingu...

Probabilistic models of language processing and acquisition

by Nick Chater, Christopher D Manning - Trends in Cognitive Sciences, 2006
"... Probabilistic methods are providing new explanatory approaches to fundamental cognitive science questions of how humans structure, process and acquire language. This review examines probabilistic models defined over traditional symbolic structures. Language comprehension and production involve prob ..."
Abstract - Cited by 71 (5 self) - Add to MetaCart
Probabilistic methods are providing new explanatory approaches to fundamental cognitive science questions of how humans structure, process and acquire language. This review examines probabilistic models defined over traditional symbolic structures. Language comprehension and production involve probabilistic inference in such models; and acquisition involves choosing the best model, given innate constraints and linguistic and other input. Probabilistic models can account for the learning and processing of language, while maintaining the sophistication of symbolic models. A recent burgeoning of theoretical developments and online corpus creation has enabled large models to be tested, revealing probabilistic constraints in processing, undermining acquisition arguments based on a perceived poverty of the stimulus, and suggesting fruitful links with probabilistic theories of categorization and ambiguity resolution in perception.

Citation Context

[Table residue from the review contrasting traditional and probabilistic views: competence vs. assigning probability to performance [1]; the status of rules, subrules and exceptions in morphology [7,14]; gradedness of grammaticality judgements [11,12]; restricting linguistics to core competence grammar where intuitions are clear [35]; language processing via stochastic phrase-structure grammars and related methods [29] vs. structural principles such as minimal attachment [18] and connectionist models [42]; language acquisition via probabilistic algorithms for grammar learning [46,47], trigger-based acquisition models [54], theoretical learnability results [38,39], identification in the limit [36], and Bayesian word learning [17].] ...past instances and not via the construction of a model of the language [15]). Moreover, for reasons of space, we shall focus mainly on parsing and learning grammar, rather than, for example, exploring probabilistic models of how words are recognized [16] or learned [17]. We will see that a probabilistic perspective adds to, but also substantially modifies, current theories of the rules, represent...

Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction

by Shay B. Cohen, Noah A. Smith - In Proceedings of NAACL-HLT, 2009
"... We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, prov ..."
Abstract - Cited by 67 (13 self) - Add to MetaCart
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a non-parallel, multilingual corpus.
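The building block behind this family is the logistic normal distribution over a probability vector: a Gaussian draw pushed through a softmax (notation mine). Per the abstract, the "shared" extension builds related multinomials in the grammar from common Gaussian components so their probabilities covary; that exact construction is the paper's contribution and is not reproduced here.

    \boldsymbol{\eta} \sim \mathcal{N}(\boldsymbol{\mu}, \Sigma), \qquad \theta_k = \frac{\exp(\eta_k)}{\sum_{j} \exp(\eta_j)}

Unlike a Dirichlet prior, the covariance matrix Σ lets the prior prefer grammars in which certain rule probabilities rise and fall together.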

Citation Context

...escribed above. The attachment accuracy for this set of experiments is described in Table 1. The baselines include right attachment (where each word is attached to the word to its right), MLE via EM (Klein and Manning, 2004), and empirical Bayes with Dirichlet and LN priors (Cohen et al., 2008). We also include a “ceiling” (DMV trained using supervised MLE from the training sentences’ trees). For English, we see that ty...

Using Universal Linguistic Knowledge to Guide Grammar Induction

by Tahira Naseem, Harr Chen, Regina Barzilay, Mark Johnson
"... We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that comm ..."
Abstract - Cited by 57 (7 self) - Add to MetaCart
We present an approach to grammar induction that utilizes syntactic universals to improve dependency parsing across a range of languages. Our method uses a single set of manually-specified language-independent rules that identify syntactic dependencies between pairs of syntactic categories that commonly occur across languages. During inference of the probabilistic model, we use posterior expectation constraints to require that a minimum proportion of the dependencies we infer be instances of these rules. We also automatically refine the syntactic categories given in our coarsely tagged input. Across six languages our approach outperforms state-of-the-art unsupervised methods by a significant margin.
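The posterior expectation constraint mentioned above can be pictured as follows (illustrative notation): the posterior q over dependency trees z must, in expectation, assign at least a proportion ξ of its edges to head-argument category pairs (t_h, t_a) that appear in the universal rule set R:

    \mathbb{E}_{q}\!\left[ \frac{1}{|\mathbf{z}|} \sum_{(h \to a) \in \mathbf{z}} \mathbf{1}\big[(t_h, t_a) \in \mathcal{R}\big] \right] \;\ge\; \xi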

Citation Context

...tions are drawn from a symmetric Dirichlet prior. Generating the Tree Structure. We now consider how the structure of the tree arises. We follow an approach similar to the widely-referenced DMV model (Klein and Manning, 2004), which forms the basis of the current state-of-the-art unsupervised grammar induction model (Headden III et al., 2009). After a node is drawn we generate children on each side until we produce a des...
