Learning to parse natural language with maximum entropy models (1999)

by Adwait Ratnaparkhi
Venue: Machine Learning


Results 11 - 20 of 191

Combining lexical, syntactic and semantic features with Maximum Entropy models for extracting relations

by Nanda Kambhatla - Proceedings of ACL’04
"... Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competi ..."
Abstract - Cited by 89 (0 self) - Add to MetaCart
Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.

Citation Context

...all the syntactic features are derived from the syntactic parse tree and the dependency tree that we compute using a statistical parser trained on the Penn Treebank using the Maximum Entropy framework (Ratnaparkhi, 1999). The feature streams are: Words The words of both the mentions and all the words in between. Entity Type The entity type (one of PERSON, ORGANIZATION, LOCATION, FACILITY, Geo-Political Entity or GPE...
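As background on the modelling framework shared by this and several entries below (the standard conditional maximum entropy formulation, not a detail taken from the paper itself), a model with feature functions f_i and weights λ_i has the form

\[ p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Big(\sum_i \lambda_i f_i(x, y)\Big), \qquad Z(x) \;=\; \sum_{y'} \exp\Big(\sum_i \lambda_i f_i(x, y')\Big). \]

In the relation-extraction setting above, x would be a pair of entity mentions together with its lexical, syntactic and semantic evidence and y a relation label; the weights are fit by maximising the conditional likelihood of the training data, which is equivalent to choosing the maximum entropy distribution whose feature expectations match the empirical ones.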

Reranking and self-training for parser adaptation

by David McClosky, Eugene Charniak, Mark Johnson - ACL-COLING, 2006
"... Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concer ..."
Abstract - Cited by 89 (2 self) - Add to MetaCart
Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard “Charniak parser” checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in (McClosky et al., 2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.

Citation Context

...al data and find that the use of out-of-domain trees and in-domain vocabulary information can considerably improve performance. However, the work which is most directly comparable to ours is that of (Ratnaparkhi, 1999; Hwa, 1999; Gildea, 2001; Bacchiani et al., 2006). All of these papers look at what happens to modern WSJ-trained statistical parsers (Ratnaparkhi’s, Collins’, Gildea’s and Roark’s, respectively) as ...
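The "error reduction of 28%" quoted in the abstract above is a relative reduction in f-measure error, presumably measured against the 82.9% Charniak-parser baseline on Brown, since that is the figure consistent with the claim:

\[ \frac{(100 - 82.9) - (100 - 87.8)}{100 - 82.9} \;=\; \frac{17.1 - 12.2}{17.1} \;\approx\; 0.29, \]

i.e. roughly the reported 28% (the exact value depends on the unrounded scores).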

Parsing biomedical literature

by Matthew Lease, Eugene Charniak - In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP-05), Jeju Island, Korea, 2005
"... Abstract. We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this ..."
Abstract - Cited by 71 (2 self) - Add to MetaCart
Abstract. We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically-adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.

Citation Context

...ining data in order to mitigate the need for domain-specific training examples. These issues have been most notably explored in parser adaptation studies conducted between PTB’s WSJ and Brown corpora [6,7,8,9]. As part of our own exploration of these issues, we have been investigating statistical parser adaptation to a novel domain: biomedical literature. This literature presents a stark contrast to WSJ an...

Edit Detection and Parsing for Transcribed Speech

by Eugene Charniak, Mark Johnson - In Proc. NAACL, 2001
"... We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on ..."
Abstract - Cited by 67 (6 self) - Add to MetaCart
We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of 5.9%.) To evaluate our parsing results we introduce a new evaluation metric, the purpose of which is to make evaluation of a parse tree relatively indifferent to the exact tree position of EDITED nodes. By this metric the parser achieves 85.3% precision and 86.5% recall.

Learning to classify email into speech acts

by William W. Cohen, Vitor R. Carvalho, Tom M. Mitchell - In Proceedings of Empirical Methods in Natural Language Processing, 2004
"... It is often useful to classify email according to the intent of the sender (e.g., "propose a meeting", "deliver information"). We present experimental results in learning to classify email in this fashion, where each class corresponds to a verb-noun pair taken from a predefined o ..."
Abstract - Cited by 62 (8 self) - Add to MetaCart
It is often useful to classify email according to the intent of the sender (e.g., "propose a meeting", "deliver information"). We present experimental results in learning to classify email in this fashion, where each class corresponds to a verb-noun pair taken from a predefined ontology describing typical “email speech acts”. We demonstrate that, although this categorization problem is quite different from “topical” text classification, certain categories of messages can nonetheless be detected with high precision (above 80%) and reasonable recall (above 50%) using existing text-classification learning methods. This result suggests that useful task-tracking tools could be constructed based on automatic classification into this taxonomy.

Citation Context

...multaneously to a message, leading to a thread structure which is a tree, rather than a sequence. Finally, most sequential learning models assume a single category is assigned to each instance—e.g., (Ratnaparkhi, 1999)—whereas our scheme allows multiple categories. Classification of emails according to our verb-noun ontology constitutes a special case of a general family of learning problems we might call factored...

Probabilistic Syntax

by Christopher D. Manning , 2002
"... istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no ..."
Abstract - Cited by 55 (2 self) - Add to MetaCart
...istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason), (ii) Probabilistic models don't model grammaticality (neither Colorless green ideas sleep furiously nor Furiously sleep ideas green colorless have previously been uttered -- and hence must be estimated to have probability zero, Chomsky wrongly assumes -- but the former is grammatical while the latter is not), and (iii) Use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments -- Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion -- but some of these concerns are still importa...
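The probability-zero point in (ii) is the familiar objection to unsmoothed maximum-likelihood estimation; as a formula (a standard observation, not wording from the chapter itself):

\[ \hat{P}_{\mathrm{MLE}}(s) \;=\; \frac{\mathrm{count}(s)}{N} \;=\; 0 \quad \text{for any sentence } s \text{ unseen in the training corpus,} \]

whereas a smoothed or factored model (an n-gram model or PCFG with backoff, say) assigns unseen but well-formed strings small non-zero probability, and can assign the grammatical word order a far higher probability than the scrambled one; this is essentially the direction taken by the rebuttals Manning cites.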

Discriminative Training of a Neural Network Statistical Parser

by James Henderson , 2004
"... Discriminative methods have shown significant improvements over traditional generative methods in many machine learning applications, but there has been difficulty in extending them to natural language parsing. One problem is that much of the work on discriminative methods conflates changes to the l ..."
Abstract - Cited by 53 (8 self) - Add to MetaCart
Discriminative methods have shown significant improvements over traditional generative methods in many machine learning applications, but there has been difficulty in extending them to natural language parsing. One problem is that much of the work on discriminative methods conflates changes to the learning method with changes to the parameterization of the problem. We show how a parser can be trained with a discriminative learning method while still parameterizing the problem according to a generative probability model. We present three methods for training a neural network to estimate the probabilities for a statistical parser, one generative, one discriminative, and one where the probability model is generative but the training criterion is discriminative. The latter model outperforms the previous two, achieving state-of-the-art levels of performance (90.1% F-measure on constituents).
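The distinction drawn in the abstract can be made concrete with the two standard training objectives for a model P_θ over sentence/tree pairs (standard definitions, not the paper's own notation):

\[ \mathcal{L}_{\mathrm{gen}}(\theta) = \sum_{(s,t)} \log P_\theta(s, t), \qquad \mathcal{L}_{\mathrm{disc}}(\theta) = \sum_{(s,t)} \log P_\theta(t \mid s) = \sum_{(s,t)} \log \frac{P_\theta(s, t)}{\sum_{t'} P_\theta(s, t')}. \]

The third method described above keeps the generative parameterization P_θ(s, t) but optimises the conditional criterion, which only has to rank the correct tree above the other candidate trees for the same sentence.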

Investigating GIS and smoothing for maximum entropy taggers

by James R. Curran, Stephen Clark - In Proceedings of the 10th Meeting of the EACL, 2003
"... This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the corre ..."
Abstract - Cited by 52 (11 self) - Add to MetaCart
This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar.

Citation Context

...rial Grammar. 1 Introduction The use of maximum entropy (ME) models has become popular in Statistical NLP; some example applications include part-of-speech (POS) tagging (Ratnaparkhi, 1996), parsing (Ratnaparkhi, 1999; Johnson et al., 1999) and language modelling (Rosenfeld, 1996). Many tagging problems have been successfully modelled in the ME framework, including POS tagging, with state of the art performance (v...
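For reference, the textbook form of the GIS update and of the correction feature whose necessity the paper questions (standard definitions, not taken from the paper itself):

\[ \lambda_i^{(t+1)} = \lambda_i^{(t)} + \frac{1}{C} \log \frac{\tilde{E}[f_i]}{E_{p^{(t)}}[f_i]}, \qquad f_0(x, y) = C - \sum_i f_i(x, y), \quad C = \max_{x, y} \sum_i f_i(x, y), \]

where \tilde{E}[f_i] is the empirical expectation of feature f_i and E_{p^{(t)}}[f_i] its expectation under the current model. The correction feature f_0 pads every event so that the active features always sum to the constant C, the condition under which the classical convergence proof of GIS is stated.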

Supervised and unsupervised PCFG adaptation to novel domains

by Brian Roark, Michiel Bacchiani, 2003
"... This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example ..."
Abstract - Cited by 48 (0 self) - Add to MetaCart
This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example. Other approaches falling within this framework are more effective. In contrast to the results ...
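A common way to write the MAP idea for PCFG rule probabilities (a sketch of the general approach, not necessarily the exact parameterization used in the paper) is a Dirichlet prior centred on the out-of-domain model:

\[ \hat{P}_{\mathrm{MAP}}(A \to \beta) \;=\; \frac{c_{\mathrm{in}}(A \to \beta) + \tau \, P_{\mathrm{out}}(A \to \beta)}{c_{\mathrm{in}}(A) + \tau}, \]

where c_in are rule counts from the (possibly small) in-domain treebank, P_out is the out-of-domain (e.g. WSJ-trained) grammar, and τ > 0 controls how strongly the prior pulls the adapted grammar toward it; as τ → 0 the estimate reduces to in-domain relative frequency, and corpus mixing corresponds to particular choices of τ.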

Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences

by Yasemin Altun, Mark Johnson, Thomas Hofmann - In Proc. EMNLP, 2003
"... Discriminative models have been of interest in the NLP community in recent years. ..."
Abstract - Cited by 41 (1 self) - Add to MetaCart
Discriminative models have been of interest in the NLP community in recent years.