Results 11 - 20 of 191
Combining lexical, syntactic and semantic features with Maximum Entropy models for extracting relations
- Proceedings of ACL’04
"... Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competi ..."
Abstract
-
Cited by 89 (0 self)
- Add to MetaCart
(Show Context)
Extracting semantic relationships between entities is challenging because of a paucity of annotated data and the errors induced by entity detection modules. We employ Maximum Entropy models to combine diverse lexical, syntactic and semantic features derived from the text. Our system obtained competitive results in the Automatic Content Extraction (ACE) evaluation. Here we present our general approach and describe our ACE results.
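As a rough illustration of the approach described above (not the authors' actual system), a maximum-entropy relation classifier can be built by pooling lexical, syntactic and semantic indicator features into a single multinomial logistic regression model; the feature names, example instances and the scikit-learn pipeline below are my own illustrative choices.

```python
# Minimal sketch: a MaxEnt relation classifier over mixed feature types.
# The feature names and training instances are illustrative, not from the paper.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each instance: lexical, syntactic and semantic features for one entity pair.
train_feats = [
    {"lex:words_between=head of", "syn:dep_path=appos", "sem:ent_types=PER-ORG"},
    {"lex:words_between=based in", "syn:dep_path=prep_in", "sem:ent_types=ORG-GPE"},
]
train_labels = ["ROLE", "AT"]

vec = DictVectorizer()
X = vec.fit_transform([{f: 1 for f in fs} for fs in train_feats])

# Multinomial logistic regression is the maximum-entropy classifier.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, train_labels)

test = {"lex:words_between=chairman of", "syn:dep_path=appos", "sem:ent_types=PER-ORG"}
print(clf.predict(vec.transform([{f: 1 for f in test}])))
```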
Reranking and self-training for parser adaptation
- ACL-COLING
, 2006
"... Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concer ..."
Abstract
-
Cited by 89 (2 self)
- Add to MetaCart
(Show Context)
Statistical parsers trained and tested on the Penn Wall Street Journal (WSJ) treebank have shown vast improvements over the last 10 years. Much of this improvement, however, is based upon an ever-increasing number of features to be trained on (typically) the WSJ treebank data. This has led to concern that such parsers may be too finely tuned to this corpus at the expense of portability to other genres. Such worries have merit. The standard “Charniak parser” checks in at a labeled precision-recall f-measure of 89.7% on the Penn WSJ test set, but only 82.9% on the test set from the Brown treebank corpus. This paper should allay these fears. In particular, we show that the reranking parser described in Charniak and Johnson (2005) improves performance of the parser on Brown to 85.2%. Furthermore, use of the self-training techniques described in (McClosky et al., 2006) raises this to 87.8% (an error reduction of 28%), again without any use of labeled Brown data. This is remarkable since training the parser and reranker on labeled Brown data achieves only 88.4%.
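The self-training recipe the abstract refers to can be pictured as the short loop below; parse() and retrain() are hypothetical stand-ins, not the actual Charniak-Johnson tool interface.

```python
# Sketch of self-training for parser domain adaptation; base_parser.parse()
# and retrain() are hypothetical stand-ins for the real training interface.
def self_train(base_parser, wsj_treebank, unlabeled_target_sents, retrain):
    # 1. Parse unlabeled target-domain text with the WSJ-trained reranking parser.
    auto_trees = [base_parser.parse(s) for s in unlabeled_target_sents]
    # 2. Add the automatically parsed trees to the original labeled WSJ data.
    combined = list(wsj_treebank) + auto_trees
    # 3. Retrain the first-stage parser on the combined data; no labeled
    #    target-domain (e.g. Brown) trees are used anywhere.
    return retrain(combined)
```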
Parsing biomedical literature
- In Proceedings of the Second International Joint Conference on Natural Language Processing (IJCNLP-05), Jeju Island, Korea
, 2005
"... Abstract. We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this ..."
Abstract
-
Cited by 71 (2 self)
- Add to MetaCart
(Show Context)
We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1, 2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB training: part-of-speech tags, dictionary collocations, and named entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically-adapted parser achieves a 14.2% reduction in error. With oracle knowledge of named entities, this error reduction improves to 21.2%.
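One concrete way to picture the named-entity part of this lexical augmentation is to rewrite out-of-vocabulary domain terms into placeholder tokens a PTB-trained parser has seen; the sketch below is only illustrative and is not the paper's exact mechanism.

```python
# Illustrative sketch: map unknown biomedical named entities to placeholder
# tokens so a PTB-trained parser sees in-vocabulary words. Not the paper's
# exact mechanism; the entity list here is invented.
GENE_NAMES = {"NF-kappaB", "IL-2", "STAT5"}

def mask_entities(tokens):
    return ["protein" if t in GENE_NAMES else t for t in tokens]

print(mask_entities("NF-kappaB activates IL-2 transcription".split()))
# ['protein', 'activates', 'protein', 'transcription']
```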
Edit Detection and Parsing for Transcribed Speech
- In Proc. NAACL
, 2001
"... We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on ..."
Abstract
-
Cited by 67 (6 self)
- Add to MetaCart
We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of 5.9%.) To evaluate our parsing results we introduce a new evaluation metric, the purpose of which is to make evaluation of a parse tree relatively indifferent to the exact tree position of EDITED nodes. By this metric the parser achieves 85.3% precision and 86.5% recall.
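The two-stage architecture reduces to a short pipeline; in this sketch edit_detector and parser are placeholders for the trained components described in the abstract.

```python
# Sketch of the two-stage architecture: remove edited words, then parse.
# edit_detector and parser are placeholders for the trained components.
def parse_utterance(tokens, edit_detector, parser):
    # Stage 1: flag each word as an edited disfluency to be discarded, or not.
    flags = edit_detector(tokens)
    keep = [tok for tok, is_edited in zip(tokens, flags) if not is_edited]
    # Stage 2: parse the remaining words with a parser trained on transcribed speech.
    return parser(keep)
```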
Learning to classify email into speech acts
- In Proceedings of Empirical Methods in Natural Language Processing
, 2004
"... It is often useful to classify email according to the intent of the sender (e.g., "propose a meeting", "deliver information"). We present experimental results in learning to classify email in this fashion, where each class corresponds to a verb-noun pair taken from a predefined o ..."
Abstract
-
Cited by 62 (8 self)
- Add to MetaCart
(Show Context)
It is often useful to classify email according to the intent of the sender (e.g., "propose a meeting", "deliver information"). We present experimental results in learning to classify email in this fashion, where each class corresponds to a verb-noun pair taken from a predefined ontology describing typical “email speech acts”. We demonstrate that, although this categorization problem is quite different from “topical” text classification, certain categories of messages can nonetheless be detected with high precision (above 80%) and reasonable recall (above 50%) using existing text-classification learning methods. This result suggests that useful task-tracking tools could be constructed based on automatic classification into this taxonomy.
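A minimal sketch of this setup with an off-the-shelf bag-of-words learner follows; the example messages and verb-noun act labels are illustrative, not taken from the paper's corpus or ontology.

```python
# Sketch: text classification of email into "speech act" categories.
# The messages and verb-noun labels below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Can we meet Tuesday at 3pm to go over the draft?",
    "Attached is the final report you asked for.",
    "Please send me the budget numbers by Friday.",
]
acts = ["propose-meeting", "deliver-information", "request-information"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(emails, acts)
print(model.predict(["Could we schedule a call for Monday morning?"]))
```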
Probabilistic Syntax
, 2002
"... istic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) Probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no ..."
Abstract
-
Cited by 55 (2 self)
- Add to MetaCart
… probabilistic methods for syntax, just as for a long time McCarthy and Hayes (1969) discouraged exploration of probabilistic methods in Artificial Intelligence. Among his arguments were that: (i) probabilistic models wrongly mix in world knowledge (New York occurs more in text than Dayton, Ohio, but for no linguistic reason); (ii) probabilistic models don't model grammaticality (neither Colorless green ideas sleep furiously nor Furiously sleep ideas green colorless have previously been uttered -- and hence, Chomsky wrongly assumes, must be estimated to have probability zero -- but the former is grammatical while the latter is not); and (iii) the use of probabilities does not meet the goal of describing the mind-internal I-language as opposed to the observed-in-the-world E-language. This chapter is not meant to be a detailed critique of Chomsky's arguments -- Abney (1996) provides a survey and a rebuttal, and Pereira (2000) has further useful discussion -- but some of these concerns are still important …
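Argument (ii) is the one most directly answered by smoothing: a maximum-likelihood n-gram model does assign probability zero to any sentence containing an unseen n-gram, but a smoothed model does not, and Pereira's class-based bigram experiment further shows the grammatical string scoring far above the scrambled one. The toy example below illustrates only the zero-versus-nonzero point; the tiny corpus is invented.

```python
# Toy illustration: MLE bigrams give unseen sentences probability zero,
# while add-one (Laplace) smoothing gives them small nonzero probability.
from collections import Counter

corpus = "green ideas are new . revolutionary ideas sleep in books .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
V = len(unigrams)  # vocabulary size for add-one smoothing

def prob(sent, smooth):
    p = 1.0
    for w1, w2 in zip(sent, sent[1:]):
        c12, c1 = bigrams[(w1, w2)], unigrams[w1]
        p *= (c12 + 1) / (c1 + V) if smooth else (c12 / c1 if c1 else 0.0)
    return p

s = "colorless green ideas sleep furiously".split()
print(prob(s, smooth=False))  # 0.0 under maximum likelihood
print(prob(s, smooth=True))   # small but nonzero with smoothing
```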
Discriminative Training of a Neural Network Statistical Parser
, 2004
"... Discriminative methods have shown significant improvements over traditional generative methods in many machine learning applications, but there has been difficulty in extending them to natural language parsing. One problem is that much of the work on discriminative methods conflates changes to the l ..."
Abstract
-
Cited by 53 (8 self)
- Add to MetaCart
Discriminative methods have shown significant improvements over traditional generative methods in many machine learning applications, but there has been difficulty in extending them to natural language parsing. One problem is that much of the work on discriminative methods conflates changes to the learning method with changes to the parameterization of the problem. We show how a parser can be trained with a discriminative learning method while still parameterizing the problem according to a generative probability model. We present three methods for training a neural network to estimate the probabilities for a statistical parser: one generative, one discriminative, and one where the probability model is generative but the training criterion is discriminative. The latter model outperforms the previous two, achieving state-of-the-art levels of performance (90.1% F-measure on constituents).
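Schematically (the notation is mine, not the paper's), the generative criterion and the generative-model-with-discriminative-criterion differ only in what is maximized over the same parameterization P_theta(w, T):

```latex
% Generative criterion: joint likelihood of each sentence and its gold tree.
\max_\theta \sum_i \log P_\theta(w_i, T_i)

% Discriminative criterion, same generative parameterization:
% conditional likelihood of the gold tree given the sentence.
\max_\theta \sum_i \Bigl( \log P_\theta(w_i, T_i)
    \;-\; \log \sum_{T' \in \mathcal{T}(w_i)} P_\theta(w_i, T') \Bigr)
```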
Investigating GIS and smoothing for maximum entropy taggers
- In Proceedings of the 10th Meeting of the EACL
, 2003
"... This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the corre ..."
Abstract
-
Cited by 52 (11 self)
- Add to MetaCart
(Show Context)
This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar.
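Below is a bare-bones sketch of GIS for a conditional maximum-entropy tagger, written without a correction feature (the point the abstract argues is safe) and with a simple zero-count cutoff; the toy features and data are invented.

```python
import math
from collections import defaultdict

# Toy conditional MaxEnt tagger trained with Generalised Iterative Scaling.
# No correction feature is used; features with zero empirical count are skipped.
data = [
    ({"w=the", "prev=START"}, "DT"),
    ({"w=dog", "prev=DT"}, "NN"),
    ({"w=barks", "prev=NN"}, "VBZ"),
]
tags = sorted({t for _, t in data})
C = max(len(fs) for fs, _ in data)   # GIS constant: max number of active features
lam = defaultdict(float)             # feature weights, keyed by (feature, tag)

def p_tag(fs, t):
    """P(tag | context) under the current weights."""
    scores = {u: math.exp(sum(lam[(f, u)] for f in fs)) for u in tags}
    return scores[t] / sum(scores.values())

# Empirical feature expectations (plain counts, each event has weight 1).
emp = defaultdict(float)
for fs, t in data:
    for f in fs:
        emp[(f, t)] += 1.0

for _ in range(50):                  # GIS iterations
    exp = defaultdict(float)         # model feature expectations
    for fs, _gold in data:
        for u in tags:
            p = p_tag(fs, u)
            for f in fs:
                exp[(f, u)] += p
    for k, c in emp.items():         # cutoff: only update observed features
        if exp[k] > 0:
            lam[k] += (1.0 / C) * math.log(c / exp[k])

print({t: round(p_tag({"w=dog", "prev=DT"}, t), 3) for t in tags})
```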
Supervised and unsupervised PCFG adaptation to novel domains
, 2003
"... This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example ..."
Abstract
-
Cited by 48 (0 self)
- Add to MetaCart
This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example. Other approaches falling within this framework are more effective. In contrast to the results …
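One common way to write the MAP (Dirichlet-prior) estimate in this line of work is as in-domain rule counts mixed with an out-of-domain prior model scaled by a concentration parameter; the function and the name tau below are my gloss, not necessarily the paper's notation.

```python
# Sketch of a MAP (Dirichlet-prior) estimate for a PCFG rule probability:
# in-domain counts are mixed with an out-of-domain prior model scaled by tau.
# Notation (tau, argument names) is my gloss, not necessarily the paper's.
def map_rule_prob(count_in_rule, count_in_lhs, prior_prob_rule, tau):
    """P_MAP(A -> beta) from in-domain counts and a prior P_out(A -> beta)."""
    return (count_in_rule + tau * prior_prob_rule) / (count_in_lhs + tau)

# tau -> 0 recovers the in-domain ML estimate; large tau stays close to the prior.
print(map_rule_prob(count_in_rule=3, count_in_lhs=10, prior_prob_rule=0.5, tau=20))
```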
Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences
- In Proc. EMNLP
, 2003
"... Discriminative models have been of interest in the NLP community in recent years. ..."
Abstract
-
Cited by 41 (1 self)
- Add to MetaCart
Discriminative models have been of interest in the NLP community in recent years.