Results 1 - 10 of 236
TnT - A Statistical Part-Of-Speech Tagger
2000
Cited by 540 (5 self)
Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT, the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
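To make the trigram model concrete, here is a minimal Python sketch of the interpolated tag-trigram estimate at the core of a TnT-style tagger. The fixed interpolation weights are placeholders (TnT itself estimates them by deleted interpolation), and its unknown-word handling is omitted.

    from collections import Counter

    def train_tag_model(tagged_sents):
        """Count tag unigrams, bigrams, and trigrams over tagged sentences."""
        uni, bi, tri = Counter(), Counter(), Counter()
        for sent in tagged_sents:
            tags = ["<s>", "<s>"] + [t for _, t in sent]
            for t in tags:
                uni[t] += 1
            for i in range(1, len(tags)):
                bi[(tags[i - 1], tags[i])] += 1
            for i in range(2, len(tags)):
                tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
        return uni, bi, tri

    def trigram_prob(t1, t2, t3, uni, bi, tri, lams=(0.1, 0.3, 0.6)):
        """Linearly interpolated P(t3 | t1, t2), the smoothing scheme TnT uses."""
        total = sum(uni.values())
        p1 = uni[t3] / total if total else 0.0
        p2 = bi[(t2, t3)] / uni[t2] if uni[t2] else 0.0
        p3 = tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
        l1, l2, l3 = lams
        return l1 * p1 + l2 * p2 + l3 * p3
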
CoNLL-X shared task on multilingual dependency parsing
In Proc. of CoNLL, 2006
Cited by 344 (2 self)
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multilingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse, and which phenomena are challenging for any dependency parser? Acknowledgement: Many thanks to Amit Dubey and Yuval Krymolowski, the other two organizers of the shared task, for discussions, converting treebanks, writing software and helping with the papers.
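The shared task's main evaluation measure was the labeled attachment score (LAS). A small sketch of how it and its unlabeled variant (UAS) are computed:

    def attachment_scores(gold, pred):
        """gold/pred: lists of (head_index, deprel) pairs, one per token."""
        assert len(gold) == len(pred)
        las = sum(1 for g, p in zip(gold, pred) if g == p)
        uas = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)
        n = len(gold)
        return las / n, uas / n

    # Example: token 3 gets the right head but the wrong label.
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
    print(attachment_scores(gold, pred))  # (0.666..., 1.0)
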
Forgetting Exceptions is Harmful in Language Learning
Machine Learning, Special Issue on Natural Language Learning, 1999
Cited by 136 (44 self)
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
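A minimal sketch of the memory-based (nearest-neighbour) setup the paper builds on, with a crude editing step standing in for the typicality and class-prediction-strength criteria. The overlap distance and 1-NN rule match the simplest memory-based learners, but the details here are illustrative:

    def overlap(a, b):
        """Overlap distance: number of mismatching feature values."""
        return sum(x != y for x, y in zip(a, b))

    def classify(memory, query):
        """memory: list of (features, label) pairs; 1-NN prediction."""
        return min(memory, key=lambda inst: overlap(inst[0], query))[1]

    def edit_exceptions(memory):
        """Drop instances their nearest other neighbour misclassifies,
        i.e. the 'exceptions' the paper argues should be kept."""
        kept = []
        for i, (feats, label) in enumerate(memory):
            rest = memory[:i] + memory[i + 1:]
            if classify(rest, feats) == label:
                kept.append((feats, label))
        return kept
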
Empirical Methods in Information Extraction
AI Magazine, 1997
Cited by 123 (7 self)
This article surveys the use of empirical methods for a particular natural language understanding task that is inherently domain-specific. The task is information extraction. Very generally, an information extraction system takes as input an unrestricted text and "summarizes" the text with respect to a prespecified topic or domain of interest: it finds useful information about the domain and encodes that information in a structured form, suitable for populating databases. In contrast to in-depth natural language understanding tasks, information extraction systems effectively skim a text to find relevant sections and then focus only on these sections in subsequent processing. The information extraction system in Figure 1, for example, summarizes stories about natural disasters, extracting for each such event the type of disaster, the date and time that it occurred, and data on any property damage or human injury caused by the event.
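As a toy illustration of the slot-filling output described above, a single invented extraction pattern for the disaster domain; real systems use many learned or hand-crafted patterns and far richer linguistic analysis:

    import re

    # One illustrative pattern; the regex and slot names are invented here.
    PATTERN = re.compile(
        r"(?P<type>earthquake|flood|hurricane) (?:struck|hit) (?P<place>\w+)"
        r" on (?P<date>\w+ \d+)", re.IGNORECASE)

    def extract(text):
        """Skim the text and fill one template per matching passage."""
        return [m.groupdict() for m in PATTERN.finditer(text)]

    print(extract("An earthquake struck Kobe on January 17, injuring many."))
    # [{'type': 'earthquake', 'place': 'Kobe', 'date': 'January 17'}]
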
Classifier Combination for Improved Lexical Disambiguation
1998
Cited by 107 (1 self)
One of the most exciting recent directions in machine learning is the discovery that combining multiple classifiers often yields significantly better performance than any single classifier can achieve. In this paper, we first show that the errors made by three different state-of-the-art part-of-speech taggers are strongly complementary. Next, we show how this complementary behavior can be used to our advantage. By using contextual cues to guide tagger combination, we are able to derive a new tagger that achieves performance significantly greater than any of the individual taggers.
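A hedged sketch of the simplest form of such combination: per-position majority voting over tagger outputs, with ties falling back to one designated tagger. The paper's contextual cues for arbitration go beyond this:

    from collections import Counter

    def combine_tags(tag_sequences):
        """tag_sequences: one tag list per tagger, all the same length."""
        combined = []
        for position_tags in zip(*tag_sequences):
            tag, votes = Counter(position_tags).most_common(1)[0]
            # Fall back to the first (presumed best) tagger on ties.
            combined.append(tag if votes > 1 else position_tags[0])
        return combined

    print(combine_tags([["DT", "NN", "VBZ"],
                        ["DT", "NN", "NNS"],
                        ["DT", "JJ", "NNS"]]))  # ['DT', 'NN', 'NNS']
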
Memory-Based Shallow Parsing
In Proceedings of CoNLL, 1999
Cited by 86 (20 self)
We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results: the Fβ=1 scores for the Wall Street Journal (WSJ) treebank are 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection, and 79.0% for object detection.
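For completeness, the Fβ score used to report these chunking results; with β = 1 it is the harmonic mean of precision and recall. The precision and recall values in the usage line are illustrative, not taken from the paper:

    def f_beta(precision, recall, beta=1.0):
        """F-beta score; beta = 1 gives the harmonic mean of P and R."""
        if precision + recall == 0:
            return 0.0
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    # Illustrative values only:
    print(round(f_beta(0.94, 0.936), 3))  # 0.938
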
A second-order hidden Markov model for part-of-speech tagging
In Proceedings of the 37th Annual Meeting of the ACL, 1999
Cited by 82 (7 self)
This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contextual and lexical probabilities. This model increases the accuracy of the tagger to state-of-the-art levels. These approximations make use of more contextual information than standard statistical systems. New methods of smoothing the estimated probabilities are also introduced to address the sparse data problem.
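A sketch of the second-order lexical model the abstract describes: word emissions conditioned on the previous tag as well as the current one, P(w_i | t_{i-1}, t_i), instead of the standard first-order P(w_i | t_i). The smoothing the paper treats carefully is omitted here:

    from collections import Counter

    def train_lexical(tagged_sents):
        """Count (prev_tag, tag) contexts and (prev_tag, tag, word) emissions."""
        pair_counts, emit_counts = Counter(), Counter()
        for sent in tagged_sents:
            prev = "<s>"
            for word, tag in sent:
                pair_counts[(prev, tag)] += 1
                emit_counts[(prev, tag, word)] += 1
                prev = tag
        return pair_counts, emit_counts

    def lexical_prob(word, prev_tag, tag, pair_counts, emit_counts):
        """Unsmoothed maximum-likelihood P(word | prev_tag, tag)."""
        denom = pair_counts[(prev_tag, tag)]
        return emit_counts[(prev_tag, tag, word)] / denom if denom else 0.0
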
The Interaction of Knowledge Sources for Word Sense Disambiguation
Computational Linguistics, 2001
Cited by 81 (4 self)
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
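A schematic sketch of knowledge-source combination: each source scores candidate senses and a weighted sum picks a winner. The sources and weights below are invented placeholders; the paper combines its knowledge sources with a learned classifier rather than the fixed weights used here:

    def disambiguate(senses, sources, weights):
        """sources: functions mapping a sense to a score in [0, 1]."""
        def total(sense):
            return sum(w * src(sense) for src, w in zip(sources, weights))
        return max(senses, key=total)

    # Two invented knowledge sources voting over two senses of 'bank':
    pos_source = lambda s: 1.0 if s == "bank/river" else 0.3
    colloc_source = lambda s: 0.9 if s == "bank/finance" else 0.2
    print(disambiguate(["bank/river", "bank/finance"],
                       [pos_source, colloc_source], [0.4, 0.6]))
    # bank/finance
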
Memory-Based Morphological Analysis
1999
Cited by 64 (18 self)
We present a general architecture for efficient and deterministic morphological analysis based on memory-based learning, and apply it to morphological analysis of Dutch. The system makes direct mappings from letters in context to rich categories that encode morphological boundaries, syntactic class labels, and spelling changes. Both precision and recall of labeled morphemes are over 84% on held-out dictionary test words and estimated to be over 93% in free text.
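A sketch of the instance encoding the abstract implies: each letter of a word, in a fixed window of context, becomes one classification instance. The class labels encoding boundaries, syntactic labels, and spelling changes would be attached during training; the window width here is an arbitrary choice:

    def letter_instances(word, width=3, pad="_"):
        """One fixed-width window per letter of the word."""
        padded = pad * width + word + pad * width
        return [padded[i - width:i + width + 1]
                for i in range(width, width + len(word))]

    for window in letter_instances("boeken"):  # Dutch 'boeken' = boek + en
        print(window)
    # ___boek, __boeke, _boeken, boeken_, oeken__, eken___
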
Incremental integer linear programming for non-projective dependency parsing
In EMNLP, 2006
Cited by 63 (6 self)
Integer Linear Programming has recently been used for decoding in a number of probabilistic models in order to enforce global constraints. However, in certain applications, such as non-projective dependency parsing and machine translation, the complete formulation of the decoding problem as an integer linear program is intractable to solve. We present an approach that solves the problem incrementally, thus avoiding the creation of intractable integer linear programs. We apply this approach to Dutch dependency parsing and show how the addition of linguistically motivated constraints can yield a significant improvement over the state of the art.
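A schematic of the incremental loop: solve with only a core constraint set, collect constraints the solution violates, add them, and re-solve until none remain. The solve and violated callables are placeholders for a real ILP solver and the paper's linguistic constraints:

    def incremental_ilp(solve, violated, base_constraints):
        """solve(constraints) -> solution; violated(solution) -> new constraints."""
        constraints = list(base_constraints)
        while True:
            solution = solve(constraints)
            new = violated(solution)  # constraints broken by this solution
            if not new:
                return solution
            constraints.extend(new)
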