Automatic Rule Induction for Unknown Word Guessing (1997)
| Venue: | Computational Linguistics |
| Citations: | 104 - 6 self |
BibTeX
@ARTICLE{Mikheev97automaticrule,
author = {Andrei Mikheev},
title = {Automatic Rule Induction for Unknown Word Guessing},
journal = {Computational Linguistics},
year = {1997},
volume = {23},
pages = {405--423}
}
Abstract
Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules, and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts with unknown words.
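To make the idea of ending-guessing rules concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not give the statistical scoring, so this sketch substitutes a plain count-and-share threshold and ignores corpus word frequencies. The names `induce_ending_rules`, `guess_tags`, `TOY_LEXICON`, and the parameters `max_len`, `min_count`, and `min_share` are all hypothetical, and the tag labels are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> set of possible part-of-speech tags.
TOY_LEXICON = {
    "walking": {"VBG", "NN"},
    "talking": {"VBG", "NN"},
    "reading": {"VBG", "NN"},
    "quickly": {"RB"},
    "slowly": {"RB"},
    "table": {"NN"},
}


def induce_ending_rules(lexicon, max_len=3, min_count=2, min_share=0.8):
    """Map each word ending to the tag set it predicts most reliably.

    Assumption: a rule is kept when the ending occurs at least `min_count`
    times and its most common tag set covers at least `min_share` of those
    occurrences. The paper uses a statistical score instead.
    """
    counts = defaultdict(Counter)
    for word, tags in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) > n:  # keep at least one character of stem
                counts[word[-n:]][frozenset(tags)] += 1

    rules = {}
    for ending, tag_counts in counts.items():
        total = sum(tag_counts.values())
        best_tags, hits = tag_counts.most_common(1)[0]
        if total >= min_count and hits / total >= min_share:
            rules[ending] = best_tags
    return rules


def guess_tags(word, rules, max_len=3):
    """Return the tag set of the longest matching ending, or an empty set."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        tags = rules.get(word[-n:])
        if tags is not None:
            return set(tags)
    return set()


if __name__ == "__main__":
    rules = induce_ending_rules(TOY_LEXICON)
    print(guess_tags("jumping", rules))
    print(guess_tags("badly", rules))
```

Preferring the longest matching ending is a design choice of this sketch: longer endings are more specific, so they are tried before shorter, more ambiguous ones.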







