Abstract:
A system for the recognition and morphological classification of unknown German words is described. Given raw text, it outputs a list of the unknown nouns together with hypotheses about their possible stems and morphological class(es). The system exploits both global and local information, as well as morphological properties and external linguistic knowledge sources. It learns and applies ending-guessing rules similar to those originally proposed for POS guessing. The paper presents the system design and implementation and evaluates its performance extensively. Similar ideas for ending-guessing rules have also been applied to Bulgarian, but performance there is worse, owing both to the difficulty of noun recognition and to the highly inflectional morphology with its numerous ambiguous endings.
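
The idea of ending-guessing rules mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the class labels (`f_en`, `m_0`, `n_0`), the toy lexicon, and the thresholds `max_len`, `min_count`, and `min_conf` are all assumptions chosen for illustration. The sketch counts which morphological classes co-occur with each word ending in a small known lexicon, keeps endings that are frequent and reliable enough, and applies the longest matching ending to an unknown word.

```python
from collections import Counter, defaultdict

def learn_ending_rules(lexicon, max_len=3, min_count=2, min_conf=0.6):
    """Learn ending -> {class: confidence} rules from (word, class) pairs.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    counts = defaultdict(Counter)
    for word, cls in lexicon:
        w = word.lower()
        # Consider endings of length 1..max_len, leaving at least one stem letter.
        for n in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-n:]][cls] += 1

    rules = {}
    for ending, by_class in counts.items():
        total = sum(by_class.values())
        if total < min_count:
            continue  # ending seen too rarely to be trusted
        kept = {c: k / total for c, k in by_class.items() if k / total >= min_conf}
        if kept:
            rules[ending] = kept
    return rules

def guess_classes(word, rules, max_len=3):
    """Return (ending, {class: confidence}) for the longest matching ending."""
    w = word.lower()
    for n in range(min(max_len, len(w) - 1), 0, -1):
        hit = rules.get(w[-n:])
        if hit:
            return w[-n:], hit
    return None, {}

# Toy lexicon with hypothetical morphological class labels.
LEXICON = [
    ("Zeitung", "f_en"), ("Wohnung", "f_en"), ("Meinung", "f_en"),
    ("Lehrer", "m_0"), ("Fahrer", "m_0"), ("Fenster", "n_0"),
]

rules = learn_ending_rules(LEXICON)
print(guess_classes("Bedeutung", rules))
print(guess_classes("Bäcker", rules))
print(guess_classes("Auto", rules))
```

Preferring the longest matching ending is one simple way to resolve conflicts between rules; a system of the kind described would additionally combine such guesses with contextual and lexical evidence, which this sketch leaves out.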