by Jakub Zavrel, Walter Daelemans
In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000)
http://pcger40.uia.ac.be/walter/papers/zd00.ps
Abstract:
This paper describes a new method, COMBI-BOOTSTRAP, for exploiting existing taggers and lexical resources to annotate corpora with new tagsets. COMBI-BOOTSTRAP uses the existing resources as features for a second-level machine learning module, which is trained to map to the new tagset on a very small sample of annotated corpus material. Experiments show that COMBI-BOOTSTRAP (i) can integrate a wide variety of existing resources, and (ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed from the same small training sample.
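The idea in the abstract can be illustrated with a minimal sketch: outputs of existing taggers and a lexicon become features, and a second-level learner trained on a handful of hand-annotated tokens maps them to the new tagset. Everything below is an assumption made for illustration, not the authors' implementation: the class `NearestNeighbourMapper`, the overlap distance, the feature layout, the toy words and the tag names are all hypothetical. A simple nearest-neighbour learner stands in for whatever second-level learner the paper actually uses. The helper `error_reduction` shows how a relative error reduction figure of the kind quoted in the abstract is computed from two accuracies.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two instances disagree."""
    return sum(x != y for x, y in zip(a, b))


class NearestNeighbourMapper:
    """Toy second-level learner (hypothetical): stores training instances
    and predicts the majority new-tagset label among the k nearest ones."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []

    def fit(self, instances, new_tags):
        self.memory = list(zip(instances, new_tags))
        return self

    def predict(self, instance):
        # sorted() is stable, so ties keep the order of the training data.
        ranked = sorted(self.memory,
                        key=lambda m: overlap_distance(m[0], instance))
        votes = Counter(tag for _, tag in ranked[:self.k])
        return votes.most_common(1)[0][0]


def error_reduction(baseline_acc, new_acc):
    """Relative error reduction in percent, given accuracies in [0, 1]."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Hypothetical feature layout:
# (word, tag from existing tagger A, tag from existing tagger B, lexicon class)
# paired with a hand-annotated tag from the NEW tagset (names invented here).
small_sample = [
    (("the", "DT", "ART", "det"), "Det"),
    (("house", "NN", "N", "noun"), "Noun"),
    (("walks", "VBZ", "V", "verb"), "Verb"),
    (("walks", "NNS", "N", "noun"), "Noun"),
]

mapper = NearestNeighbourMapper(k=1).fit(
    [features for features, _ in small_sample],
    [tag for _, tag in small_sample],
)

# An unseen word is still mapped, because the existing taggers' outputs
# match a stored instance even though the word itself does not.
unseen = ("runs", "VBZ", "V", "verb")
predicted = mapper.predict(unseen)
```

The point of the design is that the second-level learner never has to learn tagging from scratch: it only learns how combinations of old tags correspond to new tags, which is why a very small annotated sample can suffice.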
Citations
792 | Instance-Based Learning Algorithms – Aha, Kibler, et al. - 1991
549 | Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging – Brill - 1995
222 | A maximum entropy part-of-speech tagger – Ratnaparkhi - 1996
202 | TnT – A statistical Part-of-Speech tagger – Brants - 2000
188 | TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide – Daelemans, Zavrel, et al. - 2004
155 | MBT: A Memory-Based Part of Speech Tagger-Generator – Daelemans, Zavrel, et al. - 1996
76 | Classifier combination for improved lexical disambiguation – Brill, Wu - 1998
60 | Improving data driven wordclass tagging by system combination – Halteren, Zavrel, et al. - 1998
13 | Amalgam: Automatic mapping among lexicogrammatical annotation models – Atwell, Hughes, et al. - 1994
9 | A support tool for tagset mapping – Teufel - 1995
9 | Woordfrequenties in geschreven en gesproken Nederlands – Boogaart, C - 1975
7 | The CELEX lexical data base – Baayen, Piepenbrock, et al. - 1993
5 | Wotan, een automatische grammatikale tagger voor het Nederlands – Berghmans - 1994
5 | Woordfrequenties in geschreven en gesproken Nederlands – Boogaart, C - 1975
5 | The WOTAN2 Tagset Manual (under construction). Katholieke Universiteit Nijmegen – Halteren - 1999
4 | Part of speech tagging and lemmatisation for the spoken dutch corpus – Eynde, Zavrel, et al. - 2000
1 | Improving accuracy in NLP through combination of machine learning systems – Halteren, Zavrel, et al. - 2000