  Bootstrapping a tagged corpus through combination of existing heterogeneous taggers (2000) [2 citations — 1 self]

by Jakub Zavrel, Walter Daelemans
In Proceedings of the Second International Conference on Language Resources and Evaluation (LREC-2000)
http://pcger40.uia.ac.be/walter/papers/zd00.ps

Abstract:

This paper describes a new method, COMBI-BOOTSTRAP, to exploit existing taggers and lexical resources for the annotation of corpora with new tagsets. COMBI-BOOTSTRAP uses existing resources as features for a second-level machine learning module that is trained to make the mapping to the new tagset on a very small sample of annotated corpus material. Experiments show that COMBI-BOOTSTRAP: i) can integrate a wide variety of existing resources, and ii) achieves much higher accuracy (up to 44.7% error reduction) than both the best single tagger and an ensemble tagger constructed out of the same small training sample.
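
The idea in the abstract, using the output of existing taggers as features for a second-level learner that predicts tags in the new tagset, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the tiny seed sample, and the choice of a nearest-neighbour learner with a simple overlap metric. It is not the authors' implementation, and the abstract does not specify which learner or features the paper uses.

```python
from collections import Counter

def overlap(a, b):
    """Count the feature positions where two feature tuples agree."""
    return sum(x == y for x, y in zip(a, b))

def train(samples):
    """Memory-based 'training': store the (features, new_tag) pairs."""
    return list(samples)

def predict(memory, features, k=1):
    """Return the majority new-tagset label among the k most similar stored items."""
    ranked = sorted(memory, key=lambda item: overlap(item[0], features), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical seed sample: each feature tuple holds the tags that three
# existing taggers (each with its own old tagset) assigned to one token,
# paired with a hand-annotated tag from the new tagset.
seed = [
    (("NN", "noun", "N"), "N(sing)"),
    (("NNS", "noun", "N"), "N(plur)"),
    (("VB", "verb", "V"), "V(inf)"),
    (("VBZ", "verb", "V"), "V(pres)"),
]

memory = train(seed)

# Map the existing taggers' outputs for an unseen token to the new tagset.
new_tag = predict(memory, ("NNS", "noun", "N"))
partial_match = predict(memory, ("VBZ", "verb", "X"))
```

In this sketch the second-level learner never sees the words themselves, only what the existing resources say about them, so the small hand-annotated sample is used solely to learn the mapping into the new tagset.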

Citations

792 Instance-Based Learning Algorithms – Kibler - 1991
549 Transformation-based error-driven learning and natural language processing: A case study in part of speech tagging – Brill - 1995
222 A maximum entropy part-of-speech tagger – Ratnaparkhi - 1996
202 TnT – A statistical Part-of-Speech tagger – Brants - 2000
188 TiMBL: Tilburg Memory Based Learner, version 5.1, reference guide – Daelemans, Zavrel, et al. - 2004
155 MBT: A Memory-Based Part of Speech Tagger-Generator – Daelemans, Zavrel, et al. - 1996
76 Classifier combination for improved lexical disambiguation – Brill, Wu - 1998
60 Improving data driven wordclass tagging by system combination – Halteren, Zavrel, et al. - 1998
13 Amalgam: Automatic mapping among lexicogrammatical annotation models – Atwell, Hughes, et al. - 1994
9 A support tool for tagset mapping – Teufel - 1995
9 Woordfrequenties in geschreven en gesproken Nederlands – Boogaart, C - 1975
7 The CELEX lexical data base – Baayen, Piepenbrock, et al. - 1993
5 Wotan, een automatische grammatikale tagger voor het Nederlands – Berghmans - 1994
5 The WOTAN2 Tagset Manual (under construction). Katholieke Universiteit Nijmegen – Halteren - 1999
4 Part of speech tagging and lemmatisation for the spoken dutch corpus – Eynde, Zavrel, et al. - 2000
1 Improving accuracy in NLP through combination of machine learning systems – Halteren, Zavrel, et al. - 2000