Abstract: The amount of readily available on-line text has reached hundreds of billions of words and continues to grow. Yet for most core natural language tasks, algorithms continue to be optimized, tested and compared after training on corpora consisting of only one million words or less. In this paper, we evaluate the performance of different learning methods on a prototypical natural language disambiguation task, confusion set disambiguation, when trained on orders of magnitude more...
Cited by:
Web Text Corpus for Natural Language Processing - Liu, Curran (2006)
Weakly Supervised Learning Methods for Improving the Quality of.. - Wellner (2005)
Web-Scale Information Extraction in KnowItAll - Etzioni, Cafarella, Downey.. (2004)
Similar documents based on text:
1.0: Mitigating the Paucity-of-Data Problem: Exploring the Effect.. - Banko, Brill (2001)
0.8: Pattern-Based Disambiguation for Natural Language Processing - Eric Brill Microsoft (2000)
0.6: Data-Intensive Question Answering - Brill, Lin, Banko, Dumais, Ng (2001)
Related documents from co-citation:
2: LCC tools for question answering - Moldovan, Harabagiu et al.
2: Extracting patterns and relations from the world wide web - Brin (1998)
2: Overview of the TREC - Voorhees (2001)
BibTeX entry:
Michele Banko and Eric Brill. Scaling to very very large corpora for natural language disambiguation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 26--33. Association for Computational Linguistics, 2001. http://citeseer.ist.psu.edu/banko01scaling.html
@inproceedings{ banko01scaling,
author = "Michele Banko and Eric Brill",
title = "Scaling to Very Very Large Corpora for Natural Language Disambiguation",
booktitle = "Meeting of the Association for Computational Linguistics",
pages = "26--33",
year = "2001",
url = "citeseer.ist.psu.edu/banko01scaling.html" }
Documents on the same site (http://acl.ldc.upenn.edu/P/P01/):
Grammars for Local and Long Dependencies. - Dikovsky
Practical Issues in Compiling Typed Unification Grammars.. - Dowding, Hockey, Gawron (2001)
Evaluating CETEMPúblico, a free resource for Portuguese - Santos, Rocha (2001)
CiteSeer.IST - Copyright Penn State and NEC