Abstract: In this paper, we review our experience with constructing one such large annotated
corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American
English. During the first three-year phase of the Penn Treebank Project (1989-1992), this
corpus has been annotated for part-of-speech (POS) information. In addition, over half
of it has been annotated for skeletal syntactic structure. These materials are available
to members of the Linguistic Data Consortium; for details, see...
Cited by:
Text Preprocessing for Speech Synthesis - Uwe Reichel Hartmut
A Resource-light Approach to Russian Morphology: Tagging.. - Hana, Feldman, Brew (2004)
Identifying Anatomical Phrases in Clinical Reports by Shallow.. - Ricky
Similar documents (at the sentence level):
73.0%: Building a large annotated corpus of English: the Penn.. - Marcus, Santorini.. (1993)
7.3%: The Penn Treebank: An Overview - Taylor, Marcus, Santorini (2003)
Similar documents based on text:
0.4: The Penn Treebank: Annotating Predicate Argument.. - Marcus, Kim.. (1994)
0.3: Extracting and Evaluating General World Knowledge from the.. - Schubert, Tong
0.3: On the Complexity of Queries for Structurally Annotated.. - Kallmeyer (2000)
Related documents from co-citation:
16: Statistical decision-tree models for parsing
- Magerman - 1995
15: Structural ambiguity and lexical relations
- Hindle, Rooth - 1993
14: A simple rule-based part of speech tagger
- Brill - 1992
BibTeX entry:
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19:313--330. http://citeseer.ist.psu.edu/marcus04building.html
@article{ marcus94building,
author = "Mitchell P. Marcus and Beatrice Santorini and Mary Ann Marcinkiewicz",
title = "Building a Large Annotated Corpus of English: The Penn Treebank",
journal = "Computational Linguistics",
volume = "19",
number = "2",
pages = "313-330",
year = "1993",
url = "citeseer.ist.psu.edu/marcus04building.html" }
Documents on the same site (http://acl.ldc.upenn.edu/J/J93/):
The Mathematics of Statistical Machine Translation.. - Brown, Pietra.. (1993)
The Logic of Typed Feature Structures - Bob Carpenter Carnegie (1992)
Questions and Information Systems - Thomas Lauer Eileen (1992)