Building a Large Annotated Corpus of English: The Penn Treebank (1993)  (475 citations)
Mitchell P. Marcus, Mary Ann Marcinkiewicz, Beatrice Santorini
Computational Linguistics



View or download:
upenn.edu/J/J93/J932004.pdf

Abstract: In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure. These materials are available to members of the Linguistic Data Consortium; for details, see...

Cited by:
Text Preprocessing for Speech Synthesis - Uwe Reichel Hartmut
A Resource-light Approach to Russian Morphology: Tagging.. - Hana, Feldman, Brew (2004)
Identifying Anatomical Phrases in Clinical Reports by Shallow.. - Ricky

Similar documents (at the sentence level):
73.0%:   Building a large annotated corpus of English: the Penn.. - Marcus, Santorini.. (1993)
7.3%:   The Penn Treebank: An Overview - Taylor, Marcus, Santorini (2003)

Similar documents based on text:
0.4:   The Penn Treebank: Annotating Predicate Argument.. - Marcus, Kim.. (1994)
0.3:   Extracting and Evaluating General World Knowledge from the.. - Schubert, Tong
0.3:   On the Complexity of Queries for Structurally Annotated.. - Kallmeyer (2000)

Related documents from co-citation:
16:   Statistical decision-tree models for parsing - Magerman - 1995
15:   Structural ambiguity and lexical relations - Hindle, Rooth - 1993
14:   A simple rule-based part of speech tagger - Brill - 1992

BibTeX entry:

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313--330. http://citeseer.ist.psu.edu/marcus04building.html

@article{ marcus94building,
    author = "Mitchell P. Marcus and Beatrice Santorini and Mary Ann Marcinkiewicz",
    title = "Building a Large Annotated Corpus of English: The Penn Treebank",
    journal = "Computational Linguistics",
    volume = "19",
    number = "2",
    pages = "313-330",
    year = "1993",
    url = "citeseer.ist.psu.edu/marcus04building.html" }

Documents on the same site (http://acl.ldc.upenn.edu/J/J93/):
The Mathematics of Statistical Machine Translation.. - Brown, Pietra.. (1993)
The Logic of Typed Feature Structures - Bob Carpenter Carnegie (1992)
Questions and Information Systems - Thomas Lauer Eileen (1992)


CiteSeer.IST - Copyright Penn State and NEC