Building a Large Annotated Corpus of English: The Penn Treebank
- Computational Linguistics, 1993
Abstract - Cited by 1654 (9 self)
There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information about language from very large corpora. Such corpora are beginning to serve as important research tools for investigators in natural language processing, speech recognition, and integrated spoken language systems, as well as in theoretical linguistics. Annotated corpora promise to be valuable for enterprises as diverse as the automatic construction of statistical models for the grammar of the written and the colloquial spoken language, the development of explicit formal theories of the differing grammars of writing and speech, the investigation of prosodic phenomena in speech, and the evaluation and comparison of the adequacy of parsing models.
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-speech (POS) information. In addition, over half of it has been annotated for skeletal syntactic structure. These materials are available to members of the Linguistic Data Consortium; for details, see Section 5.1.
Support for Maintaining Object-Oriented Programs
- IEEE Transactions on Software Engineering, 1992
Abstract - Cited by 39 (5 self)
In this paper, we explain how inheritance and dynamic binding make object-oriented programs difficult to maintain, and we give a concrete example of the problems that arise. We show that the difficulty lies in the fact that conventional tools are poorly suited for work with object-oriented languages, and we argue that semantics-based tools are essential for effective maintenance of object-oriented programs. We then describe a system we have developed for working with C++ programs. This system comprises a relational database system for information about programs, and an interactive database interface integrated with a text editor. We describe our system architecture, detail the database relations, provide informal evidence on the system's effectiveness, and compare it to other research with similar goals. Keywords: software maintenance, object-oriented languages and environments, programming environments, semantic analysis of code, software representation in relational databases, C++. ...

