W. Francis and H. Kucera. Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers, 1964.
....proper names and abbreviations. Other sources recommend to use a dictionary that contains only the most frequent tokens, in order to find a compromise between principles (1) and (2). Word frequencies in dictionaries are usually obtained from an analysis of a large corpus, such as the Brown Corpus [2] or the British National Corpus (BNC). The problem with these approaches is that the dictionary is not adapted to the given thematic topic. In practice, depending on the topic, C is likely to contain a nontrivial amount of tokens that are not found in D, even if D is very large. In addition, word ....
W. Francis and H. Kucera. Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers, 1964.
....learning is especially suited. We point out how it manages to largely avoid difficulties with overtraining, and show a way of recording the dependencies between rules in the learned sequence. The analysis throughout is based on part of speech tagging experiments using the tagged Brown Corpus (Francis and Kucera, 1979) and a tagged Septuagint Greek version of the first five books of the Bible (CATSS, 1991). Brill's Approach. This learning approach starts with a supervised training corpus and a baseline heuristic for assigning initial values. In the part of speech tagging application, for example, the baseline ....
Francis, W. Nelson and Henry Kucera. 1979. Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Technical report, Department of Linguistics, Brown University.
....we created a small test set, blindly choosing the last 117 sentences, or 1%, of our 220k word corpus, sentences which were, as it happens, from section r of the Brown Corpus. After some disappointing parsing results using both the regular parser and our WordNet extended version, we peeked in (Francis and Kučera, 1979) and discovered this was the humor writing section; our initial test corpus was literally a joke. To create a more representative test set, we sampled every 100th sentence to create a new 117-sentence test set that spanned the entire range of styles in the 220k words; we put all other sentences in ....
W. N. Francis and H. Kučera. 1979. Manual of Information to accompany A Standard Corpus of Present-Day Edited American English, for use with Digital Computers. Department of Linguistics, Brown University, Providence, Rhode Island.
....will not need to select a WordNet sense for each ambiguous word. In this auto disambiguation investigation, it will be interesting to determine whether a specialized corpus, e.g. of photo captions, performs sense tagging significantly better than a general-purpose corpus, such as the Brown corpus (Francis and Kucera, 1979). ....
Francis, W. N. and H. Kucera 1979. Manual of Information to Accompany a Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Corrected and Revised Edition), Department of Linguistics, Brown University, Providence, RI.
....Research Council (EPSRC) and Introduction Several research projects around the world are building grammatically analysed corpora; that is, collections of text annotated with part of speech wordtags and syntax trees. Tagged and parsed English corpora (Bank of English [54]; BNC [44, 22]; Brown [24]; ICE [12, 28, 64]; Lancaster IBM [26, 22]; LOB [1, 3, 38, 42]; London Lund [62]; Nijmegen [12]; PoW [23, 55, 57]; SEC [63]; TOSCA [46, 31, 12]; UPenn [53, 45]; etc.) are used, among other things, as authoritative examples by researchers in English Language Teaching and ....
W.N. Francis and H. Kucera. 1979. Manual of Information to Accompany a Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Corrected and Revised edition). Department of Linguistics, Brown University, Providence, Rhode Island.
....to new domains. They are usually characterized by using large text corpora and performing some analysis which uses primarily the text characteristics without adding significant linguistic or world knowledge [5, 11, 48]. Text corpora that have been built include the still widely used Brown corpus [27], and newer corpora such as the LOB corpus [28] and the Penn treebank [49]. Annotation of corpora with part of speech tags or parse trees has been a focus of corpus-based language analysis. Additional important application areas of statistical techniques to written natural language are ....
W. N. Francis and H. Kucera. Manual of Information to Accompany a Standard Corpus of Present-day Edited American English. Brown University, Department of Linguistics, 1979.
No context found.
W. Francis and H. Kucera. Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers, 1964.
No context found.
W. Francis and H. Kucera. Manual of information to accompany a standard corpus of present-day edited american english, for use with digital computers, 1964.
No context found.
W. N. Francis, H. Kucera. "Manual of information to accompany A Standard Corpus of Present-Day Edited American English". Brown University, Department of Linguistics (1964).
No context found.
Francis, W.N. & Kucera, H. [1979] Manual of information to accompany a standard corpus of present-day edited American English for use with digital computers. Technical Report, Department of Linguistics, Brown University, Rhode Island.