| C. E. Shannon. Prediction and entropy of printed English. The Bell System Technical Journal, 30:50--64, January 1951. |
....representing useful information transfer is Bandwidth Utilised = C / (F + IF + INF + C) (8) and the proportion of wasted bandwidth is Bandwidth Wasted = (F + IF + INF) / (F + IF + INF + C) (9). Purists will rightfully disagree. The units of C, INF, IF, and F are characters, not bits. Yet Shannon [6] argues that it is possible to measure the information content of a character. A future direction for this work is to cast these formulae in terms of information content, instead of characters. Then these relations would also apply to non-character-based text input methods. Equations 8 and 9 ....
Shannon, C.E., (1951) Prediction and entropy of printed English. Bell System Technical Journal, 30, 51-64.
....If a sentence is represented as a bag of words with no information on their order, little knowledge can be gleaned. The approach taken here is to combine adjacent words and represent the sentence as a set of pairs and triples. These tuples partially preserve sequential order. Shannon (1951) [25] investigated the amount of information held in sequences of letters in English text, and his results were later extended to words. He found that the additional information held by sequences longer than 3 was very small. This is an important natural constraint. However, the great number of words ....
.... If words are ranked in order of frequency, denoted by n, there is an empirical relationship between the probability of the word at rank n occurring, p(n), and n itself, known as Zipf's Law: p(n) ∝ 1/n. This gives a surprisingly good approximation to word probabilities in English and other languages [25, 1951], and indicates the extent to which a significant number of words occur infrequently. The LOB corpus of about 1 million word tokens contains about 50,000 different word forms, of which about 42,000 occur less than 10 times each [21, 1987]. In the Brown corpus of comparable size 40% of word forms ....
C. E. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, 1951.
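The pair-and-triple representation described in the context above can be illustrated with a minimal sketch. The function name `adjacent_tuples` and its `sizes` parameter are hypothetical, chosen here for illustration; the citing paper's actual implementation is not shown in the excerpt.

```python
def adjacent_tuples(sentence, sizes=(2, 3)):
    """Return the set of adjacent word pairs and triples in a sentence.

    Splitting on whitespace is an assumption; the citing paper does not
    say how it tokenises text.
    """
    words = sentence.split()
    return {
        tuple(words[i:i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    }
```

Unlike a bag of words, the resulting set distinguishes "the cat sat" from "sat the cat", because each tuple keeps the local word order.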
....Visualization. Similar to the behavior of N-block models in the text domain, the accuracy of N-block models for describing the contents of images increases when we increase the value of N. We look at some examples to get an intuition by generalizing a visualization technique proposed by Shannon [79, 80, 81] from the text domain to the image domain. The idea is to train various N-blocks and then use each to generate random images. Given a training database D of M images and a codebook of N codes c_1, ..., c_N, let I_M be the VQ-encoded images; we train the uni-block by calculating the ....
C. Shannon. Prediction and entropy of printed English. Bell Systems Technical Journal, 30:50--64, 1951.
.... informative as the full texts for a task of solving English reading comprehension questions [9]. Although this work is sometimes misquoted as if they discovered the optimal CR for any kind of text for any purpose [1, 27], it is not clear that Shannon's 25% redundancy limit from information theory [19] applies to text summarization, or whether reading comprehension questions cover all informative content. Moreover, since the CR can affect summary evaluation [5], we examine extracts at 5%, 10%, 30% and 50% CR. We also generate abstract length 3 query Single search runs summary index ....
Shannon, C. E.: Prediction and Entropy of Printed English, Bell Systems Technical Journal 30, pp. 50-64 (1951).
....task in natural language processing. An Ngram might be considered interesting if it occurs more often than would be expected by chance, or has some tendency to predict the occurrence of other phenomena in text. There is a long history of research in this area. Character Ngrams were used by Shannon [10] to estimate the per-letter entropy of the English language. In the last decade there has been a large amount of work in developing corpus-based techniques to identify collocations in text (e.g. [2], [3], [6], [9]). This paper describes the Ngram Statistics Package (NSP), a general purpose ....
C. Shannon. Prediction and entropy of printed English. The Bell System Technical Journal, 30:50-64, 1951.
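As a rough illustration of how character Ngrams yield a per-letter entropy estimate, the sketch below computes the conditional entropy of a character given the preceding n-1 characters, using raw Ngram counts from a sample text. The function name `ngram_conditional_entropy` is hypothetical, and plain maximum-likelihood counts are an assumption; on short samples the estimate is biased low, so this is not a reproduction of Shannon's published figures.

```python
from collections import Counter
from math import log2


def ngram_conditional_entropy(text, n):
    """Estimate the entropy (bits) of a character given the previous n-1.

    Computes -sum p(context, ch) * log2 p(ch | context) over the observed
    character n-grams of `text`. For n = 1 this is the unigram entropy.
    """
    if n < 1 or len(text) < n:
        raise ValueError("need n >= 1 and at least n characters of text")
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    # Count how often each (n-1)-character context starts an n-gram.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    return -sum(
        (count / total) * log2(count / contexts[gram[:-1]])
        for gram, count in grams.items()
    )
```

On a perfectly alternating string such as "abababab", the unigram estimate is 1 bit per character, while the bigram estimate drops to 0 because each character is fully determined by its predecessor.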
....Seidenberg [17] proposed a model based on trigrams. Srinivas et al. [21] indicate that assuming orthographic encoding is in most cases sufficient to describe word completion performance in humans. Orthographic Markov models of words have often been used computationally, as, for example, in Shannon's [19] famous work. Following this work, our model is also orthographic. We find that a bigram rather than a trigram representation is sufficient, and leads to a simpler model. Contradicting evidence exists for the influence of fragment length on word completion. Oloffsson and Nyberg [12] failed to ....
C. Shannon. "Prediction and Entropy of Printed English," Bell Systems Technical Journal, 30:50-64, 1951.
....the definition of the complexity and the choice of the system being studied. Using the assumption that meaning and information content in text is founded on the correlation between language symbols, one of the meaningful measures of complexity of human writings is entropy, as established by Shannon [11]. Yet, when a text is very long it is almost impossible to calculate the Shannon information entropy, so Grassberger [12] proposed an approximate method to estimate entropy. But entropy does not reveal directly the correlation properties of texts, so another more general measure is needed. One ....
Shannon, C.E., "Prediction and entropy of printed English", Bell System Technical Journal, 1951, pp 30, 50.
....ENTROPY OF ENGLISH USING PPM BASED MODELS W. J. Teahan, John G. Cleary Department of Computer Science, University of Waikato, New Zealand Over 45 years ago Claude E. Shannon estimated the entropy of English to be about 1 bit per character [16]. He did this by having human subjects guess samples of text, letter by letter. From the number of guesses made by each subject he estimated upper and lower bounds of 1.3 and 0.6 bits per character (bpc) for the entropy of English. Shannon's methodology was not improved upon until 1978 when Cover ....
C.E. Shannon. Prediction and entropy of printed English. Bell System Technical Journal, pages 50-64, 1951.
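The bounds mentioned in the context above come from the distribution of guess numbers. A minimal sketch follows, assuming q[i-1] is the fraction of letters guessed correctly on the i-th attempt; the upper bound is the entropy of that distribution and the lower bound is the sum over i of i * (q_i - q_{i+1}) * log2(i), as given in Shannon's 1951 paper. The function name `shannon_bounds` is hypothetical, and sorting the frequencies into non-increasing order is a simplifying assumption.

```python
from math import log2


def shannon_bounds(q):
    """Return (lower, upper) entropy bounds in bits per character.

    `q` holds relative frequencies of guess numbers 1, 2, ... and should
    sum to 1. Frequencies are sorted so that q_1 >= q_2 >= ... (assumed).
    """
    q = sorted(q, reverse=True)
    # Upper bound: entropy of the guess-number distribution.
    upper = -sum(p * log2(p) for p in q if p > 0)
    # Lower bound: sum of i * (q_i - q_{i+1}) * log2(i), with q_{n+1} = 0.
    padded = q + [0.0]
    lower = sum(
        i * (padded[i - 1] - padded[i]) * log2(i)
        for i in range(1, len(q) + 1)
    )
    return lower, upper
```

When every guess number is equally likely the two bounds coincide; the more often the first guess is right, the lower both bounds fall.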
No context found.
C. E. Shannon. Prediction and entropy of printed English. The Bell System Technical Journal, 30:50--64, January 1951.
No context found.
C. E. Shannon. Prediction and entropy of printed English. The Bell System Technical Journal, 30:50--64, January 1951.
No context found.
C. E. Shannon. Prediction and Entropy of Printed English. Bell Systems Technical Journal, Volume 30, pages 50--64, 1951.
No context found.
C. Shannon. Prediction and Entropy of Printed English. Bell System Technical Journal, (30):50-64. 1951.
No context found.
C. Shannon. Prediction and entropy of printed English. Technical report, Bell Systems, 1951.
No context found.
C. Shannon. Prediction and entropy of printed English. Bell Systems Technical Journal, 30:50--64, 1951.
No context found.
C. E. Shannon. Prediction and entropy of printed English. Bell System Technical Journal, 30(1):50--64, January 1951.
No context found.
C. E. Shannon. Prediction and entropy of printed English. Bell Syst. Tech. J., Vol. 30, pp. 50-64, 1951.
No context found.
C. Shannon. Prediction and entropy of printed English. Technical Report 30, Bell Systems, 1951.
No context found.
C. Shannon. Prediction and entropy of printed English. Technical Report 30, Bell Systems, 1951.
No context found.
C. E. Shannon (1951) Prediction and entropy of printed English, Bell System Technical J. pp50-64, January.
No context found.
Shannon, Claude E. Prediction and entropy of printed English, Bell Systems Technical Journal 30:50-64, 1951.
No context found.
C. E. Shannon. Prediction and entropy of printed English. The Bell System Technical Journal, 30:50--64, 1951.
No context found.
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 51-64.
No context found.
C. E. Shannon. Prediction and entropy of printed English. Bell systems technical journal, 30:50-64, January 1951.
No context found.
C. E. Shannon, "Prediction and entropy of printed English," Bell Sys. Tech. J., vol. 30, pp. 50--64, 1951.
No context found.
Shannon C.E. "Prediction and Entropy of Printed English". Bell Systems Technical Journal, Vol.30, pp.50-64, 1951.