W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842--1845, 1992.
....income distribution follows a lognormal or power law distribution also dates back to at least the 1950s. The issue arises for other financial models, as detailed in [59]. Similar issues continue to arise in biology [37], chemistry [67], ecology [4, 80], astronomy [82], and information theory [48, 70]. These cases serve as a reminder that the problems we face as computer scientists are not necessarily new, and we should look to other sciences both for tools and understanding. A third discovery from examining previous work is that power law and lognormal distributions are intrinsically ....
W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842-1845, 1992.
....for frequency, can be used for random occurrence probability instead of equation (10). We will explain it in the following section. 2.3.1 Pareto Distribution for Random Occurrence Distribution. It is well known that the rank of words in randomly generated texts follows Zipf's distribution [13], which is used to alleviate the noise of rare events in the information retrieval area [14]. It is also well known that a heavy-tailed distribution for the number of events given frequency follows the Pareto distribution, which is often used for modeling incomes in economics [15]. Both of them are a kind of ....
Li, Wentian, "Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution", IEEE Transactions on Information Theory, 38(6), 1842-1845, 1992
....law distribution best applies to income distribution, for example, also dates back to at least the 1950s. The issue arises for other financial models, as detailed in [48]. Similar issues continue to arise in biology [31], chemistry [54], ecology [3, 66], astronomy [67], and information theory [40, 57]. These cases serve as a reminder that the problems we face as computer scientists are not necessarily new, and we should look to other sciences both for tools and understanding. Another discovery from looking at previous work is that power law and lognormal distributions are intrinsically ....
W. Li. Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution. IEEE Transactions on Information Theory, 38(6):1842-1845, 1992.
....fields over many years. The question of whether a lognormal or power law distribution best applies to income distribution, for example, also dates back to at least the 1950s. Similar issues continue to arise in biology [31], chemistry [53], ecology [3, 65], astronomy [66], and information theory [40, 56]. These cases serve as a reminder that the problems we face as computer scientists are not necessarily new, and we should look to other sciences both for tools and understanding. Another discovery from looking at previous work is that power law and lognormal distributions are intrinsically ....
W. Li. Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution. IEEE Transactions on Information Theory, 38(6):1842-1845, 1992.
....Zipf's law [8], which observes (but does not explain) that the ranked frequencies of words in a corpus follow a power law with an exponent of 1. However, we found much higher exponents, ranging from 2.86 to 4.30. Therefore, we do not believe that the published explanations of Zipf's law (such as [6]) explain the distribution of variable length interjectives on the Web. We believe that this group of words comprises a simple model system for studying word lengths that provides an alternative to monkey languages [3], in which text is simply a random stream of letters and spaces like the ....
W. Li. Random texts exhibit zipf's law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842-1845, 1992.
....is the fraction of occurrences of S in the sample. Zipf's law was originally derived from the analysis of the frequency of words in literary texts [83] and has since been found in a variety of contexts (e.g. [84]). The form given above can be derived analytically for simple models of random text [85, 86]. Zipf's law suggests that most sequences fold into few very common structures while most structures are extremely rare. Because of the asymptotic power law a constant fraction (about 0.5 1 c ) of structures will occur only once in a sample regardless of sample size. In of its simplest ....
Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution. Technical Report 91-03-016, Santa Fe Institute, 1991.
....set (testing or validation set). Note from Fig. 1 that for the training set, genes from rank 3 to 9 exhibit similar likelihood, and form a flat step on the Zipf's plot. Such steps are a dominant feature in the Zipf's plot of the frequency of word occurrence in randomly generated texts (Li, 1992). 4 Classification of multiple cancer classes. The logistic regression Eq. (3) for a two-class data set can be generalized to multiple classes: multinomial logistic regression (Agresti, 1996): $P(y_i = I \mid x_{ji}) = \frac{e^{-a_I - b_I x_{ji}}}{\sum_{K=1}^{C} e^{-a_K - b_K x_{ji}}}$, $j = 1, 2, \ldots, p$, $I = 1, 2, \ldots, C$, $i = 1, 2,$ ....
....et al. 1994). This paper drew criticism on its strong claim concerning the connection between Zipf's law and human language (Martindale & Konopka, 1995; Israeloff, et al. 1996, Bonhoeffer, et al. 1996a, 1996b, Voss, 1996): one of the best counter examples is Zipf's law in monkey-typing texts (Li, 1992). Also, the paper did not show convincingly that the Zipf's plot for oligonucleotide usage was better fitted by a power law function (Martindale & Konopka, 1996): the deviation from the power law fitting function can be gradual and systematic, an indication that the power law function is not the best ....
Li W (1992), "Random texts exhibit Zipf's-law-like word frequency distribution", IEEE Transactions on Information Theory, 38:1842-1845.
....are said to be emergent. We have made much of the surprise aspect of Zipf's law, which is held by some to be indicative of emergence. However, a very simple proof of Mandelbrot's result (for a random "language"), relying on nothing other than elementary probability theory, has been given by Li (1992), who states (p. 1844): "Zipf's law is not a deep law in natural language as one might first have thought. It is very much related to the particular representation one chooses, i.e. rank as the independent variable." Li goes on to state (presumably because she was able to show its simple ....
Li, W. (1992). Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory IT-38(6), 1842--1845.
....of the human mind, and came up with an explanation he called the principle of least effort. At the time, there was a great deal of controversy over this explanation, but today it is commonly accepted that for both English texts and random texts, the cause of Zipf's law is purely statistical [Li 92]. [Glassman 94] was the first to observe the Zipf distribution for Web traffic patterns. They analyzed the 12 weeks of proxy logs from the DEC client population. On a log-log scale, they plotted the number of accesses versus the page rank for all the requests in their logs, and compared the ....
W. Li. Random texts exhibit zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 1992.
....of leaf nodes accessed on the number of distinct terms V (the vocabulary size) appearing in an update batch of size U. Since an update batch is most likely comprised of a number of documents to be added or deleted, we may expect that the distribution of terms in the update batch follows Zipf's law [10]. Zipf's law predicts that the product of a term's rank $r$ and its frequency $f_r$ will be constant and consequently that this constant is equal to the vocabulary size: $f_r = V/r$. Summing over rank, $\sum_{r=1}^{V} f_r = \sum_{r=1}^{V} V/r$ gives $U = V \sum_{r=1}^{V} 1/r = V H_V \approx V (\ln V + \gamma)$, where $H_V$ is ....
Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842--1845, November 1992.
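The harmonic-sum relation quoted in the update-batch excerpt above, U = V H_V ≈ V (ln V + γ), can be checked numerically. The following is a minimal Python sketch and is not code from the citing paper; the function names `harmonic`, `batch_size_exact`, and `batch_size_approx` are hypothetical, and it assumes the idealized Zipf form f_r = V/r stated in the excerpt.

```python
import math

# Euler-Mascheroni constant (the gamma in H_V ~ ln V + gamma).
EULER_GAMMA = 0.5772156649015329


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = sum_{r=1}^{n} 1/r."""
    return math.fsum(1.0 / r for r in range(1, n + 1))


def batch_size_exact(vocab_size: int) -> float:
    """U = V * H_V, summing the assumed Zipf frequencies f_r = V / r."""
    return vocab_size * harmonic(vocab_size)


def batch_size_approx(vocab_size: int) -> float:
    """U ~ V * (ln V + gamma), the closed-form approximation."""
    return vocab_size * (math.log(vocab_size) + EULER_GAMMA)


if __name__ == "__main__":
    for v in (100, 1_000, 10_000):
        print(v, batch_size_exact(v), batch_size_approx(v))
```

The approximation slightly underestimates the exact sum, since H_V exceeds ln V + γ by roughly 1/(2V); the relative error shrinks as V grows.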
....It states that if one takes the words making up an extended body of text and ranks them by frequency of occurrence, then the rank multiplied by its frequency of occurrence f(r) will be approximately a constant. The form given above can be derived analytically for simple models of random text [43, 11]. Zipf's law suggests that most sequences fold into few very common structures while most structures are extremely rare. In the above parameterization of Zipf's Law the exponent c describes the distribution of rare sequences, the constant b is a rough measure for the number of frequent structures, ....
W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. Technical Report 91-03-016, Santa Fe Institute, 1991.
....it has been used in the study of systems such as chaotic dynamical systems [5], biological sequences [6,7], and economic systems [8,9]. Zipf found, for texts written in natural languages, a universal power law behavior characterized by a power law exponent close to 1. Several theoretical models [10,11,12,13] have been proposed to explain Zipf's law. Some of them [11,13] show, theoretically and empirically, that Zipf's law is also satisfied in randomly generated symbolic sequences, with an exponent i close to one. These theoretical models and empirical results suggest that Zipf's law and the value of ....
....systems [5], biological sequences [6,7], and economic systems [8,9]. Zipf found, for texts written in natural languages, a universal power law behavior characterized by a power law exponent close to 1. Several theoretical models [10,11,12,13] have been proposed to explain Zipf's law. Some of them [11,13] show, theoretically and empirically, that Zipf's law is also satisfied in randomly generated symbolic sequences, with an exponent i close to one. These theoretical models and empirical results suggest that Zipf's law and the value of its exponent, i, reflect little about the linguistic nature of ....
[Article contains additional citation context not shown here]
W. Li, Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution, IEEE Trans. on Inf. Theory 38, No. 6, (1992).
....training data even once is not uncommon. This leads to the conclusion that an a priori distribution for the word frequencies is necessary. Zipf's law states that the logarithm of the word frequency is approximately proportional to the logarithm of the rank of the word (Zipf, 1949; Pierce, 1961; Li, 1992), where rank is defined such that the Nth most frequent word has rank N. We get: $\frac{\log p}{\log R} = r$ for some constant $r$, where $R$ is the rank and $p$ is the word frequency. If the word class is large then the difference in word frequency is small between two words whose rank differs by ....
Li W (1992). Random texts exhibit Zipf's law-like word frequency distribution. IEEE Trans Info Theory 38: 5.
....is the fraction of occurrences of S in the sample. Zipf's law was originally derived from the analysis of the frequency of words in literary texts [83] and has since been found in a variety of contexts (e.g. [84]). The form given above can be derived analytically for simple models of random text [85, 86]. Zipf's law suggests that most sequences fold into few very common structures while most structures are extremely rare. Because of the asymptotic power law a constant fraction (about 0.5 1/c ) of structures will occur only once in a sample regardless of sample size. In of its simplest ....
Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution. Technical Report 91-03-016, Santa Fe Institute, 1991.
....unit is small (as in words). However, we see one advantage for small chunking units. A small chunking unit increases locality. That is, most documents will have a relatively small working set of words rather than sentences. Consider the frequency distribution of N words to follow Zipf's Law [26, 23, 11]. If the words are ranked in non-increasing order of frequencies, then the probability that a word w of rank r occurs is $P(w) = \frac{1}{r \sum_{v=1}^{N} 1/v}$. If we assume a vocabulary of about 1.8 million words [23], about 40,000 (about 2% of 1.8 million) words constitute nearly 75% of the actual ....
W. Li. Random texts exhibit zipf's law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842, 1992.
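The coverage figure in the chunking-unit excerpt above (about 40,000 of 1.8 million words accounting for nearly 75% of occurrences) follows from the quoted formula P(w) = 1 / (r Σ_{v=1}^{N} 1/v). The following is a minimal Python sketch under that idealized Zipf assumption, not code from the citing paper; the names `zipf_probability` and `top_k_coverage` are hypothetical.

```python
import math


def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = sum_{v=1}^{n} 1/v."""
    return math.fsum(1.0 / v for v in range(1, n + 1))


def zipf_probability(rank: int, vocab_size: int) -> float:
    """P(w) = 1 / (r * H_N) for a word of rank r in a vocabulary of N words."""
    return 1.0 / (rank * harmonic(vocab_size))


def top_k_coverage(k: int, vocab_size: int) -> float:
    """Fraction of all occurrences covered by the k most frequent words.

    Summing P(w) over ranks 1..k gives H_k / H_N.
    """
    return harmonic(k) / harmonic(vocab_size)


if __name__ == "__main__":
    # Values taken from the excerpt: N = 1.8 million, k = 40,000.
    print(top_k_coverage(40_000, 1_800_000))
```

Because coverage reduces to a ratio of harmonic numbers, it grows only logarithmically with k, which is why a small head of the vocabulary covers most of the text.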
No context found.
W. Li, "Random texts exhibit Zipf's-law-like word frequency distribution," IEEE Transactions on Information Theory, 38(6), 1842-1845 (1992).
No context found.
W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842--1845, 1992.
No context found.
W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842--1845, 1992.
No context found.
W. Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6), 1992.
No context found.
W. Li. Random texts exhibit zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 6:1842--1845, 1992.
No context found.
Wentian Li. Random texts exhibit zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842-1845, 1992.
No context found.
W. Li. Random texts exhibit Zipf's law-like word frequency distribution. IEEE Trans. Inf. Theory, 38(6):1842, 1992.