
Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution (1992)  (22 citations)
Wentian Li
IEEE Transactions on Information Theory



View or download:
rockefeller.edu/wli/pub/zipf.ps

From:  rockefeller.edu/wli/pub/

Abstract: The distribution of word frequencies in randomly generated texts is shown to be very similar to Zipf's law as observed in natural languages such as English. Two facts, that a word's frequency of occurrence is almost an inverse power-law function of its rank and that the exponent of this power law is very close to 1, are largely due to the transformation from the word's length to its rank, which stretches an exponential function into a power-law function.
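
The mechanism described in the abstract can be checked numerically. The sketch below is not taken from the paper; it assumes the simplest random-text model, in which every character is drawn independently and uniformly from M letters plus a space, and the names random_text_words, rank_frequency, loglog_slope, and theoretical_exponent are hypothetical helpers introduced only for illustration. Under that assumption, each word of length L has probability proportional to (M+1)^(-L), while its rank is roughly M^L, so eliminating L suggests an exponent of about ln(M+1)/ln(M), which is close to 1 for large M.

```python
import math
import random
from collections import Counter


def random_text_words(n_chars, alphabet="abcd", seed=0):
    """Draw n_chars symbols uniformly from alphabet + space and split into words.

    Assumption: all symbols (including the space) are equally likely.
    """
    rng = random.Random(seed)
    symbols = alphabet + " "
    text = "".join(rng.choice(symbols) for _ in range(n_chars))
    return text.split()  # split() drops the empty words between repeated spaces


def rank_frequency(words):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(words).values(), reverse=True)


def loglog_slope(freqs, max_rank):
    """Least-squares slope of log(frequency) against log(rank) for ranks 1..max_rank."""
    xs = [math.log(r) for r in range(1, max_rank + 1)]
    ys = [math.log(f) for f in freqs[:max_rank]]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def theoretical_exponent(m):
    """Back-of-envelope exponent ln(M+1)/ln(M) for M equiprobable letters."""
    return math.log(m + 1) / math.log(m)


if __name__ == "__main__":
    words = random_text_words(200_000, alphabet="abcd", seed=0)
    freqs = rank_frequency(words)
    print("fitted slope:", loglog_slope(freqs, 300))
    print("expected magnitude for M=4:", theoretical_exponent(4))
    print("expected magnitude for M=26:", theoretical_exponent(26))
```

Because all words of the same length are equally likely in this model, the rank-frequency curve is a staircase rather than a smooth line, so the fitted slope only approximates the back-of-envelope exponent and depends on the range of ranks used in the fit.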

Similar documents based on text:
0.3:   Phenomenology of Non-local Cellular Automata - Wentian Li (1992)
0.3:   Long-range Correlation and Partial 1/f Spectrum In Non-Coding
0.3:   Can Zipf Analyses and Entropy Distinguish Between.. - Cohen, Mantegna, Havlin (1996)

Related documents from co-citation:
6:   An empirical study of smoothing techniques for language modeling - Chen, Goodman - 1996
6:   An informational theory of the statistical structure of language (context) - Mandelbrot - 1952
5:   The Psycho-biology of Language: an Introduction to Dynamic Philology (context) - Zipf - 1936

BibTeX entry:

W. Li. Random texts exhibit Zipf's law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842, 1992. http://citeseer.ist.psu.edu/li92random.html

@article{li92random,
    author = "Wentian Li",
    title = "Random Texts Exhibit {Zipf's} Law-Like Word Frequency Distribution",
    journal = "IEEE Transactions on Information Theory",
    volume = "38",
    number = "6",
    pages = "1842",
    year = "1992",
    url = "http://citeseer.ist.psu.edu/li92random.html"
}
Citations (may not include all citations):
15   Computational Analysis of Present-Day American English (context) - Kučera, Francis - 1967
13   An informational theory of the statistical structure of lang.. (context) - Mandelbrot - 1953
5   Mutual information functions of natural language texts (context) - Li - 1989
4   The Psycho-biology of Language: An Introduction to Dynamic P.. (context) - Zipf (with an introduction by Miller) - 1965
1   unpublished notes (context) - Gell-Mann
1   self-similarity and 1/f spectrum in dissipative dynamical sy.. (context) - Manneville - 1980
1   The peculiar distribution of first digits (context) - Raimi - 1969





Documents on the same site (http://linkage.rockefeller.edu/wli/pub/):
The Complexity of DNA - The Measure Of . . . - Li
Understanding Long-Range Correlations in DNA Sequences - Li, Marr, Kaneko (1994)
Comments to "Bell Curves and Monkey Languages", J. Casti.. - Li
