Abstract: It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law as observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power-law function of its rank, and that the exponent of this power law is very close to 1, are largely due to the transformation from a word's length to its rank, which stretches an exponential function into a power-law function.
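The mechanism in the abstract can be illustrated with a minimal sketch. It assumes the usual "monkey typing" model, which may differ in detail from the paper's own setup: M letters and one space, each typed independently with probability 1/(M+1). Under that assumption a particular word of length L occurs with probability proportional to (M+1)^(-L), and there are M^L distinct words of that length. All shorter words, about M^L/(M-1) of them, rank ahead of it, so the length L grows like log_M(r) in the rank r. Substituting gives a frequency proportional to (M+1)^(-log_M r) = r^(-alpha) with alpha = log(M+1)/log(M), which is about 1.01 for M = 26. The function names below (`predicted_exponent`, `random_text`, `estimated_exponent`) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def predicted_exponent(m: int) -> float:
    """Zipf exponent log(M+1)/log(M) for M equiprobable letters plus a space."""
    return math.log(m + 1) / math.log(m)


def random_text(m: int, n_chars: int, seed: int = 0) -> str:
    """Type n_chars symbols uniformly from M letters plus a space (monkey typing)."""
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + i) for i in range(m)] + [" "]
    return "".join(rng.choices(alphabet, k=n_chars))


def estimated_exponent(text: str, m: int, max_len: int = 5) -> float:
    """Estimate the exponent from how per-word frequency falls with word length.

    Each extra letter divides the expected frequency of any single word by M+1,
    while multiplying the number of distinct words (and hence the rank) by M.
    Regressing log(per-word frequency) on length gives slope -log(M+1);
    dividing by log(M) converts length into log-rank.
    """
    words = text.split()  # split() drops the empty strings between repeated spaces
    tokens_by_len = Counter(len(w) for w in words)
    xs, ys = [], []
    for length in range(1, max_len + 1):
        # Mean count per possible word type of this length (M**length types).
        per_word = tokens_by_len[length] / m**length
        xs.append(length)
        ys.append(math.log(per_word))
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return -slope / math.log(m)


if __name__ == "__main__":
    m = 5  # small alphabet so every length class up to 5 is well sampled
    text = random_text(m, 200_000, seed=1)
    print(f"predicted exponent (M={m}): {predicted_exponent(m):.3f}")
    print(f"estimated exponent (M={m}): {estimated_exponent(text, m):.3f}")
    print(f"predicted exponent (M=26): {predicted_exponent(26):.4f}")
```

Estimating from length classes, rather than fitting a line to the raw rank-frequency plot, is a deliberate choice: within one length class all words share the same expected frequency, so the raw curve is a staircase, and a naive least-squares fit over it depends on where the rank range is cut off.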
Cited by:
Log-Linear Interpolation of Language Models - Gutkin (2006)
Creating Synthetic Temporal Document Collections - Nørvåg, Nybø
Centre for Advanced Spatial Analysis - University College London
Active bibliography (related documents):
0.2: Language-Models for Questions - Schofield (2003)
0.2: Lessons Learned in Building and Testing a Tool for Lexical.. - Hutches, Savitch
0.2: Learning Unification-Based Natural Language Grammars - Osborne (1994)
Similar documents based on text:
0.3: Phenomenology of Non-local Cellular Automata - Wentian Li (1992)
0.3: Long-range Correlation and Partial 1/f Spectrum In Non-Coding..
0.3: Can Zipf Analyses and Entropy Distinguish Between.. - Cohen, Mantegna, Havlin (1996)
Related documents from co-citation:
6: An empirical study of smoothing techniques for language modeling - Stanley - 1996
6: An informational theory of the statistical structure of language (context) - Mandelbrot - 1952
5: The Psycho-biology of Language: an Introduction to Dynamic Philology (context) - Zipf - 1936
BibTeX entry:
W. Li. Random texts exhibit Zipf's law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6):1842, 1992. http://citeseer.ist.psu.edu/li92random.html
@article{ li92random,
author = "W. Li",
title = "Random Texts Exhibit {Zipf's} Law-Like Word Frequency Distribution",
journal = "IEEE Transactions on Information Theory",
volume = "38",
number = "6",
pages = "1842",
year = "1992",
url = "citeseer.ist.psu.edu/li92random.html" }
Citations (may not include all citations):
15: Computational Analysis of Present-Day American English (context) - Kučera, Francis - 1967
13: An informational theory of the statistical structure of lang.. (context) - Mandelbrot - 1953
5: Mutual information functions of natural language texts (context) - Li - 1989
4: The Psycho-biology of Language: An Introduction to Dynamic P.. (context) - Miller, in - 1965
1: unpublished notes (context) - Gell-Mann
1: self-similarity and 1/f spectrum in dissipative dynamical sy.. (context) - Manneville - 1980
1: The peculiar distribution of first digits (context) - Raimi - 1969
Documents on the same site (http://linkage.rockefeller.edu/wli/pub/):
The Complexity of DNA - The Measure Of . . . - Li
Understanding Long-Range Correlations in DNA Sequences - Li, Marr, Kaneko (1994)
Comments to "Bell Curves and Monkey Languages", J. Casti.. - Li
CiteSeer.IST - Copyright Penn State and NEC