A brief history of generative models for power law and lognormal distributions

by M Mitzenmacher
Venue: Internet Mathematics

Results 1 - 10 of 412

Power-law distributions in empirical data

by Aaron Clauset, Cosma Rohilla Shalizi, M. E. J. Newman - ISSN 00361445. doi: 10.1137/070710111. URL http://dx.doi.org/10.1137/070710111, 2009
Abstract - Cited by 607 (7 self)
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
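
The maximum likelihood approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes the continuous power-law case with a known lower cutoff, where the standard estimator is alpha = 1 + n / sum(ln(x_i / x_min)). The function names, the synthetic sampler, and the fixed `x_min` are illustrative assumptions, not the authors' code; the abstract's Kolmogorov-Smirnov step (which the paper uses to choose the cutoff and assess fit) is not shown.

```python
import math
import random


def fit_power_law_alpha(samples, x_min):
    """Estimate the exponent of a continuous power law by maximum likelihood.

    Assumes x_min is already known; only samples >= x_min are used.
    Returns (alpha, approximate standard error, number of tail samples).
    """
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail samples equal x_min; exponent is undefined")
    n = len(tail)
    alpha = 1.0 + n / log_sum
    # Large-sample approximation of the standard error.
    std_err = (alpha - 1.0) / math.sqrt(n)
    return alpha, std_err, n


def sample_power_law(alpha, x_min, size, rng):
    """Draw synthetic power-law samples by inverse-transform sampling."""
    exponent = -1.0 / (alpha - 1.0)
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_power_law(alpha=2.5, x_min=1.0, size=20000, rng=rng)
    alpha_hat, std_err, n_tail = fit_power_law_alpha(data, x_min=1.0)
    print(f"alpha = {alpha_hat:.3f} +/- {std_err:.3f} (n = {n_tail})")
```

On synthetic data the estimate should land close to the exponent used for sampling; on real data the choice of `x_min` matters a great deal and would need its own estimation step.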

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations

by Jure Leskovec, Jon Kleinberg, Christos Faloutsos, 2005
Abstract - Cited by 541 (48 self)
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
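
The superlinear edge growth described in this abstract can be sketched as a scaling fit. The sketch assumes the power-law form e(t) ∝ n(t)^a between edge count e and node count n, so that the exponent a is the slope of log e against log n; a value above 1 would indicate densification. The function name and the snapshot numbers are hypothetical, and the plain least-squares slope is an illustrative choice rather than the authors' procedure.

```python
import math


def densification_exponent(node_counts, edge_counts):
    """Least-squares slope of log(edges) versus log(nodes) across snapshots."""
    if len(node_counts) != len(edge_counts) or len(node_counts) < 2:
        raise ValueError("need at least two paired snapshots")
    xs = [math.log(n) for n in node_counts]
    ys = [math.log(e) for e in edge_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("node counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical snapshots of a growing graph (not data from the paper).
    nodes = [1000, 2000, 4000, 8000]
    edges = [5000, 13000, 34000, 90000]
    print(f"estimated exponent a = {densification_exponent(nodes, edges):.2f}")
```

This fits a relation between two measured quantities over time, which is a different task from estimating the exponent of a probability distribution from samples.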

Power laws, Pareto distributions and Zipf’s law

by M. E. J. Newman
Abstract - Cited by 413 (0 self)
Many of the things that scientists measure have a typical size or “scale”—a typical value around which individual measurements are centred. A simple example would be the heights of human beings. Most adult human beings are about 180cm tall. There is some variation around this figure, notably depending on sex, but we never see people who are 10cm tall, or 500cm. To make this observation more quantitative, one can plot a histogram of people’s heights, as I have done in Fig. 1a. The figure shows the heights in centimetres of adult men in the United States measured between 1959 and 1962, and indeed the distribution is relatively narrow and peaked around 180cm. Another telling observation is the ratio of the heights of the tallest and shortest people.

Citation Context

...s and describe some of the mechanisms by which power-law behaviour can arise. Readers interested in pursuing the subject further may also wish to consult the reviews by Sornette [18] and Mitzenmacher [19], as well as the bibliography by Li. ...tical distributions of quantities. For instance, Newton’s famous 1/r² law for gravity has a power-law form with exponent α = 2. While such laws are certainly in...

I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System

by Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-yeol Ahn, Sue Moon - In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC’07), 2007
Abstract - Cited by 373 (7 self)
User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world’s largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.

Citation Context

...very similar shape. Hence it is a nontrivial task to determine whether a certain distribution is power-law or log-normal, unless the plot shows a clear straight line across several orders of magnitude [17,19,32,35,38]. The shape of a distribution reflects the underlying mechanism that generates it. Normally, the power-law distribution arises from the rich-get-richer principle, while the lognormal distribution arises f...

Graph evolution: Densification and shrinking diameters

by Jure Leskovec, Jon Kleinberg, Christos Faloutsos - ACM TKDD, 2007
Abstract - Cited by 267 (16 self)
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the “forest fire” model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of r

Citation Context

...tion experiments also indicated that the diameter of networks generated by the recursive search does not decrease over time, but it either slowly increases or remains constant. We point the reader to [39, 40, 7] for overviews of this area. Recent work of Chakrabarti and Faloutsos [12] gives a survey of the properties of real world graphs and the underlying generative models for graphs. It is important to not...

A First-Principles Approach to Understanding the Internet's Router-level Topology

by Lun Li, David Alderson, Walter Willinger, John Doyle, 2004
Abstract - Cited by 213 (19 self)
A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved designs for resource provisioning. Previous studies of topology have focused on interpreting measurements or on phenomenological descriptions and evaluation of graph-theoretic properties of topology generators. We propose a complementary approach of combining a more subtle use of statistics and graph theory with a first-principles theory of router-level topology that reflects practical constraints and tradeoffs. While there is an inevitable tradeoff between model complexity and fidelity, a challenge is to distill from the seemingly endless list of potentially relevant technological and economic issues the features that are most essential to a solid understanding of the intrinsic fundamentals of network topology. We claim that very simple models that incorporate hard technological constraints on router and link bandwidth and connectivity, together with abstract models of user demand and network performance, can successfully address this challenge and further resolve much of the confusion and controversy that has surrounded topology generation and evaluation.

Citation Context

...enerated considerable discussion is the prevalence of heavy-tailed distributions in node degree (e.g., number of connections) and whether or not these heavy-tailed distributions conform to power laws [23, 31, 16, 32]. This macroscopic statistic has greatly influenced the generation and evaluation of network topologies. In the current environment, degree distributions and other large-scale statistics are popular m...

Heuristically optimized trade-offs: a new paradigm for power laws in the internet

by Alex Fabrikant, Christos H. Papadimitriou, 2002
Abstract - Cited by 178 (1 self)
We give a plausible explanation of the power law distributions of degrees observed in the graphs arising in the Internet topology [5] based on a toy model of Internet growth in which two objectives are optimized simultaneously: "last mile" connection costs, and transmission delays measured in hops. We also point out a similar phenomenon, anticipated in [2], in the distribution of file sizes. Our results seem to suggest that power laws tend to arise as a result of complex, multi-objective optimization.

Citation Context

...ey have been termed “the signature of human activity” (even though they do occasionally arise in nature)¹. There have been several attempts to explain power laws by so-called generative models (see [12] for a technical survey). The vast majority of such models fall into one large category (with important differences and considerable technical difficulties, of course) that can be termed scale-free gr...

Graph mining: laws, generators, and algorithms

by Deepayan Chakrabarti, Christos Faloutsos - ACM Comput. Surv. (CSUR), 2006
Abstract - Cited by 132 (7 self)
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.

A survey of random processes with reinforcement

by Robin Pemantle, 2006
Abstract - Cited by 126 (1 self)
Abstract not found

Interpolating between types and tokens by estimating power-law generators

by Sharon Goldwater, Thomas L. Griffiths, Mark Johnson - In Advances in Neural Information Processing Systems 18, 2006
Abstract - Cited by 123 (19 self)
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.