Results 1 - 10 of 412
Power-law distributions in empirical data
- ISSN 0036-1445. doi: 10.1137/070710111. URL http://dx.doi.org/10.1137/070710111
, 2009
"... Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the t ..."
Abstract
-
Cited by 607 (7 self)
- Add to MetaCart
Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the empirical detection and characterization of power laws is made difficult by the large fluctuations that occur in the tail of the distribution. In particular, standard methods such as least-squares fitting are known to produce systematically biased estimates of parameters for power-law distributions and should not be used in most circumstances. Here we describe statistical techniques for making accurate parameter estimates for power-law data, based on maximum likelihood methods and the Kolmogorov-Smirnov statistic. We also show how to tell whether the data follow a power-law distribution at all, defining quantitative measures that indicate when the power law is a reasonable fit to the data and when it is not. We demonstrate these methods by applying them to twenty-four real-world data sets from a range of different disciplines. Each of the data sets has been conjectured previously to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data while in others the power law is ruled out.
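The estimation step this abstract describes can be made concrete in a few lines. The sketch below is illustrative only (continuous data, with x_min supplied by the caller; the paper's full procedure also selects x_min and assesses goodness of fit against synthetic data): it computes the maximum-likelihood exponent and the Kolmogorov-Smirnov distance between the empirical tail and the fitted power law.

```python
import numpy as np

def fit_power_law_alpha(data, xmin):
    """Maximum-likelihood exponent for a continuous power law p(x) ~ x**-alpha, x >= xmin."""
    tail = np.asarray([x for x in data if x >= xmin], dtype=float)
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(data, xmin, alpha):
    """Kolmogorov-Smirnov distance between the empirical tail CDF and the fitted CDF."""
    tail = np.sort([x for x in data if x >= xmin])
    n = len(tail)
    empirical = np.arange(1, n + 1) / n                 # empirical CDF at the sorted tail points
    model = 1.0 - (tail / xmin) ** (1.0 - alpha)        # power-law CDF: 1 - (x/xmin)**(1-alpha)
    return float(np.max(np.abs(empirical - model)))

# The paper's recipe scans candidate xmin values and keeps the one minimizing ks_distance,
# then judges the plausibility of the power law by comparison with synthetic data sets.
```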
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations
, 2005
"... How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include hea ..."
Abstract
-
Cited by 541 (48 self)
- Add to MetaCart
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
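The densification law mentioned above says the edge count grows as a power of the node count, e(t) ∝ n(t)^a with a > 1. A quick way to check this on a sequence of snapshots is a least-squares fit in log-log space; the snippet below is a minimal sketch, with made-up snapshot sizes used purely for illustration.

```python
import numpy as np

def densification_exponent(snapshots):
    """Fit e(t) ~ n(t)**a from (num_nodes, num_edges) snapshots by least squares in log-log space.

    An exponent a > 1 indicates densification: edges grow superlinearly in the number of nodes."""
    n = np.log([s[0] for s in snapshots])
    e = np.log([s[1] for s in snapshots])
    slope, _intercept = np.polyfit(n, e, 1)   # the slope is the densification exponent
    return slope

# Made-up snapshot sizes for illustration; these were generated with exponent ~1.2.
print(densification_exponent([(1_000, 5_000), (2_000, 11_500), (4_000, 26_400), (8_000, 60_700)]))
```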
Power laws, Pareto distributions and Zipf’s law
"... Many of the things that scientists measure have a typical size or “scale”—a typical value around which individual measurements are centred. A simple example would be the heights of human beings. Most adult human beings are about 180cm tall. There is some variation around this figure, notably dependi ..."
Abstract
-
Cited by 413 (0 self)
- Add to MetaCart
(Show Context)
Many of the things that scientists measure have a typical size or “scale”—a typical value around which individual measurements are centred. A simple example would be the heights of human beings. Most adult human beings are about 180cm tall. There is some variation around this figure, notably depending on sex, but we never see people who are 10cm tall, or 500cm. To make this observation more quantitative, one can plot a histogram of people’s heights, as I have done in Fig. 1a. The figure shows the heights in centimetres of adult men in the United States measured between 1959 and 1962, and indeed the distribution is relatively narrow and peaked around 180cm. Another telling observation is the ratio of the heights of the tallest and shortest people.
I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System
- In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC’07)
, 2007
"... User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better ..."
Abstract
-
Cited by 373 (7 self)
- Add to MetaCart
(Show Context)
User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world’s largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.
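The caching observation above can be illustrated with a back-of-the-envelope calculation: if request popularity is heavy-tailed, a cache holding only the most popular videos serves a large share of requests. The Zipf popularity model and the parameter values below are illustrative assumptions, not figures taken from the paper's measurements.

```python
import numpy as np

def top_k_hit_rate(num_videos, k, zipf_exponent):
    """Fraction of requests served by caching the k most popular videos,
    assuming request popularity follows a Zipf law with the given exponent
    (an illustrative assumption, not a value measured in the paper)."""
    weights = 1.0 / np.arange(1, num_videos + 1) ** zipf_exponent
    return weights[:k].sum() / weights.sum()

# With a catalog of 1,000,000 videos and a mildly skewed exponent, caching just
# the top 1% of videos already captures a disproportionate share of requests.
print(top_k_hit_rate(1_000_000, 10_000, 0.8))
```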
Graph evolution: Densification and shrinking diameters
- ACM TKDD
, 2007
"... How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include hea ..."
Abstract
-
Cited by 267 (16 self)
- Add to MetaCart
(Show Context)
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. We also notice that the “forest fire” model exhibits a sharp transition between sparse graphs and graphs that are densifying. Graphs with decreasing distance between the nodes are generated around this transition point. Last, we analyze the connection between the temporal evolution of the degree distribution and densification of a graph. We find that the two are fundamentally related. We also observe that real networks exhibit this type of relation between densification and the degree distribution.
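For intuition about the “forest fire” process referred to in the two abstracts above, here is a deliberately simplified, forward-burning-only sketch; the published model also burns backward along in-links with a separate ratio r, which is omitted here, and the parameter value is arbitrary.

```python
import random

def geometric_burn_count(p):
    """Number of neighbors to spread to: geometric with mean p / (1 - p)."""
    count = 0
    while random.random() < p:
        count += 1
    return count

def forest_fire_graph(num_nodes, p=0.35):
    """Simplified, forward-burning-only sketch of a forest-fire-style generator.
    Each new node picks a random ambassador, links to it, then recursively
    'burns' a geometrically distributed number of the ambassador's out-neighbors,
    linking to every burned node. (Backward burning along in-links is omitted.)"""
    out_links = {0: set()}
    for v in range(1, num_nodes):
        out_links[v] = set()
        burned = set()
        queue = [random.randrange(v)]        # ambassador chosen uniformly at random
        while queue:
            w = queue.pop()
            if w in burned:
                continue
            burned.add(w)
            out_links[v].add(w)              # the new node links to each burned node
            unvisited = [u for u in out_links[w] if u not in burned]
            random.shuffle(unvisited)
            queue.extend(unvisited[:geometric_burn_count(p)])
    return out_links

# Usage: adjacency = forest_fire_graph(10_000); larger p yields denser graphs.
```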
A First-Principles Approach to Understanding the Internet's Router-level Topology
, 2004
"... A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved ..."
Abstract
-
Cited by 213 (19 self)
- Add to MetaCart
(Show Context)
A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved designs for resource provisioning. Previous studies of topology have focused on interpreting measurements or on phenomenological descriptions and evaluation of graph-theoretic properties of topology generators. We propose a complementary approach of combining a more subtle use of statistics and graph theory with a first-principles theory of router-level topology that reflects practical constraints and tradeoffs. While there is an inevitable tradeoff between model complexity and fidelity, a challenge is to distill from the seemingly endless list of potentially relevant technological and economic issues the features that are most essential to a solid understanding of the intrinsic fundamentals of network topology. We claim that very simple models that incorporate hard technological constraints on router and link bandwidth and connectivity, together with abstract models of user demand and network performance, can successfully address this challenge and further resolve much of the confusion and controversy that has surrounded topology generation and evaluation.
Heuristically optimized trade-offs: a new paradigm for power laws in the internet
, 2002
"... Abstract We give a plausible explanation of the power law distributions of degrees observed in the graphs arising in the Internet topology [5] based on a toy model of Internet growth in which two objectives are optimized simultaneously: "last mile " connection costs, and transmissi ..."
Abstract
-
Cited by 178 (1 self)
- Add to MetaCart
(Show Context)
We give a plausible explanation of the power law distributions of degrees observed in the graphs arising in the Internet topology [5] based on a toy model of Internet growth in which two objectives are optimized simultaneously: "last mile" connection costs, and transmission delays measured in hops. We also point out a similar phenomenon, anticipated in [2], in the distribution of file sizes. Our results seem to suggest that power laws tend to arise as a result of complex, multi-objective optimization.
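One way to make the two-objective trade-off the abstract describes concrete is to attach each newly arriving node i, placed at a random point in the unit square, to the existing node j minimizing alpha * dist(i, j) + hops(j), where hops(j) is j's hop count to the root. The weighted-sum objective and the default alpha below are illustrative assumptions for this sketch.

```python
import math
import random

def hot_tradeoff_tree(num_nodes, alpha=10.0):
    """Toy growth model trading off 'last mile' distance against hop delay.
    Each new node lands at a random point in the unit square and attaches to the
    existing node j minimizing alpha * dist(i, j) + hops(j), where hops(j) is j's
    hop count to the root. (One common formalization of the trade-off; the weights
    and defaults here are illustrative.)"""
    points = [(random.random(), random.random())]
    hops = [0]                                   # hop distance to the root node 0
    parent = [None]
    for i in range(1, num_nodes):
        x, y = random.random(), random.random()
        best_j = min(
            range(i),
            key=lambda j: alpha * math.dist((x, y), points[j]) + hops[j],
        )
        points.append((x, y))
        parent.append(best_j)
        hops.append(hops[best_j] + 1)
    return parent
```

Counting how many children each node has in the returned parent list gives the degree distribution, whose shape varies with alpha between star-like and heavy-tailed regimes.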
Graph mining: laws, generators, and algorithms
- ACM Computing Surveys (CSUR)
, 2006
"... How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M: N relation in ..."
Abstract
-
Cited by 132 (7 self)
- Add to MetaCart
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Interpolating between types and tokens by estimating power-law generators
- In Advances in Neural Information Processing Systems 18
, 2006
"... Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative ..."
Abstract
-
Cited by 123 (19 self)
- Add to MetaCart
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process – the Pitman-Yor process – as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
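The adaptor in this framework is a Pitman-Yor process, whose Chinese-restaurant sampling scheme generically produces power-law type frequencies. The sketch below samples table (type) occupancy counts under the standard predictive rule; the discount and concentration values are arbitrary illustrations.

```python
import random

def pitman_yor_table_counts(num_tokens, discount=0.8, concentration=1.0):
    """Sample table (type) occupancy counts from the Pitman-Yor Chinese restaurant process.
    Token n+1 joins existing table k with probability (count_k - discount) / (n + concentration),
    and opens a new table with probability (concentration + discount * K) / (n + concentration),
    where K is the current number of tables. For 0 < discount < 1 the resulting table
    frequencies are heavy-tailed, which is the 'adaptor' behavior the abstract refers to."""
    counts = []
    for n in range(num_tokens):
        new_table_weight = concentration + discount * len(counts)
        r = random.uniform(0, n + concentration)
        if r < new_table_weight:
            counts.append(1)                      # a new type/table is created
        else:
            r -= new_table_weight
            for k, c in enumerate(counts):
                r -= c - discount
                if r <= 0:
                    counts[k] += 1                # an existing type is reused
                    break
            else:
                counts[-1] += 1                   # numerical guard for floating-point edge cases
    return counts

# Usage: sorted(pitman_yor_table_counts(100_000), reverse=True) has a heavy-tailed
# rank-frequency profile, unlike a comparable multinomial sample.
```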