## Power-law distributions in empirical data (2009)

Venue: | ISSN 0036-1445. doi: 10.1137/070710111. URL http://dx.doi.org/10.1137/070710111 |

Citations: | 579 - 7 self |

### Citations

4956 | An Introduction to the Bootstrap - Efron, Tibshirani - 1993 |

1871 |
Information Theory, Inference, and Learning Algorithms
- MacKay
- 2003
Citation Context ... as a parameter because we know its value automatically once we are given a list of the other parameters: it is just the length of that list. ... the evidence) [30, 35], i.e., the likelihood of the data given the number of model parameters, integrated over the parameters’ possible values. Unfortunately, the integral cannot usually be performed analytically, but one ... |

1150 | Cross-validatory choice and assessment of statistical prediction - Stone - 1974 |

706 | Likelihood ratio tests for model selection and non-nested hypotheses - Vuong - 1989 |

541 | A simple general approach to inference about the tail of a distribution - Hill - 1975 |

448 | Continuous Univariate Distributions - Johnson, Kotz, et al. - 1995 |

408 | A brief history of generative models for power law and lognormal distributions - Mitzenmacher - 2003 |

406 | On the mathematical foundations of theoretical statistics - Fisher - 1922 |

405 | A random graph model for massive graphs - Aiello, Chung, et al. |

395 |
Applied Linear Regression
- Weisberg
- 1985
Citation Context ...ect them. Similar considerations apply for the PDF, which must integrate to 1 over the range from xmin to ∞. Standard methods exist to incorporate constraints like these into the regression analysis (Weisberg, 1985), but they are not used to any significant extent in the literature on power laws. APPENDIX B: Maximum likelihood estimators for the power law. In this section we give derivations of the maximum likel... |

329 | Goodness-of-fit techniques - D’Agostino, Stephens - 1986 |

256 |
The minimum description length principle
- Grunwald
- 2007
Citation Context ...eral other established and statistically principled approaches for model comparison, such as a fully Bayesian approach [32], a cross-validation approach [59], or a minimum description length approach [20], although none of these methods are described here. In the discrete case, x can take only a discrete set of values. In this paper we consider only the case of integer values with a probability distri... |

253 | Extreme Value Theory: An Introduction - Haan, Ferreira - 2006 |

253 | How popular is your paper? An empirical study of the citation distribution - Redner - 1998 |

242 | All of statistics: a concise course in statistical inference - Wasserman - 2004 |

230 |
An introduction to the bootstrap, Chapman
- Efron, Tibshirani
- 1993
Citation Context ...l. Finally, as with our estimate of the scaling parameter, we would like to quantify the uncertainty in our estimate for xmin. One way to do this is to make use of a nonparametric “bootstrap” method (Efron and Tibshirani, 1993). Given our n measurements, we generate a synthetic data set with a similar distribution to the original by drawing a new sequence of points $x_i$, $i = 1 \ldots n$, uniformly at random from the original da... |

223 | Toward a protein–protein interaction map of the budding yeast: a comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins - Ito - 2000 |

206 | Error and the Growth of Experimental Knowledge - Mayo - 1996 |

175 |
Resort to Arms
- Small, Singer
- 1982
Citation Context ... 1998; Aiello et al., 2000). f) The intensity of wars from 1816–1980 measured as the number of battle deaths per 10 000 of the combined populations of the warring nations (Roberts and Turcotte, 1998; Small and Singer, 1982). g) The severity of terrorist attacks worldwide from February 1968 to June 2006, measured as the number of deaths directly resulting (Clauset et al., 2007). h) The number of bytes of data received a... |

164 |
Heavy-Tail Phenomena: Probabilistic and Statistical Modeling
- Resnick
- 2007
Citation Context ... perhaps, the longest history. We give only a brief summary of this material here; readers interested in pursuing the topic further are encouraged to consult the books by Adler et al. [4] and Resnick [49] for a more thorough explanation. 6 In the statistical literature, researchers often consider a family of distributions of the form $p(x) \propto L(x)\,x^{-\alpha}$ (3.12), where L(x) is some slowly varying function,... |

142 | Collective entity resolution in relational data - Bhattacharya, Getoor - 2007 |

125 | On some simple estimates of an exponent of regular variation - Hall - 1982 |

116 | Where mathematics meets the internet - Willinger, Paxson - 1998 |

110 | How reliable are experimental protein-protein interaction data? - Sprinzak, Sattath - 2003 |

108 | Problems with Fitting to the Power-law Distribution - Goldstein, Morris, et al. |

103 | A functional approach to external graph algorithms. Algorithmica - Abello, Buchsbaum, et al. - 2002 |

87 | Functional and topological characterization of protein interaction networks - Yook, Oltvai, et al. - 2004 |

79 | On the bias of traceroute sampling: Or, power-law degree distributions in regular graphs - Achlioptas, Clauset, et al. - 2009 |

66 | Applied Linear Regression, 2nd ed - Weisberg - 1985 |

64 |
A Practical Guide to Heavy Tails
- Adler, Feldman, et al.
- 1998
Citation Context ...stributions has, perhaps, the longest history. We give only a brief summary of this material here; readers interested in pursuing the topic further are encouraged to consult the books by Adler et al. [4] and Resnick [49] for a more thorough explanation. 6 In the statistical literature, researchers often consider a family of distributions of the form $p(x) \propto L(x)\,x^{-\alpha}$ (3.12), where L(x) is some slowly ... |

63 | Some tests of significance treated by the theory of probability - Jeffreys - 1935 |

61 | Fully exponential Laplace approximations to expectations and variances of nonpositive functions - Tierney, Kass, et al. - 1989 |

45 | Laws of large numbers for sums of extreme values - Mason - 1982 |

43 |
All of Statistics - A Concise Course
- Wasserman
- 2004
Citation Context ...ons to observed data is the method of maximum likelihood, which provably gives accurate (asymptotically normal) parameter estimates in the limit of large sample size (Barndorff-Nielsen and Cox, 1995; Wasserman, 2003). Assuming that our data are drawn from a distribution that follows a power law exactly for x ≥ xmin, we can derive maximum likelihood estimators (MLEs) of the scaling parameter for both the discrete... |

38 | Estimating dimension of a model Ann - Schwarz - 1978 |

36 | Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks - Huss, Holme |

35 | Likelihood-based inference for stochastic models of sexual network formation, Working Paper 29 - Handcock, Jones - 2003 |

34 | Organization of growing random networks, Phys - Krapivsky, Redner |

34 | Some Basic Theory for Statistical Inference - Pitman - 1979 |

33 | The large sample distribution of the likelihood ratio for testing composite hypotheses - Wilks - 1938 |

31 | Power laws, Pareto distributions and Zipf's law - Newman - 2005 |

28 | The nature of markets in - Adamic, Huberman |

25 | Editorial: The Future of Power Law Research - Mitzenmacher - 2006 |

21 | Dynamics of Bayesian updating with dependent data and misspecified models, Electron
- Shalizi
- 2009
Citation Context ...ithout the p-value to tell us when the results are significant. The Bayesian estimation used is equivalent to a smoothing, which to some extent buffers the results against the effects of fluctuations [53], but the method is not capable, itself, of saying whether the results could be due to chance [39, 65]. ... |

21 | The QQ Estimator and Heavy Tails - Kratz, Resnick - 1996 |

21 | Frequentist statistics as a theory of inductive inference - Mayo, Cox - 2006 |

19 | What really causes large price changes - Farmer, Gillemot, et al. - 2004 |

19 | Body mass of late Quaternary mammals - Smith, Lyons, et al. |

13 | On the frequency of severe terrorist events - Clauset, Young, et al. |

12 | Email networks and the spread of computer viruses, Phys - Newman, Forrest, et al. |

11 |
Skew distribution and the sizes of business firms. North
- Ijiri, Simon
- 1977
Citation Context ...lution of Internet structure or traffic patterns, then it may matter greatly whether the observed quantity follows a power law or some other form. In closing, we echo comments made by Ijiri and Simon [28] more than thirty years ago and similar thoughts expressed more recently by Mitzenmacher [42]. They argue that the characterization of empirical distributions is only a part of the challenge that face... |

9 | Fractality and self-organized criticality of wars - Roberts, Turcotte - 1998 |

8 |
Quantitative Finance 4
- Farmer, Gillemot, et al.
- 2004
Citation Context ... et al. (2006). An alternative approach, quite common in the economics literature, is simply to limit the analysis to the largest observed samples only, such as the largest $\sqrt{n}$ or $\tfrac{1}{10}n$ observations (Farmer et al., 2004). The methods we describe in Section III offer several advantages over these visual or heuristic techniques. In particular, the goodness-of-fit-based approach gives accurate estimates of xmin with sy... |

8 | Are citations of scientific papers a case of nonextensivity?, Eur - Tsallis, de Albuquerque - 1999 |

7 | Estimating Heavy-Tail Exponents through Max Self-Similarity, Preprint, 2006; available at http://arxiv.org/abs/math.ST/0609163 - Stoev, Michailidis, et al. |

7 | Radial structure of the Internet - Holme, Karlin, et al. |

6 |
Error and the Growth of Experimental Knowledge (University of Chicago
- Mayo
- 1996
Citation Context ...nt to a smoothing of the MLE, which buffers the results against fluctuations to some extent (Shalizi, 2007), but the method is incapable, itself, of saying whether the results could be due to chance (Mayo, 1996; Wasserman, 2006). ... |

6 | Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic, Performance Eval - Markovitch, Krieger |

6 | On measures of location and dispersion and tests of hypotheses in a Pareto population - Muniruzzaman - 1957 |

6 | A Bayesian approach to solar flare prediction, Astrophys - Wheatland |

5 | Parameter estimation for power-law tail distributions by maximum likelihood methods - Bauke |

5 | The maximum likelihood fitting of the discrete Pareto law - Seal - 1952 |

4 |
Extreme Value Theory: An Introduction (Springer
- Haan, Ferreira
Citation Context ...the largest or smallest values generated by probability distributions, values that assume some importance in studies of, for instance, earthquakes, other natural disasters, and the risks thereof; see [24]. ... hold up under closer scrutiny. Consider Fig. 4.1a, which shows the CDFs of three small data sets (n = 100) drawn from a power-law distribution with α = 2... |

4 |
Some Basic Theory for Statistical Inference Chapman and
- Pitman
- 1979
Citation Context ... regularity conditions, if the data are independent, identically-distributed draws from a distribution with parameter α, then as the sample size $n \to \infty$, $\hat{\alpha} \to \alpha$ almost surely. Proof. See, for instance, [46]. Proposition B.2 ([43]). The maximum likelihood estimator $\hat{\alpha}$ of the continuous power law converges almost surely on the true α. Proof. It is easily verified that ln(x/xmin) has an exponential distrib... |

4 | Empirical distributions of log-returns: Between the stretched exponential and the power law - Malevergne, Pisarenko, et al. |

4 | Frequentist Bayes is objective - Wasserman - 2006 |

2 |
Finance 5
- Malevergne, Pisarenko, et al.
- 2005
Citation Context ...er nonstatistical argument favoring one distribution or another. The specific problem of the indistinguishability of power laws and stretched exponentials has also been discussed by Malevergne et al. [36]. In some other cases the likelihood ratio tests do give conclusive answers. For instance, the stretched exponential is ruled out for the book sales, telephone calls, and citation counts, but is stron... |

1 |
Performance Evaluation 42
- Markovitch, Krieger
- 2000
Citation Context ...than any other heavy-tailed distribution. (In such cases, non-parametric estimates of the distribution may be useful, though making such estimates for heavy-tailed data presents special difficulties (Markovitch and Krieger, 2000).) If, on the other hand, our goal is to infer plausible mechanisms that might underlie the formation and evolution of Internet structure or traffic patterns, then it may matter greatly whether the o... |

1 |
Frequentist statistics as a theory of inductive inference, in Optimality: The Second Erich L. Lehmann Symposium
- Mayo, Cox
- 2006
Citation Context ...re, by contrast, we use the p-value as a measure of the hypothesis we are trying to verify, and hence high values, not low, are “good.” For a general discussion of the interpretation of p-values, see [40]. [Running header, p. 18: A. Clauset, C. R. Shalizi and M. E. J. Newman] ... which we discuss in Section 5. Second, as mentioned above, it is possible for small values of n that the empirical distribution will follow a power l... |

1 |
Bayesian learning, evolutionary dynamics, and information theory
- Shalizi
- 2007
Citation Context ...t to the likelihood ratio test under reasonable conditions. Bayesian estimation in this context is equivalent to a smoothing of the MLE, which buffers the results against fluctuations to some extent (Shalizi, 2007), but the method is incapable, itself, of saying whether the results could be due to chance (Mayo, 1996; Wasserman, 2006). ... |

1 |
Estimating heavy-tail exponents through max self-similarity, Preprint math/0609163
- Stoev, Michailidis, et al.
- 2006
Citation Context ...n and identify a point beyond which the value appears relatively stable. But these approaches are clearly subjective and can be sensitive to noise or fluctuations in the tail of the distribution; see [58] and references therein. A more objective and principled approach is desirable. Here we review two such methods, one that is specific to discrete data and is based on a so-called marginal likelihood, ... |

1 | 70 Years of Best - Hackett - 1967 |

1 | Preprint, 2005; available at http://arxiv.org/abs/physics/0510216 - Stouffer, Malmgren, et al. - 2005 |
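
### Illustrative sketch

Several of the citation contexts above refer to the maximum likelihood estimator of the scaling parameter for a continuous power law and to a nonparametric bootstrap that resamples the n measurements uniformly at random with replacement. The following Python sketch illustrates both ideas under simplifying assumptions: xmin is treated as known and fixed (the paper also estimates xmin, which is not attempted here), the estimator used is the standard closed form $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$, and all function names are hypothetical, chosen only for this example.

```python
import math
import random


def alpha_mle(data, xmin):
    """Closed-form MLE of the exponent of a continuous power law, x >= xmin.

    Assumes xmin is known; points below xmin are discarded.
    """
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if not tail or log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, alpha, xmin, rng):
    """Draw n synthetic points by inverse-transform sampling (illustration only)."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and x >= xmin.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def bootstrap_alpha_std(data, xmin, n_boot, rng):
    """Nonparametric bootstrap estimate of the standard deviation of alpha_mle.

    Each replicate draws len(data) points uniformly at random, with
    replacement, from the original data and refits alpha.
    """
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(len(data))]
        estimates.append(alpha_mle(resample, xmin))
    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return math.sqrt(variance)
```

A possible usage, again only as an illustration: generate `data = sample_power_law(2000, 2.5, 1.0, random.Random(0))`, fit with `alpha_mle(data, 1.0)`, and pass the same data to `bootstrap_alpha_std` to gauge the spread of the estimate. Fixing xmin keeps the sketch short; bootstrapping the uncertainty in xmin itself, as the quoted context describes, would require re-estimating xmin inside each replicate.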