## On the bias of traceroute sampling: or, power-law degree distributions in regular graphs (2005)

Venue: ACM STOC

Citations: 60 (1 self)

### BibTeX

@INPROCEEDINGS{Achlioptas05onthe,
  author = {Dimitris Achlioptas and David Kempe and Aaron Clauset and Cristopher Moore},
  title = {On the bias of traceroute sampling: or, power-law degree distributions in regular graphs},
  booktitle = {ACM STOC},
  year = {2005},
  pages = {694--703}
}

### Abstract

Understanding the graph structure of the Internet is a crucial step for building accurate network models and designing efficient algorithms for Internet applications. Yet, obtaining this graph structure can be a surprisingly difficult task, as edges cannot be explicitly queried. For instance, empirical studies of the network of Internet Protocol (IP) addresses typically rely on indirect methods like traceroute to build what are approximately single-source, all-destinations, shortest-path trees. These trees only sample a fraction of the network’s edges, and a recent paper by Lakhina et al. found empirically that the resulting sample is intrinsically biased. Further, in simulations, they observed that the degree distribution under traceroute sampling exhibits a power law even when the underlying degree distribution is Poisson. In this paper, we study the bias of traceroute sampling mathematically and, for a very general class of underlying degree distributions, explicitly calculate the distribution that will be observed. As example applications of our machinery, we prove that traceroute sampling finds power-law degree distributions in both δ-regular and Poisson-distributed random graphs. Thus, our work puts the observations of Lakhina et al. on a rigorous footing, and extends them to nearly arbitrary degree distributions.
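The sampling effect described in the abstract can be explored numerically. The following is a minimal sketch, assuming traceroute sampling is idealized as a single-source breadth-first search (BFS) tree and that the underlying graph is a random δ-regular multigraph built by uniformly matching vertex copies. The function names and the parameters `n`, `delta`, and `seed` are illustrative choices, not taken from the paper.

```python
import random
from collections import Counter, deque


def configuration_model(degrees, rng):
    """Return adjacency lists of a random multigraph with the given degrees."""
    # One "copy" (stub) per unit of degree. Shuffling the stubs and pairing
    # consecutive entries yields a uniformly random perfect matching.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng.shuffle(stubs)
    adj = [[] for _ in degrees]
    for i in range(0, len(stubs), 2):
        u, v = stubs[i], stubs[i + 1]
        adj[u].append(v)
        adj[v].append(u)  # self-loops and multi-edges are allowed
    return adj


def bfs_tree_degrees(adj, source=0):
    """Return {vertex: degree in the BFS tree} for vertices reachable from source."""
    tree_degree = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in tree_degree:
                tree_degree[v] = 1    # the tree edge to its parent u
                tree_degree[u] += 1   # u gains one child
                queue.append(v)
    return tree_degree


def observed_degree_histogram(n=20000, delta=4, seed=1):
    """Count how many vertices have each observed (BFS-tree) degree."""
    rng = random.Random(seed)
    adj = configuration_model([delta] * n, rng)
    return Counter(bfs_tree_degrees(adj).values())


if __name__ == "__main__":
    hist = observed_degree_histogram()
    total = sum(hist.values())
    for k in sorted(hist):
        print(k, hist[k] / total)  # empirical fraction of vertices with observed degree k
```

As a small check of the idea, `bfs_tree_degrees([[1, 2], [0, 2], [0, 1]])` on a triangle reports degrees 2, 1, 1 rather than 2, 2, 2, because the edge between vertices 1 and 2 is never used by the tree. Every vertex in the simulated graph has true degree `delta`, so any spread in the printed histogram comes from the sampling alone.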

### Citations

2073 | On the evolution of random graphs - Erdős, Rényi - 1960 |

1957 | Random Graphs - Bollobás - 2001 |

Citation Context: ...s consistent with the conjectured range α ∈ (2, 3) for the Internet [15, 17]. In order to speak precisely about a random (multi)graph with a given degree sequence, we will use the configuration model [5]: for each vertex of degree k, we create k copies, and then define the edges of the graph according to a uniformly random matching on these copies. Standard estimates imply that if a degree sequence i...

1942 | Randomized Algorithms - Motwani, Raghavan - 1995 |

Citation Context: ...ntouched. 2.2 Exposure on the fly. Because G is a uniformly random multigraph conditioned on its degree sequence, the matching on the copies is uniformly random. By the principle of deferred decisions [28], we can define this matching “on the fly,” choosing u’s partner v uniformly at random from among all the unexposed copies at the time. One way to make this random choice is as follows. At the outset,...

1584 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

Citation Context: ...ex(t) · n| < $n^{1/2+\epsilon}$, $|C_{\mathrm{unto}}(t) - c_{\mathrm{unto}}(t) \cdot n| < n^{1-\beta}$. Note that this concentration becomes weaker as α → 2, since then β → 0. Proof. Our proof is based on the following form of the Hoeffding Bound [19, 24]: Theorem 4 (Theorem 3 from [24]). If $X_1, \ldots, X_k$ are independent, non-negative random variables with $X_i \le b_i$ for all i, and $X = \sum_i X_i$, then for any $\Delta \ge 0$: $\mathrm{Prob}[|X - E[X]| \ge \Delta] \le 2e^{-2\Delta^2 / \sum_i b_i^2}$ ...

1356 | On Power-Law Relationships of the Internet Topology - Faloutsos, Faloutsos, et al. - 1999 |

Citation Context: ...CCR-0220070, EIA-0218563, ITR-0324845, and PHY-0200909. 1 Introduction. A large body of recent work has focused on the topological properties of the Internet. Perhaps most famously, Faloutsos et al. [15] claimed a power-law degree distribution both in the graph of routers in the Internet, i.e., the level at which the Internet Protocol (IP) operates, and the connections between autonomous systems, the...

670 | Measuring ISP Topologies with Rocketfuel - Spring, Mahajan, et al. - 2004 |

Citation Context: ...First, as we discussed above, the mapping between routers and IP addresses (which are what is observed in reality) is neither one-to-one nor well-defined. This creates aliasing problems, among others [34]. Traceroute itself can also introduce spurious edges: traceroute works by sending a series of packets toward the destination, with increasing bounds on the number of hops, and seeing where each of th...

352 | Heuristics for internet map discovery - Govindan, Tangmunarunkit - 2000 |

Citation Context: ...the level at which the Internet Protocol (IP) operates, and the connections between autonomous systems, the level at which the Border Gateway Protocol (BGP) operates. Similar results were obtained in [17, 3], among others. Based on these and other topological studies, it is widely believed that the degree distribution of Internet routers has a power-law form with exponent 2 < α < 3, i.e., the fraction $a_k$...

329 | A critical point for random graphs with a given degree sequence, Random Structures and Algorithms - Molloy, Reed - 1995 |

Citation Context: ...is(t))^{i−1−m} dt. Our definition of “sober” degree sequences implies that the graph is w.h.p. connected, so that every copy is eventually added to the queue. For other degree sequences, Molloy and Reed [26, 27] established that w.h.p. there is a unique giant component if $\sum_j a_j (j^2 - 2j) > 0$, and calculated its size within o(n). We omit the details, but $g^{\mathrm{obs}}(z)$ is then given by an integral from $t_0$ to ...

314 | The Web as a graph: Measurements, models, and methods - Kleinberg, Kumar, et al. - 1999 |

Citation Context: ... research into the question of how the topology might affect the real-world performance of Internet algorithms and mechanisms (for instance [25, 1]). However, unlike graphs such as the World Wide Web [21], in which links from each site can be readily observed, the connections between IP-level routers on the Internet cannot be queried directly. Without explicitly knowing which routers are connected, ho...

166 | Concentration. In Probabilistic methods for algorithmic discrete mathematics - McDiarmid - 1998 |

164 | Dimes: Let the internet measure itself - Shavitt, Shir - 2005 |

Citation Context: ...? ¹ Several studies, including [3, 29], have combined traceroutes from multiple sources. However, the number of sources used is typically quite small (about 12). The recently started NetDimes project [32] uses many more sources, but samples many fewer destinations. Our answers to these questions demonstrate that formal mathematical analysis can highlight the inherent problems with the current method...

162 | Heuristically optimized trade-offs: A new paradigm for power laws in the internet - Fabrikant, Koutsoupias, et al. - 2002 |

Citation Context: ...e fraction $a_k$ of vertices with degree k is proportional to $k^{-\alpha}$. These results have motivated both the search for natural graph growth models that give similar degree distributions (see for instance [14]) and research into the question of how the topology might affect the real-world performance of Internet algorithms and mechanisms (for instance [25, 1]). However, unlike graphs such as the World Wide...

157 | Models of random regular graphs - Wormald - 1999 |

Citation Context: ...llowing concentration inequality for random variables on matchings due to Wormald [38, Theorem 2.19]. A switching consists of replacing two edges $\{p_1, p_2\}$, $\{p_3, p_4\}$ by $\{p_1, p_3\}$, $\{p_2, p_4\}$. Theorem 14 [38]. Let $X_k$ be a random variable defined on uniformly random configurations M, M′ of k copies, such that, whenever M and M′ differ by only one switching, for some constant c. Then, for any r > 0, $|X_k(M)$...

125 | On routers and multicast trees in the Internet - Pansiot, Grad - 1998 |

Citation Context: ...eroute sampling? Or, in a purely graph-theoretic framework: can we characterize the degree distribution of a BFS tree for a random graph with a given degree distribution? ¹ Several studies, including [3, 29], have combined traceroutes from multiple sources. However, the number of sources used is typically quite small (about 12). The recently started NetDimes project [32] uses many more sources, but sampl...

114 | Sampling biases in IP topology measurements - Lakhina, Byers, et al. - 2003 |

Citation Context: ... by communicating with each other using a data link layer protocol, as in a token ring network. These issues are known to introduce noise into the measured topology [2, 8]. However, as Lakhina et al. [22] recently pointed out, traceroute sampling has another fundamental bias, one that is well-captured by the BFS idealization. Specifically, in using such a sample to represent the network, one tacitly a...

108 | On the marginal utility of network topology measurements - Barford, Bestavros, et al. - 2001 |

Citation Context: ...the level at which the Internet Protocol (IP) operates, and the connections between autonomous systems, the level at which the Border Gateway Protocol (BGP) operates. Similar results were obtained in [17, 3], among others. Based on these and other topological studies, it is widely believed that the degree distribution of Internet routers has a power-law form with exponent 2 < α < 3, i.e., the fraction $a_k$...

98 | The origin of power laws in internet topologies revisited - Chen, Chang, et al. - 2002 |

Citation Context: ...uters may share a single IP address by communicating with each other using a data link layer protocol, as in a token ring network. These issues are known to introduce noise into the measured topology [2, 8]. However, as Lakhina et al. [22] recently pointed out, traceroute sampling has another fundamental bias, one that is well-captured by the BFS idealization. Specifically, in using such a sample to rep...

71 | The diameter of a cycle plus a random matching - Bollobás, Chung - 1988 |

Citation Context: ... $- E[B_j(i)]| = O(n^{-c})$. 4.1 A high probability bound for the diameter. Here we bound the diameter of a random (multi)graph with a given degree sequence. Our result is less precise than, say, that of [6] for random 3-regular multigraphs, but holds with higher probability, a necessity for our application. Theorem 16. Let $\{a_i\}_i$ be a degree sequence with $a_i = 0$ for i < 3. Let G be a random multi-graph w...

68 | On Certain Connectivity Properties of the Internet Topology - Mihail, Papadimitriou, et al. - 2004 |

Citation Context: ...ive similar degree distributions (see for instance [14]) and research into the question of how the topology might affect the real-world performance of Internet algorithms and mechanisms (for instance [25, 1]). However, unlike graphs such as the World Wide Web [21], in which links from each site can be readily observed, the connections between IP-level routers on the Internet cannot be queried directly. W...

46 | Generatingfunctionology - Wilf - 1994 |

38 | Relevance of massively distributed explorations of the internet topology: Simulation results - Guillaume, Latapy - 2005 |

Citation Context: ..., since in a random graph, high-degree vertices are more likely to be encountered early on in the BFS tree, they are sampled more accurately than low-degree vertices. Indeed, [22] (and, more recently, [18]) showed empirically that for Erdős-Rényi random graphs G(n, p) [13], which have a Poisson degree distribution, the observed degree distribution under traceroute sampling follows a power law, and this...

36 | Exploring networks with traceroute-like probes: theory and simulations, TCS 355 - Dall’Asta, Hamelin, et al. - 2006 |

Citation Context: ...networks, as well as the fraction of edges lying on at least one shortest path from s to some node. They observe and analyze in depth an oscillating behavior for the latter quantity. Dall’Asta et al. [12] use mean-field approximations and simulations to explore the probabilities of vertex and edge detection, and relate these probabilities to the betweenness of vertices and edges. Despite several simul...

35 | The size of the largest component of a random graph on a fixed degree sequence - Molloy, Reed - 1998 |

Citation Context: ...is(t))^{i−1−m} dt. Our definition of “sober” degree sequences implies that the graph is w.h.p. connected, so that every copy is eventually added to the queue. For other degree sequences, Molloy and Reed [26, 27] established that w.h.p. there is a unique giant component if $\sum_j a_j (j^2 - 2j) > 0$, and calculated its size within o(n). We omit the details, but $g^{\mathrm{obs}}(z)$ is then given by an integral from $t_0$ to ...

34 | Understanding internet topology: Principles, models, and validation - Alderson, Li, et al. - 2005 |

Citation Context: ...ive similar degree distributions (see for instance [14]) and research into the question of how the topology might affect the real-world performance of Internet algorithms and mechanisms (for instance [25, 1]). However, unlike graphs such as the World Wide Web [21], in which links from each site can be readily observed, the connections between IP-level routers on the Internet cannot be queried directly. W...

29 | Hypergeometric Functions and Their Applications - Seaborn - 1991 |

Citation Context: ... generating function for the observed degree sequence: $g^{\mathrm{obs}}(z) = z\delta \int_0^1 t^{\delta-1} \left(1 - (1-z)\, t^{\delta-2}\right)^{\delta-1} dt$. (14) This integral can be expressed in terms of the hypergeometric function ${}_2F_1$ [31]. In general, for all a > −1 and b > 0, we have $\int_0^1 t^a (1 - x t^b)^{-c}\, dt = \frac{1}{a+1}\, {}_2F_1\!\left(\frac{a+1}{b}, c; \frac{a+b+1}{b}; x\right)$, where ${}_2F_1(s, t; u; z) = \sum_{i=0}^{\infty} \frac{\Gamma(s+i)}{\Gamma(s)} \frac{\Gamma(t+i)}{\Gamma(t)} \frac{\Gamma(u)}{\Gamma(u+i)}\, z^i$ ...

25 | Accuracy and scaling phenomena in internet mapping - Clauset, Moore - 2005 |

Citation Context: ... graphs G(n, p) [13], which have a Poisson degree distribution, the observed degree distribution under traceroute sampling follows a power law, and this was verified analytically by Clauset and Moore [9]. In other words, the bias introduced by traceroute sampling alone can make power laws appear where none exist in the underlying graph. Even when the underlying graph actually does have a power-law de...

20 | Issues with inferring Internet topological attributes - Amini, Shaikh, et al. - 2004 |

15 | Poisson cloning model for random graphs - Kim - 2006 |

Citation Context: ...9]. The proof of this result is based on a process which gradually discovers the BFS tree (see Section 2). By mapping it to a continuous-time process somewhat analogous to Kim’s Poisson cloning model [20], we can avoid explicitly tracking the (rather complicated) state of the FIFO queue that arises in the process, and in particular the complex relationship between the degree of a vertex and its positi...

14 | The impact of policy on Internet paths - Tangmunarunkit, Govindan, et al. |

Citation Context: ...contractual obligations or business concerns, in addition to latency (path length) concerns. Also, a shortest path at the AS level is not necessarily equivalent to a shortest path at the router level [35], or vice versa. Thus, some routes can be up to several hops longer than the actual shortest paths [35, 23]. On the other hand, such issues do not fundamentally change the manner in which traceroute r...

11 | Exploration of Scale-Free Networks - Do we measure the real exponents?, Eur - Petermann, De Los Rios - 2004 |

Citation Context: ... sampling alone can make power laws appear where none exist in the underlying graph. Even when the underlying graph actually does have a power-law degree distribution $k^{-\alpha}$, Petermann and De Los Rios [30] and Clauset and Moore [9] showed numerically that traceroute sampling can significantly underestimate its exponent α, especially when the average degree of the underlying graph is large. This inheren...

11 | Detection, understanding, and prevention of traceroute measurement artifacts - Viger, Augustin, et al. |

Citation Context: ...and k + 1 hops can have non-adjacent endpoints, in which case traceroute will incorrectly infer an edge between them. These and other traceroute artifacts are discussed in more detail by Viger et al. [36], who propose an improved version of traceroute, called “Paris traceroute,” which significantly reduces some, but not all, of these issues. While the above are measuring artifacts, a more fundamental p...

9 | Describing and simulating internet routes - Leguay, Latapy, et al. - 2005 |

Citation Context: ...ortest path at the AS level is not necessarily equivalent to a shortest path at the router level [35], or vice versa. Thus, some routes can be up to several hops longer than the actual shortest paths [35, 23]. On the other hand, such issues do not fundamentally change the manner in which traceroute reveals edges in the IP-level Internet, suggesting that the idealizations made by our model may be reasonabl...

4 | Distance distribution in random graphs and application to network exploration, Phys - Blondel, Guillaume, et al. |

Citation Context: ...al recent papers, published since the appearance of the conference version of this paper, continue an analysis of the impact of traceroute sampling on the exploration of random graphs. Blondel et al. [4] use mean-field approximations to heuristically calculate the distance distribution of nodes in random networks, as well as the fraction of edges lying on at least one shortest path from s to some nod...

4 | Bias reduction in traceroute sampling - towards a more accurate map of the Internet - Flaxman, Vera |

Citation Context: ...aceroute sampling, another possibility is to employ sophisticated inference or machine learning techniques that rely mainly on data currently accessible to researchers. For instance, Flaxman and Vera [16] recently proposed a new estimator of node degrees using insights from multiple-capture census techniques in biology. Under certain strong assumptions, this technique provably reduces the sampling bia...

3 | “Exploration of Scale-Free networks, Do we measure real exponents?” arXiv:cond-mat/0401065 - Peterman, Rios - 2004 |

3 | The exponential integral ei(x) and related functions - Spanier, Oldham - 1987 |

1 | Bounding the bias of tree-like sampling in IP topologies - Cohen, Gonen, et al. - 2007 |

1 | Dynamic exploration of networks: From general principles to the traceroute process - Dall’Asta |

Citation Context: ...small marginal utility [3, 29]. However, numerical studies of power-law random graphs by Clauset and Moore [9] and Guillaume et al. [18], and an analytical study of Poisson random graphs by Dall’Asta [11], show that additional sources can have significant utility in sufficient numbers. For power-law random graphs, the number of sources required to compensate for the bias in traceroute sampling grows l...