Results 1–10 of 69
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 246 (14 self)
 Add to MetaCart
(Show Context)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, while at larger size scales the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model in which new edges are added via an iterative “forest fire” burning process is able to produce graphs exhibiting a network community structure similar to our observations.
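As a concrete reference for the conductance measure this abstract relies on, here is a minimal Python sketch (not taken from the paper) that computes the conductance of a candidate community in an undirected graph stored as an adjacency dict; the graph and node labels are made up for illustration.

from collections import defaultdict

def conductance(adj, S):
    """Conductance of node set S in an undirected graph.
    adj: dict mapping each node to a set of neighbors.
    S:   set of nodes forming the candidate community.
    Conductance = (# edges leaving S) / min(vol(S), vol(rest)),
    where vol(X) is the sum of the degrees of the nodes in X."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")

# Example: two 4-node cliques joined by a single bridge edge.
adj = defaultdict(set)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(conductance(adj, {0, 1, 2, 3}))  # low value, i.e. community-like

A low conductance indicates a set with few edges leaving it relative to its internal volume, which is exactly the quantity the network community profile plot tracks as the set size varies.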
Random Walks in Peer-to-Peer Networks
, 2004
"... We quantify the effectiveness of random walks for searching and construction of unstructured peertopeer (P2P) networks. For searching, we argue that random walks achieve improvement over flooding in the case of clustered overlay topologies and in the case of reissuing the same request several tim ..."
Abstract

Cited by 226 (3 self)
 Add to MetaCart
We quantify the effectiveness of random walks for searching and construction of unstructured peer-to-peer (P2P) networks. For searching, we argue that random walks achieve an improvement over flooding in the case of clustered overlay topologies and in the case of reissuing the same request several times. For construction, we argue that an expander can be maintained dynamically with a constant number of operations per addition. The key technical ingredient of our approach is a deep result from the theory of stochastic processes indicating that samples taken from consecutive steps of a random walk can achieve statistical properties similar to independent sampling (provided the second eigenvalue of the transition matrix is bounded away from 1, which translates to good expansion of the network; such connectivity is desired, and believed to hold, in every reasonable network and network model). This property has previously been used in complexity theory for the construction of pseudorandom number generators. We reveal another facet of this theory and translate savings in random bits to savings in processing overhead.
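The sampling primitive described above can be illustrated with a small Python sketch (not the paper's protocol) that records the nodes visited at regular intervals along a random walk over a peer adjacency table; the toy overlay and the step gap below are illustrative assumptions.

import random

def random_walk_samples(adj, start, num_samples, step_gap=1):
    """Record nodes visited by a random walk on an undirected overlay.
    adj:      dict mapping node -> list of neighbor nodes.
    step_gap: number of walk steps between recorded samples.
    On a well-connected (expander-like) overlay, i.e. when the second
    eigenvalue of the walk's transition matrix is bounded away from 1,
    the recorded nodes behave much like independent draws from the
    degree-biased stationary distribution."""
    node = start
    samples = []
    while len(samples) < num_samples:
        for _ in range(step_gap):
            node = random.choice(adj[node])
        samples.append(node)
    return samples

# Example on a small ring-with-chords overlay (illustrative only).
n = 16
adj = {i: [(i - 1) % n, (i + 1) % n, (i + n // 2) % n] for i in range(n)}
print(random_walk_samples(adj, start=0, num_samples=5, step_gap=3))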
A First-Principles Approach to Understanding the Internet's Router-level Topology
, 2004
"... A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved ..."
Abstract

Cited by 213 (19 self)
 Add to MetaCart
(Show Context)
A detailed understanding of the many facets of the Internet's topological structure is critical for evaluating the performance of networking protocols, for assessing the effectiveness of proposed techniques to protect the network from nefarious intrusions and attacks, or for developing improved designs for resource provisioning. Previous studies of topology have focused on interpreting measurements or on phenomenological descriptions and evaluation of graph-theoretic properties of topology generators. We propose a complementary approach of combining a more subtle use of statistics and graph theory with a first-principles theory of router-level topology that reflects practical constraints and tradeoffs. While there is an inevitable tradeoff between model complexity and fidelity, a challenge is to distill, from the seemingly endless list of potentially relevant technological and economic issues, the features that are most essential to a solid understanding of the intrinsic fundamentals of network topology. We claim that very simple models that incorporate hard technological constraints on router and link bandwidth and connectivity, together with abstract models of user demand and network performance, can successfully address this challenge and further resolve much of the confusion and controversy that has surrounded topology generation and evaluation.
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 208 (17 self)
 Add to MetaCart
(Show Context)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community (according to the conductance measure) over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and online social networks to technological and information networks and ...
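To make the network community profile plot concrete, the following Python sketch sweeps candidate community sizes and records the best conductance found by greedy growth from random seed nodes. The paper itself uses approximation algorithms for graph partitioning, so this greedy sweep is only an illustrative stand-in; it also re-implements, in compact form, the conductance helper sketched earlier in this listing.

import random

def conductance(adj, S):
    # Edges leaving S divided by the smaller of the two side volumes.
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")

def greedy_ncp(adj, max_size=20, trials=50, seed=0):
    """Rough stand-in for the network community profile (NCP) plot:
    for each set size, record the lowest conductance found by growing
    sets greedily from random seed nodes.  The paper uses spectral and
    flow-based partitioning algorithms; this sweep only illustrates
    what the plot measures."""
    rng = random.Random(seed)
    best = {}  # size -> lowest conductance seen at that size
    nodes = list(adj)
    for _ in range(trials):
        S = {rng.choice(nodes)}
        while len(S) < max_size:
            frontier = {v for u in S for v in adj[u]} - S
            if not frontier:
                break
            v = min(frontier, key=lambda x: conductance(adj, S | {x}))
            S.add(v)
            phi = conductance(adj, S)
            if phi < best.get(len(S), float("inf")):
                best[len(S)] = phi
    return best  # plotting size vs. best[size] gives the NCP curve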
Cooperative security for network coding file distribution
 Proc. of IEEE INFOCOM'06
, 2006
"... AbstractPeertopeer content distribution networks can suffer from malicious participants that intentionally corrupt content. Traditional systems verify blocks with traditional cryptographic signatures and hashes. However, these techniques do not apply well to more elegant schemes that use network ..."
Abstract

Cited by 109 (2 self)
 Add to MetaCart
Peer-to-peer content distribution networks can suffer from malicious participants that intentionally corrupt content. Traditional systems verify blocks with cryptographic signatures and hashes. However, these techniques do not apply well to more elegant schemes that use network coding techniques for efficient content distribution. Architectures that use network coding are prone to jamming attacks, where the introduction of a few corrupted blocks can quickly result in a large number of bad blocks propagating through the system. Identifying such bogus blocks is difficult and requires the use of homomorphic hashing functions, which are computationally expensive. This paper presents a practical security scheme for network coding that reduces the cost of verifying blocks on-the-fly while efficiently preventing the propagation of malicious blocks. In our scheme, users not only cooperate to distribute the content; well-behaved users also cooperate to protect themselves against malicious users by informing affected nodes when a malicious block is found. We analyze such a cooperative security scheme and introduce techniques to prevent DoS attacks. We show that the efficiency loss caused by attackers is limited to the effort they invest in corrupting the communication, which is a natural lower bound on the damage to the system. We also show experimentally that checking as little as 15% of the received blocks is enough to guarantee low corruption rates.
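A hypothetical Python sketch of the cooperation logic described above: peers verify a random fraction of received blocks and warn neighbors when corruption is found. The 15% check rate is the figure quoted in the abstract, but the plain SHA-256 digest below is only a stand-in for the homomorphic hash a real network-coding deployment needs, and the class and method names are invented for illustration.

import hashlib
import random

CHECK_RATE = 0.15  # check roughly 15% of received blocks, the figure quoted above

def block_digest(block: bytes) -> str:
    # Stand-in only: a plain SHA-256 digest does not survive the linear
    # combinations used by network coding, so a real deployment needs the
    # homomorphic hashes discussed in the abstract.
    return hashlib.sha256(block).hexdigest()

class Peer:
    def __init__(self, name, trusted_digests, neighbors=None):
        self.name = name
        self.trusted = trusted_digests      # digests published by the source
        self.neighbors = neighbors or []    # peers to warn when corruption is found
        self.alerted = False

    def receive(self, block: bytes, sender) -> bool:
        # Probabilistic on-the-fly verification; once alerted, check everything.
        must_check = self.alerted or random.random() < CHECK_RATE
        if must_check and block_digest(block) not in self.trusted:
            self.raise_alarm(sender)
            return False
        return True

    def raise_alarm(self, bad_sender):
        # Cooperative step: inform affected neighbors so they re-check blocks
        # obtained directly or indirectly from the suspected sender.
        self.alerted = True
        for peer in self.neighbors:
            peer.alerted = True

# Toy usage: the source publishes digests of the genuine blocks.
trusted = {block_digest(b) for b in (b"block-0", b"block-1")}
peer_a = Peer("A", trusted)
peer_b = Peer("B", trusted, neighbors=[peer_a])
peer_b.receive(b"corrupted block", sender="mallory")  # may be caught and trigger alarms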
The Internet AS-Level Topology: Three Data Sources and One Definitive Metric
"... We calculate an extensive set of characteristics for Internet AS topologies extracted from the three data sources most frequently used by the research community: traceroutes, BGP, and WHOIS. We discover that traceroute and BGP topologies are similar to one another but differ substantially from the W ..."
Abstract

Cited by 108 (15 self)
 Add to MetaCart
We calculate an extensive set of characteristics for Internet AS topologies extracted from the three data sources most frequently used by the research community: traceroutes, BGP, and WHOIS. We discover that traceroute and BGP topologies are similar to one another but differ substantially from the WHOIS topology. Among the widely considered metrics, we find that the joint degree distribution appears to fundamentally characterize Internet AS topologies as well as narrowly define values for other important metrics. We discuss the interplay between the specifics of the three data collection mechanisms and the resulting topology views. In particular, we show how the data collection peculiarities explain differences in the resulting joint degree distributions of the respective topologies. Finally, we release to the community the input topology datasets, along with the scripts and output of our calculations. This supplement should enable researchers to validate their models against real data and to make more informed selection of topology data sources for their specific needs.
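For readers unfamiliar with the joint degree distribution the abstract singles out, here is a small Python sketch (not from the paper) of how it can be computed from an undirected edge list; the toy edge list is made up.

from collections import Counter, defaultdict

def joint_degree_distribution(edges):
    """Joint degree distribution of an undirected edge list:
    the fraction of edges whose endpoints have degrees (k1, k2),
    with k1 <= k2."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter()
    for u, v in edges:
        k1, k2 = sorted((degree[u], degree[v]))
        counts[(k1, k2)] += 1
    total = len(edges)
    return {pair: c / total for pair, c in counts.items()}

# Example on a tiny star-plus-triangle graph.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(joint_degree_distribution(edges))  # {(2, 3): 0.5, (1, 3): 0.25, (2, 2): 0.25}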
Understanding internet topology: principles, models, and validation
 IEEE/ACM TRANSACTIONS ON NETWORKING
, 2005
"... Building on a recent effort that combines a firstprinciples approach to modeling routerlevel connectivity with a more pragmatic use of statistics and graph theory, we show in this paper that for the Internet, an improved understanding of its physical infrastructure is possible by viewing the phys ..."
Abstract

Cited by 51 (8 self)
 Add to MetaCart
(Show Context)
Building on a recent effort that combines a first-principles approach to modeling router-level connectivity with a more pragmatic use of statistics and graph theory, we show in this paper that, for the Internet, an improved understanding of its physical infrastructure is possible by viewing the physical connectivity as an annotated graph that delivers raw connectivity and bandwidth to the upper layers in the TCP/IP protocol stack, subject to practical constraints (e.g., router technology) and economic considerations (e.g., link costs). More importantly, by relying on data from Abilene, a Tier-1 ISP, and the Rocketfuel project, we provide empirical evidence in support of the proposed approach and its consistency with networking reality. To illustrate its utility, we: 1) show that our approach provides insight into the origin of the high variability in measured or inferred router-level maps; 2) demonstrate that it easily accommodates the incorporation of additional objectives of network design (e.g., robustness to router failure); and 3) discuss how it complements ongoing community efforts to reverse-engineer the Internet.
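The "annotated graph" view can be sketched as follows: links carry bandwidth annotations, and each router has a switching capacity that the links terminating at it must not exceed. The router names, capacities, and link speeds below are invented for illustration and are not taken from the paper's data; this is only a sketch of the kind of technology constraint the approach builds in.

# Illustrative only: treat the router-level map as an annotated graph and check
# a simplified router technology constraint, namely that the sum of link
# bandwidths terminating at a router does not exceed its switching capacity.

router_capacity_gbps = {"r1": 40, "r2": 40, "edge1": 10, "edge2": 10}

# Each link is (router_a, router_b, bandwidth_gbps).
links = [
    ("r1", "r2", 20),
    ("r1", "edge1", 10),
    ("r1", "edge2", 10),
    ("r2", "edge1", 2),
]

def feasible(links, capacity):
    load = {r: 0 for r in capacity}
    for a, b, bw in links:
        load[a] += bw
        load[b] += bw
    return {r: (load[r], capacity[r], load[r] <= capacity[r]) for r in capacity}

for router, (load, cap, ok) in feasible(links, router_capacity_gbps).items():
    print(f"{router}: {load}/{cap} Gb/s {'OK' if ok else 'infeasible'}")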
Dynamics of Large Networks
, 2008
"... A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models ..."
Abstract

Cited by 33 (0 self)
 Add to MetaCart
A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns in time-evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain the processes governing network evolution, fit these models to real networks, and use them to generate realistic graphs or give formal explanations of their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure, and run simulations of network evolution. Another important aspect of our research is the study of “local” patterns and structures of propagation in networks. We aim to identify the building blocks of networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large person-to-person ...
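One generative model associated with this line of work, the "forest fire" burning process mentioned in the first abstract of this listing, can be sketched roughly as follows. The version below is a simplified undirected variant with a single burning probability p, not the published directed formulation with separate forward and backward probabilities, so it only illustrates the mechanism.

import random

def forest_fire_graph(n, p=0.35, seed=0):
    """Simplified, undirected sketch of a forest-fire-style generator:
    each new node picks a random ambassador, links to it, and then
    recursively 'burns' (and links to) a geometrically distributed
    number of the ambassador's neighbors."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        adj[new] = set()
        burned = {new}
        queue = [rng.randrange(new)]  # random ambassador among existing nodes
        while queue:
            u = queue.pop()
            if u in burned:
                continue
            burned.add(u)
            adj[new].add(u)
            adj[u].add(new)
            k = 0                      # geometric number of neighbors to burn next
            while rng.random() < p:
                k += 1
            candidates = [v for v in adj[u] if v not in burned]
            queue.extend(rng.sample(candidates, min(k, len(candidates))))
    return adj

g = forest_fire_graph(200)
print(sum(len(v) for v in g.values()) // 2, "edges")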
Social networks spread rumors in sublogarithmic time
 IN STOC
, 2011
"... With the prevalence of social networks, it has become increasingly important to understand their features and limitations. It has been observed that information spreads extremely fast in social networks. We study the performance of randomized rumor spreading protocols on graphs in the preferential a ..."
Abstract

Cited by 29 (6 self)
 Add to MetaCart
(Show Context)
With the prevalence of social networks, it has become increasingly important to understand their features and limitations. It has been observed that information spreads extremely fast in social networks. We study the performance of randomized rumor spreading protocols on graphs in the preferential attachment model. The well-known random phone call model of Karp et al. (FOCS 2000) is a push-pull strategy where, in each round, each vertex chooses a random neighbor and exchanges information with it. We prove the following.
• The push-pull strategy delivers a message to all nodes within Θ(log n) rounds with high probability. The best known bound so far was O(log² n).
• If we slightly modify the protocol so that contacts are chosen uniformly from all neighbors except the one contacted in the previous round, then this time reduces to Θ(log n / log log n), which is the diameter of the graph.
This is the first time that a sublogarithmic broadcast time has been proven for a natural setting. It is also the first time that avoiding double contacts has been shown to reduce the runtime to a smaller order of magnitude.
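A small Python simulation sketch of the push-pull protocol and of the "avoid the last contact" variant analyzed above; the example graph is a complete graph rather than a preferential attachment graph, so it only illustrates the mechanics of the protocol, not the proven bounds.

import random

def push_pull_rounds(adj, start, avoid_last=False, seed=0):
    """Simulate synchronous push-pull rumor spreading on a connected graph
    and return the number of rounds until every node is informed.
    adj:        dict node -> list of neighbors.
    avoid_last: if True, each node avoids contacting the neighbor it
                contacted in the previous round (the protocol tweak
                discussed in the abstract)."""
    rng = random.Random(seed)
    informed = {start}
    last_contact = {}
    rounds = 0
    while len(informed) < len(adj):
        rounds += 1
        newly = set()
        for u in adj:
            choices = adj[u]
            if avoid_last and len(choices) > 1 and u in last_contact:
                choices = [v for v in choices if v != last_contact[u]]
            v = rng.choice(choices)
            last_contact[u] = v
            if u in informed or v in informed:
                newly.update((u, v))   # push or pull, either direction works
        informed |= newly              # synchronous update at end of round
    return rounds

# Example: complete graph on 64 nodes.
n = 64
adj = {i: [j for j in range(n) if j != i] for i in range(n)}
print(push_pull_rounds(adj, start=0))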
DETECTIVES: DETEcting Coalition hiT Inflation attacks in adVertising nEtworks Streams
 IN PROCEEDINGS OF THE 16TH WWW INTERNATIONAL WORLD WIDE WEB CONFERENCE
, 2007
"... Click fraud is jeopardizing the industry of Internet advertising. Internet advertising is crucial for the thriving of the entire Internet, since it allows producers to advertise their products, and hence contributes to the well being of ecommerce. Moreover, advertising supports the intellectual va ..."
Abstract

Cited by 28 (6 self)
 Add to MetaCart
Click fraud is jeopardizing the industry of Internet advertising. Internet advertising is crucial for the thriving of the entire Internet, since it allows producers to advertise their products, and hence contributes to the well-being of e-commerce. Moreover, advertising supports the intellectual value of the Internet by covering the running expenses of publishing content. Some content publishers are dishonest, however, and use automation to generate traffic to defraud the advertisers. Similarly, some advertisers automate clicks on the advertisements of their competitors to deplete their competitors’ advertising budgets. This paper describes the advertising network model and focuses on the most sophisticated type of fraud, which involves coalitions among fraudsters. We build on several published theoretical results to devise the SimilaritySeeker algorithm, which discovers coalitions made by pairs of fraudsters. We then generalize the solution to coalitions of arbitrary sizes. Before deploying our system on a real network, we conducted comprehensive experiments on data samples as a proof of concept. The results were very accurate: we detected several coalitions, formed using various techniques and spanning numerous sites. This demonstrates the generality of our model and approach.
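The coalition-detection idea can be illustrated with a simplified pairwise similarity check over the sets of click sources observed for each publisher: honest, unrelated publishers should share few clicking sources, so heavy overlap hints at a coalition inflating each other's traffic. The Jaccard threshold, function names, and data below are invented for illustration; this is not the paper's SimilaritySeeker algorithm.

from itertools import combinations

def suspicious_pairs(clicks_by_publisher, threshold=0.5):
    """Flag publisher pairs whose sets of clicking sources overlap
    suspiciously (Jaccard similarity at or above the threshold)."""
    flagged = []
    for (p1, s1), (p2, s2) in combinations(clicks_by_publisher.items(), 2):
        inter = len(s1 & s2)
        union = len(s1 | s2)
        if union and inter / union >= threshold:
            flagged.append((p1, p2, inter / union))
    return flagged

# Toy data: siteA and siteB share most of their clicking sources.
clicks = {
    "siteA": {"ip1", "ip2", "ip3", "ip4"},
    "siteB": {"ip2", "ip3", "ip4", "ip5"},
    "siteC": {"ip8", "ip9"},
}
print(suspicious_pairs(clicks))  # [('siteA', 'siteB', 0.6)]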