Results 11 - 20 of 246
Scalable graph clustering using stochastic flows: applications to community discovery.
In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009
Cited by 53 (4 self)
Algorithms based on simulating stochastic flows are a simple and natural solution for the problem of clustering graphs, but their widespread use has been hampered by their lack of scalability and fragmentation of output. In this article we present a multilevel algorithm for graph clustering using flows that delivers significant improvements in both quality and speed. The graph is first successively coarsened to a manageable size, and a small number of iterations of flow simulation is performed on the coarse graph. The graph is then successively refined, with flows from the previous graph used as initializations for brief flow simulations on each of the intermediate graphs. When we reach the final refined graph, the algorithm is run to convergence and the high-flow regions are clustered together, with regions without any flow forming the natural boundaries of the clusters. Extensive experimental results on several real and synthetic datasets demonstrate the effectiveness of our approach when compared to state-of-the-art algorithms.
Suggesting Friends Using the Implicit Social Graph
Cited by 51 (0 self)
Although users of online communication tools rarely categorize their contacts into groups such as “family”, “coworkers”, or “jogging buddies”, they nonetheless implicitly cluster contacts, by virtue of their interactions with them, forming implicit groups. In this paper, we describe the implicit social graph which is formed by users’ interactions with contacts and groups of contacts, and which is distinct from explicit social graphs in which users explicitly add other individuals as their “friends”. We introduce an interaction-based metric for estimating a user’s affinity to his contacts and groups. We then describe a novel friend suggestion algorithm that uses a user’s implicit social graph to generate a friend group, given a small seed set of contacts which the user has already labeled as friends. We show experimental results that demonstrate the importance of both implicit group relationships and interaction-based affinity ranking in suggesting friends. Finally, we discuss two applications of the Friend Suggest algorithm that have been released as Gmail Labs features.
Multiplicative Attribute Graph Model of Real-World Networks
, 1009
Cited by 46 (4 self)
Large-scale real-world network data, such as social networks, Internet and Web graphs, are ubiquitous. The study of such social and information networks seeks to find patterns and explain their emergence through tractable models. In most networks, especially in social networks, nodes have a rich set of attributes (e.g., age, gender) associated with them. However, many existing network models focus on modeling the network structure while ignoring the features of the nodes. Here we present a model that we refer to as the Multiplicative Attribute Graphs (MAG), which naturally captures the interactions between the network structure and node attributes. We consider a model where each node has a vector of categorical latent attributes associated with it. The probability of an edge between a pair of nodes then depends on the product of individual attribute-attribute similarities. This model lends itself to mathematical analysis and we derive thresholds for the connectivity and the emergence of the giant connected component, and show that the model gives rise to graphs with a constant diameter. We analyze the degree distribution to show that the model can produce networks with either log-normal or power-law degree distribution depending on certain conditions.
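The edge rule described in this abstract (an edge probability formed as a product of per-attribute similarities) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: attributes are binary, each attribute has its own 2x2 affinity table, and the names `edge_probability`, `sample_graph`, and `theta` are hypothetical.

```python
from itertools import combinations
import random


def edge_probability(attrs_u, attrs_v, affinities):
    """Multiply, over all attributes, the affinity entry selected by
    the two nodes' values for that attribute."""
    p = 1.0
    for a_u, a_v, theta in zip(attrs_u, attrs_v, affinities):
        p *= theta[a_u][a_v]
    return p


def sample_graph(node_attrs, affinities, rng):
    """Draw each undirected edge independently with its product probability."""
    edges = []
    for u, v in combinations(range(len(node_attrs)), 2):
        if rng.random() < edge_probability(node_attrs[u], node_attrs[v], affinities):
            edges.append((u, v))
    return edges


# Hypothetical affinity table (assumed values): nodes that share value 0
# connect readily, nodes with different values rarely.
theta = [[0.9, 0.2],
         [0.2, 0.6]]
affinities = [theta, theta]               # two attributes, same table
node_attrs = [(0, 0), (0, 1), (1, 1)]     # one attribute vector per node

p01 = edge_probability(node_attrs[0], node_attrs[1], affinities)  # 0.9 * 0.2
edges = sample_graph(node_attrs, affinities, random.Random(0))
```

Because the probability is a product, a single strongly mismatched attribute is enough to make an edge unlikely, whatever the other attributes say.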
Weighted Graphs and Disconnected Components: Patterns and a Generator
Cited by 45 (20 self)
The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a social network. The motivating questions were the following: How do connected components in a graph form and change over time? What happens after new nodes join a network – how common are repeated edges? We study numerous diverse, real graphs (citation networks, networks in social media, internet traffic, and others); and make the following contributions: (a) we observe that the non-giant connected components seem to stabilize in size, (b) we observe the weights on the edges follow several power laws with surprising exponents, and (c) we propose an intuitive, generative model for graph growth that obeys observed patterns.
Finding sparse cuts locally using evolving sets
In STOC'09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, 2009
Cited by 42 (0 self)
A local graph partitioning algorithm finds a set of vertices with small conductance (i.e. a sparse cut) by adaptively exploring part of a large graph G, starting from a specified vertex. For the algorithm to be local, its complexity must be bounded in terms of the size of the set that it outputs, with at most a weak dependence on the number n of vertices in G. Previous local partitioning algorithms find sparse cuts using random walks and personalized PageRank. In this paper, we introduce a randomized local partitioning algorithm that finds a sparse cut by simulating the volume-biased evolving set process, which is a Markov chain on sets of vertices. We prove that for any set of vertices A that has conductance at most $\phi$, for at least half of the starting vertices in A our algorithm will output (with probability at least half) a set of conductance $O(\phi^{1/2} \log^{1/2} n)$. We prove that for a given run of the algorithm, the expected ratio between its computational complexity and the volume of the set that it outputs is $O(\phi^{-1/2} \mathrm{polylog}(n))$. In comparison, the best previous local partitioning algorithm, due to Andersen, Chung, and Lang, has the same approximation guarantee, but a larger ratio of $O(\phi^{-1} \mathrm{polylog}(n))$ between the complexity and output volume. Using our local partitioning algorithm as a subroutine, we construct a fast algorithm for finding balanced cuts. Given a fixed value of $\phi$, the resulting algorithm has complexity $(m + n\phi^{-1/2}) \cdot O(\mathrm{polylog}(n))$ and returns a cut with conductance $O(\phi^{1/2} \log^{1/2} n)$ and volume at least $v_\phi/2$, where $v_\phi$ is the largest volume of any set with conductance at most $\phi$.
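As a small illustration of the quantity this abstract bounds, the Python sketch below computes the conductance of a vertex set. It assumes one common definition (cut edges divided by the smaller of the two side volumes, where volume is the sum of degrees), which may differ in detail from the paper's; it is not the evolving set algorithm. The function name `conductance` and the example graph are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph given as adjacency lists.

    Assumed definition: (number of edges leaving the set) divided by
    min(volume of the set, volume of its complement).
    """
    inside = set(subset)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_inside = sum(len(adj[u]) for u in inside)
    vol_outside = sum(len(adj[u]) for u in adj if u not in inside)
    denom = min(vol_inside, vol_outside)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical example: two triangles joined by the single edge (2, 3).
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi = conductance(adj, {0, 1, 2})  # one cut edge, volume 7 on each side
```

A set with few outgoing edges relative to its volume, like one of the triangles here, is the kind of sparse cut a local partitioning algorithm searches for.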
Analyzing Patterns of User Content Generation in Online Social Networks
Cited by 38 (1 self)
Various online social networks (OSNs) have been developed rapidly on the Internet. Researchers have analyzed different properties of such OSNs, mainly focusing on the formation and evolution of the networks as well as the information propagation over the networks. In knowledge-sharing OSNs, such as blogs and question answering systems, issues on how users participate in the network and how users “generate/contribute” knowledge are vital to the sustained and healthy growth of the networks. However, related discussions have not been reported in the research literature. In this work, we empirically study workloads from three popular knowledge-sharing OSNs, including a blog system, a social bookmark sharing network, and a question answering social network to examine these properties. Our analysis consistently shows that (1) users’ posting behavior in these networks exhibits strong daily and weekly patterns, but the user active time in these OSNs does not follow exponential distributions; (2) the user posting behavior in these OSNs follows stretched exponential distributions instead of power-law distributions, indicating the influence of a small number of core users cannot dominate the network; (3) the distributions of user contributions on high-quality and effort-consuming contents in these OSNs have smaller stretch factors for the stretched exponential distribution. Our study provides insights into user activity patterns and lays out an analytical foundation for further understanding various properties of these OSNs.
Optimal Sybil-resilient node admission control
, 2010
Cited by 38 (3 self)
Most existing large-scale networked systems on the Internet such as peer-to-peer systems are vulnerable to Sybil attacks where a single adversary can introduce many bogus identities. One promising defense against Sybil attacks is to perform social-network-based admission control to bound the number of Sybil identities admitted. SybilLimit [22], the best known Sybil admission control mechanism, can restrict the number of Sybil identities admitted per attack edge to O(log n) with high probability assuming O(n/log n) attack edges. In this paper, we propose Gatekeeper, a decentralized Sybil-resilient admission control protocol that significantly improves over SybilLimit. Gatekeeper is optimal for the case of O(1) attack edges and admits only O(1) Sybil identities (with high probability) in random expander social networks (real-world social networks exhibit expander properties). In the face of O(k) attack edges (for any k ∈ O(n/log n)), Gatekeeper admits O(log k) Sybils per attack edge. This result provides a graceful continuum across the spectrum of attack edges. We demonstrate the effectiveness of Gatekeeper experimentally on real-world social networks and synthetic topologies.
A STATE-SPACE MIXED MEMBERSHIP BLOCKMODEL FOR DYNAMIC NETWORK TOMOGRAPHY
SUBMITTED TO THE ANNALS OF APPLIED STATISTICS
Cited by 37 (1 self)
In a dynamic social or biological environment, the interactions between the actors can undergo large and systematic changes. In this paper, we propose a model-based approach to analyze what we will refer to as the dynamic tomography of such time-evolving networks. Our approach offers an intuitive but powerful tool to infer the semantic underpinnings of each actor, such as its social roles or biological functions, underlying the observed network topologies. Our model builds on earlier work on a mixed membership stochastic blockmodel for static networks, and the state-space model for tracking object trajectory. It overcomes a major limitation of many current network inference techniques, which assume that each actor plays a unique and invariant role that accounts for all its interactions with other actors; instead, our method models the role of each actor as a time-evolving mixed membership vector that allows actors to behave differently over time and carry out different roles/functions when interacting with different peers, which is closer to reality. We present an efficient algorithm for approximate inference and learning using our model; and we applied our model to analyze a social network between monks (i.e., Sampson’s network), a dynamic email communication network between the Enron employees, and a rewiring gene interaction network of fruit fly collected during its full life cycle. In all cases, our model reveals interesting patterns of the dynamic roles of the actors.
Dynamics of Large Networks
, 2008
Cited by 33 (0 self)
A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found very interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop models that explain processes which govern the network evolution, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties. In addition, our work has a wide range of applications: it can help us spot anomalous graphs and outliers, forecast future graph structure and run simulations of network evolution. Another important aspect of our research is the study of “local” patterns and structures of propagation in networks. We aim to identify building blocks of the networks and find the patterns of influence that these blocks have on information or virus propagation over the network. Our recent work included the study of the spread of influence in a large person-to-person
HADI: Mining radii of large graphs
ACM Transactions on Knowledge Discovery from Data, 2010
Cited by 33 (10 self)
Given large, multi-million node graphs (e.g., Facebook, web crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this paper we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) We run HADI on several real world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multimodal/bimodal shape of the Radius plot, and its palindrome motion over time.
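To make the idea of a Radius plot concrete, here is a minimal Python sketch that computes exact node radii on a tiny graph with breadth-first search and counts how many nodes have each radius. This is an illustration under stated assumptions, not HADI: the abstract describes an approximate, distributed estimator on Hadoop/MapReduce, whereas this sketch is exact and single-machine, and it takes a node's radius to be its eccentricity within its connected component. The names `eccentricity` and `radius_plot` are hypothetical.

```python
from collections import Counter, deque


def eccentricity(adj, source):
    """Largest BFS distance from `source` to any node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_plot(adj):
    """Map each radius value to the number of nodes having that radius."""
    return Counter(eccentricity(adj, u) for u in adj)


# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
counts = radius_plot(path)   # radii are 4, 3, 2, 3, 4 for nodes 0..4
diameter = max(counts)       # the largest radius in the graph
```

Running one BFS per node costs time proportional to nodes times edges, which is the kind of expense that makes exact computation impractical at very large scale.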