Results 1  10
of
130
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 246 (14 self)
 Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 208 (17 self)
 Add to MetaCart
(Show Context)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and
Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter. WWW
, 2011
"... ABSTRACT There is a widespread intuitive sense that different kinds of information spread differently online, but it has been difficult to evaluate this question quantitatively since it requires a setting where many different kinds of information spread in a shared environment. Here we study this ..."
Abstract

Cited by 171 (4 self)
 Add to MetaCart
(Show Context)
ABSTRACT There is a widespread intuitive sense that different kinds of information spread differently online, but it has been difficult to evaluate this question quantitatively since it requires a setting where many different kinds of information spread in a shared environment. Here we study this issue on Twitter, analyzing the ways in which tokens known as hashtags spread on a network defined by the interactions among Twitter users. We find significant variation in the ways that widelyused hashtags on different topics spread. Our results show that this variation is not attributable simply to differences in "stickiness," the probability of adoption based on one or more exposures, but also to a quantity that could be viewed as a kind of "persistence" the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects. We find that hashtags on politically controversial topics are particularly persistent, with repeated exposures continuing to have unusually large marginal effects on adoption; this provides, to our knowledge, the first largescale validation of the "complex contagion" principle from sociology, which posits that repeated exposures to an idea are particularly crucial when the idea is in some way controversial or contentious. Among other findings, we discover that hashtags representing the natural analogues of Twitter idioms and neologisms are particularly nonpersistent, with the effect of multiple exposures decaying rapidly relative to the first exposure. We also study the subgraph structure of the initial adopters for different widelyadopted hashtags, again finding structural differences across topics. We develop simulationbased and generative models to analyze how the adoption dynamics interact with the network structure of the early adopters on which a hashtag spreads.
Patterns of temporal variation in online media
, 2010
"... Online content exhibits rich temporal dynamics, and diverse realtime user generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention remain largely unexplored. We stu ..."
Abstract

Cited by 144 (5 self)
 Add to MetaCart
(Show Context)
Online content exhibits rich temporal dynamics, and diverse realtime user generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention remain largely unexplored. We study temporal patterns associated with online content and how the content’s popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the KSpectral Centroid (KSC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive waveletbased incremental approach to clustering, we scale KSC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that KSC outperforms the Kmeans clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.
Kronecker Graphs: An Approach to Modeling Networks
 JOURNAL OF MACHINE LEARNING RESEARCH 11 (2010) 9851042
, 2010
"... How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in and outdegree distribution, heavy tails for the ei ..."
Abstract

Cited by 123 (3 self)
 Add to MetaCart
How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: Heavy tails for the in and outdegree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a nonstandard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KRONFIT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take superexponential
Inferring Networks of Diffusion and Influence
, 2010
"... Information diffusion and virus propagation are fundamental processes talking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in ..."
Abstract

Cited by 116 (13 self)
 Add to MetaCart
Information diffusion and virus propagation are fundamental processes talking place in networks. While it is often possible to directly observe when nodes become infected, observing individual transmissions (i.e., who infects whom or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NPhard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and in practice gives provably nearoptimal performance. We demonstrate the effectiveness of our approach by tracing information cascades in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news tends to have a coreperiphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
The Structure of Information Pathways in a Social Communication Network
"... Social networks are of interest to researchers in part because they are thought to mediate the flow of information in communities and organizations. Here we study the temporal dynamics of communication using online data, including email communication among the faculty and staff of a large universi ..."
Abstract

Cited by 105 (2 self)
 Add to MetaCart
(Show Context)
Social networks are of interest to researchers in part because they are thought to mediate the flow of information in communities and organizations. Here we study the temporal dynamics of communication using online data, including email communication among the faculty and staff of a large university over a twoyear period. We formulate a temporal notion of “distance ” in the underlying social network by measuring the minimum time required for information to spread from one node to another — a concept that draws on the notion of vectorclocks from the study of distributed computing systems. We find that such temporal measures provide structural insights that are not apparent from analyses of the pure social network topology. In particular, we define the network backbone to be the subgraph consisting of edges on which information has the potential to flow the quickest. We find that the backbone is a sparse graph with a concentration of both highly embedded edges and longrange bridges — a finding that sheds new light on the relationship between tie strength and connectivity in social networks.
Topic modeling with network regularization
 In Proc. of the 17th WWW Conference
, 2008
"... In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method combines topic modeling and s ..."
Abstract

Cited by 102 (9 self)
 Add to MetaCart
(Show Context)
In this paper, we formally define the problem of topic modeling with network structure (TMN). We propose a novel solution to this problem, which regularizes a statistical topic model with a harmonic regularizer based on a graph structure in the data. The proposed method combines topic modeling and social network analysis, and leverages the power of both statistical topic models and discrete regularization. The output of this model can summarize well topics in text, map a topic onto the network, and discover topical communities. With appropriate instantiations of the topic model and the graphbased regularizer, our model can be applied to a wide range of text mining problems such as authortopic analysis, community discovery, and spatial text mining. Empirical experiments on two data sets with different genres show that our approach is effective and outperforms both textoriented methods and networkoriented methods alone. The proposed model is general; it can be applied to any text collections with a mixture of topics and an associated network structure.
Modeling Information Diffusion in Implicit Networks
"... Abstract—Social media forms a central domain for the production and dissemination of realtime information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions a ..."
Abstract

Cited by 83 (2 self)
 Add to MetaCart
(Show Context)
Abstract—Social media forms a central domain for the production and dissemination of realtime information. Even though such flows of information have traditionally been thought of as diffusion processes over social networks, the underlying phenomena are the result of a complex web of interactions among numerous participants. Here we develop a Linear Influence Model where rather than requiring the knowledge of the social network and then modeling the diffusion by predicting which node will influence which other nodes in the network, we focus on modeling the global influence of a node on the rate of diffusion through the (implicit) network. We model the number of newly infected nodes as a function of which other nodes got infected in the past. For each node we estimate an influence function that quantifies how many subsequent infections can be attributed to the influence of that node over time. A nonparametric formulation of the model leads to a simple least squares problem that can be solved on large datasets. We validate our model on a set of 500 million tweets and a set of 170 million news articles and blog posts. We show that the Linear Influence Model accurately models influences of nodes and reliably predicts the temporal dynamics of information diffusion. We find that patterns of influence of individual participants differ significantly depending on the type of the node and the topic of the information. I.