Results 11 - 20
of
123
Modeling Information Propagation with Survival Theory
, 2013
"... Networks provide a ‘skeleton’ for the spread of contagions, like, information, ideas, behaviors and diseases. Many times networks over which contagions diffuse are unobserved and need to be inferred. Here we apply survival theory to develop general additive and multiplicative risk models under which ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
Networks provide a ‘skeleton’ for the spread of contagions, like, information, ideas, behaviors and diseases. Many times networks over which contagions diffuse are unobserved and need to be inferred. Here we apply survival theory to develop general additive and multiplicative risk models under which the network inference problems can be solved efficiently by exploiting their convexity. Our additive risk model generalizes several existing network inference models. We show all these models are particular cases of our more general model. Our multiplicative model allows for modeling scenarios in which a node can either increase or decrease the risk of activation of another node, in contrast with previous approaches, which consider only positive risk increments. We evaluate the performance of our network inference algorithms on large synthetic and real cascade datasets, and show that our models are able to predict the length and duration of cascades in real data.
Time varying graphs and social network analysis: Temporal indicators and metrics
- Artificial Intelligence and Simulation of Behaviour (AISB
, 2011
"... Abstract. Most instruments- formalisms, concepts, and metrics-for social networks analysis fail to capture their dynamics. Typical systems exhibit different scales of dynamics, ranging from the fine-grain dynamics of interactions (which recently led researchers to consider temporal versions of dista ..."
Abstract
-
Cited by 14 (6 self)
- Add to MetaCart
(Show Context)
Abstract. Most instruments- formalisms, concepts, and metrics-for social networks analysis fail to capture their dynamics. Typical systems exhibit different scales of dynamics, ranging from the fine-grain dynamics of interactions (which recently led researchers to consider temporal versions of distance, connectivity, and related indicators), to the evolution of network properties over longer periods of time. This paper proposes a general formal approach to study networks’ structural evolution for both atemporal and temporal indicators, based respectively on sequences of static graphs and sequences of time-varying graphs that cover successive time-windows. All the concepts and indicators, some of which are new, are expressed using a time-varying graph formalism recently proposed in [10]. Experimental results of the application of atemporal metrics applied to a portion of the scientific community of arXiv are provided. 1
A Constructing and Sampling Graphs with a Prescribed Joint Degree Distribution
"... One of the most influential recent results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have th ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
(Show Context)
One of the most influential recent results in network analysis is that many natural networks exhibit a power-law or log-normal degree distribution. This has inspired numerous generative models that match this property. However, more recent work has shown that while these generative models do have the right degree distribution, they are not good models for real life networks due to their differences on other important metrics like conductance. We believe this is, in part, because many of these real-world networks have very different joint degree distributions, i.e. the probability that a randomly selected edge will be between nodes of degree k and l. Assortativity is a sufficient statistic of the joint degree distribution, and it has been previously noted that social networks tend to be assortative, while biological and technological networks tend to be disassortative. We suggest understanding the relationship between network structure and the joint degree distribution of graphs is an interesting avenue of further research. An important tool for such studies are algorithms that can generate random instances of graphs with the same joint degree distribution. This is the main topic of this paper and we study the problem from both a theoretical and practical perspective. We provide an algorithm for constructing simple graphs from a given joint degree distribution, and a Monte Carlo Markov Chain method for sampling them. We also show that the state space of simple graphs with a fixed degree distribution is connected via end point switches. We empirically evaluate the mixing time of this Markov Chain by using experiments based on the autocorrelation of each edge. These experiments show that our Markov Chain mixes quickly on these real graphs, allowing for utilization of our techniques in practice.
C: Network archaeology: uncovering ancient networks from present-day interactions
- PLoS Comput Biol
"... What proteins interacted in a long-extinct ancestor of yeast? How have different members of a protein complex assembled together over time? Our ability to answer such questions has been limited by the unavailability of ancestral protein-protein interaction (PPI) networks. To overcome this limitation ..."
Abstract
-
Cited by 13 (5 self)
- Add to MetaCart
(Show Context)
What proteins interacted in a long-extinct ancestor of yeast? How have different members of a protein complex assembled together over time? Our ability to answer such questions has been limited by the unavailability of ancestral protein-protein interaction (PPI) networks. To overcome this limitation, we propose several novel algorithms to reconstruct the growth history of a present-day network. Our likelihood-based method finds a probable previous state of the graph by applying an assumed growth model backwards in time. This approach retains node identities so that the history of individual nodes can be tracked. Using this methodology, we estimate protein ages in the yeast PPI network that are in good agreement with sequence-based estimates of age and with structural features of protein complexes. Further, by comparing the quality of the inferred histories for several different growth models (duplication-mutation with complementarity, forest fire, and preferential attachment), we provide additional evidence that a duplication-based model captures many features of PPI network growth better than models designed to mimic social network growth. From the reconstructed history, we model the arrival time of extant and ancestral interactions and predict that complexes have significantly re-wired over time and that new edges tend to form within existing complexes. We also hypothesize a distribution of per-protein duplication rates, track the change of the network’s clustering coefficient, and predict paralogous relationships between extant proteins that are likely to be complementary to the relationships inferred using sequence alone. Finally, we infer plausible parameters for
COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
, 2013
"... Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
(Show Context)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are triangle-based and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for large-scale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publicly-available networks, the largest of which is 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and non-social networks. To the best of our knowledge, these are the largest triangle-based graph computations published to date.
Changepoint detection over graphs with the spectral scan statistic. arXiv preprint arXiv:1206.0773,
, 2012
"... Abstract We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood rati ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
(Show Context)
Abstract We consider the change-point detection problem of deciding, based on noisy measurements, whether an unknown signal over a given graph is constant or is instead piecewise constant over two induced subgraphs of relatively low cut size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it to the problem of finding a sparsest cut in a graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the spectral scan statistic, and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on few graph topologies. Finally, we demonstrate both theoretically and by simulations that the spectral scan statistic can outperform naive testing procedures based on edge thresholding and χ 2 testing.
Tied Kronecker Product Graph Models to Capture Variance in Network Populations
"... Abstract—Much of the past work on mining and modeling networks has focused on understanding the observed properties of single example graphs. However, in many real-life applications it is important to characterize the structure of populations of graphs. In this work, we investigate the distributiona ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
Abstract—Much of the past work on mining and modeling networks has focused on understanding the observed properties of single example graphs. However, in many real-life applications it is important to characterize the structure of populations of graphs. In this work, we investigate the distributional properties of Kronecker product graph models (KPGMs) [1]. Specifically, we examine whether these models can represent the natural variability in graph properties observed across multiple networks and find surprisingly that they cannot. By considering KPGMs from a new viewpoint, we can show the reason for this lack of variance theoretically—which is primarily due to the generation of each edge independently from the others. Based on this understanding we propose a generalization of KPGMs that uses tied parameters to increase the variance of the model, while preserving the expectation. We then show experimentally, that our mixed-KPGM can adequately capture the natural variability across a population of networks. I.
Uncover TopicSensitive Information Diffusion Networks
- In AISTATS, 2012b
"... Analyzing the spreading patterns of memes with respect to their topic distributions and the under-lying diffusion network structures is an impor-tant task in social network analysis. This task in many cases becomes very challenging since the underlying diffusion networks are often hidden, and the to ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
(Show Context)
Analyzing the spreading patterns of memes with respect to their topic distributions and the under-lying diffusion network structures is an impor-tant task in social network analysis. This task in many cases becomes very challenging since the underlying diffusion networks are often hidden, and the topic specific transmission rates are un-known either. In this paper, we propose a con-tinuous time model, TOPICCASCADE, for topic-sensitive information diffusion networks, and in-fer the hidden diffusion networks and the topic dependent transmission rates from the observed time stamps and contents of cascades. One at-tractive property of the model is that its param-eters can be estimated via a convex optimization which we solve with an efficient proximal gra-dient based block coordinate descent (BCD) al-gorithm. In both synthetic and real-world data, we show that our method significantly improves over the previous state-of-the-art models in terms of both recovering the hidden diffusion networks and predicting the transmission times of memes. 1
Graphlet decomposition of a weighted network
, 2012
"... We introduce the graphlet decomposition of a weighted network, which encodes a notion of social information based on social structure. We develop a scalable algorithm, which combines EM with Bron-Kerbosch in a novel fashion, for estimating the parameters of the model underlying graphlets using one n ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
We introduce the graphlet decomposition of a weighted network, which encodes a notion of social information based on social structure. We develop a scalable algorithm, which combines EM with Bron-Kerbosch in a novel fashion, for estimating the parameters of the model underlying graphlets using one network sample. We explore theoretical properties of graphlets, including computational complexity, redundancy and expected accuracy. We test graphlets on synthetic data, and we analyze messaging on Facebook and crime associations in the 19th century.
Transforming Graph Data for Statistical Relational Learning
, 2012
"... Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In th ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
(Show Context)
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.