Results 1 - 9 of 9
Cascading outbreak prediction in networks: a data-driven approach
 In KDD’13
Abstract

Cited by 4 (3 self)
Cascades are ubiquitous in various network environments such as epidemic networks, traffic networks, water distribution networks, and social networks. Outbreaks of cascades often bring harmful or even devastating effects, so accurately predicting cascading outbreaks at an early stage is of paramount importance. Although there have been some pioneering works on cascading outbreak detection, how to predict, rather than detect, cascading outbreaks is still an open problem. In this paper, we harness historical cascade data, propose a novel data-driven approach to select important nodes as sensors, and predict outbreaks based on the cascading behaviors of these sensors. In particular, we propose the Orthogonal Sparse LOgistic Regression (OSLOR) method to jointly optimize node selection and outbreak prediction, where the prediction loss is combined with an orthogonal regularizer and an L1 regularizer to guarantee good prediction accuracy as well as the sparsity and low redundancy of the selected sensors. We evaluate the proposed method on a real online social network dataset including 182.7 million information cascades. The experimental results show that the proposed OSLOR significantly and consistently outperforms topological-measure-based methods and other data-driven methods in prediction performance.
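The L1-regularized logistic loss at the core of this formulation is easy to experiment with. The sketch below is illustrative only, not the authors' OSLOR implementation (the orthogonal regularizer is omitted): it optimizes logistic loss plus an L1 penalty via proximal gradient descent (ISTA), whose soft-thresholding step is what drives the weights of uninformative "sensor" features exactly to zero. The toy data, learning rate, and regularization strength are all made up for the example:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def l1_logistic_regression(X, y, lam=0.1, lr=0.1, iters=500):
    """Minimize average logistic loss + lam * ||w||_1 via proximal
    gradient (ISTA): a gradient step on the loss, then soft-thresholding."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # gradient of the average logistic loss
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # gradient step, then soft-threshold (the proximal step for L1)
        for j in range(d):
            wj = w[j] - lr * grad[j]
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w

# toy data: feature 0 predicts the label, feature 1 is pure noise
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [1 if x[0] > 0 else 0 for x in X]
w = l1_logistic_regression(X, y)
print(w)  # the weight on the noise feature shrinks toward (often exactly) zero
```

The soft-thresholding step is why L1 yields sparse sensor sets: small-gradient coordinates are clamped to exactly zero rather than merely shrunk.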
Incremental Algorithms for Network Management and Analysis based on Closeness Centrality
, 1303
Abstract

Cited by 2 (1 self)
Analyzing networks requires complex algorithms to extract meaningful information. Centrality metrics have been shown to correlate with the importance and loads of the nodes in network traffic. Here, we are interested in the problem of centrality-based network management. The problem has many applications, such as verifying the robustness of networks and controlling or improving entity dissemination. It can be defined as finding a small set of topological network modifications which yield a desired closeness centrality configuration. As a fundamental building block for tackling that problem, we propose incremental algorithms which efficiently update closeness centrality values upon changes in network topology, i.e., edge insertions and deletions. Our algorithms are proven to be efficient on many real-life networks, especially on small-world networks, which have a small diameter and a spike-shaped shortest-distance distribution. In addition to closeness centrality, they can also serve as a building block for shortest-path-based management and analysis of networks. We experimentally validate the efficiency of our algorithms on large networks and show that they update the closeness centrality values of the temporal DBLP coauthorship network of 1.2 million users 460 times faster than computing them from scratch. To the best of our knowledge, this is the first work which can yield practical large-scale network management based on closeness centrality values.
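For reference, closeness centrality itself is straightforward to compute from scratch with one BFS per source node; the paper's contribution is avoiding exactly this recomputation after every edge insertion or deletion. A minimal from-scratch sketch for an unweighted graph (toy adjacency list made up for the example):

```python
from collections import deque

def closeness(adj, s):
    """Closeness centrality of node s in an unweighted graph:
    (reachable - 1) / sum of BFS distances from s (0.0 if s is isolated)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# path graph 0-1-2-3: interior nodes are closer to everyone
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(closeness(adj, 1))  # 3 / (1 + 0 + 1 + 2) = 0.75
```

Running this for every node after each edge change costs a full BFS per node, which is the O(n·m) baseline the incremental algorithms are measured against.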
Fast Influence-based Coarsening for Large Networks
Abstract

Cited by 2 (1 self)
Given a social network, can we quickly 'zoom out' of the graph? Is there a smaller equivalent representation of the graph that preserves its propagation characteristics? Can we group nodes together based on their influence properties? These are important problems with applications in influence analysis, epidemiology, and viral marketing. In this paper, we first formulate a novel Graph Coarsening Problem to find a succinct representation of any graph while preserving key characteristics for diffusion processes on that graph. We then provide a fast and effective near-linear-time (in nodes and edges) algorithm, coarseNet, for this problem. Using extensive experiments on multiple real datasets, we demonstrate the quality and scalability of coarseNet, which enables us to reduce a graph by 90% in some cases without much loss of information. Finally, we also show how our method can help in diverse applications like influence maximization and detecting patterns of propagation at the level of automatically created groups on real cascade data.
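A single coarsening step reduces to contracting a node pair and merging their neighborhoods. The sketch below shows only that primitive on a toy adjacency structure; the paper's coarseNet additionally scores which pairs to contract so that diffusion characteristics are preserved, and that scoring is omitted here:

```python
def contract(adj, u, v):
    """Merge node v into node u (one coarsening step): u inherits v's
    neighbors, v disappears, and the self-loop from the merged pair
    is dropped. adj maps each node to a set of neighbors."""
    adj[u] = (adj[u] | adj[v]) - {u, v}
    del adj[v]
    for w in adj:
        if v in adj[w]:
            adj[w].discard(v)
            if w != u:
                adj[w].add(u)
    return adj

# triangle 0-1-2 with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
contract(adj, 0, 1)
print(adj)  # node 1 has been merged into node 0
```

Repeatedly applying such contractions to well-chosen pairs is what shrinks the graph while (under the paper's scoring) keeping its propagation behavior close to the original.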
Approximation algorithms for reducing the spectral radius to control epidemic spread. In SDM
, 2015
Abstract

Cited by 1 (0 self)
The largest eigenvalue of the adjacency matrix of a network (referred to as the spectral radius) is an important metric in its own right. Further, for several models of epidemic spread on networks (e.g., the 'flu-like' SIS model), it has been shown that an epidemic dies out quickly if the spectral radius of the graph is below a certain threshold that depends on the model parameters. This motivates a strategy to control epidemic spread by reducing the spectral radius of the underlying network. In this paper, we develop a suite of provable approximation algorithms for reducing the spectral radius by removing a minimum-cost set of edges (modeling quarantining) or nodes (modeling vaccinations), with different time and quality tradeoffs. Our main algorithm, GreedyWalk, is based on the idea of hitting closed walks of a given length and gives an O(log² n)-approximation, where n denotes the number of nodes; it also performs much better in practice than all prior heuristics proposed for this problem. We further present a novel sparsification method to improve its running time. In addition, we give a new primal-dual based algorithm with an even better approximation guarantee (O(log n)), albeit with a slower running time. We also give lower bounds on the worst-case performance of some of the popular heuristics. Finally, we demonstrate the applicability of our algorithms and the properties of our solutions via extensive experiments on multiple synthetic and real networks.
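The quantities involved are easy to experiment with: power iteration estimates the spectral radius and its eigenvector, and a classic eigenscore heuristic (simpler than, and not to be confused with, the paper's GreedyWalk) ranks each edge (i, j) by the product of eigenvector entries v[i]·v[j] as a proxy for how much its removal lowers the spectral radius. A small pure-Python sketch on a made-up toy graph:

```python
import math

def spectral_radius(A, iters=300):
    """Estimate the largest eigenvalue (spectral radius) of a symmetric
    nonnegative matrix A and its eigenvector, via power iteration."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) iterate
    lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

# K4 minus one edge: nodes 1 and 3 are the two high-degree hubs
A = [[0, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [1, 1, 1, 0]]
lam, v = spectral_radius(A)
edges = [(i, j) for i in range(4) for j in range(i + 1, 4) if A[i][j]]
best = max(edges, key=lambda e: v[e[0]] * v[e[1]])
print(round(lam, 3), best)  # the hub-hub edge (1, 3) scores highest
```

For this graph the exact spectral radius is (1 + √17)/2 ≈ 2.562, and the eigenscore heuristic picks the chord between the two degree-3 hubs, matching the intuition that quarantining the busiest contact link helps most.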
Scalable Vaccine Distribution in Large Graphs given Uncertain Data
Abstract

Cited by 1 (0 self)
Given a noisy or sampled snapshot of a network, like a contact network or the blogosphere, in which an infection (or meme/virus) has been spreading for some time, what are the best nodes to immunize (vaccinate)? Manipulating graphs via node removal is by itself an important problem in multiple domains like epidemiology, public health, and social media. Moreover, it is important to account for uncertainty, as surveillance data on who is infected is typically limited or sampled. Efficient algorithms for such a problem can help public-health experts make more informed decisions. In this paper, we study the problem of designing vaccine-distribution algorithms in an uncertain environment, with known information consisting of confirmed cases as well as a probability distribution over unknown cases. We formulate the NP-hard Uncertain Data-Aware Vaccination problem and design multiple efficient algorithms for factorizable distributions (including a novel subquadratic algorithm) which naturally take the uncertainty into account while providing robust solutions. Finally, we show the effectiveness and scalability of our methods via extensive experiments on real datasets, including large epidemiological and social networks.
A Direct Mining Approach To Efficient Constrained Graph Pattern Discovery
Abstract
Despite the wealth of research on frequent graph pattern mining, efficiently mining the complete set of patterns under constraints still poses a huge challenge to existing algorithms, mainly due to an inherent bottleneck in the mining paradigm. In essence, mining requests with explicitly specified constraints cannot be handled in a way that is direct and precise. In this paper, we propose a direct mining framework to solve the problem and illustrate our ideas in the context of a particular type of constrained frequent pattern: the "skinny" patterns, which are graph patterns with a long backbone from which short twigs branch out. These patterns, which we formally define as l-long δ-skinny patterns, are able to reveal insightful spatial and temporal trajectory patterns in mobile data mining, information diffusion, adoption propagation, and many other settings. Based on the key concept of a canonical diameter, we develop SkinnyMine, an efficient algorithm to mine all the l-long δ-skinny patterns, guaranteeing both the completeness of our mining result and the unique generation of each target pattern. We also present a general direct mining framework together with two properties, reducibility and continuity, for qualified constraints. Our experiments on both synthetic and real data demonstrate the effectiveness and scalability of our approach.
Where Graph Topology Matters: The Robust Subgraph Problem
Abstract
Robustness is a critical measure of the resilience of large networked systems, such as transportation and communication networks. Most prior works focus on the global robustness of a given graph at large, e.g., by measuring its overall vulnerability to external attacks or random failures. In this paper, we turn attention to local robustness and pose a novel problem in the line of subgraph mining: given a large graph, how can we find its most robust local subgraph (RLS)? We define a robust subgraph as a subset of nodes with high communicability [15] among them, and formulate the RLS-Problem of finding a subgraph of given size with maximum robustness in the host graph. Our formulation is related to the recently proposed general framework [39] for the densest subgraph problem, but differs from it substantially in that, besides the number of edges in the subgraph, robustness is also concerned with the placement of those edges, i.e., the subgraph topology. We show that the RLS-Problem is NP-hard and propose two heuristic algorithms based on top-down and bottom-up search strategies. Further, we present modifications of our algorithms to handle three practical variants of the RLS-Problem. Experiments on synthetic and real-world graphs demonstrate that we find subgraphs with larger robustness than the densest subgraphs [9, 39] even at lower densities, suggesting that the existing approaches are not suitable for the new problem setting.
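Communicability between nodes i and j is the (i, j) entry of the matrix exponential exp(A) of the adjacency matrix, which counts walks of every length weighted down by 1/k!. As a toy illustration only, not the paper's algorithm, one can approximate exp(A) by truncating its power series and compare a simple trace-based robustness aggregate across two 3-node topologies with the same node count, showing that edge placement (a triangle vs. a path) changes robustness:

```python
import math

def communicability(A, terms=25):
    """Approximate exp(A) by its power series sum_k A^k / k!.
    Entry (i, j) is the communicability between nodes i and j."""
    n = len(A)
    C = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in C]  # running power A^k, starting at the identity
    for k in range(1, terms):
        P = [[sum(P[i][m] * A[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
        fk = math.factorial(k)
        for i in range(n):
            for j in range(n):
                C[i][j] += P[i][j] / fk
    return C

def robustness(A):
    """Trace of exp(A): a simple aggregate of self-communicabilities."""
    C = communicability(A)
    return sum(C[i][i] for i in range(len(A)))

# same nodes and almost the same edge count, different topology
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(robustness(triangle) > robustness(path))  # True
```

The trace of exp(A) equals the sum of e raised to each eigenvalue, so the triangle (largest eigenvalue 2) beats the path (largest eigenvalue √2); this dependence on the spectrum, not just the edge count, is exactly why the densest subgraph need not be the most robust one.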
DAVA: Distributing Vaccines over Networks under Prior Information
Abstract
Given a graph, like a social/computer network or the blogosphere, in which an infection (or meme or virus) has been spreading for some time, how should we select the k best nodes for immunization/quarantining immediately? Most previous works on controlling propagation (say, via immunization) have concentrated on developing strategies for vaccinating preemptively, before the start of an epidemic. While very useful for providing insight into which baseline policies can best control an infection, they may not be ideal for making real-time decisions as the infection is progressing. In this paper, we study how to immunize healthy nodes in the presence of already infected nodes. Efficient algorithms for such a problem can help public-health experts make more informed choices. First, we formulate the Data-Aware Vaccination problem, and prove that it is NP-hard and also hard to approximate. Second, we propose two effective polynomial-time heuristics, DAVA and DAVA-fast. Finally, we demonstrate the scalability and effectiveness of our algorithms through extensive experiments on multiple real networks, including epidemiology datasets, which show substantial gains of up to 10 times more healthy nodes at the end.
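As a point of contrast only, not the DAVA algorithm itself, a naive data-aware baseline immunizes the highest-degree healthy neighbors of the currently infected nodes, on the intuition that they are the likeliest conduits of further spread. The graph and node labels below are made up for the example:

```python
def baseline_immunize(adj, infected, k):
    """Degree-based baseline: pick the k highest-degree healthy
    neighbors of the infected set as vaccination targets."""
    frontier = {v for u in infected for v in adj[u] if v not in infected}
    ranked = sorted(frontier, key=lambda v: len(adj[v]), reverse=True)
    return ranked[:k]

# node 0 is infected; hubs 1 and 2 lead to leaf clusters of different sizes
adj = {0: [1, 2], 1: [0, 3, 4, 5], 2: [0, 6],
       3: [1], 4: [1], 5: [1], 6: [2]}
print(baseline_immunize(adj, {0}, 1))  # [1] - the larger hub
```

Baselines of this kind ignore how the infected nodes partition the graph, which is the structure DAVA is designed to exploit; they serve mainly as the comparison point that data-aware heuristics are measured against.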