Results 1 – 9 of 9
Using Friends as Sensors to Detect Global-Scale Contagious Outbreaks
Abstract

Cited by 6 (2 self)
Recent research has focused on the monitoring of global-scale online data for improved detection of epidemics 1,2,3,4, mood patterns 5,6, movements in the stock market 7, political revolutions 8, box-office revenues 9, consumer behaviour 3,10 and many other important phenomena. However, privacy considerations and the sheer scale of data available online are quickly making global monitoring infeasible, and existing methods do not take full advantage of local network structure to identify key nodes for monitoring. Here, we develop a model of the contagious spread of information in a global-scale, publicly articulated social network and show that a simple method can yield not just early detection, but advance warning of contagious outbreaks. In this method, we randomly choose a small fraction of nodes in the network and then we randomly choose a “friend” of each node to include in a group for local monitoring. Using six months of data from most of the full Twittersphere, we show that this friend group is more central in the network and it helps us to detect viral outbreaks of the use of novel hashtags about 7 days earlier than we could with an equal-sized randomly chosen group. Moreover, the method actually works …
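The sensor-selection step described in this abstract (sample random nodes, then one random friend of each) is simple enough to sketch. The following Python fragment is an illustration under an assumed adjacency-dict format, not the authors' implementation; it relies on the friendship paradox, by which a randomly chosen friend tends to have higher degree than a randomly chosen node.

```python
import random

def friend_sensors(graph, fraction=0.1, seed=0):
    """Pick a random fraction of (non-isolated) nodes, then one random
    friend of each, mirroring the sensor-selection step above.
    `graph` maps node -> list of neighbours (an assumed format)."""
    rng = random.Random(seed)
    nodes = [n for n in graph if graph[n]]
    k = max(1, int(fraction * len(nodes)))
    random_group = rng.sample(nodes, k)
    friend_group = [rng.choice(graph[n]) for n in random_group]
    return random_group, friend_group

def mean_degree(graph, group):
    return sum(len(graph[n]) for n in group) / len(group)

# Toy star network: the friendship paradox makes the hub likely
# to appear in the friend group.
g = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
rand_group, friend_group = friend_sensors(g, fraction=0.5)
```

On this toy star the friend group's mean degree is never below the random group's, which is the property the paper exploits for early detection.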
Analyzing the Optimal Neighborhood: Algorithms for Budgeted and Partial Connected Dominating Set Problems
Abstract

Cited by 2 (0 self)
We study partial and budgeted versions of the well-studied connected dominating set problem. In the partial connected dominating set problem (Pcds), we are given an undirected graph G = (V,E) and an integer n′, and the goal is to find a minimum subset of vertices that induces a connected subgraph of G and dominates at least n′ vertices. We obtain the first polynomial time algorithm with an O(ln ∆) approximation factor for this problem, thereby significantly extending the results of Guha and Khuller (Algorithmica, Vol. 20(4), pages 374–387, 1998) for the connected dominating set problem. We note that none of the methods developed earlier can be applied directly to solve this problem. In the budgeted connected dominating set problem (Bcds), there is a budget on the number of vertices we can select, and the goal is to dominate as many vertices as possible. We obtain a …
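To make the Pcds feasibility condition from the abstract concrete, here is a small checker in Python (an illustration of the problem definition, not part of the paper's approximation algorithm): a candidate set must induce a connected subgraph and dominate at least n′ vertices.

```python
def dominated(graph, S):
    """Vertices dominated by S: S itself plus every neighbour of S."""
    return set(S) | {v for u in S for v in graph[u]}

def induces_connected(graph, S):
    """BFS restricted to S: does S induce a connected subgraph?"""
    S = set(S)
    if not S:
        return False
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v in S and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == S

def is_pcds(graph, S, n_prime):
    """Pcds feasibility: S is connected and dominates >= n' vertices."""
    return induces_connected(graph, S) and len(dominated(graph, S)) >= n_prime

# Path graph 0-1-2-3: {1, 2} is connected and dominates all 4 vertices.
g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

The optimization problem then asks for the smallest S passing this check.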
Local Computation Algorithms for Graphs of Non-Constant Degrees
, 2014
Abstract
In the model of local computation algorithms (LCAs), we aim to compute the queried part of the output by examining only a small (sublinear) portion of the input. Many recently developed LCAs on graph problems achieve time and space complexities with very low dependence on n, the number of vertices. Nonetheless, these complexities are generally at least exponential in d, the upper bound on the degree of the input graph. Instead, we consider the case where parameter d can be moderately dependent on n, and aim for complexities with quasi-polynomial dependence on d, while maintaining polylogarithmic dependence on n. In this thesis, we give randomized LCAs for computing maximal independent sets, maximal matchings, and approximate maximum matchings. Both time and space complexities of our LCAs on these problems are …
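For a flavour of the model (a toy illustration only, not the thesis's algorithms or complexity bounds), the classic random-rank LCA for maximal independent set answers "is v in the MIS?" by recursing only on lower-ranked neighbours of v, never touching the rest of the graph:

```python
import random

def mis_query(graph, v, rank=None, _memo=None):
    """Answer "is v in the maximal independent set?" locally:
    fix a random rank per vertex; v is in the MIS iff no
    lower-ranked neighbour of v is. The recursion only explores
    vertices near v, which is the LCA idea in miniature."""
    if rank is None:
        rng = random.Random(0)          # fixed seed: one consistent ranking
        rank = {u: rng.random() for u in graph}
    if _memo is None:
        _memo = {}
    if v not in _memo:
        _memo[v] = all(not mis_query(graph, u, rank, _memo)
                       for u in graph[v] if rank[u] < rank[v])
    return _memo[v]

# Toy path graph; querying every vertex recovers a full MIS.
g = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
mis = {v for v in g if mis_query(g, v)}
```

Because ranks strictly decrease along the recursion, each query terminates after exploring only a neighbourhood of v.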
Finding Adam in random growing trees
, 2015
Abstract
We investigate algorithms to find the first vertex in large trees generated by either the uniform attachment or preferential attachment model. We require the algorithm to output a set of K vertices, such that, with probability at least 1 − ε, the first vertex is in this set. We show that for any ε, there exist such algorithms with K independent of the size of the input tree. Moreover, we provide almost tight bounds for the best value of K as a function of ε. In the uniform attachment case we show that the optimal K is subpolynomial in 1/ε, and that it has to be at least superpolylogarithmic. On the other hand, the preferential attachment case is exponentially harder, as we prove that the best K is polynomial in 1/ε. We conclude the paper with several open problems.
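The two growth models in the abstract are easy to simulate. The sketch below (an illustration of the standard models, with assumed function names) grows a tree in which vertex t attaches either uniformly at random or with probability proportional to current degree:

```python
import random

def grow_tree(n, preferential=False, seed=0):
    """Grow a random tree on n vertices, starting from the edge 0-1.
    Uniform attachment: vertex t picks its parent uniformly among
    vertices 0..t-1. Preferential attachment: the parent is picked
    proportionally to degree (sampled from a list holding each
    vertex once per incident edge)."""
    rng = random.Random(seed)
    parent = {0: None, 1: 0}
    ends = [0, 1]                       # one entry per unit of degree
    for t in range(2, n):
        p = rng.choice(ends) if preferential else rng.randrange(t)
        parent[t] = p
        ends.extend([t, p])
    return parent

uni = grow_tree(10)
pref = grow_tree(10, preferential=True)
```

The "Adam" the paper searches for is vertex 0 here; an algorithm gets only the unlabeled tree and must output a K-element set likely to contain it.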
Ultra-Fast Load Balancing on Scale-Free Networks
Abstract
The performance of large distributed systems crucially depends on efficiently balancing their load. This has motivated a large amount of theoretical research on how an imbalanced load vector can be smoothed with local algorithms. For technical reasons, the vast majority of previous work focuses on regular (or almost regular) graphs including symmetric topologies such as grids and hypercubes, and ignores the fact that large networks are often highly heterogeneous. We model large scale-free networks by Chung-Lu random graphs and analyze a simple local algorithm for iterative load balancing. On n-node graphs our distributed algorithm balances the load within O((log log n)²) steps. It does not need to know the exponent β ∈ (2, 3) of the power-law degree distribution or the weights wi of the graph model. To the best of our knowledge, this is the first result which shows that load balancing can be done in double-logarithmic time on realistic graph classes.
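As a point of reference for what "a simple local algorithm for iterative load balancing" means, here is a generic first-order diffusion step (a standard local scheme, not necessarily the paper's exact rule): along every edge, a fixed fraction of the load difference flows from the heavier endpoint to the lighter one.

```python
def diffusion_step(graph, load, alpha=0.25):
    """One synchronous round of first-order diffusion: along each
    edge, alpha times the load difference flows from the heavier
    endpoint to the lighter one. Total load is conserved."""
    new = dict(load)
    for u in graph:
        for v in graph[u]:
            if load[u] > load[v]:
                flow = alpha * (load[u] - load[v])
                new[u] -= flow
                new[v] += flow
    return new

# All load starts on one endpoint of a 3-vertex path.
g = {0: [1], 1: [0, 2], 2: [1]}
load = {0: 8.0, 1: 0.0, 2: 0.0}
for _ in range(50):
    load = diffusion_step(g, load)
# The vector approaches the balanced state of 8/3 per node.
```

Each node only needs its own load and its neighbours' loads, which is what makes such schemes "local"; the paper's contribution is the double-logarithmic convergence bound on heterogeneous graphs.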
Quick Detection of High-Degree Entities in Large Directed Networks
Abstract
In this paper we address the problem of quick detection of high-degree entities in large online social networks. Practical importance of this problem is attested by a large number of companies that continuously collect and update statistics about popular entities, usually using the degree of an entity as an approximation of its popularity. We suggest a simple, efficient, and easy-to-implement two-stage randomized algorithm that provides highly accurate solutions to this problem. For instance, our algorithm needs only one thousand API requests in order to find the top-100 most followed users, with more than 90% precision, in the online social network Twitter with approximately a billion registered users. Our algorithm significantly outperforms existing methods and serves many different purposes, such as finding the most popular users or the most popular interest groups in social networks. An important contribution of this work is the analysis of the proposed algorithm using Extreme Value Theory, a branch of probability that studies extreme events and properties of largest order statistics in random samples. Using this theory we derive an accurate prediction for the algorithm's performance and show that the number of API requests for finding the top-k most popular entities is sublinear in the number of entities. Moreover, we formally show that the high variability of the entities, expressed through heavy-tailed distributions, is the reason for the algorithm's efficiency. We quantify this phenomenon in a rigorous mathematical way.
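The two-stage idea can be illustrated in a few lines of Python. Everything here (function names, the synthetic network, parameter values) is an assumed toy setup, not the paper's procedure or data: stage 1 samples random users and counts how often each account appears in their following lists, and stage 2 spends API-style queries only on the most frequent candidates.

```python
import random
from collections import Counter

def top_degree_two_stage(follows, degree_of, n1=30, k=1, seed=1):
    """Stage 1: sample n1 random users and count how often each
    account appears in their following lists (popular accounts
    surface quickly). Stage 2: query exact degrees of the most
    frequent candidates only, and return the top-k."""
    rng = random.Random(seed)
    counts = Counter()
    for u in rng.choices(list(follows), k=n1):
        counts.update(follows[u])
    candidates = [c for c, _ in counts.most_common(5 * k)]
    return sorted(candidates, key=degree_of, reverse=True)[:k]

# Synthetic toy network: "star" is followed by all 50 other users.
follows = {f"u{i}": ["star", f"u{(i + 1) % 50}"] for i in range(50)}
follows["star"] = []

def degree_of(x):                       # in-degree = number of followers
    return sum(x in fl for fl in follows.values())

top = top_degree_two_stage(follows, degree_of)
```

The heavy-tailed intuition is visible even here: the high-degree account dominates the stage-1 counts after very few samples, so stage 2 needs only a handful of exact-degree queries.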
Information Gathering in Networks via Active Exploration
Abstract
How should we gather information in a network, where each node's visibility is limited to its local neighborhood? This problem arises in numerous real-world applications, such as surveying and task routing in social networks, team formation in collaborative networks and experimental design with dependency constraints. Often the informativeness of a set of nodes can be quantified via a submodular utility function. Existing approaches for submodular optimization, however, require that the set of all nodes that can be selected is known ahead of time, which is often unrealistic. In contrast, we propose a novel model where we start our exploration from an initial node, and new nodes become visible and available for selection only once one of their neighbors has been chosen. We then present a general algorithm NETEXP for this problem, and provide theoretical bounds on its performance dependent on structural properties of the underlying network. We evaluate our methodology on various simulated problem instances as well as on data collected from a social question answering system deployed within a large enterprise.
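The limited-visibility setting can be sketched as a greedy loop. This is a generic illustration of the model with assumed names and a simple coverage utility, not the paper's NETEXP algorithm or its guarantees: only neighbours of already-selected nodes are available, and each step adds the visible node with the largest marginal gain.

```python
def explore_greedy(graph, start, budget, utility):
    """Greedy selection under local visibility: only neighbours of
    already-selected nodes are visible, and each step adds the
    visible node with the largest marginal utility gain."""
    selected = [start]
    while len(selected) < budget:
        frontier = {v for u in selected for v in graph[u]} - set(selected)
        if not frontier:
            break
        gain = lambda v: utility(selected + [v]) - utility(selected)
        selected.append(max(frontier, key=gain))
    return selected

# Submodular coverage utility: how many distinct nodes the set dominates.
g = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4, 5], 4: [3], 5: [3]}
coverage = lambda S: len(set(S) | {v for u in S for v in g[u]})
picks = explore_greedy(g, 0, 3, coverage)
```

Starting from node 0, the sketch first picks node 1 (it reveals node 3) and then node 3 (it dominates two fresh nodes), ending with full coverage of this toy graph.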