Results 11–20 of 267
Time-varying graphs and dynamic networks
 International Journal of Parallel, Emergent and Distributed Systems
Abstract

Cited by 61 (21 self)
The past few years have seen intensive research efforts carried out in some apparently unrelated areas of dynamic systems – delay-tolerant networks, opportunistic-mobility networks, social networks – obtaining closely related insights. Indeed, the concepts discovered in these investigations can be viewed as parts of the same conceptual universe; and the formal models proposed so far to express some specific concepts are components of a larger formal description of this universe. The main contribution of this paper is to integrate the vast collection of concepts, formalisms, and results found in the literature into a unified framework, which we call TVG (for time-varying graphs). Using this framework, it is possible to express directly in the same formalism not only the concepts common to all those different areas, but also those specific to each. Based on this definitional work, employing both existing results and original observations, we present a hierarchical classification of TVGs; each class corresponds to a significant property examined in the distributed computing literature. We then examine how TVGs can be used to study the evolution of network properties, and propose different techniques, depending on whether the indicators for these properties are atemporal (as in the majority of existing studies) or temporal. Finally, we briefly discuss the introduction of randomness in TVGs.
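As a rough illustration of the framework's core objects, here is a minimal sketch of a time-varying graph with interval-based edge presence and an earliest-arrival test for a journey (time-respecting path). All names, and the assumption that crossing an edge is instantaneous, are ours, not the paper's:

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class TVG:
    """Minimal time-varying graph: each directed edge carries a list of
    presence intervals [start, end) during which it exists."""
    presence: dict = field(default_factory=dict)  # (u, v) -> [(start, end)]

    def add_edge(self, u, v, start, end):
        self.presence.setdefault((u, v), []).append((start, end))

    def edge_present(self, u, v, t):
        return any(s <= t < e for s, e in self.presence.get((u, v), []))

    def journey_exists(self, src, dst, t0):
        """Earliest-arrival search for a journey departing src no earlier
        than t0; a walker may wait at a node for an edge to appear."""
        best = {src: t0}
        heap = [(t0, src)]
        while heap:
            t, u = heapq.heappop(heap)
            if u == dst:
                return True
            for (a, b), intervals in self.presence.items():
                if a != u:
                    continue
                for s, e in intervals:
                    arrive = max(t, s)          # wait for the edge if needed
                    if arrive < e and arrive < best.get(b, float("inf")):
                        best[b] = arrive
                        heapq.heappush(heap, (arrive, b))
        return False
```

Note the temporal asymmetry this captures: an edge sequence usable at one starting time may be useless at a later one, which is exactly what atemporal indicators miss.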
The Internet is Flat: Modeling the Transition from a Transit Hierarchy to a Peering Mesh
 in Proceedings of ACM CoNEXT
, 2010
Abstract

Cited by 51 (6 self)
Recent measurements and anecdotal evidence indicate that the Internet ecosystem is rapidly evolving from a multi-tier hierarchy built mostly with transit (customer-provider) links to a dense mesh formed with mostly peering links. This transition can have major impact on the global Internet economy as well as on the traffic flow and topological structure of the Internet. In this paper, we study this evolutionary transition with an agent-based network formation model that captures key aspects of the interdomain ecosystem, viz., interdomain traffic flow and routing, provider and peer selection strategies, geographical constraints, and the economics of transit and peering interconnections. The model predicts several substantial differences between the Hierarchical Internet and the Flat Internet in terms of topological structure, path lengths, interdomain traffic flow, and the profitability of transit providers. We also quantify the effect of the three factors driving this evolutionary transition. Finally, we examine a hypothetical scenario in which a large content provider produces more than half of the total Internet traffic.
Effective label acquisition for collective classification
 in International Conference on Knowledge Discovery and Data mining
, 2008
Abstract

Cited by 38 (6 self)
Information diffusion, viral marketing, and collective classification all attempt to model and exploit the relationships in a network to make inferences about the labels of nodes. A variety of techniques have been introduced and methods that combine attribute information and neighboring label information have been shown to be effective for collective labeling of the nodes in a network. However, in part because of the correlation between node labels that the techniques exploit, it is easy to find cases in which, once a misclassification is made, incorrect information propagates throughout the network. This problem can be mitigated if the system is allowed to judiciously acquire the labels for a small number of nodes. Unfortunately, under relatively general assumptions, determining the optimal set of labels to acquire is intractable. Here we propose an acquisition method that learns the cases when a given collective classification algorithm makes mistakes, and suggests acquisitions to correct those mistakes. We empirically show on both real and synthetic datasets that this method significantly outperforms a greedy approximate inference approach, a viral marketing approach, and approaches based on network structural measures such as node degree and network clustering. In addition to significantly improving accuracy with just a small amount of labeled data, our method is tractable on large networks.
Practical recommendations on crawling online social networks
 IEEE Journal on Selected Areas in Communications
, 2011
Abstract

Cited by 37 (1 self)
Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows us to estimate any user property and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth.” In contrast, using Breadth-First Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, and in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and we collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.
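The two unbiased sampling schemes the abstract compares can be sketched in a few lines; this toy version on an adjacency-list graph (function names and the attribute-mean use case are ours) shows why both recover uniform estimates despite the degree bias of a plain random walk:

```python
import random

def mhrw_sample(adj, start, steps, seed=0):
    """Metropolis-Hastings random walk: propose a uniform neighbour v of u
    and accept with prob min(1, deg(u)/deg(v)); rejections keep the walk
    in place, and the stationary distribution over nodes is uniform."""
    rng = random.Random(seed)
    u, samples = start, []
    for _ in range(steps):
        v = rng.choice(adj[u])
        if rng.random() < min(1.0, len(adj[u]) / len(adj[v])):
            u = v
        samples.append(u)
    return samples

def rwrw_estimate(adj, start, steps, value, seed=0):
    """Re-weighted random walk: a simple RW visits nodes with probability
    proportional to degree, so weight each visit by 1/deg(u)
    (Hansen-Hurwitz estimator) to recover the uniform mean of value()."""
    rng = random.Random(seed)
    u, num, den = start, 0.0, 0.0
    for _ in range(steps):
        u = rng.choice(adj[u])
        w = 1.0 / len(adj[u])
        num += w * value(u)
        den += w
    return num / den
```

On a star graph with three leaves, both methods converge to the uniform leaf fraction of 0.75, whereas an unadjusted RW spends half its time at the hub.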
X-Stream: Edge-centric Graph Processing using Streaming Partitions
Abstract

Cited by 35 (2 self)
X-Stream is a system for processing both in-memory and out-of-core graphs on a single shared-memory machine. While retaining the scatter-gather programming model with state stored in the vertices, X-Stream is novel in (i) using an edge-centric rather than a vertex-centric implementation of this model, and (ii) streaming completely unordered edge lists rather than performing random access. This design is motivated by the fact that sequential bandwidth for all storage media (main memory, SSD, and magnetic disk) is substantially larger than random access bandwidth. We demonstrate that a large number of graph algorithms can be expressed using the edge-centric scatter-gather model. The resulting implementations scale well in terms of number of cores, in terms of number of I/O devices, and across different storage media. X-Stream competes favorably with existing systems for graph processing. Besides sequential access, we identify as one of the main contributors to better performance the fact that X-Stream does not need to sort edge lists during preprocessing.
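The edge-centric scatter-gather style can be illustrated with connected components. This is our toy single-machine example (the streaming partitions and out-of-core machinery are omitted); note that each iteration makes one sequential pass over the completely unordered edge list and never builds a per-vertex edge index:

```python
def edge_centric_cc(num_vertices, edges):
    """Connected components in edge-centric scatter-gather style: each
    round streams the unordered edge list once, propagating the smallest
    component label across every edge, then applies the updates."""
    label = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        updates = []                      # scatter: one pass over all edges
        for u, v in edges:
            if label[u] < label[v]:
                updates.append((v, label[u]))
            elif label[v] < label[u]:
                updates.append((u, label[v]))
        for w, lab in updates:            # gather: apply updates to vertices
            if lab < label[w]:
                label[w] = lab
                changed = True
    return label
```

Because the scatter pass is order-independent, shuffling the edge list changes nothing, which is what lets the real system stream edges sequentially from any medium.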
Revisiting the Nyström method for improved large-scale machine learning
Abstract

Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and non-uniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds – e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
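The Nyström sampling idea itself fits in a few lines. This sketch uses uniform landmark selection only; the leverage-score sampling variants and the paper's error bounds are not reproduced here:

```python
import numpy as np

def nystrom(K, idx):
    """Nyström approximation of an SPSD matrix from landmark columns idx:
    K ~ C W+ C^T, with C = K[:, idx] and W the landmark-landmark block."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Exactly low-rank sanity check: when the landmarks span the range of K,
# the Nyström reconstruction is exact.
rng = np.random.RandomState(0)
X = rng.randn(6, 2)
K = X @ X.T                     # rank-2 SPSD kernel matrix
K_approx = nystrom(K, [0, 1, 2])
```

For full-rank kernel matrices the approximation is no longer exact, and the quality depends heavily on which columns are sampled, which is precisely the uniform-versus-leverage-score question the paper studies.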
HADI: Mining radii of large graphs
 ACM Transactions on Knowledge Discovery from Data
, 2010
Abstract

Cited by 33 (10 self)
Given large, multi-million node graphs (e.g., Facebook, web crawls, etc.), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this paper we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. There are two major contributions in this paper: (a) we propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm to compute the radii and the diameter of massive graphs, that runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines; (b) we run HADI on several real-world datasets including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, like the surprisingly small effective diameter, the multimodal/bimodal shape of the Radius plot, and its palindrome motion over time.
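The per-hop iteration that HADI parallelizes can be shown with exact reachability bitmasks. This exact single-machine analogue is our simplification: HADI replaces the bitmasks with Flajolet-Martin sketches so per-node state stays small, and runs each round as a MapReduce job:

```python
def radii_by_bitmask(adj, n):
    """Node v keeps a bitmask of the nodes reachable within h hops; each
    round ORs in its neighbours' masks, so after h rounds the popcount of
    mask[v] is |N(v, h)|.  A node's radius (eccentricity) is the last
    round at which its mask grew; the diameter is the largest radius."""
    mask = [1 << v for v in range(n)]
    radius = [0] * n
    h = 0
    while True:
        h += 1
        new = mask[:]
        for u in range(n):
            for v in adj.get(u, []):
                new[u] |= mask[v]
        grew = [i for i in range(n) if new[i] != mask[i]]
        if not grew:
            return radius
        for i in grew:
            radius[i] = h
        mask = new
```

The Radius plot the paper defines is then just the histogram of the returned radii.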
Correcting for Missing Data in Information Cascades
, 2010
Abstract

Cited by 33 (2 self)
Transmission of infectious diseases, propagation of information, and spread of ideas and influence through social networks are all examples of diffusion. In such cases we say that a contagion spreads through the network, a process that can be modeled by a cascade graph. Studying cascades and network diffusion is challenging due to missing data. Even a single missing observation in a sequence of propagation events can significantly alter our inferences about the diffusion process. We address the problem of missing data in information cascades. Specifically, given only a fraction C′ of the complete cascade C, our goal is to estimate the properties of the complete cascade C, such as its size or depth. To estimate the properties of C, we first formulate a k-tree model of cascades and analytically study its properties in the face of missing data. We then propose a numerical method that, given a cascade model and an observed cascade C′, can estimate properties of the complete cascade C. We evaluate our methodology using information propagation cascades in the Twitter network (70 million nodes and 2 billion edges), as well as information cascades arising in the blogosphere. Our experiments show that the k-tree model is an effective tool to study the effects of missing data in cascades. Most importantly, we show that our method (and the k-tree model) can accurately estimate properties of the complete cascade C even when 90% of the data is missing.
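The paper's estimator is model-based (built on the k-tree model); as a far simpler illustration of the correction idea only, assume each cascade node is observed independently with probability p and invert that bias. Everything below, including the independent-sampling assumption, is our toy setup, not the paper's method:

```python
import random

def sample_cascade(size, p, rng):
    """Observe each of `size` cascade nodes independently with prob p."""
    return sum(rng.random() < p for _ in range(size))

def corrected_size(observed, p):
    """Inverse-probability (Horvitz-Thompson style) estimate of full size."""
    return observed / p

rng = random.Random(1)
true_size = 500
p = 0.1                                   # i.e., 90% of the data missing
estimates = [corrected_size(sample_cascade(true_size, p, rng), p)
             for _ in range(200)]
mean_estimate = sum(estimates) / len(estimates)
```

Structural properties such as depth do not invert this simply, which is why the paper needs an explicit cascade model rather than per-node reweighting.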
Dynamics of conversations
 In Proc. KDD
, 2010
Abstract

Cited by 30 (1 self)
How do online conversations build? Is there a common model that is followed in human communication? In this work we explore these questions in detail. By considering three different social datasets, namely, Usenet groups, Yahoo! Groups, and Twitter, we analyze the structure of conversations in each of these datasets. We propose simple mathematical models for the generation of basic conversation structures and then refine these models to take into account the identities of each member of the conversation.
Stochastic Kronecker graphs
 Proceedings of the 5th Workshop on Algorithms and Models for the Web-Graph
, 2007
Abstract

Cited by 27 (2 self)
A random graph model based on Kronecker products of probability matrices has been recently proposed as a generative model for large-scale real-world networks such as the web. This model simultaneously captures several well-known properties of real-world networks; in particular, it gives rise to a heavy-tailed degree distribution, has a low diameter, and obeys the densification power law. Most properties of Kronecker products of graphs (such as connectivity and diameter) are only rigorously analyzed in the deterministic case. In this paper, we study the basic properties of stochastic Kronecker products based on an initiator matrix of size two (which is the case that is shown to provide the best fit to many real-world networks). We will show a phase transition for the emergence of the giant component and another phase transition for connectivity, and prove that such graphs have constant diameters beyond the connectivity threshold, but are not searchable using a decentralized algorithm.
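With a 2x2 initiator, the edge probability has a simple closed form over the bit labels of the two endpoints, which makes a brute-force sampler easy to sketch (our function name and O(n²) enumeration; practical generators avoid touching every pair):

```python
import random

def stochastic_kronecker(theta, k, seed=0):
    """Sample a stochastic Kronecker graph on n = 2**k vertices: theta is
    the 2x2 initiator matrix of probabilities, and edge (u, v) appears
    independently with probability prod_i theta[bit_i(u)][bit_i(v)],
    i.e. one initiator entry per bit position of the endpoint labels."""
    rng = random.Random(seed)
    n = 2 ** k
    edges = set()
    for u in range(n):
        for v in range(n):
            p = 1.0
            for i in range(k):
                p *= theta[(u >> i) & 1][(v >> i) & 1]
            if rng.random() < p:
                edges.add((u, v))
    return edges
```

Degenerate initiators make the structure visible: the identity initiator yields only self-loops, and the all-ones initiator yields the complete graph with loops.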