Results 1-10 of 26
Multigraph Sampling of Online Social Networks
 IEEE J. SEL. AREAS COMMUN. ON MEASUREMENT OF INTERNET TOPOLOGIES
, 2011
Cited by 26 (8 self)
State-of-the-art techniques for probability sampling of users of online social networks (OSNs) are based on random walks on a single social relation (typically friendship). While powerful, these methods rely on the social graph being fully connected. Furthermore, the mixing time of the sampling process strongly depends on the characteristics of this graph. In this paper, we observe that there often exist other relations between OSN users, such as membership in the same group or participation in the same event. We propose to exploit the graphs these relations induce by performing a random walk on their union multigraph. We design a computationally efficient way to perform multigraph sampling by randomly selecting the graph on which to walk at each iteration. We demonstrate the benefits of our approach through (i) simulation on synthetic graphs and (ii) measurements of Last.fm, an Internet website for music with social networking features. More specifically, we show that multigraph sampling can obtain a representative sample and faster convergence, even when the individual graphs fail, i.e., are disconnected or highly clustered.
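The idea of walking on the union multigraph by picking one relation per step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `multigraph_walk`, the adjacency-dict representation, and the rule of choosing a relation with probability proportional to the current node's degree in it are assumptions made for the example.

```python
import random


def multigraph_walk(graphs, start, steps, rng=None):
    """Random walk on the union multigraph of several relation graphs.

    graphs: list of dicts mapping node -> list of neighbours, one dict per
            relation (hypothetical representation chosen for this sketch).
    """
    rng = rng or random.Random()
    node = start
    visited = [node]
    for _ in range(steps):
        # Degree of the current node in each relation graph.
        degrees = [len(g.get(node, ())) for g in graphs]
        if sum(degrees) == 0:
            break  # the node is isolated in every relation
        # Assumption: pick a relation with probability proportional to the
        # node's degree in it, then a uniform neighbour in that relation.
        # This equals picking a uniform edge of the union multigraph.
        relation = rng.choices(range(len(graphs)), weights=degrees)[0]
        node = rng.choice(graphs[relation][node])
        visited.append(node)
    return visited
```

With a friendship graph split into two components and a group-membership graph that bridges them, the walk can reach every node even though neither graph alone is connected.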
Network Sampling: From Static to Streaming Graphs
, 2013
Cited by 12 (3 self)
Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network sampling by highlighting the different objectives, population and units of interest, and classes of network sampling methods. In addition, we propose a spectrum of computational models for network sampling methods, ranging from the traditionally studied model based on the assumption of a static domain to a more challenging model that is appropriate for streaming domains. We design a family of sampling methods based on the concept of graph induction that generalize across the full spectrum of computational models (from static to streaming) while efficiently preserving many of the topological properties of the input graphs. Furthermore, we demonstrate how traditional static sampling algorithms can be modified for graph streams for each of the three main classes of sampling methods: node, edge, and topology-based sampling. Experimental results indicate that our proposed family of sampling methods more accurately preserves the underlying properties of the graph in both static and streaming domains. Finally, we study the impact of network sampling algorithms on the parameter estimation and performance evaluation of relational classification algorithms.
Coarse-Grained Topology Estimation via Graph Sampling
, 2012
Cited by 10 (4 self)
In many online networks, nodes are partitioned into categories (e.g., countries or universities in OSNs), which naturally defines a weighted category graph, i.e., a coarse-grained version of the underlying network. In this paper, we show how to efficiently estimate the category graph from a probability sample of nodes. We prove consistency of our estimators and evaluate their efficiency via simulation. We also apply our methodology to a sample of Facebook users to obtain a number of category graphs, such as the college friendship graph and the country friendship graph. We share and visualize the resulting data at www.geosocialmap.com.
Using location-based social networks to validate human mobility and relationships models
 In Proceedings of 2012 IEEE/ACM International Conference on Advances in Social Network Analysis and Mining
Cited by 9 (6 self)
We propose to use social networking data to validate mobility models for pervasive mobile ad hoc networks (MANETs) and delay tolerant networks (DTNs). The Random Waypoint (RWP) [19] and Erdős-Rényi (ER) models have been a popular choice among researchers for generating mobility traces of nodes and relationships between them. Not only are RWP and ER useful in evaluating networking protocols in a simulation environment, but they are also used for theoretical analysis of such dynamic networks. However, it has been observed that neither relationships among people nor their movements are random. Instead, human movements frequently contain repeated patterns, and friendship is bounded by distance. We used the social networking site Gowalla to collect, create, and validate models of human mobility and relationships for analysis and evaluation of applications in opportunistic networks, such as sensor networks and transportation models in civil engineering. In doing so, we hope to provide more human-like movement and social relationship models to researchers studying problems in complex and mobile networks.
Sampling online social networks by random walk
 In ACM SIGKDD Workshop on Hot Topics in Online Social Networks
, 2012
Cited by 8 (3 self)
This paper proposes to use simple random walk, a sampling method supported by most online social networks (OSNs), to estimate a variety of properties of large OSNs. We show that, due to the scale-free nature of OSNs, the estimators derived from the random walk sampling scheme are much better than those from uniform random sampling, even when uniform random samples are available and their notoriously high cost is disregarded. The paper first proposes to use the harmonic mean to estimate the average degree of OSNs. The accurate estimation of the average degree leads to the discovery of other properties, such as the population size, the heterogeneity of the degrees, the number of friends of friends, the threshold value for messages to reach a large component, and the Gini coefficient of the population. The method is validated on complete Twitter data from 2009 that contains 42 million nodes and 1.5 billion edges.
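The harmonic-mean estimator mentioned above can be illustrated with a short sketch. A simple random walk visits nodes with probability proportional to their degree, so the arithmetic mean of the sampled degrees overestimates the average degree, while the harmonic mean corrects for this bias. The function name `harmonic_mean_degree` is an assumption made for the example, not a name taken from the paper.

```python
def harmonic_mean_degree(sampled_degrees):
    """Estimate the average degree from degrees of nodes visited by a
    simple random walk (nodes are sampled proportionally to degree)."""
    if not sampled_degrees:
        raise ValueError("need at least one sampled degree")
    # Harmonic mean: n / sum(1/d_i). Each sample is implicitly weighted by
    # 1/d_i, which cancels the degree-proportional sampling bias.
    return len(sampled_degrees) / sum(1.0 / d for d in sampled_degrees)
```

For a three-node path (degrees 1, 2, 1, true average 4/3), an idealized degree-proportional sample is `[1, 2, 2, 1]`. Its arithmetic mean is 1.5, whereas the harmonic mean recovers 4/3.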
Space-efficient sampling from social activity streams
 In BigMine
, 2012
Cited by 4 (2 self)
In order to efficiently study the characteristics of network domains and support the development of network systems (e.g., algorithms and protocols that operate on networks), it is often necessary to sample a representative subgraph from a large complex network. Although recent subgraph sampling methods have been shown to work well, they focus on sampling from memory-resident graphs and assume that the sampling algorithm can access the entire graph in order to decide which nodes/edges to select. Many large-scale network datasets, however, are too large and/or dynamic to be processed using main memory (e.g., email, tweets, wall posts). In this work, we formulate the problem of sampling from large graph streams. We propose a streaming graph sampling algorithm that dynamically maintains a representative sample in a reservoir-based setting. We evaluate the efficacy of our proposed methods empirically using several real-world data sets. Across all datasets, we found that our method produces samples that better preserve the original graph distributions.
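The reservoir-based setting can be illustrated with classic reservoir sampling over an edge stream. This is the textbook uniform scheme, shown only to make the setting concrete. It is not the streaming algorithm proposed in the paper, and the function name `reservoir_sample_edges` is an assumption made for the example.

```python
import random


def reservoir_sample_edges(edge_stream, k, rng=None):
    """Keep a uniform sample of at most k edges from a stream of unknown
    length, using O(k) memory and one pass over the stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, edge in enumerate(edge_stream):
        if i < k:
            # Fill the reservoir with the first k edges.
            reservoir.append(edge)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = edge
    return reservoir
```

Because the stream is consumed one edge at a time, the full graph never has to reside in memory.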
Sampling social networks using shortest paths
Cited by 2 (2 self)
In recent years, online social networks (OSNs) have emerged as a platform for sharing a variety of information about people and their interests, activities, events, and news from the real world. Due to the large scale and access limitations (e.g., privacy policies) of online social network services such as Facebook and Twitter, it is difficult to access the whole public network in a limited amount of time. For this reason, researchers try to study and characterize OSNs by taking appropriate and reliable samples from the network. In this paper, we propose to use the concept of shortest path for sampling social networks. The proposed sampling method first finds the shortest paths between several pairs of nodes selected according to some criteria. Then the edges in these shortest paths are ranked according to the number of times that each edge has appeared in the set of found shortest paths. The sampled network is then computed as a subgraph of the social network which contains a percentage of the highly ranked edges. In order to investigate the performance of the proposed sampling method, we provide a number of experiments on synthetic and real networks. Experimental results show that the proposed sampling method outperforms existing methods such as random edge sampling, random node sampling, random walk sampling, and Metropolis-Hastings random walk sampling in terms of relative error (RE), normalized root mean square error (NMSE), and the Kolmogorov-Smirnov (KS) test.
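The three steps described above (find shortest paths, rank edges by how often they appear, keep the top-ranked fraction) can be sketched as follows. The abstract does not state how node pairs are selected, so uniform random pairs are used here purely as an assumption. The names `bfs_path` and `shortest_path_sample` are hypothetical, and the graph is assumed to be unweighted with sortable node labels.

```python
import random
from collections import Counter, deque


def bfs_path(adj, source, target):
    """Return one shortest path from source to target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def shortest_path_sample(adj, num_pairs, keep_fraction, rng=None):
    """Rank edges by how often they occur on sampled shortest paths and
    return the top keep_fraction of the ranked edges."""
    rng = rng or random.Random()
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(num_pairs):
        # Assumption: node pairs are drawn uniformly at random.
        s, t = rng.sample(nodes, 2)
        path = bfs_path(adj, s, t)
        if path is None:
            continue  # the pair lies in different components
        for u, v in zip(path, path[1:]):
            counts[frozenset((u, v))] += 1
    ranked = [edge for edge, _ in counts.most_common()]
    keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return [tuple(sorted(edge)) for edge in ranked[:keep]]
```

The returned edge list defines the sampled subgraph. Edges that lie on many shortest paths are kept first.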
Online myopic network covering
 CoRR
Cited by 2 (1 self)
Efficient marketing or awareness-raising campaigns seek to recruit n influential individuals, where n is the campaign budget, who are able to cover a large target audience through their social connections. So far, most of the related literature on maximizing this network cover assumes that the social network topology is known. Even in such a case, finding the optimal solution is NP-hard. In practice, however, the network topology is generally unknown and needs to be discovered on-the-fly. In this work we consider an unknown topology where recruited individuals disclose their social connections (a feature known as one-hop lookahead). The goal of this work is to provide an efficient greedy online algorithm that recruits individuals so as to maximize the size of the target audience covered by the campaign. We propose a new greedy online algorithm, Maximum Expected d-Excess Degree (MEED), and provide, to the best of our knowledge, the first detailed theoretical analysis of the cover size of a variety of well-known network sampling algorithms on finite networks. Our proposed algorithm greedily maximizes the expected size of the cover. For a class of random power law networks we show that MEED simplifies into a straightforward procedure, which we denote MOD (Maximum Observed Degree). We substantiate our analytical results with extensive simulations and show that MOD significantly outperforms all analyzed myopic algorithms. We note that performance may be further improved if the node degree distribution is known or can be estimated online during the campaign.
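A greedy rule in the spirit of MOD can be sketched as follows, under one-hop lookahead. The abstract does not define "observed degree" precisely. The sketch assumes it is the number of already-recruited individuals who have disclosed a connection to the candidate, and it assumes the campaign starts from a given seed. The name `mod_recruit` and the tie-breaking rule (first candidate observed wins) are likewise assumptions, not details from the paper.

```python
def mod_recruit(adj, seed, budget):
    """Greedy recruitment: repeatedly recruit the observed, not yet
    recruited node with the largest observed degree.

    adj is consulted only for recruited nodes, which models one-hop
    lookahead: a recruited individual discloses its connections.
    Returns (recruited nodes in order, set of covered nodes).
    """
    recruited = [seed]
    recruited_set = {seed}
    covered = {seed} | set(adj[seed])
    # observed[v] = number of recruited nodes that disclosed an edge to v
    # (assumed definition of the observed degree).
    observed = {}
    for v in adj[seed]:
        observed[v] = observed.get(v, 0) + 1
    while len(recruited) < budget and observed:
        best = max(observed, key=observed.get)
        del observed[best]
        recruited.append(best)
        recruited_set.add(best)
        covered.add(best)
        covered.update(adj[best])
        for v in adj[best]:
            if v not in recruited_set:
                observed[v] = observed.get(v, 0) + 1
    return recruited, covered
```

The cover is the set of recruited individuals together with their disclosed connections. Its size is the quantity the campaign tries to maximize within the budget.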
Discover Hidden Web Properties by Random Walk on Bipartite Graph
Cited by 1 (1 self)
This paper proposes to use random walk to discover the properties of deep web data sources that are hidden behind searchable interfaces. The properties, such as the average degree and population size of both documents and terms, are of interest to the general public and find applications in business intelligence, data integration, and deep web crawling. We show that simple random walk (RW) sampling can outperform uniform random (UR) sampling, even disregarding the high cost of uniform random sampling. We prove that in the idealized case when the degrees follow Zipf's law, the sample size of UR sampling needs to grow in the order of O(N / ln^2 N) with the corpus size N, while the sample size of RW sampling grows logarithmically. The Reuters corpus is used to demonstrate that the term degrees resemble a power law distribution, and thus RW is better than UR sampling. On the other hand, document degrees have a lognormal distribution and exhibit a smaller variance, and therefore UR sampling is slightly better.
On Random Walk Based Graph Sampling
, 2015
Random walk based graph sampling has been recognized as a fundamental technique to collect uniform node samples from a large graph. In this paper, we first present a comprehensive analysis of the drawbacks of three widely used random walk based graph sampling algorithms: the re-weighted random walk (RW) algorithm, the Metropolis-Hastings random walk (MH) algorithm, and the maximum-degree random walk (MD) algorithm. Then, to address the limitations of these algorithms, we propose two general random walk based algorithms, named the rejection-controlled Metropolis-Hastings (RCMH) algorithm and the generalized maximum-degree random walk (GMD) algorithm. We show that RCMH balances the tradeoff between the limitations of RW and MH, and GMD balances the tradeoff between the drawbacks of RW and MD. To further improve the performance of our algorithms, we integrate the so-called delayed acceptance technique and the non-backtracking random walk technique into RCMH and GMD, respectively. We conduct extensive experiments over four real-world datasets, and the results demonstrate the effectiveness of the proposed algorithms.
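For orientation, the baseline Metropolis-Hastings random walk (MH) named above can be sketched as follows. This is the standard textbook form, which targets a uniform stationary distribution over nodes. It is not the RCMH or GMD algorithm proposed in the paper, and the function name `metropolis_hastings_walk` and the adjacency-dict representation are assumptions made for the example.

```python
import random


def metropolis_hastings_walk(adj, start, steps, rng=None):
    """Metropolis-Hastings random walk on an undirected graph.

    A uniformly chosen neighbour v of the current node u is accepted with
    probability min(1, deg(u) / deg(v)); otherwise the walk stays at u.
    The resulting stationary distribution over nodes is uniform.
    """
    rng = rng or random.Random()
    node = start
    samples = [node]
    for _ in range(steps):
        candidate = rng.choice(adj[node])
        # Moves toward higher-degree nodes are rejected more often, which
        # removes the degree bias of a simple random walk.
        if rng.random() < len(adj[node]) / len(adj[candidate]):
            node = candidate
        samples.append(node)
    return samples
```

Rejected proposals leave the walk in place and produce repeated samples, which is one source of the inefficiency that motivates variants of MH.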