Results

**1 - 5** of **5**

### On Random Walk Based Graph Sampling

, 2015

Random walk based graph sampling has been recognized as a fundamental technique for collecting uniform node samples from a large graph. In this paper, we first present a comprehensive analysis of the drawbacks of three widely used random walk based graph sampling algorithms: the re-weighted random walk (RW) algorithm, the Metropolis-Hastings random walk (MH) algorithm, and the maximum-degree random walk (MD) algorithm. Then, to address the limitations of these algorithms, we propose two general random walk based algorithms, named the rejection-controlled Metropolis-Hastings (RCMH) algorithm and the generalized maximum-degree random walk (GMD) algorithm. We show that RCMH balances the tradeoff between the limitations of RW and MH, and GMD balances the tradeoff between the drawbacks of RW and MD. To further improve the performance of our algorithms, we integrate the delayed acceptance technique and the non-backtracking random walk technique into RCMH and GMD, respectively. We conduct extensive experiments over four real-world datasets, and the results demonstrate the effectiveness of the proposed algorithms.
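As a point of reference for the MH baseline named in this abstract, the following is a minimal sketch of a textbook Metropolis-Hastings random walk that targets the uniform distribution over nodes. It is not taken from the paper: the function name `mh_random_walk`, the toy graph `ADJ`, and the adjacency-list representation are all illustrative assumptions, and the graph is assumed to be undirected and connected.

```python
import random

# Hypothetical toy graph (undirected, connected), stored as adjacency lists.
ADJ = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

def mh_random_walk(adj, start, steps, rng):
    """Textbook Metropolis-Hastings walk whose stationary distribution is uniform."""
    current = start
    visited = []
    for _ in range(steps):
        # Propose a uniformly random neighbor of the current node.
        candidate = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(candidate));
        # on rejection the walk stays where it is for this step.
        if rng.random() < len(adj[current]) / len(adj[candidate]):
            current = candidate
        visited.append(current)
    return visited
```

Rejected proposals keep the walk at the same node, which is the kind of repeated-sample cost that a rejection-controlled variant would presumably aim to reduce.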

### Leveraging History for Faster Sampling of Online Social Networks

With a vast amount of data available on online social networks, enabling efficient analytics over such data has become an increasingly important research problem. Given the sheer size of such social networks, many existing studies resort to sampling techniques that draw random nodes from an online social network through its restrictive web/API interface. While these studies differ widely in the analytics tasks they support and in algorithmic design, almost all of them use the same underlying technique, the random walk: a Markov Chain Monte Carlo based method that iteratively transits from one node to a random neighbor. Random walks fit naturally with this problem because, for most online social networks, the only query we can issue through the interface is to retrieve the neighbors of a given node (i.e., there is no access to the full graph topology). A problem with random walks, however, is the "burn-in" period, which requires a large number of transitions/queries before the sampling distribution converges to the stationary distribution from which samples can be drawn in a statistically valid manner. In this paper, we consider the novel problem of speeding up the fundamental design of random walks (i.e., reducing the number of queries they require) without changing the stationary distribution they achieve, thereby enabling a more efficient "drop-in" replacement for existing sampling-based analytics techniques over online social networks. Technically, our main idea is to leverage the history of random walks to construct a higher-order Markov chain. We develop two algorithms, Circulated Neighbors Random Walk (CNRW) and Groupby Neighbors Random Walk (GNRW), and rigorously prove that, no matter what the social network topology is, CNRW and GNRW offer better efficiency than baseline random walks while achieving the same stationary distribution. We demonstrate through extensive experiments on real-world social networks and synthetic graphs the superiority of our techniques over the existing ones.
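The baseline that this abstract sets out to improve can be sketched as follows: a simple random walk that sees the graph only through a neighbor-lookup call and discards a burn-in prefix. This is an illustrative sketch, not the paper's CNRW or GNRW; the class `NeighborAPI`, the function `simple_random_walk`, and the `burn_in` parameter are hypothetical names, and the in-memory adjacency dictionary merely stands in for a real web/API interface.

```python
import random

class NeighborAPI:
    """Hypothetical stand-in for a restrictive interface: neighbor lookup only."""

    def __init__(self, adj):
        self._adj = adj
        self.queries = 0  # number of neighbor queries issued so far

    def neighbors(self, node):
        self.queries += 1
        return self._adj[node]

def simple_random_walk(api, start, burn_in, num_samples, rng):
    """Walk to a uniformly random neighbor each step; keep nodes after burn-in."""
    current = start
    samples = []
    for step in range(burn_in + num_samples):
        # Each transition costs exactly one query against the interface.
        current = rng.choice(api.neighbors(current))
        if step >= burn_in:
            samples.append(current)
    return samples
```

In this sketch every burn-in transition consumes a query without yielding a sample, which is the cost the abstract describes. Nodes kept from consecutive steps are also correlated, so practical samplers often thin the walk or restart it.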

### Walk, Not Wait: Faster Sampling Over Online Social Networks

In this paper, we introduce a novel, general-purpose technique for faster sampling of nodes over an online social network. Specifically, unlike traditional random walks, which wait for the sampling distribution to converge to a predetermined target distribution (a waiting process that incurs a high query cost), we develop WALK-ESTIMATE, which starts with a much shorter random walk and then proactively estimates the sampling probability of the node taken, before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. We present a novel backward random walk technique that provides provably unbiased estimates of the sampling probability, and demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real-world online social networks.
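The acceptance-rejection step mentioned in this abstract can be illustrated with a generic sketch. It assumes the sampling probability `p` of each node is known exactly, whereas WALK-ESTIMATE estimates it; the function name `rejection_correct`, the `scale` parameter, and the three-node example are hypothetical and not drawn from the paper.

```python
import random

def rejection_correct(draw, p, q, scale, rng):
    """Generic acceptance-rejection: turn draws from p into draws from q.

    Assumes scale >= max over x of q[x] / p[x], and that p is known exactly.
    """
    while True:
        x = draw(rng)
        # Accept x with probability q[x] / (scale * p[x]), which is at most 1.
        if rng.random() < q[x] / (scale * p[x]):
            return x

# Hypothetical example: a short walk lands on nodes with probabilities P,
# while the target distribution Q is uniform over the three nodes.
P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
SCALE = max(Q[x] / P[x] for x in P)

def draw_from_p(rng):
    return rng.choices(list(P), weights=list(P.values()))[0]
```

Calling `rejection_correct(draw_from_p, P, Q, SCALE, random.Random(0))` repeatedly yields nodes that follow `Q`. The expected number of draws per accepted node is `SCALE`, so the closer `P` is to `Q`, the fewer draws are wasted.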

### Design of Efficient Sampling Methods on Hybrid Social-Affiliation Networks Technique Report

Graph sampling via crawling has become increasingly popular and important in measuring various characteristics of large-scale complex networks. While powerful, it is known to be challenging when the graph is loosely connected or disconnected, which slows down the convergence of random walks and can cause poor estimation accuracy. In this work, we observe that the graph under study, called the target graph, usually does not exist in isolation. In many situations, the target graph is related to an auxiliary graph and an affiliation graph, and the target graph becomes well connected when we view these three graphs together as what we call a hybrid social-affiliation graph. When directly sampling the target graph is difficult or inefficient, we can sample it indirectly and efficiently with the assistance of the other two graphs. We design three sampling methods on such a hybrid social-affiliation network. Experiments conducted on both synthetic and real datasets demonstrate the effectiveness of our proposed methods.

### Walk, Not Wait: Faster Sampling Over Online Social Networks

In this paper, we introduce a novel, general-purpose technique for faster sampling of nodes over an online social network. Specifically, unlike traditional random walks, which wait for the sampling distribution to converge to a predetermined target distribution (a waiting process that incurs a high query cost), we develop WALK-ESTIMATE, which starts with a much shorter random walk and then proactively estimates the sampling probability of the node taken, before using acceptance-rejection sampling to adjust the sampling probability to the predetermined target distribution. We present a novel backward random walk technique that provides provably unbiased estimates of the sampling probability, and demonstrate the superiority of WALK-ESTIMATE over traditional random walks through theoretical analysis and extensive experiments over real-world online social networks.