Directed Graph Exploration
Cited by 2 (0 self)
We study the problem of exploring all nodes of an unknown directed graph. A searcher has to construct a tour that visits all nodes, but only has information about the parts of the graph it has already visited. The goal is to minimize the cost of such a tour. In this paper, we present upper and lower bounds for both the deterministic and the randomized online version of exploring all nodes of directed graphs. Our bounds are sharp or sharp up to a small constant, depending on the specific model. Essentially, exploring a directed graph has a multiplicative overhead linear in the number of nodes. If one wants to search for just a node in unweighted directed graphs, a greedy algorithm with quadratic multiplicative overhead can only be improved by a factor of at most two. We also show that randomly choosing a starting point does not improve lower bounds beyond a small constant factor.
Sampling social networks using shortest paths
Cited by 2 (2 self)
In recent years, online social networks (OSNs) have emerged as a platform for sharing a variety of information about people and their interests, activities, events, and news from the real world. Due to the large scale and access limitations (e.g., privacy policies) of online social network services such as Facebook and Twitter, it is difficult to access the whole public network in a limited amount of time. For this reason, researchers try to study and characterize OSNs by taking appropriate and reliable samples from the network. In this paper, we propose to use the concept of shortest paths for sampling social networks. The proposed sampling method first finds the shortest paths between several pairs of nodes selected according to some criteria. Then the edges in these shortest paths are ranked according to the number of times each edge appears in the set of found shortest paths. The sampled network is then computed as a subgraph of the social network that contains a percentage of the highly ranked edges. To investigate the performance of the proposed sampling method, we provide a number of experiments on synthetic and real networks. Experimental results show that the proposed sampling method outperforms existing methods such as random edge sampling, random node sampling, random walk sampling, and Metropolis-Hastings random walk sampling in terms of relative error (RE), normalized root mean square error (NMSE), and the Kolmogorov-Smirnov (KS) test.
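The three steps in this abstract (find shortest paths, rank edges by how often they appear, keep the top-ranked share) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how node pairs are selected, so the sketch picks them uniformly at random, treats the graph as unweighted and undirected, and interprets the "percentage" as a fraction of the edges that appeared on at least one path. The names `bfs_shortest_path`, `shortest_path_sample`, `num_pairs`, and `keep_fraction` are hypothetical.

```python
import random
from collections import Counter, deque


def bfs_shortest_path(adj, source, target):
    """Return one shortest path (list of nodes) in an unweighted graph, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None  # target not reachable from source


def shortest_path_sample(adj, num_pairs, keep_fraction, seed=0):
    """Rank edges by how often they lie on sampled shortest paths; keep the top share."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    edge_counts = Counter()
    for _ in range(num_pairs):
        # Assumption: pairs are chosen uniformly at random.
        s, t = rng.sample(nodes, 2)
        path = bfs_shortest_path(adj, s, t)
        if path is None:
            continue
        for u, v in zip(path, path[1:]):
            edge_counts[frozenset((u, v))] += 1
    ranked = [edge for edge, _ in edge_counts.most_common()]
    if not ranked:
        return []
    # Assumption: the kept share is a fraction of the edges seen on some path.
    keep = max(1, int(keep_fraction * len(ranked)))
    return [tuple(sorted(edge)) for edge in ranked[:keep]]
```

For example, calling `shortest_path_sample(adj, num_pairs=100, keep_fraction=0.2)` on an adjacency dictionary `adj` (node to list of neighbours) returns the edges of the sampled subgraph as sorted node pairs.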
OF
, 2015
In recent years, online social networks have become a very popular and effective forum for information exchange. These large, highly interconnected networks span the globe and can disseminate information in a fraction of the time it would take other communication networks. Given the myriad ways in which online social networks can be used, creating accurate, predictive models for the spread of information across them is very valuable. That said, modeling processes on large networks is a difficult task. It is computationally expensive, and usually prohibitive, to model a process on the entirety of a very large network. Given these complexities, creating smaller network graphs that are characteristically similar to the original network graphs enables researchers to run models that are otherwise not feasible. This project aims to create prototypic networks and model the spread of information across them using network-based epidemiological models to better understand how information spreads across an online social network. More specifically,
Challenges of Forecasting and Measuring a Complex Networked World
A new era of data analytics of online social networks promises tremendous high-impact societal, business, and healthcare applications. As more users join online social networks, the data available for analysis and forecast of human social and collective behavior grows at an incredible pace. The first part of this talk introduces an apparent paradox, where larger online social networks entail more user data but also less analytic and forecasting capabilities
CHARACTERIZING ACCURACY AND PERFORMANCE TRADEOFFS IN GRAPH SAMPLING FOR GRAPH PROPERTY COMPUTATIONS
, 2014
In this thesis, we present a systematic way to characterize the tradeoffs between accuracy and cost in graph sampling. This characterization is heavily dependent on graph structure. Here we focus on vector graph properties, which consist of a value per node in the graph (e.g., PageRank, degree). We present a new technique for assessing the accuracy of a property based on the algorithm used to compute it. Next, we describe how to interpret several features of accuracy-performance tradeoff curves. Finally, we present our analysis of actual accuracy-cost curves for both real-world and synthetic graphs. Conclusions from the analysis include that the structure of a graph is more important than its scale for the purposes of sampling, and that different structures require different sampling approaches.
Sampling Node Pairs Over Large Graphs
Characterizing user pair relationships is important for applications such as friend recommendation and interest targeting in online social networks (OSNs). Due to the large scale of such networks, it is infeasible to enumerate all user pairs, so sampling is used. In this paper, we show that it is a great challenge for OSN service providers to characterize user pair relationships even when they possess the complete graph topology. The reason is that when sampling techniques (i.e., uniform vertex sampling (UVS) and random walk (RW)) are naively applied, they can introduce large biases, in particular when estimating the similarity distribution of user pairs with constraints such as the existence of mutual neighbors, which is important for applications such as identifying network homophily. Estimating statistics of user pairs is more challenging in the absence of complete topology information, since an unbiased sampling technique such as UVS is usually not allowed, and exploring the OSN graph topology is expensive. To address these challenges, we present asymptotically unbiased sampling methods to characterize user pair properties based on UVS and RW techniques respectively. We carry out an evaluation of our methods to show their accuracy and efficiency. Finally, we apply our methods to two Chinese OSNs, Douban and Xiami, and discover that significant homophily is present in these two networks.
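To illustrate why a naively applied random walk is biased, the sketch below uses the standard single-node case rather than the paper's node-pair estimators, which the abstract does not spell out. On a connected, non-bipartite undirected graph, a simple random walk visits a node with long-run frequency proportional to its degree, so an asymptotically unbiased estimate of a node average re-weights each visit by the inverse degree. The function names and the adjacency-dictionary representation are assumptions made for this example.

```python
import random


def random_walk(adj, start, steps, seed=0):
    """Simple random walk: move to a uniformly chosen neighbour at each step."""
    rng = random.Random(seed)
    node = start
    visited = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited


def naive_mean(walk, f):
    """Plain average over visited nodes; biased toward high-degree nodes."""
    return sum(f(v) for v in walk) / len(walk)


def reweighted_mean(adj, walk, f):
    """Weight each visit by 1/degree to undo the degree bias of the walk."""
    weights = [1.0 / len(adj[v]) for v in walk]
    return sum(w * f(v) for v, w in zip(walk, weights)) / sum(weights)
```

On a regular graph every node has the same degree, so `naive_mean` and `reweighted_mean` coincide; the two diverge once degrees vary, which is the typical situation in an OSN.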
A new learning automata based sampling algorithm for social networks
Preprint submitted to International Journal of Communication Systems
Recently, studying social networks has come to play a significant role in many applications of social network analysis, from characterizing networks to financial applications. Due to the large data volume and privacy issues of social network services, only limited local access to the whole network data is possible in a reasonable amount of time. Therefore, network sampling has arisen as a way to study the characteristics of real networks such as communication, technological, information, and social networks. In this paper, we propose a sampling algorithm for complex social networks based on a recently reported version of distributed learning automata (DLA) called extended distributed learning automata (eDLA). For evaluation, the eDLA-based sampling algorithm was tested on several test networks, and the experimental results are compared with those obtained for a number of well-known sampling algorithms in terms of relative error (RE) and the Kolmogorov-Smirnov (KS) test. The eDLA-based sampling algorithm is shown to outperform the existing sampling algorithms. Experimental results further show that, compared with the DLA-based sampling algorithm, the eDLA-based sampling algorithm achieves a 26.93% improvement in the average KS value for degree distribution over all test networks.
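The two evaluation measures named here can be sketched with their common textbook definitions; the abstract does not give the exact formulas used in the paper, so treat these as assumptions. The KS statistic is taken as the largest gap between the empirical cumulative distributions of two samples (for instance, the degree sequences of the original and the sampled network), and the relative error as the absolute deviation of an estimate divided by the true value. The function names are hypothetical.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def relative_error(estimate, truth):
    """Absolute deviation of the estimate, relative to the true value."""
    return abs(estimate - truth) / abs(truth)
```

A KS value of 0 means the two degree distributions are identical, and a value of 1 means they do not overlap at all, so a lower value indicates a sample that preserves the degree distribution better.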
Efficiently Estimating Motif Statistics of Large Networks
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and online social networks (OSNs). Nowadays the massive size of some critical networks – often stored in already overloaded relational databases – effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propose sampling methods to accurately estimate subgraph statistics from as few queried nodes as possible. We present sampling algorithms that efficiently and accurately estimate subgraph properties of massive networks. Our algorithms require no precomputation or complete network topology information. At the same time, we provide theoretical guarantees of convergence. We perform experiments using widely known data sets, and show that for the same accuracy, our algorithms require an order of magnitude fewer queries (samples) than the current state-of-the-art algorithms.
Social network sampling using spanning trees
International Journal of Modern Physics C, 2015
Due to the large scale of most online social networks and the limitations on accessing them, it is hard or infeasible to access them directly in a reasonable amount of time for study and analysis. Hence, network sampling has emerged as a suitable technique to study and analyze real networks. The main goal of sampling online social networks is to construct a small-scale sampled network that preserves the most important properties of the original network. In this paper, we propose two sampling algorithms for sampling online social networks using spanning trees. The first proposed sampling algorithm finds several spanning trees from randomly chosen starting nodes; then the edges in these spanning trees are ranked according to the number of times each edge appears in the set of found spanning trees in the given network. The sampled network is then constructed as a subgraph of the original network that contains a fraction of the nodes incident on highly ranked edges. To avoid traversing the entire network, the second sampling algorithm is similar to the first except that it uses partial spanning trees. Several experiments are conducted to examine the performance of the proposed sampling algorithms on well-known real networks. The obtained results in comparison with other popular sampling methods demonstrate the efficiency of
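The first algorithm described in this abstract can be sketched roughly as follows. Several details are assumptions, since the abstract leaves them open: spanning trees are built here by breadth-first search with a shuffled neighbour order, the sampled network is taken as the subgraph induced by the chosen nodes, and the graph is assumed undirected. The names `random_bfs_tree`, `spanning_tree_sample`, `num_trees`, and `node_fraction` are hypothetical.

```python
import random
from collections import Counter, deque


def random_bfs_tree(adj, root, rng):
    """Edges of a BFS spanning tree of root's component, neighbours visited in random order."""
    seen = {root}
    queue = deque([root])
    tree_edges = []
    while queue:
        node = queue.popleft()
        nbrs = list(adj[node])
        rng.shuffle(nbrs)
        for nbr in nbrs:
            if nbr not in seen:
                seen.add(nbr)
                tree_edges.append(frozenset((node, nbr)))
                queue.append(nbr)
    return tree_edges


def spanning_tree_sample(adj, num_trees, node_fraction, seed=0):
    """Rank edges by tree membership count, then keep nodes on the top-ranked edges."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    counts = Counter()
    for _ in range(num_trees):
        counts.update(random_bfs_tree(adj, rng.choice(nodes), rng))
    target = max(1, int(node_fraction * len(nodes)))
    chosen = set()
    for edge, _ in counts.most_common():
        for node in sorted(edge):
            if len(chosen) < target:
                chosen.add(node)
        if len(chosen) >= target:
            break
    # Assumption: the sample is the subgraph induced by the chosen nodes.
    sub_edges = {frozenset((u, v)) for u in chosen for v in adj[u] if v in chosen}
    return chosen, sub_edges
```

The second algorithm in the abstract uses partial spanning trees; in this sketch that would amount to stopping the breadth-first search early, though the abstract does not state the stopping rule.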
Physica A 396 (2014) 224–234
Sampling from complex networks using distributed