The Socialbot Network: When Bots Socialize for Fame and Money
Proc. 27th Ann. Computer Security Applications Conf. (ACSAC), 2011
Cited by 63 (7 self)
Online Social Networks (OSNs) have become an integral part of today’s Web. Politicians, celebrities, revolutionists, and others use OSNs as a podium to deliver their message to millions of active web users. Unfortunately, in the wrong hands, OSNs can be used to run astroturf campaigns to spread misinformation and propaganda. Such campaigns usually start off by infiltrating a targeted OSN on a large scale. In this paper, we evaluate how vulnerable OSNs are to a large-scale infiltration by socialbots: computer programs that control OSN accounts and mimic real users. We adopted a traditional web-based botnet design and built a Socialbot Network (SbN): a group of adaptive socialbots that are orchestrated in a command-and-control fashion. We operated such an SbN on Facebook (a 750-million-user OSN) for about 8 weeks. We collected data related to users’ behavior in response to a large-scale infiltration where socialbots were used to connect to a large number of Facebook users. Our results show that (1) OSNs, such as Facebook, can be infiltrated with a success rate of up to 80%, (2) depending on users’ privacy settings, a successful infiltration can result in privacy breaches where even more users’ data are exposed when compared to a purely public access, and (3) in practice, OSN security defenses, such as the Facebook Immune System, are not effective enough in detecting or stopping a large-scale infiltration as it occurs.
Social Structure of Facebook Networks
2011
Cited by 43 (2 self)
We study the social structure of Facebook “friendship” networks at one hundred American colleges and universities at a single point in time, and we examine the roles of user attributes (gender, class year, major, high school, and residence) at these institutions. We investigate the influence of common attributes at the dyad level in terms of assortativity coefficients and regression models. We then examine larger-scale groupings by detecting communities algorithmically and comparing them to network partitions based on the user characteristics. We thereby compare the relative importances of different characteristics at different institutions, finding for example that common high school is more important to the social organization of large institutions and that the importance of common major varies significantly between institutions. Our calculations illustrate how microscopic and macroscopic perspectives give complementary insights on the social organization at universities and suggest future studies to investigate such phenomena further.
Comparing community structure to characteristics in online collegiate social networks
SIAM Review, 2011
Cited by 42 (5 self)
We study the structure of social networks of students by examining the graphs of Facebook “friendships” at five U.S. universities at a single point in time. We investigate the community structure of each single-institution network and employ visual and quantitative tools, including standardized pair-counting methods, to measure the correlations between the network communities and a set of self-identified user characteristics (residence, class year, major, and high school). We review the basic properties and statistics of the employed pair-counting indices and recall, in simplified notation, a useful formula for the z-score of the Rand coefficient. Our study illustrates how to examine different instances of social networks constructed in similar environments, emphasizes the array of social forces that combine to form “communities,” and leads to comparative observations about online social structures, which reflect offline social structures. We calculate the relative contributions of different characteristics to the community structure of individual universities and compare these relative contributions at different universities. For example, we examine the importance of common high school affiliation at large state universities and the varying degrees of influence that common major can have on the social structure at different universities.
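As a rough illustration of the pair-counting idea mentioned in this abstract, the sketch below computes the plain (unstandardized) Rand coefficient between two partitions of the same node set. It does not reproduce the paper's z-score formula; the function name `rand_index` and the toy labels are assumptions made for the example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs on which two partitions agree.

    A pair "agrees" when both partitions put the two nodes together,
    or both put them apart. labels_a[i] and labels_b[i] are the group
    labels of node i (for example, detected community vs. class year).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("need at least two nodes")
    return agree / total


# Hypothetical toy data: four students, a detected community per student
# and a self-reported class year per student.
communities = [0, 0, 1, 1]
class_years = [2011, 2011, 2012, 2012]
score = rand_index(communities, class_years)
```

The raw coefficient depends on the number and sizes of groups, which is why a standardized version such as the z-score discussed in the abstract is more suitable for comparing characteristics across networks.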
Practical recommendations on crawling online social networks
IEEE Journal on Selected Areas in Communications, 2011
Cited by 37 (1 self)
Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows one to estimate any user property and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth.” In contrast, using Breadth-First-Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, and in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and we collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.
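The Metropolis-Hastings random walk named in this abstract can be sketched in a few lines. The version below is the textbook construction for a uniform target distribution, not code from the paper: a uniformly chosen neighbor is proposed and accepted with probability min(1, deg(current)/deg(candidate)). The function name `mhrw` and the graph `TOY_GRAPH` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy graph as an adjacency list: node 0 is a hub,
# nodes 3 and 4 are leaves, so degrees are deliberately uneven.
TOY_GRAPH = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
}


def mhrw(graph, start, steps, rng):
    """Metropolis-Hastings random walk targeting the uniform distribution.

    Returns a Counter with the number of steps spent at each node.
    A rejected proposal counts as another visit to the current node.
    """
    current = start
    visits = Counter()
    for _ in range(steps):
        candidate = rng.choice(graph[current])
        accept_prob = min(1.0, len(graph[current]) / len(graph[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        visits[current] += 1
    return visits


visits = mhrw(TOY_GRAPH, start=0, steps=100_000, rng=random.Random(7))
shares = {node: visits[node] / 100_000 for node in TOY_GRAPH}
```

On this toy graph each node should receive roughly one fifth of the visits, whereas an unadjusted random walk would visit the hub about four times as often as a leaf.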
Estimating Sizes of Social Networks via Biased Sampling
Cited by 30 (0 self)
Online social networks have become very popular in recent years and their number of users is already measured in many hundreds of millions. For various commercial and sociological purposes, an independent estimate of their sizes is important. In this work, algorithms for estimating the number of users in such networks are considered. The proposed schemes are also applicable for estimating the sizes of networks’ subpopulations. The suggested algorithms interact with the social networks via their public APIs only, and rely on no other external information. Due to obvious traffic and privacy concerns, the number of such interactions is severely limited. We therefore focus on minimizing the number of API interactions needed for producing good size estimates. We adopt the abstraction of social networks as undirected graphs and use random node sampling. By counting the number of collisions or non-unique nodes in the sample, we produce a size estimate. Then, we show analytically that the estimate error vanishes with high probability for a smaller number of samples than those required by prior-art algorithms. Moreover, although our algorithms are provably correct for any graph, they excel when applied to social network-like graphs. The proposed algorithms were evaluated on synthetic as well as real social networks such as Facebook, IMDB, and DBLP. Our experiments corroborated the theoretical results, and demonstrated the effectiveness of the algorithms.
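The collision-counting idea can be illustrated with the classic birthday-paradox estimator. This is a simplified sketch that assumes uniform sampling with replacement, not the biased-sampling algorithms proposed in the paper: with n samples from a population of size N, the expected number of colliding pairs is about n(n-1)/(2N), so N is estimated as n(n-1)/(2C) for C observed collisions. The function name `estimate_size` is hypothetical.

```python
import random
from collections import Counter


def estimate_size(samples):
    """Estimate population size from uniform samples drawn with replacement.

    Counts colliding pairs (two draws that returned the same node) and
    inverts the birthday-paradox expectation E[C] ~= n(n-1) / (2N).
    """
    n = len(samples)
    counts = Counter(samples)
    # A node seen c times contributes c*(c-1)/2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    if collisions == 0:
        raise ValueError("no collisions observed; draw more samples")
    return n * (n - 1) / (2 * collisions)


# Simulated example: 4,000 uniform draws from a population of 10,000 ids.
rng = random.Random(0)
draws = [rng.randrange(10_000) for _ in range(4_000)]
estimate = estimate_size(draws)
```

Because the estimate only needs the sampled identifiers, the number of API calls is the sample size itself, which is the quantity the abstract aims to keep small.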
Multigraph Sampling of Online Social Networks
IEEE J. Sel. Areas Commun. on Measurement of Internet Topologies, 2011
Cited by 26 (8 self)
State-of-the-art techniques for probability sampling of users of online social networks (OSNs) are based on random walks on a single social relation (typically friendship). While powerful, these methods rely on the social graph being fully connected. Furthermore, the mixing time of the sampling process strongly depends on the characteristics of this graph. In this paper, we observe that there often exist other relations between OSN users, such as membership in the same group or participation in the same event. We propose to exploit the graphs these relations induce, by performing a random walk on their union multigraph. We design a computationally efficient way to perform multigraph sampling by randomly selecting the graph on which to walk at each iteration. We demonstrate the benefits of our approach through (i) simulation in synthetic graphs, and (ii) measurements of Last.fm, an Internet website for music with social networking features. More specifically, we show that multigraph sampling can obtain a representative sample and faster convergence, even when the individual graphs fail, i.e., are disconnected or highly clustered.
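One way to realize "randomly selecting the graph on which to walk at each iteration" is sketched below, under the assumption that a relation graph is chosen with probability proportional to the current node's degree in it, which makes the step equivalent to following a uniformly random edge of the union multigraph. The names `multigraph_walk`, `friends`, and `groups` are hypothetical, and the resulting sample is still biased toward nodes with high total degree, so it would need re-weighting before estimation.

```python
import random


def multigraph_walk(graphs, start, steps, rng):
    """Random walk on the union multigraph of several relation graphs.

    graphs is a list of adjacency dicts over the same node ids.
    Returns the list of visited nodes, starting with `start`.
    """
    path = [start]
    current = start
    for _ in range(steps):
        # Degree of the current node in each relation graph.
        degrees = [len(g.get(current, [])) for g in graphs]
        if sum(degrees) == 0:
            raise ValueError(f"node {current} is isolated in every graph")
        # Pick a relation in proportion to the node's degree in it,
        # then move to a uniformly random neighbor in that relation.
        chosen = rng.choices(graphs, weights=degrees, k=1)[0]
        current = rng.choice(chosen[current])
        path.append(current)
    return path


# Each relation alone is disconnected; their union is the path 0-1-2-3.
friends = {0: [1], 1: [0], 2: [3], 3: [2]}
groups = {1: [2], 2: [1]}
path = multigraph_walk([friends, groups], start=0, steps=2_000,
                       rng=random.Random(1))
```

A walk restricted to `friends` alone could never leave the component {0, 1}; the union walk reaches every node, which is the situation the abstract describes as individual graphs failing.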
Towards unbiased BFS sampling
IEEE Journal on Selected Areas in Communications, 2011
Cited by 26 (4 self)
Breadth First Search (BFS) is a widely used approach for sampling large graphs. However, it has been empirically observed that BFS sampling is biased toward high-degree nodes, which may strongly affect the measurement results. In this paper, we quantify and correct the degree bias of BFS. First, we consider a random graph RG(p_k) with an arbitrary degree distribution p_k. For this model, we calculate the node degree distribution expected to be observed by BFS as a function of the fraction f of covered nodes. We also show that, for RG(p_k), all commonly used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS) have exactly the same bias. Next, we propose a practical BFS-bias correction procedure that takes as input a collected BFS sample together with the fraction f. Our correction technique is exact (i.e., leads to unbiased estimation) for RG(p_k). Furthermore, it performs well when applied to a broad range of Internet topologies and to two large BFS samples of Facebook and Orkut networks.
Crawling Facebook for Social Network Analysis Purposes
Cited by 24 (6 self)
We describe our work in the collection and analysis of massive data describing the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice against the popular Facebook Web site. Thanks to our ad hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, including degree distribution, centrality measures, scaling laws, and distribution of friendship.
How to win friends and influence people, truthfully: Influence maximization mechanisms for social networks
In WSDM, 2012
Cited by 24 (3 self)
Throughout the past decade there has been extensive research on algorithmic and data mining techniques for solving the problem of influence maximization in social networks: if one can incentivize a subset of individuals to become early adopters of a new technology, which subset should be selected so that the word-of-mouth effect in the social network is maximized? Despite the progress in modeling and techniques, the incomplete information aspect of the problem has been largely overlooked. While data can often provide the network structure and influence patterns may be observable, the inherent cost individuals have to become early adopters is difficult to extract. In this paper we introduce mechanisms that elicit individuals’ costs while providing desirable approximation guarantees in some of the most well-studied models of social network influence. We follow the mechanism design framework which advocates for allocation and payment schemes that incentivize individuals to report their true information. We also performed experiments using the Mechanical Turk platform and social network data to provide evidence of the framework’s effectiveness in practice.
Walking on a Graph with a Magnifying Glass: Stratified Sampling via Weighted Random Walks
In Proc. ACM SIGMETRICS, 2011
Cited by 23 (7 self)
Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling those nodes and edges that convey greater information regarding the target metric. Our approach begins by employing the theory of stratification to find optimal node weights, for a given estimation problem, under an independence sampler. While optimal under independence sampling, these weights may be impractical under graph crawling due to constraints arising from the structure of the graph. Therefore, the edge weights for our random walk should be chosen so as to lead to an equilibrium distribution that strikes a balance between approximating the optimal weights under an independence sampler and achieving fast convergence. We propose a heuristic approach (stratified weighted random walk, or SWRW) that achieves this goal, while using only limited information about the graph structure and the node properties. We evaluate our technique in simulation, and experimentally, by collecting a sample of Facebook college users. We show that SWRW requires 13-15 times fewer samples than the simple re-weighted random walk (RW) to achieve the same estimation accuracy for a range of metrics.