Defining and Evaluating Network Communities based on Ground-truth. Extended version
2012
Cited by 112 (4 self)
Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task mainly due to a plethora of definitions of a community, intractability of algorithms, issues with evaluation and the lack of a reliable gold-standard ground-truth. In this paper we study a set of 230 large real-world social, collaboration and information networks where nodes explicitly state their group memberships. For example, in social networks nodes explicitly join various interest-based social groups. We use such groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology which allows us to compare and quantitatively evaluate how different structural definitions of network communities correspond to ground-truth communities. We choose 13 commonly used structural definitions of network communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad-participation-ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than a hundred million nodes. The proposed method achieves a 30% relative improvement over current local clustering methods.
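The abstract above names Conductance and Triad-participation-ratio without defining them. The sketch below uses the common textbook forms, which are an assumption here rather than something the abstract states: conductance as boundary edges divided by the community's total edge volume, and triad participation ratio as the fraction of members that sit in at least one triangle lying entirely inside the community. The graph, the `adj` dictionary layout, and both function names are hypothetical.

```python
from itertools import combinations


def conductance(adj, community):
    """Share of the community's edge endpoints that lead outside it."""
    members = set(community)
    internal = 0  # internal edges are seen twice, once from each endpoint
    boundary = 0  # edges with exactly one endpoint in the community
    for u in members:
        for v in adj[u]:
            if v in members:
                internal += 1
            else:
                boundary += 1
    volume = internal + boundary  # equals 2 * (internal edges) + boundary edges
    return boundary / volume if volume else 0.0


def triad_participation_ratio(adj, community):
    """Fraction of members that belong to a triangle inside the community."""
    members = set(community)
    in_triad = 0
    for u in members:
        inside = [v for v in adj[u] if v in members]
        # u closes a triangle if any two of its in-community neighbours are linked
        if any(b in adj[a] for a, b in combinations(inside, 2)):
            in_triad += 1
    return in_triad / len(members) if members else 0.0


# Toy undirected graph (assumed data): triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

triangle_conductance = conductance(adj, {0, 1, 2})
triangle_tpr = triad_participation_ratio(adj, {0, 1, 2})
tail_conductance = conductance(adj, {2, 3, 4})
tail_tpr = triad_participation_ratio(adj, {2, 3, 4})
```

On this toy graph the triangle scores lower conductance and higher triad participation than the tail, which is the direction both measures reward in a well-separated, internally cohesive group.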
Privacy wizards for social networking sites
In WWW ’10: Proceedings of the 19th International World Wide Web Conference, 2010
Cited by 95 (2 self)
Privacy is an enormous problem in online social networking sites. While sites such as Facebook allow users fine-grained control over who can see their profiles, it is difficult for average users to specify this kind of detailed policy. In this paper, we propose a template for the design of a social networking privacy wizard. The intuition for the design comes from the observation that real users conceive their privacy preferences (which friends should be able to see which information) based on an implicit set of rules. Thus, with a limited amount of user input, it is usually possible to build a machine learning model that concisely describes a particular user’s preferences, and then use this model to configure the user’s privacy settings automatically. As an instance of this general framework, we have built a wizard based on an active learning paradigm called uncertainty sampling. The wizard iteratively asks the user to assign privacy “labels” to selected (“informative”) friends, and it uses this input to construct a classifier, which can in turn be used to automatically assign privileges to the rest of the user’s (unlabeled) friends. To evaluate our approach, we collected detailed privacy preference data from 45 real Facebook users. Our study revealed two important things. First, real users tend to conceive their privacy preferences in terms of communities, which can easily be extracted from a social network graph using existing techniques. Second, our active learning wizard, using communities as features, is able to recommend high-accuracy privacy settings using less user input than existing policy-specification tools.
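The uncertainty-sampling loop described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the paper's wizard: the stand-in `predict` function (a per-community vote over labels collected so far), the friend names, and the `community` and `truth` tables are all hypothetical. The only idea taken from the abstract is that the next friend to ask about is the one the current model is least sure of.

```python
def most_uncertain(probabilities):
    """Return the key whose predicted probability is closest to 0.5."""
    return min(probabilities, key=lambda k: abs(probabilities[k] - 0.5))


def run_wizard(unlabeled, predict, ask_user, budget):
    """Ask the user about at most `budget` friends, most uncertain first."""
    labeled = {}
    remaining = sorted(unlabeled)  # sorted so ties break deterministically
    for _ in range(budget):
        if not remaining:
            break
        probs = {friend: predict(friend, labeled) for friend in remaining}
        friend = most_uncertain(probs)
        labeled[friend] = ask_user(friend)
        remaining.remove(friend)
    return labeled


# Hypothetical data: each friend's community and the user's true preference.
community = {"ann": "family", "bob": "family", "cy": "work", "dee": "work"}
truth = {"ann": True, "bob": True, "cy": False, "dee": False}


def predict(friend, labeled):
    """Stand-in classifier: share of 'allow' labels seen in the friend's community."""
    votes = [label for f, label in labeled.items() if community[f] == community[friend]]
    return sum(votes) / len(votes) if votes else 0.5


labels = run_wizard(community, predict, lambda friend: truth[friend], budget=2)
```

With a budget of two questions, the loop asks about one friend per community, after which the stand-in model already assigns the remaining friends the same setting as their community.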
Spatial networks
Physics Reports, 2010
Cited by 93 (5 self)
Complex systems are very often organized under the form of networks where nodes and edges are embedded in space. Transportation and mobility networks, Internet, mobile phone networks, power grids, social and contact networks, neural networks, are all examples where space is relevant and where topology alone does not contain all the information. Characterizing and understanding
An Analysis of Social Network-Based Sybil Defenses
Cited by 91 (8 self)
Recently, there has been much excitement in the research community over using social networks to mitigate multiple identity, or Sybil, attacks. A number of schemes have been proposed, but they differ greatly in the algorithms they use and in the networks upon which they are evaluated. As a result, the research community lacks a clear understanding of how these schemes compare against each other, how well they would work on real-world social networks with different structural properties, or whether there exist other (potentially better) ways of Sybil defense. In this paper, we show that, despite their considerable differences, existing Sybil defense schemes work by detecting local communities (i.e., clusters of nodes more tightly knit than the rest of the graph) around a trusted node. Our finding has important implications for both existing and future designs of Sybil defense schemes. First, we show that there is an opportunity to leverage the substantial amount of prior work on general community detection algorithms in order to defend against Sybils. Second, our analysis reveals the fundamental limits of current social network-based Sybil defenses: we demonstrate that networks with well-defined community structure are inherently more vulnerable to Sybil attacks, and that, in such networks, Sybils can carefully target their links in order to make their attacks more effective.
Analyzing Facebook privacy settings: User expectations vs. reality
In Proc. ACM/USENIX Internet Measurement Conference (IMC), 2011
Cited by 77 (7 self)
The sharing of personal data has emerged as a popular activity over online social networking sites like Facebook. As a result, the issue of online social network privacy has received significant attention in both the research literature and the mainstream media. Our overarching goal is to improve defaults and provide better tools for managing privacy, but we are limited by the fact that the full extent of the privacy problem remains unknown; there is little quantification of the incidence of incorrect privacy settings or the difficulty users face when managing their privacy. In this paper, we focus on measuring the disparity between the desired and actual privacy settings, quantifying the magnitude of the problem of managing privacy. We deploy a survey, implemented as a Facebook application, to 200 Facebook users recruited via Amazon Mechanical Turk. We find that 36% of content remains shared with the default privacy settings. We also find that, overall, privacy settings match users’ expectations only 37% of the time, and when incorrect, almost always expose content to more users than expected. Finally, we explore how our results have potential to assist users in selecting appropriate privacy settings by examining the user-created friend lists. We find that these have significant correlation with the social network, suggesting that information from the social network may be helpful in implementing new tools for managing privacy.
Overlapping community detection in networks: the state of the art and comparative study
ACM Comput. Surv., 2012
Cited by 74 (6 self)
This paper reviews the state of the art in overlapping community detection algorithms, quality measures, and benchmarks. A thorough comparison of different algorithms (a total of fourteen) is provided. In addition to community-level evaluation, we propose a framework for evaluating algorithms’ ability to detect overlapping nodes, which helps to assess over-detection and under-detection. After considering community-level detection performance measured by Normalized Mutual Information and the Omega index, and node-level detection performance measured by F-score, we reached the following conclusions. For low overlapping density networks, SLPA, OSLOM, Game and COPRA offer better performance than the other tested algorithms. For networks with high overlapping density and high overlapping diversity, both SLPA and Game provide relatively stable performance. However, test results also suggest that detection in such networks is not yet fully resolved. A common feature observed by various algorithms in real-world networks is the relatively small fraction of overlapping nodes (typically less than 30%), each of which belongs to only 2 or 3 communities.
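The node-level evaluation mentioned in the abstract above, an F-score on overlapping nodes, can be illustrated with a short sketch. It assumes the usual reading: a node counts as overlapping when it appears in more than one community, and precision and recall are computed over the detected versus the true overlapping nodes. The covers and function names are hypothetical, and communities are assumed to be given as sets.

```python
def overlapping_nodes(cover):
    """Nodes that appear in more than one community of the cover."""
    counts = {}
    for community in cover:
        for node in community:
            counts[node] = counts.get(node, 0) + 1
    return {node for node, count in counts.items() if count > 1}


def overlap_f_score(true_cover, found_cover):
    """F-score of detected overlapping nodes against the true ones."""
    truth = overlapping_nodes(true_cover)
    found = overlapping_nodes(found_cover)
    hits = len(truth & found)
    if hits == 0:
        return 0.0
    precision = hits / len(found)  # low precision signals over-detection
    recall = hits / len(truth)     # low recall signals under-detection
    return 2 * precision * recall / (precision + recall)


# Hypothetical covers: nodes 3, 4 and 6 truly overlap; only 4 is detected.
true_cover = [{1, 2, 3, 4}, {4, 5, 6}, {3, 6, 7}]
found_cover = [{1, 2, 3, 4}, {4, 5, 6, 7}]
score = overlap_f_score(true_cover, found_cover)
```

In this example every detected overlapping node is correct but two of the three true ones are missed, so the score is pulled down by recall alone, which is the under-detection case.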
Statistical significance of communities in networks
Physical Review E
Cited by 58 (2 self)
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software
Social Structure of Facebook Networks
2011
Cited by 43 (2 self)
We study the social structure of Facebook “friendship” networks at one hundred American colleges and universities at a single point in time, and we examine the roles of user attributes (gender, class year, major, high school, and residence) at these institutions. We investigate the influence of common attributes at the dyad level in terms of assortativity coefficients and regression models. We then examine larger-scale groupings by detecting communities algorithmically and comparing them to network partitions based on the user characteristics. We thereby compare the relative importance of different characteristics at different institutions, finding for example that common high school is more important to the social organization of large institutions and that the importance of common major varies significantly between institutions. Our calculations illustrate how microscopic and macroscopic perspectives give complementary insights on the social organization at universities and suggest future studies to investigate such phenomena further.
Comparing community structure to characteristics in online collegiate social networks
SIAM Review, 2011
Cited by 42 (5 self)
We study the structure of social networks of students by examining the graphs of Facebook “friendships” at five U.S. universities at a single point in time. We investigate the community structure of each single-institution network and employ visual and quantitative tools, including standardized pair-counting methods, to measure the correlations between the network communities and a set of self-identified user characteristics (residence, class year, major, and high school). We review the basic properties and statistics of the employed pair-counting indices and recall, in simplified notation, a useful formula for the z-score of the Rand coefficient. Our study illustrates how to examine different instances of social networks constructed in similar environments, emphasizes the array of social forces that combine to form “communities,” and leads to comparative observations about online social structures, which reflect offline social structures. We calculate the relative contributions of different characteristics to the community structure of individual universities and compare these relative contributions at different universities. For example, we examine the importance of common high school affiliation at large state universities and the varying degrees of influence that common major can have on the social structure at different universities.
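Pair-counting indices such as the Rand coefficient, mentioned in the abstract above, compare two groupings by asking of every pair of nodes whether both groupings agree on putting them together or apart. The sketch below computes only the plain Rand coefficient; it does not implement the z-score formula the abstract refers to. The student names and labels are hypothetical.

```python
from itertools import combinations


def rand_coefficient(labels_a, labels_b):
    """Fraction of node pairs on which two groupings agree (together or apart)."""
    nodes = sorted(labels_a)
    agree = 0
    total = 0
    for u, v in combinations(nodes, 2):
        together_a = labels_a[u] == labels_a[v]
        together_b = labels_b[u] == labels_b[v]
        if together_a == together_b:
            agree += 1
        total += 1
    return agree / total if total else 1.0


# Hypothetical example: detected communities versus self-reported residence.
detected = {"a": 0, "b": 0, "c": 1, "d": 1}
residence = {"a": "north", "b": "north", "c": "south", "d": "north"}
agreement = rand_coefficient(detected, residence)
```

Here three of the six pairs are treated the same way by both groupings. The raw value is hard to interpret on its own because it depends on the number and sizes of the groups, which is the motivation for standardizing it.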
Overlapping community detection at scale: a nonnegative matrix factorization approach
In WSDM, 2013
Cited by 41 (5 self)
Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem. In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state of the art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.
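The observation that overlaps are densely connected can be made concrete with a small affiliation-model sketch. The abstract above does not give a formula, so the edge probability used here, 1 - exp(-F_u · F_v) over nonnegative per-community membership strengths, is an assumption about the general shape of such a model rather than a statement of the paper's method. The function name and the membership vectors are hypothetical.

```python
import math


def edge_probability(f_u, f_v):
    """Assumed link probability from nonnegative community membership strengths."""
    shared = sum(a * b for a, b in zip(f_u, f_v))
    return 1.0 - math.exp(-shared)


# Hypothetical membership strengths over two communities, A and B.
in_both_1 = [1.0, 1.0]  # sits in the overlap of A and B
in_both_2 = [1.0, 1.0]  # also in the overlap
only_a = [1.0, 0.0]
only_b = [0.0, 1.0]

p_overlap = edge_probability(in_both_1, in_both_2)
p_single = edge_probability(in_both_1, only_a)
p_none = edge_probability(only_a, only_b)
```

Under this assumed form, two nodes that share both communities are more likely to be linked than two nodes that share one, and nodes that share none are never linked, so the overlap region ends up denser than either community alone.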