Statistical properties of community structure in large social and information networks
Cited by 246 (14 self)
A large body of work has been devoted to identifying community structure in networks. A community is often thought of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse real-world networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large real-world networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in” with the rest of the network and thus become less “community-like.” This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
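To make the conductance measure and the network community profile concrete, here is a minimal Python sketch. It assumes the standard definition of conductance (edges leaving the set, divided by the smaller of the two sides' total degree) and computes the profile by brute force over all node subsets of each size. The paper uses approximation algorithms on large graphs, so this exhaustive search is only an illustration for toy inputs. The function names and the example graph are hypothetical.

```python
from itertools import combinations

def conductance(adj, nodes):
    """Conductance of `nodes` in an undirected graph given as {node: set(neighbors)}."""
    s = set(nodes)
    # Edges with exactly one endpoint inside the set.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / denom

def community_profile(adj, max_size):
    """Brute-force profile: lowest conductance over all node sets of each size k."""
    profile = {}
    for k in range(1, max_size + 1):
        profile[k] = min(conductance(adj, subset) for subset in combinations(adj, k))
    return profile

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
profile = community_profile(adj, 3)
```

On this toy graph the profile dips at size 3, where each triangle is cut off by a single edge. Lower conductance means a more "community-like" set.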
Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters
, 2008
Cited by 208 (17 self)
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempting to interpret these sets as “real” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large real-world networks, ranging from traditional and on-line social networks, to technological and information networks and
Relational learning via latent social dimensions
- KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge
, 2009
Cited by 86 (28 self)
Social media such as blogs, Facebook, Flickr, etc., present data in a network format rather than as a classical IID distribution. To address the interdependency among data instances, relational learning has been proposed, and collective inference based on network connectivity is adopted for prediction. However, the connections in social media are often multi-dimensional. An actor can connect to another actor due to different factors, e.g., alumni, colleagues, living in the same city or sharing similar interests, etc. Collective inference normally does not differentiate these connections. In this work, we propose to extract latent social dimensions based on network information first, and then utilize them as features for discriminative learning. These social dimensions describe different affiliations of social actors hidden in the network, and the subsequent discriminative learning can automatically determine which affiliations are better aligned with the class labels. Such a scheme is preferred when multiple diverse relations are associated with the same network. We conduct extensive experiments on social media data (one from a real-world blog site and the other from a popular content sharing site). Our model outperforms representative relational learning methods based on collective inference, especially when few labeled data are available. The sensitivity of this model and its connection to existing methods are also carefully examined.
Unveiling Facebook: A Measurement Study of Social Network Based Applications
Cited by 75 (3 self)
Online social networking sites such as Facebook and MySpace have become increasingly popular, with close to 500 million users as of August 2008. The introduction of the Facebook Developer Platform and OpenSocial allows third-party developers to launch their own applications for the existing massive user base. The viral growth of these social applications can potentially influence how content is produced and consumed in the future Internet. To gain a better understanding, we conducted a large-scale measurement study of the usage characteristics of online social network based applications. In particular, we developed and launched three Facebook applications, which have achieved a combined subscription base of over 8 million users. Using the rich dataset gathered through these applications, we analyze the aggregate workload characteristics (including temporal and geographical distributions) as well as the structure of user interactions. We explore the existence of ‘communities’, with a high degree of interaction within a community and limited interaction outside the community. We find that a small fraction of users account for the majority of activity within the context of our Facebook applications and a small number of applications account for the majority of users on Facebook. Furthermore, user response times for Facebook applications are independent of source/destination user locality. We also investigate distinguishing characteristics of social gaming applications. To the best of our knowledge, this is the first study analyzing user activities on online social applications.
Overlapping community detection in networks: the state of the art and comparative study
- ACM Comput. Surv
, 2012
Cited by 74 (6 self)
This paper reviews the state of the art in overlapping community detection algorithms, quality measures, and benchmarks. A thorough comparison of different algorithms (a total of fourteen) is provided. In addition to community level evaluation, we propose a framework for evaluating algorithms' ability to detect overlapping nodes, which helps to assess over-detection and under-detection. After considering community level detection performance measured by Normalized Mutual Information and the Omega index, and node level detection performance measured by F-score, we reached the following conclusions. For low overlapping density networks, SLPA, OSLOM, Game and COPRA offer better performance than the other tested algorithms. For networks with high overlapping density and high overlapping diversity, both SLPA and Game provide relatively stable performance. However, test results also suggest that detection in such networks is not yet fully resolved. A common feature observed by various algorithms in real-world networks is the relatively small fraction of overlapping nodes (typically less than 30%), each of which belongs to only 2 or 3 communities.
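As an illustration of node-level evaluation by F-score, the Python sketch below treats overlapping-node detection as a binary classification task: a node counts as "overlapping" if it belongs to more than one community. This framing, the function names, and the toy data are assumptions for illustration only. The survey's exact evaluation protocol is not given in the abstract.

```python
from collections import Counter

def overlapping_nodes(communities):
    """Nodes that appear in more than one community."""
    counts = Counter(node for community in communities for node in set(community))
    return {node for node, k in counts.items() if k > 1}

def overlap_f_score(truth, detected):
    """F-score of detected overlapping nodes against ground-truth overlapping nodes."""
    true_overlap = overlapping_nodes(truth)
    found_overlap = overlapping_nodes(detected)
    hits = len(true_overlap & found_overlap)
    if hits == 0:
        return 0.0
    precision = hits / len(found_overlap)  # low precision signals over-detection
    recall = hits / len(true_overlap)      # low recall signals under-detection
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the ground truth overlaps at nodes 4 and 7,
# while the detected cover overlaps at nodes 4 and 6.
truth = [{1, 2, 3, 4}, {4, 5, 6, 7}, {7, 8, 9}]
detected = [{1, 2, 3, 4}, {4, 5, 6}, {6, 7, 8, 9}]
score = overlap_f_score(truth, detected)
```

Here one of the two detected overlapping nodes is correct and one of the two true overlapping nodes is found, so precision and recall are both 0.5.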
Political polarization on Twitter
- In ICWSM
, 2011
Cited by 73 (7 self)
In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, comprised of more than 250,000 tweets from the six weeks leading up to the 2010 U.S. congressional midterm elections. Using a combination of network clustering algorithms and manually-annotated data we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically-opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically-opposed users. We conclude with statistical evidence in support of this hypothesis.
An Algorithm to Find Overlapping Community Structure in Networks
Cited by 71 (4 self)
Recent years have seen the development of many graph clustering algorithms, which can identify community structure in networks. The vast majority of these only find disjoint communities, but in many real-world networks communities overlap to some extent. We present a new algorithm for discovering overlapping communities in networks, by extending Girvan and Newman’s well-known algorithm based on the betweenness centrality measure. Like the original algorithm, ours performs hierarchical clustering — partitioning a network into any desired number of clusters — but allows them to overlap. Experiments confirm good performance on randomly generated networks based on a known overlapping community structure, and interesting results have also been obtained on a range of real-world networks.
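For context, the Python sketch below shows one divisive step of the original, non-overlapping Girvan and Newman procedure that this paper extends: compute edge betweenness, remove the edge with the highest value, and read off the connected components. It does not implement the overlapping extension itself, whose details are not given in the abstract. The function names and the toy graph are hypothetical, and the betweenness routine follows Brandes' accumulation scheme for unweighted, undirected graphs.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph {node: set(neighbors)}."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to every node.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered node pair was counted once from each endpoint.
    return {edge: value / 2 for edge, value in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts

def remove_top_edge(adj):
    """One divisive step: drop the highest-betweenness edge, return the new graph."""
    bet = edge_betweenness(adj)
    u, v = tuple(max(bet, key=bet.get))
    pruned = {node: set(nbrs) for node, nbrs in adj.items()}
    pruned[u].discard(v)
    pruned[v].discard(u)
    return pruned

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
parts = components(remove_top_edge(adj))
```

In the toy graph every shortest path between the two triangles crosses edge 2-3, so that edge has the highest betweenness and removing it separates the triangles.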
Analysis of the structure of complex networks at different resolution levels.
- New J. of Phys.
, 2008
Latent social structure in open source projects
- PROCEEDINGS OF THE 16TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON FOUNDATIONS OF SOFTWARE ENGINEERING
, 2008
Cited by 51 (8 self)
Commercial software project managers design project organizational structure carefully, mindful of available skills, division of labour, geographical boundaries, etc. These organizational “cathedrals” are to be contrasted with the “bazaar-like” nature of Open Source Software (OSS) Projects, which have no pre-designed organizational structure. Any structure that exists is dynamic, self-organizing, latent, and usually not explicitly stated. However, in large, complex, successful OSS projects, we expect that sub-communities will form organically within the “bazaar” of developer teams. Studying these sub-communities and their behavior can shed light on how successful OSS projects self-organize. This phenomenon could even hold important lessons for how commercial software teams might be organized. Building on well-established techniques for detecting community structure in complex networks, we extract and evaluate latent sub-communities from the email social network of several projects: Apache HTTPD, Python, PostgreSQL, Perl, and Apache ANT. We then validate them with software development activity history. Our results show that sub-communities do indeed form within these projects. We find, in other words, that “chapels” (if not cathedrals) spontaneously arise within the bazaar as OSS systems and the teams evolve. We also find that these subgroups manifest most strongly in technical discussions, and are significantly connected with collaboration behaviour.