Defining and Evaluating Network Communities based on Ground-truth. Extended version (2012)

by Jaewon Yang, Jure Leskovec
Results 1 - 10 of 112

Learning to Discover Social Circles in Ego Networks

by Julian McAuley
Abstract - Cited by 95 (5 self)
Our personal social networks are big and cluttered, and currently there is no good way to organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. ‘circles’ on Google+, and ‘lists’ on Facebook and Twitter); however, they are laborious to construct and must be updated whenever a user’s network grows. We define a novel machine learning task of identifying users’ social circles. We pose the problem as a node clustering problem on a user’s ego-network, a network of connections between her friends. We develop a model for detecting circles that combines network structure as well as user profile information. For each circle we learn its members and the circle-specific user profile similarity metric. Modeling node membership to multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter for all of which we obtain hand-labeled ground-truth.

Citation Context

...ends. We aim to discover circle memberships and to find common properties around which circles form. However, different circles overlap heavily, i.e., alters belong to multiple circles simultaneously [1, 21, 28, 29], and many circles are hierarchically nested in larger ones (Figure 1). Thus it is important to model an alter’s memberships to multiple circles. Secondly, we expect that each circle is not only dense...

Overlapping community detection at scale: a nonnegative matrix factorization approach

by Jaewon Yang, Jure Leskovec - In WSDM, 2013
Abstract - Cited by 41 (5 self)
Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap as nodes belong to multiple clusters at once. Due to the difficulties in evaluating the detected communities and the lack of scalable algorithms, the task of overlapping community detection in large networks largely remains an open problem. In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. In this paper, we develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state-of-the-art performance both in terms of the quality of detected communities as well as in speed and scalability of our algorithm.

Community-Affiliation Graph Model for Overlapping Network Community Detection. Extended version

by Jaewon Yang, Jure Leskovec
Abstract - Cited by 15 (3 self)
One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Communities in networks often overlap as nodes can belong to multiple communities at once. Identifying such overlapping communities is crucial for understanding the structure as well as the function of real-world networks. Even though community structure in networks has been widely studied in the past, practically all research makes an implicit assumption that overlaps between communities are less densely connected than the non-overlapping parts themselves. Here we validate this assumption on 6 large-scale social, collaboration and information networks where nodes explicitly state their community memberships. By examining such ground-truth communities we find that the community overlaps are more densely connected than the non-overlapping parts, which is in sharp contrast to the conventional wisdom that community overlaps are more sparsely connected than the communities themselves. Practically all existing community detection methods fail to detect communities with dense overlaps. We propose Community-Affiliation Graph Model, a model-based community detection method that builds on bipartite node-community affiliation networks. Our method successfully captures overlapping, non-overlapping as well as hierarchically nested communities, and identifies relevant communities more accurately than the state-of-the-art methods in networks ranging from biological to social and information networks.

Citation Context

...ngs suggest densely connected community overlaps. Top: network; Bottom: corresponding adjacency matrix. alone [8]. The understanding and models of network communities has evolved over time [8], [17], [30]. Early works on network community detection were heavily influenced by the research on the strength of weak ties [11]. This led researchers to think of networks as consisting of dense clusters that ...

Social resilience in online communities: The autopsy of Friendster

by David Garcia, Pavlin Mavrodiev, Frank Schweitzer - In Proceedings of the First ACM Conference on Online Social Networks, COSN ’13, 2013
Abstract - Cited by 13 (5 self)
We empirically analyze five online communities: Friendster, Livejournal, Facebook, Orkut, Myspace, to identify causes for the decline of social networks. We define social resilience as the ability of a community to withstand changes. We do not argue about the cause of such changes, but concentrate on their impact. Changes may cause users to leave, which may trigger further leaves of others who lost connection to their friends. This may lead to cascades of users leaving. A social network is said to be resilient if the size of such cascades can be limited. To quantify resilience, we use k-core analysis to identify subsets of the network in which all users have at least k friends. These connections generate benefits (b) for each user, which have to outweigh the costs (c) of being a member of the network. If this difference is not positive, users leave. After all cascades, the remaining network is the k-core of the original network determined by the cost-to-benefit (c/b) ratio. By analysing the cumulative distribution of k-cores we are able to calculate the number of users remaining in each community. This allows us to infer the impact of the c/b ratio on the resilience of these online communities. We find that the different online communities have different k-core distributions. Consequently, similar changes in the c/b ratio have a different impact on the number of active users. As a case study, we focus on the evolution of Friendster. We identify time periods when new users entering the network observed an insufficient c/b ratio. This measure can be seen as a precursor of the later collapse of the community. Our analysis can be applied to estimate the impact of changes in the user interface, which may temporarily increase the c/b ratio, thus posing a threat for the community to shrink, or even to collapse.

Citation Context

...re its discontinuation. This dataset provides a high-quality snapshot of the large amount of user information that was publicly available on the site, including friend lists and interest-based groups [31]. In this article, we provide the first analysis of the social network topology of Friendster as a whole. Since some user profiles in Friendster were private, this dataset does not include their conne...
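
The leaving cascade described in the abstract above (users with fewer than k friends leave, which may push their friends below k as well) can be illustrated with a minimal Python sketch of k-core pruning. This is not the authors' code: the function name `k_core`, the dictionary-of-sets graph representation, and the toy `friends` network are all assumptions made for illustration, and the mapping from the c/b ratio to a threshold k is left out.

```python
from collections import deque


def k_core(adjacency, k):
    """Return the nodes that survive repeated removal of nodes with degree < k.

    `adjacency` is a hypothetical representation: a dict mapping each node
    to the set of its neighbours in an undirected graph.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    removed = set()
    # Start the cascade with every node that already has fewer than k friends.
    queue = deque(node for node, d in degree.items() if d < k)
    while queue:
        node = queue.popleft()
        if node in removed:
            continue  # a node can be queued more than once
        removed.add(node)
        # Each remaining friend loses one connection and may leave in turn.
        for neighbor in adjacency[node]:
            if neighbor not in removed:
                degree[neighbor] -= 1
                if degree[neighbor] < k:
                    queue.append(neighbor)
    return set(adjacency) - removed


# Toy network (made up): a 4-clique a, b, c, d with a chain d - e - f attached.
friends = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
survivors = k_core(friends, 3)
```

In this toy network, raising the threshold from 3 to 4 removes every node, which mirrors the abstract's point that a small change in the threshold can have a very different impact depending on the network's k-core distribution.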

Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs

by Furong Huang, Niranjan U N, Mohammad Umar Hakeem, Prateek Verma, Animashree An
Abstract - Cited by 10 (3 self)
We present a scalable tensor-based approach for detecting hidden overlapping communities under the mixed membership stochastic block model. We employ stochastic gradient descent for performing tensor decompositions, which provides flexibility to trade off node sub-sampling with accuracy. Our GPU implementation of the tensor-based approach is extremely fast and scalable, and involves a careful optimization of GPU-CPU storage and communication. We validate our results on datasets from popular social networks (Facebook, Yelp and DBLP), where ground truth is available, using notions of p-values and false discovery rates, and obtain high accuracy for membership recovery. We compare our results, both in terms of execution time and accuracy, to the state-of-the-art algorithms such as the variational method, and report better performance. For instance, on the Yelp network consisting of about 40,000 nodes and 500 communities, we recover the latent communities in under 30 minutes, and on the DBLP network consisting of about 120,000 nodes and 500 communities, we recover the latent communities in about 2.8 hours. In comparison, the variational method takes more than an order of magnitude higher execution time on the same datasets.

Citation Context

...ods based on random walk, which do not necessarily read the entire edge information in the beginning of. These methods are local in that they start with a single seed node and then traverse the graph [YL12]. We note that alternative vertex-based implementations of our tensor-based method are possible on Graphlab [LGK+10] or Pregel architectures [MAB+10], and we defer it to future work. Compared to the s...

New metrics of quality of network community structure

by Mingming Chen, Tommy Nguyen, Boleslaw K. Szymanski - ASE Human Journal, 2013
Abstract - Cited by 8 (8 self)
Modularity is widely used to effectively measure the strength of the community structure found by community detection algorithms. However, modularity maximization suffers from two opposite yet coexisting problems: in some cases, it tends to favor small communities over large ones while in others, large communities over small ones. The latter tendency is known in the literature as the resolution limit problem. To address them, we propose to modify modularity by subtracting from it the fraction of edges connecting nodes of different communities and by including community density into modularity. We refer to the modified metric as Modularity Density and we demonstrate that it indeed resolves both problems mentioned above. We describe the motivation for introducing this metric by using intuitively clear and simple examples. We also prove that this new metric solves the resolution limit problem. Finally, we discuss the results of applying this metric, modularity, and several other popular community quality metrics to two real dynamic networks. The results imply that Modularity Density is consistent with all the community quality measurements but not modularity, which suggests that Modularity Density is an improved measurement of the community quality compared to modularity.

Citation Context

...number of Intra-edges, Contraction, the number of Inter-edges, Expansion, and Conductance [10], which characterize how community-like is the connectivity structure of a given set of nodes. All of them rely on the intuition that communities are sets of nodes with many edges inside them and few ...
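
As a rough illustration of the edge-counting metrics named in the citation context above, the following Python sketch computes intra-edges, inter-edges, expansion, and conductance for one node set. It assumes commonly used definitions: with m_S internal edges, c_S boundary edges, and n_S nodes, expansion is c_S / n_S and conductance is c_S / (2 m_S + c_S). The cited papers may use different variants, Contraction is omitted because its definition is not given here, and the function name `cut_metrics` and the toy graph are hypothetical.

```python
def cut_metrics(adjacency, community):
    """Edge-based quality scores for one candidate community.

    `adjacency` maps each node to the set of its neighbours (undirected graph);
    `community` is any iterable of nodes. Both are illustrative assumptions.
    """
    community = set(community)
    internal_endpoints = 0  # every internal edge is seen from both endpoints
    boundary = 0            # edges with exactly one endpoint in the community
    for node in community:
        for neighbor in adjacency[node]:
            if neighbor in community:
                internal_endpoints += 1
            else:
                boundary += 1
    intra_edges = internal_endpoints // 2
    volume = 2 * intra_edges + boundary  # sum of degrees inside the community
    return {
        "intra_edges": intra_edges,
        "inter_edges": boundary,
        "expansion": boundary / len(community) if community else 0.0,
        "conductance": boundary / volume if volume else 0.0,
    }


# Toy graph (made up): two triangles joined by the single edge 3 - 4.
graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4},
    4: {3, 5, 6},
    5: {4, 6},
    6: {4, 5},
}
scores = cut_metrics(graph, {1, 2, 3})
```

Lower expansion and conductance correspond to the intuition in the snippet: many edges inside the set and few leaving it.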

Scalable and High Performance Betweenness Centrality on the GPU

by Adam McLaughlin, David A. Bader - In Proceedings of the 26th ACM/IEEE International Conference on High Performance Computing, Networking, Storage, and Analysis (SC), 2014
Abstract - Cited by 7 (4 self)
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running betweenness centrality on 192 GPUs. Keywords: GPUs, Graph Algorithms, Parallel Algorithms

Citation Context

...ons are shown in Table II. These graphs were taken from the 10th DIMACS Challenge [4], the University of Florida Sparse Matrix Collection [14], and the Stanford Network Analysis Platform (SNAP) [12], [38]. These benchmarks contain both real-world and randomly generated instances of graphs that correspond to a wide variety of practical applications and network structures. Although numerous approaches f...

Distinguishing topical and social groups based on common identity and bond theory

by Przemyslaw A. Grabowicz, Luca Maria Aiello, Víctor M. Eguíluz, Alejandro Jaimes - In WSDM, 2013
Abstract - Cited by 7 (0 self)
Social groups play a crucial role in social media platforms because they form the basis for user participation and engagement. Groups are created explicitly by members of the community, but also form organically as members interact. Due to their importance, they have been studied widely (e.g., community detection, evolution, activity, etc.). One of the key questions for understanding how such groups evolve is whether there are different types of groups and how they differ. In sociology, theories have been proposed to help explain how such groups form. In particular, the common identity and common bond theory states that people join groups based on identity (i.e., interest in the topics discussed) or bond attachment (i.e., social relationships). The theory has been applied qualitatively to small groups to classify them as either topical or social. We use the identity and bond theory to define a set of features to classify groups into those two categories. Using a dataset from Flickr, we extract user-defined groups and automatically-detected groups, obtained from a community detection algorithm. We discuss the process of manual labeling of groups into social or topical and present results of predicting the group label based on the defined features. We directly validate the predictions of the theory showing that the metrics are able to forecast the group type with high accuracy. In addition, we present a comparison between declared and detected groups along topicality and sociality dimensions.

Citation Context

...loyed in recent years to describe the structure of complex social systems [8], the need for a clearer assessment of the meaning of the detected clusters has been often expressed from different angles [15, 31], but never completely satisfied. With our study we also contribute to shed light on this matter. To the best of our knowledge, this is the first attempt of formalization of the common identity and co...

On Measuring the Quality of a Network Community Structure

by Mingming Chen, Tommy Nguyen, Boleslaw K. Szymanski
Abstract - Cited by 4 (4 self)
Modularity is widely used to effectively measure the strength of the community structure found by community detection algorithms. However, modularity maximization suffers from two opposite yet coexisting problems: in some cases, it tends to favor small communities over large ones while in others, large communities over small ones. The latter tendency is known in the literature as the resolution limit problem. To address them, we propose to modify modularity by subtracting from it the fraction of edges connecting nodes of different communities and by including community density into modularity. We refer to the modified metric as Modularity Density and we demonstrate that it indeed resolves both problems mentioned above. We describe the motivation for introducing this metric by using intuitively clear and simple examples. We also discuss the results of applying this metric, modularity, and several other popular community quality metrics to two real dynamic networks. The results imply that Modularity Density is consistent with all the community quality measurements but not modularity, which suggests that Modularity Density is an improved measurement of the community quality compared to modularity.

Citation Context

...r experiments with Modularity Density, modularity, and other popular community quality metrics, including the number of Intra-edges, Contraction, the number of Inter-edges, Expansion, and Conductance [8], on two real dynamic networks. The results show that Modularity Density is different from original modularity, but consistent with all those community quality measurements, which implies that Modular...

A scalable approach to probabilistic latent space inference of large-scale networks

by Junming Yin, Qirong Ho, Eric P. Xing - In NIPS, 2013
Abstract - Cited by 4 (2 self)
We propose a scalable approach for making inference about latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. When compared to the state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction.

Citation Context

... still require days to process modest networks of around 100,000 nodes. To perform latent space analysis on at least million-node (if not larger) real social networks with many distinct latent roles [24], one must design inferential mechanisms that scale in both the number of vertices N and the number of latent roles K. In this paper, we argue that the following three principles are crucial for succe...
