Results 1-10 of 112
Learning to Discover Social Circles in Ego Networks
Cited by 95 (5 self)
Our personal social networks are big and cluttered, and currently there is no good way to organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. ‘circles’ on Google+, and ‘lists’ on Facebook and Twitter); however, these are laborious to construct and must be updated whenever a user’s network grows. We define a novel machine learning task of identifying users’ social circles. We pose the problem as a node clustering problem on a user’s ego-network, a network of connections between her friends. We develop a model for detecting circles that combines network structure with user profile information. For each circle we learn its members and the circle-specific user profile similarity metric. Modeling node membership in multiple circles allows us to detect overlapping as well as hierarchically nested circles. Experiments show that our model accurately identifies circles on a diverse set of data from Facebook, Google+, and Twitter, for all of which we obtain hand-labeled ground truth.
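The ego-network that the abstract clusters can be made concrete with a short sketch: collect the ego's friends, keep only the edges among them, and drop the ego itself. This is a minimal Python illustration assuming an undirected edge list; `ego_network` is a hypothetical helper, and the paper's circle-detection model is not shown.

```python
from collections import defaultdict


def ego_network(edges, ego):
    """Return (friends, edges among friends) for `ego` in an undirected graph.

    The ego node itself is excluded, matching the description of an
    ego-network as the connections between a user's friends.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    friends = adj[ego]
    # Keep each friend-friend edge once, as a sorted tuple.
    inner = {tuple(sorted((u, v))) for u in friends for v in adj[u] if v in friends}
    return set(friends), sorted(inner)


if __name__ == "__main__":
    edges = [("me", "a"), ("me", "b"), ("me", "c"), ("a", "b"), ("c", "d")]
    friends, inner = ego_network(edges, "me")
    print(sorted(friends))  # ['a', 'b', 'c']
    print(inner)            # [('a', 'b')]
```

The edge c-d is dropped because d is not a friend of the ego; circle detection would then cluster the friends using only the remaining edges plus profile information.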
Overlapping community detection at scale: a nonnegative matrix factorization approach
In WSDM, 2013
Cited by 41 (5 self)
Network communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than between its members and the remainder of the network. Communities in networks also overlap, as nodes belong to multiple clusters at once. Due to the difficulty of evaluating detected communities and the lack of scalable algorithms, overlapping community detection in large networks largely remains an open problem. In this paper we present BIGCLAM (Cluster Affiliation Model for Big Networks), an overlapping community detection method that scales to large networks of millions of nodes and edges. We build on a novel observation that overlaps between communities are densely connected. This is in sharp contrast with present community detection methods, which implicitly assume that overlaps between communities are sparsely connected and thus cannot properly extract overlapping communities in networks. We develop a model-based community detection algorithm that can detect densely overlapping, hierarchically nested, as well as non-overlapping communities in massive networks. We evaluate our algorithm on 6 large social, collaboration and information networks with ground-truth community information. Experiments show state-of-the-art performance both in the quality of the detected communities and in the speed and scalability of our algorithm.
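The dense-overlap observation can be illustrated with the edge model usually associated with BIGCLAM, where each node u has a nonnegative vector F_u of community affiliation strengths and u, v are linked with probability 1 - exp(-F_u · F_v). A pair that shares two communities accumulates a larger dot product, so it is more likely to be connected. The sketch below only evaluates that probability for hand-picked, illustrative strengths; fitting F to a real graph is not shown, and `bigclam_edge_prob` is a hypothetical name.

```python
import math


def bigclam_edge_prob(f_u, f_v):
    """Edge probability 1 - exp(-<F_u, F_v>) for nonnegative affiliation vectors."""
    if any(x < 0 for x in f_u) or any(x < 0 for x in f_v):
        raise ValueError("affiliation strengths must be nonnegative")
    dot = sum(a * b for a, b in zip(f_u, f_v))
    return 1.0 - math.exp(-dot)


# Two communities A and B; the strengths are made-up illustrative values.
only_a = [0.8, 0.0]  # member of A only
only_b = [0.0, 0.8]  # member of B only
both = [0.8, 0.8]    # member of the overlap of A and B

p_inside_a = bigclam_edge_prob(only_a, only_a)  # pair inside A
p_overlap = bigclam_edge_prob(both, both)       # pair inside the overlap
p_across = bigclam_edge_prob(only_a, only_b)    # pair sharing no community
print(f"{p_inside_a:.3f} {p_overlap:.3f} {p_across:.3f}")  # 0.473 0.722 0.000
```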
Community-Affiliation Graph Model for Overlapping Network Community Detection. Extended version
Cited by 15 (3 self)
One of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Communities in networks often overlap, as nodes can belong to multiple communities at once. Identifying such overlapping communities is crucial for understanding the structure as well as the function of real-world networks. Even though community structure in networks has been widely studied in the past, practically all research makes an implicit assumption that overlaps between communities are less densely connected than the non-overlapping parts themselves. Here we test this assumption on 6 large-scale social, collaboration and information networks where nodes explicitly state their community memberships. By examining such ground-truth communities we find that the community overlaps are more densely connected than the non-overlapping parts, which is in sharp contrast to the conventional wisdom that community overlaps are more sparsely connected than the communities themselves. Practically all existing community detection methods fail to detect communities with dense overlaps. We propose the Community-Affiliation Graph Model, a model-based community detection method that builds on bipartite node-community affiliation networks. Our method successfully captures overlapping, non-overlapping as well as hierarchically nested communities, and identifies relevant communities more accurately than state-of-the-art methods in networks ranging from biological to social and information networks.
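In the usual statement of the Community-Affiliation Graph Model, each community k has a probability p_k, and two nodes are linked with probability 1 - ∏(1 - p_k) taken over the communities they share, so pairs in an overlap get several independent chances to connect. The sketch below evaluates that formula; the background probability `eps` for pairs with no shared community, the membership sets, and the p_k values are illustrative assumptions, and `agm_edge_prob` is a hypothetical name.

```python
def agm_edge_prob(memberships_u, memberships_v, p, eps=1e-4):
    """AGM-style edge probability.

    memberships_*: sets of community ids; p: dict community id -> p_k.
    Each shared community independently creates the edge with probability p_k.
    `eps` is an assumed background probability for pairs sharing no community.
    """
    shared = memberships_u & memberships_v
    if not shared:
        return eps
    no_edge = 1.0
    for k in shared:
        no_edge *= 1.0 - p[k]  # probability that community k does NOT create the edge
    return 1.0 - no_edge


p = {"A": 0.3, "B": 0.3}
p_single = agm_edge_prob({"A"}, {"A"}, p)             # pair inside A only
p_overlap = agm_edge_prob({"A", "B"}, {"A", "B"}, p)  # pair in the overlap of A and B
p_none = agm_edge_prob({"A"}, {"B"}, p)               # pair sharing no community
print(f"{p_single:.2f} {p_overlap:.2f} {p_none:.4f}")  # 0.30 0.51 0.0001
```

The overlap pair has the highest link probability, which is how this kind of model reproduces densely connected overlaps.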
Social resilience in online communities: The autopsy of Friendster
In Proceedings of the First ACM Conference on Online Social Networks (COSN ’13), 2013
Cited by 13 (5 self)
We empirically analyze five online communities (Friendster, LiveJournal, Facebook, Orkut and Myspace) to identify causes for the decline of social networks. We define social resilience as the ability of a community to withstand changes. We do not examine the causes of such changes, but concentrate on their impact. Changes may cause users to leave, which may in turn cause others who lost connections to their friends to leave as well. This may lead to cascades of users leaving. A social network is said to be resilient if the size of such cascades can be limited. To quantify resilience, we use k-core analysis to identify subsets of the network in which all users have at least k friends. These connections generate benefits (b) for each user, which have to outweigh the costs (c) of being a member of the network. If this difference is not positive, users leave. After all cascades, the remaining network is the k-core of the original network determined by the cost-to-benefit (c/b) ratio. By analyzing the cumulative distribution of k-cores we are able to calculate the number of users remaining in each community. This allows us to infer the impact of the c/b ratio on the resilience of these online communities. We find that the different online communities have different k-core distributions. Consequently, similar changes in the c/b ratio have a different impact on the number of active users. As a case study, we focus on the evolution of Friendster. We identify time periods when new users entering the network observed an insufficient c/b ratio. This measure can be seen as a precursor of the later collapse of the community. Our analysis can be applied to estimate the impact of changes in the user interface, which may temporarily increase the c/b ratio and thus threaten to shrink, or even collapse, the community.
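The leave cascade described above can be simulated directly. Assuming each remaining friend contributes benefit b and membership costs c, a user stays only while degree × b - c > 0; repeatedly removing users who fail that test leaves exactly the k-core with k = ⌊c/b⌋ + 1. The sketch below is a minimal Python illustration under that linear-benefit assumption, not the paper's analysis code; `resilient_core` is a hypothetical name.

```python
from collections import defaultdict, deque


def resilient_core(edges, cost, benefit):
    """Simulate the leave cascade and return the set of users who remain.

    Assumes each remaining friend yields `benefit` and membership costs `cost`;
    a user leaves when degree * benefit - cost <= 0. The survivors form the
    k-core with k = floor(cost / benefit) + 1.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    active = set(adj)
    queue = deque(u for u in active if degree[u] * benefit - cost <= 0)
    while queue:
        u = queue.popleft()
        if u not in active:  # already removed via an earlier queue entry
            continue
        active.remove(u)
        # Each departure lowers friends' payoff and may trigger further leaves.
        for v in adj[u]:
            if v in active:
                degree[v] -= 1
                if degree[v] * benefit - cost <= 0:
                    queue.append(v)
    return active


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")]
print(sorted(resilient_core(edges, cost=1.5, benefit=1.0)))  # ['a', 'b', 'c']
```

With c/b = 1.5, user e leaves first, which drops d to one friend and makes d leave too; the triangle a, b, c is the surviving 2-core.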
Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs
Cited by 10 (3 self)
We present a scalable tensor-based approach for detecting hidden overlapping communities under the mixed membership stochastic block model. We employ stochastic gradient descent for performing tensor decompositions, which provides the flexibility to trade off node sub-sampling against accuracy. Our GPU implementation of the tensor-based approach is extremely fast and scalable, and involves a careful optimization of GPU-CPU storage and communication. We validate our results on datasets from popular social networks (Facebook, Yelp and DBLP) where ground truth is available, using p-values and false discovery rates, and obtain high accuracy for membership recovery. We compare our results, in terms of both execution time and accuracy, to state-of-the-art algorithms such as the variational method, and report better performance. For instance, on the Yelp network, consisting of about 40,000 nodes and 500 communities, we recover the latent communities in under 30 minutes, and on the DBLP network, consisting of about 120,000 nodes and 500 communities, we recover them in about 2.8 hours. In comparison, the variational method takes more than an order of magnitude longer on the same datasets.
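The basic idea of fitting a tensor decomposition with gradient steps can be shown on a tiny dense tensor. The NumPy sketch below recovers one symmetric rank-1 component by minimizing ||T - v⊗v⊗v||² with full-batch gradient descent. It does not reproduce the paper's moment construction, node sub-sampling, stochastic updates, or GPU pipeline; `rank1_symmetric_cp`, the step size, and the synthetic tensor are assumptions chosen for illustration.

```python
import numpy as np


def rank1_symmetric_cp(T, v0, lr=0.01, steps=2000):
    """Fit T ~ v (x) v (x) v for a symmetric 3-way tensor T by gradient descent.

    Loss: ||T - v(x)v(x)v||_F^2. For symmetric T its gradient is
    -6 T(I, v, v) + 6 ||v||^4 v.
    """
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(steps):
        t_vv = np.einsum("ijk,j,k->i", T, v, v)  # contraction T(I, v, v)
        grad = -6.0 * t_vv + 6.0 * np.dot(v, v) ** 2 * v
        v -= lr * grad
    return v


# Synthetic rank-1 tensor T = 2 a(x)a(x)a with unit a; the minimizer is 2**(1/3) * a.
a = np.array([1.0, 0.0, 0.0])
T = 2.0 * np.einsum("i,j,k->ijk", a, a, a)
v = rank1_symmetric_cp(T, v0=[0.5, 0.5, 0.5])
print(np.round(v, 4))
```

The step size is kept small on purpose: near the solution, a step much larger than about 0.04 makes this particular iteration oscillate instead of converge.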
New metrics of quality of network community structure
ASE Human Journal, 2013
Cited by 8 (8 self)
Modularity is widely used to measure the strength of the community structure found by community detection algorithms. However, modularity maximization suffers from two opposite yet coexisting problems: in some cases it tends to favor small communities over large ones, while in others it favors large communities over small ones. The latter tendency is known in the literature as the resolution limit problem. To address both problems, we propose to modify modularity by subtracting from it the fraction of edges connecting nodes of different communities and by including community density in it. We refer to the modified metric as Modularity Density and demonstrate that it indeed resolves both problems. We motivate the metric with intuitively clear and simple examples. We also prove that this new metric solves the resolution limit problem. Finally, we discuss the results of applying this metric, modularity, and several other popular community quality metrics to two real dynamic networks. The results imply that Modularity Density is consistent with all the community quality measures except modularity, which suggests that Modularity Density is an improved measure of community quality compared to modularity.
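The resolution limit can be reproduced with ordinary Newman-Girvan modularity (not the paper's Modularity Density) on the commonly cited ring-of-cliques example: 30 five-node cliques joined in a ring by single edges. Grouping adjacent cliques in pairs scores higher than keeping each clique as its own community, even though the cliques are the natural communities. The helper names below are hypothetical.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman-Girvan modularity: sum over communities of L_c/m - (d_c/(2m))**2.

    L_c = edges inside community c, d_c = total degree of its nodes, m = |E|.
    """
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l_c / m - (d_c / (2 * m)) ** 2 for l_c, d_c in zip(internal, degree_sum))


def ring_of_cliques(n_cliques, clique_size):
    """Complete graphs on consecutive node ids, joined in a ring by single edges."""
    edges = []
    for c in range(n_cliques):
        first = c * clique_size
        edges.extend(combinations(range(first, first + clique_size), 2))
        edges.append((first, ((c + 1) % n_cliques) * clique_size))  # link to next clique
    return edges


n, k = 30, 5
edges = ring_of_cliques(n, k)
singles = [set(range(c * k, (c + 1) * k)) for c in range(n)]
pairs = [singles[i] | singles[i + 1] for i in range(0, n, 2)]
q_single, q_pair = modularity(edges, singles), modularity(edges, pairs)
print(f"{q_single:.4f} {q_pair:.4f}")  # 0.8758 0.8879
```

In closed form the two scores are 10/11 - 1/n and 21/22 - 2/n, so pairing wins whenever the ring has more than 22 cliques.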
Scalable and High Performance Betweenness Centrality on the GPU
In Proceedings of the 26th ACM/IEEE International Conference on High Performance Computing, Networking, Storage, and Analysis (SC), 2014
Cited by 7 (4 self)
Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is betweenness centrality, which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost that prevents the examination of large graphs of interest. Prior GPU implementations suffer from large local data structures and inefficient graph traversals that limit scalability and performance. Here we present several hybrid GPU implementations that provide good performance on graphs of arbitrary structure, rather than only on scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near-linear speedup and performance exceeding tens of GTEPS (billions of traversed edges per second) when running betweenness centrality on 192 GPUs. Keywords: GPUs, Graph Algorithms, Parallel Algorithms
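For reference, betweenness centrality is commonly computed with Brandes' algorithm: one breadth-first search per source counts shortest paths, and dependencies are then accumulated in reverse BFS order. The sequential Python sketch below handles unweighted, undirected graphs only; it shows the computation that GPU implementations parallelize, not the paper's hybrid GPU method, and `betweenness_centrality` here is a hypothetical standalone function.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict node -> iterable of neighbours. Each unordered pair of
    endpoints is counted once in the returned scores.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: shortest-path counts (sigma) and predecessor lists.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(betweenness_centrality(path))  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

Each source runs an independent BFS, which is the part that parallel and GPU implementations typically distribute.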
Distinguishing topical and social groups based on common identity and bond theory
In WSDM, 2013
Cited by 7 (0 self)
Social groups play a crucial role in social media platforms because they form the basis for user participation and engagement. Groups are created explicitly by members of the community, but also form organically as members interact. Due to their importance, they have been studied widely (e.g., community detection, evolution, and activity). One of the key questions for understanding how such groups evolve is whether there are different types of groups and how they differ. In sociology, theories have been proposed to help explain how such groups form. In particular, the common identity and common bond theory states that people join groups based on identity (i.e., interest in the topics discussed) or bond attachment (i.e., social relationships). The theory has been applied qualitatively to small groups to classify them as either topical or social. We use the identity and bond theory to define a set of features to classify groups into those two categories. Using a dataset from Flickr, we extract user-defined groups and automatically detected groups obtained from a community detection algorithm. We discuss the process of manually labeling groups as social or topical and present results of predicting the group label based on the defined features. We directly validate the predictions of the theory, showing that the metrics are able to forecast the group type with high accuracy. In addition, we present a comparison between declared and detected groups along the topicality and sociality dimensions.
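As one hypothetical example of a bond-style feature (the abstract does not list the paper's actual features), a group's sociality could be proxied by the share of member pairs who are contacts of each other. The sketch below computes that fraction from a member set and an undirected contact list; `member_contact_fraction` is an illustrative name, and no classifier is shown.

```python
from itertools import combinations


def member_contact_fraction(members, contacts):
    """Hypothetical bond-style feature: share of member pairs who are contacts.

    members: iterable of user ids; contacts: iterable of (u, v) pairs,
    treated as undirected (self-pairs are ignored).
    """
    member_set = set(members)
    contact_set = {frozenset(e) for e in contacts if e[0] != e[1]}
    pairs = list(combinations(member_set, 2))
    if not pairs:
        return 0.0  # fewer than two members: no pairs to measure
    linked = sum(frozenset(p) in contact_set for p in pairs)
    return linked / len(pairs)


group = {"ann", "bob", "cat"}
contacts = [("ann", "bob"), ("bob", "cat"), ("cat", "dan")]
print(round(member_contact_fraction(group, contacts), 3))  # 0.667
```

A high value would point toward a bond-based (social) group and a low value toward an identity-based (topical) one, under this illustrative proxy.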
On Measuring the Quality of a Network Community Structure
Cited by 4 (4 self)
Modularity is widely used to measure the strength of the community structure found by community detection algorithms. However, modularity maximization suffers from two opposite yet coexisting problems: in some cases it tends to favor small communities over large ones, while in others it favors large communities over small ones. The latter tendency is known in the literature as the resolution limit problem. To address both problems, we propose to modify modularity by subtracting from it the fraction of edges connecting nodes of different communities and by including community density in it. We refer to the modified metric as Modularity Density and demonstrate that it indeed resolves both problems. We motivate the metric with intuitively clear and simple examples. We also discuss the results of applying this metric, modularity, and several other popular community quality metrics to two real dynamic networks. The results imply that Modularity Density is consistent with all the community quality measures except modularity, which suggests that Modularity Density is an improved measure of community quality compared to modularity.
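The two ingredients the abstract adds to modularity, the fraction of edges connecting different communities and each community's internal edge density, can be computed separately. The sketch below computes only these quantities; how the paper combines them into Modularity Density is not reproduced, and `split_penalty_and_density` is a hypothetical name.

```python
def split_penalty_and_density(edges, communities):
    """Return (fraction of inter-community edges, {community index: density}).

    Density of community c is internal edges / (n_c * (n_c - 1) / 2);
    singleton communities get density 0.0 by convention here.
    """
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = [0] * len(communities)
    crossing = 0
    for u, v in edges:
        if label[u] == label[v]:
            internal[label[u]] += 1
        else:
            crossing += 1  # edge joins two different communities
    densities = {}
    for i, comm in enumerate(communities):
        n = len(comm)
        possible = n * (n - 1) / 2
        densities[i] = internal[i] / possible if possible else 0.0
    return crossing / len(edges), densities


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
frac, dens = split_penalty_and_density(edges, [{0, 1, 2}, {3, 4}])
print(frac, dens)  # 0.2 {0: 1.0, 1: 1.0}
```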
A scalable approach to probabilistic latent space inference of large-scale networks
In NIPS, 2013
Cited by 4 (2 self)
We propose a scalable approach for inferring the latent spaces of large networks. With a succinct representation of networks as a bag of triangular motifs, a parsimonious statistical model, and an efficient stochastic variational inference algorithm, we are able to analyze real networks with over a million vertices and hundreds of latent roles on a single machine in a matter of hours, a setting that is out of reach for many existing methods. Compared to state-of-the-art probabilistic approaches, our method is several orders of magnitude faster, with competitive or improved accuracy for latent space recovery and link prediction.
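One way to picture a "bag of triangular motifs" is to look at every pair of neighbors around each node and record whether the triple is closed (a triangle) or open (exactly two edges). The sketch below enumerates both kinds exhaustively for a small graph, assuming those two motif types; any subsampling the paper applies and its statistical model are not shown, and `triangular_motifs` is a hypothetical name.

```python
from itertools import combinations


def triangular_motifs(edges):
    """Return (closed, open_) connected triples of an undirected graph.

    closed: set of node triples forming a triangle (each counted once).
    open_: list of (center, {u, w}) wedges where u-center-w is a path
    but u and w are not linked.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed, open_ = set(), []
    for center, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            if w in adj[u]:
                closed.add(frozenset((center, u, w)))  # seen from all 3 centers, kept once
            else:
                open_.append((center, frozenset((u, w))))
    return closed, open_


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
closed, open_ = triangular_motifs(edges)
print(len(closed), len(open_))  # 1 2
```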