Results 1-10 of 23
Detecting Communities in Social Networks using Max-Min Modularity
Cited by 33 (4 self)
Many datasets can be described as graphs or networks in which nodes represent entities and edges represent relationships between pairs of entities. A common property of these networks is their community structure: clusters of densely connected vertices with only sparser connections between clusters. Identifying such communities relies on some notion of clustering or density measure, which determines the communities that can be found. However, previous community detection methods usually apply the same structural measure to all kinds of networks, despite their distinct features. In this paper, we present a new community mining measure, Max-Min Modularity, which considers both connected pairs and criteria defined by domain experts in finding communities, and then specify a hierarchical clustering algorithm to detect communities in networks. When applied to real-world networks whose community structures are already known, our method shows improvement over previous algorithms. In addition, when applied to randomly generated networks for which we have only approximate information about communities, it gives promising results, which show the algorithm's robustness against noise.
V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors
Cited by 26 (1 self)
This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew in the underlying distributions of Internet traffic, and is a family of 2-stage algorithms, where the first stage computes and joins the partial results, and the second stage computes the similarity exactly for all candidate pairs. The V-SMART-Join algorithms are very efficient and scalable in the number of entities, as well as in their cardinalities. They were up to 30 times faster than the state-of-the-art algorithm, VCL, when compared on a small real dataset. We also established the scalability of the proposed algorithms by running them on a dataset of realistic size, on which VCL never finished. Experiments were run using real datasets of IPs and cookies, where each IP is represented as a multiset of cookies, and the goal is to discover similar IPs to identify Internet proxies.
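
The two-stage idea in this abstract can be illustrated on a single machine. The sketch below is not the V-SMART-Join MapReduce implementation: the function name `similar_pairs`, the choice of Ruzicka similarity (sum of minima over sum of maxima), and the threshold parameter are all assumptions made for illustration. It first computes per-entity partial results and an inverted index, then scores every candidate pair exactly.

```python
from collections import Counter, defaultdict
from itertools import combinations


def similar_pairs(multisets, threshold):
    """Return {(a, b): similarity} for pairs at or above `threshold`.

    multisets: dict mapping entity -> Counter(element -> multiplicity).
    Similarity is Ruzicka (an assumed choice): sum(min) / sum(max).
    """
    # Stage 1: per-entity partial result (multiset cardinality) and an
    # inverted index from each element to the entities that contain it.
    size = {e: sum(m.values()) for e, m in multisets.items()}
    index = defaultdict(list)
    for entity, m in multisets.items():
        for elem, count in m.items():
            index[elem].append((entity, count))

    # Candidate pairs share at least one element; accumulate the sum of
    # per-element minima for each such pair.
    overlap = defaultdict(int)
    for postings in index.values():
        for (a, ca), (b, cb) in combinations(sorted(postings), 2):
            overlap[(a, b)] += min(ca, cb)

    # Stage 2: exact similarity per candidate pair. For multisets,
    # sum(max) = |A| + |B| - sum(min).
    result = {}
    for (a, b), inter in overlap.items():
        sim = inter / (size[a] + size[b] - inter)
        if sim >= threshold:
            result[(a, b)] = sim
    return result


# Hypothetical data: each IP is a multiset of cookies.
ips = {
    "ip1": Counter({"c1": 2, "c2": 1}),
    "ip2": Counter({"c1": 1, "c2": 1}),
    "ip3": Counter({"c3": 4}),
}
pairs = similar_pairs(ips, threshold=0.5)
```

Pairs that share no element never become candidates, which is what keeps the second stage from scoring all possible pairs.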
Demon: a local-first discovery method for overlapping communities
In KDD, 2012
Cited by 19 (2 self)
Community discovery in complex networks is an interesting problem with a number of applications, especially in the knowledge extraction task in social and information networks. However, many large networks lack a particular community organization at a global level. In these cases, traditional graph partitioning algorithms fail to let the latent knowledge embedded in modular structure emerge, because they impose a top-down global view of a network. We propose here a simple local-first approach to community discovery, able to unveil the modular organization of real complex networks. This is achieved by democratically letting each node vote for the communities it sees surrounding it in its limited view of the global system, i.e. its ego neighborhood, using a label propagation algorithm; finally, the local communities are merged into a global collection. We tested this intuition against the state-of-the-art overlapping and non-overlapping community discovery methods, and found that our new method clearly outperforms the others in the quality of the obtained communities, evaluated by using the extracted communities to predict the metadata about the nodes of several real-world networks. We also show that our method is deterministic, fully incremental, and has a limited time complexity, so that it can be used on web-scale real networks.
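
The local-first procedure described above (label propagation in each ego neighborhood, then merging of the local communities) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation: removing the ego before propagating, the tie-breaking rule, the merge rule, and the names `label_propagation`, `merge`, `local_first_communities`, and `min_overlap` are all assumptions.

```python
from collections import Counter


def label_propagation(adj, nodes, max_rounds=100):
    """Asynchronous label propagation restricted to `nodes`.

    Nodes are visited in sorted order and ties go to the smallest
    label, so repeated runs give the same result.
    """
    labels = {v: v for v in nodes}
    for _ in range(max_rounds):
        changed = False
        for v in sorted(nodes):
            counts = Counter(labels[u] for u in adj[v] if u in nodes)
            if not counts:
                continue  # no neighbours inside this ego network
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    groups = {}
    for v, lab in labels.items():
        groups.setdefault(lab, set()).add(v)
    return list(groups.values())


def merge(communities, candidate, min_overlap):
    """Fold `candidate` into the first community it overlaps enough with."""
    for c in communities:
        smaller = min(len(c), len(candidate))
        if len(c & candidate) / smaller >= min_overlap:
            c |= candidate
            return
    communities.append(set(candidate))


def local_first_communities(adj, min_overlap=0.75):
    """adj: dict node -> set of neighbours (undirected graph)."""
    communities = []
    for ego in sorted(adj):
        neighbours = set(adj[ego])  # ego network with the ego removed
        for local in label_propagation(adj, neighbours):
            merge(communities, local | {ego}, min_overlap)
    return communities


# Two triangles that share node 2, which should sit in both communities.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
found = local_first_communities(graph)
```

Because each node only votes on its own neighborhood, a node such as 2 can end up in more than one merged community, which is how overlap arises in this sketch.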
Subgraph detection using eigenvector L1 norms
In NIPS, 2010
Cited by 16 (7 self)
When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach.
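
For orientation, the quantities named in this abstract can be written out using the standard definition of the modularity matrix; the paper's exact test statistic is not given here, so the block below only fixes notation. The symbols are assumptions: A is the adjacency matrix of an unweighted, undirected graph on n vertices, k is its degree vector, m is the number of edges, and u is a unit-length eigenvector of B.

```latex
B = A - \frac{k\,k^{\mathsf{T}}}{2m}, \qquad
\|u\|_1 = \sum_{j=1}^{n} |u_j|, \qquad
1 \le \|u\|_1 \le \sqrt{n} \quad \text{for } \|u\|_2 = 1 .
```

An eigenvector concentrated on a few vertices has an L1 norm near the lower bound, while one spread evenly over all vertices approaches the upper bound, which is presumably why an unusually small L1 norm can point to a small anomalous subgraph.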
An ensemble learning strategy for graph clustering
Contemporary Mathematics, 2013
Community Detection via Maximization of Modularity and Its Variants
2014
Cited by 6 (4 self)
In this paper, we first discuss the definition of modularity (Q) used as a metric for community quality, and then we review the modularity maximization approaches used for community detection in the last decade. Then, we discuss two opposite yet coexisting problems of modularity optimization: in some cases it tends to favor small communities over large ones, while in others it favors large communities over small ones (the so-called resolution limit problem). Next, we overview several community quality metrics proposed to solve the resolution limit problem and discuss Modularity Density (Qds), which simultaneously avoids the two problems of modularity. Finally, we introduce two novel fine-tuned community detection algorithms that iteratively attempt to improve the community quality measurements by splitting and merging the given network community structure. The first of them, referred to as Fine-tuned Q, is based on modularity (Q), while the second is based on Modularity Density (Qds) and is denoted Fine-tuned Qds. Then, we compare the greedy algorithm of modularity maximization (denoted Greedy Q), Fine-tuned Q, and Fine-tuned Qds on four real networks, and also on the classical clique network and the LFR benchmark networks, each of which is instantiated with a wide range of parameters. The results indicate that Fine-tuned Qds is the most effective of the three algorithms discussed. Moreover, we show that Fine-tuned Qds can be applied to the communities detected by other algorithms to significantly improve their results.
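
As a reminder of what is being maximized, the following sketch computes the standard Newman-Girvan modularity Q for an unweighted, undirected graph. It covers only plain Q, not Modularity Density (Qds) or the fine-tuned algorithms, whose definitions are not given in this abstract. The function name and input layout are assumptions, and the graph is assumed to have at least one edge and no self-loops or duplicate edges.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q = sum_c [ e_c / m - (d_c / 2m)^2 ].

    edges: list of undirected (u, v) pairs.
    community: dict mapping node -> community label.
    e_c is the number of edges inside community c, d_c the sum of the
    degrees of its nodes, and m the total number of edges.
    """
    m = len(edges)
    internal = {}   # e_c per community
    degree_sum = {} # d_c per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        # Each edge adds one to the degree of both endpoints.
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Placing every node in one community always gives Q = 0, so a positive value such as `q_split` indicates more internal edges than the degree-based null model would suggest.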
A Latent Community Topic Analysis: Integration of Community Discovery with Topic Modeling
Cited by 5 (0 self)
This paper studies the problem of latent community topic analysis in text-associated graphs. With the development of social media, a lot of user-generated content is available together with user networks. Along with the rich information in networks, user graphs can be extended with text information associated with nodes. Topic modeling is a classic problem in text mining, and it is interesting to discover the latent topics in text-associated graphs. Unlike traditional topic modeling methods that consider links, we incorporate community discovery into topic analysis in text-associated graphs to guarantee topical coherence in the communities, so that users in the same community are closely linked to each other and share common latent topics. We handle topic modeling and community discovery in the same framework. In our model we separate the concepts of community and topic, so one community can correspond to multiple topics and multiple communities can share the same topic. We compare different methods and perform extensive experiments on two real datasets. The results confirm our hypothesis that topics could help understand community structure, while community structure could help model topics.
A topical link model for community discovery in textual interaction graph
In Proc. of 2010 ACM Conf. on Information and Knowledge Management, 2010
Cited by 3 (0 self)
This paper is concerned with community discovery in textual interaction graphs, where the links between entities are indicated by textual documents. Specifically, we propose a Topical Link Model (TLM), which leverages the Hierarchical Dirichlet Process (HDP) to introduce a hidden topical variable for the links. Beyond using the links themselves, TLM can look into the documents on the links in detail to recover sound communities. Moreover, TLM is a nonparametric model, which is able to learn the number of communities from the data. Extensive experiments on two real-world corpora show that TLM outperforms two state-of-the-art baseline models, which verifies the effectiveness of TLM in determining the proper number of communities and generating sound communities.
Features
Cited by 2 (0 self)
Many methods have been proposed for community detection in networks, but most of them do not take into account additional information on the nodes that is often available in practice. In this paper, we propose a new joint community detection criterion that uses both the network and the features to detect community structure. One advantage our method has over existing joint detection approaches is the flexibility of learning the impact of different features, which may differ across communities. Another advantage is the flexibility of choosing the amount of influence the feature information has on communities. The method is asymptotically consistent under the block model with additional assumptions on the feature distributions, and performs well on simulated and real networks.
Multi-agent random walks for local clustering on graphs
Cited by 2 (0 self)
We consider the problem of local graph clustering where the aim is to discover the local cluster corresponding to a point of interest. The most popular algorithms to solve this problem start a random walk at the point of interest and let it run until some stopping criterion is met. The vertices visited are then considered the local cluster. We suggest a more powerful alternative, the multi-agent random walk. It consists of several “agents” connected by a fixed rope of length l. All agents move independently like a standard random walk on the graph, but they are constrained to have distance at most l from each other. The main insight is that for several agents it is harder to simultaneously travel over the bottleneck of a graph than for just one agent. Hence, the multi-agent random walk has less tendency to mistakenly merge two different clusters than the original random walk. In our paper we analyze the multi-agent random walk theoretically and compare it experimentally to the major local graph clustering algorithms from the literature. We find that our multi-agent random walk consistently outperforms these algorithms.
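
One way to picture the rope constraint is the simulation sketch below. It is an assumption-laden illustration rather than the paper's algorithm: it moves one randomly chosen agent per step, measures the rope length as shortest-path distance, rejects any move that would put two agents more than `rope` hops apart, and stops after a fixed number of steps. The names `distances_from` and `multi_agent_walk` and all parameters are hypothetical, and every node is assumed to have at least one neighbour.

```python
import random
from collections import deque


def distances_from(adj, source, limit):
    """Breadth-first distances from `source`, explored up to `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == limit:
            continue  # do not expand beyond the rope length
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def multi_agent_walk(adj, start, agents=2, rope=1, steps=1000, seed=0):
    """Return the set of vertices visited by any agent.

    adj: dict node -> set of neighbours (undirected graph).
    All agents start at `start`; a proposed move is accepted only if
    the moving agent stays within `rope` hops of every other agent.
    """
    rng = random.Random(seed)
    positions = [start] * agents
    visited = {start}
    for _ in range(steps):
        i = rng.randrange(agents)
        # Sort neighbours so the walk is reproducible for a given seed.
        proposal = rng.choice(sorted(adj[positions[i]]))
        reachable = distances_from(adj, proposal, rope)
        others = positions[:i] + positions[i + 1:]
        if all(p in reachable for p in others):
            positions[i] = proposal
            visited.add(proposal)
    return visited


# Two triangles joined by the bridge edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
cluster = multi_agent_walk(graph, start=0, agents=3, rope=1, steps=200)
```

With a single agent the constraint is vacuous and the sketch reduces to an ordinary random walk; with more agents, crossing the bridge requires all of them to be near it at once, which is the bottleneck effect the abstract describes.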