
## Local Network Community Detection with Continuous Optimization of Conductance and Weighted Kernel K-Means, by Twan van Laarhoven and Elena Marchiori (2016)

### Citations

1504 | Community structure in social and biological networks
- Girvan, Newman
- 2002
Citation Context ...rates networks from several different studies. We have constructed networks for Gavin et al. (2006), Krogan et al. (2006), Collins et al. (2007), Costanzo et al. (2010), Hoppins et al. (2011), as well as a network that is the union of all interaction networks confirmed by physical experiments. As ground truth communities we take the CYC2008 catalog of protein complexes for each of the networks (Pu et al., 2009). 4.2.4 Other Datasets Additionally we used some classical datasets with known communities: Zachary’s karate club Zachary (1977); Football: A network of American college football games (Girvan and Newman, 2002); Political books: A network of books about US politics (Krebs, 2004); and Political blogs: Hyperlinks between weblogs on US politics (Adamic and Glance, 2005). These datasets might not be very well suited for this problem, since they have very few communities. 4.3 Results In all our experiments we use a single seed node, drawn uniformly at random from the community. We have also performed experiments with multiple seeds; the results of those experiments can be found in the supplementary material. To keep the computation time manageable we have performed all experiments on a random sample of 1... |

811 | Community detection in graphs
- Fortunato
Citation Context ...14). On large networks PGDc and EMc stay localized and produce communities which are more faithful to the ground truth than those generated by the considered graph diffusion algorithms. PPR and HK produce much larger communities with a low conductance, while the YL strategy outputs very small communities with a higher conductance. 1.1 Related Work The enormous growth of network data from diverse disciplines such as social and information science and biology has boosted research on network community detection (see for instance the overviews by Schaeffer (2007) and Fortunato (2010)). Here we confine ourself to literature we consider to be relevant to the present work, namely local community detection by seed expansion, and review related work on conductance as objective function and its local optimization. We also briefly review research on other objectives functions, and on properties of communities and of seeds. 1.1.1 Conductance and Its Local Optimization Conductance has been largely used for network community detection. For instance Leskovec et al. (2008) introduced the notion of network community profile plot to measure the quality of a ‘best’ community as a functi... |

423 | BioGRID: a general repository for interaction datasets
- Stark, Breitkreutz, et al.
Citation Context ...ommunity goodness metrics, among which is conductance. We therefore believe that communities in this set are biased to be more easy to recover by optimizing conductance, and therefore do not consider them here. Results with these top 5000 ground truth communities are available in tables 1–3 in the supplementary material 2. In addition to the SNAP datasets we also include the Flickr social network dataset (Wang et al., 2012). 4.2.3 Protein Interaction Network Datasets We have also run experiments on protein interaction networks of yeast from the BioGRID database (Stark et al., 2006). This database curates networks from several different studies. We have constructed networks for Gavin et al. (2006), Krogan et al. (2006), Collins et al. (2007), Costanzo et al. (2010), Hoppins et al. (2011), as well as a network that is the union of all interaction networks confirmed by physical experiments. As ground truth communities we take the CYC2008 catalog of protein complexes for each of the networks (Pu et al., 2009). 4.2.4 Other Datasets Additionally we used some classical datasets with known communities: Zachary’s karate club Zachary (1977); Football: A network of American colleg... |

294 | Global landscape of protein complexes in the yeast Saccharomyces cerevisiae.
- Krogan, Cagney
- 2006
Citation Context ...(om=2) 5000 25123 0.021 146 51.4 0.534 LFR (om=3) 5000 25126 0.016 191 52.4 0.647 LFR (om=4) 5000 25117 0.015 234 53.4 0.717 Karate 34 78 0.103 2 17.0 0.141 Football 115 613 0.186 12 9.6 0.402 Pol.Blogs 1490 16715 0.089 2 745.0 0.094 Pol.Books 105 441 0.151 3 35.0 0.322 Flickr 35313 3017530 0.030 171 4336.1 0.682 Amazon 334863 925872 0.079 151037 19.4 0.554 DBLP 317080 1049866 0.128 13477 53.4 0.622 Youtube 1134890 2987624 0.002 8385 13.5 0.916 LiveJournal 3997962 34681189 0.045 287512 22.3 0.937 Orkut 3072441 117185083 0.014 6288363 14.2 0.977 CYC/Gavin 2006 6230 6531 0.121 408 4.7 0.793 CYC/Krogan 2006 6230 7075 0.075 408 4.7 0.733 CYC/Collins 2007 6230 14401 0.083 408 4.7 0.997 CYC/Costanzo 2010 6230 57772 0.022 408 4.7 0.996 CYC/Hoppins 2011 6230 10093 0.030 408 4.7 0.999 CYC/all 6230 80506 0.017 408 4.7 0.905 Table 1: Overview of the datasets used in the experiments. For each dataset we consider three different sets of communities. 4. Experiments To test the proposed algorithms, we assess their performance on various networks. We also perform experiments on recent state-of-the-art algorithms based on the diffusion method which also optimize conductance. 4.1 Algorithms Specifically, we pe... |

243 | Statistical properties of community structure in large social and information networks
- Leskovec, Dasgupta
- 2008
Citation Context ...nd biology has boosted research on network community detection (see for instance the overviews by Schaeffer (2007) and Fortunato (2010)). Here we confine ourself to literature we consider to be relevant to the present work, namely local community detection by seed expansion, and review related work on conductance as objective function and its local optimization. We also briefly review research on other objectives functions, and on properties of communities and of seeds. 1.1.1 Conductance and Its Local Optimization Conductance has been largely used for network community detection. For instance Leskovec et al. (2008) introduced the notion of network community profile plot to measure the quality of a ‘best’ community as a function of community size in a network. They used conductance to measure the quality of a community and analyze a large number of communities of different size scales in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection empl... |

223 | Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems
- Spielman, Teng
- 2004
Citation Context ...n of community size in a network. They used conductance to measure the quality of a community and analyze a large number of communities of different size scales in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection employ the local graph diffusion method to find a community with small conductance. Starting from the seminal work by Spielman and Teng (2004) various algorithms for local community detection by seed expansion based on this approach have been proposed (Andersen et al., 2006; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result which shows that a cut with small conductance can be found by simulating a random walk starting from a single node for sufficiently many steps (Lovasz and Simonovits, 1990). This result is used to prove that if the seed is near to a set with small conductance then the result of the procedure is a communit... |

213 | Graph clustering
- Schaeffer
- 2007
Citation Context ...loster and Gleich (2014). On large networks PGDc and EMc stay localized and produce communities which are more faithful to the ground truth than those generated by the considered graph diffusion algorithms. PPR and HK produce much larger communities with a low conductance, while the YL strategy outputs very small communities with a higher conductance. 1.1 Related Work The enormous growth of network data from diverse disciplines such as social and information science and biology has boosted research on network community detection (see for instance the overviews by Schaeffer (2007) and Fortunato (2010)). Here we confine ourself to literature we consider to be relevant to the present work, namely local community detection by seed expansion, and review related work on conductance as objective function and its local optimization. We also briefly review research on other objectives functions, and on properties of communities and of seeds. 1.1.1 Conductance and Its Local Optimization Conductance has been largely used for network community detection. For instance Leskovec et al. (2008) introduced the notion of network community profile plot to measure the quality of a ‘best’ ... |

200 | Local graph partitioning using PageRank vectors
- Andersen, Chung, et al.
- 2006
Citation Context ... of different size scales in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection employ the local graph diffusion method to find a community with small conductance. Starting from the seminal work by Spielman and Teng (2004) various algorithms for local community detection by seed expansion based on this approach have been proposed (Andersen et al., 2006; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result which shows that a cut with small conductance can be found by simulating a random walk starting from a single node for sufficiently many steps (Lovasz and Simonovits, 1990). This result is used to prove that if the seed is near to a set with small conductance then the result of the procedure is a community with a related conductance, which is returned in time proportional to the volume of the community (up to a logarithmic factor). Ma... |

174 | The genetic landscape of a cell
- Costanzo
- 2010
Citation Context ...000 25117 0.015 234 53.4 0.717 Karate 34 78 0.103 2 17.0 0.141 Football 115 613 0.186 12 9.6 0.402 Pol.Blogs 1490 16715 0.089 2 745.0 0.094 Pol.Books 105 441 0.151 3 35.0 0.322 Flickr 35313 3017530 0.030 171 4336.1 0.682 Amazon 334863 925872 0.079 151037 19.4 0.554 DBLP 317080 1049866 0.128 13477 53.4 0.622 Youtube 1134890 2987624 0.002 8385 13.5 0.916 LiveJournal 3997962 34681189 0.045 287512 22.3 0.937 Orkut 3072441 117185083 0.014 6288363 14.2 0.977 CYC/Gavin 2006 6230 6531 0.121 408 4.7 0.793 CYC/Krogan 2006 6230 7075 0.075 408 4.7 0.733 CYC/Collins 2007 6230 14401 0.083 408 4.7 0.997 CYC/Costanzo 2010 6230 57772 0.022 408 4.7 0.996 CYC/Hoppins 2011 6230 10093 0.030 408 4.7 0.999 CYC/all 6230 80506 0.017 408 4.7 0.905 Table 1: Overview of the datasets used in the experiments. For each dataset we consider three different sets of communities. 4. Experiments To test the proposed algorithms, we assess their performance on various networks. We also perform experiments on recent state-of-the-art algorithms based on the diffusion method which also optimize conductance. 4.1 Algorithms Specifically, we perform a comparative empirical analysis of the following algorithms. 1. PGDc. The projected gradi... |

173 | Weighted graph cuts without eigenvectors: A multilevel approach
- Dhillon, Guan, et al.
- 2007
Citation Context ...to allow for fractional membership. This paper investigates such a continuous relaxation, which leads to the following findings. 1.0.1 On Local Optima Although local optima of a continuous relaxation of conductance might at first glance have nodes with fractional memberships, somewhat surprisingly all strict local optima are discrete. This means that continuous optimization can directly be used to find communities without fractional memberships. 1.0.2 Relation with Weighted Kernel K-Means We unravel the relation between conductance and weighted kernel k-means objectives using the framework by Dhillon et al. (2007). Since the aim is to find only one community, we consider a slight variation with one mean, that is, with k = 1. This relation leads to the introduction of a new objective function for local community detection, called σ-conductance, which is the sum of conductance and a regularization term whose influence is controlled by a parameter σ. Interestingly, the choice of σ has a direct effect on the number of local optima of the function, where larger values of σ lead to more local optima. In particular, we prove that for σ > 2 all ... |

168 | Empirical comparison of algorithms for network community detection
- Leskovec, Lang, et al.
- 2010
Citation Context ...ng, where the overall community structure of a network has to be found, local community detection aims to find only one community around the given seeds by relying on local computations involving only nodes relatively close to the seed. Local community detection by seed expansion is especially beneficial in large networks, and is commonly used in real-life large scale network analysis (Gargi et al., 2011; Leskovec et al., 2010; Wu et al., 2012). [Footnote 1: Source code of the algorithms used in the paper is available at http://cs.ru.nl/~tvanlaarhoven/conductance2016. © 2016 Twan van Laarhoven and Elena Marchiori.] Several algorithms for local community detection operate by seed expansion. These methods have different expansion strategies, but what they have in common is their use of conductance as the objective to be optimized. Intuitively, conductance measures how strongly a set of nodes is connected to the rest of the graph; sets of nodes that are isolated from the graph have low conductance and make good communities. The problem of finding a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006). As a consequence, man... |

145 | Benchmark graphs for testing community detection algorithms
- Lancichinetti, Fortunato, et al.
- 2008

137 | Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae
- Collins, Kemmeren, et al.
Citation Context ...=3) 5000 25126 0.016 191 52.4 0.647 LFR (om=4) 5000 25117 0.015 234 53.4 0.717 Karate 34 78 0.103 2 17.0 0.141 Football 115 613 0.186 12 9.6 0.402 Pol.Blogs 1490 16715 0.089 2 745.0 0.094 Pol.Books 105 441 0.151 3 35.0 0.322 Flickr 35313 3017530 0.030 171 4336.1 0.682 Amazon 334863 925872 0.079 151037 19.4 0.554 DBLP 317080 1049866 0.128 13477 53.4 0.622 Youtube 1134890 2987624 0.002 8385 13.5 0.916 LiveJournal 3997962 34681189 0.045 287512 22.3 0.937 Orkut 3072441 117185083 0.014 6288363 14.2 0.977 CYC/Gavin 2006 6230 6531 0.121 408 4.7 0.793 CYC/Krogan 2006 6230 7075 0.075 408 4.7 0.733 CYC/Collins 2007 6230 14401 0.083 408 4.7 0.997 CYC/Costanzo 2010 6230 57772 0.022 408 4.7 0.996 CYC/Hoppins 2011 6230 10093 0.030 408 4.7 0.999 CYC/all 6230 80506 0.017 408 4.7 0.905 Table 1: Overview of the datasets used in the experiments. For each dataset we consider three different sets of communities. 4. Experiments To test the proposed algorithms, we assess their performance on various networks. We also perform experiments on recent state-of-the-art algorithms based on the diffusion method which also optimize conductance. 4.1 Algorithms Specifically, we perform a comparative empirical analysis of the f... |

118 | Finding local community structure in networks
- Clauset
- 2005
Citation Context ...eds. The degree of connectedness was specified by setting a so-called correlation parameter. The authors showed that the optimal solution of the resulting constrained optimization problem is a generalization of Personalized PageRank (Andersen and Lang, 2006). 1.1.2 Other Objectives Conductance is not the only objective function used in local community detection algorithms. Various other objective functions have been considered in the literature. For instance, Chen et al. (2009) proposed to use the ratio of the average internal and external degree of nodes in a community as objective function. Clauset (2005) proposed a local variant of modularity. Wu et al. (2015) modified the classical density objective, equal to the sum of edges in the community divided by its size, by replacing the denominator with the sum of weights of the community nodes, where the weight of a node quantifies its proximity to the seeds and is computed using a graph diffusion method. A comparative experimental analysis of objective functions with respect to their experimental and theoretical properties was performed e.g. in (Yang and Leskovec, 2012) and (Wu et a... |

112 | You are who you know: inferring user profiles in online social networks
- Mislove, Viswanath, et al.
Citation Context ...icance of a community, defined with respect to a global null model, by iteratively adding external significant nodes and removing internal nodes that are not statistically relevant. The resulting community is not guaranteed to contain the nodes of the initial community. 1.1.4 Properties of Seeds Properties of seeds in relation to the performance of algorithms were investigated by e.g. Kloumann and Kleinberg (2014). They considered different types of algorithms, in particular a greedy seed expansion algorithm which at each step adds the node that yields the most negative change in conductance (Mislove et al., 2010). Whang et al. (2013) investigated various methods for choosing the seeds for a PageRank based algorithm for community detection. Chen et al. (2013) introduced the notion of local degree central node, whose degree is greater than or equal to the degree of its neighbor nodes. A new local community detection method is introduced based on the local degree central node. In this method, the local community is not discovered from the given starting node, but from the local degree central node that is associated with the given starting node. 1.2 Notation We start by introducing the notation used in t... |

110 | Defining and evaluating network communities based on ground-truth
- Yang, Leskovec
- 2012
Citation Context ...raph diffusion, such as personalized Page Rank (Andersen and Lang, 2006) and Heat Kernel (Chung, 2007), are determined by the choice of the diffusion coefficients. In the diffusion method an approximation of f is computed. After dividing each vector component by the degree of the corresponding node, the nodes are sorted in descending order by their values in this vector. Next, the conductance of each prefix of the sorted list is computed and either the set of smallest conductance is selected, e.g. in (Andersen and Lang, 2006) or a local optima of conductance along the prefix length dimension (Yang and Leskovec, 2012) is considered. These algorithms optimize conductance along a single dimension, representing the order in which nodes are added by the algorithm. However this ordering is mainly related to the seed, and not directly to the objective that is being optimized. Algorithms for the direct optimization of conductance mainly operate in the discrete search space of communities, and locally optimize conductance by adding and/or removing one node. This amounts to fixing a specific neighborhood structure over communities where the neighbors of a community are only those communities which differ by the mem... |

101 | On the hardness of approximating multicut and sparsest-cut
- Chawla, Krauthgamer, et al.
- 2005
Citation Context ...large scale network analysis (Gargi et al., 2011; Leskovec et al., 2010; Wu et al., 2012). Several algorithms for local community detection operate by seed expansion. These methods have different expansion strategies, but what they have in common is their use of conductance as the objective to be optimized. Intuitively, conductance measures how strongly a set of nodes is connected to the rest of the graph; sets of nodes that are isolated from the graph have low conductance and make good communities. The problem of finding a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006). As a consequence, many heuristic and approximation algorithms for local community detection have been introduced (see references in the related work section). In particular, effective algorithms for this task are based on the local graph diffusion method. A graph diffusion vector $f$ is an infinite series $f = \sum_{i=0}^{\infty} \alpha_i P^i s$, with diffusion coefficients $\sum_{i=0}^{\infty} \alpha_i = 1$, seed nodes $s$, and random walk transition matrix $P$. Types of graph diffusion, such as personalized Page Rank (Andersen and Lang, 2006) and Heat Kernel (Chung, 2007), are determined by the choice of the d... |

79 | The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume.
- Lovász, Simonovits
- 1990

67 | Communities from seed sets
- Andersen, Lang
- 2006
Citation Context ...g a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006). As a consequence, many heuristic and approximation algorithms for local community detection have been introduced (see references in the related work section). In particular, effective algorithms for this task are based on the local graph diffusion method. A graph diffusion vector $f$ is an infinite series $f = \sum_{i=0}^{\infty} \alpha_i P^i s$, with diffusion coefficients $\sum_{i=0}^{\infty} \alpha_i = 1$, seed nodes $s$, and random walk transition matrix $P$. Types of graph diffusion, such as personalized Page Rank (Andersen and Lang, 2006) and Heat Kernel (Chung, 2007), are determined by the choice of the diffusion coefficients. In the diffusion method an approximation of $f$ is computed. After dividing each vector component by the degree of the corresponding node, the nodes are sorted in descending order by their values in this vector. Next, the conductance of each prefix of the sorted list is computed and either the set of smallest conductance is selected, e.g. in (Andersen and Lang, 2006) or a local optima of conductance along the prefix length dimension (Yang and Leskovec, 2012) is considered. These algorithms optimize conduc... |

58 | Finding statistically significant communities in networks
- Lancichinetti, Radicchi, et al.
- 2011
Citation Context ...g time for a random walk on the subgraph induced by the community. They showed that for well-connected communities, it is possible to provide an improved performance guarantee, in terms of conductance of the output, for local community detection algorithms based on the diffusion method. Gleich and Seshadhri (2012) investigated the utility of neighbors of the seed; in particular they showed empirically that such neighbors form a ‘good’ local community around the seed. Yang and Leskovec (2012) investigated properties of ground truth communities in social, information and technological networks. Lancichinetti et al. (2011) addressed the problem of finding a significant local community from an initial group of nodes. They proposed a method which locally optimizes the statistical significance of a community, defined with respect to a global null model, by iteratively adding external significant nodes and removing internal nodes that are not statistically relevant. The resulting community is not guaranteed to contain the nodes of the initial community. 1.1.4 Properties of Seeds Properties of seeds in relation to the performance of algorithms were investigated by e.g. Kloumann and Kleinberg (2014). They considered ... |

44 | Up-to-date catalogues of yeast protein complexes
- Pu, Wong, et al.
- 2009
Citation Context ...set (Wang et al., 2012). 4.2.3 Protein Interaction Network Datasets We have also run experiments on protein interaction networks of yeast from the BioGRID database (Stark et al., 2006). This database curates networks from several different studies. We have constructed networks for Gavin et al. (2006), Krogan et al. (2006), Collins et al. (2007), Costanzo et al. (2010), Hoppins et al. (2011), as well as a network that is the union of all interaction networks confirmed by physical experiments. As ground truth communities we take the CYC2008 catalog of protein complexes for each of the networks (Pu et al., 2009). 4.2.4 Other Datasets Additionally we used some classical datasets with known communities: Zachary’s karate club Zachary (1977); Football: A network of American college football games (Girvan and Newman, 2002); Political books: A network of books about US politics (Krebs, 2004); and Political blogs: Hyperlinks between weblogs on US politics (Adamic and Glance, 2005). These datasets might not be very well suited for this problem, since they have very few communities. 4.3 Results In all our experiments we use a single seed node, drawn uniformly at random from the community. We have also perform... |

34 | The heat kernel as the pagerank of a graph.
- Chung
- 2007
Citation Context ...is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006). As a consequence, many heuristic and approximation algorithms for local community detection have been introduced (see references in the related work section). In particular, effective algorithms for this task are based on the local graph diffusion method. A graph diffusion vector $f$ is an infinite series $f = \sum_{i=0}^{\infty} \alpha_i P^i s$, with diffusion coefficients $\sum_{i=0}^{\infty} \alpha_i = 1$, seed nodes $s$, and random walk transition matrix $P$. Types of graph diffusion, such as personalized Page Rank (Andersen and Lang, 2006) and Heat Kernel (Chung, 2007), are determined by the choice of the diffusion coefficients. In the diffusion method an approximation of $f$ is computed. After dividing each vector component by the degree of the corresponding node, the nodes are sorted in descending order by their values in this vector. Next, the conductance of each prefix of the sorted list is computed and either the set of smallest conductance is selected, e.g. in (Andersen and Lang, 2006) or a local optima of conductance along the prefix length dimension (Yang and Leskovec, 2012) is considered. These algorithms optimize conductance along a single dimension... |

29 | The political blogosphere and the
- Adamic, Glance
- 2004

24 | Beyond ‘caveman communities’: Hubs and spokes for graph compression and mining
- Kang, Faloutsos
- 2011
Citation Context ...on properties of communities and of seeds. 1.1.1 Conductance and Its Local Optimization Conductance has been largely used for network community detection. For instance Leskovec et al. (2008) introduced the notion of network community profile plot to measure the quality of a ‘best’ community as a function of community size in a network. They used conductance to measure the quality of a community and analyze a large number of communities of different size scales in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection employ the local graph diffusion method to find a community with small conductance. Starting from the seminal work by Spielman and Teng (2004) various algorithms for local community detection by seed expansion based on this approach have been proposed (Andersen et al., 2006; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result whic... |

22 | Vertex neighborhoods, low conductance cuts, and good seeds for local community methods
- Gleich, Seshadhri
Citation Context ...s type of communities starting from a seed connected to a large fraction of the members of the community. Zhu et al. (2013b) considered the class of well-connected communities, which have a better internal connectivity than conductance. Internal connectivity of a community is defined as the inverse of the mixing time for a random walk on the subgraph induced by the community. They showed that for well-connected communities, it is possible to provide an improved performance guarantee, in terms of conductance of the output, for local community detection algorithms based on the diffusion method. Gleich and Seshadhri (2012) investigated the utility of neighbors of the seed; in particular they showed empirically that such neighbors form a ‘good’ local community around the seed. Yang and Leskovec (2012) investigated properties of ground truth communities in social, information and technological networks. Lancichinetti et al. (2011) addressed the problem of finding a significant local community from an initial group of nodes. They proposed a method which locally optimizes the statistical significance of a community, defined with respect to a global null model, by iteratively adding external significant nodes and re... |

18 | Large-scale community detection on youtube for topic discovery and exploration.
- Gargi, Lu, et al.
- 2011
Citation Context ...t to global clustering, where the overall community structure of a network has to be found, local community detection aims to find only one community around the given seeds by relying on local computations involving only nodes relatively close to the seed. Local community detection by seed expansion is especially beneficial in large networks, and is commonly used in real-life large scale network analysis (Gargi et al., 2011; Leskovec et al., 2010; Wu et al., 2012). Several algorithms for local community detection operate by seed expansion. These methods have different expansion strategies, but what they have in common is their use of conductance as the objective to be optimized. Intuitively, conductance measures how strongly a set of nodes is connected to the rest of the graph; sets of nodes that are isolated from the graph have low conductance and make good communities. The problem of finding a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006)... |

14 | Learning with partially absorbing random walks.
- Wu, Li, et al.
- 2012
Citation Context ...ommunity structure of a network has to be found, local community detection aims to find only one community around the given seeds by relying on local computations involving only nodes relatively close to the seed. Local community detection by seed expansion is especially beneficial in large networks, and is commonly used in real-life large scale network analysis (Gargi et al., 2011; Leskovec et al., 2010; Wu et al., 2012). Several algorithms for local community detection operate by seed expansion. These methods have different expansion strategies, but what they have in common is their use of conductance as the objective to be optimized. Intuitively, conductance measures how strongly a set of nodes is connected to the rest of the graph; sets of nodes that are isolated from the graph have low conductance and make good communities. The problem of finding a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Šíma and Schaeffer, 2006). As a consequence, many heuristic and ap... |

13 | A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally.
- Mahoney, Orecchia, et al.
- 2012
(Show Context)
Citation Context ...06; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result which shows that a cut with small conductance can be found by simulating a random walk starting from a single node for sufficiently many steps (Lovasz and Simonovits, 1990). This result is used to prove that if the seed is near to a set with small conductance then the result of the procedure is a community with a related conductance, which is returned in time proportional to the volume of the community (up to a logarithmic factor). Mahoney et al. (2012) performed local community detection by modifying the spectral program used in standard global spectral clustering. Specifically the authors incorporated a bias towards a target region of seed nodes in the form of a constraint to force the solution to be well connected with or to lie near the seeds. The degree of connectedness was specified by setting a so-called correlation parameter. The authors showed that the optimal solution of the resulting constrained optimization problem is a generalization of Personalized PageRank (Andersen and Lang, 2006). 1.1.2 Other Objectives Conductance is not th... |

11 | Local community identification in social networks. - Chen, Zaıane, et al. - 2009 |

11 |
A mitochondrial-focused genetic interaction map reveals a scaffoldlike complex required for inner membrane organization in mitochondria.
- Hoppins
- 2011
(Show Context)
Citation Context ... 2 17.0 0.141
Football           115     613        0.186  12       9.6     0.402
Pol.Blogs          1490    16715      0.089  2        745.0   0.094
Pol.Books          105     441        0.151  3        35.0    0.322
Flickr             35313   3017530    0.030  171      4336.1  0.682
Amazon             334863  925872     0.079  151037   19.4    0.554
DBLP               317080  1049866    0.128  13477    53.4    0.622
Youtube            1134890 2987624    0.002  8385     13.5    0.916
LiveJournal        3997962 34681189   0.045  287512   22.3    0.937
Orkut              3072441 117185083  0.014  6288363  14.2    0.977
CYC/Gavin 2006     6230    6531       0.121  408      4.7     0.793
CYC/Krogan 2006    6230    7075       0.075  408      4.7     0.733
CYC/Collins 2007   6230    14401      0.083  408      4.7     0.997
CYC/Costanzo 2010  6230    57772      0.022  408      4.7     0.996
CYC/Hoppins 2011   6230    10093      0.030  408      4.7     0.999
CYC/all            6230    80506      0.017  408      4.7     0.905
Table 1: Overview of the datasets used in the experiments. For each dataset we consider three different sets of communities.
4. Experiments To test the proposed algorithms, we assess their performance on various networks. We also perform experiments on recent state-of-the-art algorithms based on the diffusion method which also optimize conductance. 4.1 Algorithms Specifically, we perform a comparative empirical analysis of the following algorithms. 1. PGDc. The projected gradient descent algorithm for optimizing σ-conductan... |

9 | Overlapping community detection using seed set expansion.
- Whang, Gleich, et al.
- 2013
(Show Context)
Citation Context ... used for network community detection. For instance Leskovec et al. (2008) introduced the notion of network community profile plot to measure the quality of a ‘best’ community as a function of community size in a network. They used conductance to measure the quality of a community and analyze a large number of communities of different size scales in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection employ the local graph diffusion method to find a community with small conductance. Starting from the seminal work by Spielman and Teng (2004) various algorithms for local community detection by seed expansion based on this approach have been proposed (Andersen et al., 2006; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result which shows that a cut with small conductance can be found by simulating a random walk starting from a single node... |

7 |
Think locally, act locally: Detection of small, medium-sized, and large communities in large networks.
- Jeub, Balachandran, et al.
- 2015
(Show Context)
Citation Context ....1 564.2 66.5 5.9 1058.8 942.9
CYC/Hoppins 2011  229.9  110.1  235.5  110.4  4.3  295.2   295.2
CYC/all           657.5  16.0   841.9  17.0   9.6  2795.2  5786.0
Table 3: Average size of the recovered communities.
as well. On the other hand, on networks with large communities our methods, PPR and HK work best. On the artificial LFR data continuous relaxation of conductance seems to work best. This result indicates that the LFR model of ‘what is a community’ is somehow in agreement with the notion of local community as local optimum of the continuous relaxation of conductance. However, as observed in recent works like (Jeub et al., 2015), the LFR model does not seem to represent the diverse characteristics of real-life communities. We have included tables of the standard deviation in the supplementary material. Overall, the standard deviation in cluster size is of the same order of magnitude as the mean. The standard deviation of the conductance is around 0.1 for LFR datasets, 0.2 for the SNAP datasets and 0.3 for the CYC datasets. It is not surprising that the variance is this high, because the communities vary a lot in size and density. Results on these datasets can be summarized as follows. 4.3.1 Artificial LFR Datasets On... |

5 | Heat kernel based community detection.
- Kloster, Gleich
- 2014
(Show Context)
Citation Context ...ly available artificial and real-life network data with labeled ground-truth communities to assess the performance of PGDc and EMc. Results of the two methods are very similar, with PGDc performing slightly better, while EMc is slightly faster. These results are compared with those obtained by three state-of-the-art algorithms for conductance optimization based on the local graph diffusion: the popular Personalized Page Rank (PPR) diffusion algorithm by Andersen and Lang (2006), a more recent variant by Yang and Leskovec (2012) (here called YL), and the Heat Kernel (HK) diffusion algorithm by Kloster and Gleich (2014). On large networks PGDc and EMc stay localized and produce communities which are more faithful to the ground truth than those generated by the considered graph diffusion algorithms. PPR and HK produce much larger communities with a low conductance, while the YL strategy outputs very small communities with a higher conductance. 1.1 Related Work The enormous growth of network data from diverse disciplines such as social and information science and biology has boosted research on network community detection (see for instance the overviews by Schaeffer (2007) and For... |

3 |
Detecting local community structures in complex networks based on local degree central nodes. Physica A: Statistical Mechanics and its Applications
- Chen, Wu, et al.
- 2013
(Show Context)
Citation Context ...re not statistically relevant. The resulting community is not guaranteed to contain the nodes of the initial community. 1.1.4 Properties of Seeds Properties of seeds in relation to the performance of algorithms were investigated by e.g. Kloumann and Kleinberg (2014). They considered different types of algorithms, in particular a greedy seed expansion algorithm which at each step adds the node that yields the most negative change in conductance (Mislove et al., 2010). Whang et al. (2013) investigated various methods for choosing the seeds for a PageRank based algorithm for community detection. Chen et al. (2013) introduced the notion of local degree central node, whose degree is greater than or equal to the degree of its neighbor nodes. A new local community detection method is introduced based on the local degree central node. In this method, the local community is not discovered from the given starting node, but from the local degree central node that is associated with the given starting node. 1.2 Notation We start by introducing the notation used in the rest of this paper. We denote by V the set of nodes in a network or graph G. A community, also called a cluster, C ⊆ V will be a ... |

3 |
Robust local community detection: On free rider effect and its elimination.
- Wu, Jin, et al.
- 2015
(Show Context)
Citation Context ...ng a so-called correlation parameter. The authors showed that the optimal solution of the resulting constrained optimization problem is a generalization of Personalized PageRank (Andersen and Lang, 2006). 1.1.2 Other Objectives Conductance is not the only objective function used in local community detection algorithms. Various other objective functions have been considered in the literature. For instance, Chen et al. (2009) proposed to use the ratio of the average internal and external degree of nodes in a community as objective function. Clauset (2005) proposed a local variant of modularity. Wu et al. (2015) modified the classical density objective, equal to the sum of edges in the community divided by its size, by replacing the denominator with the sum of weights of the community nodes, where the weight of a node quantifies its proximity to the seeds and is computed using a graph diffusion method. A comparative experimental analysis of objective functions with respect to their experimental and theoretical properties was performed e.g. in (Yang and Leskovec, 2012) and (Wu et al., 2015), respectively. 1.1.3 Properties of Communities ... |

2 | Community membership identification from small seed sets.
- Kloumann, Kleinberg
- 2014
(Show Context)
Citation Context ...nological networks. Lancichinetti et al. (2011) addressed the problem of finding a significant local community from an initial group of nodes. They proposed a method which locally optimizes the statistical significance of a community, defined with respect to a global null model, by iteratively adding external significant nodes and removing internal nodes that are not statistically relevant. The resulting community is not guaranteed to contain the nodes of the initial community. 1.1.4 Properties of Seeds Properties of seeds in relation to the performance of algorithms were investigated by e.g. Kloumann and Kleinberg (2014). They considered different types of algorithms, in particular a greedy seed expansion algorithm which at each step adds the node that yields the most negative change in conductance (Mislove et al., 2010). Whang et al. (2013) investigated various methods for choosing the seeds for a PageRank based algorithm for community detection. Chen et al. (2013) introduced the notion of local degree central node, whose degree is greater than or equal to the degree of its neighbor nodes. A new local community detection method is introduced based on the local degree central node. In this method, the local c... |
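The greedy seed expansion mentioned in the excerpt above can be sketched as follows. This is a minimal illustration and not the code of any cited paper: the function names, the toy graph, the use of cut / min(vol(C), vol(V \ C)) as the conductance formula, and the rule of stopping once no neighbouring node lowers conductance are all assumptions made for the example.

```python
def conductance(adj, community):
    """Edges leaving `community` divided by the smaller side's volume (assumed definition)."""
    community = set(community)
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    vol_in = sum(len(adj[u]) for u in community)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def greedy_expand(adj, seed):
    """Grow a community from `seed`, adding the node that lowers conductance most."""
    community = {seed}
    best = conductance(adj, community)
    while True:
        # Candidate nodes: neighbours of the community not yet inside it.
        frontier = {v for u in community for v in adj[u]} - community
        if not frontier:
            break
        cand = min(sorted(frontier), key=lambda v: conductance(adj, community | {v}))
        phi = conductance(adj, community | {cand})
        if phi >= best:  # assumed stopping rule: no candidate improves conductance
            break
        community.add(cand)
        best = phi
    return community, best


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community, phi = greedy_expand(adj, seed=0)
print(sorted(community), phi)
```

On this toy graph the expansion from node 0 stops at the first triangle, because adding node 3 would raise the conductance again.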

2 |
Learning with multi-resolution overlapping communities. Knowledge and Information Systems
- Wang, Tang, et al.
- 2012
(Show Context)
Citation Context ...st 3 nodes. Yang and Leskovec (2012) also defined a set of top 5000 communities for each dataset. These are communities with a high combined score for several community goodness metrics, among which is conductance. We therefore believe that communities in this set are biased to be more easy to recover by optimizing conductance, and therefore do not consider them here. Results with these top 5000 ground truth communities are available in tables 1–3 in the supplementary material 2. In addition to the SNAP datasets we also include the Flickr social network dataset (Wang et al., 2012). 4.2.3 Protein Interaction Network Datasets We have also run experiments on protein interaction networks of yeast from the BioGRID database (Stark et al., 2006). This database curates networks from several different studies. We have constructed networks for Gavin et al. (2006), Krogan et al. (2006), Collins et al. (2007), Costanzo et al. (2010), Hoppins et al. (2011), as well as a network that is the union of all interaction networks confirmed by physical experiments. As ground truth communities we take the CYC2008 catalog of protein complexes for each of the networks (Pu et al., 2009). 4.2.4... |

1 | Community detection using time-dependent personalized pagerank.
- Avron, Horesh
- 2015
(Show Context)
Citation Context ...es in real-world social and information networks. Direct conductance optimization was shown to favor communities which are quasi-cliques (Kang and Faloutsos, 2011) or communities of large size which include irrelevant subgraphs (Andersen and Lang, 2006; Whang et al., 2013). Popular algorithms for local community detection employ the local graph diffusion method to find a community with small conductance. Starting from the seminal work by Spielman and Teng (2004) various algorithms for local community detection by seed expansion based on this approach have been proposed (Andersen et al., 2006; Avron and Horesh, 2015; Chung, 2007; Kloster and Gleich, 2014; Zhu et al., 2013a). The theoretical analysis in these works is largely based on a mixing result which shows that a cut with small conductance can be found by simulating a random walk starting from a single node for sufficiently many steps (Lovasz and Simonovits, 1990). This result is used to prove that if the seed is near to a set with small conductance then the result of the procedure is a community with a related conductance, which is returned in time proportional to the volume of the community (up to a logarithmic factor). Mahoney et al. (2012) perf... |

1 |
New political patterns.
- Krebs
- 2004
(Show Context)
Citation Context ...avin et al. (2006), Krogan et al. (2006), Collins et al. (2007), Costanzo et al. (2010), Hoppins et al. (2011), as well as a network that is the union of all interaction networks confirmed by physical experiments. As ground truth communities we take the CYC2008 catalog of protein complexes for each of the networks (Pu et al., 2009). 4.2.4 Other Datasets Additionally we used some classical datasets with known communities: Zachary’s karate club Zachary (1977); Football: A network of American college football games (Girvan and Newman, 2002); Political books: A network of books about US politics (Krebs, 2004); and Political blogs: Hyperlinks between weblogs on US politics (Adamic and Glance, 2005). These datasets might not be very well suited for this problem, since they have very few communities. 4.3 Results In all our experiments we use a single seed node, drawn uniformly at random from the community. We have also performed experiments with multiple seeds; the results of those experiments can be found in the supplementary material. To keep the computation time manageable we have performed all experiments on a random sample of 1000 ground-truth communities. For datasets with fewer than 1000 commu... |

1 | Finding strongly knit clusters in social networks. - Mishra, Schreiber, Stanton, Tarjan - 2008 |

1 |
On the hardness of approximating multicut and sparsest-cut.
- Chawla, Krauthgamer, et al.
- 2005
(Show Context)
Citation Context ...large scale network analysis (Gargi et al., 2011; Leskovec et al., 2010; Wu et al., 2012). Several algorithms for local community detection operate by seed expansion. These methods have different expansion strategies, but what they have in common is their use of conductance as the objective to be optimized. Intuitively, conductance measures how strongly a set of nodes is connected to the rest of the graph; sets of nodes that are isolated from the graph have low conductance and make good communities. The problem of finding a set of minimum conductance in a graph is computationally intractable (Chawla et al., 2005; Sıma and Schaeffer, 2006). As a consequence, many heuristic and approximation algorithms for local community detection have been introduced (see references in the related work section). In particular, effective algorithms for this task are based on the local graph diffusion method. A graph diffusion vector f is an infinite series $f = \sum_{i=0}^{\infty} \alpha_i P^i s$, with diffusion coefficients $\sum_{i=0}^{\infty} \alpha_i = 1$, seed nodes s, and random walk transition matrix P. Types of graph diffusion, such as personalized Page Rank (Andersen and Lang, 2006) and Heat Kernel (Chung, 2007), are determined by the choice of the d... |
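The diffusion series in the excerpt above, a weighted sum of random-walk steps started at the seed, can be illustrated with a short sketch. Everything named here is an assumption for illustration only: the function `diffusion`, the toy graph, truncating the infinite series after 20 terms, and the geometric coefficients (1 - a) * a**i, which are one common personalized-PageRank-style choice and are not taken from the excerpt.

```python
def diffusion(adj, seed, coeffs):
    """Truncated diffusion f = sum_i coeffs[i] * P^i s for a single seed node.

    P is the random walk transition matrix: each node splits its mass
    evenly among its neighbours. `adj` maps a node to its neighbour list.
    """
    walk = {seed: 1.0}  # P^0 s: all probability mass sits on the seed
    f = {}
    for alpha in coeffs:
        # Accumulate alpha_i * P^i s into the diffusion vector.
        for u, mass in walk.items():
            f[u] = f.get(u, 0.0) + alpha * mass
        # Advance the walk one step: walk <- P walk.
        nxt = {}
        for u, mass in walk.items():
            share = mass / len(adj[u])
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        walk = nxt
    return f


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

a = 0.5  # assumed decay parameter for the geometric coefficients
coeffs = [(1 - a) * a**i for i in range(20)]  # truncation of a series summing to 1
f = diffusion(adj, seed=0, coeffs=coeffs)

# Nodes ranked by diffusion mass; the seed's own triangle should rank first.
ranking = sorted(f, key=f.get, reverse=True)
print(ranking)
```

Because the walk conserves probability mass on a graph without isolated nodes, the entries of `f` sum to the sum of the coefficients used, which approaches 1 as more terms are kept.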

1 | You are who you know: Inferring user profiles in online social networks. - Mislove, Viswanath, et al. - 2010 |