## Statistical properties of community structure in large social and information networks

### Download Links

- [www.cs.cmu.edu]
- [cs.stanford.edu]
- [cs-www.cs.yale.edu]
- [www-2.cs.cmu.edu]
- [www2008.org]
- [wwwconference.org]

### Other Repositories/Bibliography

- DBLP

Citations: 244 (14 self)

### Citations

3932 | Emergence of scaling in random networks
- Barabási, Albert
- 1999
Citation Context ...ions, at even a qualitative level. In Figure 7, we summarize these results. Figure 7(a) shows the NCP plot for a 10,000 node network generated according to the original preferential attachment model [1], where at each time step a node joins the graph and connects to m = 2 existing nodes. Note that the NCP plot is very shallow and flat (even more than the corresponding rewired graph), and thus the ne...

3787 | Normalized cuts and image segmentation.
- Shi, Malik
- 2000
Citation Context ...s (1)–(4), we will follow the usual path in this paper. For point (3), we choose a natural and widely-adopted notion of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximat...

3323 | Collective dynamics of 'small-world' networks
- Watts, Strogatz
- 1998
Citation Context ...networks that “live” in a low-dimensional structure, e.g., on a manifold or the surface of the earth. For example, Figure 2(b) shows the NCP plot for a power grid network of Western States Power Grid [34], and Figure 2(c) shows the NCP plot for a road network of California. Finally, in contrast, Figure 2(d) shows NCP plots for a $G_{nm}$ graph with 100,000 nodes and average degrees of 4, 6, and 8, i.e., ...

2386 | Random graphs
- Bollobás
- 1985
Citation Context ...graphs 41 here. (The other interesting special case, in which all the expected degrees $w_i$ are equal to $np$, for some $p \in [0, 1]$, corresponds to the classical Gilbert-Erdős-Rényi $G_{np}$ random graph model [24].) Given the number of nodes $n$, the power-law exponent $\beta$, and the parameters $w$ and $w_{\max}$, Chung and Lu [41] give the degree sequence for a power-law graph: $w_i = c\,i^{-1/(\beta-1)}$ for $i$ s.t. $i_0 \le i < n + i_0$, ...

2143 | Statistical mechanics of complex networks
- Albert, Barabási
- 2002
Citation Context ...munity identification [125, 52], data clustering [90], graph and spectral clustering [75, 151, 143], graph and heavy-tailed data analysis [126, 29, 49], surveys on various aspects of complex networks [10, 55, 124, 25, 51, 114, 23], the monographs on spectral graph theory and complex networks [33, 41], and the book on social network analysis [152]. See Section 7 for a more detailed discussion of the relationship of our work wit...

1666 | On power-law relationships of the Internet topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ... edges are added via a preferential-attachment or rich-gets-richer mechanism [124, 25]. Much of this work aims at reproducing properties of real-world graphs such as heavy-tailed degree distributions [11, 27, 61]. In these preferential attachment models, one typically connects each new node to the existing network by adding exactly m edges to existing nodes with a nonuniform probability that depends on the cu...

1537 | Spectral Graph Theory
- Chung
- 1997
Citation Context ...s (1)–(4), we will follow the usual path in this paper. For point (3), we choose a natural and widely-adopted notion of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximat...

1511 | Community structure in social and biological networks
- Girvan, Newman
- 2002
Citation Context ...works. 2.2 Clusters and communities in networks Hierarchical clustering is a common approach to community identification in the social sciences [152], but it has also found application more generally [79, 89]. In this procedure, one first defines a distance metric between pairs of nodes and then produces a tree (in either a bottom-up or a top-down manner) describing how nodes group into communities and ho...

1189 | A fast and high quality multilevel scheme for partitioning irregular graphs
- Karypis, Kumar
Citation Context ...on of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximately optimize it. However, it is in point (5) that we deviate from previous work. Instead of focusing on individual groups of no...

985 | Modularity and Community Structure in Networks
- Newman
- 2006
Citation Context ... Empirically we observe that local minima in the NCP plot correspond to sets of nodes that are plausible communities. Consider, e.g., Zachary’s karate club [35], an extensively analyzed social network [24, 26]. Figure 3(a) depicts the karate club network, and Figure 3(b) shows its NCP plot. Note that Cut B, which separates the graph roughly in half, has better conductance value than Cut A (note also commun...

691 | Finding community structure in very large networks
- Clauset, Newman, et al.
- 2004
Citation Context ... in which nodes are authors and edges connect authors co-authoring at least one paper. Here, publication venues (e.g., journals, conferences) play the role of “ground truth” communities. • AmazonProd [8] is a network linking products often purchased together at amazon.com. Each item belongs to one or more hierarchically organized categories, and products from the same category define a group which is...

672 | A new approach to the maximum-flow problem
- Goldberg, Tarjan
- 1988
Citation Context ...blically available code that scales to the sizes we need. Ordinary max flow is a very thoroughly studied problem. Currently, the best theoretical time bounds are [81], the most practical algorithm is [82], while the best implementation is hi_pr by [32]. Since Metis+MQI using the hi_pr code is very fast and scalable, while the method empirically seems to usually find the lowest or nearly lowest conduct...

665 | Algebraic connectivity of graphs
- Fiedler
- 1973
Citation Context ...ere is the spectral method, which uses an eigenvector of the graph’s Laplacian matrix to find a cut whose conductance is no bigger than φ if the graph actually contains a cut with conductance O(φ²) [31, 54, 65, 120, 33]. The spectral method also produces lower bounds which can show that the solution for a given graph is closer to optimal than promised by the worst-case guarantee. Second, there is an algorithm that u...

537 | Graphs over time: densification laws, shrinking diameters and possible explanations
- Leskovec, Kleinberg, et al.
- 2005
Citation Context ... best cuts at large size scales are very shallow, and there is a relatively abrupt transition in between. This is a consequence of the extreme sparsity of the data. • A “forest fire” generative model [21], in which edges are added in a manner that imitates a fire-spreading process, reproduces not only the deep cuts at small size scales and the absence of deep cuts at large size scales but other proper...

524 | A linear-time heuristic for improving network partitions.
- Fiduccia, Mattheyses
- 1982
Citation Context ...further subdivide the new groups until the desired number of clusters groups is achieved. This may be combined with local improvement methods like the Kernighan-Lin and Fiduccia-Mattheyses procedures [96, 64], which are fast and can climb out of some local minima. The latter was combined with a multi-resolution framework to create Metis [94, 95], a very fast program intended to split mesh-like graphs into...

498 | Finding community structure in networks using the eigenvectors of matrices
- Newman
- 2006
Citation Context ...ly linked among themselves and there are few edges between nodes of different communities. In a similar manner, Figure 3(c) depicts Newman’s network of 379 scientists who conduct research on networks [25]. In this latter case, we see a hierarchical structure, in which the community defined by Cut C is included in a larger community that has better conductance value. 3.3 Community profile plots of larg...

496 | Group formation in large social networks: Membership, growth, and evolution.
- Backstrom, Huttenlocher, et al.
- 2006
Citation Context ...30,507,070 0.47 0.88 8.78 351.66 0.23 23 5.43 Social network of professional contacts LiveJournal01 3,766,521 30,629,297 0.78 0.97 16.26 111.24 0.36 23 5.55 Friendship network of a blogging community [20] LiveJournal11 4,145,160 34,469,135 0.77 0.97 16.63 122.44 0.36 23 5.61 Friendship network of a blogging community [20] LiveJournal12 4,843,953 42,845,684 0.76 0.97 17.69 170.66 0.35 20 5.53 Friendshi...

433 | Complex networks: structure and dynamics
- Boccaletti, Latora, et al.
- 2006
Citation Context ...munity identification [125, 52], data clustering [90], graph and spectral clustering [75, 151, 143], graph and heavy-tailed data analysis [126, 29, 49], surveys on various aspects of complex networks [10, 55, 124, 25, 51, 114, 23], the monographs on spectral graph theory and complex networks [33, 41], and the book on social network analysis [152]. See Section 7 for a more detailed discussion of the relationship of our work wit...

417 | Evolution of networks
- Dorogovtsev, Mendes
- 2002
Citation Context ...munity identification [125, 52], data clustering [90], graph and spectral clustering [75, 151, 143], graph and heavy-tailed data analysis [126, 29, 49], surveys on various aspects of complex networks [10, 55, 124, 25, 51, 114, 23], the monographs on spectral graph theory and complex networks [33, 41], and the book on social network analysis [152]. See Section 7 for a more detailed discussion of the relationship of our work wit...

415 | Inferring Web Communities from Link Topology
- Gibson, Kleinberg, et al.
- 1998
Citation Context ...there exists work which views communities from a very different perspective. For example, Kumar et al. [102] view communities as a dense bipartite subgraph of the Web; Gibson, Kleinberg, and Raghavan [78] view communities as consisting of a core of central authoritative pages linked together by hub pages; Hopcroft et al. [88, 89] are interested in the temporal evolution of communities that are robust ...

406 | A random graph model for massive graphs
- Aiello, Chung, et al.
- 2000
Citation Context ...is model is different than the so-called “configuration model” in which the degree distribution is exactly specified and which was studied by Molloy and Reed [121, 122] and also Aiello, Chung, and Lu [7, 8]. This model is also different than generative models such as preferential attachment models [9, 124, 25] or models based on optimization [56, 57, 60], although common to all of these generative model...

377 | An information flow model for conflict and fission in small groups
- Zachary
- 1977
Citation Context ...are sparsely embedded in larger communities. Empirically we observe that local minima in the NCP plot correspond to sets of nodes that are plausible communities. Consider, e.g., Zachary’s karate club [35], an extensively analyzed social network [24, 26]. Figure 3(a) depicts the karate club network, and Figure 3(b) shows its NCP plot. Note that Cut B, which separates the graph roughly in half, has bette...

360 | Mapping the gnutella network: Properties of large-scale peer-to-peer systems and implications for system design
- Ripeanu, Foster, et al.
Citation Context ...-papers) networks Atp-DBLP 615,678 944,456 DBLP [21] AtM-Imdb 2,076,978 5,847,693 Actors-to-movies • Internet networks AsSkitter 1,719,037 12,814,089 Autonom. sys. Gnutella 62,561 147,878 P2P network [29] Table 1: Some of the network datasets we studied. 2. BACKGROUND AND OVERVIEW In this section, we will provide background on our data and methods. There exist a large number of reviews on topics relat...

359 | Expander graphs and their applications.
- Hoory, Linial, et al.
- 2006
Citation Context ...The NCP plot is roughly flat, which we also observed in Figure 2(a) for a clique, which is to be expected since the minimum conductance cut in the entire graph cannot be too small for a good expander [15]. Interestingly, a steadily decreasing downward NCP plot is also seen for small social networks that have been extensively studied for validating community detection algorithms. Two examples are shown...

356 | Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms
- Leighton, Rao
- 1999
Citation Context ...on of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximately optimize it. However, it is in point (5) that we deviate from previous work. Instead of focusing on individual groups of no...

350 | Graph theoretical methods for detecting and describing gestalt clusters.
- Zahn
- 1971
Citation Context ...to the outside. Although numerous measures have been proposed for how community-like is a set of nodes, it is commonly noted (e.g., see [31] and [16]) that conductance captures the “gestalt” notion of clustering [36], and so it has been widely-used for graph clustering and community detection [13, 30]. 3. NETWORK COMMUNITY PROFILE PLOT In this section, we discuss the network community profile plot (NCP plot), whi...

335 | A lower bound for the smallest eigenvalue of the Laplacian
- Cheeger
- 1970
Citation Context ...ere is the spectral method, which uses an eigenvector of the graph’s Laplacian matrix to find a cut whose conductance is no bigger than φ if the graph actually contains a cut with conductance O(φ²) [31, 54, 65, 120, 33]. The spectral method also produces lower bounds which can show that the solution for a given graph is closer to optimal than promised by the worst-case guarantee. Second, there is an algorithm that u...

332 | On Clusterings: Good, Bad and Spectral.
- Kannan, Vempala, et al.
- 2004
Citation Context ...s (1)–(4), we will follow the usual path in this paper. For point (3), we choose a natural and widely-adopted notion of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximat...

312 | Expander flows, geometric embeddings and graph partitioning.
- Arora, Rao, et al.
- 2009
Citation Context ...inatorial quantity; and it has a very natural interpretation in terms of random walkers on the interaction graph. Moreover, since there exists a rich suite of both theoretical and practical algorithms [86, 146, 106, 107, 17, 94, 95, 159, 53], we can for point (4) compare and contrast several methods to approximately optimize it. However, it is in point (5) that we deviate from previous work. Instead of focusing on individual groups of no...

309 | Resolution limit in community detection
- Fortunato, Barthélemy
- 2007
Citation Context ...ecent community detection literature [125, 52], and one can use spectral techniques to approximate it [154, 128]. On the other hand, Guimerà, Sales-Pardo, and Amaral [84] and Fortunato and Barthélemy [72] showed that random graphs have high-modularity subsets and that there exists a size scale below which communities cannot be identified. In part as a response to this, some recent work has had a more ...

304 | Graph structure in the web
- Broder, Kumar, et al.
- 2000
Citation Context ... edges are added via a preferential-attachment or rich-gets-richer mechanism [124, 25]. Much of this work aims at reproducing properties of real-world graphs such as heavy-tailed degree distributions [11, 27, 61]. In these preferential attachment models, one typically connects each new node to the existing network by adding exactly m edges to existing nodes with a nonuniform probability that depends on the cu...

293 | Efficient identification of Web communities.
- Flake, Giles
Citation Context ...nce of the communities and the relative weight of inter-community edges. Flake, Tarjan, and Tsioutsiouliklis [68] introduce a similar bicriterion that is based on network flow ideas, and Flake et al. [66, 67] defined a community as a set of nodes that has more intra-edges than inter-edges. Similar edge-counting ideas were used by Radicchi et al. [133] to define and apply the notions of a strong community ...

291 | Comparing community structure identification
- Danon, Diaz-Guilera, et al.
Citation Context ... we will provide background on our data and methods. There exist a large number of reviews on topics related to those discussed in this paper. For example, see the reviews on community identification [24, 9], graph and spectral clustering [13, 30], and the monographs on spectral graph theory and complex networks [6, 7]. 2.1 Network datasets We have examined a large number of real-world complex networks. ...

289 | Stochastic models for the web graph.
- Kumar, Raghavan, et al.
- 2000
Citation Context ...than the corresponding rewired graph), and thus the network that is generated is very expander-like at all size scales. In a different type of generative model edges are added via a copying mechanism [18]. Figure 7(b) shows the results for a network with 50,000 nodes, generated with m = 2 and β = 0.05. Although the copying model aims to produce communities by linking a new node to neighbors of an existing ...

286 | The average distance in random graphs with given expected degrees
- Chung, Lu
Citation Context ...s a baseline for understanding the community properties we have observed in our real-world networks. We will work with the random graph model with given expected degrees, as described by Chung and Lu [41, 39, 43, 38, 40, 44, 45, 42]. Let $n$, the number of nodes in the graph, and a vector $w = (w_1, \ldots, w_n)$, which will be the expected degree sequence vector (where we will assume that $\max_i w_i^2 < \sum_k w_k$), be given. Then, in this...

271 | Trust management for the semantic web
- Richardson, Agrawal, et al.
- 2003
Citation Context ...k: Social Networks & Web 2.0 - Discovery and Evolution of Communities • Social nets Nodes Edges Description LiveJournal 4,843,953 42,845,684 Blog friendships [5] Epinions 75,877 405,739 Trust network [28] CA-DBLP 317,080 1,049,866 Co-authorship [5] • Information (citation) networks Cit-hep-th 27,400 352,021 Arxiv hep-th [14] AmazonProd 524,371 1,491,793 Amazon products [8] • Web graphs Web-google 855,...

266 | Graph evolution: Densification and shrinking diameters
- Leskovec, Kleinberg, et al.
- 2007
Citation Context ...groups increases exponentially with the distance in the community hierarchy. Graphs generated by this principle have both power-law degree distributions and they also obey the Densification Power Law [111, 112]. As Figure 16(d) shows, though, the NCP plot is sloping downward. Qualitatively this plot from CGA is very similar to the plot of the recursive hierarchical construction in Figure 16(c), which is not...

245 | R-MAT: A recursive model for graph mining.
- Chakrabarti, Zhan, et al.
- 2004
Citation Context ...veloping hierarchical graph generation models, i.e., models in which a hierarchy is given and the linkage probability between pairs of nodes decreases as a function of their distance in the hierarchy [136, 135, 30, 6, 109, 47, 156, 110]. The motivation for this comes largely from the intuition that nodes in social networks are joined into small relatively tight groups that are then further joined into larger groups, and so on. As...

222 | Diameter of the world wide web
- Albert, Jeong, et al.
- 1999
Citation Context ...le 855,802 4,291,352 0.75 0.92 10.03 170.35 0.62 24 6.27 Web graph Google released in 2002 [3] Web-notredame 325,729 1,090,108 0.41 0.76 6.69 280.68 0.47 46 7.22 Web graph of University of Notre Dame [11] Web-wt10g 1,458,316 6,225,033 0.59 0.78 8.54 682.89 0.68 112 8.58 Web graph of TREC WT10G web corpus [2] Internet networks As-RouteViews 6,474 12,572 0.62 0.80 3.88 164.81 0.40 9 3.72 AS from Oregon ...

215 | Graph Clustering
- Schaeffer
- 2007
Citation Context ... and methods. There exist a large number of reviews on topics related to those discussed in this paper. For example, see the reviews on community identification [24, 9], graph and spectral clustering [13, 30], and the monographs on spectral graph theory and complex networks [6, 7]. 2.1 Network datasets We have examined a large number of real-world complex networks. Table 1 gives a subset of the networks t...

213 | Hierarchical organization in complex networks
- Ravasz, Barabasi
- 2003
Citation Context ...es are all treated equally and since new nodes always create the same number of edges. Next, in Figure 7(c), we consider a network that was designed to have a recursively hierarchical community structure [27]. In this case, however, the NCP plot is sloping downwards, and the local dips in the plot correspond to multiples of the size of the basic module of the graph. Finally, Figure 7(d) shows the NCP plot...

211 | Self-organization and identification of web communities.
- Flake, Lawrence, et al.
- 2002
Citation Context ...nce of the communities and the relative weight of inter-community edges. Flake, Tarjan, and Tsioutsiouliklis [68] introduce a similar bicriterion that is based on network flow ideas, and Flake et al. [66, 67] defined a community as a set of nodes that has more intra-edges than inter-edges. Similar edge-counting ideas were used by Radicchi et al. [133] to define and apply the notions of a strong community ...

209 | On implementing push-relabel method for the maximum flow problem
- Cherkassky, Goldberg
- 1994
Citation Context ... we need. Ordinary max flow is a very thoroughly studied problem. Currently, the best theoretical time bounds are [81], the most practical algorithm is [82], while the best implementation is hi_pr by [32]. Since Metis+MQI using the hi_pr code is very fast and scalable, while the method empirically seems to usually find the lowest or nearly lowest conductance cuts in a wide variety of graphs, we have u...

206 | Characterization of complex networks: A survey of measurements
- Costa, Rodrigues, et al.
- 2007

205 | Connected components in a random graph with given degree sequences
- Chung, Lu
Citation Context ...s a baseline for understanding the community properties we have observed in our real-world networks. We will work with the random graph model with given expected degrees, as described by Chung and Lu [41, 39, 43, 38, 40, 44, 45, 42]. Let $n$, the number of nodes in the graph, and a vector $w = (w_1, \ldots, w_n)$, which will be the expected degree sequence vector (where we will assume that $\max_i w_i^2 < \sum_k w_k$), be given. Then, in this...

201 | Local Graph Partitioning using Pagerank Vectors.
- Andersen, Chung, et al.
- 2006
Citation Context ...ackage Metis [17] followed by the flow-based post-processing procedure MQI [19], which taken together returns sets that have very good conductance values; and second, the Local Spectral Algorithm [3], which returns sets that are somewhat “regularized” (more internally “coherent”) but that often have worse conductance values. Just as the conductance of a set of nodes provides a quality measure of ...

201 | Spectral partitioning works: planar graphs and finite element meshes.
- Spielman, Teng
- 1996
Citation Context ...on of community goodness called conductance, also known as the normalized cut metric [6, 31, 16]. Since there exists a rich suite of both theoretical and practical algorithms to optimize this quantity [32, 20, 4, 17, 37, 10], we can for point (4) compare and contrast several methods to approximately optimize it. However, it is in point (5) that we deviate from previous work. Instead of focusing on individual groups of no...

190 | Spectral graph theory, volume 92
- Chung
- 1997
Citation Context ...king such an hypothesis and modeling assumption. For point (3), we choose a natural and widely-adopted notion of community goodness called conductance, which is also known as the normalized cut metric [33, 144, 92]. Informally, the conductance of a set of nodes (defined and discussed in more detail in Section 2.3) is the ratio of the number of “cut” edges between that set and its complement divided by...

180 | The spectra of random graphs with given expected degrees.
- Chung, Lu, et al.
- 2003
Citation Context ...s a baseline for understanding the community properties we have observed in our real-world networks. We will work with the random graph model with given expected degrees, as described by Chung and Lu [41, 39, 43, 38, 40, 44, 45, 42]. Let $n$, the number of nodes in the graph, and a vector $w = (w_1, \ldots, w_n)$, which will be the expected degree sequence vector (where we will assume that $\max_i w_i^2 < \sum_k w_k$), be given. Then, in this...

178 | Heuristically Optimized Tradeoffs: A New Paradigm for Power Laws in the Internet
- Fabrikant, Koutsoupias, et al.
- 2002
Citation Context ...lloy and Reed [121, 122] and also Aiello, Chung, and Lu [7, 8]. This model is also different than generative models such as preferential attachment models [9, 124, 25] or models based on optimization [56, 57, 60], although common to all of these generative models is that they attempt to reproduce empirically-observed power-law behavior [11, 61, 27, 126, 49].) In this random graph model, the expected average d...

174 | Weighted graph cuts without eigenvectors: a multilevel approach
- Dhillon, Guan, et al.
- 2007

163 | A fast parametric maximum flow algorithm and applications.
- Gallo, Grigoriadis, et al.
- 1989
Citation Context ...el k-means to optimize a metric that is closely related to conductance. While the preceding were all approximate algorithms for finding the lowest conductance cut in a whole graph, we now mention MQI [76, 105], an exact algorithm for the slightly different problem of finding the lowest conductance cut in half of a graph. This algorithm can be combined with a good method for initially splitting the graph in...

159 | A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization
- Burer, Monteiro
Citation Context ...are, it is a good idea to compute some lower bounds. Here we will discuss the well-known spectral lower bound [33] on the conductance of cuts of arbitrary balance, and a related SDP-based lower bound [28] on the conductance of any cut that divides the graph into two pieces of equal volume. First, we need the following notation. $\vec{d}$ is a column vector of the graph’s node degrees. $D$ is a square matrix wh...

156 | Complexity and robustness
- Carlson, Doyle
- 2002
Citation Context ...lloy and Reed [121, 122] and also Aiello, Chung, and Lu [7, 8]. This model is also different than generative models such as preferential attachment models [9, 124, 25] or models based on optimization [56, 57, 60], although common to all of these generative models is that they attempt to reproduce empirically-observed power-law behavior [11, 61, 27, 126, 49].) In this random graph model, the expected average d...

150 | Beyond the flow decomposition barrier.
- Goldberg, Rao
- 1998
Citation Context ... in [18], but currently there is no publicly available code that scales to the sizes we need. Ordinary max flow is a very thoroughly studied problem. Currently, the best theoretical time bounds are [81], the most practical algorithm is [82], while the best implementation is hi_pr by [32]. Since Metis+MQI using the hi_pr code is very fast and scalable, while the method empirically seems to usually fi...

145 | Mathematical results on scale-free random graphs.
- Bollobas, Riordan
- 2003
Citation Context ...real-world social and information networks. Finally, we also compare results with analytical and/or simulational results on a wide range of commonly and not-so-commonly used network generation models [124, 25, 9, 101, 135, 111, 70, 71]. 1.2 Summary of our results Main Empirical Findings: Taken as a whole, the results we will present in this paper suggest a rather detailed and somewhat counterintuitive picture of the community struc...

132 | Graph mining: Laws, generators, and algorithms.
- Chakrabarti, Faloutsos
- 2006
Citation Context ...e discussed in this paper. For example, see the reviews on community identification [125, 52], data clustering [90], graph and spectral clustering [75, 151, 143], graph and heavy-tailed data analysis [126, 29, 49], surveys on various aspects of complex networks [10, 55, 124, 25, 51, 114, 23], the monographs on spectral graph theory and complex networks [33, 41], and the book on social network analysis [152]. S...

122 | Finding local community structure in networks, Physical Review E
- Clauset
- 2005
Citation Context ...unities. In light of our results, such methods seem promising more generally. Other recent work that has focused on developing local and/or near-linear time heuristics for community detection includes [48, 155, 46, 21, 134]. In addition to this work we have cited, there exists work which views communities from a very different perspective. For example, Kumar et al. [102] view communities as a dense bipartite subgraph of...

116 | Empirical and theoretical comparisons of selected criterion functions for document clustering
- Zhao, Karypis

106 | A random graph model for power law graphs
- Aiello, Chung, et al.
- 2001
Citation Context ...is model is different than the so-called “configuration model” in which the degree distribution is exactly specified and which was studied by Molloy and Reed [121, 122] and also Aiello, Chung, and Lu [7, 8]. This model is also different than generative models such as preferential attachment models [9, 124, 25] or models based on optimization [56, 57, 60], although common to all of these generative model...

99 | Modularity from fluctuations in random graphs and complex networks, Phys
- Guimerà, Sales-Pardo, et al.
- 2004
Citation Context ...ave been several works hinting that the network communities subject is more complex than it seems at first sight. For example, it has been found that random graphs can have good modularity scores [84]. Intuitively, random graphs have no community structure but there can still exist sets of nodes with good community scores, at least as measured by modularity. Moreover, very recently a study of robu...

94 | A combinatorial, primal-dual approach to semidefinite programs.
- Arora, Kale
- 2007
(Show Context)
Citation Context ...at uses semidefinite programming to find a solution that is within O(√(log n)) of optimal [17]. This paper sparked a flurry of theoretical research on a family of closely related algorithms including =-=[15, 98, 16]-=-, all of which can be informally described as combinations of spectral and flow-based techniques which exploit their complementary strengths. However, none of those algorithms are currently practical ... |

91 | A simple conceptual model for the internet topology
- Tauro, Palmer, et al.
- 2001
(Show Context)
Citation Context ...t. (b) Network structure as suggested by our experiments. We have also examined in detail the structure of our social and information networks. We have observed that a “jellyfish” or “octopus” model =-=[33, 7]-=- provides a rough first approximation to the structure of many of the networks we have examined. That is, most networks may be viewed as having a “core,” with no obvious underlying geometry and which cont... |

73 | Communities from seed sets. - Andersen, Lang - 2006 |

71 | Graph clustering and minimum cut trees.
- Flake, Tarjan, et al.
- 2004
(Show Context)
Citation Context ...rithms and describe a community concept in terms of a bicriterion depending on the conductance of the communities and the relative weight of inter-community edges. Flake, Tarjan, and Tsioutsiouliklis =-=[68]-=- introduce a similar bicriterion that is based on network flow ideas, and Flake et al. [66, 67] defined a community as a set of nodes that has more intra-edges than inter-edges. Similar edge-counting ... |

69 | Conductance and congestion in power law graphs.
- Gkantsidis, Mihail, et al.
- 2003
(Show Context)
Citation Context ...hat has focused on the expansion properties of power law graphs and the real-world networks they model. For example, Mihail, Papadimitriou, and Saberi [118], as well as Gkantsidis, Mihail, and Saberi =-=[80]-=-, studied Internet routing at the level of Autonomous Systems (AS), and showed that the preferential attachment model and a random graph model with power law degree distributions each have good expans... |

64 | Eigenvalues of random power law graphs
- Chung, Lu, et al.
- 2003 |

62 | Spectral techniques applied to sparse random graphs
- Feige, Ofek
- 2005
(Show Context)
Citation Context ...ed [24]. (If p < 1/n, then a typical graph is disconnected and there does not exist a giant component, while if p > log n/n, then a typical graph is fully connected.) As noted, e.g., by Feige and Ofek =-=[62]-=-, this latter regime is particularly difficult to analyze since with fairly high probability there exist vertices with degrees that are much larger than their expected degree. As reviewed in Section 6... |

60 | A Geometric Preferential Attachment Model of Networks
- Flaxman, Frieze, et al.
(Show Context)
Citation Context ...g downwards, and the local dips in the plot correspond to multiples of the size of the basic module of the graph. Finally, Figure 7(d) shows the NCP plot for a geometric preferential attachment model =-=[12]-=-. This model aims to achieve a heavy-tailed degree distribution as well as deep cuts, and it does so by making the connection probabilities depend both on the two-dimensional geometry and on the prefer... |

58 | On the quality of spectral separators
- Guattery, Miller
- 1998
(Show Context)
Citation Context ...s are known in practice to yield cuts with extremely good conductance values [103, 105]. On the other hand, spectral methods are known to have difficulties when they confuse long paths with deep cuts =-=[146, 83]-=-, a consequence of which is that they may be viewed as computing a “regularized” approximation to the network community profile plot. (See Section 5 for a more detailed discussion of these and related... |

57 |
Grooming, Gossip, and the Evolution of Language
- Dunbar
- 1996
(Show Context)
Citation Context ...” the graph. Eventually, even the existence of communities (at least when viewed as sets with stronger internal than external connectivity) is rather questionable. This seems to agree with Dunbar =-=[11]-=- who predicted that 150 is the upper limit on the size of a human community. Moreover, Allen [2] gives evidence that on-line communities have around 60 members, and on-line discussion forums start to ... |

52 | The diameter of sparse random graphs
- Chung, Lu
- 2001
(Show Context)
Citation Context ...ith degrees that are much larger than their expected degree. As reviewed in Section 6.2, however, this regime is not unlike that in a power law random graph in which the power law exponent β ∈ (2, 3) =-=[37, 115, 41]-=-. Of particular interest to us are recent results on the mixing time of random walks in this p ∈ (1/n, log n/n) regime of the Gnp (and the related Gnm) random graph model. Benjamini, Kozma, and Wormald... |

46 |
Algorithms for partitioning of graphs and computer logic based on eigenvectors of connection matrices
- Donath, Hoffman
- 1972
(Show Context)
Citation Context ...ere is the spectral method, which uses an eigenvector of the graph’s Laplacian matrix to find a cut whose conductance is no bigger than φ if the graph actually contains a cut with conductance O(φ²) =-=[31, 54, 65, 120, 33]-=-. The spectral method also produces lower bounds which can show that the solution for a given graph is closer to optimal than promised by the worst-case guarantee. Second, there is an algorithm that u... |

45 | Power-law distributions in empirical data. arXiv:0706.1062v1 [physics.data-an] - Clauset, Shalizi, et al. - 2007 |

39 | A local method for detecting communities
- Bagrow, Bollt, et al.
- 2005
(Show Context)
Citation Context ...unities. In light of our results, such methods seem promising more generally. Other recent work that has focused on developing local and/or near-linear time heuristics for community detection include =-=[48, 155, 46, 21, 134]-=-. In addition to this work we have cited, there exists work which views communities from a very different perspective. For example, Kumar et al. [102] view communities as a dense bipartite subgraph of... |

36 |
Spectral scaling and good expansion properties in complex networks
- Estrada
- 2006
(Show Context)
Citation Context ...k we have studied. On the other hand, Estrada has made the observation that although certain communication, information, and biological networks have good expansion properties, social networks do not =-=[59]-=-. This is interpreted as evidence that such social networks have good small highly-cohesive groups, a property which is not attributed to the biological networks that were considered. From the perspec... |

35 |
Influence through social communication.
- Back
- 1951
(Show Context)
Citation Context ...re based on common identity of its members, e.g., liking to play a particular online game, contributing to Wikipedia, etc. It has been noted that bond communities tend to be smaller and more cohesive =-=[19]-=-, as they are based on interpersonal ties, while identity communities are focused around a common theme or interest. See [138] for a very good review of the topic. Translating this to our context, the b... |

34 | The heat kernel as the pagerank of a graph.
- Chung
- 2007
(Show Context)
Citation Context ... [147, 13], and they have roughly the same kind of quadratic approximation guarantees as the global spectral method, but their computational cost is proportional to the size of the obtained piece =-=[34, 36, 35]-=-. 3 The Network Community Profile Plot In this section, we discuss the network community profile plot (NCP plot), which measures the quality of network communities at different size scales. We start i... |

31 | The volume of the giant component of a random graph with given expected degrees. - Chung, Lu - 2006 |

30 |
Complex graphs and networks, volume 107
- Chung, Lu
- 2006
(Show Context)
Citation Context ...t. (b) Network structure as suggested by our experiments. We have also examined in detail the structure of our social and information networks. We have observed that a “jellyfish” or “octopus” model =-=[33, 7]-=- provides a rough first approximation to the structure of many of the networks we have examined. That is, most networks may be viewed as having a “core,” with no obvious underlying geometry and which cont... |

27 |
Power Laws, Highly Optimized Tolerance and Generalized Source Coding. Phys.
- Carlson, Doyle
- 2000
(Show Context)
Citation Context ...lloy and Reed [121, 122] and also Aiello, Chung, and Lu [7, 8]. This model is also different than generative models such as preferential attachment models [9, 124, 25] or models based on optimization =-=[56, 57, 60]-=-, although common to all of these generative models is that they attempt to reproduce empirically-observed power-law behavior [11, 61, 27, 126, 49].) In this random graph model, the expected average d... |

26 |
A flow-based method for improving the expansion or conductance of graph cuts
- Lang, Rao
- 2004
(Show Context)
Citation Context ...pute different approximations to the NCP plot. We employ two procedures: first, Metis+MQI, i.e., the graph partitioning package Metis [17] followed by the flow-based post-processing procedure MQI =-=[19]-=-, which taken together return sets that have very good conductance values; and second, the Local Spectral Algorithm [3], which returns sets that are somewhat “regularized” (more internally “coherent”... |

25 | Engineering graph clustering: Models and experimental evaluation
- Brandes, Gaertler, et al.
- 2007
(Show Context)
Citation Context ...these group further into super-communities. A quite different approach that has received a great deal of attention (and that will be central to our analysis) is based on ideas from graph partitioning =-=[143, 26]-=-. In this case, the network is modeled as a simple undirected graph, where nodes and edges have no attributes, and a partition of the graph is determined by optimizing a merit function. The graph part... |

24 |
Community detection as an inference problem
- Hastings
- 2006
(Show Context)
Citation Context ... graphs have high-modularity subsets and that there exists a size scale below which communities cannot be identified. In part as a response to this, some recent work has had a more statistical flavor =-=[85, 137, 141, 93, 130]-=-. In light of our results, this work seems promising, both due to potential “overfitting” issues arising from the extreme sparsity of the networks, and also due to the empirically-promising regulariza... |

20 | High degree vertices and eigenvalues in the preferential attachment graph
- Flaxman, Frieze, et al.
(Show Context)
Citation Context ...xpanders [87], which we also empirically observe. Eigenvalues of power law graphs have also been studied by Mihail and Papadimitriou [117], Chung, Lu, Vu [43, 44, 45], and Flaxman, Frieze, and Fenner =-=[69]-=-. 8 Conclusion We investigated statistical properties of sets of nodes in large real-world social and information networks that could plausibly be interpreted as good communities, a... |

19 | Characterization and modeling of protein-protein interaction networks
- Colizza, Flammini, et al.
(Show Context)
Citation Context ...am 199,308 951,649 0.39 0.87 9.55 430.74 0.00 7 3.83 Users-to-URLs they visited [123] Biological networks Bio-Proteins 4,626 14,801 0.72 0.91 6.40 24.25 0.12 12 4.24 Yeast protein interaction network =-=[50]-=- Bio-Yeast 1,458 1,948 0.37 0.51 2.67 7.13 0.14 19 6.89 Yeast protein interaction network data [91] Bio-YeastP0.001 353 1,517 0.73 0.93 8.59 20.18 0.57 11 4.33 Yeast protein-protein interaction map [1... |

18 |
O(√(log n)) approximation to sparsest cut
- Arora, Hazan, et al.
- 2004
(Show Context)
Citation Context ...at uses semidefinite programming to find a solution that is within O(√(log n)) of optimal [17]. This paper sparked a flurry of theoretical research on a family of closely related algorithms including =-=[15, 98, 16]-=-, all of which can be informally described as combinations of spectral and flow-based techniques which exploit their complementary strengths. However, none of those algorithms are currently practical ... |

15 |
Four proofs for the Cheeger inequality and graph partition algorithms
- Chung
- 2010
(Show Context)
Citation Context ... [147, 13], and they have roughly the same kind of quadratic approximation guarantees as the global spectral method, but their computational cost is proportional to the size of the obtained piece =-=[34, 36, 35]-=-. 3 The Network Community Profile Plot In this section, we discuss the network community profile plot (NCP plot), which measures the quality of network communities at different size scales. We start i... |

13 |
The diameter of sparse random graphs. Random Structures & Algorithms
- Fernholz, Ramachandran
(Show Context)
Citation Context ...onstructions in their proofs is complicated, but they have a similar flavor to the core-and-whiskers structure we have empirically observed. Similar results were observed by Fernholz and Ramachandran =-=[63]-=-, whose analysis separately considered the 2-core of these graphs and then the residual pieces. They show that a typical longest shortest path between two vertices u and v consists of a path of length... |

11 |
Random walks and local cuts in graphs. Linear Algebra and its Applications
- Chung
(Show Context)
Citation Context ... [147, 13], and they have roughly the same kind of quadratic approximation guarantees as the global spectral method, but their computational cost is proportional to the size of the obtained piece =-=[34, 36, 35]-=-. 3 The Network Community Profile Plot In this section, we discuss the network community profile plot (NCP plot), which measures the quality of network communities at different size scales. We start i... |

10 |
Hierarchical Graph Maps
- Abello
(Show Context)
Citation Context ...veloping hierarchical graph generation models, i.e., models in which a hierarchy is given and the linkage probability between pairs of nodes decreases as a function of their distance in the hierarchy =-=[136, 135, 30, 6, 109, 47, 156, 110]-=-. The motivation for this comes largely from the intuition that nodes in social networks are joined into small, relatively tight groups that are then further joined into larger groups, and so on. As... |

9 |
The mixing time of the giant component of a random graph. Available at http://www.arxiv.org/abs/math.PR/0610459
- Benjamini, Kozma, et al.
- 2006
(Show Context)
Citation Context ... Of particular interest to us are recent results on the mixing time of random walks in this p ∈ (1/n, log n/n) regime of the Gnp (and the related Gnm) random graph model. Benjamini, Kozma, and Wormald =-=[22]-=- and Fountoulakis and Reed [74, 73] have established rapid mixing results by proving structural results about these very sparse graphs. In particular, they proved that these graphs may be viewed as a ... |

7 |
Life with alacrity: The Dunbar number as a limit to group sizes. Retrieved April 19, 2010 from http://www.lifewithalacrity.com/2004/03/the_dunbar_numb.html
- Allen
- 2004
(Show Context)
Citation Context ... stronger internal than external connectivity) is rather questionable. This seems to agree with Dunbar [11] who predicted that 150 is the upper limit on the size of a human community. Moreover, Allen =-=[2]-=- gives evidence that on-line communities have around 60 members, and on-line discussion forums start to break down at about 80 active contributors. Church congregations, military companies, divisions ... |

7 | Experimental evaluation of parametric max-flow algorithms
- Babenko, D, et al.
(Show Context)
Citation Context ...ic max flow problems, or sequences of ordinary max flow problems. Parametric max flow (with MQI described as one of the applications) was introduced by [76], and recent empirical work is described in =-=[18]-=-, but currently there is no publicly available code that scales to the sizes we need. Ordinary max flow is a very thoroughly studied problem. Currently, the best theoretical time bounds are [81], th... |

5 |
Faster mixing and small bottlenecks
- Fountoulakis, Reed
(Show Context)
Citation Context ...re recent results on the mixing time of random walks in this p ∈ (1/n, log n/n) regime of the Gnp (and the related Gnm) random graph model. Benjamini, Kozma, and Wormald [22] and Fountoulakis and Reed =-=[74, 73]-=- have established rapid mixing results by proving structural results about these very sparse graphs. In particular, they proved that these graphs may be viewed as a “core” expander subgraph, whose del... |

4 |
Expander flows, geometric embeddings and graph partitioning
- Arora, Rao, et al.
- 2004 |

4 |
Structural Inference of Hierarchies in Networks. arXiv:physics/0610051
- Clauset, Moore, et al.
- 2006
(Show Context)
Citation Context ...veloping hierarchical graph generation models, i.e., models in which a hierarchy is given and the linkage probability between pairs of nodes decreases as a function of their distance in the hierarchy =-=[136, 135, 30, 6, 109, 47, 156, 110]-=-. The motivation for this comes largely from the intuition that nodes in social networks are joined into small, relatively tight groups that are then further joined into larger groups, and so on. As... |

3 |
Group formation in large social networks: membership, growth, and evolution
- Backstrom, Huttenlocher, et al.
(Show Context)
Citation Context ...profile plot. 696WWW 2008 / Refereed Track: Social Networks & Web 2.0 - Discovery and Evolution of Communities • Social nets Nodes Edges Description LiveJournal 4,843,953 42,845,684 Blog friendships =-=[5]-=- Epinions 75,877 405,739 Trust network [28] CA-DBLP 317,080 1,049,866 Co-authorship [5] • Information (citation) networks Cit-hep-th 27,400 352,021 Arxiv hep-th [14] AmazonProd 524,371 1,491,793 Amazo... |

3 |
Network data. http://www-personal.umich.edu/~mejn/netdata
- Newman
- 2007
(Show Context)
Citation Context ...nal networks; IMDB networks; and Amazon networks. We have also examined numerous small social networks that have been used as a testbed for community detection algorithms (e.g., Zachary’s karate club =-=[157, 5]-=-, interactions between dolphins [116, 5], interactions between monks [142, 5], Newman’s network science network [127, 5], etc.), numerous simple network models in which by design there is an underlyin... |

2 |
The evolution of the mixing rate. arXiv:math/0701474
- Fountoulakis, Reed
- 2007
(Show Context)
Citation Context ...re recent results on the mixing time of random walks in this p ∈ (1/n, log n/n) regime of the Gnp (and the related Gnm) random graph model. Benjamini, Kozma, and Wormald [22] and Fountoulakis and Reed =-=[74, 73]-=- have established rapid mixing results by proving structural results about these very sparse graphs. In particular, they proved that these graphs may be viewed as a “core” expander subgraph, whose del... |