## Wedge sampling for computing clustering coefficients and triangle counts on large graphs (2014)

Venue: Statistical Analysis and Data Mining

Citations: 3 - 0 self

### Citations

3701 | Social capital in the creation of human capital
- Coleman
- 1988
Citation Context ...th those similar to themselves) and transitivity (friends of friends become friends). The triangle structure of a graph is commonly used in the social sciences for positing various theses on behavior [11, 23, 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], c...

3606 | Social Network Analysis: Methods and Applications
- Wasserman, Faust
- 1994
Citation Context ... with graphs as well. The following are classic quantities of interest, and are defined on undirected graphs. (Formal definitions are given in Tab. 1.) • Transitivity or global clustering coefficient [34]: This is κ = 3T/W, and is the fraction of wedges that are closed (i.e., participate in a triangle). Intuitively, it measures how often friends of friends are also friends, and it is the most common ...
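
The definition quoted above (κ = 3T/W, the fraction of wedges that are closed) can be checked by brute force on a small graph. The following is a minimal sketch, not the paper's implementation; the adjacency-dictionary layout, the function name `wedge_counts`, and the toy graph are assumptions made for illustration.

```python
from itertools import combinations

def wedge_counts(adj):
    """Return (closed_wedges, total_wedges) for an undirected graph.

    adj maps each vertex to the set of its neighbours (hypothetical layout).
    A wedge is a path a - v - b centred at v; it is closed if a and b are adjacent.
    """
    wedges = 0
    closed = 0
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            wedges += 1
            closed += b in adj[a]  # bool adds as 0 or 1
    return closed, wedges

# Toy graph: path 1 - 2 - 3 plus the triangle 3 - 4 - 5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
closed, wedges = wedge_counts(adj)
triangles = closed // 3      # each triangle contributes three closed wedges
kappa = closed / wedges      # equals 3T / W
```

On this toy graph there are six wedges, three of which are closed by the single triangle, so the transitivity is one half. Exhaustive enumeration like this is only practical for small graphs, which is the motivation for sampling wedges instead.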

3313 | Collective dynamics of ‘small-world’ networks
- Watts, Strogatz
- 1998
Citation Context ...es that are closed (i.e., participate in a triangle). Intuitively, it measures how often friends of friends are also friends, and it is the most common triadic measure. • Local clustering coefficient [35]: The clustering coefficient of vertex v (denoted by $C_v$) is the fraction of wedges centered at v that are closed. The average of $C_v$ over all vertices v is the local clustering coefficient, denoted by ...
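
As a rough illustration of the per-vertex definition quoted above, the sketch below computes $C_v$ for every vertex and then averages. The function name and graph layout are hypothetical, and setting $C_v = 0$ for vertices with fewer than two neighbours is an assumption of this sketch, since the excerpt does not say how such vertices are handled.

```python
from itertools import combinations

def local_clustering(adj):
    """Map each vertex v to the fraction of wedges centred at v that are closed."""
    coeffs = {}
    for v, nbrs in adj.items():
        pairs = list(combinations(sorted(nbrs), 2))
        closed = sum(1 for a, b in pairs if b in adj[a])
        # Assumption: a vertex with no wedges gets coefficient 0.
        coeffs[v] = closed / len(pairs) if pairs else 0.0
    return coeffs

# Toy graph: path 1 - 2 - 3 plus the triangle 3 - 4 - 5.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
coeffs = local_clustering(adj)
average = sum(coeffs.values()) / len(coeffs)
```

Vertex 3 centres three wedges and only one is closed, while vertices 4 and 5 each centre a single closed wedge, so the average differs from the global transitivity of the same graph.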

2207 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963
Citation Context ... $w_2, \ldots, w_k$, with associated random variables $X_1, X_2, \ldots, X_k$. Define $\bar{X} = \frac{1}{k} \sum_{i \le k} X_i$ as our estimate. The Chernoff-Hoeffding bounds give guarantees on $\bar{X}$, as follows. Theorem 2.1. (Hoeffding [15]) Let $X_1, X_2, \ldots, X_k$ be independent random variables with $0 \le X_i \le 1$ for all $i = 1, \ldots, k$. Define $\bar{X} = \frac{1}{k} \sum_{i=1}^{k} X_i$. Let $\mu = E[\bar{X}]$. Then for $\varepsilon \in (0, 1)$, we have $\Pr\{|\bar{X} - \mu| \ge \varepsilon\} \le 2\exp(-2k\varepsilon^2)$...
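
Solving $2\exp(-2k\varepsilon^2) \le \delta$ for $k$ gives $k \ge \ln(2/\delta) / (2\varepsilon^2)$, which turns the bound quoted above into a sample-size rule. The sketch below pairs that rule with one plausible way to draw uniform random wedges (pick a centre with probability proportional to its number of wedges, then pick two distinct neighbours). The failure probability `delta`, the function names, and the graph layout are assumptions for illustration and are not taken from the excerpt.

```python
import math
import random

def hoeffding_sample_size(eps, delta):
    """Smallest k with 2 * exp(-2 * k * eps**2) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

def sample_transitivity(adj, k, rng):
    """Estimate the fraction of closed wedges from k uniform random wedges.

    adj maps each vertex to the set of its neighbours (hypothetical layout).
    The graph must contain at least one wedge.
    """
    centres = [v for v in adj if len(adj[v]) >= 2]
    # A vertex of degree d centres d * (d - 1) / 2 wedges.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centres]
    closed = 0
    for v in rng.choices(centres, weights=weights, k=k):
        a, b = rng.sample(sorted(adj[v]), 2)
        closed += b in adj[a]
    return closed / k

# Toy graph whose exact transitivity is 0.5 (3 closed wedges out of 6).
adj = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
k = hoeffding_sample_size(eps=0.05, delta=1e-6)
estimate = sample_transitivity(adj, k, random.Random(0))
```

The sample size depends only on `eps` and `delta`, not on the size of the graph, which is what makes this style of estimator attractive for large inputs.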

1058 | Social capital: Its origins and applications in modern sociology
- Portes
- 1998
Citation Context ...th those similar to themselves) and transitivity (friends of friends become friends). The triangle structure of a graph is commonly used in the social sciences for positing various theses on behavior [11, 23, 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], c...

694 | Measurement and analysis of online social networks
- Mislove, Marcon, et al.
- 2007
Citation Context ... [Figure 3 caption: ...scale) for transitivity computation with increasing numbers of wedge samples.] ... an 8GB memory. We performed our experiments on 13 graphs from SNAP [37] and per private communication with the authors of [21]. In all cases, directionality is ignored, and repeated and self-edges are omitted. The properties of these matrices are presented in Tab. 2. The last column reports the times for the enumeration algo...

395 | Structural holes and good ideas
- Burt
- 2004
Citation Context ...th those similar to themselves) and transitivity (friends of friends become friends). The triangle structure of a graph is commonly used in the social sciences for positing various theses on behavior [11, 23, 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], c...

148 | Reductions in streaming algorithms, with an application to counting triangles in graphs.
- Bar-Yossef, Kumar, et al.
- 2002
Citation Context ...r ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative sampling mechanisms have been proposed for streaming and semi-streaming algorithms [3, 17, 4, 6]. Yet, all these fast sampling methods only estimate the number of triangles and give no information about other triadic measures. In subsequent work by the authors of this paper, a Hadoop implementat...

113 | Arboricity and subgraph listing algorithms
- Chiba, Nishizeki
- 1985
Citation Context ...lustering coefficients, $\{C_d\}$ for $d = 1, \ldots, d_{\max}$. The only other competing method that can compute $\{C_d\}$ is an exhaustive enumeration. We compare with the basic fast enumeration algorithm given by [8, 25] (which has been studied and reinvented by [10, 29]). • Computing triangles per degree: Wedge sampling can also be employed to sample random triangles, including the application of estimating the numb...

76 | Curvature of co-links uncovers hidden thematic layers in the World Wide Web
- Eckmann, Moses
- 2002
Citation Context ... in the social sciences for positing various theses on behavior [11, 23, 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], can capture triangle behavior in real-world graphs, but necessarily requires the degree-wise clustering coefficients as input. Re...

72 | Counting triangles and the curse of the last reducer.
- Suri, Vassilvitskii
- 2011
Citation Context ...ax. The only other competing method that can compute $\{C_d\}$ is an exhaustive enumeration. We compare with the basic fast enumeration algorithm given by [8, 25] (which has been studied and reinvented by [10, 29]). • Computing triangles per degree: Wedge sampling can also be employed to sample random triangles, including the application of estimating the number of triangles containing one (or more) vertices o...

69 | Efficient semi-streaming algorithms for local triangle counting in massive graphs
- Becchetti, Boldi, et al.
Citation Context ... in the social sciences for positing various theses on behavior [11, 23, 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], can capture triangle behavior in real-world graphs, but necessarily requires the degree-wise clustering coefficients as input. Re...

67 | Counting triangles in data streams.
- Buriol, Frahling, et al.
- 2006
Citation Context ...r ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative sampling mechanisms have been proposed for streaming and semi-streaming algorithms [3, 17, 4, 6]. Yet, all these fast sampling methods only estimate the number of triangles and give no information about other triadic measures. In subsequent work by the authors of this paper, a Hadoop implementat...

53 | New streaming algorithms for counting triangles in graphs.
- Jowhari, Ghodsi
- 2005
Citation Context ...r ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative sampling mechanisms have been proposed for streaming and semi-streaming algorithms [3, 17, 4, 6]. Yet, all these fast sampling methods only estimate the number of triangles and give no information about other triadic measures. In subsequent work by the authors of this paper, a Hadoop implementat...

52 | Doulion: Counting triangles in massive graphs with a coin
- Tsourakakis, Kang, et al.
Citation Context ...of magnitude slower than full enumeration. Most relevant to our work are sampling mechanisms. Tsourakakis et al. [30] started the use of sparsification methods, the most important of which is Doulion [33]. This method sparsifies the graph by keeping each edge with probability p; counts the triangles in the sparsified graph; and multiplies this count by $p^{-3}$ to predict the number of triangles in the ori...
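
The three steps described in the excerpt above (keep each edge with probability p, count triangles in the sparsified graph, scale by $p^{-3}$) can be sketched as follows. This is an illustrative reading of that description, not the Doulion authors' code; the function names and the edge-list layout (each undirected edge listed once, no self-loops) are assumptions.

```python
import random
from itertools import combinations

def count_triangles(edges):
    """Exact triangle count for a list of distinct undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def sparsified_estimate(edges, p, rng):
    """Keep each edge independently with probability p, count, rescale by p**-3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3

# Complete graph on 6 vertices: C(6, 3) = 20 triangles.
edges = list(combinations(range(6), 2))
exact = count_triangles(edges)
full = sparsified_estimate(edges, 1.0, random.Random(0))   # p = 1 keeps every edge
rough = sparsified_estimate(edges, 0.5, random.Random(0))  # noisy on a graph this small
```

A triangle survives only if all three of its edges are kept, which happens with probability $p^3$; dividing by $p^3$ therefore makes the estimate unbiased, although on small graphs a single run can be far from the true count.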

50 | Approximating clustering coefficient and transitivity
- Schank, Wagner
Citation Context ...s (to obtain the degree distribution). We discovered older independent work by Schank and Wagner that proposes the same wedge sampling idea for estimating the global and local clustering coefficients [24]. Our work extends that in several directions, including several other uses for wedge sampling (such as directed triangle counting, random triangle sampling, degree-wise clustering coefficients) and...

44 | Main-memory triangle computations for very large (sparse (power-law)) graphs.
- Latapy
- 2008
Citation Context ...nce again, the versatility of wedge sampling is used to develop a method for counting all types of directed triangles. 1.3 Related Work There has been significant work on enumeration of all triangles [8, 25, 20, 5, 9]. Recent work by Cohen [10] and Suri and Vassilvitskii [29] give MapReduce implementations of these algorithms. Arifuzzaman et al. [1] give a massively parallel algorithm for computing clustering coef...

29 | Community structure and scale-free collections of Erdős-Rényi graphs
- Seshadhri, Kolda, et al.
- 2012
Citation Context ... 7, 14]. Triangles have also been used in graph mining applications such as spam detection and finding common topics on the WWW [13, 4]. A new generative model, Blocked Two-Level Erdős-Rényi (BTER) [26], can capture triangle behavior in real-world graphs, but necessarily requires the degree-wise clustering coefficients as input. Relationships among degrees of triangle vertices can also be used as a ...

28 | Triangle listing in massive networks and its applications.
- Chu
- 2011
Citation Context ...nce again, the versatility of wedge sampling is used to develop a method for counting all types of directed triangles. 1.3 Related Work There has been significant work on enumeration of all triangles [8, 25, 20, 5, 9]. Recent work by Cohen [10] and Suri and Vassilvitskii [29] give MapReduce implementations of these algorithms. Arifuzzaman et al. [1] give a massively parallel algorithm for computing clustering coef...

28 | Colorful triangle counting and a MapReduce implementation
- Pagh, Tsourakakis
Citation Context ...n the sparsified graph; and multiplies this count by $p^{-3}$ to predict the number of triangles in the original graph. Various theoretical analyses of this algorithm (and its variants) have been proposed [19, 31, 22]. One of the main benefits of Doulion is that it reduces large graphs to smaller ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative samp...

21 | Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
- Kolountzakis, Miller, et al.
Citation Context ...n the sparsified graph; and multiplies this count by $p^{-3}$ to predict the number of triangles in the original graph. Various theoretical analyses of this algorithm (and its variants) have been proposed [19, 31, 22]. One of the main benefits of Doulion is that it reduces large graphs to smaller ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative samp...

20 | Counting triangles in large graphs using randomized matrix trace estimation.
- Avron
- 2010
Citation Context ...ce graphs even of moderate size (millions of vertices) can have an extremely large number of triangles (see, e.g., Tab. 2). Eigenvalue/trace based methods have been used by Tsourakakis [32] and Avron [2] to compute estimates of the total and per-degree number of triangles. However, computing eigenvalues (even just a few of them) is a compute-intensive task and quickly becomes intractable on large gra...

18 | Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws
- Tsourakakis
- 2008
Citation Context ... expensive, since graphs even of moderate size (millions of vertices) can have an extremely large number of triangles (see, e.g., Tab. 2). Eigenvalue/trace based methods have been used by Tsourakakis [32] and Avron [2] to compute estimates of the total and per-degree number of triangles. However, computing eigenvalues (even just a few of them) is a compute-intensive task and quickly becomes intractabl...

17 | A space efficient streaming algorithm for triangle counting using the birthday paradox
- Jha, Seshadhri, et al.
- 2013
Citation Context ...r triadic measures. In subsequent work by the authors of this paper, a Hadoop implementation of these techniques is given in [18], and a streaming version of the wedge sampling method is presented in [16]. Table 1: Graph notation and clustering coefficients for undirected graphs: $n$ = number of vertices; $n_d$ = number of vertices of degree $d$; $m$ = number of edges; $d_v$ = degree of vertex $v$; $V_d$ = set of degree-$d$ vertices...

17 | Triangle sparsifiers
- Tsourakakis, Kolountzakis, et al.
Citation Context ...n the sparsified graph; and multiplies this count by $p^{-3}$ to predict the number of triangles in the original graph. Various theoretical analyses of this algorithm (and its variants) have been proposed [19, 31, 22]. One of the main benefits of Doulion is that it reduces large graphs to smaller ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative samp...

16 | PATRIC: A parallel algorithm for counting triangles in massive networks
- Arifuzzaman, Khan, et al.
- 2013
Citation Context ...en significant work on enumeration of all triangles [8, 25, 20, 5, 9]. Recent work by Cohen [10] and Suri and Vassilvitskii [29] give MapReduce implementations of these algorithms. Arifuzzaman et al. [1] give a massively parallel algorithm for computing clustering coefficients. Enumeration algorithms however, can be very expensive, since graphs even of moderate size (millions of vertices) can have an...

15 | Triadic measures on graphs: The power of wedge sampling
- Seshadhri, Pinar, et al.
- 2013
Citation Context ...3; in Fig. 1, 3 – 4 – 5 is a triangle. We say a wedge is closed if its vertices form a triangle. Observe that each triangle consists of three closed wedges. ∗This manuscript is an extended version of [28]. This work was funded by the DARPA GRAPHS program and by the DOE ASCR Complex Distributed Interconnected Systems (CDIS) program, and Sandia’s Laboratory Directed Research & Development (LDRD) program...

12 | Counting triangles in massive graphs with MapReduce
- Kolda, Pinar, et al.
Citation Context ...ds only estimate the number of triangles and give no information about other triadic measures. In subsequent work by the authors of this paper, a Hadoop implementation of these techniques is given in [18], and a streaming version of the wedge sampling method is presented in [16]. Table 1: Graph notation and clustering coefficients for undirected graphs: $n$ = number of vertices; $n_d$ = number of vertices of d...

8 | Graph twiddling in a MapReduce world. Comput Sci Eng. 2009;11(4):29–41. DOI:10.1109/MCSE.2009.120
- Cohen
- 2009
Citation Context ...ax. The only other competing method that can compute $\{C_d\}$ is an exhaustive enumeration. We compare with the basic fast enumeration algorithm given by [8, 25] (which has been studied and reinvented by [10, 29]). • Computing triangles per degree: Wedge sampling can also be employed to sample random triangles, including the application of estimating the number of triangles containing one (or more) vertices o...

8 | Is a friend a friend?: Investigating the structure of friendship networks in virtual worlds
- Welles, Devender, et al.

8 | Spectral counting of triangles in power-law networks via element-wise sparsification
- Tsourakakis, Drineas, et al.
- 2009
Citation Context ...e graphs. In our experiment, even computing the largest eigenvalue was multiple orders of magnitude slower than full enumeration. Most relevant to our work are sampling mechanisms. Tsourakakis et al. [30] started the use of sparsification methods, the most important of which is Doulion [33]. This method sparsifies the graph by keeping each edge with probability p; counts the triangles in the sparsifie...

6 | Degree relations of triangles in real-world networks and graph models
- Durak, Pinar, et al.
- 2012
Citation Context ...real-world graphs, but necessarily requires the degree-wise clustering coefficients as input. Relationships among degrees of triangle vertices can also be used as a descriptor of the underlying graph [12]. In this paper, we study the idea of wedge sampling, i.e., choosing random wedges (from a uniform distribution over all wedges) to compute various triadic measures on large-scale graphs. We provide p...

5 | Listing triangles in expected linear time on power law graphs with exponent at least 7/3
- Berry, Fosvedt, et al.
- 2011
Citation Context ...nce again, the versatility of wedge sampling is used to develop a method for counting all types of directed triangles. 1.3 Related Work There has been significant work on enumeration of all triangles [8, 25, 20, 5, 9]. Recent work by Cohen [10] and Suri and Vassilvitskii [29] give MapReduce implementations of these algorithms. Arifuzzaman et al. [1] give a massively parallel algorithm for computing clustering coef...

5 | Stanford Network Analysis Project (SNAP), available at http://snap.stanford.edu
Citation Context ... [Figure 3: Speed-up over enumeration (in log-scale) for transitivity computation with increasing numbers of wedge samples.] ... an 8GB memory. We performed our experiments on 13 graphs from SNAP [37] and per private communication with the authors of [21]. In all cases, directionality is ignored, and repeated and self-edges are omitted. The properties of these matrices are presented in Tab. 2. The...

4 | Finding, counting and listing all triangles in large graphs, an experimental study
- Schank, Wagner
- 2005
Citation Context ...lustering coefficients, $\{C_d\}$ for $d = 1, \ldots, d_{\max}$. The only other competing method that can compute $\{C_d\}$ is an exhaustive enumeration. We compare with the basic fast enumeration algorithm given by [8, 25] (which has been studied and reinvented by [10, 29]). • Computing triangles per degree: Wedge sampling can also be employed to sample random triangles, including the application of estimating the numb...

3 | The importance of directed triangles with reciprocity: patterns and algorithms
- Seshadhri, Pinar, et al.
Citation Context ...t finds all triangles. For degree-wise clustering coefficients, no other method was previously known. Triangles and transitivity in directed graphs have also been the subject of recent work (see e.g. [27] and references therein). In a directed graph, edges are ordered pairs of vertices of the form (i, j), indicating a link from node i to node j. When edges (i, j) and (j, i) exist, we say there is a re...

3 | Improved sampling for triangle counting with mapreduce
- Yoon, Kim
- 2011
Citation Context ... proposed [19, 31, 22]. One of the main benefits of Doulion is that it reduces large graphs to smaller ones that can be loaded into memory. However, the Doulion estimate can suffer from high variance [36]. Alternative sampling mechanisms have been proposed for streaming and semi-streaming algorithms [3, 17, 4, 6]. Yet, all these fast sampling methods only estimate the number of triangles and give no i...