Results 1–10 of 10
The input/output complexity of triangle enumeration
In PODS'14, 2014
Listing triangles
In Automata, Languages, and Programming – 41st International Colloquium, ICALP 2014
Cited by 9 (0 self)
Abstract. We present new algorithms for listing triangles in dense and sparse graphs. The running time of our algorithm for dense graphs is Õ(n^ω + n^{3(ω−1)/(5−ω)} t^{2(3−ω)/(5−ω)}), and the running time of the algorithm for sparse graphs is Õ(m^{2ω/(ω+1)} + m^{3(ω−1)/(ω+1)} t^{(3−ω)/(ω+1)}), where n is the number of vertices, m is the number of edges, t is the number of triangles to be listed, and ω < 2.373 is the exponent of fast matrix multiplication. With the current bound on ω, the running times of our algorithms are Õ(n^{2.373} + n^{1.568} t^{0.478}) and Õ(m^{1.408} + m^{1.222} t^{0.186}), respectively. We first obtain randomized algorithms with the desired running times and then derandomize them using sparse recovery techniques. If ω = 2, the running times of the algorithms become Õ(n^2 + n t^{2/3}) and Õ(m^{4/3} + m t^{1/3}), respectively. In particular, if ω = 2, our algorithm lists m triangles in Õ(m^{4/3}) time. Pătraşcu (STOC 2010) showed that Ω(m^{4/3−o(1)}) time is required for listing m triangles, unless there exist subquadratic algorithms for 3SUM. We show that unless one can solve quadratic equation systems over a finite field significantly faster than the brute-force algorithm, our triangle listing runtime bounds are tight assuming ω = 2, also for graphs with more triangles.
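For orientation, the baseline that bounds like these are measured against is the classic combinatorial O(m^{3/2}) lister: orient each edge from its lower-degree endpoint to its higher-degree endpoint, then intersect out-neighborhoods. This minimal sketch is that standard technique, not the matrix-multiplication-based algorithm of the paper above; the function name and edge-list input format are illustrative.

```python
def list_triangles(edges):
    """List all triangles of an undirected graph given as an edge list.

    Classic degree-ordering approach: orient every edge toward the
    higher-degree endpoint (ties broken by vertex id), then for each
    oriented edge (u, v) intersect the out-neighborhoods of u and v.
    Each triangle is reported exactly once.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # total order on vertices: by degree, then by id, so ranks are unique
    rank = {u: (len(nbrs), u) for u, nbrs in adj.items()}
    out = {u: {v for v in adj[u] if rank[v] > rank[u]} for u in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:
                triangles.append((u, v, w))
    return triangles
```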
Multicore Triangle Computations Without Tuning
"... Abstract—Triangle counting and enumeration has emerged as a basic tool in largescale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributedmemory setting or the externalmemory setting, and ca ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Triangle counting and enumeration has emerged as a basic tool in large-scale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributed-memory setting or the external-memory setting, and cannot take full advantage of a multicore machine, whose capacity has grown to accommodate even the largest of real-world graphs. This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges. Our algorithms are provably cache-friendly, easy to implement in a language that supports dynamic parallelism, such as Cilk Plus or OpenMP, and do not require parameter tuning. On a 40-core machine with two-way hyper-threading, our parallel exact global and local triangle counting algorithms obtain speedups of 17–50x on a set of real-world and synthetic graphs, and are faster than previous parallel exact triangle counting algorithms. We can compute the exact triangle count of the Yahoo Web graph (over 6 billion edges) in under 1.5 minutes. In addition, for approximate triangle counting, we are able to approximate the count for the Yahoo graph to within 99.6% accuracy in under 10 seconds, and for a given accuracy we are much faster than existing parallel approximate triangle counting implementations.
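To illustrate the approximate-counting idea in the simplest possible form, here is a DOULION-style edge-sparsification sketch (Tsourakakis et al.): keep each edge independently with probability p, count exactly on the sample, and scale by 1/p³. This is a generic sketch of the sampling principle, not the paper's algorithm, and the function names are illustrative.

```python
import random

def count_triangles(edges):
    """Exact triangle count via neighbor-set intersection.
    Each triangle is seen once per edge, i.e. three times in total."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def approx_triangle_count(edges, p, seed=0):
    """Sparsify: keep each edge with probability p, count exactly on the
    sample, scale by 1/p^3 (each triangle survives with probability p^3,
    so the estimator is unbiased)."""
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]
    return count_triangles(sample) / p ** 3
```

With p = 1 the estimator degenerates to the exact count; smaller p trades accuracy for speed.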
Influential Community Search in Large Networks
"... Community search is a problem of finding densely connected subgraphs that satisfy the query conditions in a network, which has attracted much attention in recent years. However, all the previous studies on community search do not consider the influence of a community. In this paper, we introduce a ..."
Abstract
 Add to MetaCart
(Show Context)
Community search is the problem of finding densely connected subgraphs that satisfy query conditions in a network, and it has attracted much attention in recent years. However, previous studies on community search do not consider the influence of a community. In this paper, we introduce a novel community model called k-influential community, based on the concept of k-core, which can capture the influence of a community. Based on the new community model, we propose a linear-time online search algorithm to find the top-r k-influential communities in a network. To further speed up the influential community search algorithm, we devise a linear-space index structure which supports efficient search of the top-r k-influential communities in optimal time. We also propose an efficient algorithm to maintain the index when the network is frequently updated. We conduct extensive experiments on 7 real-world large networks, and the results demonstrate the efficiency and effectiveness of the proposed methods.
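As a rough illustration of the k-core concept this community model builds on: a vertex's core number k means it belongs to a maximal subgraph in which every vertex has degree ≥ k, computed by repeatedly peeling minimum-degree vertices. The sketch below is a heap-based variant of the standard peeling algorithm; the function name and adjacency-dict input format are assumptions, not the paper's interface.

```python
import heapq

def core_numbers(adj):
    """Core number of every vertex via min-degree peeling.

    `adj` maps each vertex to a set of neighbors. Uses a heap with
    lazy deletion: stale entries (whose stored degree no longer
    matches) are skipped on pop.
    """
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    heap = [(d, u) for u, d in deg.items()]
    heapq.heapify(heap)
    removed = set()
    core = {}
    k = 0
    while heap:
        d, u = heapq.heappop(heap)
        if u in removed or d != deg[u]:
            continue  # stale heap entry
        k = max(k, d)          # core number never decreases during peeling
        core[u] = k
        removed.add(u)
        for v in adj[u]:
            if v not in removed:
                deg[v] -= 1
                heapq.heappush(heap, (deg[v], v))
    return core
```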
The input/output complexity of triangle enumeration
We consider the well-known problem of enumerating all triangles of an undirected graph. Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the number of edges, M < E the size of internal memory, and B the block size. The best results obtained previously are sort(E^{3/2}) I/Os (Dementiev, PhD thesis 2006) and O(E^2/(MB)) I/Os (Hu et al., SIGMOD 2013), where sort(n) denotes the number of I/Os for sorting n items. We improve the I/O complexity to O(E^{3/2}/(√M·B)).
MapReduce Triangle Enumeration with Guarantees
We describe an optimal randomized MapReduce algorithm for the problem of triangle enumeration that requires O(E^{3/2}/(M√m)) rounds, where m denotes the expected memory size of a reducer and M the total available space. This generalizes the well-known vertex partitioning approach proposed in (Suri and Vassilvitskii, 2011) to multiple rounds, significantly increasing the size of the graphs that can be handled on a given system. We also give new theoretical (high-probability) bounds on the work needed in each reducer, addressing the “curse of the last reducer”. Indeed, our work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph. Our experimental evaluation shows the scalability of our approach, that it is competitive with existing methods, improving performance by a factor of up to 2×, and that it can significantly increase the size of datasets that can be processed.
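The vertex-partitioning idea being generalized here can be sketched on a single machine: hash vertices into p buckets, build one subgraph per bucket triple, and list triangles inside each subgraph. This is a simulation of the one-round Suri–Vassilvitskii scheme, not the multi-round algorithm of the paper; the helper names are illustrative, and a real MapReduce job would ship each triple's subgraph to its own reducer rather than loop over triples.

```python
from itertools import combinations_with_replacement

def triangles_in(edges):
    """Brute-force triangle listing on a small edge set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    tris = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:
            tris.add(tuple(sorted((u, v, w))))
    return tris

def partition_triangles(edges, p):
    """Hash vertices into p buckets; for every bucket triple (i <= j <= k),
    collect the edges whose endpoints both fall in {i, j, k} and list
    triangles in that subgraph. A triangle spanning fewer than three
    buckets is found by several triples, so duplicates are removed with
    a set; the distributed scheme instead assigns each triangle to a
    canonical reducer (or weights its count)."""
    bucket = lambda v: hash(v) % p
    found = set()
    for triple in combinations_with_replacement(range(p), 3):
        allowed = set(triple)
        sub = [(u, v) for u, v in edges
               if bucket(u) in allowed and bucket(v) in allowed]
        found |= triangles_in(sub)
    return found
```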
Declaration
, 2014
"... I Ilias Giechaskiel of Magdalene College, being a candidate for the M.Phil in Advanced Computer Science, hereby declare that this report and the work described in it are my own work, unaided except as may be specified below, and that the report does not contain material that has already been used to ..."
Abstract
 Add to MetaCart
(Show Context)
I, Ilias Giechaskiel, of Magdalene College, being a candidate for the M.Phil in Advanced Computer Science, hereby declare that this report and the work described in it are my own work, unaided except as may be specified below, and that the report does not contain material that has already been used to any substantial extent for a comparable purpose. Total word count: 14,311 (excluding Appendices A and B)
PDTL: Parallel and Distributed Triangle Listing for Massive Graphs
, 2015
"... Abstract — This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, Memory, and Network bounds. Finding all triangles (3cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are ..."
Abstract
 Add to MetaCart
(Show Context)
Abstract — This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, memory, and network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics. The majority of existing algorithms for massive graphs are sequential, and distributed versions of algorithms do not guarantee their CPU, I/O, memory, or network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and by replicating graphs across machines using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. As a result, PDTL suits a variety of computational environments, from single-core machines to high-end clusters. PDTL computes the exact triangle count on graphs of over 6B edges and 1B vertices (e.g. the Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2× to 4×. Our approach highlights the importance of I/O considerations in a distributed environment, which has received less attention in the graph-processing literature.
Shared-memory parallelism can be simple, fast, and scalable
, 2015
"... Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with highlevel tools to enable them to de ..."
Abstract
 Add to MetaCart
Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools that enable them to develop solutions efficiently, and at the same time to emphasize the theoretical and practical aspects of algorithm design so that the solutions developed run efficiently under all possible settings. This thesis addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. The thesis provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results developed in this thesis serve to ease the transition into the multicore era. The first part of this thesis introduces tools and techniques for deterministic parallelism.