Scalable multi-threaded community detection in social networks
- in Workshop on Multithreaded Architectures and Applications (MTAAP), 2012
Cited by 11 (2 self)
The volume of existing graph-structured data requires improved parallel tools and algorithms. Finding communities, smaller subgraphs more densely connected internally than to the rest of the graph, plays a role both in developing new parallel algorithms and in opening smaller portions of the data to current analysis tools. We improve the performance of our parallel community detection algorithm by 20% on the massively multithreaded Cray XMT, evaluate its performance on the next-generation Cray XMT2, and extend its reach to Intel-based platforms with OpenMP. To our knowledge, this is not only the first massively parallel community detection algorithm but also the only such algorithm that achieves excellent performance and good parallel scalability across all these platforms. Our implementation analyzes a moderate-sized graph with 105 million vertices and 3.3 billion edges in around 500 seconds on a four-processor, 80-logical-core Intel-based system and in 1100 seconds on a 64-processor Cray XMT2.
Parallel Overlapping Community Detection with SLPA
Cited by 3 (2 self)
Social networks consist of various communities whose members share common characteristics. Often some members of one community are also members of other communities. Such shared membership leads to overlapping communities. Detecting overlapping communities is a challenging and computationally intensive problem. In this paper, we investigate the usability of high performance computing in the area of social networks and community detection. We present highly scalable variants of a community detection algorithm called the Speaker-listener Label Propagation Algorithm (SLPA). We show that, despite irregular data dependencies in the computation, parallel computing paradigms can significantly speed up the computationally expensive detection of overlapping communities in social networks. We show experimentally how various parallel computing architectures can be used to analyze large social network data on both shared memory machines and distributed memory machines, such as IBM Blue Gene.
Parallel heuristics for scalable community detection
- in Proc. International Workshop on Multithreaded Architectures and Applications, in press, 2014
Cited by 2 (0 self)
Community detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed by Blondel et al. in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the ...
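The abstract above names modularity as the objective the Louvain method optimizes. As a minimal sketch (not the authors' parallel heuristics), the following computes the standard Newman-Girvan modularity of a given partition. It assumes an undirected, unweighted graph supplied as an edge list; the function name `modularity` and its argument layout are hypothetical choices made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (illustrative sketch).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a community id.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}
```

On this small example, splitting the graph into its two triangles scores higher than placing every vertex in one community, which is the kind of comparison a modularity-driven heuristic makes when it decides whether to move a vertex.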
Parallelizing SLPA for Scalable Overlapping Community Detection
- 2015
Cited by 2 (2 self)
Communities in networks are groups of nodes whose connections to other nodes in the same community are stronger than their connections to nodes in the rest of the network. Quite often nodes participate in multiple communities; that is, communities can overlap. In this paper, we first analyze what other researchers have done to use high performance computing for efficient community detection in social, biological, and other networks. We note that detecting overlapping communities is more computationally intensive than detecting disjoint communities, and that it presents new challenges for algorithm designers. Moreover, the running time of many existing algorithms grows superlinearly with the network size, making them unsuitable for processing large datasets. We use the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for our parallel overlapping community detection implementation. SLPA provides near-linear-time overlapping community detection and is well suited for parallelization. We explore the benefits of a multithreaded programming paradigm and show that it yields a significant performance gain over sequential execution while preserving the high quality of community detection. The algorithm was tested on four real-world datasets with up to 5.5 million nodes and 170 million edges. To assess the quality of community detection, at least four different metrics were used for each of the datasets.
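To make the speaker-listener idea concrete, here is a minimal sequential sketch of SLPA as it is commonly described: every node keeps a memory of labels, each neighbor "speaks" one label drawn from its memory, and the listener stores the most popular label it hears. This is not the parallel implementation from the paper. The function name `slpa`, the parameter defaults, and the adjacency-dict input format are assumptions made for illustration.

```python
import random
from collections import Counter


def slpa(adj, iterations=20, threshold=0.1, seed=0):
    """Sequential SLPA sketch.

    adj: dict mapping each node to a list of its neighbors (undirected).
    Returns a dict mapping label -> set of member nodes; sets may overlap.
    """
    rng = random.Random(seed)
    # Each node's memory starts with only its own id as a label.
    memory = {v: Counter({v: 1}) for v in adj}
    nodes = list(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            if not adj[listener]:
                continue  # an isolated node hears nothing
            received = Counter()
            for speaker in adj[listener]:
                labels = list(memory[speaker].keys())
                weights = list(memory[speaker].values())
                # The speaker sends one label, chosen with probability
                # proportional to how often it appears in its memory.
                received[rng.choices(labels, weights=weights)[0]] += 1
            top = max(received.values())
            winners = [lab for lab, cnt in received.items() if cnt == top]
            memory[listener][rng.choice(winners)] += 1
    # Post-processing: keep labels whose share of a node's memory
    # reaches the threshold; a node may keep several labels.
    communities = {}
    for v, mem in memory.items():
        total = sum(mem.values())
        for lab, cnt in mem.items():
            if cnt / total >= threshold:
                communities.setdefault(lab, set()).add(v)
    return communities


# Two disconnected triangles.
example_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1],
               3: [4, 5], 4: [3, 5], 5: [3, 4]}
example_communities = slpa(example_adj)
```

Each iteration touches every edge once from each side, which is where the near-linear running time mentioned in the abstract comes from. The inner loop over listeners is also the part that carries the data dependencies a parallel version has to manage.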
Branch-avoiding graph algorithms
- Symposium on Parallelism in Algorithms and Architectures (SPAA), 2015
Cited by 1 (0 self)
This paper quantifies the impact of branches and branch mispredictions on the single-core performance for two classes of graph problems. Specifically, we consider classical algorithms for computing connected components and breadth-first search (BFS). We show that branch mispredictions are costly and can reduce performance by as much as 30% to 50%. This insight suggests that one should seek graph algorithms and implementations that avoid branches. As a proof-of-concept, we devise such implementations for both the classic top-down algorithm for BFS and the Shiloach-Vishkin algorithm for connected components. We evaluate these implementations on current x86 and ARM-based processors to show the efficacy of the approach. Our results suggest how both compiler writers and architects might exploit this insight to improve graph processing systems more broadly and create better systems for such problems.
Scalable Flow-Based Community Detection for Large-Scale Network Analysis
Cited by 1 (0 self)
Community detection is a powerful approach to uncover important structures in large networks. Since networks often describe the flow of some entity, flow-based community detection methods are particularly interesting. One such algorithm is called Infomap, which optimizes the objective function known as the map equation. While Infomap is known to be an effective algorithm, its serial implementation cannot take advantage of multicore processing in modern computers. In this paper, we propose a novel parallel generalization of Infomap called RelaxMap. This algorithm relaxes concurrency assumptions to avoid lock overhead, achieving 70% parallel efficiency in shared-memory multicore experiments while exhibiting similar convergence properties and finding similar community structures as the serial algorithm. We evaluate our approach on a variety of real graph datasets as well as synthetic graphs produced by a popular graph generator used for benchmarking community detection algorithms. We describe the algorithm, the experiments, and some emerging research directions in high-performance community detection on massive graphs.
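The map equation mentioned above scores a partition by the expected description length of a random walk on the network. The sketch below evaluates the usual two-level form for an undirected, unweighted graph, where a vertex's visit rate is its degree divided by twice the edge count. It only evaluates the objective for a given partition and is not RelaxMap or Infomap's search procedure. The names `map_equation` and `plogp` and the input layout are hypothetical.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module):
    """Two-level map equation (in bits) for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    module: dict mapping every vertex to a module id.
    """
    edges = list(edges)
    two_m = 2 * len(edges)
    visit = defaultdict(float)      # stationary visit rate of each vertex
    exit_rate = defaultdict(float)  # rate at which the walker leaves a module
    for u, v in edges:
        visit[u] += 1 / two_m
        visit[v] += 1 / two_m
        if module[u] != module[v]:
            exit_rate[module[u]] += 1 / two_m
            exit_rate[module[v]] += 1 / two_m
    module_visit = defaultdict(float)  # total visit rate inside each module
    for v, p in visit.items():
        module_visit[module[v]] += p
    q_total = sum(exit_rate.values())
    return (plogp(q_total)
            - 2 * sum(plogp(q) for q in exit_rate.values())
            - sum(plogp(p) for p in visit.values())
            + sum(plogp(exit_rate[i] + module_visit[i]) for i in module_visit))


# Two triangles joined by a single bridge edge (2, 3).
flow_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
one_module = map_equation(flow_edges, {v: 0 for v in range(6)})
two_modules = map_equation(flow_edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
```

A lower value means a shorter description of the walk, so in this example the two-triangle partition is preferred over a single module. With one module the expression reduces to the entropy of the vertex visit rates.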
Parallel Toolkit for Measuring the Quality of Network Community Structure
Cited by 1 (1 self)
Many networks display community structure, which identifies groups of nodes within which connections are denser than between them. Detecting and characterizing such community structure, known as community detection, is one of the fundamental issues in the study of network systems. It has received considerable attention in recent years. Numerous techniques have been developed for both efficient and effective community detection. Among them, the most efficient algorithm is the label propagation algorithm, whose computational complexity is O(|E|). Although it is linear in the number of edges, the running time is still too long for very large networks, creating the need for parallel community detection. Also, computing community quality metrics for community structure is computationally expensive both with and without ground truth. However, to date we are not aware of any effort to introduce parallelism for this problem. In this paper, we provide a parallel toolkit to calculate the values of such metrics. We evaluate the parallel algorithms on both distributed memory and shared memory machines. The experimental results show that they yield a significant performance gain over sequential execution in terms of total running time, speedup, and efficiency.
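The O(|E|) cost quoted for label propagation is the cost of one sweep, in which every node looks at each of its neighbors once. A minimal asynchronous sketch follows. It assumes an adjacency-dict input and keeps a node's current label whenever that label is tied for most frequent, a common tie-breaking choice that is an assumption here rather than something the abstract specifies. The function name `label_propagation` is hypothetical.

```python
import random
from collections import Counter


def label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous label propagation sketch.

    adj: dict mapping each node to a list of its neighbors (undirected).
    Returns a dict mapping node -> community label.
    """
    rng = random.Random(seed)
    label = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)
        changed = False
        for v in nodes:  # one sweep inspects each edge twice: O(|E|)
            if not adj[v]:
                continue
            counts = Counter(label[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, cnt in counts.items() if cnt == top]
            # Keep the current label if it is already among the most
            # frequent neighbor labels; otherwise adopt one of them.
            if label[v] not in best:
                label[v] = rng.choice(best)
                changed = True
        if not changed:
            break  # every node agrees with a plurality of its neighbors
    return label


# Two disconnected triangles.
lpa_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1],
           3: [4, 5], 4: [3, 5], 5: [3, 4]}
lpa_labels = label_propagation(lpa_adj)
```

Because a node's update depends on labels its neighbors may be changing in the same sweep, a parallel version has to decide how to handle those concurrent reads and writes.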
Multithreaded Community Monitoring for Massive Streaming Graph Data
- in 2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum
Analyzing static snapshots of massive, graph-structured data cannot keep pace with the growth of social networks, financial transactions, and other valuable data sources. Current state-of-the-art industrial methods analyze these streaming sources using only simple, aggregate metrics. There are few existing scalable algorithms for monitoring complex global quantities like decomposition into community structure. Using our framework STING, we present the first known parallel algorithm specifically for monitoring communities in this massive, streaming, graph-structured data. Our algorithm performs incremental re-agglomeration rather than starting from scratch after each batch of changes, reducing the problem's size to that of the change rather than the entire graph. We analyze our initial implementation's performance on multithreaded platforms for execution time and latency. On an Intel-based multithreaded platform, our algorithm handles up to 100 million updates per second on social networks with one to 30 million edges, providing a speed-up from 4× to 3700× over statically recomputing the decomposition after each batch of changes. Possibly because of our artificial graph generator, the resulting communities' modularity varies little from the initial graph.