GPS: A Graph Processing System
Cited by 68 (3 self)
GPS (for Graph Processing System) is a complete open-source system we developed for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system [MAB+ 11], with some useful additional functionality described in the paper. In distributed graph processing systems like GPS and Pregel, graph partitioning is the problem of deciding which vertices of the graph are assigned to which compute nodes. In addition to presenting the GPS system itself, we describe how we have used GPS to study the effects of different graph partitioning schemes. We present our experiments on the performance of GPS under different static partitioning schemes (assigning vertices to workers “intelligently” before the computation starts) and with GPS’s dynamic repartitioning feature, which reassigns vertices to different compute nodes during the computation by observing their message-sending patterns.
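As a rough illustration of why the choice of partitioning scheme matters, the sketch below compares a simple hash assignment with a structure-aware assignment by counting cut edges (edges whose endpoints land on different workers, and which therefore require network messages). The function names and the toy graph are assumptions made for this example; they are not part of GPS or Pregel.

```python
def hash_partition(vertices, num_workers):
    """Assign each vertex to a worker by vertex ID modulo the worker count."""
    return {v: v % num_workers for v in vertices}


def cut_edges(edges, assignment):
    """Count edges whose two endpoints are assigned to different workers."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
vertices = range(6)

# Structure-oblivious assignment versus one that keeps each triangle together.
by_hash = hash_partition(vertices, 2)
by_cluster = {v: 0 if v < 3 else 1 for v in vertices}

hash_cut = cut_edges(edges, by_hash)
cluster_cut = cut_edges(edges, by_cluster)
print(hash_cut, cluster_cut)
```

On this toy input the hash assignment cuts most edges while the clustered assignment cuts only the bridge, which is the kind of difference a static "intelligent" partitioner tries to exploit.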
Optimizing Graph Algorithms on Pregel-like Systems
2014
Cited by 10 (2 self)
We study the problem of implementing graph algorithms efficiently on Pregel-like systems, which can be surprisingly challenging. Standard graph algorithms in this setting can incur unnecessary inefficiencies such as slow convergence or high communication or computation cost, typically due to structural properties of the input graphs such as large diameters or skew in component sizes. We describe several optimization techniques to address these inefficiencies. Our most general technique is based on the idea of performing some serial computation on a tiny fraction of the input graph, complementing Pregel’s vertex-centric parallelism. We base our study on thorough implementations of several fundamental graph algorithms, some of which have, to the best of our knowledge, not been implemented on Pregel-like systems before. The algorithms and optimizations we describe are fully implemented in our open-source Pregel implementation. We present detailed experiments showing that our optimization techniques improve runtime significantly on a variety of very large graph datasets.
Shattering and compressing networks for betweenness centrality
In Proc. of SDM, 2013
Cited by 5 (3 self)
The betweenness metric has always been intriguing and used in many analyses. Yet, it is one of the most computationally expensive kernels in graph mining. For that reason, making betweenness centrality computations faster is an important and well-studied problem. In this work, we propose a framework, BADIOS, which compresses a network and shatters it into pieces so that the centrality computation can be handled independently for each piece. Although BADIOS is designed and tuned for betweenness centrality, it can easily be adapted for other centrality metrics. Experimental results show that the proposed techniques can be a great arsenal to reduce the centrality computation time for various types and sizes of networks. In particular, it reduces the computation time for a graph with 4.6 million edges from more than 5 days to less than 16 hours.
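To make the cost of the kernel concrete, here is a minimal sketch of exact betweenness centrality on an unweighted, undirected graph using Brandes' algorithm, the standard baseline. It is not BADIOS and does no shattering or compression; it only shows that the computation runs one full breadth-first search per source vertex, which is what makes it expensive on large graphs. The adjacency-dictionary input format is an assumption of this example.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {vertex: [neighbors]}. Returns unnormalized betweenness scores."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Path graph 0 - 1 - 2 - 3 (hypothetical input).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
scores = betweenness(path)
print(scores)
```

Because each source's search is confined to its own connected piece, splitting a graph into independent pieces reduces the work per source, which is the intuition behind handling pieces separately.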
Scalable Complex Graph Analysis with the Knowledge Discovery Toolbox
In Int. Conference on Acoustics, Speech, and Signal Processing, 2012
Cited by 4 (2 self)
The Knowledge Discovery Toolbox (KDT) enables domain experts to perform complex analyses of huge datasets on supercomputers using a high-level language without grappling with the difficulties of writing parallel code, calling parallel libraries, or becoming a graph expert. KDT delivers competitive performance from a general-purpose, reusable library for graphs on the order of 10 billion edges and greater. We describe our approach for supporting arbitrary vertex and edge attributes, in-place graph filtering, and graph traversal using predefined access patterns. Index Terms: graph analytics, scalability, knowledge discovery, semantic graph, filter
FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
Cited by 3 (0 self)
Graph analysis performs many random reads and writes, thus these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the size of the graph. We demonstrate that a multicore server can process graphs of billions of vertices and hundreds of billions of edges, utilizing commodity SSDs without much performance loss. We do so by implementing a graph-processing engine within a user-space SSD file system designed for high IOPS and extreme parallelism. This allows us to localize computation to cached data in a non-uniform memory architecture and hide latency by overlapping computation with I/O. Our semi-external memory graph engine, called FlashGraph, stores vertex state in memory and adjacency lists on SSDs. FlashGraph exposes a general and flexible programming interface that can express a variety of graph algorithms and their optimizations. FlashGraph in semi-external memory performs many algorithms up to 20 times faster than PowerGraph, a general-purpose, in-memory graph engine. Even breadth-first search, which generates many small random I/Os, runs significantly faster in FlashGraph.
NetworKit: An interactive tool suite for high-performance network analysis
2014
Cited by 3 (1 self)
We introduce NetworKit, an open-source software package for high-performance analysis of large complex networks. Complex networks are equally attractive and challenging targets for data mining, and novel algorithmic solutions as well as parallelization are required to handle data sets containing billions of connections. Our goal for NetworKit is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is a hybrid combining the performance of kernels written in C++ with a convenient interactive interface written in Python. The package supports general multicore platforms and scales from notebooks to workstations to servers. In comparison with related software for network analysis, we propose NetworKit as the package which satisfies all three important criteria: high performance (partly enabled by parallelism), interactive workflows, and integration into an ecosystem of tested tools for data analysis and scientific computation. The current feature set includes standard network analytics kernels such as degree distribution, connected components, clustering coefficients, community detection, k-core decomposition, degree assortativity and centrality. Applying these to massive networks is enabled by efficient algorithms, parallelism or approximation. Furthermore, the package comes with a collection of graph generators and has basic support for visualization. With the current release, we present and open up the project to a community of both algorithm engineers and domain experts.
Shattering and Compressing Networks for Centrality Analysis
2012
Cited by 2 (1 self)
(Previously submitted to ICDM on June 18, 2012) Who is more important in a network? Who controls the flow between the nodes or whose contribution is significant for connections? Centrality metrics play an important role while answering these questions. The betweenness metric is useful for network analysis and implemented in various tools. Since it is one of the most computationally expensive kernels in graph mining, several techniques have been proposed for fast computation of betweenness centrality. In this work, we propose and investigate techniques which compress a network and shatter it into pieces so that the rest of the computation can be handled independently for each piece. Although we designed and tuned the shattering process for betweenness, it can be adapted for other centrality metrics in a straightforward manner. Experimental results show that the proposed techniques can be a great arsenal to reduce the centrality computation time for various types of networks.
High-Productivity and High-Performance Analysis of Filtered Semantic Graphs
Cited by 2 (0 self)
High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry attributes of various types. Analytic queries on semantic graphs typically depend on the values of these attributes; thus, the computation must view the graph through a filter that passes only those individual vertices and edges of interest. Knowledge Discovery Toolbox (KDT), a Python library for parallel graph computations, is customizable in two ways. First, the user can write custom graph algorithms by specifying operations between edges and vertices. These programmer-specified operations are called semiring operations due to KDT’s underlying linear-algebraic abstractions. Second, the user can customize existing graph algorithms by writing filters that return true for those vertices and edges the user wants to retain during algorithm
HelP: High-level Primitives for Large-Scale Graph Processing
Cited by 1 (0 self)
Large-scale graph processing systems typically expose a small set of functions, such as the compute() function of Pregel, or the gather(), apply(), and scatter() functions of PowerGraph. For some computations, these APIs are too low-level, yielding long and complex programs, but with shared coding patterns. Similar issues with the MapReduce framework have led to widely used languages such as Pig Latin and Hive, which introduce higher-level primitives. We take an analogous approach for graph processing: we propose HelP, a set of high-level primitives that capture commonly appearing operations in large-scale graph computations. Using our primitives we have implemented a large suite of algorithms, some of which we previously implemented with the APIs of existing systems. Our experience has been that implementing algorithms using our primitives is more intuitive and much faster than using the APIs of existing distributed systems. All of our primitives and algorithms are fully implemented as a library on top of the open-source GraphX system.
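For readers unfamiliar with the low-level style the abstract refers to, the following single-machine sketch imitates a Pregel-style compute() function running in synchronized supersteps, here labeling connected components by propagating the smallest vertex ID. The signature of compute(), the run_supersteps driver, and the toy graph are all assumptions made for this example; they are not the API of Pregel, PowerGraph, GraphX, or HelP.

```python
def compute(value, messages, neighbors, superstep):
    """Hypothetical vertex program: adopt the smallest vertex ID seen so far.
    Returns (new_value, outgoing); an empty outgoing list means the vertex
    has nothing to send and stays idle until a message arrives."""
    candidate = min([value] + messages)
    if superstep == 0 or candidate < value:
        return candidate, [(n, candidate) for n in neighbors]
    return value, []


def run_supersteps(adj):
    """Run compute() on every active vertex until no messages remain."""
    values = {v: v for v in adj}      # each vertex starts with its own ID
    inbox = {v: [] for v in adj}
    active = set(adj)                 # all vertices are active in superstep 0
    superstep = 0
    while active:
        outbox = {v: [] for v in adj}
        for v in active:
            values[v], sent = compute(values[v], inbox[v], adj[v], superstep)
            for target, msg in sent:
                outbox[target].append(msg)
        # Messages sent in this superstep are delivered in the next one.
        inbox = outbox
        active = {v for v in adj if inbox[v]}
        superstep += 1
    return values


# Two components (hypothetical input): {0, 1, 2} and {3, 4}.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels = run_supersteps(graph)
print(labels)
```

Even for this simple task the programmer manages messages, activity, and termination by hand, which is the kind of repeated coding pattern that higher-level primitives aim to factor out.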