Results 1  10
of
128
GraphChi: Largescale Graph Computation On just a PC
 In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, OSDI’12
, 2012
"... Current systems for graph computation require a distributed computing cluster to handle very large realworld problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains c ..."
Abstract

Cited by 115 (6 self)
 Add to MetaCart
(Show Context)
Current systems for graph computation require a distributed computing cluster to handle very large realworld problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to nonexperts. In this work, we present GraphChi, a diskbased system for computing efficiently on graphs with billions of edges. By using a wellknown method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumerlevel computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes largescale graph computation available to anyone with a modern PC. 1
Trinity: A Distributed Graph Engine on a Memory Cloud
 In SIGMOD
, 2013
"... Computationsperformedbygraphalgorithmsaredatadriven, and require a high degree of random data access. Despite the great progresses made in disk technology, it still cannot providethelevelofefficient randomaccess requiredbygraph computation. Ontheotherhand, memorybasedapproaches usually do not scale ..."
Abstract

Cited by 50 (0 self)
 Add to MetaCart
(Show Context)
Computationsperformedbygraphalgorithmsaredatadriven, and require a high degree of random data access. Despite the great progresses made in disk technology, it still cannot providethelevelofefficient randomaccess requiredbygraph computation. Ontheotherhand, memorybasedapproaches usually do not scale due to the capacity limit of single machines. Inthispaper, weintroduceTrinity,ageneralpurpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great easeofuse for general purpose graphmanagement and computing. Our experiments show Trinity’s performance in both low latency graph queries as well as high throughput graph analytics on webscale, billionnode graphs.
Naiad: A Timely Dataflow System
"... Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, app ..."
Abstract

Cited by 48 (1 self)
 Add to MetaCart
(Show Context)
Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework. A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism. We show that many powerful highlevel programming models can be built on Naiad’s lowlevel primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new highperformance applications. 1
XStream: Edgecentric Graph Processing using Streaming Partitions
"... XStream is a system for processing both inmemory and outofcore graphs on a single sharedmemory machine. While retaining the scattergather programming model with state stored in the vertices, XStream is novel in (i) using an edgecentric rather than a vertexcentric implementation of this mod ..."
Abstract

Cited by 35 (2 self)
 Add to MetaCart
(Show Context)
XStream is a system for processing both inmemory and outofcore graphs on a single sharedmemory machine. While retaining the scattergather programming model with state stored in the vertices, XStream is novel in (i) using an edgecentric rather than a vertexcentric implementation of this model, and (ii) streaming completely unordered edge lists rather than performing random access. This design is motivated by the fact that sequential bandwidth for all storage media (main memory, SSD, and magnetic disk) is substantially larger than random access bandwidth. We demonstrate that a large number of graph algorithms can be expressed using the edgecentric scattergather model. The resulting implementations scale well in terms of number of cores, in terms of number of I/O devices, and across different storage media. XStream competes favorably with existing systems for graph processing. Besides sequential access, we identify as one of the main contributors to better performance the fact that XStream does not need to sort edge lists during preprocessing. 1
A Lightweight Infrastructure for Graph Analytics ∗
"... Several domainspecific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a generalpurpose infrastructure that (i) supports very finegrain tasks, (ii) implements autonomous, speculative execution of th ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
Several domainspecific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a generalpurpose infrastructure that (i) supports very finegrain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows applicationspecific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that endtoend performance can be improved by orders of magnitude even on powerlaw graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for powerlaw graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.
From "Think Like a Vertex " to "Think Like a Graph"
"... To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a “think like a vertex ” programmin ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
(Show Context)
To meet the challenge of processing rapidly growing graph and network data created by modern applications, a number of distributed graph processing systems have emerged, such as Pregel and GraphLab. All these systems divide input graphs into partitions, and employ a “think like a vertex ” programming model to support iterative graph computation. This vertexcentric model is easy to program and has been proved useful for many graph algorithms. However, this model hides the partitioning information from the users, thus prevents many algorithmspecific optimizations. This often results in longer execution time due to excessive network messages (e.g. in Pregel) or heavy scheduling overhead to ensure data consistency (e.g. in GraphLab). To address this limitation, we propose a new “think like a graph ” programming paradigm. Under this graphcentric model, the partition structure is opened up to the users, and can be utilized so that communication within a partition can bypass the heavy message passing or scheduling machinery. We implemented this model in a new system, called Giraph++, based on Apache Giraph, an open source implementation of Pregel. We explore the applicability of the graphcentric model to three categories of graph algorithms, and demonstrate its flexibility and superior performance, especially on wellpartitioned data. For example, on a web graph with 118 million vertices and 855 million edges, the graphcentric version of connected component detection algorithm runs 63X faster and uses 204X fewer network messages than its vertexcentric counterpart. 1.
Mizan: A system for dynamic load balancing in largescale graph processing
 In EuroSys ’13
, 2013
"... Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute node ..."
Abstract

Cited by 21 (0 self)
 Add to MetaCart
(Show Context)
Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing endtoend computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient finegrained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that—especially for highlydynamic workloads— Mizan provides up to 84 % improvement over techniques leveraging static graph prepartitioning. 1.
FENNEL: Streaming Graph Partitioning for Massive Scale Graphs
"... Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the stream ..."
Abstract

Cited by 16 (1 self)
 Add to MetaCart
(Show Context)
Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of nonneighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel onepass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of realworld and synthetic graphs. Surprisingly, despite the fact that our algorithm is a onepass streaming algorithm, we found its performance to be in many cases comparable to the defacto standard offline software METIS and in some cases even superiror. For instance, for the Twitter graph with more than 1.4 billion of edges, our method partitions the graph in about 40 minutes achieving a balanced partition that cuts as few as 6.8 % of edges, whereas it took more than 8 1 hours by METIS to 2 produce a balanced partition that cuts 11.98 % of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
Medusa: Simplified Graph Processing on GPUs
, 2013
"... Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to ..."
Abstract

Cited by 13 (4 self)
 Add to MetaCart
Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to write correct and efficient GPU programs and even more difficult for graph processing due to the irregularities of graph structures. To simplify graph processing on GPUs, we propose a programming framework called Medusa which enables developers to leverage the capabilities of GPUs by writing sequential C/C++ code. Medusa offers a small set of userdefined APIs, and embraces a runtime system to automatically execute those APIs in parallel on the GPU. We develop a series of graphcentric optimizations based on the architecture features of GPUs for efficiency. Additionally, Medusa is extended to execute on multiple GPUs within a machine. Our experiments show that (1) Medusa greatly simplifies implementation of GPGPU programs for graph processing, with many fewer lines of source code written by developers; (2) The optimization techniques significantly improve the performance of the runtime system, making its performance comparable with or better than manually tuned GPU graph operations.
COUNTING TRIANGLES IN MASSIVE GRAPHS WITH MAPREDUCE
, 2013
"... Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are trianglebased and give a measure of the connectedness of mutual ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
(Show Context)
Graphs and networks are used to model interactions in a variety of contexts. There is a growing need to quickly assess the characteristics of a graph in order to understand its underlying structure. Some of the most useful metrics are trianglebased and give a measure of the connectedness of mutual friends. This is often summarized in terms of clustering coefficients, which measure the likelihood that two neighbors of a node are themselves connected. Computing these measures exactly for largescale networks is prohibitively expensive in both memory and time. However, a recent wedge sampling algorithm has proved successful in efficiently and accurately estimating clustering coefficients. In this paper, we describe how to implement this approach in MapReduce to deal with extremely massive graphs. We show results on publiclyavailable networks, the largest of which is 132M nodes and 4.7B edges, as well as artificially generated networks (using the Graph500 benchmark), the largest of which has 240M nodes and 8.5B edges. We can estimate the clustering coefficient by degree bin (e.g., we use exponential binning) and the number of triangles per bin, as well as the global clustering coefficient and total number of triangles, in an average of 0.33 sec. per million edges plus overhead (approximately 225 sec. total for our configuration). The technique can also be used to study triangle statistics such as the ratio of the highest and lowest degree, and we highlight differences between social and nonsocial networks. To the best of our knowledge, these are the largest trianglebased graph computations published to date.