Results 1 - 10 of 115
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
"... Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real-world have highly ..."
Abstract
-
Cited by 128 (4 self)
- Add to MetaCart
(Show Context)
Large-scale graph-structured computation is central to tasks ranging from targeted advertising to natural language processing and has led to the development of several graph-parallel abstractions including Pregel and GraphLab. However, the natural graphs commonly found in the real world have highly skewed power-law degree distributions, which challenge the assumptions made by these abstractions, limiting performance and scalability. In this paper, we characterize the challenges of computation on natural graphs in the context of existing graph-parallel abstractions. We then introduce the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges. Leveraging the PowerGraph abstraction we introduce a new approach to distributed graph placement and representation that exploits the structure of power-law graphs. We provide a detailed analysis and experimental evaluation comparing PowerGraph to two popular graph-parallel systems. Finally, we describe three different implementation strategies for PowerGraph and discuss their relative merits with empirical evaluations on large-scale real-world problems demonstrating order of magnitude gains.
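PowerGraph is best known for factoring vertex programs into gather, apply, and scatter (GAS) phases so that the work of a high-degree vertex can be split across machines; the abstract's "internal structure of graph programs" refers to this decomposition. The following Python sketch illustrates the pattern on single-machine PageRank. All names (PageRankGAS, gather, apply) are ours for illustration, not PowerGraph's actual API.

```python
# Minimal single-process sketch of a gather-apply-scatter (GAS) style
# vertex program for PageRank. Names are illustrative, not PowerGraph's API.

class PageRankGAS:
    def __init__(self, damping=0.85):
        self.damping = damping

    def gather(self, src_rank, src_out_degree):
        # Contribution flowing along one in-edge of the target vertex.
        return src_rank / src_out_degree

    def apply(self, gathered_sum, num_vertices):
        # Combine gathered contributions into the vertex's new rank.
        return (1 - self.damping) / num_vertices + self.damping * gathered_sum

def pagerank(vertices, in_edges, out_degree, iterations=20):
    """vertices: list of ids; in_edges[v]: list of u with an edge u -> v."""
    prog = PageRankGAS()
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        new_rank = {}
        for v in vertices:
            # Gather phase: sum contributions over in-edges (PowerGraph would
            # split this sum across the machines hosting v's edges).
            total = sum(prog.gather(rank[u], out_degree[u]) for u in in_edges[v])
            # Apply phase: update the vertex value from the gathered sum.
            new_rank[v] = prog.apply(total, n)
        rank = new_rank
    return rank
```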
Trinity: A Distributed Graph Engine on a Memory Cloud
In SIGMOD, 2013
"... Computationsperformedbygraphalgorithmsaredatadriven, and require a high degree of random data access. Despite the great progresses made in disk technology, it still cannot providethelevelofefficient randomaccess requiredbygraph computation. Ontheotherhand, memory-basedapproaches usually do not scale ..."
Abstract
-
Cited by 50 (0 self)
- Add to MetaCart
(Show Context)
Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite the great progress made in disk technology, it still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high-level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity’s performance in both low-latency graph queries as well as high-throughput graph analytics on web-scale, billion-node graphs.
X-Stream: Edge-centric Graph Processing using Streaming Partitions
"... X-Stream is a system for processing both in-memory and out-of-core graphs on a single shared-memory ma-chine. While retaining the scatter-gather programming model with state stored in the vertices, X-Stream is novel in (i) using an edge-centric rather than a vertex-centric implementation of this mod ..."
Abstract
-
Cited by 35 (2 self)
- Add to MetaCart
(Show Context)
X-Stream is a system for processing both in-memory and out-of-core graphs on a single shared-memory machine. While retaining the scatter-gather programming model with state stored in the vertices, X-Stream is novel in (i) using an edge-centric rather than a vertex-centric implementation of this model, and (ii) streaming completely unordered edge lists rather than performing random access. This design is motivated by the fact that sequential bandwidth for all storage media (main memory, SSD, and magnetic disk) is substantially larger than random access bandwidth. We demonstrate that a large number of graph algorithms can be expressed using the edge-centric scatter-gather model. The resulting implementations scale well in terms of number of cores, in terms of number of I/O devices, and across different storage media. X-Stream competes favorably with existing systems for graph processing. Besides sequential access, we identify the fact that X-Stream does not need to sort edge lists during pre-processing as one of the main contributors to its better performance.
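To make the edge-centric model concrete, the sketch below computes unweighted single-source shortest-path levels (BFS) by streaming the whole unordered edge list each iteration: a scatter pass over edges emits updates, and a gather pass applies them at destination vertices. This is only our Python illustration of the scatter-gather pattern; X-Stream itself is a C++ system built around streaming partitions and out-of-core buffers.

```python
import math

def edge_centric_bfs(num_vertices, edges, source):
    """edges: an unordered iterable of (src, dst) pairs, streamed in full each
    iteration rather than accessed randomly. Returns BFS levels from source."""
    level = [math.inf] * num_vertices
    level[source] = 0
    active = {source}
    current = 0
    while active:
        updates = []
        # Scatter: stream every edge sequentially; emit an update whenever the
        # source vertex was activated in the previous iteration.
        for src, dst in edges:
            if src in active:
                updates.append((dst, current + 1))
        # Gather: stream the updates and apply them at destination vertices.
        next_active = set()
        for dst, dist in updates:
            if dist < level[dst]:
                level[dst] = dist
                next_active.add(dst)
        active = next_active
        current += 1
    return level
```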
A Lightweight Infrastructure for Graph Analytics
"... Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of th ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
Several domain-specific languages (DSLs) for parallel graph analytics have been proposed recently. In this paper, we argue that existing DSLs can be implemented on top of a general-purpose infrastructure that (i) supports very fine-grain tasks, (ii) implements autonomous, speculative execution of these tasks, and (iii) allows application-specific control of task scheduling policies. To support this claim, we describe such an implementation called the Galois system. We demonstrate the capabilities of this infrastructure in three ways. First, we implement more sophisticated algorithms for some of the graph analytics problems tackled by previous DSLs and show that end-to-end performance can be improved by orders of magnitude even on power-law graphs, thanks to the better algorithms facilitated by a more general programming model. Second, we show that, even when an algorithm can be expressed in existing DSLs, the implementation of that algorithm in the more general system can be orders of magnitude faster when the input graphs are road networks and similar graphs with high diameter, thanks to more sophisticated scheduling. Third, we implement the APIs of three existing graph DSLs on top of the common infrastructure in a few hundred lines of code and show that even for power-law graphs, the performance of the resulting implementations often exceeds that of the original DSL systems, thanks to the lightweight infrastructure.
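The role of application-specific scheduling can be seen in a small example: for single-source shortest paths on a high-diameter road network, ordering the worklist by tentative distance (a Dijkstra-like policy, sketched below) avoids most of the redundant relaxations an unordered or bulk-synchronous schedule would perform. The sketch is sequential and uses our own names; Galois executes such fine-grain tasks speculatively in parallel.

```python
import heapq

def sssp_priority_worklist(adj, source):
    """adj[u]: list of (v, weight) pairs. The priority queue is the
    application-chosen scheduling policy: relaxation tasks run in order of
    tentative distance, as in Dijkstra or delta-stepping variants."""
    dist = {source: 0.0}
    worklist = [(0.0, source)]
    while worklist:
        d, u = heapq.heappop(worklist)
        if d > dist.get(u, float("inf")):
            continue  # stale task; a better distance was already applied
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(worklist, (nd, v))
    return dist
```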
GraphX: Graph Processing in a Distributed Dataflow Framework
In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’14), 2014
"... In pursuit of graph processing performance, the systems community has largely abandoned general-purpose dis-tributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming ab-stractions and accelerate the execution of iterative graph algorithms. In thi ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
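The claim that a graph API needs "only a few basic dataflow operators" can be illustrated with one message-passing round written as a join (attach source-vertex attributes to edges), a map (produce messages), and a group-by with a combiner (aggregate messages per destination). The Python sketch below uses plain dictionaries and our own function names rather than Spark or GraphX APIs.

```python
def message_passing_round(vertices, edges, send_msg, combine):
    """vertices: {vid: attr}; edges: list of (src, dst, attr).
    One superstep expressed as join / map / group-by style operations."""
    # Join: attach the source vertex attribute to each edge (edges joined with vertices).
    triplets = [(src, dst, e_attr, vertices[src]) for src, dst, e_attr in edges]
    # Map: turn each triplet into a (dst, message) pair.
    messages = [(dst, send_msg(src_attr, e_attr))
                for src, dst, e_attr, src_attr in triplets]
    # Group-by destination and reduce with the combiner.
    inbox = {}
    for dst, msg in messages:
        inbox[dst] = combine(inbox[dst], msg) if dst in inbox else msg
    return inbox

# Example: one PageRank-style round, where each vertex attr is (rank, out_degree).
vertices = {1: (0.25, 2), 2: (0.25, 1), 3: (0.25, 1), 4: (0.25, 0)}
edges = [(1, 2, None), (1, 3, None), (2, 4, None), (3, 4, None)]
sums = message_passing_round(
    vertices, edges,
    send_msg=lambda v, e: v[0] / v[1],  # rank contribution along the edge
    combine=lambda a, b: a + b,
)
```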
Mizan: A system for dynamic load balancing in large-scale graph processing
In EuroSys ’13, 2013
"... Pregel [23] was recently introduced as a scalable graph min-ing system that can provide significant performance im-provements over traditional MapReduce implementations. Existing implementations focus primarily on graph par-titioning as a preprocessing step to balance computation across compute node ..."
Abstract
-
Cited by 21 (0 self)
- Add to MetaCart
(Show Context)
Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the runtime behavior of the algorithm is unknown, an adaptive approach is needed. To this end, we introduce Mizan, a Pregel system that achieves efficient load balancing to better adapt to changes in computing needs. Unlike known implementations of Pregel, Mizan does not assume any a priori knowledge of the structure of the graph or behavior of the algorithm. Instead, it monitors the runtime characteristics of the system. Mizan then performs efficient fine-grained vertex migration to balance computation and communication. We have fully implemented Mizan; using extensive evaluation we show that, especially for highly dynamic workloads, Mizan provides up to 84% improvement over techniques leveraging static graph pre-partitioning.
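As a rough illustration of migration-based load balancing, the sketch below plans vertex moves after a superstep by repeatedly shifting cheap vertices from the most loaded worker to the least loaded one until the imbalance drops below a threshold. This is only our simplification; the statistics Mizan actually monitors (execution time, incoming and outgoing messages) and its migration protocol are considerably more involved.

```python
def plan_migrations(worker_load, vertex_cost, vertex_owner, threshold=1.2):
    """Sketch of a post-superstep rebalancing pass (our simplification, not
    Mizan's protocol). worker_load: {worker: cost last superstep};
    vertex_cost: {vertex: cost}; vertex_owner: {vertex: worker}.
    Returns a list of (vertex, from_worker, to_worker) moves."""
    migrations = []
    loads = dict(worker_load)
    owner = dict(vertex_owner)
    avg = sum(loads.values()) / len(loads)
    while True:
        busiest = max(loads, key=loads.get)
        idlest = min(loads, key=loads.get)
        if loads[busiest] <= threshold * avg:
            break  # imbalance is within tolerance; stop migrating
        candidates = sorted((v for v, w in owner.items() if w == busiest),
                            key=lambda v: vertex_cost[v])
        # Stop if there is nothing useful to move: no candidate, or even the
        # cheapest vertex would overshoot and merely swap the imbalance around.
        if not candidates or vertex_cost[candidates[0]] >= loads[busiest] - loads[idlest]:
            break
        v = candidates[0]
        migrations.append((v, busiest, idlest))
        owner[v] = idlest
        loads[busiest] -= vertex_cost[v]
        loads[idlest] += vertex_cost[v]
    return migrations
```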
Scale-up vs Scale-out for Hadoop: Time to rethink?
"... In the last decade we have seen a huge deployment of cheap clusters to run data analytics workloads. The conventional wisdom in industry and academia is that scaling out using a cluster of commodity machines is better for these workloads than scaling up by adding more resources to a single server. P ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
(Show Context)
In the last decade we have seen a huge deployment of cheap clusters to run data analytics workloads. The conventional wisdom in industry and academia is that scaling out using a cluster of commodity machines is better for these workloads than scaling up by adding more resources to a single server. Popular analytics infrastructures such as Hadoop are aimed at such a cluster scale-out environment. Is this the right approach? Our measurements as well as other recent work show that the majority of real-world analytic jobs process less than 100 GB of input, but popular infrastructures such as Hadoop/MapReduce were originally designed for petascale processing. We claim that a single “scale-up” server can process each of these jobs and do as well or better than a cluster in terms of performance, cost, power, and server density. We present an evaluation across 11 representative Hadoop jobs that shows scale-up to be competitive in all cases, and significantly better in some cases, than scale-out. To achieve that performance, we describe several modifications to the Hadoop runtime that target scale-up configurations. These changes are transparent, do not require any changes to application code, and do not compromise scale-out performance; at the same time our evaluation shows that they do significantly improve Hadoop’s scale-up performance.
Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues
"... Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a gen-uine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently pro-posed method which aims to improve KB cov-erage by performing inference directly over the KB graph. For the first time, we demon ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
(Show Context)
Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRA-based approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.
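In PRA-style inference, each relation path (a sequence of edge labels) is a feature whose value is the probability that a random walk from the source, constrained to follow that path, reaches the target; this paper's contribution is enriching the KB graph with latent edge labels mined from parsed text before computing such features. Below is a small, self-contained sketch of one path-constrained walk probability; the graph encoding and the entity names in the example are hypothetical.

```python
from collections import defaultdict

def path_feature(graph, source, target, path):
    """graph[u][label] -> list of neighbors reachable via an edge with that label.
    Returns the probability that a random walk from `source`, forced to follow
    the label sequence `path`, ends at `target` (one PRA feature value)."""
    probs = {source: 1.0}
    for label in path:
        next_probs = defaultdict(float)
        for node, p in probs.items():
            neighbors = graph.get(node, {}).get(label, [])
            if not neighbors:
                continue  # the walk dies here for this label
            share = p / len(neighbors)
            for nb in neighbors:
                next_probs[nb] += share
        probs = next_probs
    return probs.get(target, 0.0)

# Tiny hypothetical example: does "plays-for -> based-in" connect a person to a city?
graph = {
    "alice": {"plays-for": ["team_x"]},
    "team_x": {"based-in": ["springfield"]},
}
score = path_feature(graph, "alice", "springfield", ["plays-for", "based-in"])
```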
Medusa: Simplified Graph Processing on GPUs
2013
"... Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to ..."
Abstract
-
Cited by 13 (4 self)
- Add to MetaCart
Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to write correct and efficient GPU programs and even more difficult for graph processing due to the irregularities of graph structures. To simplify graph processing on GPUs, we propose a programming framework called Medusa which enables developers to leverage the capabilities of GPUs by writing sequential C/C++ code. Medusa offers a small set of user-defined APIs, and embraces a runtime system to automatically execute those APIs in parallel on the GPU. We develop a series of graph-centric optimizations based on the architecture features of GPUs for efficiency. Additionally, Medusa is extended to execute on multiple GPUs within a machine. Our experiments show that (1) Medusa greatly simplifies implementation of GPGPU programs for graph processing, with many fewer lines of source code written by developers; (2) The optimization techniques significantly improve the performance of the runtime system, making its performance comparable with or better than manually tuned GPU graph operations.
Restreaming graph partitioning: Simple versatile algorithms for advanced balancing
In ACM KDD, 2013
"... Partitioning large graphs is difficult, especially when performed in the limited models of computation afforded to modern large scale computing systems. In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to stream-ing partitioning algorithms yet empi ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
Partitioning large graphs is difficult, especially when performed in the limited models of computation afforded to modern large scale computing systems. In this work we introduce restreaming graph partitioning and develop algorithms that scale similarly to streaming partitioning algorithms yet empirically perform as well as fully offline algorithms. In streaming partitioning, graphs are partitioned serially in a single pass. Restreaming partitioning is motivated by scenarios where approximately the same dataset is routinely streamed, making it possible to transform streaming partitioning algorithms into an iterative procedure. This combination of simplicity and powerful performance allows restreaming algorithms to be easily adapted to efficiently tackle more challenging partitioning objectives. In particular, we consider the problem of stratified graph partitioning, where each of many node attribute strata is balanced simultaneously. As such, stratified partitioning is well suited for the study of network effects on social networks, where it is desirable to isolate disjoint dense subgraphs with representative user demographics. To demonstrate, we partition a large social network such that each partition exhibits the same degree distribution as the original graph, a novel achievement for non-regular graphs. As part of our results, we also observe a fundamental difference in the ease with which social graphs are partitioned when compared to web graphs. Namely, the modular structure of web graphs appears to motivate full offline optimization, whereas the locally dense structure of social graphs precludes significant gains from global manipulations.
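A standard streaming heuristic such as linear deterministic greedy places each arriving vertex in the partition that holds most of its already-placed neighbors, discounted by how full that partition is; restreaming simply repeats the pass, so later rounds can consult the previous round's placement of every vertex. The sketch below shows that loop in its simplest form, balancing only vertex counts; the stratified, attribute-balancing variant described in the paper is more elaborate.

```python
def restream_partition(vertex_stream, adj, k, passes=5):
    """Restreamed linear-deterministic-greedy partitioning (our simplification).
    vertex_stream: vertices in stream order; adj[v]: neighbors of v;
    k: number of partitions. Returns {vertex: partition index}."""
    n = len(vertex_stream)
    capacity = n / k
    assignment = {}  # empty on the first pass, reused on later passes
    for _ in range(passes):
        sizes = [0] * k
        new_assignment = {}

        def placed(u):
            # Placement decided earlier in this pass, else the previous pass.
            return new_assignment.get(u, assignment.get(u))

        for v in vertex_stream:
            neighbor_counts = [0] * k
            for u in adj.get(v, []):
                p = placed(u)
                if p is not None:
                    neighbor_counts[p] += 1
            # Greedy score: placed neighbors in the partition, discounted by
            # how full that partition already is.
            scores = [neighbor_counts[i] * (1 - sizes[i] / capacity) for i in range(k)]
            best = max(range(k), key=lambda i: scores[i])
            new_assignment[v] = best
            sizes[best] += 1
        assignment = new_assignment
    return assignment
```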