Results

**21–30** of **30**

### Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstractions

Iterative computation on large graphs has challenged systems research from two aspects: (1) how to conduct high-performance parallel processing for both in-memory and out-of-core graphs; and (2) how to handle large graphs that exceed the resource boundary of traditional systems through resource-aware graph partitioning, so that it is feasible to run large-scale graph analysis on a single PC. This paper presents GraphLego, a resource-adaptive graph processing system with multi-level programmable graph parallel abstractions. GraphLego is novel in three aspects: (1) we argue that vertex-centric or edge-centric graph partitioning is ineffective for parallel processing of large graphs, and we introduce three alternative graph parallel abstractions that enable a large graph to be partitioned at the granularity of subgraphs by slice-, strip- and dice-based partitioning; (2) we use a dice-based data placement algorithm to store a large graph on disk, minimizing non-sequential disk access and enabling more structured in-memory access; and (3) we dynamically determine the right level of graph parallel abstraction to maximize sequential access and minimize random access. GraphLego runs efficiently on computers with diverse resource capacities and responds to the different memory requirements of real-world graphs of varying complexity. Extensive experiments show the competitiveness of GraphLego against representative existing graph processing systems such as GraphChi, GraphLab and X-Stream.
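The dice idea in the abstract can be illustrated with a minimal sketch (the names `dice_of` and `partition_by_dice` are illustrative, not GraphLego's API): a "dice" groups the edges whose source and destination vertex IDs both fall into fixed ranges, so each dice is a small subgraph that can be loaded and processed on its own.

```python
def dice_of(src, dst, num_ranges, num_vertices):
    """Map an edge to a (source-range, destination-range) dice ID."""
    width = (num_vertices + num_ranges - 1) // num_ranges  # ceiling division
    return (src // width, dst // width)

def partition_by_dice(edges, num_ranges, num_vertices):
    """Group edges into dices; each dice fits in memory independently."""
    dices = {}
    for src, dst in edges:
        key = dice_of(src, dst, num_ranges, num_vertices)
        dices.setdefault(key, []).append((src, dst))
    return dices

edges = [(0, 1), (0, 5), (4, 7), (6, 2), (7, 7)]
dices = partition_by_dice(edges, num_ranges=2, num_vertices=8)
# With 8 vertices and 2 ranges, each range spans 4 IDs: edge (0, 1) lands
# in dice (0, 0), edge (0, 5) in dice (0, 1), edges (4, 7) and (7, 7) in (1, 1).
```

This captures only the partitioning granularity, not GraphLego's disk layout or its dynamic choice among slice, strip and dice levels.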

### Scaling Iterative Graph Computations with GraphMap

In recent years, systems researchers have devoted considerable effort to the study of large-scale graph processing. Existing distributed graph processing systems such as Pregel, based solely on distributed memory for their computations, fail to provide seamless scalability when the graph data and their intermediate computational results no longer fit into memory, and most distributed approaches to iterative graph computation do not consider secondary storage a viable resource. This paper presents GraphMap, a distributed iterative graph computation framework that maximizes access locality and speeds up distributed iterative graph computations by effectively utilizing secondary storage. GraphMap has three salient features: (1) it distinguishes data states that are mutable during iterative computation from those that are read-only in all iterations, to maximize sequential access and minimize random access; (2) it employs a two-level graph partitioning algorithm that enables balanced workloads and locality-optimized data placement; and (3) it provides a suite of locality-based optimizations that improve computational efficiency. Extensive experiments on several real-world graphs show that GraphMap outperforms existing distributed memory-based systems for various iterative graph algorithms.
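The mutable/read-only split described in feature (1) can be sketched as follows (an assumed illustration, not GraphMap's code): the topology is held in one read-only structure while the small per-iteration vertex values live in another, so each iteration rewrites only the mutable part and the topology can be scanned sequentially.

```python
class IterativeState:
    """Separate read-only topology from mutable per-iteration values."""

    def __init__(self, adjacency):
        self.adjacency = adjacency                   # read-only across all iterations
        self.values = {v: 1.0 for v in adjacency}    # mutable, rewritten each iteration

    def iterate(self):
        """One synchronous step: each vertex averages its neighbors' values."""
        new_values = {}
        for v, neighbors in self.adjacency.items():
            if neighbors:
                new_values[v] = sum(self.values[u] for u in neighbors) / len(neighbors)
            else:
                new_values[v] = self.values[v]
        self.values = new_values                     # only the mutable state changes

state = IterativeState({0: [1], 1: [0, 2], 2: [1]})
state.iterate()
```

On disk, the same split lets the read-only part be laid out once for sequential streaming while only the compact mutable part is written back per iteration.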

### GraphX: Graph Processing in a Distributed Dataflow Framework (USENIX OSDI)

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
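The claim that a graph step reduces to join, map, and group-by can be made concrete with a toy PageRank-style update over two plain tables (this is an assumed rendering in Python dictionaries, not GraphX's Spark API): join each edge with its source vertex's rank, map it to a contribution, then group by destination and sum.

```python
from collections import defaultdict

vertices = {1: 1.0, 2: 1.0, 3: 1.0}     # vertex table: id -> rank
edges = [(1, 2), (1, 3), (2, 3)]        # edge table: (src, dst)

# "join": attach each source vertex's out-degree and rank to its edges
out_degree = defaultdict(int)
for src, _ in edges:
    out_degree[src] += 1

# "map": turn each joined edge into a (dst, contribution) message
messages = [(dst, vertices[src] / out_degree[src]) for src, dst in edges]

# "group-by" destination and sum, then apply the PageRank update rule
incoming = defaultdict(float)
for dst, contribution in messages:
    incoming[dst] += contribution
ranks = {v: 0.15 + 0.85 * incoming[v] for v in vertices}
```

In a real dataflow engine these three steps are relational operators over distributed tables, which is exactly what lets join optimizations and materialized views stand in for graph-specific tricks.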

### M-Flash: Fast Billion-scale Graph Computation Using Block Partition Model

Recent graph computation approaches such as GraphChi, X-Stream, TurboGraph and MMap have demonstrated that a single PC can perform efficient computation on billion-scale graphs. While they use different techniques to achieve scalability by optimizing I/O operations, such optimization often does not fully exploit the capabilities of modern hard drives. We contribute: (1) a novel and scalable graph computation framework called M-Flash that uses a block partition model to boost computation speed and reduce disk accesses, by logically dividing a graph and its node data into blocks that fully fit in RAM for reuse; (2) a flexible and deliberately simple programming model, as part of M-Flash, that enables us to implement popular and essential graph algorithms, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speed-up over state-of-the-art approaches.
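One payoff of a block partition model is the traversal order: if vertex data is cut into intervals that fit in RAM and edges form a grid of blocks, then block (i, j) needs only source interval i and destination interval j resident. A hedged sketch (assumed, not M-Flash's code) of why visiting blocks column by column reuses the in-RAM destination interval:

```python
def plan_block_order(beta):
    """Visit a beta x beta block grid column by column, so each
    destination interval is loaded into RAM once and reused for
    all beta source blocks in that column."""
    return [(i, j) for j in range(beta) for i in range(beta)]

def count_destination_loads(order):
    """Count how often the resident destination interval must be swapped."""
    loads, resident = 0, None
    for _, j in order:
        if j != resident:
            loads += 1
            resident = j
    return loads

column_major = plan_block_order(3)
row_major = [(i, j) for i in range(3) for j in range(3)]
# Column-major touches each of the 3 destination intervals once (3 loads);
# row-major reloads a destination interval at every block (9 loads).
```

The real system layers disk layout and streaming on top of this, but the reuse argument is the same.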

### GRE: A Graph Runtime Engine for Large-Scale Distributed Graph-Parallel Applications

Large-scale distributed graph-parallel computing is challenging. On one hand, due to the irregular computation pattern and lack of locality, it is hard to express parallelism efficiently. On the other hand, due to their scale-free nature, real-world graphs are hard to partition with balanced load and low cut. To address these challenges, several graph-parallel frameworks, including Pregel and GraphLab (PowerGraph), have been developed recently. In this paper, we present an alternative framework, the Graph Runtime Engine (GRE). While retaining the vertex-centric programming model, GRE proposes two new abstractions: 1) a Scatter-Combine computation model based on active messages that exploits massive fine-grained edge-level parallelism, and 2) an Agent-Graph data model based on vertex factorization that partitions and represents directed graphs. GRE is implemented on a commercial off-the-shelf multi-core cluster. We experimentally evaluate GRE with three benchmark programs (PageRank, Single-Source Shortest Path and Connected Components) on real-world and synthetic graphs of millions to a billion vertices. Compared to PowerGraph, GRE shows 2.5 to 17 times better performance on 8 to 16 machines (192 cores). Specifically, PageRank in GRE is the fastest when compared to its counterparts in other frameworks (PowerGraph, Spark, Twister) reported in the public literature. Moreover, GRE significantly optimizes memory usage, so that it can process a large graph of 1 billion vertices and 17 billion edges on our cluster with 768 GB of memory in total, while PowerGraph can process only a graph less than half this size.
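The Scatter-Combine idea can be sketched in a few lines (names are illustrative, not GRE's API): scatter emits an active message per edge, and the message's combine function is applied at the destination's accumulator immediately, instead of being queued for a separate gather phase.

```python
def scatter_combine(edges, values, combine, accum_init):
    """One scatter step: each edge carries an active message whose
    combine function updates the destination accumulator in place."""
    accum = {v: accum_init for v in values}
    for src, dst in edges:
        accum[dst] = combine(accum[dst], values[src])
    return accum

# One shortest-path relaxation step as the combine: min of (src distance + 1).
INF = float("inf")
values = {0: 0, 1: INF, 2: INF}
dists = scatter_combine([(0, 1), (1, 2)], values,
                        combine=lambda acc, d: min(acc, d + 1),
                        accum_init=INF)
# After one step only vertex 1 (reachable in one hop) gets a finite distance;
# a full SSSP would repeat this until no accumulator changes.
```

Because each message carries its own reduction, edges can be processed independently, which is the fine-grained edge-level parallelism the abstract refers to.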

### Shared-memory parallelism can be simple, ...

2015


Parallelism is the key to achieving high performance in computing. However, writing efficient and scalable parallel programs is notoriously difficult, and often requires significant expertise. To address this challenge, it is crucial to provide programmers with high-level tools that enable them to develop solutions efficiently, and at the same time emphasize the theoretical and practical aspects of algorithm design to allow the solutions developed to run efficiently under all possible settings. This thesis addresses this challenge using a three-pronged approach consisting of the design of shared-memory programming techniques, frameworks, and algorithms for important problems in computing. The thesis provides evidence that with appropriate programming techniques, frameworks, and algorithms, shared-memory programs can be simple, fast, and scalable, both in theory and in practice. The results developed in this thesis serve to ease the transition into the multicore era. The first part of this thesis introduces tools and techniques for deterministic ...

### Chaos: Scale-out Graph Processing from Secondary Storage

Chaos scales graph processing from secondary storage to multiple machines in a cluster. Earlier systems that process graphs from secondary storage are restricted to a single machine, and are therefore limited by the bandwidth and capacity of the storage system on that machine. Chaos is limited only by the aggregate bandwidth and capacity of all storage devices in the entire cluster. Chaos builds on the streaming partitions introduced by X-Stream to achieve sequential access to storage, but parallelizes the execution of streaming partitions. Chaos is novel in three ways. First, Chaos partitions for sequential storage access rather than for locality and load balance, resulting in much lower pre-processing times. Second, Chaos distributes graph data uniformly at random across the cluster and does not attempt to achieve locality, based on the observation that in a small cluster network bandwidth far outstrips storage bandwidth. Third, Chaos uses work stealing to allow multiple machines to work on a single partition, thereby achieving load balance at runtime. In terms of performance scaling, on 32 machines Chaos takes on average only 1.61 times longer to process a graph 32 times larger than on a single machine. In terms of capacity scaling, Chaos is capable of handling a graph with 1 trillion edges representing 16 TB of input data, a new milestone for graph processing capacity on a small commodity cluster. (This work was done when Amitabha Roy was at EPFL.)
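The runtime load-balancing idea can be illustrated with a toy sketch (assumed, not Chaos code): workers claim chunks of a shared streaming partition from an atomic counter, so a fast machine naturally takes over work a slow one has not yet claimed.

```python
import threading

class SharedPartition:
    """Chunks of one streaming partition, claimable by any worker."""

    def __init__(self, num_chunks):
        self.next_chunk = 0
        self.num_chunks = num_chunks
        self.lock = threading.Lock()

    def claim(self):
        """Atomically claim the next unprocessed chunk, or None when done."""
        with self.lock:
            if self.next_chunk >= self.num_chunks:
                return None
            chunk = self.next_chunk
            self.next_chunk += 1
            return chunk

part = SharedPartition(4)
claimed = [part.claim() for _ in range(5)]
# Five claims on four chunks: each chunk is handed out exactly once, then None.
```

In the distributed setting the "lock" becomes cluster-level coordination, but the invariant is the same: no chunk is processed twice, and idle machines always find remaining work.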

### FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs (USENIX FAST '15)

Graph analysis performs many random reads and writes; thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine, called FlashGraph, stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph only ac...
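The semi-external-memory split described above can be sketched minimally (an illustration, not FlashGraph's engine): vertex state lives in a RAM dictionary, while edge lists sit in "external" storage and are fetched only when a vertex is processed, mimicking on-demand SSD reads.

```python
class SemiExternalGraph:
    """Vertex state resident in RAM; edge lists fetched from external storage."""

    def __init__(self, edge_store):
        self.edge_store = edge_store                     # stands in for edge lists on SSD
        self.vertex_state = {v: 0 for v in edge_store}   # always resident in RAM
        self.io_requests = 0

    def fetch_edges(self, v):
        """Each fetch models one SSD read of a vertex's edge list."""
        self.io_requests += 1
        return self.edge_store[v]

    def count_out_degrees(self):
        """A trivial 'algorithm': update in-RAM state from fetched edge lists."""
        for v in self.vertex_state:
            self.vertex_state[v] = len(self.fetch_edges(v))

g = SemiExternalGraph({0: [1, 2], 1: [2], 2: []})
g.count_out_degrees()
```

Keeping only vertex state in RAM is what bounds memory to the (small) vertex set while the (large) edge set stays on flash; the engine's real work is hiding the latency of those fetches.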

### Exploiting NVM in Large-scale Graph Analytics

Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, how-ever, is not able to match the increasing demands for ca-pacity. Emerging byte-addressable, non-volatile memory technologies (NVM) offer a more scalable alternative, with memory that is directly addressable to software, but at a higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitabil-ity of NVM in meeting the memory demands of four state of the art graph analytics frameworks, namely Graphlab, Galois, X-Stream and Graphmat. We evaluate their perfor-mance with popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative filtering) by allocating mem-ory exclusive from DRAM (DRAM-only) or emulated NVM