Results 1–10 of 30
GraphChi: Large-Scale Graph Computation on Just a PC
 In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI ’12
, 2012
Abstract

Cited by 109 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
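The parallel sliding windows method from the abstract can be sketched in miniature: edges are sharded by destination-vertex interval, each shard sorted by source, so that the out-edges of any interval form a contiguous "window" inside every shard. The following toy in-memory version is illustrative only; real GraphChi streams the shards from disk.

```python
# Toy sketch of GraphChi-style parallel sliding windows (PSW), in memory.
from bisect import bisect_left, bisect_right

def build_shards(edges, intervals):
    """Shard edges by destination interval; each shard is sorted by source."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for i, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[i].append((src, dst))
                break
    for shard in shards:
        shard.sort()  # sorted by source vertex
    return shards

def psw_subgraphs(edges, intervals):
    """Per interval, yield its in-edges (the interval's own shard) and its
    out-edges (a sliding window of every shard: edges whose source falls
    in the interval, which are contiguous because shards are source-sorted)."""
    shards = build_shards(edges, intervals)
    for i, (lo, hi) in enumerate(intervals):
        in_edges = shards[i]
        out_edges = []
        for shard in shards:
            srcs = [e[0] for e in shard]
            a, b = bisect_left(srcs, lo), bisect_right(srcs, hi)
            out_edges.extend(shard[a:b])  # the "window" for this interval
        yield in_edges, out_edges
```

Because each window advances monotonically as intervals are processed in order, a disk-based implementation only needs sequential reads per shard.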
GraphX: Graph Processing in a Distributed Dataflow Framework
 In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’14)
, 2014
Abstract

Cited by 16 (0 self)
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
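The claim that a graph abstraction reduces to a few dataflow operators can be illustrated with one message-passing round written purely as join / map / group-by over plain Python collections. This is a schematic analogy, not the GraphX API.

```python
# One message-passing round expressed with only join, map, and group-by,
# mirroring the dataflow decomposition the abstract describes.
from collections import defaultdict

def one_round(vertices, edges):
    """vertices: {id: value}; edges: [(src, dst)]. Returns summed messages."""
    # join: attach each source vertex's value to its edges (the "triplets" view)
    triplets = [(src, dst, vertices[src]) for src, dst in edges]
    # map: turn each triplet into a message addressed to the destination
    messages = [(dst, val) for _, dst, val in triplets]
    # group-by + reduce: sum the messages arriving at each vertex
    summed = defaultdict(float)
    for dst, val in messages:
        summed[dst] += val
    # join back with the vertex table (vertices with no messages get 0.0)
    return {v: summed.get(v, 0.0) for v in vertices}
```

Iterating this round with a normalization step would give a PageRank-style computation; in GraphX the same shape is executed as distributed joins over Spark RDDs.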
Fast Iterative Graph Computation: A Path-Centric Approach
 In SC
, 2014
Abstract

Cited by 4 (1 self)
Large-scale graph processing represents an interesting systems challenge due to the lack of locality. This paper presents PathGraph, a system for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features. First, we model a large graph using a collection of tree-based partitions and use path-centric computation rather than vertex-centric or edge-centric computation. Our path-centric graph-parallel computation model significantly improves the memory and disk locality of iterative computation algorithms on large graphs. Second, we design a compact storage format that is optimized for iterative graph-parallel computation. Concretely, we use delta-compression, partition a large graph into tree-based partitions, and store trees in DFS order. By clustering highly correlated paths together, we further maximize sequential access and minimize random access on storage media. Third, we implement the path-centric computation model using a scatter/gather programming model, which parallelizes the iterative computation at the partition-tree level and performs sequential local updates for vertices in each tree partition to improve the convergence speed. We compare PathGraph to recent alternative graph processing systems such as GraphChi and X-Stream, and show that the path-centric approach outperforms vertex-centric and edge-centric systems on a number of graph algorithms for both in-memory and out-of-core graphs.
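The storage idea behind the path-centric model, laying out each tree partition's vertices in DFS order so that vertices along a root-to-leaf path keep their traversal order on disk, can be sketched as follows. The tree representation here is an illustrative stand-in for PathGraph's actual format.

```python
# Flatten a tree partition (parent -> children mapping) in DFS order, the
# layout the abstract describes for clustering correlated paths together.
def dfs_order(tree, root):
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        # push children reversed so the leftmost child is visited first
        stack.extend(reversed(tree.get(v, [])))
    return order
```

In this layout every root-to-leaf path appears in increasing storage position, so an iterative pass along a path is a forward scan rather than a series of random seeks.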
PrefEdge: SSD Prefetcher for Large-Scale Graph Traversal
 In Proceedings of the 7th International Systems and Storage Conference, SYSTOR ’14
, 2014
Abstract

Cited by 3 (0 self)
Mining large graphs has now become an important aspect of many diverse applications, and a number of computer systems have been proposed to provide runtime support. Recent interest in this area has led to the construction of single-machine graph computation systems that use solid state drives (SSDs) to store the graph. This approach reduces cost and simplifies the implementation of graph algorithms, making computations on large graphs available to the average user. However, SSDs are slower than main memory, and making full use of their bandwidth is crucial for executing graph algorithms in a reasonable amount of time. In this paper, we present PrefEdge, a prefetcher for graph algorithms that parallelises requests to derive maximum throughput from SSDs. PrefEdge combines a judicious distribution of graph state between main memory and SSDs with an innovative read-ahead algorithm to prefetch needed data in parallel. This is in contrast to existing approaches that depend on multithreading the graph algorithms to saturate available bandwidth. Our experiments on graph algorithms using random access show that PrefEdge not only is capable of maximising the throughput from SSDs but is also able to almost hide the effect of I/O latency. The improvements in runtime for graph algorithms are up to 14x when compared to a single-threaded baseline. When compared to multithreaded implementations, PrefEdge performs up to 80% faster without the program complexity and the programmer effort needed for multithreaded graph algorithms.
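The read-ahead idea can be sketched as a single-threaded traversal that exposes a lookahead window of the vertices it will visit, while a pool of workers fetches their adjacency lists in parallel. The `ssd_read` callback is a hypothetical stand-in for a real SSD read, not PrefEdge's interface.

```python
# Sketch of PrefEdge-style read-ahead: parallel fetches driven by the
# traversal's lookahead window, so the algorithm itself stays single-threaded.
from concurrent.futures import ThreadPoolExecutor

def prefetching_reader(ssd_read, upcoming, lookahead=8, workers=4):
    """Yield (vertex, adjacency list) in traversal order, prefetching ahead."""
    cache, pending = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, v in enumerate(upcoming):
            # issue parallel reads for the next `lookahead` vertices
            for u in upcoming[i:i + lookahead]:
                if u not in cache and u not in pending:
                    pending[u] = pool.submit(ssd_read, u)
            if v in pending:
                cache[v] = pending.pop(v).result()
            yield v, cache[v]
```

By the time the traversal reaches vertex `v`, its read has usually already completed on a worker thread, which is how the prefetcher hides I/O latency without multithreading the algorithm.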
FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
Abstract

Cited by 3 (0 self)
Graph analysis performs many random reads and writes, so these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so that the aggregate memory exceeds the size of the graph. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs without much performance loss. We do so by implementing a graph-processing engine within a user-space SSD file system designed for high IOPS and extreme parallelism. This allows us to localize computation to cached data in a non-uniform memory architecture and to hide latency by overlapping computation with I/O. Our semi-external-memory graph engine, called FlashGraph, stores vertex state in memory and adjacency lists on SSDs. FlashGraph exposes a general and flexible programming interface that can express a variety of graph algorithms and their optimizations. FlashGraph in semi-external memory performs many algorithms up to 20 times faster than PowerGraph, a general-purpose, in-memory graph engine. Even breadth-first search, which generates many small random I/Os, runs significantly faster in FlashGraph.
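The semi-external-memory split, vertex state in memory, adjacency lists read from storage on demand, can be sketched with a BFS whose only in-memory structure is the per-vertex level. The file format and offset index below are illustrative, not FlashGraph's actual layout.

```python
# Sketch of semi-external-memory BFS: vertex state (levels) stays in RAM,
# adjacency lists live in a file and are read on demand via an offset index.
import json, os, tempfile
from collections import deque

def write_graph(adj):
    """Write adjacency lists to a file; return (path, per-vertex offset index)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    index = {}
    with open(path, "w") as f:
        for v, neighbors in adj.items():
            index[v] = f.tell()
            f.write(json.dumps(neighbors) + "\n")
    return path, index

def bfs_semi_external(path, index, source):
    levels = {source: 0}          # the only in-memory vertex state
    queue = deque([source])
    with open(path) as f:
        while queue:
            v = queue.popleft()
            f.seek(index[v])      # on-demand adjacency-list read
            for u in json.loads(f.readline()):
                if u not in levels:
                    levels[u] = levels[v] + 1
                    queue.append(u)
    return levels
```

Memory use scales with the number of vertices rather than edges, which is why a single server can handle graphs with hundreds of billions of edges.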
MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping
 In Proceedings of the IEEE International Conference on Big Data
, 2014
Abstract

Cited by 3 (1 self)
Graph computation approaches such as GraphChi and TurboGraph have recently demonstrated that a single PC can perform efficient computation on billion-node graphs. To achieve high speed and scalability, they often need sophisticated data structures and memory management strategies. We propose a minimalist approach that forgoes such requirements by leveraging the fundamental memory-mapping (MMap) capability found in operating systems. We contribute: (1) a new insight that MMap is a viable technique for creating fast and scalable graph algorithms that surpasses some of the best existing techniques; (2) the design and implementation of popular graph algorithms for billion-scale graphs with little code, thanks to memory mapping; (3) extensive experiments on real graphs, including the 6.6-billion-edge YahooWeb graph, showing that this new approach is significantly faster than or comparable to highly optimized methods (e.g., 9.5x faster than GraphChi for computing PageRank on the 1.47-billion-edge Twitter graph). We believe our work provides a new direction in the design and development of scalable algorithms. Our packaged code is available at
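The minimalist approach is concrete enough to sketch directly: store the edge list as a flat binary file, memory-map it, and let the OS page data in as the algorithm touches it. The record layout (two little-endian uint32s per edge) is an assumption for illustration.

```python
# Sketch of the MMap idea: iterate over a memory-mapped binary edge list,
# with the OS handling paging instead of a custom buffer manager.
import mmap, os, struct, tempfile
from collections import Counter

EDGE = struct.Struct("<II")  # assumed layout: (src, dst) as uint32 pairs

def write_edges(edges):
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        for s, d in edges:
            f.write(EDGE.pack(s, d))
    return path

def out_degrees(path):
    """Compute out-degrees by scanning the memory-mapped edge file."""
    deg = Counter()
    with open(path, "rb") as f, \
         mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for off in range(0, len(m), EDGE.size):
            src, _ = EDGE.unpack_from(m, off)
            deg[src] += 1
    return deg
```

The same pattern extends to PageRank-style iterations: each pass is a sequential scan of the mapped file, and frequently touched pages stay resident in the OS page cache with no explicit memory management in the algorithm.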
NUMA-Aware Graph-Structured Analytics
 In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)
, 2015
Abstract

Cited by 3 (0 self)
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search, and recommendation systems. Though much prior research focuses on scaling graph analytics in distributed environments, the strong desire for performance per core, dollar, and joule has generated considerable interest in processing large-scale graphs on a single server-class machine, which may have several terabytes of RAM and 80 or more cores. However, prior graph-analytics systems are largely oblivious to NUMA characteristics and thus have suboptimal performance. This paper presents a detailed study of NUMA characteristics and their impact on the efficiency of graph analytics. Our study uncovers two insights: 1) either random or interleaved allocation of graph data will significantly hamper data locality and parallelism; 2) sequential inter-node (i.e., remote) memory accesses have much higher bandwidth than both intra- and inter-node random ones. Based on these insights, this paper describes Polymer, a NUMA-aware graph-analytics system for multicore machines, with two key design decisions. First, Polymer differentially allocates and places topology data, application-defined data, and mutable runtime states of a graph system according to their access patterns, to minimize remote accesses. Second, for the remaining random accesses, Polymer carefully converts random remote accesses into sequential remote accesses by using lightweight replication of vertices across NUMA nodes. To improve load balance and vertex convergence, Polymer is further built with a hierarchical barrier to boost parallelism and locality, an edge-oriented balanced partitioning for skewed graphs, and adaptive data structures chosen according to the proportion of active vertices. A detailed evaluation on an 80-core machine shows that Polymer often outperforms state-of-the-art single-machine graph-analytics systems, including Ligra, X-Stream, and Galois, on a set of popular real-world and synthetic graphs.
GraphChi-DB: Simple Design for a Scalable Graph Database System on Just a PC
 CoRR
Abstract

Cited by 2 (0 self)
We propose a new data structure, Parallel Adjacency Lists (PAL), for efficiently managing graphs with billions of edges on disk. The PAL structure is based on the graph storage model of GraphChi [14], but we extend it to enable online database features such as queries and fast insertions. In addition, we extend the model with edge and vertex attributes. Compared to previous data structures, PAL can store graphs more compactly while allowing fast access to both the incoming and the outgoing edges of a vertex, without duplicating data. Based on PAL, we design a graph database management system, GraphChi-DB, which can also execute powerful analytical graph computation. We evaluate our design experimentally and demonstrate that GraphChi-DB achieves state-of-the-art performance on graphs that are much larger than the available memory. GraphChi-DB enables anyone with just a laptop or a PC to work with extremely large graphs.
GraphQ: Graph Query Processing with Abstraction Refinement – Scalable and Programmable Analytics over Very Large Graphs on a Single PC
Abstract

Cited by 2 (0 self)
This paper introduces GraphQ, a scalable querying framework for very large graphs. GraphQ is built on the key insight that many interesting graph properties, such as finding cliques of a certain size or finding vertices with a certain PageRank, can be effectively computed by exploring only a small fraction of the graph, so traversing the complete graph is overkill. The centerpiece of our framework is the novel idea of abstraction refinement, where the very large graph is represented at multiple levels of abstraction, and a query is processed through iterative refinement across graph abstraction levels. As a result, GraphQ enjoys several distinctive traits unseen in existing graph processing systems: query processing is naturally budget-aware, friendly to out-of-core processing when "Big Graphs" cannot entirely fit into memory, and endowed with strong correctness properties on query answers. With GraphQ, a wide range of complex analytical queries over very large graphs can be answered with resources affordable to a single PC, in line with the recent trend advocating single-machine-based Big Data processing. Experiments show GraphQ can answer queries in graphs 4–6 times bigger than the memory capacity in only several seconds to minutes. In contrast, GraphChi, a state-of-the-art graph processing system, takes hours to days to compute a whole-graph solution. An additional comparison with a modified version of GraphChi that terminates immediately when a query is answered shows that GraphQ is on average 1.6–13.4x faster due to its ability to process partial graphs.
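Abstraction refinement can be sketched with a toy query: "find a vertex with degree at least k." Each partition keeps a coarse summary (an upper bound on vertex degree); the query refines only partitions whose summary could contain an answer. The partitioning and summary scheme are illustrative, not GraphQ's.

```python
# Sketch of abstraction refinement: prune partitions at the abstract level,
# refine (inspect concrete vertices) only where the summary permits an answer.
def find_high_degree(adj, partitions, k):
    """Return (vertex with degree >= k or None, number of partitions refined)."""
    # abstraction level: a per-partition upper bound on vertex degree
    summaries = [max((len(adj[v]) for v in part), default=0)
                 for part in partitions]
    refined = 0
    for part, bound in zip(partitions, summaries):
        if bound < k:
            continue              # pruned without touching concrete vertices
        refined += 1              # refinement step: drop to the concrete graph
        for v in part:
            if len(adj[v]) >= k:
                return v, refined
    return None, refined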
Exploiting iterativeness for parallel ML computations
"... Many largescale machine learning (ML) applications use iterative algorithms to converge on parameter values that make the chosen model fit the input data. Often, this approach results in the same sequence of accesses to parameters repeating each iteration. This paper shows that these repeating pa ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
(Show Context)
Many largescale machine learning (ML) applications use iterative algorithms to converge on parameter values that make the chosen model fit the input data. Often, this approach results in the same sequence of accesses to parameters repeating each iteration. This paper shows that these repeating patterns can and should be exploited to improve the efficiency of the parallel and distributed ML applications that will be a mainstay in cloud computing environments. Focusing on the increasingly popular “parameter server ” approach to sharing model parameters among worker threads, we describe and demonstrate how the repeating patterns can be exploited. Examples include replacing dynamic cache and server structures with static preserialized structures, informing prefetch and partitioning decisions, and determining which data should be cached at each thread to avoid both contention and slow accesses to memory banks attached to other sockets. Experiments show that such exploitation reduces periteration time by 33–98%, for three real ML workloads, and that these improvements are robust to variation in the patterns over time. 1.