Results 1 – 8 of 8
GraphChi: Large-Scale Graph Computation on Just a PC
 In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI '12), 2012
Abstract

Cited by 115 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
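The parallel sliding windows idea summarized in this abstract can be illustrated with a toy sketch. Everything here (the shard layout, function names, interval representation) is an illustrative simplification for exposition, not GraphChi's actual on-disk format or API: the vertex ID space is split into intervals, each shard holds the edges whose destination falls in one interval sorted by source, so processing an interval needs its own shard plus one contiguous window of every other shard.

```python
def make_shards(edges, intervals):
    """Partition (src, dst) edges into shards by destination interval,
    each shard kept sorted by source vertex."""
    shards = [[] for _ in intervals]
    for src, dst in edges:
        for p, (lo, hi) in enumerate(intervals):
            if lo <= dst <= hi:
                shards[p].append((src, dst))
                break
    return [sorted(s) for s in shards]

def edges_for_interval(shards, intervals, p):
    """All edges touching interval p: shard p in full (the in-edges),
    plus a contiguous 'window' of every other shard (the out-edges),
    readable sequentially because shards are sorted by source."""
    lo, hi = intervals[p]
    in_edges = shards[p]
    out_edges = [(s, d) for q, sh in enumerate(shards) if q != p
                 for s, d in sh if lo <= s <= hi]
    return in_edges, out_edges
```

Because each shard is sorted by source, the out-edge "window" for consecutive intervals slides forward through each shard file, which is what keeps disk access largely sequential.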
Fast Iterative Graph Computation: A Path-Centric Approach
 In SC, 2014
Abstract

Cited by 4 (1 self)
Large-scale graph processing represents an interesting systems challenge due to the lack of locality. This paper presents PathGraph, a system for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features: First, we model a large graph using a collection of tree-based partitions and use path-centric computation rather than vertex-centric or edge-centric computation. Our path-centric graph-parallel computation model significantly improves the memory and disk locality for iterative computation algorithms on large graphs. Second, we design a compact storage that is optimized for iterative graph-parallel computation. Concretely, we use delta-compression, partition a large graph into tree-based partitions and store trees in a DFS order. By clustering highly correlated paths together, we further maximize sequential access and minimize random access on storage media. Third but not least, we implement the path-centric computation model by using a scatter/gather programming model, which parallelizes the iterative computation at the partition-tree level and performs sequential local updates for vertices in each tree partition to improve the convergence speed. We compare PathGraph to the most recent alternative graph processing systems such as GraphChi and X-Stream, and show that the path-centric approach outperforms vertex-centric and edge-centric systems on a number of graph algorithms for both in-memory and out-of-core graphs.
Iterative Graph Computation in the Big Data Era
, 2015
Abstract
Iterative graph computation is a key component in many real-world applications, as the graph data model naturally captures complex relationships between entities. The big data era has seen the rise of several new challenges to this classic computation model. In this dissertation we describe three projects that address different aspects of these challenges. First, because of the increasing volume of data, it is increasingly important to scale iterative graph computation to large graphs. We observe that an important class of graph applications performing little computation per vertex scales poorly when running on multiple cores. These computationally light applications are limited by memory access rates, and cannot fully utilize the benefits of multiple cores. We propose a new block-oriented computation model which creates two levels of iterative computation. On each processor, a small block of highly connected vertices is iterated locally, while the blocks are updated iteratively at the global level. We show that block-oriented execution reduces the communication-to-computation ratio and significantly improves the performance.
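The two-level iteration described in this abstract can be sketched as follows. The averaging update rule, the function name, and the loop counts are illustrative stand-ins chosen for a minimal example, not the dissertation's actual algorithm: the point is only the structure, in which each block is iterated locally several times before values cross block boundaries at the global level.

```python
def block_iterate(values, blocks, neighbors, inner_iters=5, outer_iters=10):
    """Two-level block-oriented iteration: inner loops update vertices
    within one block repeatedly (good cache locality); the outer loop
    lets updated values propagate between blocks."""
    for _ in range(outer_iters):          # global level: blocks exchange values
        for block in blocks:              # one block per core, conceptually
            for _ in range(inner_iters):  # local level: iterate inside the block
                for v in block:
                    nbrs = neighbors[v]
                    if nbrs:
                        values[v] = sum(values[u] for u in nbrs) / len(nbrs)
    return values
```

Iterating a block locally to (near) convergence before moving on is what reduces the communication-to-computation ratio: cross-block reads happen once per outer pass instead of once per update.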
Fast Iterative Graph Computation with Resource-Aware Graph Parallel Abstractions
Abstract
Iterative computation on large graphs has challenged system research from two aspects: (1) how to conduct high-performance parallel processing for both in-memory and out-of-core graphs; and (2) how to handle large graphs that exceed the resource boundary of traditional systems by resource-aware graph partitioning, such that it is feasible to run large-scale graph analysis on a single PC. This paper presents GraphLego, a resource-adaptive graph processing system with multi-level programmable graph-parallel abstractions. GraphLego is novel in three aspects: (1) we argue that vertex-centric or edge-centric graph partitioning is ineffective for parallel processing of large graphs, and we introduce three alternative graph-parallel abstractions to enable a large graph to be partitioned at the granularity of subgraphs by slice-, strip- and dice-based partitioning; (2) we use a dice-based data placement algorithm to store a large graph on disk by minimizing non-sequential disk access and enabling more structured in-memory access; and (3) we dynamically determine the right level of graph-parallel abstraction to maximize sequential access and minimize random access. GraphLego can run efficiently on different computers with diverse resource capacities and respond to different memory requirements by real-world graphs of different complexity. Extensive experiments show the competitiveness of GraphLego against existing representative graph processing systems, such as GraphChi, GraphLab and X-Stream.
Controlled Transactional Consistency for Web Caching
Abstract
In-memory read-only caches are widely used in cloud infrastructure to reduce access latency and to reduce load on backend databases. Operators view coherent caches as impractical at genuinely large scale, and many client-facing caches are updated in an asynchronous manner with best-effort pipelines. Existing incoherent cache technologies do not support transactional data access, even if the backend database supports transactions. We propose TCache, a cache that supports read-only transactions despite asynchronous and unreliable communication with the database. We also define cache-serializability, a variant of serializability that is suitable for incoherent caches, and prove that with unbounded resources TCache implements it. With limited resources, TCache allows the system manager to choose a trade-off between performance and consistency. Our evaluation shows that TCache detects many inconsistencies with only nominal overhead. We use synthetic workloads to demonstrate the efficacy of TCache when data accesses are clustered and its adaptive reaction to workload changes. With workloads based on real-world topologies, TCache detects 43–70% of the inconsistencies and increases the rate of consistent transactions by 33–58%.
Managed Transactional Consistency for Web Caching
Abstract
In-memory read-only caches are widely used in cloud infrastructure to reduce access latency and to reduce load on backend databases. Operators view coherent caches as impractical at genuinely large scale, and many client-facing caches are updated in an asynchronous manner with best-effort pipelines. Existing solutions that support cache consistency are inapplicable to this scenario, since they require a round trip to the database on every cache transaction. Existing incoherent cache technologies are oblivious to transactional data access, even if the backend database supports transactions. We propose TCache, a transaction-aware cache for read-only transactions. TCache improves cache consistency despite asynchronous and unreliable communication between the cache and the database. We define cache-serializability, a variant of serializability that is suitable for incoherent caches, and prove that with unbounded resources TCache implements it. With limited resources, TCache allows the system manager to choose a trade-off between performance and consistency. Our evaluation shows that TCache detects many inconsistencies with only nominal overhead. We use synthetic workloads to demonstrate the efficacy of TCache when data accesses are clustered and its adaptive reaction to workload changes. With workloads based on real-world topologies, TCache detects 43–70% of the inconsistencies and increases the rate of consistent transactions by 33–58%.
Dynamic Interaction Graphs with Probabilistic Edge Decay
Abstract
A large-scale network of social interactions, such as mentions in Twitter, can often be modeled as a “dynamic interaction graph” in which new interactions (edges) are continually added over time. Existing systems for extracting timely insights from such graphs are based on either a cumulative “snapshot” model or a “sliding window” model. The former model does not sufficiently emphasize recent interactions. The latter model abruptly forgets past interactions, leading to discontinuities in which, e.g., the graph analysis completely ignores historically important influencers who have temporarily gone dormant. We introduce TIDE, a distributed system for analyzing dynamic graphs that employs a new “probabilistic edge decay” (PED) model. In this model, the graph analysis algorithm of interest is applied at each time step to one or more graphs obtained as samples from the current “snapshot” graph that comprises all interactions that have occurred so far. The probability that a given edge of the snapshot graph is included in a sample decays over time according to a user-specified decay function. The PED model allows controlled trade-offs between recency and continuity, and allows existing analysis algorithms for static graphs to be applied to dynamic graphs essentially without change. For the important class of exponential decay functions, we provide efficient methods that leverage past samples to incrementally generate new samples as time advances. We also exploit the large degree of overlap between samples to reduce memory consumption from O(N) to O(log N) when maintaining N sample graphs. Finally, we provide bulk-execution methods for applying graph algorithms to multiple sample graphs simultaneously without requiring any changes to existing graph-processing APIs. Experiments on a real Twitter dataset demonstrate the effectiveness and efficiency of our TIDE prototype, which is built on top of the Spark distributed computing framework.
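For the exponential-decay case mentioned in this abstract, the incremental sample maintenance admits a very small sketch; the function and parameter names below are illustrative, not TIDE's API. Because exponential decay is memoryless, keeping each retained edge with probability `decay` at every step gives an edge of age k an overall inclusion probability of decay**k, as the PED model requires.

```python
import random

def advance_sample(sample, new_edges, decay, rng=random.random):
    """Advance one time step of an exponentially decaying edge sample:
    each previously retained edge survives independently with probability
    `decay`; newly arrived edges join the sample. After k steps an edge
    therefore remains with probability decay**k."""
    kept = {e for e in sample if rng() < decay}
    return kept | set(new_edges)
```

Maintaining several samples this way only requires independent survival draws per sample; no per-edge timestamps or recomputation over the full snapshot graph are needed.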
Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs
Abstract
The rapid growth in the volume of many real-world graphs (e.g., social networks, web graphs, and spatial networks) has led to the development of various vertex-centric distributed graph computing systems in recent years. However, real-world graphs from different domains have very different characteristics, which often create bottlenecks in vertex-centric parallel graph computation. We identify three such important characteristics from a wide spectrum of real-world graphs, namely (1) skewed degree distribution, (2) large diameter, and (3) (relatively) high density. Among them, only (1) has been studied by existing systems, but many real-world power-law graphs also exhibit the characteristics of (2) and (3). In this paper, we propose a block-centric framework, called Blogel, which naturally handles all three adverse graph characteristics. Blogel programmers may think like a block and develop efficient algorithms for various graph problems. We propose parallel algorithms to partition an arbitrary graph into blocks efficiently, and block-centric programs are then run over these blocks. Our experiments on large real-world graphs verified that Blogel is able to achieve orders of magnitude performance improvements over the state-of-the-art distributed graph computing systems.