Results 1-10 of 25
An Experimental Comparison of Pregel-like Graph Processing Systems
Cited by 8 (1 self)
The introduction of Google’s Pregel generated much interest in the field of large-scale graph data processing, inspiring the development of Pregel-like systems such as Apache Giraph, GPS, Mizan, and GraphLab, all of which have appeared in the past two years. To gain an understanding of how Pregel-like systems perform, we conduct a study to experimentally compare Giraph, GPS, Mizan, and GraphLab on equal ground by considering graph and algorithm agnostic optimizations and by using several metrics. The systems are compared with four different algorithms (PageRank, single source shortest path, weakly connected components, and distributed minimum spanning tree) on up to 128 Amazon EC2 machines. We find that the system optimizations present in Giraph and GraphLab allow them to perform well. Our evaluation also shows Giraph 1.0.0’s considerable improvement since Giraph 0.1 and identifies areas of improvement for all systems.
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
Cited by 6 (0 self)
There is a growing need for distributed graph processing systems that are capable of gracefully scaling to very large graph datasets. Unfortunately, this challenge has not been easily met due to the intense memory pressure imposed by process-centric, message-passing designs that many graph processing systems follow. Pregelix is a new open source distributed graph processing system that is based on an iterative dataflow design that is better tuned to handle both in-memory and out-of-core workloads. As such, Pregelix offers improved performance characteristics and scaling properties over current open source systems (e.g., we have seen up to 15× speedup compared to Apache Giraph and up to 35× speedup compared to distributed GraphLab), and more effective use of available machine resources to support Big(ger) Graph Analytics.
Scalable Big Graph Processing in MapReduce
2014
Cited by 4 (1 self)
MapReduce has become one of the most popular parallel computing paradigms in the cloud, due to its high scalability, reliability, and fault tolerance achieved for a large variety of applications in big data processing. In the literature, there are the MapReduce Class (MRC) and the Minimal MapReduce Class (MMC) to define the memory consumption, communication cost, CPU cost, and number of MapReduce rounds for an algorithm to execute in MapReduce. However, neither of them is designed for big graph processing in MapReduce, since the constraints in MMC can hardly be achieved simultaneously on graphs and the conditions in MRC may induce scalability problems when processing big graph data. In this paper, we study scalable big graph processing in MapReduce. We introduce a Scalable Graph processing Class (SGC) by relaxing some constraints in MMC to make it suitable for scalable graph processing. We define two graph join operators in SGC, namely, EN join and NE join, using which a wide range of graph algorithms can be
Fast Iterative Graph Computation: A Path Centric Approach
In SC, 2014
Cited by 4 (1 self)
Large scale graph processing represents an interesting systems challenge due to the lack of locality. This paper presents PathGraph, a system for improving iterative graph computation on graphs with billions of edges. Our system design has three unique features: First, we model a large graph using a collection of tree-based partitions and use path-centric computation rather than vertex-centric or edge-centric computation. Our path-centric graph parallel computation model significantly improves the memory and disk locality for iterative computation algorithms on large graphs. Second, we design a compact storage that is optimized for iterative graph parallel computation. Concretely, we use delta compression, partition a large graph into tree-based partitions and store trees in a DFS order. By clustering highly correlated paths together, we further maximize sequential access and minimize random access on storage media. Third, and not least, we implement the path-centric computation model by using a scatter/gather programming model, which parallelizes the iterative computation at partition tree level and performs sequential local updates for vertices in each tree partition to improve the convergence speed. We compare PathGraph to recent alternative graph processing systems such as GraphChi and X-Stream, and show that the path-centric approach outperforms vertex-centric and edge-centric systems on a number of graph algorithms for both in-memory and out-of-core graphs.
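The abstract above mentions delta compression as part of the compact storage. As a rough illustration only, the sketch below shows plain gap encoding of a sorted list of vertex ids; the function names are hypothetical, and nothing here is taken from PathGraph's actual storage format, which the abstract does not describe.

```python
from typing import List


def delta_encode(sorted_ids: List[int]) -> List[int]:
    """Keep the first id as-is; store every later id as the gap to its predecessor."""
    deltas = []
    previous = 0
    for vertex_id in sorted_ids:
        deltas.append(vertex_id - previous)
        previous = vertex_id
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original ids by accumulating the gaps."""
    ids = []
    current = 0
    for gap in deltas:
        current += gap
        ids.append(current)
    return ids
```

When neighboring ids are close together, the gaps are small numbers that a variable-length integer encoding could store in fewer bytes than the raw ids; that second step is omitted here.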
Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation
Cited by 4 (1 self)
With the prevalence of graph data in real-world applications (e.g., social networks, mobile phone networks, web graphs, etc.) and their ever-increasing size, many distributed graph computing systems have been developed in recent years to process and analyze massive graphs. Most of these systems adopt Pregel’s vertex-centric computing model, while various techniques have been proposed to address the limitations in the Pregel framework. However, there is a lack of comprehensive comparative analysis to evaluate the performance of various systems and their techniques, making it difficult for users to choose the best system for their applications. We conduct extensive experiments to evaluate the performance of existing systems on graphs with different characteristics and on algorithms with different design logic. We also study the effectiveness of various techniques adopted in existing systems, and the scalability of the systems. The results of our study reveal the strengths and limitations of existing systems, and provide valuable insights for users, researchers and system developers.
GoFFish: A Sub-Graph Centric Framework for Large-scale Graph Analytics
Cited by 2 (2 self)
Vertex centric models for large scale graph processing are gaining traction due to their simple distributed programming abstraction. However, pure vertex centric algorithms underperform due to large communication overheads and slow iterative convergence. We introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large scale graph analytics on commodity clusters, offering the added natural flexibility of shared memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model and empirically analyze them for several real world graphs, demonstrating orders of magnitude improvements, in some cases, compared to Apache Giraph’s vertex centric framework.
NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud
http://arxiv.org/abs/1405.1499, 2014
Cited by 2 (1 self)
There is an increasing interest in executing rich and complex analysis tasks over large-scale graphs, many of which require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph. Examples of such tasks include ego network analysis, motif counting, finding social circles, personalized recommendations, link prediction, anomaly detection, analyzing influence cascades, and so on. These tasks are not well served by the existing vertex-centric graph processing frameworks, whose computation and execution models limit the user program to directly accessing the state of a single vertex; this results in high communication, scheduling, and memory overheads in executing such tasks using those frameworks. Further, most existing graph processing frameworks typically ignore the challenges in extracting the relevant portion of the graph that an analysis task needs, and loading it
Parallel Graph Partitioning for Complex Networks
Proceedings of the 29th International Parallel and Distributed Processing Symposium, 2015
Cited by 1 (1 self)
Processing large complex networks like social networks or web graphs has recently attracted considerable interest. In order to do this in parallel, we need to partition them into pieces of about equal size. Unfortunately, previous parallel graph partitioners originally developed for more regular mesh-like networks do not work well for these networks. This paper addresses this problem by parallelizing and adapting the label propagation technique originally developed for graph clustering. By introducing size constraints, label propagation becomes applicable for both the coarsening and the refinement phase of multilevel graph partitioning. We obtain very high quality by applying a highly parallel evolutionary algorithm to the coarsened graph. The resulting system is both more scalable and achieves higher quality than state-of-the-art systems like ParMetis or PT-Scotch. For large complex networks the performance differences are very big. For example, our algorithm can partition a web graph with 3.3 billion edges in less than sixteen seconds using 512 cores of a high performance cluster while producing a high quality partition – none of the competing systems can handle this graph on our system.
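To make the idea of label propagation with size constraints concrete, here is a minimal sequential sketch. It is an assumption-laden illustration, not the paper's parallel algorithm: the function name, the fixed vertex order, the tie-breaking rule, and the `max_block_size` and `rounds` parameters are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, List


def size_constrained_label_propagation(
    adjacency: Dict[int, List[int]],
    max_block_size: int,
    rounds: int = 5,
) -> Dict[int, int]:
    """Move each vertex to its most frequent neighbor block that still has room."""
    label = {v: v for v in adjacency}       # every vertex starts in its own block
    size = Counter(label.values())          # number of vertices per block
    for _ in range(rounds):
        changed = False
        for v in sorted(adjacency):         # fixed order keeps the sketch deterministic
            counts = Counter(label[u] for u in adjacency[v])
            current = label[v]
            best, best_count = current, counts.get(current, 0)
            for block, count in sorted(counts.items()):
                has_room = size[block] < max_block_size
                if block != current and has_room and count > best_count:
                    best, best_count = block, count
            if best != current:
                size[current] -= 1
                size[best] += 1
                label[v] = best
                changed = True
        if not changed:                     # stop early once no vertex moves
            break
    return label
```

Because a vertex only ever moves into a block that is below `max_block_size`, no block can grow past the limit, which is the property that makes the technique usable for balanced partitioning rather than only for clustering.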
MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing
Cited by 1 (0 self)
Existing distributed graph processing frameworks, e.g., Pregel, Giraph, GPS and GraphLab, mainly exploit main memory to support flexible graph operations for efficiency. Due to the complexity of graph analytics, huge memory space is required especially for those graph analytics that spawn large intermediate results. Existing frameworks may terminate abnormally or degrade performance seriously when the memory is exhausted or the external storage has to be used. In this paper, we propose MOCgraph, a scalable distributed graph processing framework to reduce the memory footprint and improve the scalability, based on message online computing. MOCgraph consumes incoming messages in a streaming manner, so as to handle larger graphs or more complex analytics with the same memory capacity. MOCgraph also exploits message online computing with external storage to provide an efficient out-of-core support. We implement MOCgraph on top of Apache Giraph, and test it against several representative graph algorithms on large graph datasets. Experiments illustrate that MOCgraph is efficient and memory-saving, especially for graph analytics with large intermediate results.
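The contrast between buffering messages and consuming them as they arrive can be sketched as follows. This is only an illustration of the general idea under simple assumptions (messages carry candidate shortest-path distances and are combined with `min`); the function names and message layout are hypothetical and do not come from MOCgraph or Giraph.

```python
import math
from typing import Dict, Iterable, List, Tuple

Message = Tuple[int, float]  # (destination vertex id, candidate distance)


def consume_buffered(messages: Iterable[Message]) -> Dict[int, float]:
    """Baseline: store every message per vertex, then reduce at the end."""
    inbox: Dict[int, List[float]] = {}
    for dst, value in messages:
        inbox.setdefault(dst, []).append(value)   # memory grows with message count
    return {dst: min(values) for dst, values in inbox.items()}


def consume_streaming(messages: Iterable[Message]) -> Dict[int, float]:
    """Streaming: fold each message into the vertex state on arrival."""
    state: Dict[int, float] = {}
    for dst, value in messages:
        if value < state.get(dst, math.inf):
            state[dst] = value                    # memory grows with vertex count only
    return state
```

Both functions return the same result for a commutative and associative combiner such as `min`; the streaming version simply never materializes the per-vertex inbox.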
Efficient and Scalable Graph Similarity Joins in MapReduce
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out non-promising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multi-way join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.