## GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning

Venue: In Proceedings of the USENIX Annual Technical Conference (2015), USENIX Association

Citations: 1 - 0 self

### Citations

3176 | The PageRank Citation Ranking: Bringing Order to the Web
- Page, Brin, et al.
- 1998
Citation Context ...backed in files. It provides convenient and transparent access to vectors, and simplifies the programming model: developers can treat it as normal arrays just as if they are in memory. We use PageRank [19] as an example to show how to implement algorithms using GridGraph (shown in Algorithm 3; footnote 2: Accum(&s, a) is an atomic operation which adds a to s). PageRank is a link analysis algorithm that calculate...
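The citation context above refers to a PageRank implementation built on streaming over edges and an atomic `Accum` operation. The following is a minimal single-threaded Python sketch of that idea, not GridGraph's actual code. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the in-memory edge list are all assumptions made for illustration, and plain addition stands in for the atomic accumulate because no threads are involved.

```python
def pagerank(edges, num_vertices, iterations=20, damping=0.85):
    """Edge-streaming PageRank sketch (hypothetical helper, not GridGraph's API).

    edges: list of (src, dst) pairs with vertex IDs in [0, num_vertices).
    """
    # Out-degree of each vertex, needed to split a vertex's rank among its edges.
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0] * num_vertices
    for _ in range(iterations):
        accumulated = [0.0] * num_vertices
        # Stream over every edge once; each edge pushes a share of the
        # source's rank to the destination.
        for src, dst in edges:
            # Stand-in for Accum(&s, a): in a multi-threaded setting this
            # addition would need to be atomic.
            accumulated[dst] += rank[src] / out_degree[src]
        # Apply the damping factor (assumed value) to obtain the new ranks.
        rank = [1.0 - damping + damping * s for s in accumulated]
    return rank


# Example: a three-vertex ring, where every vertex is symmetric.
ring_ranks = pagerank([(0, 1), (1, 2), (2, 0)], num_vertices=3)
```

On the symmetric ring every vertex should end up with the same rank, which makes it a convenient sanity check for the accumulate-then-apply structure.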

1643 | On Power-Law Relationships of the Internet Topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ...e too small to achieve substantial sequential bandwidth on HDDs. Figure 2 shows the distribution of edge block sizes in Twitter [12] graph using a 32 × 32 partitioning, which conforms to the power-law [7], with a large number of small files and a few big ones. Thus full sequential bandwidth can not be achieved sometimes due to potentially frequent disk seeks. To avoid such performance loss, an extra m...
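The context above mentions edge block sizes under a 32 × 32 partitioning. As a rough illustration of how such a size distribution could be measured, the sketch below assumes the usual 2D grid scheme: vertices are split into `p` equal contiguous chunks, and an edge `(src, dst)` lands in the block indexed by the chunks of its two endpoints. That mapping is not spelled out in the excerpt, and the names `chunk_of` and `block_sizes` are hypothetical.

```python
from collections import Counter


def chunk_of(vertex, num_vertices, p):
    """Map a vertex ID to one of p equal-sized contiguous chunks (assumed scheme)."""
    chunk_size = (num_vertices + p - 1) // p  # ceiling division
    return vertex // chunk_size


def block_sizes(edges, num_vertices, p):
    """Count the edges that fall into each (source chunk, destination chunk) block."""
    sizes = Counter()
    for src, dst in edges:
        block = (chunk_of(src, num_vertices, p), chunk_of(dst, num_vertices, p))
        sizes[block] += 1
    return sizes


# Toy graph with 8 vertices on a 4 x 4 grid; vertex 0 acts as a small hub.
toy_edges = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 0), (3, 0), (5, 6), (7, 0)]
toy_sizes = block_sizes(toy_edges, num_vertices=8, p=4)
```

Blocks that receive no edges never appear in the counter, so on a skewed graph the result directly exposes the mix of many small blocks and a few large ones.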

1351 | A Bridging Model for Parallel Computation
- Valiant
- 1990
Citation Context ...cache to hold edge data, which is very useful when active edge data becomes small enough to fit into memory. Another advantage of this streaming-apply model is that it not only supports classical BSP [25] model, but also allows asynchronous [3] updates. Since vertex updates are in-place and instant, the effect of an update can be seen by following vertex accesses, which makes lots of iterative algorit...

934 | What is Twitter, a social network or a news media?
- Kwak, Lee, et al.
- 2010
Citation Context ...o the irregular structure of real world graphs, some edge blocks might be too small to achieve substantial sequential bandwidth on HDDs. Figure 2 shows the distribution of edge block sizes in Twitter [12] graph using a 32 × 32 partitioning, which conforms to the power-law [7], with a large number of small files and a few big ones. Thus full sequential bandwidth can not be achieved sometimes due to pote...

480 | Group formation in large social networks: membership, growth, and evolution
- Backstrom, Huttenlocher, et al.
- 2006
Citation Context ... of GridGraph through comparison with the latest version of GraphChi and X-Stream on d2.xlarge and i2.xlarge instances. For each system, we run BFS, WCC, SpMV and Pagerank on 4 datasets: LiveJournal [2], Twitter [12], UK [4] and Yahoo [29]. All the graphs are real-world graphs with power-law degree distributions. LiveJournal and Twitter are social graphs, showing the following relationship between u...

466 | Pregel: A system for large-scale graph processing
- Malewicz, Austern, et al.
- 2010
Citation Context ...ny real-world problems, such as online social networks, web graphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation res...

264 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...ster than mainstream distributed graph processing systems that require much more resources. The performance of GridGraph is mainly restricted by I/O bandwidth. We plan to employ compression techniques [5, 14, 24] on the edge grid to further reduce the I/O bandwidth required and improve efficiency. Acknowledgments We sincerely thank our shepherd Haibo Chen and the anonymous reviewers for their insightful comme...

125 | Distributed GraphLab: A framework for machine learning and data mining in the cloud
- Low, Bickson, et al.
Citation Context ...roblems, such as online social networks, web graphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clust...

115 | PowerGraph: Distributed graph-parallel computation on natural graphs
- Gonzalez, Low, et al.
- 2012
Citation Context ...online social networks, web graphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, lo...

108 | GraphChi: Large-scale graph computation on just a PC
- Kyrola, Blelloch, et al.
Citation Context ...nges for graph processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial tasks. GraphChi [13], X-Stream [21] and other out-of-core systems [9, 15, 31, 34] provide alternative solutions. They enable users to process large-scale graphs on a single machine by using disks efficiently. GraphChi pa...

65 | A Comparative Study into Distributed Load Balancing Algorithms for Cloud Computing
- Randles, Lamb, et al.
- 2010
Citation Context ..., and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchronization overhead [33] and fault tolerance overhead [27] are still challenges for graph processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and o...

57 | A scalable distributed parallel breadth-first search algorithm on BlueGene/L
- Yoo, Chow, et al.
Citation Context ...chniques in these works can be integrated to improve in-memory performance further. The 2D grid partitioning used in GridGraph is also utilized similarly in distributed graph systems and applications [10, 28, 30] to reduce communication overhead. PowerLyra [6] provides an efficient hybrid-cut graph partitioning algorithm which combines edge-cut and vertex-cut with heuristics that differentiate the computation...

46 | Trinity: a distributed graph engine on a memory cloud
- Shao, Wang, et al.
- 2013
Citation Context ...raphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchroniz...

37 | Ligra: a lightweight graph processing framework for shared memory
- Shun, Blelloch
- 2013
Citation Context ... many graph engines using shared memory configurations. X-Stream [21] has its in-memory streaming engine and uses a parallel multistage shuffler to fit vertex data of each partition into cache. Ligra [23] is a shared-memory graph processing framework which provides two very simple routines for mapping over vertices and edges, inspiring GridGraph for the streaming interface. It adaptively switches betw...

31 | X-Stream: edge-centric graph processing using streaming partitions
- Roy, Mihailovic, et al.
Citation Context ...processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial tasks. GraphChi [13], X-Stream [21] and other out-of-core systems [9, 15, 31, 34] provide alternative solutions. They enable users to process large-scale graphs on a single machine by using disks efficiently. GraphChi partitions the ve...

30 | GraphX: A resilient distributed graph system on Spark
- Xin, Gonzalez, et al.
- 2013
Citation Context ...l networks, web graphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance ...

27 | A lightweight infrastructure for graph analytics
- Nguyen, Lenharth, et al.
- 2013
Citation Context ...od to model a large graph as a collection of tree-based partitions. Its compact design in storage allows efficient data access and achieves good performance on machines with sufficient memory. Galois [18] provides a machine-topology-aware scheduler, a priority scheduler and a library of scalable data structures, and uses a CSR format of graphs in its out-of-core implementation. GridGraph is inspired b...

24 | TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC
- Han, Lee, et al.
Citation Context ...nt. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial tasks. GraphChi [13], X-Stream [21] and other out-of-core systems [9, 15, 31, 34] provide alternative solutions. They enable users to process large-scale graphs on a single machine by using disks efficiently. GraphChi partitions the vertices into disjoint intervals and breaks the ...

19 | Mizan: a system for dynamic load balancing in large-scale graph processing
- Khayyat, Awara, et al.
Citation Context ..., and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchronization overhead [33] and fault tolerance overhead [27] are still challenges for graph processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and o...

15 | A large time-aware web graph
- Boldi, Santini, et al.
Citation Context ...comparison with the latest version of GraphChi and X-Stream on d2.xlarge and i2.xlarge instances. For each system, we run BFS, WCC, SpMV and Pagerank on 4 datasets: LiveJournal [2], Twitter [12], UK [4] and Yahoo [29]. All the graphs are real-world graphs with power-law degree distributions. LiveJournal and Twitter are social graphs, showing the following relationship between users within each onlin...

12 | Giraph: Large-scale graph processing infrastructure on Hadoop
- Avery
- 2011
Citation Context ...raphs, user-item matrices, and more, can be represented as graph computing problems. Many distributed graph processing systems like Pregel [17], GraphLab [16], PowerGraph [8], GraphX [28], and others [1, 22] have been proposed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchroniz...

7 | PowerLyra: Differentiated graph computation and partitioning on skewed graphs
- Chen, Shi, et al.
Citation Context ...emory performance further. The 2D grid partitioning used in GridGraph is also utilized similarly in distributed graph systems and applications [10, 28, 30] to reduce communication overhead. PowerLyra [6] provides an efficient hybrid-cut graph partitioning algorithm which combines edge-cut and vertex-cut with heuristics that differentiate the computation and partitioning on high-degree and low-degree ...

4 | Replication-based fault-tolerance for large-scale graph processing
- Wang, Zhang, et al.
Citation Context ...re able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchronization overhead [33] and fault tolerance overhead [27] are still challenges for graph processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial...

4 | Fast iterative graph computation: a path centric approach
- Yuan, Zhang, et al.
Citation Context ...nt. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial tasks. GraphChi [13], X-Stream [21] and other out-of-core systems [9, 15, 31, 34] provide alternative solutions. They enable users to process large-scale graphs on a single machine by using disks efficiently. GraphChi partitions the vertices into disjoint intervals and breaks the ...

3 | MMap: Fast billion-scale graph computation on a PC via memory mapping
- Lin, Kahng, et al.
- 2014
Citation Context ...nt. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in distributed systems are non-trivial tasks. GraphChi [13], X-Stream [21] and other out-of-core systems [9, 15, 31, 34] provide alternative solutions. They enable users to process large-scale graphs on a single machine by using disks efficiently. GraphChi partitions the vertices into disjoint intervals and breaks the ...

3 | NUMA-aware graph-structured analytics
- Zhang, Chen, et al.
- 2015
Citation Context ...the streaming interface. It adaptively switches between two modes based on the density of active vertex subsets when mapping over edges, and is especially efficient for applications like BFS. Polymer [32] uses graph-aware data allocation, layout and access strategy that reduces remote memory accesses and turns inevitable random remote accesses into sequential ones. While GridGraph concentrates on out-...

3 | FlashGraph: Processing billion-node graphs on an array of commodity SSDs
- Zheng, Mhembere, et al.
- 2015

2 | GraphBuilder: Scalable graph ETL framework
- Jain, Liao, et al.
Citation Context ...chniques in these works can be integrated to improve in-memory performance further. The 2D grid partitioning used in GridGraph is also utilized similarly in distributed graph systems and applications [10, 28, 30] to reduce communication overhead. PowerLyra [6] provides an efficient hybrid-cut graph partitioning algorithm which combines edge-cut and vertex-cut with heuristics that differentiate the computation...

2 | and faster: Parallel processing of compressed graphs with Ligra
- Shun, Dhulipala, et al.
- 2015
Citation Context ...ster than mainstream distributed graph processing systems that require much more resources. The performance of GridGraph is mainly restricted by I/O bandwidth. We plan to employ compression techniques [5, 14, 24] on the edge grid to further reduce the I/O bandwidth required and improve efficiency. Acknowledgments We sincerely thank our shepherd Haibo Chen and the anonymous reviewers for their insightful comme...

2 | GraphQ: Graph query processing with abstraction refinement
- Wang, Xu, et al.
- 2015
Citation Context ...m vertex access pattern. These solutions require a sorted adjacency list representation of graph, which needs time-consuming preprocessing, and SSDs to efficiently process random I/O requests. GraphQ [26] divides graphs into partitions and uses user programmable heuristics to merge partitions. It aims to answer queries by analyzing subgraphs. PathGraph [31] uses a path-centric method to model a large g...

1 | Gipfeli - high speed compression algorithm
- Lenhardt, Alakuijala
Citation Context ...ster than mainstream distributed graph processing systems that require much more resources. The performance of GridGraph is mainly restricted by I/O bandwidth. We plan to employ compression techniques [5, 14, 24] on the edge grid to further reduce the I/O bandwidth required and improve efficiency. Acknowledgments We sincerely thank our shepherd Haibo Chen and the anonymous reviewers for their insightful comme...

1 | AltaVista web page hyperlink connectivity graph, circa 2002. http://webscope.sandbox.yahoo.com
- Yahoo
Citation Context ...h the latest version of GraphChi and X-Stream on d2.xlarge and i2.xlarge instances. For each system, we run BFS, WCC, SpMV and Pagerank on 4 datasets: LiveJournal [2], Twitter [12], UK [4] and Yahoo [29]. All the graphs are real-world graphs with power-law degree distributions. LiveJournal and Twitter are social graphs, showing the following relationship between users within each online social networ...

1 | LightGraph: Lighten communication in distributed graph-parallel processing
- Zhao, Yoshigoe, et al.
Citation Context ...osed in the past few years. They are able to handle graphs of very large scale by exploiting the powerful computation resources of clusters. However, load imbalance [11, 20], synchronization overhead [33] and fault tolerance overhead [27] are still challenges for graph processing in distributed environment. Moreover, users need to be skillful since tuning a cluster and optimizing graph algorithms in d...