### Citations

4579 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...is model to Hadoop-based systems in more detail in Section 7. 1.2 Master.compute() Implementing a graph computation inside vertex.compute() is ideal for certain algorithms, such as computing PageRank =-=[9]-=-, finding shortest paths, or finding connected components, all of which can be performed in a fully “vertex-centric” and hence parallel fashion. However, some algorithms are a combination of vertex-cen... |
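The context above describes PageRank as a fully vertex-centric computation inside vertex.compute(). A minimal sketch of that pattern, in plain Python standing in for the Pregel/GPS API (the function names and the toy single-process driver below are illustrative assumptions, not GPS's actual Java interface):

```python
# Sketch of vertex-centric PageRank in the Pregel/GPS style.
# All names here are hypothetical; GPS's real API is Java-based.

DAMPING = 0.85

def compute(value, incoming, num_out_edges, num_vertices, superstep):
    """One superstep for one vertex: combine incoming rank shares,
    then emit an equal share of the new rank to each out-neighbor."""
    if superstep > 0:
        value = (1 - DAMPING) / num_vertices + DAMPING * sum(incoming)
    share = value / num_out_edges if num_out_edges else 0.0
    return value, share

def run_pagerank(graph, supersteps=30):
    """Toy synchronous driver standing in for the distributed runtime:
    graph maps each vertex to its list of out-neighbors."""
    n = len(graph)
    values = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = {v: [] for v in graph}
        for v, out_edges in graph.items():
            values[v], share = compute(values[v], inbox[v],
                                        len(out_edges), n, step)
            for dst in out_edges:
                outbox[dst].append(share)
        inbox = outbox  # messages sent in step s are read in step s+1
    return values
```

On a directed 3-cycle every vertex forwards its whole rank to its single successor, so each vertex keeps rank 1/3.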

3206 | MapReduce: Simplified data processing on large clusters
- Dean, Ghemawat
- 2004
Citation Context ...anguage to GPS, enabling easy expression of complex algorithms. 1. INTRODUCTION Building systems that process vast amounts of data has been made simpler by the introduction of the MapReduce framework =-=[14]-=-, and its open-source implementation Hadoop [2]. These systems offer automatic scalability to extreme volumes of data, automatic fault-tolerance, and a simple programming interface based around implem... |

1351 | A Bridging Model for Parallel Computation
- Valiant
- 1990
Citation Context ... of algorithms running on GPS. 1.1 Bulk Synchronous Graph Processing The computational framework introduced by Pregel and used by GPS is based on the Bulk Synchronous Parallel (BSP) computation model =-=[36]-=-. At the beginning of the computation, the vertices of the graph are distributed across compute nodes. Computation consists of iterations called supersteps. In each superstep, analogous to the map() a... |
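The superstep structure described in this context can be sketched as a plain-Python loop; this is a single-process stand-in for the distributed BSP runtime, and the vertex-program signature and vote-to-halt convention below follow Pregel's description but are otherwise assumptions:

```python
def bsp_run(vertices, vertex_program, max_supersteps=50):
    """Bulk synchronous loop: every active vertex runs once per superstep,
    messages sent in superstep s are delivered in s+1, and the barrier
    between supersteps is implicit in the sequential loop below."""
    inbox = {v: [] for v in vertices}
    active = set(vertices)
    for superstep in range(max_supersteps):
        if not active:
            break  # all vertices halted and no messages in flight
        outbox = {v: [] for v in vertices}
        for v in list(active):
            halted, messages = vertex_program(v, vertices[v],
                                              inbox[v], superstep)
            for dst, msg in messages:
                outbox[dst].append(msg)
            if halted:
                active.discard(v)
        # Pregel semantics: receiving a message reactivates a halted vertex.
        active |= {v for v, msgs in outbox.items() if msgs}
        inbox = outbox
    return vertices

# Example vertex program: flood the maximum value through the graph.
def max_program(v, state, incoming, superstep):
    new_val = max([state["value"]] + incoming)
    changed = superstep == 0 or new_val != state["value"]
    state["value"] = new_val
    msgs = [(dst, new_val) for dst in state["edges"]] if changed else []
    return True, msgs  # always vote to halt; a message wakes us up again
```

Running max_program on a directed cycle converges once no vertex's value changes, at which point no messages are produced and every vertex stays halted.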

532 | A faster algorithm for betweenness centrality
- Brandes
- 2001
Citation Context ...he Green-Marl [19] domain-specific language for graph processing into GPS. As examples, Figures 13a and 13b show the Green-Marl language being used to implement PageRank and “Betweenness Centrality.” =-=[8]-=- Both of these programs are translated readily to GPS using our compiler, although only the second algorithm truly benefits from using a high-level language instead of GPS. Here are two example genera... |

465 | Pregel: a system for large-scale graph processing
- MALEWICZ, AUSTERN, et al.
- 2010
Citation Context ...ems offer automatic scalability to extreme volumes of data, automatic fault-tolerance, and a simple programming interface based around implementing a set of functions. However, it has been recognized =-=[24, 26]-=- that these systems are not always suitable when processing data in the form of a large graph (details in Section 7). A framework similar to MapReduce—scalable, fault-tolerant, easy to program—but gea... |

264 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...ng 1Since this paper was written, master.compute() has been incorporated into Giraph [1]. 2These datasets were provided by “The Laboratory for Web Algorithmics” [23], using software packages WebGraph =-=[7]-=-, LLP [6] and UbiCrawler [5]. the partitions. By default METIS balances the number of vertices in each partition. We set the ufactor parameter to 5, resulting in at most 0.5% imbalance in the number o... |
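The 0.5% figure in this context follows from METIS's ufactor convention (as we read the METIS manual; worth checking against the version used): a ufactor of x allows each partition to exceed the ideal even share by a factor of x/1000, so ufactor=5 permits at most 0.5% imbalance.

```python
def metis_max_partition_size(num_vertices, num_parts, ufactor):
    """Largest partition METIS may produce for a given ufactor:
    ufactor = x lets each part exceed the ideal even share by a
    factor of x/1000 (so ufactor=5 -> at most 0.5% imbalance)."""
    ideal = num_vertices / num_parts
    return ideal * (1 + ufactor / 1000)
```

For example, partitioning 1,000,000 vertices into 100 parts with ufactor=5 caps each part at 10,050 vertices instead of the ideal 10,000.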

187 | Spark: cluster computing with working sets
- ZAHARIA, CHOWDHURY, et al.
- 2010
Citation Context ... to reason about low-level synchronization, scheduling, and communication primitives in their code in order to realize the efficiency; they also do not provide fault-tolerance. • Other systems: Spark =-=[38]-=- is a general cluster computing system, whose API is designed to express generic iterative computations. As a result, programming graph algorithms on Spark requires significantly more coding effort than... |

172 | UbiCrawler: a scalable fully distributed Web crawler
- Boldi, Codenotti, et al.
Citation Context ...tten, master.compute() has been incorporated into Giraph [1]. 2These datasets were provided by “The Laboratory for Web Algorithmics” [23], using software packages WebGraph [7], LLP [6] and UbiCrawler =-=[5]-=-. the partitions. By default METIS balances the number of vertices in each partition. We set the ufactor parameter to 5, resulting in at most 0.5% imbalance in the number of vertices assigned to each ... |

154 | Twister: a Runtime for Iterative MapReduce
- Ekanayake, Li, et al.
Citation Context ... can express a graph algorithm as a series of MapReduce jobs, each one corresponding to one iteration of the algorithm. Pegasus [22], Mahout [3], HaLoop [10], iMapReduce [39], Surfer [13] and Twister =-=[15]-=- are examples of these systems. These systems suffer from two inefficiencies that do not exist in bulk synchronous message-passing systems: (1) The input graph, which does not change from iteration to ... |

122 | Pegasus: A peta-scale graph mining system implementation and observations
- Kang, Tsourakakis, et al.
Citation Context ... the voteToHalt() function in the API. Algorithms whose computation can be expressed in a fully vertex-centric fashion are easily implemented using this API, as in our first example. Example 2.1. HCC =-=[22]-=- is an algorithm to find the weakly connected components of an undirected graph: First, every vertex sets its value to its own ID. Then, in iterations, vertices set their values to the minimum value a... |
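The HCC description in this context (every vertex starts with its own ID, then repeatedly adopts the minimum value in its neighborhood) can be sketched sequentially. This is a minimal stand-in for the vertex-centric version, with a hypothetical adjacency-dict input rather than HCC's actual implementation:

```python
def hcc(graph, max_iters=100):
    """HCC-style weakly connected components on an undirected graph:
    each vertex starts with its own ID and repeatedly takes the minimum
    ID among itself and its neighbors; all vertices in one component
    converge to that component's minimum ID. `graph` maps each vertex
    to the set of its neighbors (assumed symmetric)."""
    comp = {v: v for v in graph}
    for _ in range(max_iters):
        changed = False
        new_comp = {}
        for v, nbrs in graph.items():
            m = min([comp[v]] + [comp[u] for u in nbrs])
            new_comp[v] = m
            changed = changed or m != comp[v]
        comp = new_comp
        if not changed:
            break  # analogous to every vertex calling voteToHalt()
    return comp
```

In the message-passing version, a vertex only sends its value when it changed, and the computation halts when no messages remain, mirroring the early-exit check above.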

119 | HaLoop: Efficient iterative data processing on large clusters - Bu, Howe, et al. - 2010 |

85 | Graphlab: A new framework for parallel machine learning
- Low, Gonzalez, et al.
- 2010
Citation Context ...ems offer automatic scalability to extreme volumes of data, automatic fault-tolerance, and a simple programming interface based around implementing a set of functions. However, it has been recognized =-=[24, 26]-=- that these systems are not always suitable when processing data in the form of a large graph (details in Section 7). A framework similar to MapReduce—scalable, fault-tolerant, easy to program—but gea... |

75 | The Parallel BGL: A generic library for distributed graph computations
- GREGOR, LUMSDAINE
Citation Context ... Passing Interface (MPI): MPI is a standard interface for building a broad range of message passing programs. There are several open-source implementations of MPI [29, 28]. MPI-based libraries, e.g., =-=[11, 17, 25]-=-, can also be used to implement parallel message-passing graph algorithms. These libraries can be very efficient, but they require users to reason about low-level synchronization, scheduling, and commu... |

72 | Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks
- Boldi, Rosa, et al.
- 2011
Citation Context ... this paper was written, master.compute() has been incorporated into Giraph [1]. 2These datasets were provided by “The Laboratory for Web Algorithmics” [23], using software packages WebGraph [7], LLP =-=[6]-=- and UbiCrawler [5]. the partitions. By default METIS balances the number of vertices in each partition. We set the ufactor parameter to 5, resulting in at most 0.5% imbalance in the number of vertice... |

66 | Scalable SPARQL querying of large RDF graphs
- Huang, Abadi, et al.
- 2011
Citation Context ...rithms. We also studied the effects of GPS’s dynamic repartitioning scheme on performance. There are previous studies on the performance effects of different partitionings of graphs on other systems. =-=[21]-=- shows that by partitioning Resource Description Framework [31] (RDF) data with METIS and then “intelligently” replicating certain tuples, SPARQL [32] query run-times can be improved significantly ove... |

52 | The Combinatorial BLAS: design, implementation, and applications
- Buluç, Gilbert
Citation Context ... Passing Interface (MPI): MPI is a standard interface for building a broad range of message passing programs. There are several open-source implementations of MPI [29, 28]. MPI-based libraries, e.g., =-=[11, 17, 25]-=-, can also be used to implement parallel message-passing graph algorithms. These libraries can be very efficient, but they require users to reason about low-level synchronization, scheduling, and commu... |

49 | A faster parallel algorithm and efficient multithreaded implementations for evaluating betweenness centrality on massive datasets - Madduri, Ediger, et al. |

48 | Streaming graph partitioning for large distributed graphs
- Stanton, Kliot
Citation Context ...2] query run-times can be improved significantly over random partitioning. We study the effects of partitioning under batch algorithms, whereas SPARQL queries consist of short path-finding workloads. =-=[33]-=- develops a heuristic to partition the graph across machines during the initial loading phase. They study the reduction in the number of edges crossing machines and runtime improvements on Spark when ... |

41 | Green-Marl: a DSL for easy and efficient graph analysis
- HONG, CHAFI, et al.
Citation Context ...6, we briefly discuss our work on compiling a high-level domain-specific language for graph computations into GPS. We discuss the advantages of implementing certain graph algorithms in the Green-Marl =-=[19]-=- language, as an alternative to programming directly in GPS. Section 7 covers related work and Section 8 concludes and proposes future work. 2. GPS SYSTEM GPS uses the distributed message-passing mode... |

34 | iMapReduce: A distributed computing framework for iterative computation
- Zhang, Gao, et al.
- 2011
Citation Context ...adoop, in which the programmer can express a graph algorithm as a series of MapReduce jobs, each one corresponding to one iteration of the algorithm. Pegasus [22], Mahout [3], HaLoop [10], iMapReduce =-=[39]-=-, Surfer [13] and Twister [15] are examples of these systems. These systems suffer from two inefficiencies that do not exist in bulk synchronous message-passing systems: (1) The input graph, which does... |

31 | Spectral analysis for billion-scale graphs: Discoveries and implementation - Kang, Meeder, et al. - 2011 |

30 | Image webs: Computing and exploiting connectivity in image collections - Heath, Gelfand, et al. - 2010 |

29 | Signal/Collect: Graph Algorithms for the (Semantic) Web
- STUTZ, BERNSTEIN, et al.
- 2010
Citation Context ...eration. (2) Checking for the convergence criterion may require additional MapReduce jobs. • Asynchronous systems: GPS supports only bulk synchronous graph processing. GraphLab [24] and SignalCollect =-=[34]-=- support asynchronous vertex-centric graph processing. An advantage of asynchronous computation over bulk synchronous computation is that fast workers do not have to wait for slow workers. However, pr... |

26 | Towards effective partition management for large graphs
- Yang, Yan, et al.
Citation Context ... or dynamic partitioning schemes. Reference [12] also experiments with the run-time effects of different ways of repartitioning a sparse matrix representation of graphs when computing PageRank. Sedge =-=[37]-=- is a graph query engine based on a simple Pregel implementation. In Sedge, multiple replicas of the graph are partitioned differently and stored on different groups of workers; queries are routed to ... |

19 | A flexible open-source toolbox for scalable complex graph analysis
- Lugowski, Alber, et al.
- 2012
Citation Context ... Passing Interface (MPI): MPI is a standard interface for building a broad range of message passing programs. There are several open-source implementations of MPI [29, 28]. MPI-based libraries, e.g., =-=[11, 17, 25]-=-, can also be used to implement parallel message-passing graph algorithms. These libraries can be very efficient, but they require users to reason about low-level synchronization, scheduling, and commu... |

9 | A high-level framework for distributed processing of large-scale graphs - KREPSKA, KIELMANN, et al. |

5 | GraphLab: A New Framework For Parallel Machine Learning - Low, Gonzalez, et al. - 2010 |

3 | Green-Marl: a DSL for easy and efficient graph analysis - Hong, Chafi, et al. - 2012 |

3 | Site-based partitioning and repartitioning techniques for parallel pagerank computation
- Cevahir, Aykanat, et al.
Citation Context ...he reduction in the number of edges crossing machines and runtime improvements on Spark when running PageRank. They do not study the effects of other static or dynamic partitioning schemes. Reference =-=[12]-=- also experiments with the run-time effects of different ways of repartitioning a sparse matrix representation of graphs when computing PageRank. Sedge [37] is a graph query engine based on a simple P... |

3 | A Bridging Model for Parallel Computation
- Valiant
- 1990
Citation Context ... of algorithms running on GPS. 1.1 Bulk Synchronous Graph Processing The computational framework introduced by Pregel and used by GPS is based on the Bulk Synchronous Parallel (BSP) computation model =-=[Val90]-=-. At the beginning of the computation, the vertices of the graph are distributed across compute nodes. Computation consists of iterations called supersteps. In each superstep, analogous to the map() a... |

1 | Bagel Programming Guide. https://github.com/mesos/spark/wiki/Bagel-Programming-Guide/ |

1 | MapReduce: Simplified data processing on large clusters - Dean, Ghemawat - 2004 |

1 | Compiling Green-Marl into GPS
- Hong, Salihoglu, et al.
- 2012
Citation Context ...m comparably with direct GPS implementations in terms of run-time and network I/O. Further details of Green-Marl, our compiler from Green-Marl to GPS, and our performance experiments, can be found in =-=[20]-=-. 7. RELATED WORK There are several classes of systems designed to do large-scale graph computations: • Bulk synchronous message-passing systems: Pregel [26] introduced the first bulk synchronous distri... |

1 | HaLoop: Efficient iterative data processing on large clusters
- Bu, Howe, et al.
- 2010
Citation Context ... on top of Hadoop, in which the programmer can express a graph algorithm as a series of MapReduce jobs, each one corresponding to one iteration of the algorithm. Pegasus [KTF09], Mahout [MAH], HaLoop =-=[BHBE10]-=-, iMapReduce [ZGGW11], Surfer [CWHY10] and Twister [ELZ+10] are examples of these systems. These systems suffer from two inefficiencies that do not exist in bulk synchronous message-passing systems: (... |

1 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...is model to Hadoop-based systems in more detail in Section 6. 1.2 Master.compute() Implementing a graph computation inside vertex.compute() is ideal for certain algorithms, such as computing PageRank =-=[BP98]-=-, finding shortest paths, or finding connected components, all of which can be performed in a fully “vertex-centric” and hence parallel fashion. However, some algorithms are a combination of vertex-ce... |

1 | Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks
- Boldi, Rosa, et al.
- 2011
Citation Context ...raphs we used in our experiments are specified in Table 1.1 We consider four 1These datasets were provided by “The Laboratory for Web Algorithmics” [LAW], using software packages WebGraph [BV04], LLP =-=[BRSV11]-=- and UbiCrawler [BCSV04]. Name Vertices Edges Description uk-2007-d 106M 3.7B web graph of the .uk domain from 2007 (directed) uk-2007-u 106M 6.6B undirected version of uk-2007-d sk-2005-d 51M 1.9B... |

1 | Large graph processing in the cloud
- Chen, Weng, et al.
- 2010
Citation Context ...mmer can express a graph algorithm as a series of MapReduce jobs, each one corresponding to one iteration of the algorithm. Pegasus [KTF09], Mahout [MAH], HaLoop [BHBE10], iMapReduce [ZGGW11], Surfer =-=[CWHY10]-=- and Twister [ELZ+10] are examples of these systems. These systems suffer from two inefficiencies that do not exist in bulk synchronous message-passing systems: (1) The input graph, which does not cha... |

1 | Twister: a runtime for iterative MapReduce - Ekanayake, Li, et al. - 2010 |

1 | Giraph. http://incubator.apache.org/giraph
Citation Context ...raph Processing System, which has drawn from Google’s Pregel. In addition to being open-source, GPS has three new features that do not exist in Pregel, nor in an alternative open-source system Giraph =-=[GIR]-=- (discussed further in Section 5): 1. Only “vertex-centric” algorithms can be implemented easily and efficiently with the Pregel API. The GPS API has an extension that enables efficient implementation of ... |

1 | Pegasus: A peta-scale graph mining system implementation and observations
- Kang, Tsourakakis, et al.
- 2009
Citation Context ...enters(G) 5 assignEachVertexToClosestClusterCenter(G, clusterCenters) 6 numEdgesCrossing = countNumEdgesCrossingClusters(G) Figure 3: A simple k-means like graph clustering algorithm. Example 2.1 HCC =-=[KTF09]-=- is an algorithm to find the weakly connected components of an undirected graph: First, every vertex sets its value to its own ID. Then, in iterations, vertices set their values to the minimum value a... |

1 | The Laboratory for Web Algorithmics
Citation Context ...tiple runs varied by only a very small margin. The graphs we used in our experiments are specified in Table 1.1 We consider four 1These datasets were provided by “The Laboratory for Web Algorithmics” =-=[LAW]-=-, using software packages WebGraph [BV04], LLP [BRSV11] and UbiCrawler [BCSV04]. Name Vertices Edges Description uk-2007-d 106M 3.7B web graph of the .uk domain from 2007 (directed) uk-2007-u 106M ... |

1 | Pig latin: a not-so-foreign language for data processing - Olston, Reed, et al. - 2008 |

1 | Resource Description Framework (RDF)
- W3C
- 2004
Citation Context ...ing scheme on performance. There are previous studies on the performance effects of different partitionings of graphs on other systems. [HAR11] shows that by partitioning Resource Description Framework =-=[RDF04]-=- (RDF) data with METIS and then “intelligently” replicating certain tuples, SPARQL [SPA06] query run-times can be improved significantly over random partitioning. We study the effects of partitionin... |

1 | Signal/Collect: Graph Algorithms for the (Semantic) Web
- Stutz, Bernstein, et al.
- 2010
Citation Context ...on. (2) Checking for the convergence criterion may require additional MapReduce jobs. • Asynchronous systems: GPS supports only bulk synchronous graph processing. GraphLab [LGK+10] and Signal-Collect =-=[SBC10]-=- support asynchronous vertex-centric graph processing. An advantage of asynchronous computation over bulk synchronous computation is that fast workers do not have to wait for slow workers. However, pr... |

1 | Streaming graph partitioning for large distributed graphs
- Stanton, Kliot
- 2011
Citation Context ... query run-times can be improved significantly over random partitioning. We study the effects of partitioning under batch algorithms, whereas SPARQL queries consist of short path-finding workloads. =-=[SK11]-=- develops a heuristic to partition the graph across machines during the initial loading phase. They study the reduction in the number of edges crossing machines and run-time improvements on Spark when... |

1 | SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query
- W3C
- 2006
Citation Context ...t partitionings of graphs on other systems. [HAR11] shows that by partitioning Resource Description Framework [RDF04] (RDF) data with METIS and then “intelligently” replicating certain tuples, SPARQL =-=[SPA06]-=- query run-times can be improved significantly over random partitioning. We study the effects of partitioning under batch algorithms, whereas SPARQL queries consist of short path-finding workloads. ... |

