Pregel: A system for large-scale graph processing
In SIGMOD, 2010
Cited by 496 (0 self)
Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices, trillions of edges) poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.
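The iteration model in the abstract above can be illustrated with a small single-machine sketch. This is not Pregel's API: the function name `pregel_sssp`, the adjacency-dict input, and the choice of single-source shortest paths as the example are all assumptions made for illustration. Each pass of the loop stands for one iteration, in which a vertex reads the messages sent to it in the previous iteration, updates its own state, and sends messages along its outgoing edges. Edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def pregel_sssp(edges, source):
    """Vertex-centric shortest paths, simulated on one machine.

    edges: dict mapping a vertex to a list of (neighbor, weight) pairs.
    Returns a dict mapping every vertex to its distance from source.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(edges)
    for neighbors in edges.values():
        for neighbor, _ in neighbors:
            vertices.add(neighbor)

    # Per-vertex state: the best distance seen so far.
    dist = {v: math.inf for v in vertices}

    # Messages to be delivered in the first iteration.
    inbox = {source: [0]}

    # The computation halts when no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < dist[vertex]:
                # Modify the vertex's own state ...
                dist[vertex] = best
                # ... and send messages for the next iteration.
                for neighbor, weight in edges.get(vertex, ()):
                    outbox[neighbor].append(best + weight)
        inbox = outbox
    return dist


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(pregel_sssp(graph, "a"))
```

Because messages sent in one iteration are only read in the next, each vertex can be processed independently within an iteration. That independence is what a distributed implementation would exploit; the sketch only mimics it with a sequential loop.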
Challenges in Parallel Graph Processing
Parallel Processing Letters, 2006
Cited by 45 (4 self)
Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems. In this paper we present the interrelationships between graph problems, software, and parallel hardware in the current state of the art and discuss how those issues present inherent challenges in solving large-scale graph problems. The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.
Software engineering for multicore systems: an experience report
In IWMSE ’08: Proceedings of the 1st international workshop on Multicore software engineering, 2008
Cited by 20 (9 self)
The emergence of inexpensive parallel computers powered by multicore chips combined with stagnating clock rates raises new challenges for software engineering. As future performance improvements will not come “for free” from increased clock rates, performance-critical applications will need to be parallelized. However, little is known about the engineering principles for parallel general-purpose applications. This paper presents an experience report with four diverse case studies on multicore software development for general-purpose applications. They were programmed in different languages and benchmarked on several multicore computers. Empirical findings include: 1) Multicore computers deliver: Real speedups are achievable, albeit with significant programming effort and speedups that are typically lower than the number of cores employed; 2) Massive refactoring of sequential programs is required, sometimes at several levels. Special tools for parallelization refactorings appear to be an important area of research; 3) Autotuning is indispensable, as manually tuning thread assignment, number of pipeline stages, size of data partitions and other parameters is difficult and error-prone; 4) Architectures that encompass several parallel components are poorly understood. Tuneable architectural patterns with parallelism at several levels need to be discovered.
PHAST: hardware-accelerated shortest path trees
J. Parallel Distrib. Comput., 2013
Cited by 20 (4 self)
We present a novel algorithm to solve the nonnegative single-source shortest path problem on road networks and graphs with low highway dimension. After a quick preprocessing phase, we can compute all distances from a given source in the graph with essentially a linear sweep over all vertices. Because this sweep is independent of the source, we are able to reorder vertices in advance to exploit locality. Moreover, our algorithm takes advantage of features of modern CPU architectures, such as SSE and multiple cores. Compared to Dijkstra’s algorithm, our method needs fewer operations, has better locality, and is better able to exploit parallelism at multicore and instruction levels. We gain additional speedup when implementing our algorithm on a GPU, where it is up to three orders of magnitude faster than Dijkstra’s algorithm on a high-end CPU. This makes applications based on all-pairs shortest-paths practical for continental-sized road networks. Several algorithms, such as computing the graph diameter, arc flags, or exact reaches, can be greatly accelerated by our method.
Experimental Study on Speed-Up Techniques for Timetable Information Systems
Proceedings of the 7th Workshop on Algorithmic Approaches for Transportation Modeling, Optimization, and Systems (ATMOS 2007), 2007
Cited by 18 (10 self)
During the last years, impressive speed-up techniques for Dijkstra’s algorithm have been developed. Unfortunately, recent research mainly focused on road networks. However, fast algorithms are also needed for other applications like timetable information systems. Even worse, the adaptation of recently developed techniques to timetable information is more complicated than expected. In this work, we check whether results from road networks are transferable to timetable information. To this end, we present an extensive experimental study of the most prominent speed-up techniques on different types of inputs. It turns out that recently developed techniques are much slower on graphs derived from timetable information than on road networks. In addition, we gain amazing insights into the behavior of speed-up techniques in general.
An experimental study of a parallel shortest path algorithm for solving large-scale graph instances
Ninth Workshop on Algorithm Engineering and Experiments (ALENEX 2007), 2007
Cited by 17 (3 self)
We present an experimental study of the single source shortest path problem with nonnegative edge weights (NSSP) on large-scale graphs using the $\Delta$-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, $\Delta$-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.
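For readers unfamiliar with the $\Delta$-stepping algorithm named in the abstract above, the following is a minimal sequential sketch of its bucket structure. It is not the multithreaded implementation the paper evaluates. The function name `delta_stepping`, the adjacency-dict input, and the parameter name `delta` are assumptions for illustration. The sketch assumes non-negative edge weights, `delta > 0`, and that every vertex appears as a key of `graph`.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Sequential sketch of bucket-based shortest paths.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Vertices are kept in buckets of width delta by tentative distance.
    """
    dist = {v: math.inf for v in graph}
    buckets = defaultdict(set)

    def relax(vertex, new_dist):
        # Move the vertex to the bucket matching its improved distance.
        if new_dist < dist[vertex]:
            if dist[vertex] != math.inf:
                buckets[int(dist[vertex] // delta)].discard(vertex)
            dist[vertex] = new_dist
            buckets[int(new_dist // delta)].add(vertex)

    relax(source, 0)
    while buckets:
        i = min(buckets)  # smallest bucket index still present
        settled = set()
        # Light edges (weight <= delta) may re-insert vertices into
        # bucket i, so repeat until the bucket stays empty.
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            requests = [(v, dist[u] + w)
                        for u in frontier
                        for v, w in graph[u] if w <= delta]
            # In a parallel version these requests could be relaxed
            # concurrently; here they are applied one by one.
            for v, d in requests:
                relax(v, d)
        del buckets[i]
        # Heavy edges (weight > delta) always land in a later bucket,
        # so they are relaxed once per settled vertex.
        requests = [(v, dist[u] + w)
                    for u in settled
                    for v, w in graph[u] if w > delta]
        for v, d in requests:
            relax(v, d)
    return dist


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(delta_stepping(graph, "a", delta=2))
```

The bucket width trades off work against available parallelism: a small `delta` makes the procedure behave more like Dijkstra’s algorithm, while a large one makes it behave more like repeated relaxation of all edges.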
Round-Based Public Transit Routing
In ALENEX, 2012
Cited by 8 (3 self)
We study the problem of computing all Pareto-optimal journeys in a dynamic public transit network for two criteria: arrival time and number of transfers. Existing algorithms consider this as a graph problem, and solve it using variants of Dijkstra’s algorithm. Unfortunately, this leads to either high query times or suboptimal solutions. We take a different approach. We introduce RAPTOR, our novel round-based public transit router. Unlike previous algorithms, it is not Dijkstra-based, looks at each route (such as a bus line) in the network at most once per round, and can be made even faster with simple pruning rules and parallelization using multiple cores. Because it does not rely on preprocessing, RAPTOR works in fully dynamic scenarios. Moreover, it can be easily extended to handle flexible departure times or arbitrary additional criteria, such as fare zones. When run on London’s complex public transportation network, RAPTOR computes all Pareto-optimal journeys between two random locations an order of magnitude faster than previous approaches, which easily enables interactive applications.
Employing transactional memory and helper threads to speedup Dijkstra’s algorithm
In ICPP, 2009
Cited by 4 (1 self)
In this paper we work on the parallelization of the inherently serial Dijkstra’s algorithm on modern multicore platforms. Dijkstra’s algorithm is a greedy algorithm that computes Single Source Shortest Paths for graphs with nonnegative edges and is based on the iterative extraction of nodes from a priority queue. This property limits the explicit parallelism of the algorithm and any attempt to utilize the remaining parallelism results in significant slowdowns due to synchronization overheads. To deal with these problems, we employ the concept of Helper Threads (HT) to extract parallelism in a non-traditional fashion and Transactional Memory (TM) to efficiently orchestrate the concurrent threads’ accesses to shared data structures. Results demonstrate that the proposed implementation is able to achieve performance speedups (reaching up to 1.84 for 14 threads), indicating that the two paradigms could be efficiently combined.
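The serial bottleneck described in the abstract above is visible in a plain sequential version of Dijkstra’s algorithm. The sketch below is a textbook formulation using Python’s `heapq`, not the helper-thread or transactional-memory implementation from the paper. The function name `dijkstra` and the adjacency-dict input are assumptions for illustration, and every vertex is assumed to appear as a key of `graph`.

```python
import heapq
import math


def dijkstra(graph, source):
    """Single-source shortest paths with non-negative edge weights.

    graph: dict mapping each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        # The iterative extraction from the priority queue: each
        # iteration depends on the queue state left by the previous one.
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry; a shorter path was already found
        for v, w in graph[u]:
            candidate = d + w
            if candidate < dist[v]:
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


if __name__ == "__main__":
    graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(dijkstra(graph, "a"))
```

Only the inner loop over outgoing edges is independent work. The extraction step orders the whole computation, which is the property the abstract identifies as limiting explicit parallelism.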
Advanced Shortest Paths Algorithms on a Massively-Multithreaded Architecture
Cited by 4 (0 self)
We present a study of multithreaded implementations of Thorup’s algorithm for solving the Single Source Shortest Path (SSSP) problem for undirected graphs. Our implementations leverage the fledgling MultiThreaded Graph Library (MTGL) to perform operations such as finding connected components and extracting induced subgraphs. To achieve good parallel performance from this algorithm, we deviate from several theoretically optimal algorithmic steps. In this paper, we present simplifications that perform better in practice, and we describe details of the multithreaded implementation that were necessary for scalability. We study synthetic graphs that model unstructured networks, such as social networks and economic transaction networks. Most of the recent progress in shortest path algorithms relies on structure that these networks do not have. In this work, we take a step back and explore the synergy between an elegant theoretical algorithm and an elegant computer architecture. Finally, we conclude with a prediction that this work will become relevant to shortest path computation on structured networks.
Early Experiences on Accelerating Dijkstra’s Algorithm Using Transactional Memory
Cited by 3 (1 self)
In this paper we use Dijkstra’s algorithm as a challenging, hard-to-parallelize paradigm to test the efficacy of several parallelization techniques in a multicore architecture. We consider the application of Transactional Memory (TM) as a means of concurrent accesses to shared data and compare its performance with straightforward parallel versions of the algorithm based on traditional synchronization primitives. To increase the granularity of parallelism and avoid excessive synchronization, we combine TM with Helper Threading (HT). Our simulation results demonstrate that the straightforward parallelization of Dijkstra’s algorithm with traditional locks and barriers has, as expected, disappointing performance. On the other hand, TM by itself is able to provide some performance improvement in several cases, while the version based on TM and HT exhibits a significant performance improvement that can reach up to a speedup of 1.46.