An efficient parallel biconnectivity algorithm
 SIAM J. Computing
, 1985
"... Abstract. In this paper we propose a new algorithm for finding the blocks (biconnected components) of an undirected graph. A serial implementation runs in O(n + m) time and space on a graph of n vertices and m edges. A parallel implementation runs in O(log n) time and O(n + m) space using O(n + m) p ..."
Cited by 109 (5 self)
Parallel Numerical Linear Algebra
, 1993
"... We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illust ..."
Cited by 773 (23 self)
Communication-Efficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a p-processor bulk-synchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processor-to-processor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Cited by 74 (5 self)
Fast Parallel Algorithms for Short-Range Molecular Dynamics
 Journal of Computational Physics
, 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of interatomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Cited by 653 (7 self)
dynamics models which can be difficult to parallelize efficiently: those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors
Cilk: An Efficient Multithreaded Runtime System
, 1995
"... Cilk (pronounced “silk”) is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and analytically. We show that on real and synthetic applications, the “work” and “critical path” of a Cilk co ..."
Cited by 763 (33 self)
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Cited by 560 (15 self)
the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
 In EuroSys
, 2007
"... Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational “vertices” with communication “channels” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Cited by 762 (27 self)
SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets
"... Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programm ..."
Cited by 206 (9 self)
Execution), targeted for this type of massive data analysis. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns
Efficient parallel programming in . . .
, 2009
"... The ML family of languages and LCF-style interactive theorem proving have been closely related from their beginnings about 30 years ago. Here we report on a recent project to adapt both the Poly/ML compiler and the Isabelle theorem prover to current multicore hardware. Checking theories and proofs i ..."
Cited by 2 (1 self)
in typical Isabelle application takes minutes or hours, and users expect to make efficient use of “home machines” with 4–16 cores. Poly/ML and Isabelle are big and complex software systems that have evolved over more than two decades. Faced with the requirement to deliver a stable and efficient parallel
Efficient dispersal of information for security, load balancing, and fault tolerance
 Journal of the ACM
, 1989
"... Abstract. An Information Dispersal Algorithm (IDA) is developed that breaks a file F of length L = |F| into n pieces F_i, 1 ≤ i ≤ n, each of length |F_i| = L/m, so that every m pieces suffice for reconstructing F. Dispersal and reconstruction are computationally efficient. The sum of the lengths ..."
Cited by 561 (1 self)
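The dispersal property quoted above (n pieces, each of length L/m, any m of which rebuild F) can be illustrated with a small sketch. This is not the paper's implementation: it assumes arithmetic modulo the prime 257 and Vandermonde rows built from the points 1..n, and the names `disperse`, `solve_mod`, and `reconstruct` are hypothetical helpers chosen for this example. Piece symbols may take the value 256, so pieces are kept as lists of integers rather than bytes.

```python
P = 257  # assumed prime modulus, just above 255 so every byte is a field element


def disperse(data: bytes, n: int, m: int) -> list[list[int]]:
    """Split data into n pieces, each holding ceil(len(data) / m) symbols."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    pieces = []
    for x in range(1, n + 1):  # requires n < P so the points stay distinct
        row = [pow(x, j, P) for j in range(m)]  # Vandermonde row for point x
        pieces.append([
            sum(a * b for a, b in zip(row, padded[k:k + m])) % P
            for k in range(0, len(padded), m)
        ])
    return pieces


def solve_mod(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve matrix * x = rhs modulo P by Gauss-Jordan elimination."""
    m = len(matrix)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] % P != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, P)  # modular inverse (Python 3.8+)
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(v - factor * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


def reconstruct(shares: dict[int, list[int]], m: int, length: int) -> bytes:
    """Rebuild the data from any m pieces, keyed by 1-based piece index."""
    xs = sorted(shares)[:m]
    matrix = [[pow(x, j, P) for j in range(m)] for x in xs]
    out = bytearray()
    for k in range(len(shares[xs[0]])):
        # Each solved block is the original m-byte segment, so values are < 256.
        out.extend(solve_mod(matrix, [shares[x][k] for x in xs]))
    return bytes(out[:length])


data = b"information dispersal"
pieces = disperse(data, n=5, m=3)
# Any 3 of the 5 pieces are enough; here pieces 2, 4, and 5 are used.
restored = reconstruct({2: pieces[1], 4: pieces[3], 5: pieces[4]}, 3, len(data))
```

Any m rows of a Vandermonde matrix over distinct points are linearly independent, which is why every choice of m pieces yields a solvable system in this sketch.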
