Abstract. The emerging discipline of algorithm engineering has primarily focussed on transforming pencil-and-paper sequential algorithms into robust, efficient, well-tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.