Abstract:
Speculative parallel execution of non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging because of the long latencies and the distribution involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In previous work, we suggested executing the loop speculatively in parallel and adding extensions to the memory hierarchy hardware to detect any dependence violation. If a violation occurs, execution is interrupted, the variables are restored, and the code is re-executed serially. That scheme targets loops where most invocations run in parallel without any dependence violation. In this paper, we present a more advanced scheme for the speculative parallel execution of loops that have a modest number of cross-iteration dependences. In this case, when a dependence violation is detected, we locally repair the state and then restart parallel execution from that point on. We call the general algorithm the Sliding Commit algorithm. If the loop dependences have the special form of a reduction, we use a specialized algorithm. Simulations indicate significant speedups relative to sequential execution. Finally, we propose hardware for optimizing reductions and obtain very good experimental results.
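The idea of committing the iterations that ran correctly and restarting from the point of violation can be illustrated with a small software emulation. The sketch below is not the paper's Sliding Commit algorithm or its hardware support; it is a simplified model under stated assumptions. The loop body `a[writes[i]] = a[reads[i]] + 1`, the fixed `window` of iterations, and the names `run_serial` and `run_speculative` are all hypothetical and chosen for illustration only. Parallel execution is emulated by letting every iteration in a window read from a snapshot of the array taken at the start of the window.

```python
def run_serial(a, reads, writes):
    """Reference semantics: a[writes[i]] = a[reads[i]] + 1, in iteration order."""
    a = list(a)
    for r, w in zip(reads, writes):
        a[w] = a[r] + 1
    return a


def run_speculative(a, reads, writes, window=4):
    """Emulate speculative execution in windows of iterations.

    Hypothetical model: all iterations in a window read the state as it was
    when the window started. An iteration that reads a location already
    written by an earlier iteration of the same window has used stale data
    (a cross-iteration read-after-write violation). Iterations before it are
    committed, and execution restarts at the violating iteration.
    Returns (final_array, number_of_restarts).
    """
    a = list(a)
    n = len(reads)
    start = 0
    restarts = 0
    while start < n:
        end = min(start + window, n)
        snapshot = list(a)   # state seen by every speculative iteration
        written = {}         # location -> speculative value, in iteration order
        commit_end = end
        for i in range(start, end):
            if reads[i] in written:
                # Violation: iteration i depends on an earlier iteration
                # in this window, so its snapshot read would be stale.
                commit_end = i
                restarts += 1
                break
            written[writes[i]] = snapshot[reads[i]] + 1
        # Commit the iterations that executed without a violation.
        for loc, val in written.items():
            a[loc] = val
        # The first iteration of a window can never violate,
        # so commit_end > start and the loop always makes progress.
        start = commit_end
    return a, restarts


if __name__ == "__main__":
    reads = [0, 2, 4, 1, 6, 3]
    writes = [1, 3, 5, 7, 0, 2]
    result, restarts = run_speculative([0] * 8, reads, writes)
    print(result == run_serial([0] * 8, reads, writes), restarts)
```

In this model, a loop with few cross-iteration dependences commits most windows whole, while a fully dependent chain degenerates to one committed iteration per restart, which matches the abstract's point that the approach suits loops with a modest number of dependences.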