  Speculative Parallel Execution of Loops with Cross-Iteration Dependences in DSM Multiprocessors (1999) [6 citations — 5 self]

by Ye Zhang, Lawrence Rauchwerger
In Proc. of HPCA-5
http://www.cs.tamu.edu/faculty/rwerger/pubs/hpca5_sub_tr1536.ps.gz

Abstract:

Speculative parallel execution of non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging because of the long latencies and the distribution involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In previous work, we suggested executing the loop speculatively in parallel and adding extensions to the memory hierarchy hardware to detect any dependence violation. If a violation occurs, execution is interrupted, the variables are restored, and the code is re-executed serially. That scheme targets loops where most invocations turn out to run in parallel without any dependence violation. In this paper, we present a more advanced scheme for the speculative parallel execution of loops that have a modest number of cross-iteration dependences. In this case, when a dependence violation is detected, we locally repair the state and then restart parallel execution from that point on. We call the general algorithm the Sliding Commit algorithm. If the loop dependences are of the special form of reductions, we use a specialized algorithm. Simulations indicate significant speedups relative to sequential execution. Finally, we propose hardware for optimizing reductions and obtain very good experimental results.
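
The idea of committing the iterations that ran correctly and restarting from the first violating one can be illustrated with a small software model. The sketch below is not the paper's Sliding Commit algorithm, which relies on hardware extensions to the memory hierarchy; it is a toy, single-threaded simulation under stated assumptions. The names `speculative_run`, `body`, `window`, `read`, and `write` are hypothetical. Each window of iterations runs against the same committed state with buffered writes, as if in parallel; iterations are then committed in order until one is found to have read a location that an earlier iteration in the window wrote.

```python
def speculative_run(data, n_iters, body, window=4):
    """Toy model: run body(i, read, write) for i in range(n_iters).

    Returns the final array and the number of restarts caused by
    cross-iteration flow dependences. Illustrative only.
    """
    committed = list(data)
    start = 0
    restarts = 0
    while start < n_iters:
        end = min(start + window, n_iters)
        logs = []
        # Speculative phase: every iteration in the window sees only the
        # committed state; its writes stay in a private buffer.
        for i in range(start, end):
            reads, writes = set(), {}

            def read(idx, reads=reads, writes=writes):
                if idx in writes:          # an iteration sees its own writes
                    return writes[idx]
                reads.add(idx)
                return committed[idx]

            def write(idx, value, writes=writes):
                writes[idx] = value

            body(i, read, write)
            logs.append((i, reads, writes))
        # Commit phase: commit in iteration order until a violation.
        written = set()
        next_start = end
        for i, reads, writes in logs:
            if reads & written:
                # Iteration i read a stale value: keep what is already
                # committed and restart speculation from iteration i.
                next_start = i
                restarts += 1
                break
            for idx, value in writes.items():
                committed[idx] = value
            written.update(writes)
        start = next_start
    return committed, restarts


# Example loop with indirect accesses: a[w[i]] = a[r[i]] + 1
w = [1, 2, 3, 4, 5, 6]
r = [0, 1, 7, 7, 4, 0]


def loop_body(i, read, write):
    write(w[i], read(r[i]) + 1)


result, restarts = speculative_run([0] * 8, len(w), loop_body, window=4)
```

The first iteration of every window can never violate a dependence, so each round commits at least one iteration and the loop always terminates with the same result as sequential execution, provided `body` is deterministic and touches the array only through `read` and `write`.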
