The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

by Lawrence Rauchwerger, David A. Padua
http://www.cs.tamu.edu/faculty/rwerger/pubs/i3e.ps.gz

Abstract:

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it had any cross-iteration dependences; if the test fails, then the loop is re-executed serially. Since, from our experience, a significant amount of the available parallelism in Fortran programs can be exploited by loops transformed through privatization and reduction parallelization, our methods can speculatively apply these transformations and then check their validity at run-time. Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run-time if the values stored in an array participate in a reduction operation, even if they are transferred through private variables and/or are affected by statically unpredictable control flow. We present experimental results on loops from the PERFECT Benchmarks which substantiate our claim that these techniques can yield significant speedups which are often superior to those obtainable by inspector/executor methods.
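
The speculate-then-test idea described in the abstract can be illustrated with a small sketch. This is not the authors' LRPD implementation: it simulates the doall sequentially, records iteration ids per element instead of using the paper's marking scheme, and does not cover reduction recognition. The names `run_serial`, `run_speculative`, and the `body(i, read, write)` signature are assumptions made for illustration. Each iteration writes into a private buffer (a stand-in for privatization), and the test fails if any element written by one iteration is read, before being written, by a different iteration.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical loop-body signature: body(i, read, write), where read(k)
# returns element k and write(k, v) stores v into element k.
Body = Callable[[int, Callable[[int], float], Callable[[int, float], None]], None]


def run_serial(a: List[float], n: int, body: Body) -> List[float]:
    """Reference execution: iterations run in order on one shared array."""
    out = list(a)

    def read(k: int) -> float:
        return out[k]

    def write(k: int, v: float) -> None:
        out[k] = v

    for i in range(n):
        body(i, read, write)
    return out


def run_speculative(a: List[float], n: int, body: Body) -> Tuple[List[float], bool]:
    """Run as if iterations were independent, then test; fall back if needed."""
    writers: Dict[int, Set[int]] = {}          # element -> iterations that wrote it
    exposed_readers: Dict[int, Set[int]] = {}  # element -> iterations that read it first
    private_writes: List[Dict[int, float]] = []

    for i in range(n):  # each iteration sees only the original array and its own writes
        local: Dict[int, float] = {}

        def read(k: int, i: int = i, local: Dict[int, float] = local) -> float:
            if k in local:          # value produced earlier in this same iteration
                return local[k]
            exposed_readers.setdefault(k, set()).add(i)
            return a[k]

        def write(k: int, v: float, i: int = i, local: Dict[int, float] = local) -> None:
            local[k] = v
            writers.setdefault(k, set()).add(i)

        body(i, read, write)
        private_writes.append(local)

    # Dependence test: a write and an exposed read of the same element
    # in two different iterations invalidates the speculation.
    for k, ws in writers.items():
        rs = exposed_readers.get(k, set())
        if any(r != w for r in rs for w in ws):
            return run_serial(a, n, body), False

    # Speculation held: commit private writes in iteration order (last value wins).
    out = list(a)
    for local in private_writes:
        for k, v in local.items():
            out[k] = v
    return out, True


def independent(i, read, write):
    write(i, read(i) * 2)          # each iteration touches only element i


def chained(i, read, write):
    write(i + 1, read(i) + 1)      # iteration i feeds iteration i + 1


def with_temp(i, read, write):
    write(0, read(i + 1) * 2)      # element 0 is a per-iteration temporary
    write(i + 1, read(0) + 1)
```

In this sketch, `independent` and `with_temp` pass the test (the latter only because the temporary is always written before it is read within an iteration, so it can be privatized), while `chained` fails and is re-executed serially.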
