A Unified Approach to Speculative Parallelization of Loops in DSM Multiprocessors (1998)

by Ye Zhang, Lawrence Rauchwerger
http://www.cs.tamu.edu/faculty/rwerger/pubs/unfied_tr1546.ps.gz

Abstract:

Speculative parallel execution of statically non-analyzable codes on Distributed Shared-Memory (DSM) multiprocessors is challenging because of the long memory latencies and the distributed memory involved. However, such an approach may well be the best way of speeding up codes whose dependences cannot be analyzed by the compiler. In this paper, we extend past work by proposing a hardware scheme for the speculative parallel execution of loops that have a modest number of cross-iteration dependences. In this case, when a dependence violation is detected, we locally repair the state. Then, depending on the situation, we either re-execute one out-of-order iteration or restart parallel execution from that point on. The general algorithm, called the Unified Privatization and Reduction algorithm (UPAR), privatizes on demand at cache-line level, executes reductions in parallel, and merges the last values and partial results of reductions on the fly, leaving minimal residual work at loop end. UPAR allows for completely dynamic scheduling and does not slow down if the working set of an iteration is larger than the cache size. Simulations indicate good speedups relative to sequential execution. The hardware support for reduction optimizations brings, on average, a 50% performance improvement and can be used in both speculative and normal execution.
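
The idea of executing a reduction in parallel on private copies and then merging the partial results can be illustrated with a small software analogy. The sketch below is not the UPAR hardware scheme described in the abstract: it privatizes a whole array per worker rather than individual cache lines on demand, performs no speculation or dependence checking, and merges only at loop end rather than on the fly. The function names, the histogram-style loop body, and the cyclic assignment of iterations to workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_histogram(indices, weights, num_bins):
    """Reference loop: hist[indices[i]] += weights[i], run in order."""
    hist = [0] * num_bins
    for idx, w in zip(indices, weights):
        hist[idx] += w
    return hist


def privatized_histogram(indices, weights, num_bins, num_workers=4):
    """Same reduction, with one private accumulator per worker.

    Hypothetical helper for illustration only; it assumes the update
    operator (integer addition here) is associative and commutative.
    """
    def run_chunk(worker_id):
        # Each worker updates only its own private copy, so the
        # cross-iteration updates to the shared array never conflict.
        private = [0] * num_bins
        for i in range(worker_id, len(indices), num_workers):
            private[indices[i]] += weights[i]
        return private

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(run_chunk, range(num_workers)))

    # Residual work at loop end: merge the partial results.
    merged = [0] * num_bins
    for private in partials:
        for b, value in enumerate(private):
            merged[b] += value
    return merged


if __name__ == "__main__":
    idx = [0, 2, 2, 1, 0, 2, 3, 1]
    w = [1, 2, 3, 4, 5, 6, 7, 8]
    print(privatized_histogram(idx, w, num_bins=4))
```

Because every worker writes only to its private accumulator, the result does not depend on how iterations are scheduled, which is the property that makes reductions safe to run in parallel. The final merge loop is the kind of residual work the abstract says UPAR keeps small by merging during execution.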
