by Vikram S. Adve, Ewa Deelman, Rizos Sakellariou
Download: ftp://pcl.cs.ucla.edu/pub/papers/jpdc_adve.ps
Abstract:
In this paper, we propose and evaluate practical, automatic techniques that exploit compiler analysis to facilitate simulation of very large message-passing systems. We use compiler techniques and a compiler-synthesized static task graph model to identify the subset of the computations whose values have no significant effect on the performance of the program, and to generate symbolic estimates of the execution times of these computations. For programs with regular computation and communication patterns, this information allows us to avoid executing or simulating large portions of the computational code during the simulation. It also allows us to avoid performing some of the message data transfers, while still simulating the message performance in detail. We have used these techniques to integrate the MPISim parallel simulator at UCLA with the Rice dHPF compiler infrastructure. We evaluate the accuracy and benefits of these techniques for three standard message-passing benchmarks on a wide range of problem and system sizes. The optimized simulator has errors of less than 16% compared with direct program measurement in all the cases we studied, and typically much smaller errors. Furthermore, it requires 5 to 2000 times less memory and up to 10 times less execution time than the original simulator. These dramatic savings allow us to simulate regular message-passing programs on systems and problem sizes 10 to 100 times larger than is possible with the original simulator or other current state-of-the-art simulators.
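The core idea in the abstract, replacing computation with a symbolic time estimate while still modeling message timing, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `compute_cost`, `message_cost`, and `simulate`, the cost formulas, and the constants are hypothetical and are not taken from MPISim or dHPF.

```python
from dataclasses import dataclass


@dataclass
class SimProcess:
    rank: int
    clock: float = 0.0  # simulated time in seconds


def compute_cost(n: int, p: int, time_per_point: float = 1e-8) -> float:
    """Hypothetical symbolic estimate for one sweep over an n x n grid
    block-distributed across p processes (assumed cost model)."""
    return time_per_point * (n * n) / p


def message_cost(num_bytes: int, latency: float = 1e-5,
                 bandwidth: float = 1e8) -> float:
    """Assumed latency/bandwidth model; no payload is ever allocated or copied."""
    return latency + num_bytes / bandwidth


def simulate_step(procs: list, n: int) -> None:
    p = len(procs)
    # Compute phase: advance each clock by the estimate instead of
    # executing the numerical kernel or allocating its arrays.
    for proc in procs:
        proc.clock += compute_cost(n, p)
    # Communication phase: each process sends one boundary row of n
    # doubles to its right neighbour in a ring. Only timing is modeled.
    arrival = [proc.clock + message_cost(8 * n) for proc in procs]
    for proc in procs:
        left = (proc.rank - 1) % p
        proc.clock = max(proc.clock, arrival[left])


def simulate(n: int, p: int, steps: int) -> float:
    """Return the predicted completion time of the slowest process."""
    procs = [SimProcess(rank) for rank in range(p)]
    for _ in range(steps):
        simulate_step(procs, n)
    return max(proc.clock for proc in procs)
```

Because the sketch stores only one clock per simulated process, its memory use does not grow with the problem size `n`, which is the kind of saving the abstract attributes to skipping the computational code. A call such as `simulate(1000, 4, 10)` returns a predicted time in seconds under the assumed cost model.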