Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on scalability, but have been less successful in practice at predicting detailed, quantitative information about program performance. We develop a conceptually simple model that provides detailed performance prediction for parallel programs with arbitrary task graphs, a wide variety of task scheduling policies, shared-memory communication, and significant resource contention. Unlike many previous models, our model assumes deterministic task execution times, which permits detailed analysis of synchronization, task scheduling, and the order of task execution, as well as of mean values of communication costs. The assumption of deterministic task times is supported by a recent study of the influence of non-deterministic delays in parallel programs. We show that the deterministic task graph model is accurate and efficient for five shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly non-uniform task times, and significant communication and resource contention. We also use three example programs to illustrate the predictive capabilities of the model. In two cases, broad insights and detailed metrics from the model are used to suggest improvements in load balancing, and the model quickly and accurately predicts the impact of these changes. In the third case, further novel metrics are used to obtain insight into the impact of program design changes that improve communication locality as well as load balancing. Finally, we briefly present results of a comparison between our model and representative models based on stochastic task execution times.
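To make the idea of a deterministic task graph concrete, the following is a minimal sketch, not the model developed in the paper. It assumes fixed task times, a FIFO ready queue, and a fixed number of identical processors, and it ignores communication costs and resource contention entirely. The function name `makespan` and the example graph are hypothetical, chosen only for illustration.

```python
import heapq
from collections import deque


def makespan(task_time, deps, num_procs):
    """Completion time of a task graph with fixed (deterministic) task times.

    task_time: dict mapping task name -> execution time
    deps:      dict mapping task name -> list of predecessor task names
    num_procs: number of identical processors
    Assumption: ready tasks are dispatched to idle processors in FIFO order.
    """
    # Count unfinished predecessors and build successor lists.
    remaining = {t: len(deps.get(t, [])) for t in task_time}
    succs = {t: [] for t in task_time}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)

    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    running = []  # min-heap of (finish_time, task)
    now = 0.0
    finished = 0

    while finished < len(task_time):
        # Start as many ready tasks as there are idle processors.
        while ready and len(running) < num_procs:
            t = ready.popleft()
            heapq.heappush(running, (now + task_time[t], t))
        if not running:
            raise ValueError("task graph contains a cycle")
        # Advance time to the next task completion.
        now, done = heapq.heappop(running)
        finished += 1
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return now


# Hypothetical fork-join graph: a -> {b, c, d} -> e
times = {"a": 1, "b": 2, "c": 3, "d": 2, "e": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["a"], "e": ["b", "c", "d"]}
for p in (1, 2, 3):
    print(p, makespan(times, deps, p))
```

Because task times are fixed, the order of task execution and the idle time on each processor follow directly from the simulation. This is the kind of detail that a stochastic model typically summarizes only through means or distributions.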