Abstract:
In today's high-performance NUMA (Non-Uniform Memory Access) multiprocessors with a memory hierarchy or distributed memory, the partitioning and distribution of the data associated with parallel computations affect both the amount of parallelism that can be exploited and the amount of data movement in the system. The objective of this research is to study and evaluate compile-time data management techniques that enhance parallelism and improve locality of memory reference for large scientific programs written in Fortran. Our first step is to reduce the amount of shared data through privatization. Privatization is a technique that allocates a separate copy of a shared variable in the private storage of each processor, so that each processor accesses a distinct instance of the variable. Privatization can enhance the inherent parallelism of a program by eliminating memory-related anti- and output dependences. It can also improve locality of reference, since accessing a private variable is inherently local and communication-free. We present our algorithm for array privatization and the results of our experiment on the effectiveness of the algorithm. For the remaining shared data, we introduce a new concept, the placement matrix, and show its application in deriving data alignment and data decomposition to reduce communication. We also incorporate the ratio of communication to computation in our evaluation of different data partitions. Work is continuing on heuristics for data distribution and on the implementation of the tools.
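The effect of privatization described in the abstract can be illustrated with a minimal sketch. This is not the array privatization algorithm presented in the paper, and it is written in Python rather than Fortran for brevity. The function names, the work array `tmp`, and the doubling computation are all hypothetical. The sketch assumes the simplest legal case: every element of `tmp` is written before it is read within the same outer iteration, so no value flows from one iteration to the next.

```python
from concurrent.futures import ThreadPoolExecutor


def row_sums_shared(grid):
    """Sequential loop with one work array shared by all outer iterations.

    Each iteration overwrites `tmp`, so iteration i+1 has output and anti
    dependences on iteration i, although no value is carried between them.
    """
    n_cols = len(grid[0])
    tmp = [0.0] * n_cols              # single shared work array
    out = []
    for row in grid:
        for j in range(n_cols):
            tmp[j] = 2.0 * row[j]     # written before any read in this iteration
        out.append(sum(tmp))          # read only values written just above
    return out


def row_sum_private(row):
    """Body of one outer iteration with its own private copy of `tmp`."""
    tmp = [2.0 * x for x in row]      # private instance, local to this call
    return sum(tmp)


def row_sums_privatized(grid, workers=4):
    """Outer iterations are independent once `tmp` is private, so they can
    be handed to separate workers without synchronizing on `tmp`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_sum_private, grid))
```

Both versions compute the same result for any rectangular `grid`. The difference is that the privatized version has no storage shared between iterations, which is what removes the memory-related dependences and keeps every access to `tmp` local.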