by Gerald Roth, John Mellor-Crummey, Ken Kennedy, R. Gregg Brickner
In Proceedings of SC '97: High Performance Networking and Computing
http://www.supercomp.org/sc97/proceedings/TECH/ROTH/ROTH.PS
Abstract:
For many Fortran90 and HPF programs performing dense matrix computations, the main computational portion of the program belongs to a class of kernels known as stencils. Stencil computations are commonly used in solving partial differential equations, image processing, and geometric modeling. The efficient handling of such stencils is critical for achieving high performance on distributed-memory machines. Compiling stencils into efficient code is viewed as so important that some companies have built special-purpose compilers for handling them and others have added stencil recognizers to existing compilers. In this paper we present a general compilation strategy for stencils written using Fortran90 array constructs. Our strategy is capable of optimizing single-statement or multistatement stencils and applies equally well to stencils specified with shift intrinsics and to stencils specified with array syntax. The strategy eliminates the need for pattern-recognition algorithms by orchestrating a set of optimizations that address the overhead of both intraprocessor and interprocessor data movement that results from the translation of Fortran90 array constructs. Our experimental results show that code produced by this strategy matches or outperforms the best code produced by the special-purpose compilers or pattern-recognition schemes that are known to us. In addition, our strategy produces highly optimized code in situations where the others fail, yielding performance improvements of several orders of magnitude, and thus provides a stencil compilation strategy that is more robust than its predecessors.
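To make the two specification styles in the abstract concrete, here is a minimal sketch of one five-point stencil written both ways. It is an illustration only, not code from the paper: it is written in Python rather than Fortran90, and the helper `cshift`, the function names, and the coefficients `c0` and `c1` are assumptions introduced for this example. `cshift` mimics the circular-shift semantics of the Fortran90 CSHIFT intrinsic, and the second function spells out the elementwise loop that an array-section assignment over the interior (for example `B(2:N-1,2:N-1) = ...`) denotes.

```python
def cshift(grid, shift, dim):
    """Circular shift of a 2-D list, mimicking Fortran90 CSHIFT(array, shift, dim).

    For dim == 1, result[i][j] is grid[i + shift][j], wrapping around the rows;
    for dim == 2, result[i][j] is grid[i][j + shift], wrapping around the columns.
    """
    rows, cols = len(grid), len(grid[0])
    if dim == 1:
        return [[grid[(i + shift) % rows][j] for j in range(cols)]
                for i in range(rows)]
    return [[grid[i][(j + shift) % cols] for j in range(cols)]
            for i in range(rows)]


def stencil_with_shifts(a, c0, c1):
    """Five-point stencil in the shift-intrinsic style (whole-array operations)."""
    up, down = cshift(a, -1, 1), cshift(a, 1, 1)
    left, right = cshift(a, -1, 2), cshift(a, 1, 2)
    rows, cols = len(a), len(a[0])
    return [[c0 * a[i][j]
             + c1 * (up[i][j] + down[i][j] + left[i][j] + right[i][j])
             for j in range(cols)]
            for i in range(rows)]


def stencil_with_sections(a, c0, c1):
    """Same stencil in the array-section style, applied to the interior only.

    Boundary elements are left at 0.0 because the sections do not cover them.
    """
    rows, cols = len(a), len(a[0])
    b = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            b[i][j] = (c0 * a[i][j]
                       + c1 * (a[i - 1][j] + a[i + 1][j]
                               + a[i][j - 1] + a[i][j + 1]))
    return b


if __name__ == "__main__":
    # Arbitrary test data, chosen only for this illustration.
    a = [[float(i + 4 * j) for j in range(4)] for i in range(4)]
    shifted = stencil_with_shifts(a, 0.5, 0.125)
    sectioned = stencil_with_sections(a, 0.5, 0.125)
    interior_matches = all(shifted[i][j] == sectioned[i][j]
                           for i in range(1, 3) for j in range(1, 3))
    print("interior results agree:", interior_matches)
```

The two forms agree on interior elements and differ only at the boundary, where the circular shifts wrap around. In the shift style, each call to `cshift` builds a full temporary copy of the array, which loosely illustrates the intraprocessor data movement the abstract refers to; on a distributed-memory machine the same shifts would also imply interprocessor communication.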