by Naraig Manjikian, Tarek S. Abdelrahman
In Proceedings of the International Conference on Parallel Processing (ICPP '96)
http://www.eecg.toronto.edu/~tsa/icpp96.ps
Abstract:
Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal, which restricts parallelism to wavefronts in the tiled iteration space. For a small number of processors, wavefront parallelism can be efficiently exploited using dynamic self-scheduling with a large tile size. Such a strategy enhances intratile locality but does not necessarily enhance intertile locality. We show that dynamic self-scheduling performs poorly on scalable shared-memory multiprocessors, where smaller tiles are necessary to provide sufficient parallelism and, as a result, intertile locality becomes more important. We propose static scheduling strategies that enhance intertile locality for small tiles. Results of experiments on a Convex SPP1000 multiprocessor demonstrate that our strategies outperform dynamic self-scheduling by a factor of up to 2.3 on 30 processors.