Scheduling of wavefront parallelism on scalable shared-memory multiprocessors (1996) [13 citations, 1 self]

by Naraig Manjikian, Tarek S. Abdelrahman
In Proceedings of the International Conference on Parallel Processing (ICPP '96)
http://www.eecg.toronto.edu/~tsa/icpp96.ps

Abstract:

Tiling exploits temporal reuse carried by an outer loop of a loop nest to enhance cache locality. Loop skewing is typically required to make tiling legal, which restricts parallelism to wavefronts in the tiled iteration space. For a small number of processors, wavefront parallelism can be exploited efficiently using dynamic self-scheduling with a large tile size. Such a strategy enhances intratile locality, but does not necessarily enhance intertile locality. We show that dynamic self-scheduling performs poorly on scalable shared-memory multiprocessors, where smaller tiles are necessary to provide sufficient parallelism; smaller tiles place greater importance on intertile locality. We propose static scheduling strategies which enhance intertile locality for small tiles. Results of experiments on a Convex SPP1000 multiprocessor demonstrate that our strategies outperform dynamic self-scheduling by a factor of up to 2.3 on 30 processors.
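
To illustrate the terms used in the abstract, here is a minimal sketch of wavefront ordering over a two-dimensional grid of tiles, together with one possible static tile-to-processor mapping. It is not the scheduling strategy proposed in the paper. It assumes that tile (i, j) depends only on tiles (i-1, j) and (i, j-1), so that all tiles with the same i + j can run in parallel. The function names and the row-cyclic mapping are hypothetical, chosen only for illustration.

```python
def wavefronts(n_rows, n_cols):
    """Group tile coordinates (i, j) by wavefront index w = i + j.

    Assumption: tile (i, j) depends only on (i-1, j) and (i, j-1),
    so every tile in wavefront w is independent of the others in w.
    """
    fronts = [[] for _ in range(n_rows + n_cols - 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            fronts[i + j].append((i, j))
    return fronts


def static_owner(tile, n_procs):
    """Hypothetical static mapping: tile row i always runs on processor i % n_procs.

    A fixed mapping like this keeps horizontally adjacent tiles on the
    same processor, which is one simple way to favor intertile reuse.
    """
    i, _ = tile
    return i % n_procs


def schedule(n_rows, n_cols, n_procs):
    """Return one dict per wavefront, mapping processor id -> list of tiles."""
    plan = []
    for front in wavefronts(n_rows, n_cols):
        per_proc = {}
        for tile in front:
            per_proc.setdefault(static_owner(tile, n_procs), []).append(tile)
        plan.append(per_proc)
    return plan


if __name__ == "__main__":
    # Print the per-wavefront assignment for a 3x3 tile grid on 2 processors.
    for w, per_proc in enumerate(schedule(3, 3, 2)):
        print(w, per_proc)
```

Under dynamic self-scheduling, by contrast, an idle processor would take whichever tile of the current wavefront is next in a shared queue, so the tiles a given processor executes in successive wavefronts need not be neighbors.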

Citations

676 A data locality optimizing algorithm – Wolf, Lam - 1991
291 Compiler transformations for high-performance computing – Bacon, Graham, et al. - 1994
180 Guided self-scheduling: A Practical Scheduling Scheme for Parallel Computers – Polychronopoulos, Kuck - 1987
131 Improving Locality and Parallelism in Nested Loops – Wolf - 1992
127 Using processor affinity in loop scheduling on shared-memory multiprocessors – Markatos, LeBlanc - 1994
93 Factoring: A Method for Scheduling Parallel Loops – Hummel, Schonberg, et al. - 1992
91 Ultracomputers: A Teraflop Before its Time – Bell - 1992
60 Constructive methods for scheduling uniform loop nests – Darte, Robert - 1994
47 Fusion of loops for parallelism and locality – Manjikian, Abdelrahman - 1995
23 Compiler cache optimizations for banded matrix problems – Li - 1995
16 The NUMAchine multiprocessor – Vranesic, et al. - 1995
14 Locality and Loop Scheduling on NUMA Multiprocessors – Li, Tandri, et al. - 1993
6 The Stanford FLASH Multiprocessor – Heinrich, et al. - 1994