  Communicated by (Name of Editor)

by Wesley K. Kaplow, Boleslaw K. Szymanski
http://www.cs.rpi.edu/~kaploww/spdp.ps

Abstract:

Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining the tile size that achieves the highest cache hit rate. The widening disparity between the processor's peak instruction rate and the main memory access time in modern processors makes this kind of optimization increasingly important for overall program efficiency. Our technique identifies potential cache misses through compile-time analysis of a loop nest and then processes them on an architecturally accurate cache model. Processing only a small portion of the memory reference trace of a program yields simulation speeds in the millions of memory references per second on workstations, with statistics of misses per reference and inter-reference interference counts gathered at the same time. We discuss the results of applying this method to guide loop tiling for such commonly used computational kernels as matrix multiplication and Jacobi iteration for various cache parameters.
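The abstract does not reproduce the algorithm itself, so the following is only a rough sketch of the general idea of choosing a tile size by cache simulation. It is not the authors' miss-driven method: it replays the full reference trace of a tiled matrix multiplication through a simple direct-mapped cache model, whereas the paper processes only the potential misses. The function names, the back-to-back row-major array layout, and the default cache parameters are all assumptions made for illustration.

```python
def simulate_tiled_matmul(n, tile, cache_bytes=1024, line_bytes=32, elem_bytes=8):
    """Return (misses, references) for a tiled n x n matrix multiply.

    Models a direct-mapped cache. All parameters are illustrative
    assumptions, not values taken from the paper.
    """
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # memory block currently held by each cache line
    misses = 0
    refs = 0

    # Assumed layout: A, B and C stored back to back in row-major order.
    base_a = 0
    base_b = n * n * elem_bytes
    base_c = 2 * n * n * elem_bytes

    def access(addr):
        nonlocal misses, refs
        refs += 1
        block = addr // line_bytes   # memory block containing this address
        index = block % num_lines    # direct-mapped placement
        if tags[index] != block:     # cold or conflict miss
            tags[index] = block
            misses += 1

    # Tiled C[i][j] += A[i][k] * B[k][j]; only the address stream is modeled.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        access(base_a + (i * n + k) * elem_bytes)
                        for j in range(jj, min(jj + tile, n)):
                            access(base_b + (k * n + j) * elem_bytes)
                            access(base_c + (i * n + j) * elem_bytes)
    return misses, refs


def best_tile(n, candidates, **cache_params):
    """Simulate each candidate tile size and return the one with fewest misses."""
    results = {t: simulate_tiled_matmul(n, t, **cache_params) for t in candidates}
    best = min(results, key=lambda t: results[t][0])
    return best, results


if __name__ == "__main__":
    best, results = best_tile(64, [4, 8, 16, 32, 64], cache_bytes=8192)
    for t, (m, r) in results.items():
        print(f"tile={t:3d}  misses={m}  refs={r}  miss ratio={m / r:.4f}")
    print("tile size with fewest misses:", best)
```

Replaying every reference in this way is slow for realistic problem sizes, which is the cost that simulating only the compile-time-identified potential misses is meant to avoid.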
