Abstract:
Tiling has been used by parallelizing compilers to define fine-grain parallel tasks and to optimize cache performance. In this paper we present a novel compile-time technique, called miss-driven cache simulation, for determining the tile size that achieves the highest cache hit rate. The widening disparity between the processor's peak instruction rate and the main memory access time in modern processors makes this kind of optimization increasingly important for overall program efficiency. Our technique identifies potential cache misses through compile-time analysis of a loop nest and then processes them on an architecturally accurate cache model. Because only a small portion of a program's memory reference trace is processed, simulation speeds reach millions of memory references per second on workstations, and statistics on misses per reference and inter-reference interference counts are gathered at the same time. We discuss the results of applying this method to guide loop tiling for commonly used computational kernels, such as matrix multiplication and Jacobi iteration, under various cache parameters.

