by Somnath Ghosh, Margaret Martonosi, Sharad Malik
In Proceedings of the 1997 ACM International Conference on Supercomputing
http://www.ee.princeton.edu/~sghosh/psfiles/wshop97.ps
Abstract:
With the widening performance gap between processors and main memory, efficient memory referencing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techniques are often used to transform codes in ways that improve memory performance. In either case, however, effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This paper describes methods for generating and solving Cache Miss Equations that give a detailed representation of the cache misses in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. Mathematical techniques for manipulating Diophantine equations allow us to compute the number of possible solutions, where each solution corresponds to a potential cache miss. These equations provide a general framework to guide code optimizations for improving cache performance. The paper gives an example of their use to determine array padding and offset amounts that minimize cache misses. Overall, these equations represent an analysis framework that is more precise than traditional memory behavior heuristics, but that can also be faster than simulation.
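The idea that each solution of a conflict equation corresponds to a potential cache miss, and that padding can remove those solutions, can be illustrated with a small sketch. It is not the paper's SUIF implementation: it enumerates iterations by brute force rather than counting Diophantine solutions symbolically, and it assumes a direct-mapped cache and a loop that reads `a[i]` and `b[i]` in each iteration. The function names (`count_conflicts`, `best_padding`) and the cache parameters (8 KB cache, 32-byte lines, 8-byte elements) are hypothetical choices made for the example.

```python
def count_conflicts(base_a, base_b, n, elem=8, cache_bytes=8192, line_bytes=32):
    """Count iterations i in [0, n) where a[i] and b[i] lie on different
    memory lines that map to the same set of a direct-mapped cache.

    Each such i is a solution of the conflict condition
        line(a[i]) mod num_sets == line(b[i]) mod num_sets,
    that is, a potential conflict miss (assumed model, not the paper's exact equations).
    """
    num_sets = cache_bytes // line_bytes
    conflicts = 0
    for i in range(n):
        line_a = (base_a + elem * i) // line_bytes
        line_b = (base_b + elem * i) // line_bytes
        if line_a != line_b and line_a % num_sets == line_b % num_sets:
            conflicts += 1
    return conflicts


def best_padding(base_a, n, elem=8, cache_bytes=8192, line_bytes=32, max_pad_lines=8):
    """Try padding b by 0..max_pad_lines cache lines after a and return the
    smallest padding (in bytes) that minimizes the number of conflicts."""
    array_bytes = n * elem
    candidates = [p * line_bytes for p in range(max_pad_lines + 1)]
    return min(
        candidates,
        key=lambda pad: count_conflicts(
            base_a, base_a + array_bytes + pad, n, elem, cache_bytes, line_bytes
        ),
    )


if __name__ == "__main__":
    n = 1024  # 1024 doubles = 8192 bytes, exactly one cache size
    unpadded = count_conflicts(0, n * 8, n)
    pad = best_padding(0, n)
    padded = count_conflicts(0, n * 8 + pad, n)
    print(unpadded, pad, padded)
```

In this configuration the two arrays sit exactly one cache size apart, so every iteration satisfies the conflict condition; shifting `b` by a single cache line leaves no solutions. An analytical approach like the one the abstract describes would aim to reach the same count without visiting every iteration.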