  Compiler-Controlled Cache Mapping Rules (1995) [9 citations — 0 self]

by Robert A. Wagner
ftp://ftp.cs.duke.edu/pub/dist/techreport/1995/1995-31.ps.gz

Abstract:

The gap between memory speed and CPU speed in current RISC machines is often bridged with one or more levels of set-associative cache. In programs that operate on dense matrices, performance is often limited by memory reference times rather than by the apparent arithmetic complexity. Attempts to improve performance by blocking the loops of the code may fail to achieve the promised speed, because the submatrices they try to keep in cache exhibit self-interference: many distinct cache lines of some or all of the submatrices map into the same cache association set. This can cause essentially every reference to such cache lines to produce a cache miss. A new cache design is presented. This design allows the compiler to program the mappings used to select cache association sets from virtual addresses, so that a different mapping can be used for each array. It is shown how instructions to program such mappings can be inserted into straightforward loop-blocking code to allocate the cache so that distinct arrays referenced during the innermost loop nests occupy different areas of the cache, and so that these array blocks are mapped in such a way that no array exhibits self-interference. These blocks then remain co-resident in the cache during such loops. Any compiler that automatically "blocks" loops for increased locality can easily generate the proposed instructions.
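
The self-interference described in the abstract can be illustrated with a small sketch. All parameters here are assumptions chosen for illustration (32-byte lines, 128 sets, 2-way associativity, a row-major matrix of 8-byte elements with a leading dimension of 1024), and `packed_block_set` is a hypothetical per-array mapping invented for this example; it is not the hardware mapping proposed in the report.

```python
from collections import Counter

# Assumed cache geometry (illustrative only, not taken from the report).
LINE_BYTES = 32
NUM_SETS = 128
WAYS = 2
ELEM_BYTES = 8  # one double-precision element


def conventional_set(addr):
    """Usual set selection: low-order bits of the line number."""
    return (addr // LINE_BYTES) % NUM_SETS


def block_line_addresses(base, leading_dim, rows, cols):
    """Distinct cache-line addresses touched by a rows x cols block
    of a row-major matrix that starts at byte address `base`."""
    lines = set()
    for i in range(rows):
        for j in range(cols):
            addr = base + (i * leading_dim + j) * ELEM_BYTES
            lines.add(addr // LINE_BYTES * LINE_BYTES)
    return sorted(lines)


def make_packed_block_set(base, leading_dim, cols):
    """Hypothetical per-array mapping: number the block's lines
    consecutively, row by row, so they spread over distinct sets.
    Assumes the block starts on a line boundary."""
    row_bytes = leading_dim * ELEM_BYTES
    lines_per_row = cols * ELEM_BYTES // LINE_BYTES

    def packed_block_set(addr):
        row, offset = divmod(addr - base, row_bytes)
        return (row * lines_per_row + offset // LINE_BYTES) % NUM_SETS

    return packed_block_set


def max_lines_per_set(line_addrs, set_of):
    """Largest number of the given lines that land in any one set."""
    return max(Counter(set_of(a) for a in line_addrs).values())


# A 16 x 16 block (2 KB) inside a matrix with 1024 elements per row.
lines = block_line_addresses(base=0, leading_dim=1024, rows=16, cols=16)
packed = make_packed_block_set(base=0, leading_dim=1024, cols=16)

worst_conventional = max_lines_per_set(lines, conventional_set)
worst_packed = max_lines_per_set(lines, packed)
print(len(lines), worst_conventional, worst_packed, WAYS)
```

Under these assumptions the block touches 64 lines, far fewer than the 256 lines the cache holds, yet the conventional mapping sends 16 of them to each of only 4 sets. With 2 ways per set the block cannot stay resident, which is the self-interference the abstract describes. The hypothetical packed mapping places at most one line in each set.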

Citations

503 The cache performance and optimizations of blocked algorithms – Lam, Rothberg, et al. - 1991
12 Impact of cache interferences on usual numerical dense loop nests – Temam, Fricker, et al. - 1993
8 On Reconfigurable On-Chip Data Caches – Dahlgren, Stenstrom - 1991
3 Increased memory performance during vector accesses through the use of linear address transformations – Harper - 1992
1 Using set-associative caches deterministically to achieve near peak performance on RS6000 machines – Wagner - 1995