by Jim Pierce, Trevor Mudge
http://www.eecg.toronto.edu/~corinna/courses/ece1758/Pierce.micro29.ps
Abstract:
Instruction cache misses can severely limit the performance of both superscalar processors and high-speed sequential machines. Instruction prefetch algorithms attempt to reduce this performance degradation by bringing lines into the instruction cache before the CPU fetch unit needs them. Several such algorithms have been proposed, most notably next-line prefetching and target prefetching. We propose a new scheme called wrong-path prefetching, which combines next-line prefetching with prefetching of all control instruction targets regardless of the predicted direction of conditional branches. The algorithm substantially reduces the cycles lost to instruction cache misses while somewhat increasing memory traffic. Wrong-path prefetching performs better than the other prefetch algorithms studied in all of the cache configurations examined, while requiring little additional hardware. For example, the best wrong-path prefetch algorithm can yield a speedup of 16% with an 8K instruction cache. In fact, an 8K wrong-path prefetched instruction cache is shown to achieve the same miss rate as a 32K cache without prefetching. Finally, it is shown that wrong-path prefetching is applicable both to multi-issue machines and to machines with a long L1 miss latency.
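The core idea (next-line prefetching plus prefetching every control-instruction target, whichever way the branch goes) can be illustrated with a toy trace-driven simulator. This is a minimal sketch, not the authors' simulator: the tag-only direct-mapped cache, the 32-byte lines, the `(pc, target)` trace format, and the assumption that a prefetch completes one fetch after it is issued are all illustrative choices, and the names `DirectMappedICache`, `simulate`, and `loop_trace` are hypothetical. Predictor-driven target prefetching is deliberately not modeled.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical trace record: (pc, target). target is None for ordinary
# instructions and the static target address for a control instruction.
Record = Tuple[int, Optional[int]]


class DirectMappedICache:
    """Tag-only model of a direct-mapped instruction cache."""

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 32) -> None:
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // line_bytes
        self.tags: List[Optional[int]] = [None] * self.num_sets

    def _index_tag(self, addr: int) -> Tuple[int, int]:
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def contains(self, addr: int) -> bool:
        index, tag = self._index_tag(addr)
        return self.tags[index] == tag

    def fill(self, addr: int) -> None:
        index, tag = self._index_tag(addr)
        self.tags[index] = tag  # evicts whatever line held this set


def simulate(trace: Iterable[Record], policy: str = "wrong_path",
             size_bytes: int = 8 * 1024, line_bytes: int = 32) -> Tuple[int, int]:
    """Return (demand_misses, lines_prefetched) for one policy.

    Assumption: a prefetch issued on one fetch completes just after the
    next fetch (a one-fetch latency); bandwidth limits are ignored.
    """
    if policy not in ("none", "next_line", "wrong_path"):
        raise ValueError(f"unknown policy: {policy}")
    cache = DirectMappedICache(size_bytes, line_bytes)
    misses = prefetched = 0
    pending: List[int] = []
    for pc, target in trace:
        # 1. Demand fetch of the current instruction.
        if not cache.contains(pc):
            misses += 1
            cache.fill(pc)
        # 2. Prefetches issued on the previous fetch arrive now.
        for addr in pending:
            if not cache.contains(addr):
                cache.fill(addr)
                prefetched += 1
        # 3. Issue new prefetches for this fetch.
        pending = []
        if policy in ("next_line", "wrong_path"):
            pending.append(pc + line_bytes)  # next sequential line
        if policy == "wrong_path" and target is not None:
            pending.append(target)  # branch target, regardless of direction
    return misses, prefetched


def loop_trace() -> List[Record]:
    """Two passes over a branch at 0x1C: not taken first, then taken."""
    body: List[Record] = [(pc, None) for pc in range(0x00, 0x1C, 4)]
    branch: Record = (0x1C, 0x400)
    first = body + [branch, (0x20, None), (0x24, None)]     # falls through
    second = body + [branch, (0x400, None), (0x404, None)]  # branch taken
    return first + second


if __name__ == "__main__":
    trace = loop_trace()
    for policy in ("none", "next_line", "wrong_path"):
        print(policy, simulate(trace, policy))
    # none (3, 0)
    # next_line (2, 3)
    # wrong_path (1, 4)
```

In this toy trace, wrong-path prefetching brings in the target line while the branch falls through, so the later taken-branch fetch hits; the cost is one more prefetched line than next-line prefetching alone. This mirrors, on a tiny scale, the miss-versus-traffic trade-off the abstract describes.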