Abstract:
A relatively small set of static instructions has significant leverage on program execution performance. These problem instructions account for a disproportionate number of cache misses and branch mispredictions because their behavior cannot be accurately anticipated by existing prefetching or branch prediction mechanisms. The behavior of many problem instructions can be predicted by executing a small code fragment called a speculative slice. If a speculative slice is executed before the corresponding problem instructions are fetched, the problem instructions can move smoothly through the pipeline, because the slice has already tolerated the latency of the memory hierarchy (for loads) or of the pipeline (for branches). This technique yields speedups of up to 43 percent over an aggressive baseline machine. To benefit from the branch predictions that speculative slices generate, each prediction must be bound to a specific dynamic branch instance. We present a technique that monitors the program's execution path and invalidates predictions once it can be determined that they will not be used. This enables the remaining predictions to be correctly correlated with their branch instances.
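The prediction-binding idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper describes: the class `PredictionCorrelator`, the `kill_map` table, and the example PCs are all hypothetical names introduced here. The model assumes that slice-generated predictions are queued per static branch, and that fetching certain "kill" PCs proves the oldest queued prediction for a branch will never be consumed.

```python
from collections import defaultdict, deque


class PredictionCorrelator:
    """Toy model of binding slice-generated predictions to dynamic branches.

    Hypothetical illustration: kill_map maps a fetched PC to the branch PCs
    whose oldest pending prediction becomes useless once that PC is fetched.
    """

    def __init__(self, kill_map):
        self.kill_map = kill_map
        self.pending = defaultdict(deque)  # branch PC -> queued predictions

    def slice_produces(self, branch_pc, taken):
        # A speculative slice computed an outcome ahead of the main thread.
        self.pending[branch_pc].append(taken)

    def fetch(self, pc):
        # Invalidate predictions that the execution path shows are dead.
        for branch_pc in self.kill_map.get(pc, ()):
            if self.pending[branch_pc]:
                self.pending[branch_pc].popleft()
        # If pc is a problem branch with a live prediction, consume it.
        if pc in self.pending and self.pending[pc]:
            return self.pending[pc].popleft()
        return None  # fall back to the conventional predictor


# Assumed layout: branch at 0x40; fetching 0x60 means this iteration
# bypassed the branch, so its prediction must be discarded.
correlator = PredictionCorrelator({0x60: [0x40]})

correlator.slice_produces(0x40, True)
first = correlator.fetch(0x40)      # iteration 1 reaches the branch

correlator.slice_produces(0x40, False)
skipped = correlator.fetch(0x60)    # iteration 2 bypasses the branch

correlator.slice_produces(0x40, True)
third = correlator.fetch(0x40)      # iteration 3 reaches the branch again
```

Without the invalidation step, the third fetch of `0x40` would consume the stale prediction produced for the second iteration, and every later prediction would be off by one instance. Discarding the dead entry when `0x60` is fetched keeps the queue aligned with the branch instances that actually execute.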