Abstract:
For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch predictors and caches, contributes a large fraction of such performance-degrading events. This paper analyzes the dynamic instruction stream leading up to these performance-degrading instructions to identify the operations necessary to execute them early. The backward slice of such an instruction (the subset of the program that relates to the instruction), if small compared to the whole dynamic instruction stream, can be pre-executed to hide the instruction's latency. To overcome conservative dependence assumptions that result in large slices, speculation can be used, resulting in speculative slices. This paper provides an initial characterization of the backward slices of L2 data cache misses and branch mispredictions, and shows the effectiveness of techniques, including memory dependence prediction and control independence, for reducing the size of these slices. Through the use of these techniques, many slices can be reduced to less than one tenth of the full dynamic instruction stream when considering the 512 instructions before the performance-degrading instruction.
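To make the notion of a backward slice over a bounded window concrete, the following is a minimal sketch, not the paper's tooling. It assumes a simplified trace format in which every dynamic instruction lists the locations it writes and reads (register names, or resolved memory addresses written as strings); the `Instr` type and the `backward_slice` function are hypothetical names introduced for illustration. The walk goes backward from the target instruction, keeps a set of still-needed locations, and adds an instruction to the slice whenever it produces one of them. Control dependences and speculation are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """One dynamic instruction (hypothetical, simplified trace record)."""
    dests: Tuple[str, ...]  # locations written, e.g. ("r3",) or ("mem:0x40",)
    srcs: Tuple[str, ...]   # locations read


def backward_slice(trace: List[Instr], target: int, window: int = 512) -> List[int]:
    """Return ascending trace indices of the instructions, within the `window`
    instructions preceding `target`, that `target` transitively depends on
    through data (location) dependences only."""
    needed: Set[str] = set(trace[target].srcs)  # values the slice still requires
    members: List[int] = []
    first = max(0, target - window)
    for i in range(target - 1, first - 1, -1):
        instr = trace[i]
        if needed.intersection(instr.dests):
            # This instruction produces a needed value: it joins the slice,
            # its outputs are satisfied, and its inputs become needed.
            needed.difference_update(instr.dests)
            needed.update(instr.srcs)
            members.append(i)
    members.reverse()
    return members


# Illustrative trace: the final load (index 4) is the performance-degrading
# instruction; indices 1 and 3 do not contribute to its address.
example_trace = [
    Instr(dests=("r1",), srcs=("r9",)),        # 0: r1 = load [r9]
    Instr(dests=("r2",), srcs=("r2",)),        # 1: r2 = r2 + 1
    Instr(dests=("r3",), srcs=("r1",)),        # 2: r3 = r1 + 8
    Instr(dests=("r4",), srcs=("r5", "r6")),   # 3: r4 = r5 * r6
    Instr(dests=("r7",), srcs=("r3",)),        # 4: r7 = load [r3]
]

example_slice = backward_slice(example_trace, target=4)  # [0, 2]
slice_fraction = len(example_slice) / 4                  # 2 of the 4 preceding instructions
```

In this toy trace the slice covers half of the preceding instructions. The abstract's claim concerns the same ratio measured over a 512-instruction window, where speculation on memory and control dependences is what keeps the slice small.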