Modern processors come close to executing as fast as true dependences allow. The particular dependences that constrain execution speed constitute the critical path of execution. To optimize the performance of the processor, we must either shorten the critical path or execute it more efficiently. Either can be done more effectively if we know the actual instructions that constitute that path. This paper describes Critical Path Prediction, a technique for dynamically identifying instructions likely to be on the critical path, so that various processor optimizations can take advantage of this information. We present several possible critical path prediction techniques and apply critical path prediction to value prediction and clustered architecture scheduling. We show that critical path prediction has the potential to increase the effectiveness of these hardware optimizations by as much as 70%, without adding greatly to their cost.