To exploit the hardware parallelism offered by modern superscalar processors effectively, efficient code schedules must be generated that maximize the use of available machine resources. In the late 1960s, hardware techniques were proposed in the IBM 360/91 [Tom67] and CDC 6600 [Tho70] machines to perform this
672 | The program dependence graph and its use in optimization – Ferrante, Ottenstein, et al. - 1987
560 | Trace scheduling: A technique for global microcode compaction – Fisher - 1981
455 | Software Pipelining: An Effective Scheduling Technique for VLIW – Lam
455 | Design and evaluation of a compiler algorithm for prefetching – Mowry, Lam, et al. - 1992
333 | Limits of instruction-level parallelism – Wall - 1991
276 | An efficient algorithm for exploiting multiple arithmetic units – Tomasulo - 1967
274 | Lockup-free instruction fetch/prefetch cache organisation – Kroft - 1981
245 | Superscalar Microprocessor Design – Johnson - 1991
216 | Branch prediction strategies and branch target buffer design – Lee, Smith - 1984
205 | Limits of control flow on parallelism – Lam, Wilson - 1992
205 | Implementation of precise interrupts in pipelined processors – Smith, Pleszkun - 1985
182 | Available instruction-level parallelism for superscalar and superpipelined machines – Jouppi, Wall - 1989
151 | Reducing memory latency via nonblocking and prefetching caches – Chen, Baer - 1992
142 | Branch prediction for free – Ball, Larus - 1993
131 | Two-level adaptive branch prediction – Yeh, Patt - 1991
115 | Global instruction scheduling for superscalar machines – Bernstein, Rodeh - 1991
106 | The Expandable Split Window Paradigm for Exploiting Fine-Grain Parallelism – Franklin, Sohi - 1992
103 | Limits on multiple instruction issue – Smith, Johnson, et al. - 1989
83 | Complexity/performance tradeoffs with non-blocking loads – Farkas, Jouppi - 1994
72 | Measuring the Parallelism Available for Very Long Instruction Word Architectures – Nicolau, Fisher - 1984
69 | Efficient superscalar performance through boosting – Smith, Horowitz, et al. - 1992
67 | Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies – Nicolau - 1989
63 | Region scheduling: an approach for detecting and redistributing parallelism – Gupta, Soffa
56 | Detection and Parallel Execution of Independent Instructions – Tjaden, Flynn - 1970
53 | Instruction Issue Logic in Pipelined Supercomputers – Weiss, Smith - 1984
43 | Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain – Kerns, Eggers - 1993
32 | Organization of the Motorola 88110 superscalar RISC microprocessor – Diefendorff, Allen - 1992
28 | The Nonuniform Distribution of instruction-level and machine parallelism and its effect on performance – Jouppi - 1989
27 | A 0.6um BiCMOS processor with dynamic execution – Colwell, Steck - 1995
27 | Critical issues regarding HPS, a high performance microarchitecture – Patt, Melvin, et al. - 1985
27 | On the Limits of Program Parallelism and its Smoothability – Theobald, Gao, et al. - 1992
26 | A realistic resource-constrained software pipelining algorithm – Aiken, Nicolau - 1991
18 | Exploiting fine-grained parallelism through a combination of hardware and software techniques – Melvin, Patt - 1991
16 | Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors – Chang, Chen, et al. - 1991
16 | Avoidance and suppression of compensation code in a trace scheduling compiler – Freudenberger, Gross, et al. - 1994
11 | Design of a Computer: The Control Data 6600 – Thornton - 1971
9 | HPS, a new microarchitecture: Rationale and introduction – Patt, Hwu, et al. - 1985
6 | An Introduction to Compilation Issues for Parallel Machines – Gokhale, Carlson - 1992
3 | An efficient resource-constrained global scheduling technique for superscalar and VLIW processors – Moon, Ebcioglu - 1992
1 | An overview of the 21164 Alpha AXP microprocessor – Edmondson, Rubinfeld - 1994
1 | UltraSPARC, a new era in SPARC performance – Sun Microsystems - 1994