by Ramadass Nagarajan, Sundeep K. Kushwaha, Doug Burger, Kathryn S. McKinley, Calvin Lin, Stephen W. Keckler
ftp://ftp.cs.utexas.edu/pub/dburger/papers/PACT04.pdf
Abstract:
Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and variable on-chip latencies. These trends will severely limit the two dominant conventional architectures: dynamic issue superscalars, and static placement and issue VLIWs. We present a new execution model, called Static Placement Dynamic Issue (SPDI), in which the hardware and static scheduler instead work cooperatively. This paper focuses on the static instruction scheduler for SPDI. We identify and explore three issues SPDI schedulers must consider: locality, contention, and depth of speculation. We evaluate a range of SPDI scheduling algorithms executing on an Explicit Data Graph Execution (EDGE) architecture. We find that a surprisingly simple one achieves an average of 5.6 instructions per cycle (IPC) on SPEC2000 for a 64-wide issue machine, and is within 80% of the performance achieved without on-chip latencies. These results suggest that the compiler is effective at balancing on-chip latency and parallelism, and that the division of responsibilities between the compiler and the architecture is well suited to future systems.
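To make the locality and contention trade-off concrete, here is a minimal sketch of a greedy static placer. It is not the scheduler evaluated in the paper: the function name `place_instructions`, the 2D grid of ALUs, the Manhattan-distance hop cost, and the `slots_per_alu` limit are all assumptions chosen for illustration, and depth of speculation is not modeled. The sketch only decides where each instruction goes; when it issues is left to the hardware, as in the SPDI model.

```python
from collections import defaultdict


def place_instructions(deps, grid_rows=2, grid_cols=2, slots_per_alu=2):
    """Greedily assign each instruction to an ALU on a 2D grid.

    deps maps an instruction name to the list of producers it depends on.
    Keys must appear in topological order (producers before consumers).
    All names and parameters here are hypothetical, for illustration only.
    """
    placement = {}            # instruction -> (row, col) of its ALU
    load = defaultdict(int)   # (row, col) -> number of instructions placed
    alus = [(r, c) for r in range(grid_rows) for c in range(grid_cols)]

    for inst, producers in deps.items():
        # Contention: an ALU with no free instruction slot is not a candidate.
        free = [a for a in alus if load[a] < slots_per_alu]
        if not free:
            raise ValueError("block does not fit on the grid")

        def cost(alu):
            # Locality: total Manhattan distance from the producers' ALUs.
            hops = sum(
                abs(alu[0] - placement[p][0]) + abs(alu[1] - placement[p][1])
                for p in producers
            )
            # Break ties in favor of the less loaded ALU.
            return (hops, load[alu])

        best = min(free, key=cost)
        placement[inst] = best
        load[best] += 1

    return placement


# A small diamond-shaped dataflow graph: a feeds b and c, which both feed d.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
placement = place_instructions(deps)
```

With two slots per ALU, dependent instructions cluster on neighboring ALUs; lowering `slots_per_alu` to 1 forces them apart, which lengthens operand routes in exchange for less contention at each ALU.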