

Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Appears in the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT 2004)

by Ramadass Nagarajan, Sundeep K. Kushwaha, Doug Burger, Kathryn S. McKinley, Calvin Lin, Stephen W. Keckler
ftp://ftp.cs.utexas.edu/pub/dburger/papers/PACT04.pdf

Abstract:

Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and variable on-chip latencies. These trends will severely limit the two dominant conventional architectures: dynamic issue superscalars, and static placement and issue VLIWs. We present a new execution model, called Static Placement Dynamic Issue (SPDI), in which the hardware and static scheduler instead work cooperatively. This paper focuses on the static instruction scheduler for SPDI. We identify and explore three issues SPDI schedulers must consider: locality, contention, and depth of speculation. We evaluate a range of SPDI scheduling algorithms executing on an Explicit Data Graph Execution (EDGE) architecture. We find that a surprisingly simple one achieves an average of 5.6 instructions per cycle (IPC) for SPEC2000 on a 64-wide issue machine, and is within 80% of the performance without on-chip latencies. These results suggest that the compiler is effective at balancing on-chip latency and parallelism, and that the division of responsibilities between the compiler and the architecture is well suited to future systems.
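
To make the locality and contention trade-off named in the abstract concrete, here is a minimal sketch of a greedy static placement pass. It is not the paper's algorithm: the grid model, the Manhattan-distance cost, the function name `place_instructions`, and the `contention_weight` parameter are all assumptions chosen for illustration. Depth of speculation and the dynamic issue step, which the hardware performs at run time, are not modeled.

```python
from itertools import product


def place_instructions(deps, rows, cols, slots_per_node, contention_weight=1.0):
    """Greedily map each instruction to a node of a rows x cols grid.

    deps maps an instruction name to the list of its producers and must be
    in topological order (every producer appears before its consumers).
    Hypothetical cost model: operand hops (locality) plus a penalty for
    instructions already placed on the node (contention).
    """
    nodes = list(product(range(rows), range(cols)))
    load = {node: 0 for node in nodes}
    placement = {}

    for instr, producers in deps.items():
        best_node, best_cost = None, None
        for node in nodes:
            if load[node] >= slots_per_node:
                continue  # node has no free instruction slot
            # Locality: Manhattan distance from each producer's node.
            hops = sum(
                abs(node[0] - placement[p][0]) + abs(node[1] - placement[p][1])
                for p in producers
            )
            # Contention: prefer nodes with fewer instructions already placed.
            cost = hops + contention_weight * load[node]
            if best_cost is None or cost < best_cost:
                best_node, best_cost = node, cost
        if best_node is None:
            raise ValueError("no free slot left on the grid")
        placement[instr] = best_node
        load[best_node] += 1

    return placement


# Tiny dataflow graph: c consumes a and b, d consumes c.
example = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
result = place_instructions(example, rows=2, cols=2, slots_per_node=2)
```

Raising `contention_weight` in this sketch spreads instructions across more nodes at the cost of longer operand routes; lowering it packs dependent instructions closer together.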

Citations

594 MediaBench: A tool for evaluating and synthesizing multimedia and communication systems – Lee, Potkonjak, et al. - 1997
560 Trace scheduling: A technique for global microcode compaction – Fisher - 1981
455 Software Pipelining: An Effective Scheduling Technique for VLIW – Lam
260 Bulldog: A Compiler for VLIW Architectures – Ellis - 1985
227 Clock rate versus IPC: The end of the road for conventional microarchitectures – Agarwal, Hrishikesh, et al.
194 IMPACT: An architectural framework for multiple-instruction-issue processors – Chang, Mahlke, et al. - 1991
132 The Multicluster Architecture: Reducing Cycle Time through Partitioning – Farkas, Chow, et al. - 1997
89 Very Long Instruction Word Architectures and the ELI-512 – Fisher - 1983
84 The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays – Hrishikesh, Burger, et al. - 2002
46 Parallel processing: a smart compiler and a dumb machine – Fisher - 1984
43 Balanced Scheduling: Instruction Scheduling When Memory Latency is Uncertain – Kerns, Eggers - 1993
42 Integrated predicated and speculative execution – August, Connors, et al. - 1998
36 CARS: A New Code Generation Framework for Clustered – Kailas, Ebcioglu, et al. - 2001
31 Treegion scheduling for wide-issue processors – Havanki, Banerjia, et al. - 1998
26 Introducing the IA-64 architecture – Huck, Morris, et al. - 2000
19 High-speed electrical signaling: overview and limitations – Horowitz, Yang, et al. - 1998
12 Convergent scheduling – Lee, Puppin, et al. - 2002
11 Scaling to the end of silicon with EDGE architectures – Burger, Keckler, et al. (the TRIPS Team)
10 Optimal integrated code generation for clustered VLIW architectures – Kessler, Bednarski - 2002
6 Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor – Gibert, Sanchez, et al. - 2002
4 Load scheduling with profile information – Lindenmaier, McKinley, et al. - 2000