by James E. Smith, Gurindar S. Sohi
ftp://ftp.cs.wisc.edu/sohi/papers/1995/ieee-proc.superscalar.ps.gz

Abstract:

Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar processors: converting an ostensibly sequential program into a more parallel one. The principles underlying this process, and the constraints that must be met, are discussed. The paper then provides a description of the specific implementation techniques used in the important phases of superscalar processing. The major phases include: i) instruction fetching and conditional branch processing, ii) the determination of data dependences involving register values, iii) the initiation, or issuing, of instructions for parallel execution, iv) the communication of data values through memory via loads and stores, and v) committing the process state in correct order so that precise interrupts can be supported. Examples of recent superscalar microprocessors, the MIPS R10000, the DEC 21164, and the AMD K5, are used to illustrate a variety of superscalar methods.
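The abstract names the determination of register data dependences and the issuing of instructions for parallel execution as core phases. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method: it renames each architectural destination register to a fresh physical register (a common technique for removing write-after-read and write-after-write conflicts), then computes the earliest cycle each instruction could issue when only true (read-after-write) dependences constrain it. The `Instr` class, the `rename` and `dataflow_schedule` functions, unlimited issue width, perfect fetch, no loads or stores, and a one-cycle default latency are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical register-to-register instruction: dest <- op(srcs)."""
    dest: str            # architectural destination register, e.g. "r1"
    srcs: tuple          # architectural source registers
    latency: int = 1     # assumed cycles until the result is usable


def rename(program):
    """Give every write a fresh physical register ("p0", "p1", ...).

    Sources are looked up in the map table before the destination is
    remapped, so each read refers to the most recent earlier write.
    Registers never written in the program keep their architectural name
    and are treated as ready at cycle 0.
    """
    map_table = {}       # architectural name -> current physical name
    renamed = []
    for i, ins in enumerate(program):
        srcs = tuple(map_table.get(r, r) for r in ins.srcs)
        phys = f"p{i}"
        map_table[ins.dest] = phys
        renamed.append((phys, srcs, ins.latency))
    return renamed


def dataflow_schedule(renamed):
    """Earliest issue cycle per instruction, assuming unlimited issue width.

    After renaming, only true (read-after-write) dependences remain, so an
    instruction can issue as soon as all of its source values are ready.
    """
    ready_at = {}        # physical register -> cycle its value is available
    issue_cycles = []
    for dest, srcs, latency in renamed:
        cycle = max((ready_at.get(s, 0) for s in srcs), default=0)
        issue_cycles.append(cycle)
        ready_at[dest] = cycle + latency
    return issue_cycles


if __name__ == "__main__":
    program = [
        Instr("r1", ("r2", "r3")),   # I0: r1 <- r2 + r3
        Instr("r4", ("r1",)),        # I1: r4 <- r1 * 2   (RAW on I0)
        Instr("r1", ("r5", "r6")),   # I2: r1 <- r5 + r6  (WAR with I1, WAW with I0)
        Instr("r7", ("r1", "r4")),   # I3: r7 <- r1 + r4  (RAW on I2 and I1)
    ]
    print(dataflow_schedule(rename(program)))  # [0, 1, 0, 2]
```

In this toy sequence, I2 reuses r1 but, once renamed, no longer waits for I0 or I1, so all four instructions issue by cycle 2 instead of cycle 3 under one-per-cycle issue in program order. A real superscalar processor adds the constraints the abstract lists: finite fetch bandwidth and branch prediction, memory dependences through loads and stores, and in-order commit for precise interrupts.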

Citations (the leading number is each cited work's citation count in the CiteSeer index)

554 Cache memories – Smith - 1982
445 Multiscalar Processors – Sohi, Breach, et al. - 1995
374 A study of branch prediction strategies – Smith - 1981
335 Limits of instruction-level parallelism – Wall - 1991
284 Lockup-free instruction fetch/prefetch cache organization – Kroft - 1981
282 An efficient algorithm for exploiting multiple arithmetic units – Tomasulo - 1967
224 Branch prediction strategies and branch target buffer design – Lee, Smith - 1984
210 Implementation of precise interrupts in pipelined processors – Smith, Pleszkun - 1985
208 Limits of Control Flow on Parallelism – Lam, Wilson - 1992
201 Instruction issue logic for high-performance, interruptible pipelined processors – Sohi, Vajapeyam - 1987
189 Available instruction-level parallelism for superscalar and superpipelined machines – Jouppi, Wall - 1989
174 Improving the accuracy of dynamic branch prediction using branch correlation – Pan, So, et al. - 1992
171 Instruction-Level Parallel Processing: History, Overview, and Perspective – Rau, Fisher - 1993
168 A VLIW architecture for a trace scheduling compiler – Colwell, Nix, et al. - 1991
147 Predicting Conditional Branch Directions from Previous Runs of a Program – Fisher, Freudenberger - 1992
121 High-bandwidth data memory systems for superscalar processors – Sohi, Franklin - 1991
120 Optimization of instruction fetch mechanisms for high issue rates – Conte, Menezes, et al. - 1995
104 Limits on multiple instruction issue – Smith, Johnson, et al. - 1989
96 Synchronization, coherence, and event ordering in multiprocessors – Dubois, Scheurich, et al. - 1988
86 Complexity/Performance Tradeoffs with Non-Blocking Loads – Farkas, Jouppi - 1994
84 Branch History Table Prediction of Moving Target Branches due to Subroutine Returns – Kaeli, Emma - 1991
79 Single-Program Speculative Multithreading (SPSM) Architecture – Dubey, O’Brien, et al. - 1995
59 Detection and parallel execution of independent instructions – Tjaden, Flynn - 1970
55 The Cydra 5 Departmental Supercomputer: Design – Rau - 1989
52 Look-ahead processors – Keller - 1975
48 Parallel operation in the Control Data 6600 – Thornton - 1964
37 HPSm, a High Performance Restricted Data Flow Architecture Having Minimal Functionality – Hwu, Patt - 1986
34 HPS, A New Microarchitecture: Rationale and Introduction – Patt, Hwu, et al. - 1985
33 Organization of the Motorola 88110 superscalar RISC microprocessor – Diefendorff, Allen - 1992
32 A Hardware Mechanism for Dynamic Memory Disambiguation – Franklin, Sohi, et al. - 1996
32 Checkpoint Repair for High-Performance Out-of-Order Execution Machines – Hwu, Patt - 1987
31 IBM RISC System/6000 processor architecture – Oehler, Groves - 1990
31 The CRAY-1 Computer System – Russell - 1978
30 Multis: A new class of multiprocessor computers – Bell
27 Optimal pipelining in supercomputers – Kunkel, Smith - 1986
27 Critical issues regarding HPS, a high performance microarchitecture – Patt, Melvin, et al. - 1985
26 The effect of speculatively updating branch history on branch prediction accuracy, revisited – Hao, Chang, et al. - 1994
23 Exploring the Design Space for a Shared-Cache Multiprocessor – Nayfeh, Olukotun - 1994
22 The IBM System/360 Model 91: Machine Philosophy and Instruction-Handling – Anderson, Sparacio, et al. - 1967
20 Optimal Pipelining – Dubey, Flynn - 1990
17 Machine Organization of the IBM RISC System/6000 processor – Grohoski - 1990
16 Alternative Implementations of Two-Level Adaptive Training Branch Prediction – Yeh, Patt - 1992
15 Hardware/Software Tradeoffs for Increased Performance – Hennessy, Jouppi, et al. - 1982
11 Design of the R8000 Microprocessor – Hsu - 1994
11 MIPS R10000 Uses Decoupled Architecture, Microprocessor Report – Gwennap - 1994
9 MIPS R10000 Uses Decoupled Architecture, Microprocessor Report – Gwennap - 1994
9 Organization of the Motorola 88110 – Diefendorff, Allen - 1992
8 Digital Leads the Pack with 21164, Microprocessor Report – Gwennap - 1994
7 Cray X-MP: The birth of a supercomputer – August, Brost, et al. - 1989
7 Intel Reveals Pentium Implementation Details, Microprocessor Report – Case