Complexity-effective superscalar processors (1997)

by Subbarao Palacharla, J. E. Smith
Venue: Proceedings of the 24th Annual International Symposium on Computer Architecture
Citations: 385 (5 self)
BibTeX

@INPROCEEDINGS{Palacharla97complexity-effectivesuperscalar,
    author = {Subbarao Palacharla and J. E. Smith},
    title = {Complexity-effective superscalar processors},
    booktitle = {Proceedings of the 24th Annual International Symposium on Computer Architecture},
    year = {1997},
    pages = {206--218}
}

Abstract

The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and simulated with Spice for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster; consequently, overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
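
The queue-based issue scheme described in the abstract can be illustrated with a rough Python sketch. This is not the authors' steering heuristic or simulator: the function names (`find_producers`, `steer`, `issue`), the `(dest, sources)` instruction encoding, the unit-latency assumption, and the fallback of appending to the shortest queue when no empty queue is available are all assumptions made for this example.

```python
from collections import deque


def find_producers(instructions):
    """For each instruction, return the indices of the earlier
    instructions that produce its source registers.

    instructions: list of (dest, sources) tuples in program order.
    """
    last_writer = {}  # register name -> index of latest writer
    deps = []
    for idx, (dest, sources) in enumerate(instructions):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = idx
    return deps


def steer(deps, num_queues):
    """Place instructions into FIFOs so that dependent chains share a queue."""
    queues = [deque() for _ in range(num_queues)]
    for idx, producers in enumerate(deps):
        # Prefer a queue whose tail produces one of this instruction's operands.
        target = next((q for q in queues if q and q[-1] in producers), None)
        if target is None:
            # Otherwise start a new chain in an empty queue, if any.
            target = next((q for q in queues if not q), None)
        if target is None:
            # Assumed fallback for this sketch: use the shortest queue.
            target = min(queues, key=len)
        target.append(idx)
    return queues


def issue(deps, queues):
    """Issue only from queue heads, one instruction per queue per cycle.

    Returns a dict mapping instruction index -> issue cycle,
    assuming every instruction completes in one cycle.
    Note: this drains (mutates) the queues.
    """
    issued_at = {}
    cycle = 0
    while any(queues):
        # Only the heads are examined, which is what keeps wakeup/select simple.
        ready = [q for q in queues
                 if q and all(p in issued_at for p in deps[q[0]])]
        for q in ready:
            issued_at[q.popleft()] = cycle
        cycle += 1
    return issued_at
```

Because only the head of each queue is checked for readiness, the amount of wakeup and selection work per cycle scales with the number of queues rather than with the total number of buffered instructions, which is the simplification the abstract points to.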
