In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advancements in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become increasingly hard to sustain. New process technology requires more expensive megafabs, and new performance levels require larger dies, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture will have to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper, we describe the role of microarchitecture in the computer world, present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges.

Keywords: Design tradeoffs, microarchitecture, microarchitecture trends, microprocessor, performance improvements, power issues, technology scaling.