The Relative Importance of Memory Latency, Bandwidth, and Branch Limits to Performance (1997)
Abstract:
This study investigates the relative importance of memory latency, memory bandwidth, and branch predictability in determining limits to processor performance. We use an aggressive simulation model with few other limits to study the performance of SPEC92 benchmarks. Our basic machine model assumes a dynamically scheduled processor with a 16536-entry instruction window. Up to 16536 instructions of any type can be issued each cycle, subject to data dependencies. In systems with unlimited memory bandwidth and perfect branch predictability, we find that memory latency is not a significant limit to performance until it exceeds 100 to 200 cycles. Memory bandwidth is not usually a significant limit either. In systems with a memory latency of 16 cycles and perfect branch predictability, many applications require less than 6 bytes per cycle, while all but one perform well if 100 bytes per cycle are available. Based on current trends in the semiconductor industry and current research in packaging and interface technology, we expect these latency and bandwidth requirements to be achievable in future processors. By far the biggest limit to performance was branch unpredictability. Even when given very large table sizes, one of the best existing branch predictors yielded performance three to five times lower than that achievable with perfect branch prediction for many benchmarks. Additionally, many benchmarks had significantly lower performance than the perfect case even when a futuristic branch predictor (a mix of 75% perfect and 25% existing prediction) was simulated. These results suggest that improved branch prediction techniques are crucial for improving future uniprocessor performance.
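As a rough illustration that is not taken from the paper's simulator, the bandwidth figures in the abstract can be related to the 16-cycle latency using Little's law: to sustain a given bandwidth, a memory system must keep roughly bandwidth times latency bytes in flight. The helper name `bytes_in_flight` below is hypothetical, and the sketch assumes a steady-state system with constant latency.

```python
def bytes_in_flight(bandwidth_bytes_per_cycle: float, latency_cycles: float) -> float:
    """Little's law: data outstanding = sustained throughput x latency.

    Assumes steady state and a constant memory latency (a simplification).
    """
    return bandwidth_bytes_per_cycle * latency_cycles


# Latency and bandwidth points quoted in the abstract.
LATENCY_CYCLES = 16
for bandwidth in (6, 100):
    outstanding = bytes_in_flight(bandwidth, LATENCY_CYCLES)
    print(f"{bandwidth} bytes/cycle at {LATENCY_CYCLES} cycles "
          f"-> {outstanding:.0f} bytes in flight")
```

Under these assumptions, the 6 bytes per cycle point corresponds to 96 bytes outstanding and the 100 bytes per cycle point to 1600 bytes, which gives a sense of how much memory-level parallelism each bandwidth level implies.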

