  The Relative Importance of Memory Latency, Bandwidth, and Branch Limits to Performance (1997) [23 citations — 0 self]

by Parthasarathy Ranganathan
In The Workshop on Mixing Logic and DRAM: Chips that Compute and Remember
http://www-ece.rice.edu/~parthas/publications/iram97.ps

Abstract:

This study investigates the relative importance of memory latency, memory bandwidth, and branch predictability in determining limits to processor performance. We use an aggressive simulation model with few other limits to study the performance of SPEC92 benchmarks. Our basic machine model assumes a dynamically scheduled processor with a 16536-entry instruction window. Up to 16536 instructions of any type can be issued each cycle, subject to data dependencies. In systems with unlimited memory bandwidth and perfect branch predictability, we find that memory latency is not a significant limit to performance until it exceeds 100 to 200 cycles. Memory bandwidth is not usually a significant limit either. In systems with a memory latency of 16 cycles and perfect branch predictability, many applications require less than 6 bytes per cycle, while all but one perform well if 100 bytes per cycle are available. Based on current trends in the semiconductor industry and current research in packaging and interface technology, we expect these latency and bandwidth requirements to be achievable in future processors. By far the biggest limit to performance was branch unpredictability. Using one of the best existing branch predictors, provided with very large table sizes, resulted in performance three to five times lower than that achievable with perfect branch prediction for many benchmarks. Additionally, many benchmarks had significantly lower performance than the perfect case even when a futuristic branch predictor (a mix of 75% perfect and 25% existing prediction) was simulated. These results suggest that improved branch prediction techniques are crucial for improving future uniprocessor performance.
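The abstract does not say how the 75% perfect / 25% existing predictor mix was built. One plausible reading, sketched below purely as an illustration, is a per-branch random choice between an oracle and a conventional predictor. Everything here is an assumption rather than the paper's method: the names `TwoBitPredictor` and `mixed_accuracy` are hypothetical, the 2-bit saturating counter table is only a simple stand-in for "one of the best existing branch predictors", and training the stand-in on every branch (including oracle-predicted ones) is a design guess.

```python
import random


class TwoBitPredictor:
    """Hypothetical stand-in for an 'existing' predictor:
    one 2-bit saturating counter per table slot, indexed by branch address."""

    def __init__(self, table_bits=12):
        self.mask = (1 << table_bits) - 1
        self.table = [2] * (1 << table_bits)  # every slot starts at "weakly taken"

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def mixed_accuracy(trace, perfect_fraction=0.75, seed=0):
    """Fraction of correctly predicted branches in `trace`, a list of
    (pc, taken) pairs, when each branch is predicted by an oracle with
    probability `perfect_fraction` and by the stand-in predictor otherwise."""
    rng = random.Random(seed)
    base = TwoBitPredictor()
    correct = 0
    for pc, taken in trace:
        if rng.random() < perfect_fraction:
            guess = taken              # oracle: always right
        else:
            guess = base.predict(pc)   # conventional prediction
        base.update(pc, taken)         # assumption: train on every branch
        correct += (guess == taken)
    return correct / len(trace)


if __name__ == "__main__":
    # Synthetic trace: a single branch that alternates taken / not taken.
    trace = [(0x40, i % 2 == 0) for i in range(1000)]
    for fraction in (0.0, 0.75, 1.0):
        print(fraction, mixed_accuracy(trace, fraction))
```

Setting `perfect_fraction` to 1.0 or 0.0 recovers the perfect and purely conventional cases, which makes it easy to see how much of the gap between them a given mix closes on a particular trace.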
