Effective Hardware-based Data Prefetching for High-performance Processors (1995)
| Venue: | IEEE Transactions on Computers |
| Citations: | 180 - 2 self |
BibTeX
@ARTICLE{Chen95effectivehardware-based,
author = {Tien-Fu Chen and Jean-Loup Baer},
title = {Effective Hardware-based Data Prefetching for High-performance Processors},
journal = {IEEE Transactions on Computers},
year = {1995},
volume = {44},
pages = {609--623}
}
Abstract
Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a Reference Prediction Table (RPT) organized as an instruction cache. The three designs differ mostly on the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is the preferred one cost-performance wise.

Index Terms: Prefetching, hardware function unit, reference prediction, branch prediction, data cache, cycle-by-cycle simulations.
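
The idea of tracking per-instruction access patterns in a table can be illustrated with a minimal sketch. It is not the paper's design: the abstract does not give the RPT's entry format or state machine, so the class name, the (last address, stride, confirmed) entry layout, and the rule "prefetch once the same nonzero stride is seen twice in a row" are all assumptions made for illustration. The sketch corresponds loosely to the basic scheme, where a prefetch targets the address expected one iteration ahead.

```python
class ReferencePredictionTable:
    """Toy stride predictor keyed by the address (pc) of the load instruction.

    Hypothetical simplification: a real RPT is a fixed-size, cache-like
    hardware table; here an unbounded dict stands in for it.
    """

    def __init__(self):
        # pc -> (last_addr, stride, confirmed)
        self.entries = {}

    def access(self, pc, addr):
        """Record one data access; return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            # First time this instruction is seen: no stride known yet.
            self.entries[pc] = (addr, 0, False)
            return None
        last_addr, stride, _ = entry
        new_stride = addr - last_addr
        # Assumed rule: trust the stride only after it repeats and is nonzero.
        confirmed = new_stride == stride and new_stride != 0
        self.entries[pc] = (addr, new_stride, confirmed)
        return addr + new_stride if confirmed else None


# A load at pc 0x400 walking an array of 8-byte elements.
rpt = ReferencePredictionTable()
predictions = [rpt.access(pc=0x400, addr=a) for a in (1000, 1008, 1016, 1024)]
print(predictions)
```

In this sketch the first two accesses only train the entry; from the third access on, the table proposes the next address in the sequence. An irregular access resets the confirmation, so no prefetch is proposed until the stride repeats again.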







