Download:
by Daniel F. Zucker, Ruby B. Lee, Michael J. Flynn
In Proceedings of the Thirty-First Hawaii International Conference on System Sciences, 1998
http://www.ee.princeton.edu/~rblee/HPpapers/automatedSWcachePrefetching.pdf
Abstract:
As the gap between cycle time and main memory access time increases, memory system performance becomes increasingly important. The trend toward higher instruction-level parallelism with superscalar processors puts even higher demands on the memory system. Prefetching is a common strategy to tolerate this increased memory latency. This paper presents a software-only technique that prefetches data into the CPU cache before it is needed in order to combat this problem. The software prefetching technique presented is motivated by emulation of a hardware stride prediction table (SPT). Performance similar, and in some cases superior, to the hardware-based technique is achieved with no additional hardware cost. In the first step, a simulation of the hardware SPT is conducted to identify where useful prefetches are best added. In the next step, software prefetches are added to the executable code. The technique is automated and could be implemented by a compiler as a two-phase optimization: a profile step followed by an optimization step. Data is presented for both SPEC95 and multimedia benchmarks. In the best case, a performance improvement of 2.78X is observed over the same code with no prefetching, at no extra hardware cost.
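The profile step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the organization of the SPT, so everything below is an assumption for illustration: the table is modeled as an unbounded dictionary keyed by the address of the load instruction, a stride counts as confirmed when it repeats on consecutive accesses, and the names `profile_strides`, `trace`, and `min_hits` are hypothetical. The function only reports which loads look like good prefetch candidates; inserting the actual prefetch instructions (the optimization step) is not shown.

```python
from collections import Counter


def profile_strides(trace, min_hits=2):
    """Emulate a simple stride prediction table over a load trace.

    trace    -- iterable of (pc, addr) pairs, one per executed load
                (hypothetical trace format, assumed for this sketch)
    min_hits -- how many times a stride must repeat before the load
                is reported as a prefetch candidate

    Returns {pc: stride} for loads with a repeatedly confirmed stride.
    """
    table = {}         # pc -> (last address, last observed stride)
    hits = Counter()   # pc -> number of times the stride repeated
    confirmed = {}     # pc -> most recently confirmed stride

    for pc, addr in trace:
        if pc in table:
            last_addr, last_stride = table[pc]
            stride = addr - last_addr
            # A non-zero stride equal to the previous one is a "hit":
            # a prefetch of addr + stride would have been useful.
            if stride != 0 and stride == last_stride:
                hits[pc] += 1
                confirmed[pc] = stride
            table[pc] = (addr, stride)
        else:
            # First time this load is seen: no stride known yet.
            table[pc] = (addr, None)

    return {pc: confirmed[pc] for pc, n in hits.items() if n >= min_hits}


if __name__ == "__main__":
    # One load walking an array of 8-byte elements, one irregular load.
    regular = [(0x10, a) for a in (0, 8, 16, 24, 32)]
    irregular = [(0x20, a) for a in (100, 7, 53, 2)]
    print(profile_strides(regular + irregular))
```

Under these assumptions, an optimization step would place a prefetch for `addr + stride` next to each reported load; the prefetch distance and any table size limit are design choices the abstract does not specify.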