by Zheng Zhang, Josep Torrellas
In Proceedings of the 22nd Annual International Symposium on Computer Architecture
http://polaris.cs.uiuc.edu/reports/1466.ps.gz
Abstract:
While many parallel applications exhibit good spatial locality, other important codes in areas like graph problem-solving or CAD do not. Often, these irregular codes contain small records accessed via pointers. Consequently, while the former applications benefit from long cache lines, the latter prefer short lines. One good solution is to combine short lines with prefetching. In this way, each application can exploit the amount of spatial locality that it has. However, prefetching, if provided, should also work for the irregular codes. This paper presents a new prefetching scheme that, while usable by regular applications, is specifically targeted to irregular ones: