by Lixin Zhang, Sally A. McKee, Wilson C. Hsieh, John B. Carter
Download: ftp://ftp2.cs.utah.edu/pub/users/sam/isca00ws.ps.gz
Abstract:
Prefetching has long been used to mask the latency of memory loads. This paper presents results for an initial implementation of pointer-based prefetching within the Impulse adaptable memory controller. We conduct our experiments on a four-way issue superscalar machine. For the microbenchmarks we examine, we consistently realize about a 20% improvement in execution time for linked data structures accessed within medium to short loop iterations. This compares favorably to software prefetching when the data working set fits in cache, and exceeds the performance of software prefetching for large working sets. We also find that a superscalar, out-of-order processor hides the memory latency of linked data structures accessed in large loop iterations exceptionally well, which makes pointer prefetching unnecessary in that case.
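For readers unfamiliar with the software prefetching baseline mentioned in the abstract, the following is a minimal sketch in C of a linked-list traversal with and without a prefetch hint. It is not the paper's benchmark code and does not model the Impulse controller. The `struct node` layout and the function names `sum_list`, `sum_list_prefetch`, `build_list`, and `free_list` are assumptions made for illustration; `__builtin_prefetch` is a GCC/Clang builtin and is assumed to be available.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout (an assumption): one payload word plus a next pointer. */
struct node {
    long value;
    struct node *next;
};

/* Baseline traversal: the address of the next node is known only after
   p->next has been loaded, so cache misses along the chain serialize. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}

/* Software-prefetching variant: hint the successor node before working on
   the current one. The builtin is only a hint and does not fault on an
   invalid or null address, so the last node needs no special case. */
long sum_list_prefetch(const struct node *head)
{
    long sum = 0;
    for (const struct node *p = head; p != NULL; p = p->next) {
        __builtin_prefetch(p->next, 0 /* read */, 3 /* keep in cache */);
        sum += p->value;
    }
    return sum;
}

/* Release every node of a list built by build_list. */
void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Build an n-node list holding 1..n in order; returns NULL if n is 0 or
   if an allocation fails (any partial list is freed first). */
struct node *build_list(size_t n)
{
    struct node *head = NULL;
    for (size_t i = n; i > 0; i--) {
        struct node *p = malloc(sizeof *p);
        if (p == NULL) {
            free_list(head);
            return NULL;
        }
        p->value = (long)i;
        p->next = head;
        head = p;
    }
    return head;
}
```

Both functions compute the same result; the hint only changes when the successor node is requested from memory. How much latency a one-node-ahead hint can hide depends on how much work the loop body does, which is in line with the abstract's distinction between short and large loop iterations.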