by Sally A. McKee, Christopher W. Oliver, Wm. A. Wulf, Kenneth L. Wright, James H. Aylor
In Proc. International Conference on Supercomputing
ftp://ftp.cs.virginia.edu/pub/techreports/CS-95-46.ps.Z
Abstract:
Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations, such as scientific vector processing or multimedia (de)compression, that lack the locality of reference that makes caching effective. We describe and evaluate a system that addresses the memory bandwidth problem for this class of computations by dynamically reordering stream accesses to exploit memory system architecture and device features. The technique is practical to implement, using existing compiler technology and requiring only a modest amount of special-purpose hardware. With our prototype system, we have observed performance improvements of more than 200% over normal caching.
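To make the idea of reordering stream accesses concrete, here is a minimal toy model in Python. It is not the system described in the abstract: it assumes a single DRAM bank with a page-mode feature (an access to the currently open page is cheap, any other access is expensive), and the page size, cycle costs, base addresses, FIFO depth, and all function names are hypothetical values chosen for illustration. It compares the natural access order of a loop computing y[i] = a*x[i] + y[i] with an order that groups accesses to each stream into blocks.

```python
# Toy model only: one DRAM bank, one open page at a time.
# All constants below are hypothetical, not taken from the paper.
PAGE_WORDS = 512      # words per DRAM page (assumed)
HIT_CYCLES = 1        # cost of an access to the open page (assumed)
MISS_CYCLES = 5       # cost of an access that opens a new page (assumed)


def cost(addresses):
    """Total cycles to service the accesses in the given order."""
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // PAGE_WORDS
        cycles += HIT_CYCLES if page == open_page else MISS_CYCLES
        open_page = page
    return cycles


def natural_order(n, x_base, y_base):
    """Order issued by a plain loop: read x[i], read y[i], write y[i]."""
    order = []
    for i in range(n):
        order += [x_base + i, y_base + i, y_base + i]
    return order


def reordered(n, x_base, y_base, fifo_depth):
    """Same accesses, grouped per stream in blocks of fifo_depth elements."""
    order = []
    for start in range(0, n, fifo_depth):
        block = range(start, min(start + fifo_depth, n))
        order += [x_base + i for i in block]   # read a block of x
        order += [y_base + i for i in block]   # read a block of y
        order += [y_base + i for i in block]   # write the block of y back
    return order


if __name__ == "__main__":
    n, x_base, y_base = 1024, 0, 4096
    print("natural  :", cost(natural_order(n, x_base, y_base)), "cycles")
    print("reordered:", cost(reordered(n, x_base, y_base, 32)), "cycles")
```

Both orders issue exactly the same set of accesses; only the sequence differs. In this model the natural order switches pages on almost every access, while the grouped order pays the page-opening cost once per block, which is the kind of device feature a reordering scheme can exploit.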