Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory, and so control how their data is accessed and cached, which can improve cache and bus utilization. Because all of the functionality resides at the memory controller, the Impulse design does not require any modification to processor, cache, or bus designs, and it can be adopted in conventional systems without major system changes. We describe the design of the Impulse architecture and show how an Impulse memory system can be used in a variety of ways to improve the performance of memory-bound applications. Impulse can be used to dynamically create superpages cheaply, to dynamically recolor physical pages, to perform strided fetches, and to perform gathers and scatters through indirection vectors. Our performance results demonstrate the effectiveness of these optimizations in a variety of scenarios. Using Impulse can speed up a range of applications by amounts ranging from 20% to over a factor of 5. Alternatively, Impulse can be used by the OS for dynamic superpage creation; the best policy for creating superpages using Impulse outperforms previously known superpage creation policies.
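
The strided-fetch and gather remappings named above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the Impulse interface: `RemappingController`, `strided`, `gathered`, and the 4-word line size are hypothetical names and values chosen for illustration. The model treats a remapping as a function from an index in a dense alias to a physical index, and counts how many words cross the modeled bus.

```python
from typing import Callable, List

LINE_WORDS = 4  # hypothetical cache-line size in words (an assumption)


class RemappingController:
    """Toy model of a memory controller with one extra level of indirection."""

    def __init__(self, memory: List[float]) -> None:
        self.memory = memory
        self.words_fetched = 0  # words moved across the modeled bus

    def read_line(self, line_index: int, remap: Callable[[int], int]) -> List[float]:
        """Return one dense line of an alias; `remap` maps alias index to physical index."""
        start = line_index * LINE_WORDS
        line = [self.memory[remap(i)] for i in range(start, start + LINE_WORDS)]
        self.words_fetched += LINE_WORDS
        return line


def strided(base: int, stride: int) -> Callable[[int], int]:
    """Alias element i lives at physical index base + i * stride."""
    return lambda i: base + i * stride


def gathered(base: int, indices: List[int]) -> Callable[[int], int]:
    """Alias element i lives at physical index base + indices[i]."""
    return lambda i: base + indices[i]


# An N x N row-major matrix whose element values equal their physical indices.
N = 8
memory = [float(i) for i in range(N * N)]
controller = RemappingController(memory)

# Strided fetch: the diagonal has stride N + 1 and packs into two dense lines.
diag_map = strided(base=0, stride=N + 1)
diagonal = controller.read_line(0, diag_map) + controller.read_line(1, diag_map)
strided_words = controller.words_fetched

# Without remapping, each diagonal element drags in its whole physical line.
conventional_lines = {(i * (N + 1)) // LINE_WORDS for i in range(N)}
conventional_words = len(conventional_lines) * LINE_WORDS

# Gather through an indirection vector: four scattered elements, one dense line.
indirection_vector = [5, 1, 62, 40]
gathered_line = controller.read_line(0, gathered(0, indirection_vector))
```

In this toy setting the diagonal is delivered in 8 words through the remapped alias, whereas fetching the same elements in their original layout touches 8 distinct lines, or 32 words. The model ignores the controller's own DRAM accesses, address-translation cost, and cache coherence, so it illustrates only the bus-utilization argument and says nothing about the reported speedups.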