  Memory system support for image processing (1999) [14 citations — 7 self]

by Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee
In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
http://www.cs.utah.edu/techreports/1999/pdf/UUCS-99-002.pdf

Abstract:

Processor speeds are increasing rapidly, but memory speeds are not keeping pace. Image processing is an important application domain that is particularly impacted by this growing performance gap. Image processing algorithms tend to have poor memory locality because they access their data in a non-sequential fashion and reuse that data infrequently. As a result, they often exhibit poor cache and TLB hit rates on conventional memory systems, which limits overall performance. Most current approaches to addressing the memory bottleneck focus on modifying cache organizations or introducing processor-based prefetching. The Impulse memory system takes a different approach: allowing application software to control how, when, and where data are loaded into a conventional processor cache. Impulse does this by letting software configure how the memory controller interprets the physical addresses exported by a processor. Introducing an extra level of address translation in the memory controller enables an application to dynamically change how its data are fetched from memory. Data that are sparse in memory can be accessed densely, which improves both cache and TLB utilization, and Impulse hides memory latency by prefetching data within the memory controller. We describe how Impulse improves the performance of three image processing algorithms: volume rendering, image warping, and image filtering. We find that for these codes, an Impulse memory system yields speedups of 40% to 226% over an otherwise identical machine with a conventional memory system.
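
The claim that sparse data can be accessed densely can be illustrated with a toy model. The sketch below is not the Impulse interface and does not come from the paper: the function names, the 32-byte cache line, the 1024 x 1024 image of 4-byte pixels, and the shadow base address are all assumptions chosen for illustration. It counts how many cache lines a processor touches when walking one image column directly, versus walking a dense alias that a controller-side translation table maps back to the same pixels.

```python
# Toy model only: all names and sizes here are illustrative assumptions,
# not part of Impulse or of the paper.

LINE_SIZE = 32  # assumed cache-line size in bytes


def lines_touched(addresses, line_size=LINE_SIZE):
    """Count the distinct cache lines covered by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})


def strided_addresses(base, stride, count):
    """Byte addresses of `count` elements spaced `stride` bytes apart."""
    return [base + i * stride for i in range(count)]


def remap_strided(shadow_base, base, stride, elem_size, count):
    """Build a table from dense alias addresses to the sparse real addresses.

    This stands in for the extra translation step the abstract places in
    the memory controller; a plain dict is used purely for clarity.
    """
    return {shadow_base + i * elem_size: base + i * stride for i in range(count)}


# Hypothetical row-major image: 1024 x 1024 pixels, 4 bytes each.
WIDTH, ROWS, ELEM = 1024, 1024, 4
SHADOW_BASE = 1 << 30  # assumed unused, line-aligned address range

# Walking one column directly strides a full row (4096 bytes) per pixel.
column = strided_addresses(base=0, stride=WIDTH * ELEM, count=ROWS)
shadow = remap_strided(SHADOW_BASE, base=0, stride=WIDTH * ELEM,
                       elem_size=ELEM, count=ROWS)

direct_lines = lines_touched(column)           # one pixel per cache line
remapped_lines = lines_touched(shadow.keys())  # eight pixels per cache line

print(direct_lines, remapped_lines)
```

Under these assumptions the direct walk touches 1024 cache lines while the dense alias touches 128, because eight 4-byte pixels share each 32-byte line. The model ignores the cost of gathering the pixels inside the controller, which the abstract says Impulse hides through controller-side prefetching.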
