  Fine-Grain Priority Scheduling on Multi-Channel Memory Systems (2002) [5 citations — 1 self]

by Zhichun Zhu, Zhao Zhang, Xiaodong Zhang
University of Science of Technology
http://www.cs.wm.edu/hpcs/WWW/HTML/publications/./papers/TR-02-1.pdf

Abstract:

Configurations of contemporary DRAM memory systems are becoming increasingly complex. A recent study [5] shows that application performance is highly sensitive to the choice of configuration, and suggests that tuning burst sizes and channel configurations is an effective way to optimize DRAM performance for a given memory-intensive workload. However, this approach is workload-dependent. In this study we show that, by utilizing fine-grain priority access scheduling, we are able to find a workload-independent configuration that achieves optimal performance on a multichannel memory system. Our approach makes good use of the high concurrency and high bandwidth available on such memory systems and effectively reduces the memory stall time of memory-intensive applications. Using execution-driven simulation of a 4-way issue, 2 GHz processor, we show that, compared with gang scheduling, optimized fine-grain priority scheduling improves the performance of fifteen memory-intensive SPEC2000 programs by about 13% and 8% on average for 2-channel and 4-channel Direct Rambus DRAM memory systems, respectively. Compared with burst scheduling, the average performance improvement is 16% and 14% for the 2-channel and 4-channel memory systems, respectively.
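The abstract does not spell out the scheduling mechanism. As a rough illustration only, assume that "fine-grain priority" means each cache miss is split into equal sub-blocks that any channel can serve, and that the sub-block holding the data the processor is waiting for (the "critical" sub-block) is queued ahead of all other sub-blocks. The toy model below compares that policy with plain sub-block FIFO order. The function `schedule`, its parameters, the one-time-unit cost per sub-block, and the assumption that all misses arrive at once are hypothetical; the model ignores DRAM row buffers, bank conflicts, writes, and address-to-channel mapping, and it is not the authors' scheduler.

```python
import heapq
from typing import List, Tuple


def schedule(critical_idx: List[int], sub_blocks: int, channels: int,
             prioritize_critical: bool) -> List[int]:
    """Hypothetical toy model: return, for each miss, the time unit at which
    its critical sub-block finishes. All misses arrive at t=0, every sub-block
    takes one time unit, and any idle channel may serve any sub-block."""
    # Heap entries: (priority, arrival order, miss id, sub-block index).
    queue: List[Tuple[int, int, int, int]] = []
    order = 0
    for miss, crit in enumerate(critical_idx):
        for k in range(sub_blocks):
            # Within a line, issue sub-blocks starting at the critical one.
            blk = (crit + k) % sub_blocks
            is_critical = blk == crit
            # Priority 0 is served first; only used when prioritizing.
            prio = 0 if (prioritize_critical and is_critical) else 1
            heapq.heappush(queue, (prio, order, miss, blk))
            order += 1

    done = [0] * len(critical_idx)
    t = 0
    while queue:
        t += 1
        # Each channel transfers one sub-block per time unit.
        for _ in range(channels):
            if not queue:
                break
            _, _, miss, blk = heapq.heappop(queue)
            if blk == critical_idx[miss]:
                done[miss] = t
    return done


if __name__ == "__main__":
    crit = [0, 1, 2, 3]  # hypothetical critical sub-block of each of four misses
    for prio in (False, True):
        times = schedule(crit, sub_blocks=4, channels=2, prioritize_critical=prio)
        print(prio, times)
```

With four misses, four sub-blocks per line, and two channels, the FIFO run prints `False [1, 3, 5, 7]` and the prioritized run prints `True [1, 1, 2, 2]`. Both runs finish all transfers in 8 time units; the prioritized order only changes how soon each waiting load receives its critical data.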

Citations

1253 The SimpleScalar toolset, version 2.0 – Burger, Austin - 1997
356 The MIPS R10000 superscalar microprocessor – Yeager - 1996
102 Speculative precomputation: Long-range prefetching of delinquent loads – Collins, Wang, et al. - 2001
100 Execution-based Prediction Using Speculative Slices – Zilles, Sohi - 2001
95 Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors – Luk - 2001
88 A bandwidth-efficient architecture for media processing – Rixner, Dally, et al. - 1998
67 Memory-System Design Considerations for Dynamically-Scheduled Processors – Farkas, Chow, et al. - 1997
52 Data prefetching by dependence graph precomputation – Annavaram, Patel, et al.
52 Memory access scheduling – Rixner, Dally, et al. - 2000
50 Data Prefetch Mechanisms – Vanderwiel, Lilja - 2000
49 Reducing DRAM Latencies with an Integrated Memory Hierarchy Design – Lin - 2001
39 The PowerPC 604 RISC microprocessor – Song, Denman, et al. - 1994
38 Access ordering and memory-conscious cache utilization – Wulf - 1995
34 Dynamically allocating processor resources between nearby and distant ILP – Balasubramonian, Dwarkadas, et al. - 2001
26 Access Ordering and Effective Memory Bandwidth – Moyer - 1993
24 Design of a parallel vector access unit for SDRAM memory systems – Mathew, McKee, et al. - 2000
23 Access order and effective bandwidth for streams on a Direct Rambus memory – Hong, McKee, et al. - 1999
19 A Permutation-Based Page Interleaving Scheme to Reduce Row-Buffer Conflicts and Exploit Data Locality, Proc. 33rd Int’l Symp. Microarchitecture – Zhang, Zhu, et al. - 2000
18 latency, or system overhead: Which has the largest impact on uniprocessor DRAM-system performance – Cuppu, Jacob, et al.