| McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", Proc. HICSS-27, Maui, HI, January 1994; also University of Virginia, TR CS-93-42, August 1993. |
....of the Obvious. Appeared in Computer Architecture News, 23(1):20-24, March 1995. Our prediction of the memory wall is probably wrong too but it suggests that we have to start thinking "out of the box". All the techniques that the authors are aware of, including ones we have proposed [McK94, McK94a], provide one-time boosts to either bandwidth or latency. While these delay the date of impact, they don't change the fundamentals. The most convenient resolution to the problem would be the discovery of a cool, dense memory technology whose speed scales with that of processors. We are not aware ....
S.A. McKee, et al., "Experimental Implementation of Dynamic Access Ordering", Proc. 27th Hawaii International Conference on System Sciences, Maui, HI, January 1994.
....performance has increased by 50 to 100 percent in the last decade, while memory performance has increased by only 10 to 15 percent. Additional hardware support such as larger, faster caches [Joup90], software-assisted caches [Call91], speculative loads [Roge92], stream memory controllers [McKe94], and machines with wider memory buses, helps, but the problem is serious enough that performance gains by any approach, including software, are worth pursuing. Furthermore, even with additional hardware, processors often do not obtain anywhere near their peak performance with respect to their ....
McKee, S. A., Klenke, R. H., Schwab, A. J., Wulf, W. A., Moyer, S. A., and Aylor, J. H., "Experimental Implementation of Dynamic Access Ordering", Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Maui, HI, January 1994.
....Other methods of reducing the effect of the CPU-memory performance gap have been suggested. Previous research has examined the impact of allowing loads to be performed ahead of earlier accesses, through techniques such as a write buffer [20, 13] and nonblocking caches [5]. McKee, Wulf et al. [17, 18] examined memory access ordering in the area of multiple stream buffers in a vector computer. The decoupled access/execute [21, 12] architectures process memory accesses and scalar computations in different units to allow explicit overlap of computation and memory access overhead. Lipasti et al. ....
Sally A. McKee, Robert H. Klenke, Andrew J. Schwab, Wm. A. Wulf, Steven A. Moyer, James H. Aylor, and Charles Y. Hitchcock. "Experimental Implementation of Dynamic Access Ordering," In Proceedings of the 27th Hawaii International Conference on Systems Sciences (HICSS-27), January 1994.
....ordering statically at compile time. [McK93a] proposes a combined hardware/software scheme for implementing access ordering dynamically at run time, and presents numerous simulation results demonstrating its effectiveness. The hardware part of this solution is the Stream Memory Controller (SMC) [McK93b]. Here we develop an analytical model to bound SMC performance. 2. Access Ordering Memory components are usually assumed to require about the same amount of time to access any random location, but this assumption no longer applies to modern memory devices: most components manufactured in the ....
....beneficial impact of access ordering on effective memory bandwidth together with the limitations inherent in implementing the technique statically motivate us to consider an implementation that reorders accesses dynamically at run time. What follows is an overview of the architecture proposed in [McK93b, McK93c]: see those documents for more details. Our discussion is based on the simplified architecture of Figure 1. In this system, memory is interfaced to the processors through a controller labeled MSU for Memory Scheduling Unit. The MSU includes logic to issue memory requests as well as logic to ....
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", University of Virginia, TR CS-93-42, August 1993. To appear in Proc. HICSS-27, Maui, HI, January 1994.
....One way to do this is via access ordering, which we define as any technique for changing the order of memory requests to increase bandwidth. Here we are especially concerned with ordering a set of vector-like stream accesses. For a more thorough discussion of access ordering, see [Moy92, Moy93, McK93a, McK93b]. The performance benefits of doing such static access ordering can be quite dramatic [Moy92, Moy93], but without the kinds of address alignment information that are usually only available at run time, the compiler can't generate the optimal access sequence. The extent to which a compiler can ....
....is that we reorder stream accesses to exploit the architectural and component features that make memory systems sensitive to the sequence of requests. 3. The Stream Memory Controller The design space of access ordering systems and the rationale for the approach presented here is discussed in [McK93a, McK93b]. The approach we suggest is generally applicable to any uniprocessor computing system, but will be described based on the simplified architecture of Figure 1. Memory is interfaced to the processor through a controller labeled MSU for Memory Scheduling Unit. The MSU includes logic to issue ....
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", University of Virginia, TR CS-93-42, August 1993.
....used for numeric problems. Such problems do not show high degree of data reuse, and therefore render caches ineffective. Consequently, researchers have begun to focus on organizations and technologies like software-assisted caches [Call91], speculative loads [Smit92], and stream memory controllers [McKe94]. Most software approaches that tackle the memory bandwidth problem focus on reducing the memory bandwidth requirements of a program. One of the fundamental compiler optimizations for reducing a program's memory bandwidth requirements is register allocation [Chai81, Chow90]. This technique ....
McKee, S. A., Klenke, R. H., Schwab, A. J., Wulf, W. A., Moyer, S. A., and Aylor, J. H., "Experimental Implementation of Dynamic Access Ordering", Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Maui, HI, January 1994, pp. 431-440.
....transmit information about them to the hardware at run time, and Benitez and Davidson's recurrence detection and optimization algorithm [Ben91] can be used to do this. With respect to the fourth item, the hardware development project has proceeded in parallel with the investigations discussed here [Alu95, Lan95a, Lan95b, McG94, McK94a]. At the time of this writing, an initial implementation has been fabricated and is being tested. Gate-level and back-annotated hardware timing simulations indicate that this design meets its specifications. The following chapters address the remaining tasks: developing analytic performance models ....
....general structure of the dissertation is illustrated by the tree shown in Figure 1.5. Some of our results have been published previously. The uniprocessor SMC architecture and parts of the corresponding simulation results from Chapter 2 and Chapter 3 were described in [McK94a, McK94b, McK95b]. The analytic models in Chapter 3 and Chapter 4 and a description of the Symmetric Multiprocessor SMC organization introduced in Chapter 4 were first presented in [McK95b]. Parts of the results in Chapter 2 appear in [McK95a]. Complete results for the functional simulations and analytic models ....
S.A. McKee, R.H. Klenke, A.J. Schwab, Wm.A. Wulf, S.A. Moyer, C. Hitchcock, and J.H. Aylor, "Experimental Implementation of Dynamic Access Ordering", Proceedings of the IEEE 27th Hawaii International Conference on Systems Sciences (HICSS-27), pages 431-440, Maui, HI, January 1994.
....One way to do this is via access ordering, which we define as any technique for changing the order of memory requests to increase bandwidth. Here we are especially concerned with ordering a set of vector-like stream accesses. For a more thorough discussion of access ordering, see [Moy92, Moy93, McK93a, McK93b]. The performance benefits of doing such static access ordering can be quite dramatic [Moy92, Moy93], but without the kinds of address alignment information that are usually only available at run time, the compiler can't generate the optimal access sequence. The extent to which a compiler can ....
....here is that we reorder stream accesses to exploit the architectural and component features that make memory systems sensitive to the sequence of requests. 3. The Stream Memory Controller The design space of access ordering systems and the rationale for the approach presented here is discussed in [McK93a, McK93b]. The approach we suggest is generally applicable to any uniprocessor computing system, but will be described based on the simplified architecture of Figure 1. Memory is interfaced to the processor through a controller labeled MSU for Memory Scheduling Unit. The MSU includes logic to issue ....
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", University of Virginia, TR CS-93-42, August 1993.
....A 71,000-transistor ASIC has been designed and fabricated, and is currently being tested and used to verify expected SMC performance gains. Our results indicate that the fabricated SMC can deliver the expected bandwidth improvements for inner loops of important streaming computations [6], [10], [11]. Our need to use graduate students, our experience and access to MGC tools, and the necessity to use a particular IC fabrication process (0.75 µm HP through MOSIS) forced us to use tools that were not tightly integrated. This led to the development of the design and revision process ....
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", Proc. HICSS-27, Maui, HI, January 1994.
....we develop analytic models that bound the performance of any uniprocessor or symmetric multiprocessor memory system on streams. We present highlights of these results, comparing them to the performance of a scheme we have proposed for accessing stream data, the Stream Memory Controller (SMC) [McK94a, McK94b]. There are two independent comparisons: a bus-level simulation, and a gate-level simulation of the SMC's VHDL description. Both forms predict the SMC consistently delivers nearly the maximum attainable bandwidth determined by the analytic bounds. While not reported here, preliminary tests of the ....
....1 gcd b stride , Appeared in Proceedings of Euro-Par '95, Stockholm, Sweden, August 1995. Lecture Notes in Computer Science 966, S. Haridi, et al., Eds. Springer-Verlag, Berlin, 1995, pages 83-99. [McK94a, McK94b]. Complete shared-memory multiprocessor results can be found in [McK94c]. Since our concern here is to correlate the performance bounds of our analytic model with our functional simulation results, we present only the maximum percentage of peak bandwidth attained by any order issue policy ....
McKee, S.A., et al., "Experimental Implementation of Dynamic Access Ordering", Proc. 27th Hawaii International Conference on Systems Sciences, Jan. 1994.
....address, stride, length, and data size) and a set of high-speed buffers holds stream operands. The stream buffers are implemented logically as a set of FIFOs, with each stream assigned to one FIFO. Detailed performance models and simulation results for this organization are presented elsewhere [23, 24, 25]. What follows is an approximate model to determine memory performance for a single vector of a computation. Accurate prediction requires knowledge of the entire computation, since performance for each stream depends on the nature and number of other streams. Let be the FIFO depth in vector ....
McKee, S.A., et al., "Experimental Implementation of Dynamic Access Ordering", HICSS-27, Maui, HI, January, 1994.
No context found.
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", Proc. HICSS-27, Maui, HI, January 1994; also University of Virginia, TR CS-93-42, August 1993.
No context found.
McKee, S.A., Klenke, R.H., Schwab, A.J., Wulf, Wm.A., Moyer, S.A., Hitchcock, C., Aylor, J.H., "Experimental Implementation of Dynamic Access Ordering", University of Virginia, TR CS-93-42, August 1993. In Proc. HICSS-27, Maui, HI, January 1994.