S. A. McKee, "Hardware support for dynamic access ordering: Performance of some design options," Tech. Rep. CS-93-08, 9, 1993.
....Multiprocessor SMC organization introduced in Chapter 4 were first presented in [McK95b]. Parts of the results in Chapter 2 appear in [McK95a]. Complete results for the functional simulations and analytic models presented in Chapter 2 through Chapter 5 can be found in our technical reports [McK93a, McK93b, McK94c, McK94d]. [Diagram labels: Maximizing Memory Bandwidth for Streamed Computations; Introduction; Access Ordering; Conclusions; The SMC; Dense Matrix; Uniprocessor; Sparse Matrix; Performance; Performance; Implementation Concerns; Other Systems Issues; Compiler Recommendations; Hardware Development; Uniprocessors; Symmetric ....]
....first of these encourages several banks to be working on the same FIFO, while the second encourages different banks to be working on different FIFOs. It is not intuitively obvious which of these is preferable, and in fact, our experiments demonstrate no consistent performance advantage to either [McK93a]. 3.2 Analytic Models. For the systems we consider, bandwidth is limited by how many page misses a computation incurs. This means that we can derive a bound for any ordering algorithm by calculating the minimum possible number of page misses, and we can use this bound to evaluate the ....
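The idea of bounding an ordering by its minimum number of page misses can be illustrated with a toy model. The sketch below is not the analytic model from the report: it assumes a single memory bank that keeps one DRAM page open, pages of `page_words` words, and unit-stride streams, and every function name is hypothetical. An access that changes the open page counts as a miss; the lower bound is the number of distinct pages touched, since each must be opened at least once.

```python
from typing import Iterable


def page_of(addr: int, page_words: int) -> int:
    # Page (row) index of a word address, assuming pages of page_words words.
    return addr // page_words


def count_page_misses(addresses: Iterable[int], page_words: int) -> int:
    # Toy model: one bank, one open page; any change of page is a miss.
    misses = 0
    open_page = None
    for a in addresses:
        p = page_of(a, page_words)
        if p != open_page:
            misses += 1
            open_page = p
    return misses


def min_page_misses(addresses: Iterable[int], page_words: int) -> int:
    # Lower bound for any ordering: each distinct page is opened at least once.
    return len({page_of(a, page_words) for a in addresses})


def natural_order(streams: list[list[int]]) -> list[int]:
    # Element-wise interleaving, as a loop such as y[i] = x[i] + z[i] issues it.
    n = len(streams[0])
    return [s[i] for i in range(n) for s in streams]


def blocked_order(streams: list[list[int]], depth: int) -> list[int]:
    # Take `depth` consecutive elements from each stream in turn (burst per stream).
    out: list[int] = []
    n = len(streams[0])
    for start in range(0, n, depth):
        for s in streams:
            out.extend(s[start:start + depth])
    return out


if __name__ == "__main__":
    x = list(range(0, 64))        # stream x: words 0..63
    z = list(range(1024, 1088))   # stream z: words 1024..1087
    print(count_page_misses(natural_order([x, z]), 32))
    print(count_page_misses(blocked_order([x, z], 8), 32))
    print(count_page_misses(blocked_order([x, z], 32), 32))
    print(min_page_misses(x + z, 32))
```

With 32-word pages, the natural order misses on all 128 accesses, bursts of 8 elements per stream reduce this to 16 misses, and bursts of 32 reach the lower bound of 4. Real interleaved memory systems have several banks with their own open pages, so this single-bank count is only an illustration of how such a bound is used to judge an ordering.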
[Article contains additional citation context not shown here]
S.A. McKee, "Hardware Support for Dynamic Access Ordering: Performance of Some Design Options", Technical Report CS-93-08, Department of Computer Science, University of Virginia, August 1993.
....on registers and cache. A system that reorders accesses at runtime and provides separate buffer space can reap the benefits of access ordering without these disadvantages, at the expense of adding a relatively small amount of special-purpose hardware. One such scheme is depicted in Figure 1 [23, 25]. In this organization, memory is interfaced to the processor through a controller (or Memory Scheduling Unit) that includes logic to issue memory requests and logic to determine the order of requests during streaming computations. A set of control registers allows the processor .... [Figure 1 labels: A1, A2, B1, B2]
....address, stride, length, and data size) and a set of high-speed buffers holds stream operands. The stream buffers are implemented logically as a set of FIFOs, with each stream assigned to one FIFO. Detailed performance models and simulation results for this organization are presented elsewhere [23, 24, 25]. What follows is an approximate model to determine memory performance for a single vector of a computation. Accurate prediction requires knowledge of the entire computation, since performance for each stream depends on the nature and number of other streams. Let be the FIFO depth in vector ....
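A stream described by base address, stride, length, and data size, and fed into its own FIFO, can be sketched as follows. This is a hypothetical illustration, not the controller described in the excerpt: the `StreamDescriptor` class, the `fifo_fill_order` function, and the round-robin policy (issue up to `depth` requests for one FIFO, then move to the next) are assumptions chosen for simplicity, and processor-side consumption of FIFO entries is not modeled.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class StreamDescriptor:
    # Hypothetical encoding of one stream's control-register contents.
    base: int       # byte address of the first element
    stride: int     # distance between consecutive elements, in elements
    length: int     # number of elements in the stream
    data_size: int  # bytes per element

    def addresses(self) -> Iterator[int]:
        # Byte address of element i is base + i * stride * data_size.
        for i in range(self.length):
            yield self.base + i * self.stride * self.data_size


def fifo_fill_order(streams: list[StreamDescriptor], depth: int) -> list[tuple[int, int]]:
    """Return (fifo_index, address) pairs in issue order under an assumed policy:
    up to `depth` requests for one FIFO, then the next FIFO, round-robin,
    until every stream is exhausted."""
    gens = [s.addresses() for s in streams]
    live = set(range(len(streams)))
    order: list[tuple[int, int]] = []
    while live:
        for k in range(len(streams)):
            if k not in live:
                continue
            for _ in range(depth):
                addr = next(gens[k], None)
                if addr is None:
                    live.discard(k)  # stream k has no more elements
                    break
                order.append((k, addr))
    return order


if __name__ == "__main__":
    a = StreamDescriptor(base=0, stride=1, length=4, data_size=8)
    b = StreamDescriptor(base=4096, stride=2, length=4, data_size=8)
    for fifo, addr in fifo_fill_order([a, b], depth=2):
        print(fifo, addr)
```

Issuing several requests to one FIFO before switching lets consecutive accesses of a unit-stride stream stay within the same DRAM page; the excerpt's own models and the policies compared in the report may differ from this simple scheme.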
McKee, S. A., "Hardware Support for Dynamic Access Ordering: Performance of Some Design Options," Univ. of Virginia, Department of Computer Science, TR CS-93-08, August 1993.
No context found.
S. A. McKee, "Hardware support for dynamic access ordering: Performance of some design options, Tech. Rep. CS-93-08, 9, 1993.
Online articles have much greater impact More about CiteSeer.IST Add search form to your site Submit documents Feedback
CiteSeer.IST - Copyright Penn State and NEC