3 citations found.
S. A. McKee, "Hardware support for dynamic access ordering: Performance of some design options," Tech. Rep. CS-93-08, 9, 1993.


This paper is cited in the following contexts:
Maximizing Memory Bandwidth for Streamed Computations - McKee (1995)   (7 citations)  Self-citation (McKee)

.... Multiprocessor SMC organization introduced in Chapter 4 were first presented in [McK95b]. Parts of the results in Chapter 2 appear in [McK95a]. Complete results for the functional simulations and analytic models presented in Chapter 2 through Chapter 5 can be found in our technical reports [McK93a, McK93b, McK94c, McK94d]. Maximizing Memory Bandwidth for Streamed Computations Introduction Access Ordering Conclusions The SMC Dense Matrix Uniprocessor Sparse Matrix Performance Performance Implementation Concerns Other Systems Issues Compiler Recommendations Hardware Development Uniprocessors Symmetric ....

....first of these encourages several banks to be working on the same FIFO, while the second encourages different banks to be working on different FIFOs. It is not intuitively obvious which of these is preferable, and in fact, our experiments demonstrate no consistent performance advantage to either [McK93a]. 3.2 Analytic Models For the systems we consider, bandwidth is limited by how many page misses a computation incurs. This means that we can derive a bound for any ordering algorithm by calculating the minimum possible number of page misses, and we can use this bound to evaluate the Chapter 3: ....

[Article contains additional citation context not shown here]

S.A. McKee, "Hardware Support for Dynamic Access Ordering: Performance of Some Design Options", Technical Report CS-93-08, Department of Computer Science, University of Virginia, August 1993.


Access Order and Memory-Conscious Cache Utilization - McKee, Wulf (1995)   (2 citations)  Self-citation (McKee)

....on registers and cache. A system that reorders accesses at runtime and provides separate buffer space can reap the benefits of access ordering without these disadvantages, at the expense of adding a relatively small amount of special purpose hardware. One such scheme is depicted in Figure 1 [23, 25]. In this organization, memory is interfaced to the processor through a controller (or Memory Scheduling Unit) that includes logic to issue memory requests and logic to determine the order of requests during streaming computations. A set of control registers allow the processor ....

....address, stride, length, and data size) and a set of high speed buffers holds stream operands. The stream buffers are implemented logically as a set of FIFOs, with each stream assigned to one FIFO. Detailed performance models and simulation results for this organization are presented elsewhere [23, 24, 25]. What follows is an approximate model to determine memory performance for a single vector of a computation. Accurate prediction requires knowledge of the entire computation, since performance for each stream depends on the nature and number of other streams. Let be the FIFO depth in vector ....

McKee, S.A, "Hardware Support for Dynamic Access Ordering: Performance of Some Design Options", Univ. of Virginia, Department of Computer Science, TR CS-93-08, August, 1993.


Beyond Performance: Secure and Fair Memory Management for.. - Macian, al.

No context found.

S. A. McKee, "Hardware support for dynamic access ordering: Performance of some design options," Tech. Rep. CS-93-08, 9, 1993.


CiteSeer.IST - Copyright Penn State and NEC