McKee, S.A., "Uniprocessor SMC Performance on Vectors with Non-unit Strides", University of Virginia, TR CS-93-67, December, 1993.


This paper is cited in the following contexts:
Dynamic Access Ordering: Bounds on Memory Bandwidth - McKee (1994)   (1 citation)  Self-citation (Mckee)   (Correct)

.... 2 numerous simulation results demonstrating its effectiveness [McK94a]. The hardware part of this solution is the Stream Memory Controller (SMC). An analytical model to bound asymptotic SMC performance for unit-stride vectors has been developed and extended for non-unit-stride vectors in [McK93b, McK93c]. Here we develop a model to bound SMC performance on short vectors, and we extend the asymptotic model to describe symmetric multiprocessor (SMP) SMC performance. Note that we shall use the terms vector and stream interchangeably when doing so causes no confusion: a read vector is equivalent ....

....for our SMP systems. Section 4 discusses the assumptions underlying both the startup delay model presented in Section 5 and the asymptotic performance models of Section 6. Section 7 and Section 8 discuss the environment and benchmark kernels used in the simulation studies of SMC performance [McK93a, McK93c, McK94c], and Section 9 correlates the performance curves generated by our analytic models with sample simulation results. 2. The SMC Moyer develops algorithms and analyzes the performance benefits and limitations of doing compile-time access ordering [Moy93]. His scheme involves unrolling loops and ....

[Article contains additional citation context not shown here]

McKee, S.A., "Uniprocessor SMC Performance on Vectors with Non-unit Strides", University of Virginia, Technical Report CS-93-67, December, 1993.


Maximizing Memory Bandwidth for Streamed Computations - McKee (1995)   (7 citations)  Self-citation (Mckee)   (Correct)

....of this quantity of data and the overwhelming similarity of the performance curves for most ordering policies argue against including all the results here. Instead, we present highlights, focusing on general performance trends. Detailed uniprocessor results can be found in our technical reports [McK93a, McK93c]. 3.3.1 Simulation Environment As mentioned above, we model the processor as a generator of non-cached loads and stores of vector elements in order to put as much stress as possible on the memory system. Instruction and scalar data references are assumed to hit in cache, and all stream ....

....system and benchmark. 4.4.1 Ordering Policy The overwhelming similarity of the performance curves presented in Chapter 3 and our uniprocessor SMC studies indicates that neither the ordering strategy nor the processor's access pattern has a large effect on the MSU's ability to optimize bandwidth [McK93a, McK93c]. For moderately long vectors whose stride is relatively prime to the number of memory banks, the SMC consistently delivers nearly the full system bandwidth. In symmetric multiprocessor SMC systems, however, there are more factors that can potentially affect performance, thus different ....
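
The remark about strides that are relatively prime to the number of memory banks can be illustrated with a small sketch. It assumes a simple word-interleaved memory in which element address `a` maps to bank `a mod num_banks`; that mapping, and the function names `banks_touched` and `max_banks`, are illustrative assumptions and are not taken from the cited reports. Under that assumption, a fixed stride can reach at most `num_banks / gcd(stride, num_banks)` distinct banks, so only strides coprime to the bank count can spread accesses over every bank.

```python
from math import gcd


def banks_touched(stride: int, num_banks: int, length: int) -> int:
    """Count the distinct banks hit by `length` elements at a fixed stride.

    Assumes word interleaving: element address a lives in bank a % num_banks.
    """
    return len({(i * stride) % num_banks for i in range(length)})


def max_banks(stride: int, num_banks: int) -> int:
    """Upper limit on distinct banks reachable at this stride."""
    return num_banks // gcd(stride, num_banks)


if __name__ == "__main__":
    # Compare a coprime stride (3) with non-coprime strides (4, 6) on 8 banks.
    for stride in (3, 4, 6):
        print(stride, banks_touched(stride, 8, 100), max_banks(stride, 8))
```

This models only which banks a strided vector can reach; it says nothing about access ordering, FIFO depth, or the bandwidth figures reported in the simulations.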

S.A. McKee, "Uniprocessor SMC Performance on Vectors with Non-Unit Strides", Technical Report CS-93-67, Department of Computer Science, University of Virginia, November 1993.


Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)   Self-citation (Mckee)   (Correct)

.... access ordering at run time [McK94a]. Simulation studies indicate that dynamic access ordering is a valuable technique for improving uniprocessor memory performance for stream computations: the SMC, or Stream Memory Controller, consistently delivers almost the entire available bandwidth [McK93a, McK94b, McK93c]. The applicability of dynamic access ordering is not limited to uniprocessor environments. This paper discusses the effectiveness of dynamic access ordering with respect to the memory performance of symmetric multiprocessor (SMP) systems. Our simulation results show that a modest number of ....

....our experiments. Here vector length refers to the amount of data processed by the entire parallel computation, not just by one CE. The 10,000 element vectors facilitate comparisons between SMP and uniprocessor systems, since this is one of the vector lengths used in the uniprocessor SMC studies [McK93a, McK93c]. These vectors are long enough that SMC startup transients become insignificant in most cases, but as the number of CEs increases, the amount of data processed by each CE decreases, and startup effects become more evident under certain parallelization techniques. We present 8 CE SMC simulation ....

[Article contains additional citation context not shown here]

McKee, S.A., "Uniprocessor SMC Performance on Vectors with Non-unit Strides", University of Virginia, TR CS-93-67, December, 1993.


Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994)   (Correct)

No context found.

McKee, S.A., "Uniprocessor SMC Performance on Vectors with Non-unit Strides", University of Virginia, TR CS-93-67, December, 1993.


CiteSeer.IST - Copyright Penn State and NEC