3 citations found.
Fabrice Rastello and Yves Robert. Loop partitioning versus tiling for cache-based multiprocessors. Technical Report 98-13, LIP, ENS Lyon, France, February 1998. Available on the WEB at http://www.enslyon.fr/yrobert.

This paper is cited in the following contexts:
Optimizing Graph Algorithms for Improved Cache Performance - Joon-Sang Park Michael (2002)

....Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [15]. One characteristic that all these problems share is a very regular memory accesses that are known at compile time. Another area that has been studied is the area of compiler optimizations (see for example [19]). Optimizing blocked algorithms has been extensively studied (see for example [12]). The SUIF compiler framework includes a large set of libraries including libraries for performing data dependency analysis and loop transformations. In this context, it is important to note that SUIF does not ....

F. Rastello and Y. Robert. Loop Partitioning Versus Tiling for Cache-Based Multiprocessor. In Proc. of International Conference Parallel and Distributed Computing and Systems, Las Vegas, Nevada, 1998.


Cache-Friendly Implementations of Transitive Closure - Penner, Prasanna (2001)

....and Prasanna discuss dynamic data remapping to improve cache performance for the DFT in [13]. One characteristic that these problems share is a very regular memory accesses that are known at compile time. Another area that has been studied is the area of compiler optimizations (see for example [15, 16, 26]). Optimizing blocked algorithms has been extensively studied (see for example [12]). The SUIF compiler framework includes libraries for performing data dependency analysis and loop transformations among other things. In this context, it is important to note that SUIF does not handle the data ....

....element (4,4) updates C44 = C44 + A44 × B44 [19]. The key difference between this and the iteration space is the idea of scheduling operations in space. The iteration space actually deals only with scheduling operations in time, whereas the USTR represents operations divided in space as well as time [15]. As we will show in the next section, this fact allows us to generate implementations that are cache friendly. [Figure 5: USTR for 4x4 matrix multiply] In summary, what we mean by a USTR is an NxN array of computational elements (CEs) where each element performs O(N) computations. Thus, when ....

F. Rastello and Y. Robert. Loop Partitioning Versus Tiling for Cache-Based Multiprocessor. In Proc. of International Conference Parallel and Distributed Computing and Systems, Las Vegas, Nevada, 1998.


Loop Partitioning for Cache-based Multiprocessors - Rastello, Robert   Self-citation (Rastello)

No context found.

Fabrice Rastello and Yves Robert. Loop partitioning versus tiling for cache-based multiprocessors. Technical Report 98-13, LIP, ENS Lyon, France, February 1998. Available on the WEB at http://www.enslyon.fr/yrobert.
