P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229-238. IEEE Computer Society Press, 1997. Extended version available on the WEB at http://www.ens-lyon.fr/yrobert.
.... includes: improving the performance of a memory hierarchy [6, 7, 23, 25, 35, 40], determining the sizes and shapes of tiles to minimise communication overhead on distributed memory machines [10, 29, 30, 33, 38], and determining the tile size to minimise execution time on distributed memory machines [1, 5, 26]. To integrate our techniques with a data parallel compiler, compiler techniques for selecting a tiling transformation for commonly occurring iteration space shapes and for selecting a processor layout remain to be developed. We are not aware of any work studying the impact of processor layout on ....
....are not aware of any work studying the impact of processor layout on the performance of tiled code. Almost all previous work on time minimal tilings focuses on finding good rectangular tiles for rectangular iteration spaces without taking cache optimisation on a single processor into account [1, 5, 26]. The problem of selecting a tiling that minimises the total execution time by considering many factors simultaneously, such as communication overhead and cache performance, is difficult. Some initial attempts can be found in [25]. Methods for removing anti and output dependences and for transforming ....
P. Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, 1997 Application Specific Systems, Architectures and Processors, pages 229--238. IEEE Computer Society Press, 1997.
....achieved for a variety of problems, we briefly sketch three case studies hereafter. Tiling. We start with a simple example where dependences prevent dynamic strategies from reaching a good efficiency: consider a tiled computation over a rectangular iteration space as represented in Figure 1 (see [14] and [7] for further information on tiling). There are p available processors, numbered from 1 to p, which are assigned columns of tiles. When targeting a homogeneous NOW, a natural way to allocate tile columns to physical processors is to use a pure cyclic allocation [15, 14, 1]. For heterogeneous NOWs, we ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229-238. IEEE Computer Society Press, 1997. Extended version available on the WEB at http://www.ens-lyon.fr/yrobert.
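The pure cyclic allocation of tile columns mentioned in the excerpt above can be illustrated with a short sketch. This is not code from the cited papers: the function name `cyclic_allocation` and the 0-based column indexing are assumptions made for illustration; only the numbering of processors from 1 to p comes from the excerpt.

```python
def cyclic_allocation(num_columns: int, p: int) -> list[int]:
    """Map each tile column (0-based index) to a processor numbered 1..p.

    Column j goes to processor (j mod p) + 1, so consecutive columns
    are dealt out to processors in round-robin order.
    """
    if p < 1:
        raise ValueError("need at least one processor")
    return [(col % p) + 1 for col in range(num_columns)]


if __name__ == "__main__":
    # Seven tile columns dealt out to three processors.
    print(cyclic_allocation(7, 3))
```

With equal-speed processors this deals the columns out evenly; a heterogeneous setting would need a different, weighted assignment, which this sketch does not attempt.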
.... tile before another processor can start the execution of the second one, and so on) as well as some load imbalance problems (the larger the tile, the more difficult to distribute computations equally among the processors). Tiling has been studied by several researchers and in different contexts [13, 19, 21, 17, 20, 4, 5, 16, 1, 8, 15, 6, 12, 2]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter and Ferrante [12] which provide a review of the existing literature. Most of the work amounts to partitioning the iteration space of a ....
....computations equally among the processors). Tiling has been studied by several researchers and in different contexts [13, 19, 21, 17, 20, 4, 5, 16, 1, 8, 15, 6, 12, 2]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter and Ferrante [12] which provide a review of the existing literature. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as the communication to computation ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229--238. IEEE Computer Society Press, 1997.
.... equally among the processors). The tiling technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [4], but has been extended to sets of fully permutable loops [24, 16, 11]. Tiling has been studied by several researchers and in different contexts [15, 21, 23, 20, 22, 5, 6, 18, 1, 9, 17, 7, 14, 3]. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as the communication to computation ratio); see Section 2 for an example. Once the tile shape and size are defined, there remains ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229-238. IEEE Computer Society Press, 1997. Extended version available on the WEB at http://www.ens-lyon.fr/yrobert.
....to reach a good efficiency, we use the simple example of a tiled computation over a rectangular iteration space. Tiling has been studied by several authors and in different contexts, and we refer the reader to the papers by Högsted, Carter, and Ferrante [14] and by Calland, Dongarra, and Robert [6], which provide a review of the existing literature. Briefly, the idea is to partition the iteration space of a loop nest with uniform dependences into tiles whose shape and size are optimized according to some criterion (such as the communication to computation ratio). Once the tile shape and size ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229-238. IEEE Computer Society Press, 1997. Extended version available on the WEB at http://www.ens-lyon.fr/yrobert.
....is to distribute computations equally among the processors. Tiling has been studied by several authors and in different contexts (see, for example, [17, 22, 21, 6, 19, 1, 9]). Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [8] and by Högsted, Carter, and Ferrante [16] which provide a review of the existing literature. Briefly, most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criterion (such as the ....
....quite natural for load balancing computations. Specifying a columnwise execution may lead to the simplest code generation. When all processors have equal speed, it turns out that a pure cyclic columnwise allocation provides the best solution among all possible distributions of tiles to processors [8], provided that the communication cost for a tile is not greater than the computation cost. Since the communication cost for a tile is proportional to its surface, while the computation cost is proportional to its volume, this hypothesis will be satisfied if the tile is large enough. ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229-238. IEEE Computer Society Press, 1997. Extended version available on the WEB at http://www.ens-lyon.fr/yrobert.
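The surface-versus-volume argument in the excerpt above (communication cost not greater than computation cost once the tile is large enough) can be checked numerically with a small sketch. Everything here is an illustrative assumption rather than the cited papers' model: a cubic tile of integer side `side` in `dim` dimensions, communication cost `per_elem_comm * side**(dim - 1)`, computation cost `per_elem_comp * side**dim`, and the hypothetical helper names.

```python
def comm_cost(side: int, dim: int, per_elem_comm: float) -> float:
    """Assumed communication cost: proportional to the tile's surface."""
    return per_elem_comm * side ** (dim - 1)


def comp_cost(side: int, dim: int, per_elem_comp: float) -> float:
    """Assumed computation cost: proportional to the tile's volume."""
    return per_elem_comp * side ** dim


def smallest_valid_side(dim, per_elem_comm, per_elem_comp, max_side=10_000):
    """Smallest integer side with comm_cost <= comp_cost, or None if not found."""
    for side in range(1, max_side + 1):
        if comm_cost(side, dim, per_elem_comm) <= comp_cost(side, dim, per_elem_comp):
            return side
    return None


if __name__ == "__main__":
    # Communication assumed 8x as expensive per element as computation.
    print(smallest_valid_side(dim=2, per_elem_comm=8.0, per_elem_comp=1.0))
```

Under this assumed model the condition reduces to `side >= per_elem_comm / per_elem_comp`, independent of the dimension, which is why a sufficiently large tile always satisfies the hypothesis.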
.... first tile before another processor can start the execution of the second one, and so on) as well as some load imbalance problems (the larger the tile, the more difficult to distribute computations equally among the processors). Tiling has been studied by several authors and in different contexts [13, 19, 21, 18, 4, 20, 5, 16, 1, 7, 15, 6, 12]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra and Robert [6] and by Hogsted, Carter and Ferrante [12] which provide a review of the existing literature. In a word, most of the work amounts to partitioning the iteration space ....
....computations equally among the processors). Tiling has been studied by several authors and in different contexts [13, 19, 21, 18, 4, 20, 5, 16, 1, 7, 15, 6, 12]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra and Robert [6] and by Hogsted, Carter and Ferrante [12] which provide a review of the existing literature. In a word, most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as the ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229--238. IEEE Computer Society Press, 1997.
....another processor can start the execution of the second one. Tiling also presents load imbalance problems: the larger the tile, the more difficult it is to distribute computations equally among the processors. Tiling has been studied by several authors and in different contexts (see, for example, [13, 19, 21, 18, 4, 20, 5, 16, 1, 7, 15, 6, 12]). Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter, and Ferrante [12] which provide a review of the existing literature. Briefly, most of the work amounts to partitioning the iteration space of a ....
....equally among the processors. Tiling has been studied by several authors and in different contexts (see, for example, [13, 19, 21, 18, 4, 20, 5, 16, 1, 7, 15, 6, 12]). Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter, and Ferrante [12] which provide a review of the existing literature. Briefly, most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criterion (such as the ....
P.Y. Calland, J. Dongarra, and Y. Robert. Tiling with limited resources. In L. Thiele, J. Fortes, K. Vissers, V. Taylor, T. Noll, and J. Teich, editors, Application Specific Systems, Architectures, and Processors, ASAP'97, pages 229--238. IEEE Computer Society Press, 1997.