34 citations found.
Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTEGRATION, the VLSI Journal, volume 17, pages 33--51. 1994.

This paper is cited in the following contexts:

Tiling With Limited Resources - Calland, Dongarra, Robert (1997)   (4 citations)

....automatically designing block versions of nested loop kernels. Schreiber and Dongarra [17] have discussed how to determine the size and shape of the tiles so as to optimize the communication to computation ratio. Their work has been extended by Ramanujam and Sadayappan [16] and by Boulet et al. [3]. Several other papers have discussed the same framework, including [18, 4, 1, 5]. Back to Example 1. In Figure 1, we sketch a valid tiling for Example 1. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [3]: H = 1 16 3 1 We ....

....[16] and by Boulet et al. [3]. Several other papers have discussed the same framework, including [18, 4, 1, 5]. Back to Example 1. In Figure 1, we sketch a valid tiling for Example 1. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [3]: H = 1 16 3 1 We check that HD ≥ 0. Note that the volume of the tile, which represents the number of computations per tile, is given by the determinant of H^-1: V_comp = det(H^-1) 96. The number of communications is the following: each tile sends • 24 data items to its right ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.
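
The validity condition on HD and the tile volume det(H^-1) quoted in the context above can be illustrated with a minimal sketch. The matrix printed in the excerpt is not recoverable from this listing, so H and D below are hypothetical values, chosen only so that each tile holds 96 iterations; the function names are illustrative as well.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def is_valid_tiling(H, D):
    """Check the condition HD >= 0 entry by entry."""
    return all(x >= 0 for row in mat_mul(H, D) for x in row)

def tile_volume_2d(H):
    """Iterations per tile for a 2x2 matrix H: |det(H^-1)| = 1 / |det(H)|."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return abs(1 / det)

# Hypothetical H: rectangular tiles of 12 x 8 iterations (rows are the
# scaled normals to the tile edges). Not the matrix from the cited paper.
H = [[Fraction(1, 12), Fraction(0)],
     [Fraction(0), Fraction(1, 8)]]

# Hypothetical uniform dependence vectors, one per column.
D = [[1, 0, 1],
     [0, 1, 1]]

print(is_valid_tiling(H, D), tile_volume_2d(H))
```

Exact fractions are used so that the volume comes out as an integer count rather than a rounded float. A dependence with a negative component, such as (1, -1), would fail the check for this H.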


Tiling on Systems with Communication/Computation Overlap - Pierre-Yves Calland Jack (1999)   (1 citation)

.... equally among the processors). The tiling technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [4], but has been extended to sets of fully permutable loops [24, 16, 11]. Tiling has been studied by several researchers and in different contexts [15, 21, 23, 20, 22, 5, 6, 18, 1, 9, 17, 7, 14, 3]. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as the communication to computation ratio); see Section 2 for an example. Once the tile shape and size are defined, there ....

....condition HD ≥ 0, where H is the matrix of vectors normal to the faces (or the edges in two-dimensional problems) of the tile [15]. In Figure 1, we sketch a valid tiling for our example. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [5]: H = 1 16 0 1 3 1 2 0 : We check that HD ≥ 0. Note that the volume of the tile, which represents the number of computations per tile, is given by the determinant of H^-1: V_comp = det(H^-1) 96. The number of communications is the following: each tile sends • 24 ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Determining the Idle Time of a Tiling: New Results - Frederic Desprez Jack (1997)   (3 citations)

.... tile before another processor can start the execution of the second one, and so on) as well as some load imbalance problems (the larger the tile, the more difficult to distribute computations equally among the processors). Tiling has been studied by several researchers and in different contexts [13, 19, 21, 17, 20, 4, 5, 16, 1, 8, 15, 6, 12, 2]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter and Ferrante [12], which provide a review of the existing literature. Most of the work amounts to partitioning the iteration space of a ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Tiling With Limited Resources - Pierre-Yves Calland Jack (1997)   (4 citations)

....condition HD ≥ 0, where H is the matrix of vectors normal to the faces (or the edges in two-dimensional problems) of the tile [9]. In Figure 1, we sketch a valid tiling for our example. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [4]: H = 1 16 0 1 3 1 2 0 : We check that HD ≥ 0. Note that the volume of the tile, which represents the number of computations per tile, is given by the determinant of H^-1: V_comp = det(H^-1) 96. The number of communications is the following: each tile sends • ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Tiling on Systems with Communication/Computation Overlap - Calland, Dongarra, Robert (1997)   (1 citation)

.... equally among the processors). The tiling technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [4], but has been extended to sets of fully permutable loops [24, 16, 11]. Tiling has been studied by several researchers and in different contexts [15, 21, 23, 20, 22, 5, 6, 18, 1, 9, 17, 7, 14, 3]. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as the communication to computation ratio); see Section 2 for an example. Once the tile shape and size are defined, there remains ....

....condition HD ≥ 0, where H is the matrix of vectors normal to the faces (or the edges in two-dimensional problems) of the tile [15]. In Figure 1, we sketch a valid tiling for our example. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [5]: H = 1 16 0 1 3 1 2 0 : We check that HD ≥ 0. Note that the volume of the tile, which represents the number of computations per tile, is given by the determinant of H^-1: V_comp = det(H^-1) 96. The number of communications is the following: each tile sends 24 data items ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33-51, 1994.


Determining the Idle Time of a Tiling: New Results - Frédéric Desprez Jack (1997)   (3 citations)

.... tile before another processor can start the execution of the second one, and so on) as well as some load imbalance problems (the larger the tile, the more difficult to distribute computations equally among the processors). Tiling has been studied by several researchers and in different contexts [1, 2, 4, 5, 6, 8, 12, 13, 15, 16, 17, 20, 21, 22]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter and Ferrante [12], which provide a review of the existing literature. Most of the work amounts to partitioning the iteration space of a ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Determining the Idle Time of a Tiling: New Results - Desprez, Dongarra, Rastello, .. (1997)   (3 citations)

.... tile before another processor can start the execution of the second one, and so on) as well as some load imbalance problems (the larger the tile, the more difficult to distribute computations equally among the processors). Tiling has been studied by several researchers and in different contexts [13, 19, 21, 17, 20, 4, 5, 16, 1, 8, 15, 6, 12, 2]. Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter and Ferrante [12], which provide a review of the existing literature. Most of the work amounts to partitioning the iteration space of a ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Tiling With Limited Resources - Calland, Dongarra, Robert (1997)   (4 citations)

....condition HD ≥ 0, where H is the matrix of vectors normal to the faces (or the edges in two-dimensional problems) of the tile [9]. In Figure 1, we sketch a valid tiling for our example. The matrix H is the one derived using the scalable communication to computation criteria of Boulet et al. [4]: H = 1 16 0 1 3 1 2 0 : 3 We check that HD ≥ 0. Note that the volume of the tile, which represents the number of computations per tile, is given by the determinant of H^-1: V_comp = det(H^-1) 96. The number of communications is the following: each tile sends • 24 ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Loop Partitioning versus Tiling for Cache-based Multiprocessors - Fabrice Rastello (1998)   (2 citations)

....tandem cache/local memory in the former sentence being replaced by the tandem local memory/secondary storage). Loop partitioning amounts to dividing an iteration space into hyperparallelepipeds, whose size and shape are optimized according to some criteria. It is closely related to tiling [11, 14, 2, 6, 13, 5, 10], a technique also known as loop blocking [15], whose objective is to increase the granularity of computations, the locality of data references, and the computation to communication ratio of fully permutable loop nests. In fact, loop partitioning and tiling have similar objectives: the basic idea ....

....the optimization problem. 4.1 Related problems. The problem of minimizing the expression (1) is difficult. In fact, we know how to minimize the two related expressions: Problem 1: If (a_1, ..., a_m) are m free vectors, and |det E| = 1/V_calc, Boulet et al. propose in [2] a solution for minimizing the following expression: Σ_{i=1}^{m} Σ_{k=1}^{n} e_k · a_i. The solution is simply E = A^-1 if m = n but gets very complex if m ≠ n. Note that the a_i represent dependences in the context of tiling fully permutable loop nests, hence ....

[Article contains additional citation context not shown here]

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.
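
Loop blocking, as described in the first context of this entry, can be sketched on a two-dimensional rectangular iteration space. This is a minimal illustration and not the method of the cited papers: the function name, the tile sizes, and the restriction to rectangular tiles are assumptions made for the example.

```python
def tiled_iterations(n, m, tile_i, tile_j):
    """Yield every (i, j) of an n x m iteration space, one tile at a time."""
    for ii in range(0, n, tile_i):            # outer loops walk over tiles
        for jj in range(0, m, tile_j):
            # inner loops walk over the points of one tile; min() clips
            # partial tiles at the boundary of the iteration space
            for i in range(ii, min(ii + tile_i, n)):
                for j in range(jj, min(jj + tile_j, m)):
                    yield (i, j)

# Each 2 x 2 tile is finished before the next one starts.
order = list(tiled_iterations(4, 4, 2, 2))
print(order[:4])
```

The tiled order visits exactly the same points as the original double loop, only grouped by tile. Whether that reordering preserves the program's meaning depends on its dependences; for rectangular tiles it is safe when the loops are fully permutable.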


Data-centric Multi-level Blocking - Kodukula, Ahmed, Pingali (1997)   (77 citations)

....However, it is unclear how a compiler can discover automatically the right sequence of transformations to perform; it is also unclear whether this approach can be generalized for a machine with a multi-level memory hierarchy. Finally, there is a large body of work on determining good tile sizes [6, 14, 20, 22, 12]. This research focuses on perfectly nested loops with uniform dependences (i.e. dependence vectors can be represented as distances). While this work is not directly comparable to ours, the detailed memory models used in some of this research [18, 22, 16] are useful in general for estimating ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTEGRATION, the VLSI Journal, volume 17, pages 33--51. 1994.


Loop Partitioning for Cache-based Multiprocessors - Rastello, Robert (1998)

....tandem cache/local memory in the former sentence being replaced by the tandem local memory/secondary storage). Loop partitioning amounts to dividing an iteration space into hyperparallelepipeds, whose size and shape are optimized according to some criteria. It is closely related to tiling [11, 15, 2, 6, 14, 5, 10], a technique also known as loop blocking [16], whose objective is to increase the granularity of computations, the locality of data references, and the computation to communication ratio of fully permutable loop nests. In fact, loop partitioning and tiling have similar objectives: the basic idea ....

....Kranz and Natarajan [1]. 4.2 Related problems. The problem of minimizing the expression (1) is difficult. In fact, we know how to minimize the two related expressions: Problem 1: If (a_1, ..., a_m) are m free vectors, and |det E| = 1/V_calc, Boulet et al. propose in [2] a solution for minimizing the following expression: Σ_{i=1}^{m} Σ_{k=1}^{n} e_k · a_i. The solution is simply E = A^-1 if m = n but gets very complex if m ≠ n, with a cost exponential in m. Note that the a_i represent dependences in the context of tiling ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Loop Partitioning versus Tiling for Cache-based Multiprocessors - Rastello, Robert (1998)   (2 citations)

....tandem cache/local memory in the former sentence being replaced by the tandem local memory/secondary storage). Loop partitioning amounts to dividing an iteration space into hyperparallelepipeds, whose size and shape are optimized according to some criteria. It is closely related to tiling [11, 14, 2, 6, 13, 5, 10], a technique also known as loop blocking [15], whose objective is to increase the granularity of computations, the locality of data references, and the computation to communication ratio of fully permutable loop nests. In fact, loop partitioning and tiling have similar objectives: the basic idea ....

....the optimization problem. 4.1 Related problems. The problem of minimizing the expression (1) is difficult. In fact, we know how to minimize the two related expressions: Problem 1: If (a_1, ..., a_m) are m free vectors, and |det E| = 1/V_calc, Boulet et al. propose in [2] a solution for minimizing the following expression: Σ_{i=1}^{m} Σ_{k=1}^{n} e_k · a_i. The solution is simply E = A^-1 if m = n but gets very complex if m ≠ n. Note that the a_i represent dependences in the context of tiling fully permutable loop nests, hence ....

[Article contains additional citation context not shown here]

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Mathematical Tools for Loop Transformations: From Systems of.. - Darte (1997)   Self-citation (Darte)

....the last one equal to |t.s| (processors are active every |t.s| clock cycles). In this case, Hm gives the temporal activity for a given processor. Partitioning techniques, Hermite forms, non-unimodular transformations are also linked, in the context of loops, to loop tiling (see for example [17, 28, 4]). Furthermore, the two forms of code generation mentioned above, the form time processor (or SEQ PAR) and the form processor time (or PAR SEQ), are used depending on the architecture behavior. The first one is more a data parallel model of execution, the second a control parallel model of ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Tiling for Heterogeneous Computing Platforms - Boulet, Dongarra, Robert, Vivien (1998)   (1 citation)  Self-citation (Boulet)

....the second one. Tiling also presents load imbalance problems: the larger the tile, the more difficult it is to distribute computations equally among the processors. Tiling has been studied by several authors and in different contexts (see, for example, [13, 19, 21, 18, 4, 20, 5, 16, 1, 7, 15, 6, 12]). Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [6] and by Hogsted, Carter, and Ferrante [12], which provide a review of the existing literature. Briefly, most of the work amounts to partitioning the iteration space of a ....

....over all possible solutions. 2.2 Discussion. We survey our hypotheses and assess their motivations, as well as the limitations that they may induce. Rectangular iteration space and tiles. We note that the tiled iteration space is the outcome of previous program transformations, as explained in [13, 19, 21, 18, 4]. The first step in tiling amounts to determining the best shape and size of the tiles, assuming an infinite grid of virtual processors. Because this step will lead to tiles whose edges are parallel to extremal dependence vectors, we can perform a unimodular transformation and rewrite the original ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


HPFIT: A Set of Integrated Tools for the.. - Brandes.. (1996)   (4 citations)  Self-citation (Darte)

....innermost parallel loops are often not parallel enough to offer good performance. In this case, the grain of parallelism must be increased, either by trying to move up the parallel loop to the outermost possible level, or by using blocking (tiling) techniques. We studied this tiling problem in [13] in the simple case of uniform loop nests, and it turns out that Darte and Vivien's algorithm can be easily adapted to the tiling technique, as Wolf and Lam's algorithm that was developed with a "tiling spirit". Actually, the detection of parallel loops and the detection of maximal tiling, related ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-Ultimate Tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Static Tiling for Heterogeneous Computing Platforms - Boulet, Dongarra, Robert.. (1999)   (5 citations)  Self-citation (Boulet)

....another processor can start the execution of the second one. Tiling also presents load imbalance problems: the larger the tile, the more difficult it is to distribute computations equally among the processors. Tiling has been studied by several authors and in different contexts (see, for example, [17, 22, 21, 6, 19, 1, 9]). Rather than providing a detailed motivation for tiling, we refer the reader to the papers by Calland, Dongarra, and Robert [8] and by Högsted, Carter, and Ferrante [16], which provide a review of the existing literature. Briefly, most of the work amounts to partitioning the iteration space of a ....

....over all possible solutions. 2.2 Discussion. We survey our hypotheses and assess their motivations, as well as the limitations that they may induce. Rectangular iteration space and tiles. We note that the tiled iteration space is the outcome of previous program transformations, as explained in [22, 21, 6]. The first step in tiling amounts to determining the best shape and size of the tiles, assuming an infinite grid of virtual processors. Because this step will lead to tiles whose edges are parallel to extremal dependence vectors, we can perform a unimodular transformation and rewrite the original ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33-51, 1994.


HPFIT: A Set of Integrated Tools for the.. - Brandes.. (1996)   (4 citations)  Self-citation (Darte)

....innermost parallel loops are often not parallel enough to offer good performance. In this case, the grain of parallelism must be increased, either by trying to move up the parallel loop to the outermost possible level, or by using blocking (tiling) techniques. We studied this tiling problem in [7] in the simple case of uniform loop nests, and it turns out that Darte and Vivien's algorithm can be easily adapted to the tiling technique, as Wolf and Lam's algorithm that was developed with a "tiling spirit". Actually, the detection of parallel loops and the detection of maximal tiling, related ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-Ultimate Tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Mathematical Tools for Loop Transformations: From Systems of.. - Darte (1998)   Self-citation (Darte)

....one equal to |t.s| (processors are active every |t.s| clock cycles). In this case, Hm gives the temporal activity for a given processor. Partitioning techniques, Hermite forms, non-unimodular transformations are also linked, in the context of loops, to loop tiling (see for example [17, 28, 4]). Furthermore, the two forms of code generation mentioned above, the form time processor (or SEQ PAR) and the form processor time (or PAR SEQ), are used depending on the architecture behavior. The first one is more a data parallel model of execution, the second a control parallel model of ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33-51, 1994.


Combining Retiming and Scheduling Techniques for Loop.. - Darte, Silber, Vivien (1996)   (17 citations)  Self-citation (Darte)

....shape of a tile are chosen following various criteria, for achieving better vectorization of communications and/or computations, for improving cache reuse, reducing communications, etc. All these criteria are very machine dependent, and despite the large amount of different optimization strategies [7, 10, 13, 14, 16, 2], choosing a good tiling remains an open problem. However, before even defining the size and shape of the tiles, one has to make sure that they will be atomic, i.e. that they can be computed with no intervening synchronization or communication. This atomicity property is fulfilled if the ....

....we have two possibilities. On one hand, if the number of elementary cycles is small, we can directly work with the cone generated by the weights of the cycles of G. The corresponding polar cone contains all candidate vectors X. Then, to select the matrix M, optimization techniques such as in [2] can be used. On the other hand, if generating all the elementary cycles is too expensive, we can still build one solution in polynomial time, by choosing M as done in the proof of Lemma 5 (see [3] for more details). The key point is that a basis of U can be built in polynomial time using a basis ....

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33--51, 1994.


Synthesizing Transformations for Locality Enhancement of.. - Ahmed, Mateev, Pingali (2000)   (14 citations)

No context found.

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTEGRATION, the VLSI Journal, volume 17, pages 33--51. 1994.


Loop Partitioning for Cache-based Multiprocessors - Rastello, Robert

No context found.

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33-51, 1994.


Parallelization of the Numerical Lyapunov Calculation for the.. - Rastello (2001)

No context found.

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? Integration, the VLSI Journal, 17:33-51, 1994.


Synthesizing Transformations for Locality Enhancement of.. - Nawaaz Ahmed Nikolay (2000)   (14 citations)

No context found.

Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (Pen)-ultimate tiling? In INTEGRATION, the VLSI Journal, volume 17, pages 33--51. 1994.
