A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. PPL, 7(4):379-392, 1997.
....into tiles. By executing tiles as atomic units of computation, communication takes place per tile instead of per iteration. By adjusting the size of tiles, tiling can achieve coarse grain DOACROSS parallelism while reducing communication overhead and frequency on distributed memory machines [13, 26, 29, 30, 33, 38]. While a lot of work has been done on tiling or other related optimisations [10, 32], little research efforts in the literature are [Figure: A Rectangularly Tiled Iteration Space; Generate Sequential Tiled Code; Generate SPMD Code; Computation Distribution; Data Distribution; Message Passing Code] ....
A. Darte and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96--34, Ecole Normale Superieure de Lyon, November 1996.
....algorithm for finding canonical transformations is implemented. The reuse framework based on vector spaces is due to Wolf and Lam [12]. The relevance of cones to solving compiler problems is being increasingly recognised. Successful applications are dependence abstraction [4, 18], loop scheduling [3], and tiling for parallelism [1, 3, 10, 16]. One more application considered in this paper is tiling for data locality. The concept of fully permutable loop nests introduced in [13] has emerged as useful. Initially, in [13], the concept is related to the maximal degree of doall parallelism ....
....transformations is implemented. The reuse framework based on vector spaces is due to Wolf and Lam [12]. The relevance of cones to solving compiler problems is being increasingly recognised. Successful applications are dependence abstraction [4, 18], loop scheduling [3], and tiling for parallelism [1, 3, 10, 16]. One more application considered in this paper is tiling for data locality. The concept of fully permutable loop nests introduced in [13] has emerged as useful. Initially, in [13], the concept is related to the maximal degree of doall parallelism inherent in the program. In [3, 9] and this ....
[Article contains additional citation context not shown here]
A. Darte and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96--34, Ecole Normale Superieure de Lyon, November 1996.
.... problems (the larger the tile, the more difficult to distribute computations equally among the processors). The tiling technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [4], but has been extended to sets of fully permutable loops [24, 16, 11]. Tiling has been studied by several researchers and in different contexts [15, 21, 23, 20, 22, 5, 6, 18, 1, 9, 17, 7, 14, 3]. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34, and on the WEB at http://www.ens-lyon.fr/LIP.
....Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [3], but has been extended to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units. The larger the tiles, the more efficient the computations performed ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
.... problems (the larger the tile, the more difficult to distribute computations equally among the processors). The tiling technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [4], but has been extended to sets of fully permutable loops [24, 16, 11]. Tiling has been studied by several researchers and in different contexts [15, 21, 23, 20, 22, 5, 6, 18, 1, 9, 17, 7, 14, 3]. Most of the work amounts to partitioning the iteration space of a uniform loop nest into tiles whose shape and size are optimized according to some criteria (such as ....
Alain Darte, Georges-Andre Silber, and Frederic Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34, and on the WEB at http://www.ens-lyon.fr/LIP.
....Tiling is a widely used compiler technique to increase the granularity of computations and the locality of data references. This technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [3] but has been extended to sets of fully permutable loops [10, 14, 23]. The basic idea of tiling, also known as loop blocking, is to group elemental computation points into tiles that will be viewed as computational units. The larger the tiles, the more efficient the computations performed using state of the art processors with pipelined arithmetic units and a ....
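As a minimal illustration of the grouping described in the excerpt above (not code taken from any of the cited papers), the Python sketch below tiles a two-deep loop nest whose dependences are (1, 0) and (0, 1), so the nest is fully permutable and rectangular tiling is legal. The function names, the recurrence, and the tile sizes bi and bj are assumptions chosen only for the example.

```python
def sweep(n, m):
    """Reference loop nest: a[i][j] depends on a[i-1][j] and a[i][j-1]."""
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a


def sweep_tiled(n, m, bi, bj):
    """Same computation, with iterations grouped into bi x bj tiles.

    The two outer loops enumerate tiles; the two inner loops enumerate the
    points of one tile. Each tile is executed as one atomic unit.
    """
    a = [[1] * (m + 1) for _ in range(n + 1)]
    for ii in range(1, n + 1, bi):            # tile origin along i
        for jj in range(1, m + 1, bj):        # tile origin along j
            for i in range(ii, min(ii + bi, n + 1)):
                for j in range(jj, min(jj + bj, m + 1)):
                    a[i][j] = a[i - 1][j] + a[i][j - 1]
    return a
```

Because both dependence vectors have nonnegative components, every value a tile reads comes from a tile visited earlier, which is why the tiled order produces the same array as the original order in this example.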
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34, and on the WEB at http://www.ens-lyon.fr/LIP.
....Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [3], but has been extended to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units. The larger the tiles, the more efficient the computations performed ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
....before compilation to VHDL, and it is used in the project HPFIT [5, 6] to implement parallelization algorithms. Nestor is now publicly available with its source code and its documentation at the address http: www.ens lyon.fr gsilber nestor. We are implementing new parallelization algorithms [8] into it. These parallelization algorithms could be included in the base Nestor package and then transform it into a more powerful source-to-source automatic parallelization kernel. ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....first apply graph transformation techniques to the DFG and schedule the acyclic (DAG) part of the resulting graph. A great deal of research has been done attempting to optimize the schedule of tasks for an application after applying various graph transformation techniques to the application's DFG [1, 6, 13, 14]. One of the more effective of these techniques is retiming [2, 8, 12], where delays are redistributed among the edges so that the application's function remains the same, but the length of the longest zero-delay path, called the clock period of the DFG G and denoted cl(G), is decreased. After ....
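The effect of retiming on the clock period can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm of any cited paper: the names retime and clock_period are hypothetical, each node carries a computation time, each edge (u, v, w) carries w delays, and the retimed weight is taken to be w + r[v] - r[u], which is the usual convention from the circuit retiming literature and is assumed here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def retime(edges, r):
    """Redistribute delays: edge (u, v, w) gets weight w + r[v] - r[u] (assumed convention)."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]


def clock_period(time, edges):
    """Length of the longest path that uses only zero-delay edges.

    `time` maps each node to its computation time. The zero-delay subgraph
    must be acyclic; otherwise TopologicalSorter raises CycleError.
    """
    preds = {u: set() for u in time}
    for u, v, w in edges:
        if w == 0:
            preds[v].add(u)
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for u in TopologicalSorter(preds).static_order():
        finish[u] = time[u] + max((finish[p] for p in preds[u]), default=0)
    return max(finish.values())


# Hypothetical three-node cycle with unit computation times and two delays on c -> a.
time = {"a": 1, "b": 1, "c": 1}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
shift = {"a": 0, "b": 0, "c": 1}
```

With these values the zero-delay path a, b, c gives a clock period of 3 before retiming, and moving one delay from c -> a to b -> c shortens the longest zero-delay path to a, b, for a clock period of 2. The total number of delays on the cycle is unchanged.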
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7:379--392, 1997.
....first apply graph transformation techniques to the DFG and schedule the acyclic (DAG) part of the resulting graph. A great deal of research has been done attempting to optimize the schedule of tasks for an application after applying various graph transformation techniques to the application's DFG [4, 7]. One of the more effective of these techniques is retiming [1, 5], where delays are redistributed among the edges so that the application's function remains the same, but the length of the longest zero-delay path, called the clock period of the DFG G and denoted cl(G), is decreased. After ....
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7:379--392, 1997.
....with pipelined arithmetic units and a multilevel memory hierarchy (illustrated by recasting numerical linear algebra algorithms in terms of blocked Level 3 BLAS kernels [7, 9]). But loop partitioning and tiling operate in different contexts. Tiling is valid only if the loops are fully permutable [8, 12, 16], and the optimization criterion is to minimize the communication-to-computation ratio. Loop partitioning can be applied to any loop nest with affine dependences, and the optimization criterion is to minimize the number of accessed data. We make this difference explicit in Section 4.1. Still, because ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.
....with pipelined arithmetic units and a multilevel memory hierarchy (illustrated by recasting numerical linear algebra algorithms in terms of blocked Level 3 BLAS kernels [7, 9]). But loop partitioning and tiling operate in different contexts. Tiling is valid only if the loops are fully permutable [8, 13, 17], and the optimization criterion is to minimize the communication-to-computation ratio. Loop partitioning can be applied to any loop nest with affine dependences, and the optimization criterion is to minimize the number of accessed data. We make this difference explicit in Section 4.2. Still, because ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.
....Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique was originally restricted to perfect loop nests with uniform dependencies, as defined by Banerjee [3], but has been extended to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units. The larger the tiles, the more efficient the computations performed ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
....with pipelined arithmetic units and a multilevel memory hierarchy (illustrated by recasting numerical linear algebra algorithms in terms of blocked Level 3 BLAS kernels [7, 9]). But loop partitioning and tiling operate in different contexts. Tiling is valid only if the loops are fully permutable [8, 12, 16], and the optimization criterion is to minimize the communication-to-computation ratio. Loop partitioning can be applied to any loop nest with affine dependences, and the optimization criterion is to minimize the number of accessed data. We make this difference explicit in Section 4.1. Still, because ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.
....framework, it is nontrivial to translate the piecewise affine schedules into parallel code. Attempts have also been made to derive affine schedules that expose coarse grain parallelism. Darte proposed an algorithm for detecting permutable loops for the restricted domain of perfectly nested loops [6]. Kelly and Pugh's algorithm finds one dimension of parallelism for programs with arbitrary nestings and sequences of loops [10]. Their repertoire of program transforms includes loop permutations and reversals, but not loop skewing. The exclusion of loop skewing enables them to enumerate all the ....
A. Darte, G. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, Laboratoire de l'Informatique du Parallélisme, November 1996.
No context found.
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. PPL, 7(4):379-392, 1997.
No context found.
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
No context found.
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....shifted linear schedule (see Section 5.1). It is optimal for maximal parallelism detection if dependences are approximated by dependence polyhedra. Since it is simpler than Feautrier's algorithm, more optimizing criteria can be handled: the detection of permutable loops and outer parallelism (see [13]) and the minimization of synchronizations through loop fusion (see Section 6.2). Furthermore, the code generation is simpler (see Section 5). However, it may find less parallelism than Feautrier's algorithm when exact dependence analysis is feasible because of its restricted choice of ....
....we impose that [r S ] i S;S Gamma 1] r S ] i S;S Gamma 1] where i S;S is the first level of the innermost block of permutable loops surrounding S and S . Fortunately, this technical condition is true for shifted linear schedules that are built by the algorithm proposed in [13], for which we developed these simplification techniques. Back to Example 1: We assume that the first two dimensions correspond to a block of permutable loops. Following the proof of Theorem 2, we find the two loop skewing transformations G S : i; j; k) i; i j; Gammai k) and G S : i; j; k ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear.
....was previously developed by Peir [27] (see also [34]). None of these techniques, however, completely answers the question we stated above. The idea to decompose the construction of an affine transformation in two steps (a step for the linear part, a step for the shifting part) was developed in [10]. But, again, they were not able to completely characterize the cases when a loop shifting exists that completes the unimodular part of the transformation. In [20] our main problem is addressed: a polynomial time algorithm is given, but the technique misses one point and, as we will show, the ....
....as a loop shifting technique for program transformations is not new and has been mainly used for software pipelining (see [5, 3, 8]), taking advantage of results from the VLSI community. We can also notice some attempts to study the possibilities of retiming as a tool for loop parallelization (see [26, 20, 10]) or for other code optimizations (see [14]). In the following, we will use indifferently a shift or a retiming of a dependence graph as a function assigning an integer value r(u) (scalar or vector) to each vertex u of the graph. The principle of a scalar retiming is to move an operation u from ....
[Article contains additional citation context not shown here]
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....zero weight edges, i.e. edges without register. What we called (G) is now the largest delay of a zero weight path, called the clock period of the circuit. This link between loop shifting and circuit retiming is not new. It was used in several algorithms on loop transformations (see for example [5, 3, 4, 7]), including software pipelining. 3.3 Selecting loop shifting for loop compaction How can we select a good shifting for loop compaction? Let us first consider the strategies followed by the different move-then-schedule algorithms. Enhanced software pipelining and its extensions [21] circular ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....linear schedule (see Section 5.1). It is optimal for maximal parallelism detection if dependences are approximated by dependence polyhedra. Since it is simpler than Feautrier's algorithm, more optimizing criteria can be handled: the detection of permutable loops and outer parallelism (see [13]) and the minimization of synchronizations through loop fusion (see Section 6.2). Furthermore, the code generation is simpler (see Section 5). However, it may find less parallelism than Feautrier's algorithm when exact dependence analysis is feasible because of its restricted choice of ....
....we impose that [r S ] i S;S' Gamma 1] r S' ] i S;S' Gamma 1] where i S;S' is the first level of the innermost block of permutable loops surrounding S and S'. Fortunately, this technical condition is true for shifted linear schedules that are built by the algorithm proposed in [13], for which we developed these simplification techniques. Back to Example 1: We assume that the first two dimensions correspond to a block of permutable loops. Following the proof of Theorem 2, we find the two loop skewing transformations G S : i; j; k) i; i j; Gammai k) and G S' : i; j; k ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear.
....graph of edges without register. What we called (G) is now the largest delay of a path without register, called the clock period of the circuit. This link between loop shifting and circuit retiming is not new. It has been used in several algorithms on loop transformations (see for example [5, 3, 4, 7]), including software pipelining. 3.3 Selecting loop shifting for loop compaction How can we select a good shifting for loop compaction? Let us first consider the strategies followed by the different move-then-schedule algorithms. Enhanced software pipelining and its extensions [19] circular ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....before compilation to VHDL, and it is used in the project HPFIT [5] to implement parallelization algorithms. Nestor is now publicly available with its source code and its documentation at the address http: www.ens lyon.fr gsilber nestor. We are implementing new parallelization algorithms [7] into it. These parallelization algorithms could be included in the base Nestor package and then transform it into a more powerful source to source automatic parallelization kernel. ....
Alain Darte, Georges-Andre Silber, and Frederic Vivien. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling. Parallel Processing Letters, 7(4):379-392, 1997.
....overlap, mapping, limited resources, different speed processors, heterogeneous networks 1 Introduction Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique applies to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units (the loop nest must be permutable so that such a transformation is valid). The larger the tiles, the more efficient are the computations performed using state of the art processors with ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34, and on the WEB at http://www.ens-lyon.fr/LIP.
....normale supérieure de Lyon and partly supported by DRET DGA under contract ERE 96 1104 A000 DRET DS SR. 1 Introduction Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique applies to sets of fully permutable loops [23, 18, 13]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units (the loop nest must be permutable so that such a transformation is valid). The larger the tiles, the more efficient are the computations performed using state of the art processors with ....
Alain Darte, Georges-Andre Silber, and Frederic Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
.... to remove disturbing false dependences (see [6]). • By construction, it can be naturally adapted to the search for maximal sets of fully permutable loops, which is, in theory, an equivalent problem, and is, in practice, a way to exploit medium grain parallelism (for a complete study see [13]). • It produces schedules as regular as possible in order to generate codes as simple as possible. Indeed, our algorithm rewrites the codes using affine schedules, but, unlike Feautrier's algorithm, these affine schedules are chosen such that as many statements as possible have the same linear ....
....we do not specify, on purpose, how the vector X and the constants ρ are selected, so as to allow various selection criteria. We refer to Section 7 for more details. For example, a maximal set of linearly independent vectors X can be selected if the goal is to derive fully permutable loops (see [13]). Back to Example 5: Consider the uniform dependence graph of Figure 16. There are two elementary cycles of weights (1, 0, 1) and (0, 1, 1) and five self loops of weights (0, 0, 1), (0, 0, -1), (0, 1, 0) (twice) and (0, -1, 0). Therefore, all edges (except the edges that only belong to ....
[Article contains additional citation context not shown here]
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
....for each vertex $v \in V$, there exists a constant $\rho_v$ such that: $\forall e = (x, y) \in E,\ X \cdot w(e) + \rho_y - \rho_x \geq 0$. Furthermore, the subgraph generated by the edges with null delay is acyclic. The proofs use the Bellman-Ford algorithm. All details are provided in the extended version of this paper [3]. Note that the number of elementary cycles in a graph can be exponential in the number of vertices and edges of the graph. Therefore, checking directly that $X \cdot w(C) \geq 1$ or $X \cdot w(C) \geq l(C)$ for all elementary cycles can be exponential, even if in practice it can be fast when the number of cycles ....
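The excerpt above says the proofs rely on the Bellman-Ford algorithm. As a hedged sketch of how such constants could be computed (an illustration, not the authors' procedure), assume the edge condition reads X . w(e) + rho[y] - rho[x] >= 0 for every edge e = (x, y). This is a system of difference constraints rho[x] <= rho[y] + X . w(e), which Bellman-Ford relaxation from an implicit zero-distance source either solves or proves infeasible. The function name find_shifts and the tiny example graph are hypothetical, and the additional acyclicity requirement on null-delay edges is not checked here.

```python
def find_shifts(vertices, edges, X):
    """Return constants rho with X.w(e) + rho[y] - rho[x] >= 0 for all edges, or None.

    `edges` is a list of (x, y, w) where w is the dependence vector of e = (x, y).
    None is returned when some cycle C has X.w(C) < 0, so no constants exist.
    """
    # Scalar cost of each edge along the direction X.
    cost = [(x, y, sum(a * b for a, b in zip(X, w))) for x, y, w in edges]
    rho = {v: 0 for v in vertices}  # implicit source at distance 0 from every vertex
    for _ in range(len(vertices)):
        changed = False
        for x, y, c in cost:
            if rho[y] + c < rho[x]:  # constraint rho[x] <= rho[y] + c is violated
                rho[x] = rho[y] + c
                changed = True
        if not changed:
            return rho
    return None  # still relaxing after |V| passes: a negative cycle exists


# Hypothetical two-vertex graph with a single cycle of weight (0, 1).
V = ["a", "b"]
E = [("a", "b", (1, 0)), ("b", "a", (-1, 1))]
```

On this example, X = (1, 0) gives the cycle a total of 0 and the constants {"a": 0, "b": -1}, while X = (1, -1) gives the cycle a total of -1 and the function returns None. This mirrors the cycle formulation in the excerpt: the constants exist exactly when no cycle has a negative value along X, and the check runs in polynomial time without enumerating elementary cycles.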
....loops detection in polyhedral reduced dependence graphs (PRDG), we conjecture it is also true for maximal permutable loops detection. space, we only state the main results that are needed to understand the technique. All detailed proofs are available in the extended version of this paper [3]. Condition 1 is a necessary condition, expressed in terms of edges. It can be reformulated as a necessary condition on cycles: Lemma 3 (Condition on cycles) Let $M$ be a matrix. $M$ satisfies Condition 1 for some vectors $\rho_v$, $v \in V$, if and only if $M\,w(C) \geq 0$ for each cycle $C$ of $G$. We now show ....
[Article contains additional citation context not shown here]
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, LIP, ENS-Lyon, France, November 1996.
....shifted linear schedule (see Section 4.1). It is optimal for maximal parallelism detection if dependences are approximated by dependence polyhedra. Since it is simpler than Feautrier's algorithm, more optimizing criteria can be handled: the detection of permutable loops and outer parallelism (see [14]) and the minimization of synchronizations through loop fusion (see Section 5.2). Furthermore, the code generation is simpler (see Section 4). However, it may find less parallelism than Feautrier's algorithm when exact dependence analysis is feasible because of its restricted choice of ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
.... not explicitly described in these terms in [20]. The notion of timing vectors is at the heart of the hyperplane method and its variants [17, 7], which are particularly interesting for exposing fine grain parallelism, whereas the notion of fully permutable loops is the base of all tiling techniques [15, 18, 4, 20, 8], which are used for exposing coarse grain parallelism. As said before, both formulations are equivalent when reasoning on Gamma . Example 4 DO i=1,n DO j=1,n DO k=1,n a(i,j,k) a(i 1,j i,k) a(i,j,k 1) a(i,j 1,k 1) CONTINUE 1 0 0 1 1 0 0 1 Fig. 3: Example 4 and its Reduced ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, LIP, ENS-Lyon, France, November 1996.
....Normale Supérieure de Lyon and is partly supported by DRET DGA under contract ERE 96 1104 A000 DRET DS SR. 1 Introduction Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique applies to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units (we need the loop nest to be permutable so that such a transformation is valid). The larger the tiles, the more efficient the computations are performed using state of the art processors ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
....Normale Supérieure de Lyon and is partly supported by DRET DGA under contract ERE 96 1104 A000 DRET DS SR. 1 Introduction Tiling is a widely used technique to increase the granularity of computations and the locality of data references. This technique applies to sets of fully permutable loops [22, 14, 10]. The basic idea is to group elemental computation points into tiles that will be viewed as computational units (the loop nest must be permutable so that such a transformation is valid). The larger the tiles, the more efficient are the computations performed using state of the art processors with ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 1997. Special issue, to appear. Also available as Tech. Rep. LIP, ENS-Lyon, RR96-34.
....Step (1) and Step (3). In Step (3) we do not specify, on purpose, how the vector X and the constants ρ are selected, so as to allow various selection criteria. For example, a maximal set of linearly independent vectors X can be selected if the goal is to derive fully permutable loops (see [13] for details). Back to Example 6.3: Consider the uniform dependence graph of Figure 6.6. There are two elementary cycles of weights (1, 0, 1) and (0, 1, 1) and five self loops of weights (0, 0, 1), (0, 0, -1), (0, 1, 0) (twice) and (0, -1, 0). Therefore, all edges (except the edges that ....
....to obtain its linear programs, which Darte-Vivien avoids thanks to its uniformization scheme. Therefore, Feautrier's linear programs are more complex. Both algorithms were extended from fine grain to medium grain parallelism detection through a search for fully permutable loops. Darte et al. [13] proposed an extension of Darte-Vivien which is a mere generalization of Wolf-Lam. Lim and Lam [27] proposed an extension of Feautrier which finds maximal sets of fully permutable loops while minimizing the amount of synchronizations required in the parallelized code. Darte-Vivien produces ....
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, LIP, ENS-Lyon, France, November 1996.
No context found.
Alain Darte, Georges-Andre Silber, and Frederic Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379-392, 1997.
No context found.
A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379--392, 1997.