| A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994. |
....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions 2.1. Model of the Algorithms We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g., see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR $i_1 = l_1$ TO $u_1$ DO ... FOR $i_n = l_n$ TO $u_n$ DO $AS_1(i)$ ... $AS_k(i)$ ENDFOR ... Figure 1. The algorithm model. where $l_i$ and $u_i$ are integer-valued constants (boundary values of the $i$-th inner loop); the instance vector is denoted as i = ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
.... in literature to find the optimal hyperplane; some of them are based on the solution of diophantine equations [8, 10] and others on the use of integer programming [13, 3] or even linear programming in subspaces [13]. When index spaces with uniform dependence vectors are concerned, a polynomial complexity scheduling algorithm is presented in [7]. Once optimal parallel execution is found, an efficient method of mapping the concurrent groups of computations (hyperplanes) into the parallel architecture should be applied. A systematic .... [Footnote 1: For further study of this method, see [2] and [13].]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.
....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [3, 11, 12, 15, 24] could still be applied within data parallel M tasks, pure task scheduling techniques [17, 18, 19, 25, 26] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP-complete even for the ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel and Distributed Systems, 5(8):814--822, 1994.
....case of direction vectors. Here, the respective dependence vectors are (0; 1) −) and ( −). In the second dimension, the 1 and the − prevent the detection of two levels of fully permutable loops. Therefore, the code remains unchanged. No parallelism is detected. Darte and Robert [DR94, DR95]. Darte and Robert look for an affine schedule for each statement that satisfies all dependences. Exact dependence analysis is needed, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. This technique leads to the valid schedule T (i; j) ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....transformation, which organizes indexed computations into well defined distinct groups, called hyperplanes. As the hyperplane method was proved to be nearly optimal by Darte [3], many researchers have focused on it. As a result, it was extended by several researchers such as Shang [12] and Darte [4], by allowing more computations to be executed in parallel. On the other hand, general scheduling approaches were never used in loop parallelization. General multiprocessor scheduling with precedence .... [Footnote: This work was partially supported by the General Secretariat of Research and Development.]
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
....It is sufficient to consider the vertices $W_i$ of the polytopes $I_i$, since the linear functions i (i) have their extreme values only at the vertices of a polytope. The vertex method allows different index spaces $I_i$ to be considered for the equations of the SURE, in contrast to the linear program proposed in [19, 2], which is based on the application of the duality theorem. The objective function is the product of the latency $L = t_{\max} - t_{\min}$ and the chip area of one processor $C_p = \sum_{j=0}^{|M|} n_j c_j$. This product is appropriate if the available chip area $C_c$ is significantly less than the necessary chip ....
A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994
[Footnote: ....in part by the NSF CAREER grant MIP 9501006, and by the William D. Mensch, Jr. Fellowship.] ....have considered the optimization of nested loops, a software point of view of the MD systems [1, 7, 12]. In the area of high level synthesis, researchers also have focused on the optimization of MD problems [2, 6]. In a previous study, it has been shown that full parallelism can be obtained by the application of MD retiming techniques [10]. In general, these methods transform the loops in such a way to obtain a new sequence of execution characterized by a higher parallelism. This sequence of execution is ....
....constraint that cannot be achieved by the straightforward implementation of the loop. In this case, optimization techniques are used to improve the parallelism among the operations in order to satisfy the time constraint. Some of the existing optimization methods point directly to MD problems [2, 6, 10, 12]. Most of these methods derive from the work by Lamport [5] and require a new scheduling direction when executing the loop iterations. Such a change implies complex formulations of loop bounds and indices when writing the transformed, optimized code. A possible method of obtaining ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....j − 1) node D: z(i; j) t(i; j) w(i; j) node A: y(i; j) z(i; j) x(i; j) There exists a serial execution of two additions and one multiplication. Using optimization techniques, we can transform the loop such that all its operations can be executed simultaneously within one iteration [3, 7, 11, 15]. This transformation requires a change in the execution sequence, which is usually computed by the optimization technique. For example, the index shift method [7] and the chained multi dimensional retiming [11] would require a new execution direction (1; 1). Under this new schedule, the loop body ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....two scheduling techniques, we shall use the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 13, 14, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [12, 15, 17, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP-complete ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
.... of a linear program: maximize h T i h T j subject to Hi h 0 Hj h 0 Application of the duality theorem [10] leads to: minimize h T 0 (y 1 y 2 ) subject to y T 1 H = h y T 2 H = h (18) which is again an approximation since the duality theorem is valid only for linear programs [1]. In the following we assume that both the number K of processors of the full size array forming one processor of the partitioned processor array, and the iteration interval are given. For techniques treating and K as variable in similar linear programs we refer to e.g. [4]. Linearization ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, 1994.
....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions 2.1. Model of the Algorithms We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g., see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR $i_1 = l_1$ TO $u_1$ DO ... FOR $i_n = l_n$ TO $u_n$ DO $AS_1(i)$ ... $AS_k(i)$ ENDFOR ... ENDFOR Figure 1. The algorithm model. where $l_i$ and $u_i$ are integer-valued constants (boundary values of the $i$-th inner loop); the instance vector is ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
.... array by consideration of the processor functionality [3]. An approach to compute a variety of linear allocation and scheduling functions is proposed in [11]. Some notes on the determination of unconstrained minimal scheduling functions for algorithms with uniform dependencies can be found in [1]. Resource-constrained scheduling for a given processor functionality is presented in [13]. An approach to minimize the throughput by consideration of the chip area is proposed in [9]. In [2], the approach of [13] is extended to additionally determine the processor functionality in order to minimize a ....
A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994
....tools are required to exploit the degrees of freedom during the design process for deriving optimal parallel implementations of algorithms on reconfigurable architectures. In this paper we present an approach to map regular algorithms onto FPGAs using methods of the design of processor arrays [1, 5, 6, 7, 10, 11]. The design is restricted by the limited number of Configurable Logic Blocks (CLBs) available in the FPGA. [Footnote: e mail: mmel,merker iee1.et.tu dresden.de The research was supported by the Deutsche Forschungsgemeinschaft, in the project A1 SFB358.] As objective we consider the minimal latency of ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.
....covers the design of a cost minimal interconnection network and the organization of the data transfers using these interconnections. A solution of the problem of organizing the data transfers for a given interconnection network is also presented. The design of processor arrays is well studied (e.g. [2,7,8,10,13]) and became more realistic by inclusion of resource constraints [3,5,12]. But up to now, only some work has been done in the organization of data transfer. Fortes and Moldovan [6] as well as Lee and Kedem [9] discuss the need of a decomposition of global interconnections into a set of local ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.
....[Figure 4: Dataflow graph and dependence vectors. S1 to S4: d1 = (0,1); S1 to S3: d2 = (0,2); S2 to S1: d3 = (1,5); S4 to S1: d4 = (1, 4); S3 to S2: d5 = (1, 6).] 6 Experiments In this section we present the application of our method to a well known example taken from [6]. The loop body is: statement S1 a(i, j) b(i, j 6) d(i 1,j 3) statement S2 b(i 1,j 1) c(i 2, j 5) statement S3 c(i 3,j 1) a(i, j 2) statement S4 d(i, j 1) a(i, j 1) In the example, the variable a(i; j) is produced by statement S 1 (i; j) and consumed by statement S 4 (i; j 1) ....
....[Figure 5: Performance of carrot hole data scheduling compared to FIFO and LRU. Axis: partition width; series: LRU, FIFO, Carrot hole; optimal point marked.]
| Examples | On chip memory | Partition size | LRU | FIFO | Carrot hole |
| Nested loop 1 [6] | 500 | 33 | 1053708 | 1055163 | 109535 |
| Nested loop 1 [6] | 1000 | 66 | 1002325 | 1004427 | 64760 |
| Nested loop 2 [6] | 500 | 71 | 1859952 | 3324436 | 52910 |
| Nested loop 2 [6] | 1000 | 142 | 1701443 | 1777341 | 31952 |
| WDF [13] | 500 | 250 | 864110 | 845355 | 6404 |
| WDF [13] | 1000 | 500 | 1470274 | 1433933 | 4002 |
| IIR filter [13] | 500 | 249 | 1983047 | 5900471 | .... |
[Article contains additional citation context not shown here]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814--822, August 1994.
....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 14, 15, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [13, 21, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP-complete even for ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
....(hyperplanes) into the parallel architecture should be applied. A systematic methodology for mapping into fixed size systolic arrays was presented in [8]. Since the target architecture is synchronously operating, there is no need for communication efficient mapping and the main criterion for optimality is now the total number of processors. Other methods dealt with the same problem of mapping, while reducing not only the size but also the resulting dimension of the systolic array (see [4, 6, 11]). Researchers are trying to .... [Footnote 1: For further study of this method, see [2] and [13].]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.
....program transformation environment MmAlpha. A program like that of Prog.2 is first uniformized [12, 16] to remove the data broadcasts and non local communications. It is then scheduled, i.e. each computation of the program is assigned an affine time function consistent with the data dependencies [3]. Finally an affine change of basis is performed on the index space of each variable, so that one of the indices represents the time at which this variable is computed, and the other indices specify the processor on which the computation is performed, in some processor array whose shape is given by ....
....product (we get the very classical systolic array depicted by Fig.3a) to synthesize bit serial operators as we did in the previous section (a bit serial adder consists of one full adder and two flip-flops) then combine them to get a bit level circuit as shown by Fig.3b. [Figure 3. Bit serial systolic array for the matrix vector product. a: The word level array. b: Its bit serial cell.] We won't elaborate on this approach: it needs ....
[Article contains additional citation context not shown here]
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5:814-822, 1994.
....wavefronts as shown in Figure 8(b) to provide greater parallelism. This involves rotating wavefronts such that the number of independent tiles in the largest wavefront is exactly equal to the number of processors. This rotation corresponds to the selection of a different scheduling vector [4]. The scheduling vector is (1; 1) for the original wavefronts in Figure 8(a). The scheduling vector for the modified wavefronts in Figure 8(b) is given by $(\lfloor N_T / (B \cdot P) \rfloor; 1)$, where $N_T$ is the number of iterations (with skewing), $B$ is the tile size, and $P$ is the number of processors. ....
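The effect of choosing a different scheduling vector on the wavefronts can be sketched as follows. This is a minimal illustration rather than the tiling scheme of the citing paper: the helper name `wavefronts` and the 3 x 3 index space are assumptions made for the example. Each point (i, j) is assigned the time step given by its dot product with the scheduling vector, and points sharing a time step form one wavefront.

```python
from collections import defaultdict


def wavefronts(n_rows, n_cols, pi=(1, 1)):
    """Group points (i, j) of an n_rows x n_cols index space by time step.

    The time step of a point is the dot product pi . (i, j); all points
    with the same time step belong to the same wavefront.
    """
    fronts = defaultdict(list)
    for i in range(n_rows):
        for j in range(n_cols):
            fronts[pi[0] * i + pi[1] * j].append((i, j))
    # Return the wavefronts in increasing order of time step.
    return [fronts[t] for t in sorted(fronts)]


if __name__ == "__main__":
    for pi in [(1, 1), (2, 1)]:
        fronts = wavefronts(3, 3, pi)
        widest = max(len(front) for front in fronts)
        print(pi, "steps:", len(fronts), "widest wavefront:", widest)
```

Under these assumptions, a steeper scheduling vector yields more time steps but narrower wavefronts, which is the trade-off the excerpt exploits to match the widest wavefront to the number of processors.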
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.
....The basic model represents only perfectly nested loops. This severe restriction can be relaxed by applying the basic method to every single statement separately rather than to the body as a whole. Thus, every statement has its own index space, index vector, space time mapping and target space [5, 9, 15]. An operation in the program is identified by a statement together with its index vector. Of course, the feature of statementwise space time mapping complicates the generation of target code significantly (cf. Section 3.7). 2.3 Extension to loop nests containing while loops One of the main ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.
....the types of schedule and processor allocation. For example, a system of affine recurrence equations, which is more general than UREs, can sometimes be converted into a system of quasi uniform recurrence equations [41]. Some mapping techniques use quasi linear schedules [44] or affine schedules [7], [8], [10], [34]. Similar scheduling techniques have been applied to the compilation of nested loops on programmable parallel machines. In addition, researchers have developed partitioning techniques, or, more generally, techniques of mapping an n-dimensional DG to a k-dimensional array so that k is ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, pp. 814-822, August 1994.
....between two data access functions that address the same array and reciprocally, all the differences between two data access functions that address the same array are dependence vectors. 2.2 Scheduling Darte and Robert have presented techniques to compute schedules for a given uniform loop nest [2, 4]. These techniques are part of the theoretical basis of Bouclettes. Currently, the user has the choice between the linear schedule and the shifted linear schedule. A schedule is a function that associates to each computation point (each iteration of a statement) the time when it is computed in ....
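The difference between a linear and a shifted linear schedule can be illustrated with a small sketch. It assumes that a shifted linear schedule gives every statement the same linear part π and adds a per-statement constant, so that statement S at iteration i runs at time π·i + c_S; the function names, the two statements, and the dependence edges below are hypothetical and are not taken from Bouclettes.

```python
def schedule_time(pi, shift, point):
    """Time step of one statement instance: pi . point + shift."""
    return sum(p * x for p, x in zip(pi, point)) + shift


def respects_dependences(pi, shifts, edges):
    """Check that every dependence is delayed by at least one time step.

    Each edge is (source statement, sink statement, distance vector):
    the sink instance at iteration i + distance needs the value that the
    source instance produced at iteration i.
    """
    return all(
        sum(p * d for p, d in zip(pi, dist)) + shifts[dst] - shifts[src] >= 1
        for src, dst, dist in edges
    )


if __name__ == "__main__":
    # Hypothetical loop body: S2 uses S1 in the same iteration, and S1 uses
    # the value S2 produced one iteration earlier along the first index.
    edges = [("S1", "S2", (0, 0)), ("S2", "S1", (1, 0))]
    pi = (2, 0)
    print(respects_dependences(pi, {"S1": 0, "S2": 0}, edges))  # plain linear
    print(respects_dependences(pi, {"S1": 0, "S2": 1}, edges))  # shifted linear
```

With zero shifts the two statements of one iteration would be scheduled at the same time step, so the same-iteration dependence is violated; shifting S2 by one step repairs it without changing the linear part.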
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....between two data access functions that address the same array; and reciprocally, all the differences between two data access functions that address the same array are dependence vectors. 2.2 Scheduling Darte and Robert have presented techniques to compute schedules for a given uniform loop nest [3, 5]. These techniques are part of the theoretical basis of Bouclettes. Currently, the user has the choice between linear scheduling and shifted linear scheduling. The linear schedule is a linear function that associates a time t to an iteration point i (i = (i; j; k) if the loop nest ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
.... linear schedules for a single URE over a polyhedral domain was also close to being globally optimal [DKR91]. Darte and Robert formulated linear programming problems to determine the optimal variable dependent one dimensional affine schedule for a SURE defined over parameterized polyhedral domains [DR94a], and also for SAREs. Feautrier [Fea92b] and Darte and Robert [DR95] extended these ideas to one dimensional affine schedules for a single ARE and variable dependent affine schedules for SAREs over parameterized families of domains (piecewise affine schedules for Feautrier). Both papers are ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, Aug 1994.
....processing element is not sufficient to execute the operations assigned to them concurrently. As a result, a new research direction on extending array mapping methodologies can be recognized in the fusion of mapping techniques for regular processor arrays and high level synthesis tasks, see, e.g. [20, 4, 2, 18, 6, 5]. Here is a short overview of recent work in this direction: Except for special cases, resource constrained scheduling problems are known to be NP-complete, see e.g. [3, 7]. Hence, numerous heuristic methods have been proposed in order to find a compromise between the runtime of the scheduling ....
....$_2 A_K = -K$, $y_2 \ge 0$, $y_3 A_J = J$, $y_3 \ge 0$, $y_4 A_J = -J$, $y_4 \ge 0$ (25) Clearly, the duality theorem holds for linear programs only. Consequently, the total evaluation time is approximated only. For bounding the error which is introduced in the case of additional integral constraints, see e.g. [4, 5] and the references therein. 3.2. Dependence constraints Let us suppose that an operation j directly depends on an operation i via the distance $d_{ij}$. In this case there is an edge $(v_i, v_j) \in E_D$ in the reduced dependence graph with distance vector $d_{ij}$. The result of operation j is available ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 8:814--822, August 1994.
....the branch decisions along the final schedule. Some of the previous research in software pipelining scientific applications was only applicable to problems which do not include conditional statements [8, 15, 18, 19, 20, 22, 24, 34]. Also, in techniques such as the affine by statement technique [6] and the index shift method [17], resource constrained designs are not considered. In the direction of multi dimensional retiming, [19, 21] do not take into account resource constraints. Push up scheduling [22] presented an optimal resource [Figure 1: a) Floyd Steinberg ....]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, no. 8, pp. 814-822, August, 1994.
....of the transformation may fuse loops in the presence of fusion preventing dependencies. However, when the number of peeled iterations exceeds the number of iterations per processor, this method is not efficient. Affine scheduling methods focus on improving parallelism in the entire program body [9, 10, 16, 29]. Loop fusion becomes a side effect of the application of the affine scheduling method to the code that presents consecutive loop segments under this consideration. Examples of such methods are the affine by statement and affine scheduling techniques described by Darte and Robert [9] and Feautrier ....
....[9, 10, 16, 29]. Loop fusion becomes a side effect of the application of the affine scheduling method to the code that presents consecutive loop segments under this consideration. Examples of such methods are the affine by statement and affine scheduling techniques described by Darte and Robert [9] and Feautrier [10], which offer a general solution to the parallel optimization problem. These solutions usually depend on linear programming techniques to achieve the optimized results and are directly applicable to hardware design. The solution presented in this paper utilizes the ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, August 1994.
....variable dependent) affine schedules for a SARE. He further extended the method to multidimensional schedules [16]. As for the optimality of schedules, basic results (for SUREs) were of course obtained by Karp et al. [3] and later by Rao, Shang and Fortes, Darte et al., and Darte and Robert [7, 17, 18, 19], among others. For SAREs, the problem was addressed by Feautrier, Darte and Robert, and Darte and Vivien [15, 16, 20, 21]. Finally, some theoretical results about the undecidability of scheduling are also available. Joinnault [22] showed that scheduling a SURE whose variables are defined over arbitrary ....
A. Darte and Y. Robert, "Constructive methods for scheduling uniform loop nests," IEEE Transactions on Parallel and Distributed Systems, vol. 5, no. 8, pp. 814--822, Aug 1994.
....between two data access functions that address the same array; and reciprocally, all the differences between two data access functions that address the same array are dependence vectors. 2.3 Scheduling Darte and Robert have presented techniques to compute schedules for a given uniform loop nest [4, 6]. These techniques are part of the theoretical basis of Bouclettes. Currently, the user has the choice between linear scheduling and shifted linear scheduling. The linear schedule is a linear function that associates a time t to an iteration point i (i = (i; j; k) if the loop nest is three ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....associated with a schedule vector, also called an ordering vector, that affects the order in which the iterations are performed. In the area of high level synthesis, researchers have focused on the optimization of multi dimensional problems through the selection of an appropriate schedule vector [16, 22, 24, 25]. The new schedule vector usually differs from the one used in the original design, introducing new memory requirements that may end up in complex storage control and a substantial increase in memory size. In a previous study, it has been shown that full parallelism can be obtained by the ....
.... the selection of a new schedule or ordering vector, as in the unimodular transformations [2, 30], loop skewing [29], and other traditional wavefront methods [1, 14]. The last column, aff, presents results that could be obtained by modifying affine by statement methods developed for systolic arrays [16, 24, 25], and other methods focused on fine grain parallelism that also depend on the selection of a new schedule, such as the schedule based multi dimensional retiming [21]. Examining Table 1, the chained MD retiming is the method that can get closer results to the [Table 1 header: Edge e, d(e), de(e), dm(e), dr(e), orig ....]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, no. 8, 1994, pp. 814-822.
....of mapping physically the optimized application of the target machine. Compared to our approach, there are no real-time and architectural constraints (number of processors and memory resources) to take into account during the parallelization phase. Similar techniques are used in the systolic array [16, 17, 14] and parallelization [23, 22, 26] communities to compute affine schedules. In the systolic community, these techniques are applied to a single loop nest with complex internal dependencies. The other approaches, dealing with complete applications, do not have the same architectural and application ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814, August 1994.
....between neighboring processing elements. The next steps, named scheduling and allocation, constitute the parallelization stages. They govern intrinsically what kind of architectures we are finally going to deal with. Scheduling of uniform recurrence equations has been widely studied (see [4] for example); it gives an execution date for all the computations. An automatic schedule procedure is provided in mmalpha, a global clock is assumed and the execution date is given by a clock counter. The designer can specify which signal he wants to store (in register) in one iteration ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel Distributed Systems, 5(8):814--822, 1994.
....communication by transforming the dependencies of the original graph. Much of previous results on retiming focus on one dimensional scheduling problems. Multi dimensional retiming research has been focused around the improvement of parallelism inherent in multi dimensional applications [3, 7, 8], not on the improvement of the resulting communication volume. Previous research on communication minimization, on the other hand, has concentrated on decreasing the communication by changing the partition, not by modifying the delays themselves [1, 5]. 2. BASIC CONCEPT We will use Figure 1 to ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....between two data access functions that address the same array and reciprocally, all the differences between two data access functions that address the same array are dependence vectors. 2.2. Scheduling Darte and Robert have presented techniques to compute schedules for a given uniform loop nest [5, 7]. These techniques are part of the theoretical basis of Bouclettes. Currently, the user has the choice between the linear schedule and the shifted linear schedule. A schedule is a function that associates to each computation point (each iteration of a statement) the time when it is computed in ....
A. Darte andY. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....constrained by the number of delays (registers) in a cyclic data path. Recent studies have considered the optimization of nested loops, a software point of view of the MD problems [1, 8, 12, 13] In the area of high level synthesis, researchers also have focused on the optimization of MD problems [3, 6]. In general, these methods transform the loops in such a way to obtain a new sequence of execution characterized by a higher parallelism. This sequence of execution is commonly associated with a schedule vector. The new schedule vector usually differs from the one used in the original design, ....
....[11] wavefront shows the requirements imposed by methods based solely on the selection of a new schedule vector, as in the unimodular transformations [13] The row affine by st. presents results that could be obtained by modifying affine bystatement methods developed for systolic arrays [3, 6], and finally, methods focused on fine grain parallelism, such as the schedule based multi dimensional retiming [10] and the reindexing technique [12] are in row fine grain. We notice that when the fully parallel solution was achieved, the number of queues required by the final design and the ....
A. Darte and Y. Robert, " Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....the total number of on chip memory misses. Therefore, our scheme can obtain least total number of on chip memory misses compared to other linear scheduling and partitioning schemes. 2 6 Experiments In this section we present the application of our method to a well known example taken from [6]. The loop body is: statement S1 a(i, j) b(i, j 6) d(i 1,j 3) statement S2 b(i 1,j 1) c(i 2, j 5) statement S3 c(i 3,j 1) a(i, j 2) statement S4 d(i, j 1) a(i, j 1) Our carrot hole scheduling method is applied to this example. In the example, the variable a(i; j) is produced by ....
....Figure 15: Performance of different methods. [Plot residue: axes "On chip memory size" and "Ratio of performance"; series FIFO and Carrot hole.] Figure 16: Ratio of on chip memory misses.

Examples            On chip memory  Partition size  LRU      FIFO     Carrot hole
Nested loop 1 [6]   500             33              1053708  1055163  109535
Nested loop 1 [6]   1000            66              1002325  1004427  64760
Nested loop 2 [6]   500             71              1859952  3324436  52910
Nested loop 2 [6]   1000            142             1701443  1777341  31952
WDF [14]            500             250             864110   845355   6404
WDF [14]            1000            500             1470274  1433933  4002
IIR filter [14]     500             249             1983047  5900471  12020
....
[Article contains additional citation context not shown here]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814--822, August 1994.
....transformation, which organizes indexed computations into well defined distinct groups called hyperplanes. Mathematical conditions which guarantee validity for a linear schedule were well defined and presented, and algorithms which find the optimal linear schedule were extensively elaborated [4], [10], [14]. Linear schedules in loop parallelization were used in almost every attempt to map a nested loop into systolic architectures. Linear transformations affect all indexed computations uniformly, thus producing time schedules which fit into regularly interconnected parallel architectures. ....
....combinations of other simpler dependence vectors. 4. THE CHAIN GROUPING METHOD Given a finite index space J^n and the dependence set D, the first step is to find the hyperplane vector ## which determines the optimal execution time. Once ## is found by applying one of the methods presented in [4] or in [14], we should partition the index space into disjoint groups of related computations. An efficient grouping method should reduce the amount of intergroup communication while preserving the execution ordering imposed by the time scheduling step. As far as the grouping problem is ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
....loop skewing [18] and loop quantization [1]. These techniques do not change the structure of the iterations, and therefore may not achieve a fully parallel solution. More recent research has studied the scheduling of multi dimensional applications. For example, the affine by statement technique [4] and the index shift method [13] are able to achieve a fully parallel execution of multi dimensional tasks, utilizing algorithms based on linear programming techniques. However, these methods do not consider possible memory changes and consequently they may introduce new queues dependent on the ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....for each variable, and the system takes care of the rest. Our long term goal is to perform the analyses and choose the transformations automatically, using the now mature research on systolic synthesis and its generalization. Scheduling: There has been much research on the SARE scheduling problem [9, 17, 8, 18, 4], formulated as follows. For each variable V in a SARE, determine a function t V (z) that gives us the time instant, represented by a k dimensional time vector, at which V [z] can be computed. For example, t f (i; j) i j and t X (i) 2i are valid (one dimensional) affine schedules for the ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, Aug 1994.
....problems [1, 9, 14, 24]. This study focuses on the parallelism inherent to multi dimensional applications, ignored by the one dimensional methods [12]. Some recent research has been conducted in the scheduling of multi dimensional applications, such as the affine by statement technique [3] and the index shift method [17]. However, these methods do not consider resource constrained designs. Other methods focus on multi processor scheduling and are not applicable to the target problem [10, 11, 16, 18, 19]. In a previous study, we extended the concept of multi dimensional retiming, ....
....method [4], while OPTIMUS shows the data for the push up scheduling method, and the row rotation shows the requirements imposed by the rotation scheduling method. The row affine by st. presents results that could be obtained by modifying affine by statement methods developed for systolic arrays [3, 17, 25], and finally, methods focused on fine grain parallelism that also depend on the selection of a new schedule, such as the reindexing technique [26], are presented in row fine grain. We notice that when the shortest schedule length was achieved, the number of additional queues and/or inter iteration ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, no. 8, pp. 814-822, August 1994.
....The basic model represents only perfectly nested loops. This severe restriction can be relaxed by applying the basic method to every single statement separately rather than to the body as a whole. Thus, every statement has its own index space, index vector, space time mapping and target space [DR94, Fea92a, KPR94, Rao85]. An operation in the program is identified by a statement together with its index vector. Of course, the feature of statement wise space time mapping complicates the generation of target code significantly (cf. Section 3.8) 3 The Structure of LooPo Like most compilers, LooPo consists of a ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8), August 1994.
....goal is to perform all the analyses and choose the transformations automatically. Much of the scheduling research has matured, and other problems are being addressed by a number of researchers. Scheduling: Over the past decade there has been considerable research on the SARE scheduling problem [17, 32, 30, 13, 14, 36, 9], formulated as follows. For each variable V in a SARE, determine a function t V (z) of the index point z, that gives us the time instant when V at z can be computed. In general, the form of the schedule is a multidimensional affine function. For our example, it turns out that t f (i; j) i ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, Aug 1994.
....and by the William D. Mensch, Jr. Fellowship. Figure 1: (a) special case of MDFG where row wise computation is an optimization problem; (b) circuit design, notice that $z_1^{-1}$ is equivalent to one register. ming methods [5, 7]. However, the parallelization of operations that require data produced on the same iteration depends on the use of multiple processors. In this paper, we utilize the concept of MD retiming presented in [1, 6] to model the placement of registers along the circuit data paths, while considering the ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....case of direction vectors. Here, the respective dependence vectors are (0; 1) Gamma) and ( Gamma) In the second dimension, the 1 and the - prevent the detection of two levels of fully permutable loops. Therefore, the code remains unchanged. No parallelism is detected. Darte and Robert [DR94, DR95]: Darte and Robert look for an affine schedule for each statement that satisfies all dependences. Exact dependence analysis is needed, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. This technique leads to the valid schedule T (i; j) ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....to a set of perfectly nested loops whose dependences are all uniform (a perfect uniform loop nest). Such a loop nest with d loops always contains d - 1 degrees of parallelism. The parallelized code contains one outer sequential loop and d - 1 inner parallel loops. Darte and Robert [11, 12]: Darte and Robert look for an affine schedule for each statement in the loop nest. All dependences need to be uniform, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. The schedule is selected among all possible affine schedules. This ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....vector, based upon a partition of the solution space into subcones on which they solve a linear fractional problem. Lisper [24] proposes an approach in which all pairs of extremal points of the iteration space are enumerated before solving a linear program for each of them. Darte and Robert [8, 10, 5] propose a more efficient method which consists of solving only one single linear program: $XD \ge 1$, $X_1 A = X$, $X_2 A = -X$, $X_1 \ge 0$, $X_2 \ge 0$, $\min\,(X_1 + X_2)\,b$. We refer the reader to [8, 10, 5] for a detailed proof that the solution of this linear program is indeed the optimal ....
....before solving a linear program for each of them. Darte and Robert [8, 10, 5] propose a more efficient method which consists of solving only one single linear program: $XD \ge 1$, $X_1 A = X$, $X_2 A = -X$, $X_1 \ge 0$, $X_2 \ge 0$, $\min\,(X_1 + X_2)\,b$. We refer the reader to [8, 10, 5] for a detailed proof that the solution of this linear program is indeed the optimal scheduling vector. The first constraint is Lamport's condition. The fact that the solution $X$ can be expressed both as $X = X_1 A$ and $X = -X_2 A$ with $X_1, X_2$ nonnegative is surprising at first but comes from ....
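The role of Lamport's condition in the context above can be illustrated with a small sketch. This is not the linear program of Darte and Robert: it assumes a rectangular iteration space, restricts the schedule vector to small integer entries, and replaces the LP with a brute-force search. The function names (`is_valid_schedule`, `latency`, `best_schedule`) and the `bound` parameter are hypothetical, chosen only for this illustration.

```python
from itertools import product


def is_valid_schedule(x, deps):
    """Lamport's condition: x . d >= 1 for every dependence vector d."""
    return all(sum(xi * di for xi, di in zip(x, d)) >= 1 for d in deps)


def latency(x, lower, upper):
    """Largest value of x . (p - q) over points p, q of a rectangular box.

    For a box with bounds lower[i] <= p[i] <= upper[i], the maximum is
    reached at opposite corners, giving sum(|x_i| * (upper_i - lower_i)).
    """
    return sum(abs(xi) * (u - l) for xi, l, u in zip(x, lower, upper))


def best_schedule(deps, lower, upper, bound=3):
    """Brute-force search (assumption: integer entries in [-bound, bound]).

    Returns a valid schedule vector of minimal latency, or None if no
    candidate in the search range satisfies Lamport's condition.
    """
    n = len(lower)
    candidates = (
        x
        for x in product(range(-bound, bound + 1), repeat=n)
        if is_valid_schedule(x, deps)
    )
    # Ties on latency are broken by the vector itself, for determinism.
    return min(candidates, key=lambda x: (latency(x, lower, upper), x), default=None)


if __name__ == "__main__":
    deps = [(1, 0), (0, 1)]
    print(best_schedule(deps, (0, 0), (9, 9)))
```

For the dependence vectors (1, 0) and (0, 1) on a 10 by 10 box, the only candidates satisfying the condition have both entries at least 1, so the search settles on (1, 1). The exhaustive search grows exponentially with the loop depth, which is one reason a single linear program is attractive.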
[Article contains additional citation context not shown here]
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, August 1994.
No context found.
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
No context found.
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Syst., 5(8), Aug. 1994.
No context found.
A. Darte, Y. Robert, Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, Aug. 1994, vol.5, (no.8):814-22.