A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions 2.1. Model of the Algorithms We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g. see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR i_1 = l_1 TO u_1 DO ... FOR i_n = l_n TO u_n DO AS_1(i) ... AS_k(i) ENDFOR ... (Figure 1. The algorithm model.) where l_i and u_i are integer valued constants (boundary values of the i-th inner loop) instance vector is denoted as i = ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
.... in literature to find the optimal hyperplane; some of them are based on the solution of diophantine equations [8, 10] and others on the use of integer programming [13, 3] or even linear programming in subspaces [13]. When index spaces with uniform dependence vectors are concerned, a polynomial complexity scheduling algorithm is presented in [7]. Once optimal parallel execution is found, an efficient method of mapping the concurrent groups of computations (hyperplanes) into the parallel architecture should be applied. A systematic .... [Footnote 1: For further studying on this method, see [2] and [13].]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.
....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [3, 11, 12, 15, 24] could still be applied within data parallel M tasks, pure task scheduling techniques [17, 18, 19, 25, 26] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete even for the ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel and Distributed Systems, 5(8):814--822, 1994.
....case of direction vectors. Here, the respective dependence vectors are (0; 1) Gamma) and ( Gamma) In the second dimension, the 1 and the Gamma prevents to detect two levels of fully permutable loops. Therefore, the code remains unchanged. No parallelism is detected. Darte and Robert [DR94, DR95] Darte and Robert look for an affine schedule for each statement that satisfies all dependences. Exact dependence analysis is needed, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. This technique leads to the valid schedule T (i; j) ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
....transformation, which organizes indexed computations into well defined distinct groups, called hyperplanes. As the hyperplane method was proved to be nearly optimal by Darte [3], many researchers have focused on it. As a result, it was extended by several researchers such as Shang [12] and Darte [4], by allowing more computations to be executed in parallel. On the other hand, general scheduling approaches were never used in loop parallelization. General multiprocessor scheduling with precedence .... [Footnote: This work was partially supported by the General Secretariat of Research and Development.]
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
....It is sufficient to consider the vertices W_i of the polytopes I_i since the linear functions i (i) have their extreme values only at the vertices of a polytope. The vertex method allows to consider different index spaces I_i for the equations of the SURE, in contrast to the linear program proposed in [19, 2], which is based on the application of the duality theorem. The objective function is the product of the latency $L = t_{max} - t_{min}$ and the chip area of one processor $C_p = \sum_{j=0}^{|M|} n_j c_j$. This product is appropriate if the available chip area $C_c$ is significantly less than the necessary chip ....
A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994
....in part by the NSF CAREER grant MIP 9501006, and by the William D. Mensch, Jr. Fellowship. have considered the optimization of nested loops, a software point of view of the MD systems [1, 7, 12] In the area of high level synthesis, researchers also have focused on the optimization of MD problems [2, 6]. In a previous study, it has been shown that full parallelism can be obtained by the application of MD retiming techniques [10] In general, these methods transform the loops in such a way to obtain a new sequence of execution characterized by a higher parallelism. This sequence of execution is ....
....constraint that can not be achieved by the straightforward implementation of the loop. In this case, optimization techniques are used to improve the parallelism among the operations in order to satisfy the time constraint. Some of the existing optimization methods point directly to MD problems [2, 6, 10, 12]. Most of these methods derive from the work by Lamport [5] and require a new scheduling direction when executing the loop iterations. Such a change implies in complex formulations of loop bounds and indices when writing the transformed, optimized, code. A possible method of obtaining ....
A. Darte and Y. Robert, " Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....j Gamma 1) node D: z(i; j) t(i; j) w(i; j) node A: y(i; j) z(i; j) x(i; j) There exists a serial execution of two additions and one multiplication. Using optimization techniques, we can transform the loop such that all its operations can be executed simultaneously within one iteration [3, 7, 11, 15]. This transformation requires a change in the execution sequence, which is usually computed by the optimization technique. For example, the index shift method [7] and the chained multi dimensional retiming [11] would require a new execution direction (1; 1) Under this new schedule, the loop body ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.
....two scheduling techniques, we shall use the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 13, 14, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [12, 15, 17, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
.... of a linear program: maximize $h^T i - h^T j$ subject to $H i \le h_0$, $H j \le h_0$. Application of the duality theorem [10] leads to: minimize $h_0^T (y_1 + y_2)$ subject to $y_1^T H = h^T$, $y_2^T H = -h^T$ (18), which is again an approximation since the duality theorem is valid only for linear programs [1]. 7 In the following we assume that both the number K of processors of the full size array forming one processor of the partitioned processor array, and the iteration interval are given. For techniques treating and K as variable in similar linear programs we refer to e.g. [4]. Linearization ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, 1994.
....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions 2.1. Model of the Algorithms We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g. see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR i_1 = l_1 TO u_1 DO ... FOR i_n = l_n TO u_n DO AS_1(i) ... AS_k(i) ENDFOR ... ENDFOR (Figure 1. The algorithm model.) where l_i and u_i are integer valued constants (boundary values of the i-th inner loop) instance vector is ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.
....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [3, 11, 12, 15, 24] could still be applied within data parallel M tasks, pure task scheduling techniques [17, 18, 19, 25, 26] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete even for the ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel and Distributed Systems, 5(8):814--822, 1994.
.... array by consideration of the processor functionality [3] An approach to compute a variety of linear allocation and scheduling functions is proposed in [11] Some notes to the determination of unconstrained minimal scheduling functions for algorithms with uniform dependencies can be found in [1]. Resource constraint scheduling for a given processor functionality is presented in [13] An approach to minimize the throughput by consideration of the chip area is proposed in [9] In [2] the approach [13] is extended to determine additionally the processor functionality in order to minimize a ....
A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994
....tools are required to exploit the degrees of freedom during the design process for deriving optimal parallel implementations of algorithms on reconfigurable architectures. In this paper we present an approach to map regular algorithms onto FPGAs using methods of the design of processor arrays [1, 5, 6, 7, 10, 11]. The design is restricted by the limited number of Configurable Logic Blocks (CLBs) available in the FPGA. As objective we consider the minimal latency of .... [Footnote: e mail: mmel,merker iee1.et.tu dresden.de. The research was supported by the Deutsche Forschungsgemeinschaft, in the project A1 SFB358.]
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.
....covers the design of a cost minimal interconnection network and the organization of the data transfers using this interconnections. A solution of the problem of organizing the data transfers for a given interconnection network is also presented. The design of processor arrays is well studied (e.g. [2,7,8,10,13]) and became more realistic by inclusion of resource constraints [3,5,12] But up to now, only some work has been done in the organization of data transfer. Fortes and Moldovan [6] as well as Lee and Kedem [9] discuss the need of a decomposition of global interconnections into a set of local ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.
....s4 d1 d2 d3 d4 d5 S1 to S4: d1= 0,1) S1 to S3: d2= 0,2) S2 to S1: d3= 1,5) S4 to S1: d4= 1, 4) S3 to S2: d5= 1, 6) a) b) c) Figure 4: Dataflow graph and dependence vectors. 6 Experiments In this section we present the application of our method to a well known example taken from [6]. The loop body is: statement S1 a(i, j) b(i, j 6) d(i 1,j 3) statement S2 b(i 1,j 1) c(i 2, j 5) statement S3 c(i 3,j 1) a(i, j 2) statement S4 d(i, j 1) a(i, j 1) In the example, the variable a(i; j) is produced by statement S 1 (i; j) and consumed by statement S 4 (i; j 1) ....
....[Figure 5: Performance of carrot hole data scheduling compared to FIFO and LRU. Horizontal axis: partition width; curves: LRU, FIFO, Carrot hole; the optimal point is marked.]
Examples           On chip memory  Partition size  LRU      FIFO     Carrot hole
Nested loop 1 [6]  500             33              1053708  1055163  109535
Nested loop 1 [6]  1000            66              1002325  1004427  64760
Nested loop 2 [6]  500             71              1859952  3324436  52910
Nested loop 2 [6]  1000            142             1701443  1777341  31952
WDF [13]           500             250             864110   845355   6404
WDF [13]           1000            500             1470274  1433933  4002
IIR filter [13]    500             249             1983047  5900471  ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814--822, August 1994.
....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 14, 15, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [13, 21, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete even for ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
....(hyperplanes) into the parallel architecture should be applied. A systematic methodology for mapping into fixed size systolic arrays was presented in [8]. Since the target architecture is synchronously operating, there is no need for communication efficient mapping and the main criterion for optimality is now the total number of processors. Other methods dealt with the same problem of mapping, while reducing not only the size but also the resulting dimension of the systolic array (see [4, 6, 11]). Researchers are trying to .... [Footnote 1: For further studying on this method, see [2] and [13].]
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.
....program transformation environment MmAlpha. A program like that of Prog.2 is first uniformized [12, 16] to remove the data broadcasts and non local communications. It is then scheduled, i.e. each computation of the program is assigned an affine time function consistent with the data dependencies [3]. Finally an affine change of basis is performed on the index space of each variable, so that one of the indices represents the time at which this variable is computed, and the other indices specify the processor on which the computation is performed, in some processor array whose shape is given by ....
....product (we get the very classical systolic array depicted by Fig.3a) to synthesize bit serial operators as we did in the previous section (a bit serial adder consists of one full adder and two flip flops) then combine them to get a bit level circuit as shown by Fig.3b. [Figure 3. Bit serial systolic array for the matrix vector product. a: The word level array. b: Its bit serial cell.] We won't elaborate on this approach: it needs ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5:814-822, 1994.
....wavefronts as shown in Figure 8(b) to provide greater parallelism. This involves rotating wavefronts such that the number of independent tiles in the largest wavefront is exactly equal to the number of processors. This rotation corresponds to the selection of a different scheduling vector 1 [4]. The scheduling vector is (1, 1) for the original wavefronts in Figure 8(a). The scheduling vector for the modified wavefronts in Figure 8(b) is given by ($\lfloor N_T / (B \cdot P) \rfloor$, 1), where $N_T$ is the number of iterations (with skewing), B is the tile size, and P is the number of processors. ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.
....The basic model represents only perfectly nested loops. This severe restriction can be relaxed by applying the basic method to every single statement separately rather than to the body as a whole. Thus, every statement has its own index space, index vector, space time mapping and target space [5, 9, 15]. An operation in the program is identified by a statement together with its index vector. Of course, the feature of statementwise space time mapping complicates the generation of target code significantly (cf. Section 3.7) 2.3 Extension to loop nests containing while loops One of the main ....
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.
....the types of schedule and processor allocation. For example, a system of affine recurrence equations, which is more general than UREs, can sometimes be converted into a system of quasi uniform recurrence equations [41]. Some mapping techniques use quasi linear schedules [44] or affine schedules [7], [8], [10], [34]. Similar scheduling techniques have been applied to the compilation of nested loops on programmable parallel machines. In addition, researchers have developed partitioning techniques, or more general, techniques of mapping a n dimensional DG to a k dimensional array so that k is ....
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, pp. 814-822, August 1994.
....case of direction vectors. Here, the respective dependence vectors are (0; 1) Gamma) and ( Gamma) In the second dimension, the 1 and the Gamma prevents to detect two levels of fully permutable loops. Therefore, the code remains unchanged. No parallelism is detected. Darte and Robert [DR94, DR95] Darte and Robert look for an affine schedule for each statement that satisfies all dependences. Exact dependence analysis is needed, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. This technique leads to the valid schedule T (i; j) ....
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
No context found.
A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.
No context found.
Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Syst., 5(8), Aug. 1994.