55 citations found.
A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.

This paper is cited in the following contexts:

Evaluation of Loop Grouping Methods Based on.. - Drositis, Goumas, ..

....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions. 2.1. Model of the Algorithms. We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g. see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR i_1 = l_1 TO u_1 DO ... FOR i_n = l_n TO u_n DO AS_1(i) ... AS_k(i) ENDFOR ... (Figure 1. The algorithm model.) where l_i and u_i are integer valued constants (boundary values of the i-th inner loop) instance vector is denoted as i = ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.


A Systolic Approach To Loop Partitioning And.. - Drositis.. (1999)

.... in literature to find the optimal hyperplane; some of them are based on the solution of Diophantine equations [8, 10] and others on the use of integer programming [13, 3] or even linear programming in subspaces [13]. When index spaces with uniform dependence vectors are concerned, a polynomial complexity scheduling algorithm is presented in [7]. [Footnote 1: For further study of this method, see [2] and [13].] Once optimal parallel execution is found, an efficient method of mapping the concurrent groups of computations (hyperplanes) into the parallel architecture should be applied. A systematic ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.


CPR: Mixed Task and Data Parallel Scheduling for.. - Radulescu.. (2001)

....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [3, 11, 12, 15, 24] could still be applied within data parallel M tasks, pure task scheduling techniques [17, 18, 19, 25, 26] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete even for the ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel and Distributed Systems, 5(8):814--822, 1994.


Optimal Fine and Medium Grain Parallelism Detection in.. - Darte, Vivien (1996)   (12 citations)

....case of direction vectors. Here, the respective dependence vectors are (0; 1) −) and ( −). In the second dimension, the 1 and the − prevent the detection of two levels of fully permutable loops. Therefore, the code remains unchanged. No parallelism is detected. Darte and Robert [DR94, DR95]. Darte and Robert look for an affine schedule for each statement that satisfies all dependences. Exact dependence analysis is needed, and a quite large linear system (obtained by the duality theorem of linear programming) has to be solved. This technique leads to the valid schedule T (i; j) ....

Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5(8):814--822, 1994.
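
As a rough illustration of what "a schedule that satisfies all dependences" means in the simplest uniform case, the Python sketch below checks whether a single linear schedule t(i) = pi . i delays every dependence by at least one time step, and computes the resulting latency over a rectangular index space. It is not the linear-programming construction of the cited paper. The function names, the dependence vectors, and the loop bounds are hypothetical and chosen only for this example.

```python
from itertools import product


def is_valid_schedule(pi, dependences):
    """True if the linear schedule t(i) = pi . i gives pi . d >= 1 for every
    uniform dependence vector d (assumption: one schedule for the whole body)."""
    return all(sum(p * d for p, d in zip(pi, dep)) >= 1 for dep in dependences)


def latency(pi, lower, upper):
    """Number of time steps used by t(i) = pi . i over the box lower..upper
    (inclusive bounds, enumerated exhaustively, so only for small examples)."""
    ranges = (range(lo, hi + 1) for lo, hi in zip(lower, upper))
    times = [sum(p * x for p, x in zip(pi, point)) for point in product(*ranges)]
    return max(times) - min(times) + 1


# Hypothetical 2-D loop nest with dependences (1, 0) and (0, 1).
deps = [(1, 0), (0, 1)]
print(is_valid_schedule((1, 1), deps))  # True
print(is_valid_schedule((1, 0), deps))  # False: (0, 1) would get delay 0
print(latency((1, 1), (1, 1), (4, 4)))  # 7
```

The exhaustive enumeration keeps the sketch short; a real scheduler would evaluate the latency only at the vertices of the index space.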


Geometric Pattern Prediction and Scheduling of.. - Drositis.. (2001)

....transformation, which organizes indexed computations into well defined distinct groups, called hyperplanes. As the hyperplane method was proved to be nearly optimal by Darte [3], many researchers have focused on it. As a result, it was extended by several researchers such as Shang [12] and Darte [4], by allowing more computations to be executed in parallel. [Footnote: This work was partially supported by the General Secretariat of Research and Development.] On the other hand, general scheduling approaches were never used in loop parallelization. General multiprocessor scheduling with precedence ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.


Determination of the Processor Functionality in the Design of .. - Fimmel, Merker (1997)

....It is sufficient to consider the vertices W_i of the polytopes I_i since the linear functions i (i) have their extreme values only at the vertices of a polytope. The vertex method allows different index spaces I_i to be considered for the equations of the SURE, in contrast to the linear program proposed in [19, 2], which is based on the application of the duality theorem. The objective function is the product of the latency $L = t_{\max} - t_{\min}$ and the chip area of one processor $C_p = \sum_{j=0}^{|M|} n_j c_j$. This product is appropriate if the available chip area $C_c$ is significantly less than the necessary chip ....

A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994


A Parameterized Index-Generator for the Multi-Dimensional.. - Passos, Sha

....in part by the NSF CAREER grant MIP 9501006, and by the William D. Mensch, Jr. Fellowship. have considered the optimization of nested loops, a software point of view of the MD systems [1, 7, 12]. In the area of high level synthesis, researchers also have focused on the optimization of MD problems [2, 6]. In a previous study, it has been shown that full parallelism can be obtained by the application of MD retiming techniques [10]. In general, these methods transform the loops in such a way as to obtain a new sequence of execution characterized by a higher parallelism. This sequence of execution is ....

....constraint that cannot be achieved by the straightforward implementation of the loop. In this case, optimization techniques are used to improve the parallelism among the operations in order to satisfy the time constraint. Some of the existing optimization methods point directly to MD problems [2, 6, 10, 12]. Most of these methods derive from the work by Lamport [5] and require a new scheduling direction when executing the loop iterations. Such a change implies complex formulations of loop bounds and indices when writing the transformed, optimized code. A possible method of obtaining ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.


Synthesis of Multi-Dimensional Applications in VHDL - Passos, Sha (1996)

....j − 1) node D: z(i; j) t(i; j) w(i; j) node A: y(i; j) z(i; j) x(i; j) There exists a serial execution of two additions and one multiplication. Using optimization techniques, we can transform the loop such that all its operations can be executed simultaneously within one iteration [3, 7, 11, 15]. This transformation requires a change in the execution sequence, which is usually computed by the optimization technique. For example, the index shift method [7] and the chained multi dimensional retiming [11] would require a new execution direction (1; 1). Under this new schedule, the loop body ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. on Parallel and Distributed Systems, 1994, Vol. 5, no. 8, pp. 814-822.


A Low-Cost Approach towards Mixed Task and Data Parallel.. - Radulescu, van Gemund (2001)   (2 citations)

....two scheduling techniques, we shall use the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 13, 14, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [12, 15, 17, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.


Generation of scheduling functions supporting LSGP-partitioning - Fimmel

.... of a linear program: maximize h T i h T j subject to Hi h 0 Hj h 0 Application of the duality theorem [10] leads to: minimize h T 0 (y 1 y 2 ) subject to y T 1 H = h y T 2 H = h (18) which is again an approximation since the duality theorem is valid only for linear programs [1]. In the following we assume that both the number K of processors of the full size array forming one processor of the partitioned processor array, and the iteration interval are given. For techniques treating and K as variable in similar linear programs we refer to e.g. [4]. Linearization ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, 1994.


Evaluation of Loop Grouping Methods Based on.. - Drositis, Goumas, .. (2000)

....in Section 6, with some overall remarks on each method's efficiency. 2. Preliminary Concepts and Definitions. 2.1. Model of the Algorithms. We will use the computational uniform data dependence model of perfectly nested FOR loop algorithms, widely used in many similar papers (e.g. see [1], [2], [3], [6], [7], [8]). So, our algorithms are of the form: FOR i_1 = l_1 TO u_1 DO ... FOR i_n = l_n TO u_n DO AS_1(i) ... AS_k(i) ENDFOR ... ENDFOR (Figure 1. The algorithm model.) where l_i and u_i are integer valued constants (boundary values of the i-th inner loop) instance vector is ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Trans. Parallel Distrib. Syst., vol. 5, no. 8, pp. 814-822, Aug. 1994.


Design of Processor Arrays for Real-time Applications - Fimmel, Merker (1998)   (2 citations)

.... array by consideration of the processor functionality [3]. An approach to compute a variety of linear allocation and scheduling functions is proposed in [11]. Some notes to the determination of unconstrained minimal scheduling functions for algorithms with uniform dependencies can be found in [1]. Resource constraint scheduling for a given processor functionality is presented in [13]. An approach to minimize the throughput by consideration of the chip area is proposed in [9]. In [2] the approach [13] is extended to determine additionally the processor functionality in order to minimize a ....

A. Darte, Y. Robert: "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814-822, 1994


Implementation of Regular Algorithms on Field Programmable.. - Fimmel, Merker

....tools are required to exploit the degrees of freedom during the design process for deriving optimal parallel implementations of algorithms on reconfigurable architectures. In this paper we present an approach to map regular algorithms onto FPGAs using methods of the design of processor arrays [1, 5, 6, 7, 10, 11]. The design is restricted by the limited number of Configurable Logic Blocks (CLBs) available in the FPGA. As objective we consider the minimal latency of .... [Footnote: e mail: fimmel,merker iee1.et.tu dresden.de. The research was supported by the Deutsche Forschungsgemeinschaft, in the project A1 SFB358.]

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.


Localization of Data Transfer in Processor Arrays - Fimmel, Merker

....covers the design of a cost minimal interconnection network and the organization of the data transfers using these interconnections. A solution of the problem of organizing the data transfers for a given interconnection network is also presented. The design of processor arrays is well studied (e.g. [2,7,8,10,13]) and became more realistic by inclusion of resource constraints [3,5,12]. But up to now, only some work has been done in the organization of data transfer. Fortes and Moldovan [6] as well as Lee and Kedem [9] discuss the need of a decomposition of global interconnections into a set of local ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814-822, 1994.


Optimal Data Scheduling for Uniform Multi-Dimensional.. - Wang, Sha, Passos

....[Figure 4: Dataflow graph and dependence vectors. S1 to S4: d1 = (0,1); S1 to S3: d2 = (0,2); S2 to S1: d3 = (1,5); S4 to S1: d4 = (1,4); S3 to S2: d5 = (1,6).] 6 Experiments. In this section we present the application of our method to a well known example taken from [6]. The loop body is: statement S1 a(i, j) b(i, j 6) d(i 1,j 3) statement S2 b(i 1,j 1) c(i 2, j 5) statement S3 c(i 3,j 1) a(i, j 2) statement S4 d(i, j 1) a(i, j 1) In the example, the variable a(i; j) is produced by statement S 1 (i; j) and consumed by statement S 4 (i; j 1) ....

....[Figure 5: Performance of carrot hole data scheduling compared to FIFO and LRU. Axis: partition width. Legend: LRU, FIFO, Carrot hole, optimal point.]

Example            On chip memory  Partition size  LRU      FIFO     Carrot hole
Nested loop 1 [6]  500             33              1053708  1055163  109535
Nested loop 1 [6]  1000            66              1002325  1004427  64760
Nested loop 2 [6]  500             71              1859952  3324436  52910
Nested loop 2 [6]  1000            142             1701443  1777341  31952
WDF [13]           500             250             864110   845355   6404
WDF [13]           1000            500             1470274  1433933  4002
IIR filter [13]    500             249             1983047  5900471  ....

[Article contains additional citation context not shown here]

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 8, pp. 814--822, August 1994.


CPR: Mixed Task and Data Parallel Scheduling for.. - Radulescu.. (2001)

....two scheduling techniques, we introduce the terms M task and S task to denote a task that can run on multiple processors and a single processor, respectively. An M task can be either a purely data parallel task, or a mixed task data parallel routine. While pure data parallel scheduling techniques [1, 5, 14, 15, 16, 19, 29] could still be applied within data parallel M tasks, pure task scheduling techniques [13, 21, 22, 23, 24, 30, 31] are no longer applicable to schedule M tasks. As a result, new approaches have to be found that fully exploit the available parallelism. Scheduling is known to be NP complete even for ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.


A Systolic Approach To Loop Partitioning And.. - Drositis.. (1999)

....(hyperplanes) into the parallel architecture should be applied. A systematic methodology for mapping into fixed size systolic arrays was presented in [8]. Since the target architecture is synchronously operating, there is no need for communication efficient mapping and the main criterion for optimality is now the total number of processors. [Footnote 1: For further study of this method, see [2] and [13].] Other methods dealt with the same problem of mapping, while reducing not only the size but also the resulting dimension of the systolic array (see [4, 6, 11]). Researchers are trying to ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests", IEEE Transactions on Parallel and Distributed Systems, vol. 5, pp. 814-822, August 1994.


Libraries of Schedule-Free Operators in Alpha - de Dinechin (1997)

....program transformation environment MmAlpha. A program like that of Prog.2 is first uniformized [12, 16] to remove the data broadcasts and non local communications. It is then scheduled, i.e. each computation of the program is assigned an affine time function consistent with the data dependencies [3]. Finally an affine change of basis is performed on the index space of each variable, so that one of the indices represents the time at which this variable is computed, and the other indices specify the processor on which the computation is performed, in some processor array whose shape is given by ....

....product (we get the very classical systolic array depicted by Fig.3a) to synthesize bit serial operators as we did in the previous section (a bit serial adder consists of one full adder and two flip-flops), then combine them to get a bit level circuit as shown by Fig.3b. [Figure 3: Bit serial systolic array for the matrix vector product. (a) The word level array; (b) its bit serial cell.] We won't elaborate on this approach: it needs ....

[Article contains additional citation context not shown here]

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Systems, 5:814-822, 1994.


Scheduling of Wavefront Parallelism on Scalable.. - Manjikian, Abdelrahman (1996)   (14 citations)

....wavefronts as shown in Figure 8(b) to provide greater parallelism. This involves rotating wavefronts such that the number of independent tiles in the largest wavefront is exactly equal to the number of processors. This rotation corresponds to the selection of a different scheduling vector 1 [4]. The scheduling vector is (1; 1) for the original wavefronts in Figure 8(a). The scheduling vector for the modified wavefronts in Figure 8(b) is given by (b(N T ) B Delta P )c; 1) where N T is the number of iterations (with skewing), B is the tile size, and P is the number of processors. ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.
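
To make the effect of choosing a different scheduling vector concrete, here is a small Python sketch that groups the tiles of a rectangular grid into wavefronts, where all tiles with the same value of s . (i, j) run in the same step. The grid size, the vector (2; 1), and the function name are hypothetical; the sketch does not reproduce the formula from the excerpt above, and it does not check that the chosen vector respects the dependences of any particular loop.

```python
from collections import defaultdict
from itertools import product


def wavefronts(n_rows, n_cols, sched):
    """Group tile coordinates (i, j) by the time step sched . (i, j).

    Returns the wavefronts in execution order; tiles inside one wavefront
    are assumed to be independent of each other.
    """
    fronts = defaultdict(list)
    for i, j in product(range(n_rows), range(n_cols)):
        fronts[sched[0] * i + sched[1] * j].append((i, j))
    return [fronts[t] for t in sorted(fronts)]


# A 3 x 3 grid of tiles: the vector (1, 1) gives diagonal wavefronts.
print([len(f) for f in wavefronts(3, 3, (1, 1))])  # [1, 2, 3, 2, 1]

# A steeper (hypothetical) vector caps the widest wavefront at 2 tiles,
# at the price of more steps.
print([len(f) for f in wavefronts(3, 3, (2, 1))])  # [1, 1, 2, 1, 2, 1, 1]
```

With two processors, the second vector never leaves a wavefront wider than the machine, which is the trade-off the excerpt describes: fewer idle tiles per step in exchange for a longer schedule.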


The Loop Parallelizer LooPo - Griebl, Lengauer (1996)   (4 citations)

....The basic model represents only perfectly nested loops. This severe restriction can be relaxed by applying the basic method to every single statement separately rather than to the body as a whole. Thus, every statement has its own index space, index vector, space time mapping and target space [5, 9, 15]. An operation in the program is identified by a statement together with its index vector. Of course, the feature of statementwise space time mapping complicates the generation of target code significantly (cf. Section 3.7). 2.3 Extension to loop nests containing while loops. One of the main ....

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. on Parallel and Distributed Systems, 5(8):814--822, August 1994.


Processor Array Design with FPGA Area Constraint - Fernando, Jean

....the types of schedule and processor allocation. For example, a system of affine recurrence equations, which is more general than UREs, can sometimes be converted into a system of quasi uniform recurrence equations [41]. Some mapping techniques use quasi linear schedules [44] or affine schedules [7], [8], [10], [34]. Similar scheduling techniques have been applied to the compilation of nested loops on programmable parallel machines. In addition, researchers have developed partitioning techniques, or, more generally, techniques of mapping an n dimensional DG to a k dimensional array so that k is ....

A. Darte and Y. Robert, "Constructive Methods for Scheduling Uniform Loop Nests," IEEE Transactions on Parallel and Distributed Systems, Vol. 5, pp. 814-822, August 1994.


Optimal Loop Parallelization in n-Dimensional Index.. - Drositis, Andronikos.. (2002)

No context found.

A. Darte and Y. Robert. Constructive methods for scheduling uniform loop nests. IEEE Transactions on Parallel and Distributed Systems, 5(8):814--822, 1994.


Automatic Generation of Modular Time-space Mappings and Data.. - Lee, Fortes

No context found.

Alain Darte and Yves Robert. Constructive methods for scheduling uniform loop nests. IEEE Trans. Parallel Distributed Syst., 5(8), Aug. 1994.

CiteSeer.IST - Copyright Penn State and NEC