12 citations found.
Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes, Rennes, France, 1993.

This paper is cited in the following contexts:
Loop Parallelization Algorithms: From Parallelism.. - Boulet, Darte, Silber (1997)   (4 citations)  (Correct)

....one. On the example of Figure 6, the result is the same. This algorithm produces no empty iterations but may introduce floor and ceiling operations. The complexity of the simplex algorithm is exponential in the worst case but polynomial on the average and so also works well in practice. Chamski [9] addresses the problem of control overhead by replacing extrema operations by conditionals at the expense of code duplication. 4.2 Non unimodular linear transformations When dealing with non unimodular transformations, the classical approach [29] is to decompose the transformation matrix ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes, Rennes, France, 1993.
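
The trade-off named in the excerpt above (extrema operations replaced by conditionals, at the cost of duplicated code) can be illustrated with a minimal sketch. The loop nest, the function names, and the parameters `n` and `m` are hypothetical and chosen only for illustration; this is not taken from the thesis, only one simple instance of the general idea.

```python
def visit_with_extrema(n, m):
    """Loop nest whose inner lower bound is computed with max()."""
    visited = []
    for i in range(n):
        for j in range(max(0, i - m), n):
            visited.append((i, j))
    return visited


def visit_with_conditionals(n, m):
    """Same iterations, with max() replaced by a test.

    The inner loop is duplicated: one copy per candidate lower bound.
    """
    visited = []
    for i in range(n):
        if i - m >= 0:
            # Here max(0, i - m) is i - m.
            for j in range(i - m, n):
                visited.append((i, j))
        else:
            # Here max(0, i - m) is 0.
            for j in range(0, n):
                visited.append((i, j))
    return visited
```

Both functions enumerate the same (i, j) pairs in the same order; the second evaluates no extremum in the loop bounds but contains two copies of the inner loop.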


Automatic Parallelization of Higher-Order Languages in the.. - Collard (1997)   (Correct)

....time given by the schedule: when the last use of a datum occurs after some constant delay d, then the corresponding storage can be freed d time steps after the datum's definition. When the data structure is an array, rows may then be reused in a cyclic way, i.e. the array can be folded modulo d [14, 15]. Consider again function g: a valid schedule is (i; j) i. From the initial time step t_0 = 0 until the beginning of computations in R_g at t_mid = P, entire fronts of values of g may be deallocated at the end of the following time step (we say that the period is equal to 1). Notice that, ....

Z. Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Univ. Rennes I, Rennes, February 1993.


Loop Shifting for Loop Parallelization - Darte, Huard (2000)   (Correct)

....for example, can cause several problems when generating the new code. One problem is that the conversion between new and old loop indices can produce extra costly operations such as modulos, multiplications, minima and maxima, and this complexity is not due to the code generation algorithms [16, 4, 6] but to the transformation itself: the simplest to avoid it is sometimes just to consider another transformation. Another problem (especially in a distributed memory environment) is that the data that could be properly aligned for the initial execution order can now be badly located and can ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes, Rennes, France, 1993. numéro 957.


Construction of DO Loops from Systems of Affine Constraints - Collard, Feautrier, Risset (1993)   (20 citations)  (Correct)

....They thus can be used as context for the computation of the next inner loop. This is optional, since PIP will find equivalent results whenever the parameters belong to the context. Choice between the two methods must be based on considerations of compile time performance. Results of Chamski in [Cha93] tend to show that omitting the context gives a faster algorithm, but this has to be confirmed by more extensive experiments. Our aim now is to use this method in the back end of the PAF parallelizer (Paralléliseur Automatique pour Fortran) [Fea91, RWF90, RWF91]. In PAF, the hidden parallelism of ....

Z. Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Univ. Rennes I, Rennes, February 1993.


Toward Automatic Partitioning of Arrays on Distributed Memory.. - Feautrier (1993)   (10 citations)  (Correct)

....counterparts. In the Cholesky case, memory size which is O(n^2) in the original version, expands to O(n^3) in the single assignment version. In my mind, single assignment programs are not meant to be executed as written. When distribution is done, one must optimize memory usage; see [RWF91, Cha93] for a description of the technique. The best way of exhibiting the parallelism in a static control program is to construct a schedule from its [table fragment, columns: Edge, Source, Destination, Dimension, Predicate; edge 101: ⟨2, i, k − 1⟩, ⟨2, i, k⟩, 2, k − 2 0; edge 102: ⟨1, i⟩, ⟨2, i, k⟩, 1, 1 − k 0; edge 103: ⟨2, i, i − ....]

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, IFSIC, Rennes I, February 1993.


Loop Parallelization Algorithms: From Parallelism.. - Boulet, Darte.. (1997)   (4 citations)  (Correct)

....one. On the example of Figure 5, the result is the same. This algorithm produces no empty iterations but may introduce floor and ceiling operations. The complexity of the simplex algorithm is exponential in the worst case but polynomial on the average and so also works well in practice. Chamski [10] addresses the problem of control overhead by replacing extrema operations by conditionals at the expense of code duplication. 3.2 Non unimodular linear transformations When dealing with non unimodular transformations, the classical approach [30] is to decompose the transformation matrix into its ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes, Rennes, France, 1993.


Plugging Anti and Output Dependence Removal.. - Calland, Darte.. (1997)   (2 citations)  (Correct)

....have been made to remedy these two problems: more sophisticated rewriting techniques have been proposed to move if tests into the outermost loops if possible and to minimize the memory usage (through memory folding, i.e. memory reuse) once parallelism has been detected (see the work of Chamski [7]). In other words, Chamski's technique consists in three main steps: 1. transform the code into SAF, through full memory expansion, 2. parallelize the code, 3. reduce memory size by analyzing the life duration of each cell in the parallelized code. In this paper, we explore an opposite approach: ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes, Rennes, France, 1993. numéro 957.


Compiling For Massively Parallel Architectures: A Perspective - Feautrier (1994)   (11 citations)  (Correct)

....to obtain a correct program. Single assignment conversion is usually too much expansion. [Footnote 1: A vector is primitive iff its coordinates are mutually prime integers.] The problem of finding the minimum expansion which still gives a correct parallel program is a very important one, see [MAL93, Cha93]. A partial solution may be obtained by reindexing. If the first row of T is given by a schedule, the shape of (23) is: N [t; f( N [t − d; where t is logical time and d is a positive delay by (13). If the delays for the various statements have an upper bound ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, IFSIC, Rennes I, February 1993.


Storage Management in Parallel Programs - Lefebvre, Feautrier (1996)   (6 citations)  (Correct)

....prove that the problem of determining a minimal process of renaming is NP complete. Values Lifetime Analysis is a technique which comes from the systolic community. It takes into account single assignment form programs and tries to generate output and antidependences without changing the dataflow ([2], [10]). 4 Minimal Memory Expansion With Respect to a Schedule Our method tries to maintain as many false dependences as possible from the original program to the parallel one. One takes into account the original data structures, the results given by data dependences and data flow analysis, the ....

Zbigniew Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Université de Rennes I, chapter IV, 1993, order number 957.


Automatic Generation of Data Parallel Code - Collard, Feautrier (1994)   (3 citations)  (Correct)

....If all references to b are of the above form, and if D is the maximum of all d's, then only D + 1 rows of b need be allocated along the temporal dimension, provided the references are wrapped modulo D + 1. In some cases, a more elaborate analysis may reduce the factor of D + 1 to D, see [Cha93] for details. In this way, the main drawback of single assignment form, memory expansion, is greatly reduced. Similarly, if the new access vector is of the form (t 0 ; p a x) where x is a constant vector, then communication between S_a's and S_b's processors is regular. Communication ....

Z. Chamski. Environnement logiciel de programmation d'un accélérateur de calcul parallèle. PhD thesis, Univ. Rennes I, Rennes, February 1993.
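
The modulo wrapping described in the excerpt above can be sketched as follows, assuming a toy recurrence in which row t of b reads rows t - 1 and t - D, so the maximum delay is D. The recurrence, the initial values, and the function names are hypothetical and serve only to show that D + 1 rows are enough when the time index is wrapped modulo D + 1; the finer analysis that would bring this down to D rows is not attempted here.

```python
def last_row_expanded(steps, width, D):
    """Single-assignment style: one row of b per time step (assumes D >= 1)."""
    b = [[0] * width for _ in range(steps)]
    for t in range(steps):
        for x in range(width):
            if t < D:
                b[t][x] = t + x  # arbitrary initial values
            else:
                b[t][x] = b[t - 1][x] + b[t - D][x]
    return b[steps - 1]


def last_row_folded(steps, width, D):
    """Same computation with only D + 1 rows, indexed modulo D + 1."""
    rows = D + 1
    b = [[0] * width for _ in range(rows)]
    for t in range(steps):
        for x in range(width):
            if t < D:
                b[t % rows][x] = t + x
            else:
                # Writing row t overwrites row t - (D + 1), which no
                # later time step reads, since delays never exceed D.
                b[t % rows][x] = b[(t - 1) % rows][x] + b[(t - D) % rows][x]
    return b[(steps - 1) % rows]
```

For the same arguments, both functions return the same final row; the folded version allocates D + 1 rows instead of one row per time step.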

CiteSeer.IST - Copyright Penn State and NEC