FEAUTRIER, P. 1995. Compiling for massively parallel architectures: A perspective. Microprocess. Microprogram. 41, 5-6 (Oct.), 425--439.
....given of how to write an NLP for MatParser. Finally, the options MatParser supports are described. 2 Array Dataflow analysis The MatParser tool belongs to the class of Array Dataflow analysis tools. An Array Dataflow analysis is the effort to find the set of all flow dependencies in a program [7]. It determines not only whether two variables depend on each other, but also at which iteration. In order to find these dependencies, MatParser uses linear algebra techniques. This, however, immediately limits the kind of programs that can be analyzed to the class of affine Nested Loop Programs. ....
P. Feautrier. Compiling for massively parallel architectures: A perspective. In Algorithms and Parallel VLSI Architectures III, pages 259--270. Kluwer Academic Publishers, 1995.
....to recognize that parallel tasks access disjoint regions of the same data structure. Researchers have developed many sophisticated techniques for extracting or verifying this kind of information. There are two broad categories: analyses that characterize the accessed regions of dense matrices [8, 53, 7, 50, 77, 9, 5, 38, 47, 84], and analyses that extract or verify reachability properties of linked data structures [60, 19, 51, 43, 85]. Although many of these analyses were originally developed for the automatic parallelization of sequential programs, the basic approaches should generalize to handle the appropriate kinds ....
....An additional complication is the fact that the parallel tasks often access disjoint parts of the same data structure. Over the years researchers have developed many sophisticated techniques for extracting or verifying this kind of information, both for programs that access dense matrices [8, 53, 7, 50, 77, 9, 5, 38, 47, 84] and for programs that manipulate linked data structures [60, 19, 51, 43, 85]. Parallel computing programs may also use reductions and commuting operations, in which case it may be important to generalize algorithms from the field of automatic parallelization to verify that the program executes ....
P. Feautrier. Compiling for massively parallel architectures: A perspective. Microprogramming and Microprocessors, 1995.
....focus on allocation and/or alignment without paying attention to the schedule determination [5, 7, 26, 6], we deal with both questions simultaneously. Hence the allocation and the schedule functions are strongly related for the global space time transformation to be valid, as stated by Feautrier [11]. The communication optimization technique developed in this paper relies on the dependence information given by parameterized utilization sets and utilization vectors [22, 20]. This dependence modeling makes it possible to classify the dependences according to the potential number of distant communications ....
P. Feautrier. Compiling for massively parallel architectures : a perspective. Microprogramming and Microprocessors, 41:425--439, 1995.
....The earliest successful attempts to design algorithms based on affine mappings are due to Feautrier [23, 24]. However, despite introducing the idea of multidimensional schedules, his work focussed on utilising the idea to exploit parallelism. He considers automated solutions for data locality [27, 29, 28], parallel code generation [26] and even in his introduction of schedules, algorithms to generate schedules already aim at exploiting parallelism, and do not stay general enough to easily introduce other heuristics. The latest work is due to Lim and Lam [52, 53, 54] also in collaboration with ....
Paul Feautrier. Compiling for massively parallel architectures: a perspective. Microprogramming and Microprocessors, 41:425-439, 1995.
....parallel code is obtained from the original SARE by applying to the domain and the dependences the transformation matrices TX = i X oe X j . Such a transformation must be a full row rank n × n matrix. In the case of a multi dimensional schedule it must also be n × n full rank (cf. [8]). The parallelization process requires the determination of both the schedule and the allocation. The order of this determination depends on the main objective. If it is for example to find the fastest parallel solution, then we must determine first the schedule and then the allocation. If on the ....
....and the approach presented in this paper is that in these approaches the authors deal exclusively with the allocation. The approaches and heuristics proposed to solve the placement problem do not integrate the notion of schedule and allocation uncomputability. However, as P. Feautrier states in [8], there are two reasons not to compute a placement or allocation function without any reference to a schedule. The first one is the relationship between the dimensionality of the schedule and the one of the allocation: if the domain has dimension n and if s is the dimension of the schedule then ....
P. Feautrier. Compiling for massively parallel architectures : a perspective. Microprogramming and Microprocessors, 41:425--439, 1995.
.... nests with a certain regularity (e.g. having polytope iteration space, and uniform, linear, or affine dependencies between the operations). These results have also been used for parallelizing compilation of regular loop nests, generating code for fine grained parallel computers (see e.g. [Fea96], [Fea95], [Len93] or Sect. 3 for further references). The integer programming techniques used to find optimal systolic parallelizations have been extended to deal with program parameters describing problem sizes. Some extensions allow the parallelization of statements belonging to loops at different ....
....area of hardware synthesis inspired its use for automatic parallelization of regular loop nests. The main characteristic of this direction is the search for optimal parallel solutions for the transformation of loop nests under some additional restrictions. The recent publications of Feautrier ([Fea95], [Fea96]) and Lengauer ([Len93]) give an overview and introduction to this work. In [Kin95] systolic loop transformation is presented in a functional framework as useful basic transformation for the development of massively parallel programs with the help of program skeletons. There are only ....
Paul Feautrier. Compiling for massively parallel architectures: a perspective. Microprogramming and Microprocessors, 41:425--439, 1995.
.... dependence difference cyclic cyclic(2) cyclic cyclic(2) KIJ [k; 1; i; 0] k; 1; i; 1; j] 0,0,0,1] 0,0,0,1] 0,0,0,1] KIJd [k; 1; i; 0] k; 2; i; 1; j] 0,1] 0,1] 0,1] good good KJId [k; 1; i; 0] k; 2; j; 1; i] 0,1] 0,1] 0,1] good good IKJ [i; 0; k; 0] i; 0; k; 1; j] 0,0,0,1] [ 3,0,0,1] [ 3,0,0,1] bad bad IJK [i; 0; k; 1] i; 0; j; 0; k] 0,0, 1] 3,0,0 , 1] 3,0,0 , 1, 1] bad bad JKI [k; 2; i] j; 0; k; i] 2] 0 , 2] 0 , 1, 2] bad bad JIK [k; 2; i] j; 0; i; k] 2] 0 , 2] 0 , 1,2] bad bad Table 5: Predictions made by our algorithm for Cholesky decomposition ....
.... difference cyclic cyclic(2) cyclic cyclic(2) KIJ [k; 1; i; 0] k; 1; i; 1; j] 0,0,0,1] 0,0,0,1] 0,0,0,1] KIJd [k; 1; i; 0] k; 2; i; 1; j] 0,1] 0,1] 0,1] good good KJId [k; 1; i; 0] k; 2; j; 1; i] 0,1] 0,1] 0,1] good good IKJ [i; 0; k; 0] i; 0; k; 1; j] 0,0,0,1] 3,0,0,1] [ 3,0,0,1] bad bad IJK [i; 0; k; 1] i; 0; j; 0; k] 0,0, 1] 3,0,0 , 1] 3,0,0 , 1, 1] bad bad JKI [k; 2; i] j; 0; k; i] 2] 0 , 2] 0 , 1, 2] bad bad JIK [k; 2; i] j; 0; i; k] 2] 0 , 2] 0 , 1,2] bad bad Table 5: Predictions made by our algorithm for Cholesky decomposition checks are ....
[Article contains additional citation context not shown here]
Paul Feautrier. Compiling for massively parallel architectures: A perspective. In 7th Workshop on Algorithms and Parallel VLSI Architectures, Leuven, August 1994. Elsevier. to appear.
....programs are good, giving in some cases better results than the T3D library. There is obviously a lot of work to do for transforming these pilot implementations into useful compilers. The above technique can be explained in the context of recent research on automatic parallelization [Fea92c, Fea95], in which a parallel program is represented as a partial order on its operations. Scheduling techniques [Fea92a, Fea92b] look for sets of unordered operations (antichains of the parallel order) and are well adapted to synchronous architectures. Placement, on the contrary, looks for chains ....
....edges, others become PAR of SEQ by suppressing very few edges. On the other hand, each parallel architecture has a preferred form, and in some cases (SIMD machines and systolic arrays) one needs both a schedule and a placement for the generation of the parallel program. The reader is referred to [Fea95] for a discussion of these cases. In the rare cases where the original program is already in the PAR of SEQ form, the algorithm should find directly all independent chains. As example D shows, this is not really the case. A straightforward analysis gives a piecewise bidimensional placement ....
Paul Feautrier. Compiling for massively parallel architectures: a perspective. Microprogramming and microprocessing, 1995. To appear.
No context found.
FEAUTRIER, P. 1995. Compiling for massively parallel architectures: A perspective. Microprocess. Microprogram. 41, 5-6 (Oct.), 425--439.
No context found.
P. Feautrier. Compiling for massively parallel architectures: A perspective. Microprogramming and Microprocessors, 1995.
No context found.
P. Feautrier, "Compiling for massively parallel architectures: a perspective", in "Algorithms and Parallel VLSI Architectures III", (eds. M. Moonen, F. Catthoor), Elsevier, pp. 259-270, 1995.
No context found.
P. Feautrier. Compiling for massively parallel architectures: a perspective. Microprogramming and Microprocessors, 41:425--439, 1995.