S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and S.D. Swierstra, editors, Eighth International Symposium on Programming Languages, Implementations, Logics, and Programs, PLILP'96, volume 1140 of LNCS, pages 274--288, September 1996.
....we lack powerful parallelization theorems and laws for calculating efficient parallel programs, which more or less prevents it from being widely used. To remedy this situation, quite a lot of recent studies have been devoted to the development of powerful parallelization methods with BMF [Ski93a, Col95, Gor96b, Gor96a, GDH96, HIT97, HTC98]. As explained in Section 3, the main idea is based on the derivation of list homomorphisms from a naive specification. This is based on the fact that a list homomorphism can be efficiently implemented by a composition of two parallel primitives, namely reduce and map. Our newly introduced skeleton can ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proc. Conference on Programming Languages: Implementation, Logics and Programs, LNCS 1140, pages 274--288. Springer-Verlag, 1996.
....we lack powerful parallelization theorems and laws for calculating efficient parallel programs, which more or less prevents it from being widely used. To remedy this situation, quite a lot of recent studies have been devoted to the development of powerful parallelization methods with BMF [Ski93a, Col95, Gor96b, Gor96a, GDH96, HIT97, HTC98]. As explained in Section 3, the main idea is based on the derivation of list homomorphisms from a naive specification. This is based on the fact that a list homomorphism can be efficiently implemented by a composition of two parallel primitives, namely reduce and map. Our newly introduced skeleton ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proc. Conference on Programming Languages: Implementation, Logics and Programs, LNCS 1140, pages 274--288. Springer-Verlag, 1996.
....with distributed I/O data, aside from ours. There has been related work in our own group. First, there is work on the parallelization of the homomorphism [4], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [24, 10]. The class of distributable homomorphisms (DH) [9] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C algorithms in [11]). For all functions of the DH class, a common hypercube implementation can be derived by transformation in the ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming Languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....in a uniform recursion, and to bridge the gap between natural definitions using recursions and definitions using parallel primitives. It includes as a special case the well-known homomorphism lemma [Bir87], which has served as the basis for deriving parallel programs on lists [Col95, GDH96, Gor96b, HIT97, HTC98]. The key idea in establishing our theorem is an essential use of scans to memoize intermediate results in parallel computation. • Our polytypic framework can provide both explicit and implicit ways to describe parallelism, supporting both mechanical implementation and flexible ....
....catamorphisms. 2.2 Parallel Programming in BMF. Besides the work [Ski90, Ski94] on looking for architecture-independent parallel implementations of some specific catamorphisms, studies on parallel programming in BMF are actually quite recent, focusing mainly on list functions as in [Col95, GDH96, Gor96b, HIT97, HTC98]. The main idea is to derive the so-called list homomorphisms [Bir87], which are nothing more than catamorphisms on join lists as defined above. The relevance of homomorphisms to parallel programming comes basically from the homomorphism lemma [Bir87] cata JList 8 k (8) 8= ffi ....
[Article contains additional citation context not shown here]
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proc. Conference on Programming Languages: Implementation, Logics and Programs, LNCS 1140, pages 274--288. Springer-Verlag, 1996.
....requires heuristics and human insights in the derivation process, which seems difficult to automate. The other approach, which is very popular, is to use the Bird-Meertens formalism [Bir87, MFP91, Fok92] to synthesize parallel functional programs by program calculation, e.g. [Ski90, GDH94, Gor96a, Gor96b]. Different from the first approach, whose emphasis is on the derivation process, its emphasis is on the restriction of sequential programs being described in some specific recursive forms (like left reductions or right reductions). Imposing restrictions on the forms of the sequential programs ....
....main contributions are as follows. • We develop several elementary but general parallelization laws (Section 4). By elementary, we mean that they contribute to the core transformations in our parallelization algorithm; and by general, we mean that they are more powerful than the previous ones [Ski92, Gor96a, Gor96b] and can be applied to synthesize several interesting parallel programs (as demonstrated in Section 4). Moreover, these laws can be directly implemented by simple symbolic manipulation. • We propose a systematic and constructive parallelization algorithm (Section 5) for derivation of ....
[Article contains additional citation context not shown here]
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. Microprocessing and Microprogramming, 41:571--578, 1996. (Also appears in PLILP'96).
....and their parallel implementations. Often, one obtains a skeleton by a generalization, e.g., reduction and scan with an associative operator are included in the class of homomorphic functions, which match the divide-and-conquer paradigm and, therefore, have a natural parallel implementation [Col95, Gor96c, Ski94]. 3. Can we optimize combinations, esp. functional composition, of skeleton instances? In other words, is the cost of the parallel implementation of the functional composition of some skeleton instances simply the addition of their individual costs [SC95], or are there patterns of ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and S. Doaitse Swierstra, editors, 8th Int. Symposium on Programming Languages: Implementations, Logics, and Programs (PLILP'96), volume 1140 of Lecture Notes in Computer Science, pages 274--288. Springer-Verlag, 1996.
....but it usually requires heuristics and human insights in the derivation process, which seems difficult to automate. The other approach, which is very popular, is to use the Bird-Meertens formalism [Bir87, MFP91] to synthesize parallel functional programs by program calculation, e.g. [GDH94, Gor96a, Gor96b]. Different from the first approach, whose emphasis is on the derivation process, its emphasis is on skeletons of sequential programs whose parallel versions can be obtained by a simple application of calculational rules (laws). This calculational approach has the advantage of simple derivation ....
....main contributions are as follows. • We develop several elementary but general parallelization laws (Section 3). By elementary, we mean that they contribute to the core transformations in our parallelization algorithm; and by general, we mean that they are more powerful than the previous ones [Ski92, GDH94, Gor96a, Gor96b] and can be applied to synthesize several interesting parallel programs (as demonstrated in Section 3). Moreover, these laws can be directly implemented by simple symbolic manipulation. • We propose a systematic and constructive parallelization algorithm (Section 4) for derivation ....
[Article contains additional citation context not shown here]
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. Microprocessing and Microprogramming, 41:571--578, 1996. (Also appears in PLILP'96).
....circuit design [JS90] and optimization of functional programs [TM95, HIT96, HITT97]. In each specialization, new laws and theorems need to be developed in order to handle specific problems. However, in the field of parallelization (i.e., development of efficient parallel programs) [GDH96, Gor96a, Gor96b] there is a lack of powerful parallelization laws and theorems, which greatly limits its scope. Our calculational framework should remedy this situation. In this paper, we shall report our first attempt to construct a calculational framework specifically for parallelization. Our main ....
.... an elementary, but general calculational theorem for parallelization (Section 4). By elementary, we mean that it contributes to the core transformations in our parallelization algorithm; and by general, we mean that it is more powerful than all the previous laws and theorems [Ski92, GDH96, Gor96a, Gor96b] and thus can be applied to synthesize many interesting parallel programs (as demonstrated in Section 4). Moreover, this theorem can be directly implemented by way of simple symbolic manipulation. • We propose a systematic and constructive parallelization algorithm (Section 5) for the ....
[Article contains additional citation context not shown here]
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proc. Conference on Programming Languages: Implementation, Logics and Programs, LNCS 1140, pages 274--288. Springer-Verlag, 1996.
....input and output data, aside from ours. There has also been related work in our own group. First, there is work on the parallelization of the homomorphism [3], a basic DC skeleton somewhat more restrictive than ours. There exists a theory for the transformational parallelization of homomorphisms [17, 7]. The class of distributable homomorphisms (DH) [6] corresponds to the combine phase of our skeleton dc4 with a binary divide function (this class is called C algorithms in [8]). For all functions of the DH class, a common hypercube implementation can be derived by transformation in the ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming Languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....if we can derive list homomorphisms, then we can get corresponding parallel programs. Though simple, the homomorphism lemma plays an important role in bridging the gap between programs in recursive form and programs in compositional form, and it has led to surprisingly many good results [Gor96a, Gor96b, HIT97, HTC98]. The major reason is that list homomorphisms provide us with a new interface to develop parallel programs. The importance of using a recursion instead of map and reduce in parallel programming has greatly motivated us to study this simple diffusion in a more general and practical manner. 2.3 ....
....we lack powerful parallelization theorems and laws for calculating efficient parallel programs, which more or less prevents it from being widely used. To remedy this situation, quite a lot of recent studies have been devoted to the development of powerful parallelization methods with BMF [Ski93a, Col95, Gor96b, Gor96a, GDH96, HIT97, HTC98]. As explained in Section 2, the main idea is based on the derivation of list homomorphisms from a naive specification. This is based on the fact that a list homomorphism can be efficiently implemented by a composition of two parallel primitives, namely reduce and map. Our uniform recursions for ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proc. Conference on Programming Languages: Implementation, Logics and Programs, LNCS 1140, pages 274--288. Springer-Verlag, 1996.
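The excerpt above relies on the fact that a list homomorphism can be computed as a reduce applied after a map. The following is a minimal Python sketch of that idea, not code from any of the cited papers. The names `hom` and `hom_chunked`, the chunking strategy, and the sum-of-squares example are all illustrative assumptions; the chunked variant is only valid when the combining operator is associative and `unit` is its identity.

```python
from functools import reduce

def hom(f, op, unit, xs):
    """List homomorphism written as reduce(op) after map(f).

    `unit` is the value for the empty list and must be an identity of `op`.
    """
    return reduce(op, map(f, xs), unit)

def hom_chunked(f, op, unit, xs, chunk_size):
    """Same function, evaluated chunk by chunk and then combined.

    Because `op` is associative, each chunk could be handled by a
    separate processor; here the chunks are simply processed in turn.
    """
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    partials = [hom(f, op, unit, chunk) for chunk in chunks]
    return reduce(op, partials, unit)

def square(x):
    return x * x

def plus(a, b):
    return a + b

xs = list(range(1, 11))
sum_sq = hom(square, plus, 0, xs)
sum_sq_chunked = hom_chunked(square, plus, 0, xs, 3)
```

Both calls compute the sum of squares of 1..10; the chunked version illustrates why the map and reduce form lends itself to data-parallel evaluation.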
....can be expressed directly as homomorphisms. Much larger is the class of almost-homomorphisms, which can be turned into homomorphisms when tupled with auxiliary functions. Basically, this increases parallelism at the price of extra computations; a method for finding auxiliary functions is proposed in [6, 8]. More complex problems may require a composition or nest of several (almost-)homomorphisms. • Implementation: How can the homomorphism skeleton be implemented efficiently on parallel computers? For illustration, we use the architectural skeleton swap which is, on the one hand, formally ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming Languages: Implementation, Logics and Programs (PLILP'96), Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....) x ) Omega ) pref ( Omega ) y) j 3 The obtained intermediate expression does not fit format (4) because functions different from prefred are applied to x and y. This indicates that prefred is not directly a homomorphism. To cure that, we massage prefred into an almost-homomorphism [8] by tupling it with red as an auxiliary function: $\mathit{prefred}'(\oplus, \otimes) \stackrel{\mathrm{def}}{=} \langle \mathit{prefred}(\oplus, \otimes),\ \mathit{red}(\otimes) \rangle$ (14). Note that $\mathit{prefred}'$ yields a pair, where prefred is the first component: $\mathit{prefred}(\oplus, \otimes) = \pi_1 \circ \mathit{prefred}'(\oplus, \otimes)$ (15). We can proceed with the ....
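The tupling step described in the excerpt above can be sketched in Python as follows. The excerpt does not define prefred, so this sketch makes two labeled assumptions: that prefred with operators `add` and `mul` means reducing with `add` over the `mul`-prefixes (scan) of a nonempty list, and that `mul` distributes over `add`. The names `prefred_spec`, `prefred_pair`, and `combine` are illustrative, not taken from the cited paper.

```python
from functools import reduce
from itertools import accumulate
import operator

def prefred_spec(add, mul, xs):
    """Assumed specification: reduce with `add` over the `mul`-prefixes of xs."""
    return reduce(add, accumulate(xs, mul))

def combine(add, mul, left, right):
    """Combine the pairs (prefred, red) of two adjacent sublists.

    Relies on `mul` distributing over `add`: every prefix of the right
    part is multiplied by the total product r1 of the left part.
    """
    s1, r1 = left
    s2, r2 = right
    return (add(s1, mul(r1, s2)), mul(r1, r2))

def prefred_pair(add, mul, xs):
    """Divide-and-conquer evaluation of the tupled function on a nonempty list."""
    if len(xs) == 1:
        return (xs[0], xs[0])
    mid = len(xs) // 2
    return combine(add, mul,
                   prefred_pair(add, mul, xs[:mid]),
                   prefred_pair(add, mul, xs[mid:]))

xs = [2, 3, 4, 5]
spec_value = prefred_spec(operator.add, operator.mul, xs)
pair_value = prefred_pair(operator.add, operator.mul, xs)
```

The first component of `pair_value` is the prefred result, and the second is the auxiliary reduction that makes the combination step possible.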
.... 1 s 2 ) s 1 (u 1 s 2 ) j ; i t 2 (t 1 u 2 ) u 1 u 2 j j Interestingly enough, we have arrived exactly at the quadruple solution with parallel time complexity O(log n), which was proposed in [6] and later obtained in a systematic way by generalizing two sequential programs in [8]. The components of a quadruple have a problem-specific meaning (maximum initial segment sum, total sum, etc.), but this meaning has not been called for in our derivation. 6 Discussion. Our optimization rules and derivations demonstrate the elegance and power of the calculational BMF ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming Languages: Implementation, Logics and Programs. PLILP'96, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....or adjust/customize it to a homomorphic form. 2) Implementation: for different classes of homomorphisms, find an efficient way of implementing them on parallel machines. For both tasks, we aim at a systematic approach which would lead to a practically relevant parallel programming methodology [11, 12, 13]. A systematic extraction method, proposed in [12], proceeds by generalizing two sequential representations of the function: on the cons and snoc lists. Partially supported by grant Ku 996 3 1 of the DFG within the Schwerpunkt Deduktion at the University of Tübingen. Partially supported ....
....for different classes of homomorphisms, find an efficient way of implementing them on parallel machines. For both tasks, we aim at a systematic approach which would lead to a practically relevant parallel programming methodology [11, 12, 13]. A systematic extraction method, proposed in [12], proceeds by generalizing two sequential representations of the function: on the cons and snoc lists. Partially supported by grant Ku 996 3 1 of the DFG within the Schwerpunkt Deduktion at the University of Tübingen. Partially supported by the DFG project RecuR2 and by the DAAD exchange ....
[Article contains additional citation context not shown here]
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....form. • Implementation: for different classes of homomorphisms, find an efficient way of implementing them on parallel architectures. For both tasks, we aim at a systematic approach which would lead to a practically relevant parallel programming methodology. Recent results are presented in [3, 6, 7]. This paper makes a further step in solving the extraction problem. A systematic extraction method, proposed in [7], proceeds by generalizing two sequential representations of the function: on the cons and snoc lists. This so-called CS method (CS for Cons and Snoc) has been proved to be ....
....architectures. For both tasks, we aim at a systematic approach which would lead to a practically relevant parallel programming methodology. Recent results are presented in [3, 6, 7]. This paper makes a further step in solving the extraction problem. A systematic extraction method, proposed in [7], proceeds by generalizing two sequential representations of the function: on the cons and snoc lists. This so-called CS method (CS for Cons and Snoc) has been proved to be powerful enough for a broad class of almost-homomorphisms which include the classical problems like maximum segment sum and ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
....space-time mapping method based on the method for nested loops [25]. The target is again a parallel loop nest, which can also be represented as an SPMD program. Subsect. 4.2. The second skeleton is a bit less general. It is parallelized based on the algebraic properties of its constituents [20]. It is used to generate coarser-grained parallelism in the form of an SPMD program. In this paper, we are mainly comparing and evaluating. The references cited in the individual sections contain the full details of the respective technical development. Our comparison is concerned with the models ....
....is replaced by a set of dependences which do [29]. In the previous subsection, we format the input and the output of polynomial product with adaptation functions to make it match our Haskell skeleton. The homomorphic form of a problem may exist but not be immediately clear. An example is scan [20]. Other algorithms can be turned into a D C and, further, into a homomorphic form with the aid of auxiliary functions [10, 38]. 4. The application of the promotion property gives us a parametrized granularity of parallelism which is controlled by the size of the chunks in which distribution ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming Languages: Implementation, Logics and Programs, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
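The remark above that scan has a homomorphic form that is not immediately obvious can be illustrated with a short Python sketch. This is one well-known formulation, given here under the assumption that the operator is associative; it is not claimed to be the formulation used in the cited paper, and the names `scan`, `scan_combine`, and `scan_dc` are illustrative.

```python
from itertools import accumulate
import operator

def scan(op, xs):
    """Sequential inclusive scan (prefix computation)."""
    return list(accumulate(xs, op))

def scan_combine(op, left, right):
    """Combine the scans of two adjacent sublists.

    Every element of the right scan is adjusted by the last element
    of the left scan, which is the reduction of the whole left part.
    """
    if not left:
        return list(right)
    last = left[-1]
    return left + [op(last, y) for y in right]

def scan_dc(op, xs):
    """Scan as a homomorphism: split, scan both halves, combine."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    return scan_combine(op, scan_dc(op, xs[:mid]), scan_dc(op, xs[mid:]))

xs = [1, 2, 3, 4, 5]
sequential = scan(operator.add, xs)
divide_and_conquer = scan_dc(operator.add, xs)
```

The two halves are scanned independently, so the only sequential dependency is the adjustment in `scan_combine`.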
....divide phase can be separated and represented by function in that transforms an input list into a list of lists. The divide-and-conquer specifications may also have a nested control structure. For example, the problem of parsing many bracket languages is a so-called nested almost-homomorphism [8], i.e. a C algorithm, whose con is again a C function; we denote this class by C(C). Another well-known example is bitonic sort, which belongs to the class C(D); see [1, 12] for details. 4. Case Study: Revisited. Example 3 (FFT as divide and conquer). The mathematical specification (1) of FFT can ....
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and D. Swierstra, editors, Programming languages: Implementation, Logics and Programs. PLILP'96, Lecture Notes in Computer Science 1140, pages 274--288. Springer-Verlag, 1996.
No context found.
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In H. Kuchen and S.D. Swierstra, editors, Eighth International Symposium on Programming Languages, Implementations, Logics, and Programs, PLILP'96, volume 1140 of LNCS, pages 274--288, September 1996.
No context found.
S. Gorlatch. Systematic extraction and implementation of divide-and-conquer parallelism. In Proceedings of Eighth International Symposium on Programming Languages, Implementations, Logics and Programs, pages 274--288, 1996.