A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
....of exclusive methods. The amount for each of the benchmarks above is 0.25, 17, 1035, and 137 microseconds, respectively. 8 Related Work. Parallel Execution of Associative Operations. There is an extensive literature on the techniques that extract parallelism among associative exclusive operations [3, 5, 9, 12, 18]. In systems using the techniques, each thread executes associative exclusive operations in parallel and accumulates the contributions of .... [Fig. 4. Number of ....]
....hand, combines multiple method invocations, currently making programmers responsible for keeping the semantics of their program. Static Fusion of Multiple Operations through Program Transformations. There is much literature on program transformations that fuse multiple operations statically [1, 2, 5, 10, 11, 21]. The techniques described in the literature detect static occurrences of the operations that are invoked consecutively, and transform them into a cheaper operation statically. Our technique, on the other hand, detects dynamic occurrences of the operations that can be executed consecutively, and ....
Allan L. Fisher and Anwar M. Ghuloum. Parallelizing Complex Scans and Reductions. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI '94), pages 135--146, 1994.
....can in theory obtain the following parallel code: f(xl xr) #y.g(y) f xr) 1) g [a] # y . g(y) g (xl xr) g(xl) g(xr) While this solution is valid, the problem is that the parallel code obtained, g , may not be efficient. This is due to the fact that higher-order closures are being built [7]. On the other hand, if g is a non-recursive function, we can choose to unfold the call of g . If the unfolded expression satisfies context preservation, we can find a parallel version of f which is more efficient than (1). For example, when we have definition of g as g u = 1 u, we can treat the ....
....difficult. Another approach is to provide more specialized schemes, either statically [13] or via a procedure [11] that can be directly matched to sequential specification. On the imperative language (e.g. Fortran) front, there has been interest in parallelization of reduction-style loops [7, 8]. By modelling loops via functions, they noted that function type values could be reduced (in parallel) via associative function composition. These propagated function type values could only be efficiently combined if they have a template closed under composition. This requirement is similar to the ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
....is just what the programmer does. The compiler is solely responsible for exploiting parallelism. Traditionally pattern matching and idiom recognition have been used to parallelize reductions [4, 14]. Sophisticated techniques for recognizing broader classes of reductions have also been examined [8, 19]. Commutativity analysis [15] promises to be yet another effective technique. However, it is an undecidable problem to determine whether a function is associative [10]. Moreover, even if a function is not technically associative, the salient part of the calculation might be. To exploit the ....
A. L. Fisher and A. M. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the ACM Conference on Programming Language Design and Implementation, 1994.
....5 Related Work. In this section we discuss related work in the area of reduction analysis and replication for concurrent read access in shared memory systems. 5.1 Reduction Analysis. Several existing compilers can recognize when a loop performs a reduction of many values into a single value [8, 7, 24, 3]. These compilers recognize when the reduction primitive (typically addition) is associative. They then exploit this algebraic property to eliminate the data dependence associated with the serial accumulation of values into the result. The generated program computes the reduction in parallel. Each ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, pages 135--146, Orlando, FL, June 1994. ACM, New York.
....compilers will incorporate both commutativity analysis and data dependence analysis for pointer-based data structures, using each when it is appropriate. 8.3 Reductions. Several existing compilers can recognize when a loop performs a reduction of many values into a single value [Callahan 1991; Fisher and Ghuloum 1994; Ghuloum and Fisher 1995; Pinter and Pinter 1991]. These compilers recognize when the reduction primitive (typically addition) is associative. They then exploit this algebraic property to eliminate the data dependence associated with the serial accumulation of values into the result. The ....
FISHER, A. AND GHULOUM, A. 1994. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation. ACM Press, Orlando, FL, 135--146.
.... that manipulate linked data structures [60, 19, 51, 43, 85]. Parallel computing programs may also use reductions and commuting operations, in which case it may be important to generalize algorithms from the field of automatic parallelization to verify that the program executes deterministically [39, 44, 48, 80]. In general, the programmer can reasonably develop programs with quite sophisticated access patterns and data structures, with the data race freedom of the program depending on the detailed properties of the data structures and the algorithms that manipulate them. It therefore seems unlikely that ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, pages 135--146, Orlando, FL, June 1994. ACM, New York.
....of the class Counter, shown in Section 4.2. c.inc(3); c.inc(5); It is also not our concern here to apply fusion rules to program fragments like the code below. This will be an interesting future work since it is quite similar to the traditional program transformation reduction transformation [FG94, HAM+95, Ope98]. for (i = 0; i < n; i++) { c.inc(a[i]); } 4.4 Sample Programs. This section shows several sample programs written in Amdahl. GUI event handling. The following program fuses multiple invocations of the method repaint to one invocation of the method. class Window { . ....
.... [Figure 4.13: Breakdowns of execution times of ImageViewer. Panels: (b) Amdahl (mutex), (c) Amdahl (detach), (d) Amdahl (detach fusion).] .... parallelism [FG94, HAM+95, Ope98]. In addition, Hu et al. [HTC98] proposed a formal and general technique for program transformations that make iterations expressed as recursions be executed efficiently in a manner similar to the execution of reductions. All these techniques can only be applied to the regular ....
Allan L. Fisher and Anwar M. Ghuloum. Parallelizing Complex Scans and Reductions. In Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation (PLDI '94), pages 135--146, Orlando, June 1994.
....for parallel algorithm synthesis. In traditional imperative languages (e.g. Fortran) there are also many ongoing efforts at developing sophisticated techniques for parallelizing iterative loops. Lately, we became aware of a more systematic method for parallelizing complex scans and reductions [FG94, GF95]. This method is based on a parallel reduction of function composition where function type values are propagated relying on such composition being associative. However, the complexity of the functions propagated could get progressively worse unless they match a certain template form. The ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM PLDI, pages 135--146, Orlando, Florida, ACM Press, 1994.
....functional programming should have no difficulty in understanding BMF programs. It is worth noting that although BMF is functional, this does not limit our proposed approach to the functional world. Rather one can use functional description to capture control structures in imperative programs [FG94]. In BMF, function application is denoted by a space and the argument which may be written without brackets. Thus f a means f (a). Functions are curried, and application associates to the left. Thus f a b means (f a) b. Function application binds stronger than any other operator, so f a ⊕ b means (f a) ⊕ b, ....
....domain of function h, while ⊗ is an associative operator over the resulting domain of the accumulating parameter c. There are several ways to identify these associative operators in our programs: limiting application scope by requiring all associative operators to be made explicit, e.g. in [FG94], or adopting AI techniques like anti-unification [Hei94] to synthesize them, or more interestingly, deriving them from the resulting domain types. For the last, it is known [SF93] that every linear type R that has a zero constructor CZ (a constructor with only a don't care value like [ for ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In ACM PLDI, pages 135--146, Orlando, Florida, 1994. ACM Press.
....Though cheap to operate, the generality of this approach is often called into question. On the imperative language (e.g. Fortran) front, there has been significant interest in the parallelization of reduction-style loops. A work similar to ours was independently conceived by Fisher and Ghuloum [FG94, GF95]. By modelling loops via functions, they noted that function type values could be reduced (in parallel) via associative function composition. However, the propagated function type values could only be efficiently combined (reduced) if they have a template closed under composition. This ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
....example is taken from [Col95] where only an informal and intuitive derivation was given. Although our derived program is a bit different, it is as efficient as that in [Col95]. 4.3 Conditional Structure. Conditional structure is important in a recursive definition. Related work can be found in [FG94, CDG96], where transformation on conditional expressions is proposed. Take a look at the following sequential program solving the least sorted prefix (lsp for short) problem [Gib96]: lsp [x] x] lsp (x : xs) if x hd xs then [x] lsp xs else [x] Our parallelization law with regard to the conditional ....
....is the use of associativity of a binary operator ⊕ as well as distributivity of ⊗. As the first step, we must be able to recognize them in a program. There are several ways. We may restrict our application scope so that all associative and distributive operators can be made explicit, e.g. in [FG94, CTT97]. Or, we may adopt some artificial methods like anti-unification [Hei94] to synthesize them. However, these approaches are not so satisfactory to be used practically in a parallelization system. In this paper, rather than recognizing all associative and distributive operators, we are interested in ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In ACM PLDI, pages 135--146, Orlando, Florida, 1994. ACM Press.
.... a discussion of the Scan primitive and a demonstration of its usefulness in multiprocessors in [Bl90], (ii) the multiprefix as an operator and instruction in [Ke96] in the context of the SB-PRAM multiprocessor, and (iii) an example for extracting PS from sequential parallel loops by compiler in [FG94]. In other words, the usefulness of multi-prefixes in compact and elegant parallel programming is well established and the current paper relies on that. In contrast to these references, however, the current paper considers the value of (multiple) prefix sum to uniprocessors. 2. It is envisioned ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In Proc. ACM-SIGPLAN PLDI, 135--146, 1994.
....histogram reductions, which accumulate their results into an array. Both SUIF and Polaris generate code for parallel reductions on shared memory systems, and don't address the issues of data locality and communication optimization which are key for distributed memory systems. Fisher and Ghuloum [7, 8] describe a technique for recognizing reductions and scan operations that is much more powerful than those supported by other parallelizing compilers. However, their recognizer implicitly assumes that loops with recurrences are not intermixed with other code. They generate code for iWarp, a ....
....improvement over code generated without them in the cases where they apply. Finally, our optimizations to minimize communication can also be used to improve the code generated for distributed memory machines using other recurrence parallelization techniques such as those of Fisher and Ghuloum [8, 7]. In addition, although we focus on generating code for a message passing communication model in our exposition here, we are also using the same approach to optimize communication when generating reduction code for distributed shared memory systems as well. 6. Acknowledgments The experiments ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, Orlando, FL, June 1994.
....specification. Though cheap to operate, the generality of this approach is often called into question. On the imperative language (e.g. Fortran) front, there has been interest in parallelization of reduction-style loops. A work similar to ours was independently conceived by Fisher and Ghuloum [FG94, GF95]. By modelling loops via functions, they noted that function type values could be reduced (in parallel) via associative function composition. However, the propagated function type values could only be efficiently combined if they have a template closed under composition. This requirement is ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
....As shown here, it is possible to parallelize systematically using only a leftward sequential program. There is also tremendous interest in the parallelization of loops for imperative languages (e.g. Fortran). A work which is similar to our method was independently conceived by Fisher and Ghuloum [13, 14]. By modelling loops via functions, they noted that these function type values could be reduced (in parallel) via associative function composition. However, the propagated function type values can only be efficiently propagated (by parallel reduction) if they have a template form closed under ....
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
....traversing the outer conditional expressions to identify the minimal conjunctions of basic terms that select each maximal conditional free subexpression as the value of the original expression. It is possible to further simplify the table using logic minimization techniques as proposed in [7]. To compare two expressions for equality the algorithm performs a simple recursive isomorphism test. The algorithm checks that the condition tables have the same indexes and that corresponding subexpressions in the table are isomorphic. 6.5 Execution Time and Limitations In the worst case the ....
....mechanism, typically the programmer, to specify when the operations actually commute. The techniques presented in this paper, on the other hand, automatically detect commuting operations. Several existing compilers can recognize when a loop performs a reduction of many values into a single value [7, 15, 5]. These compilers recognize when the reduction primitive (typically addition) is associative. They then exploit this algebraic property to eliminate the data dependence associated with the serial accumulation of values into the result. The generated program computes the reduction in parallel. ....
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, Orlando, FL, June 1994.
No context found.
A.L. Fisher and A.M. Ghuloum. Parallelizing complex scans and reductions. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 135--146, Orlando, Florida, ACM Press, 1994.
No context found.
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, pages 135--146, Orlando, FL, June 1994. ACM, New York.
No context found.
A. Fisher and A. Ghuloum. Parallelizing Complex Scans and Reductions. Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, June 1994.
No context found.
A. Fisher and A. Ghuloum. Parallelizing Complex Scans and Reductions. Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, June 1994.
No context found.
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, Orlando, FL, June 1994.
No context found.
A. Fisher and A. Ghuloum. Parallelizing complex scans and reductions. In Proceedings of the SIGPLAN '94 Conference on Program Language Design and Implementation, Orlando, FL, June 1994.
No context found.
A. Fisher and A. Ghuloum. Parallelizing Complex Scans and Reductions. Proceedings of the SIGPLAN'94 Conference on Programming Language Design and Implementation, June 1994.