| M. J. Quinn, P. J. Hatcher, and B. K. Seevers. Implementing a data parallel language on a tightly coupled multiprocessor. In Proceedings of 3rd Workshop on Programming Languages and Compilers for Parallel Computing, 1990. |
....illustrates the complexity of implementation tradeoffs. Hatcher and Quinn developed two compilers that translate a program written in a language called Dataparallel C into code executable on a parallel machine [Hatcher Quinn 91]. One compiler generates code for a shared memory machine [Quinn et al. 90] and one for a message passing machine [Hatcher et al. 91]. The programmer's interface, Dataparallel C, is identical on both platforms. Dataparallel C provides the programmer with the abstraction of a single shared address space for variables. Thus, the programmer gets the sharing model that he ....
M. J. Quinn, P. J. Hatcher, and B. K. Seevers. Implementing a data parallel language on a tightly coupled multiprocessor. In Proceedings of 3rd Workshop on Programming Languages and Compilers for Parallel Computing, 1990.
....the core of this dissertation should also apply to HPF, and likely to other data parallel languages as well. Chapter 4: THE C* COMPILER. In this chapter, we outline our compilation strategy for C*. Our C* compiler is based on a recent version of the compiler by Hatcher and Quinn [Quinn et al. 88, Hatcher et al. 91, Hatcher Quinn 91] that we have modified for our purposes. The overall approach is to translate C* into mostly machine independent C code that makes calls to run time libraries for all communication and synchronization operations. The resulting code is then linked with architecture specific ....
....compiler handles C* communication operations, since C* get and send operations translate directly into load and store instructions. However, the compiler must insert explicit synchronization operations in the generated code to prevent race conditions. Hatcher and Quinn describe this approach in [Hatcher et al. 91]. Consider the C* code in Figure 4.2a, where each VP sends a copy of its y variable to its left neighbor, which stores the data in its x variable. To simplify the following discussion, we ignore effects at the ends of the VP array, and assume one virtual processor per physical processor, ....
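Illustrative sketch (not part of the cited text): the excerpt above describes a send compiled to a plain store followed by an explicit synchronization. The Python below imitates that pattern with one thread per virtual processor. The function name `shift_left`, the use of `threading.Barrier`, and the wrap-around at the array ends are all assumptions made for illustration; the excerpt itself ignores the ends and does not show the generated code.

```python
import threading


def shift_left(y):
    """Each worker i stores y[i] into the x slot of its left neighbor,
    then reads its own x slot after a barrier (hypothetical sketch)."""
    n = len(y)
    x = [None] * n       # shared "x" variables, one per virtual processor
    seen = [None] * n    # what each worker observes in its own x
    barrier = threading.Barrier(n)

    def worker(i):
        # "send" as an ordinary store into the neighbor's x
        # (wrap-around at the ends is an assumption for this sketch)
        x[(i - 1) % n] = y[i]
        # Without this barrier, the read below could race with the
        # right neighbor's store into x[i].
        barrier.wait()
        seen[i] = x[i]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `shift_left([10, 20, 30, 40])` has every worker end up with the value that its right neighbor held in `y`.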
[Article contains additional citation context not shown here]
P. J. Hatcher, M. J. Quinn, and B. K. Seevers. Implementing a dataparallel language on a tightly coupled multiprocessor. In Proc. 3rd Workshop Programming Languages Compilers Parallel Computers, 1991.
.... there are only constant dependence distances and no conditional branches, the problem of finding a minimum set of dependences has been shown to be NP Hard [22]. Program transformations to eliminate or decrease the cost of synchronizations inside parallel and doacross loops have also been proposed [11, 2, 40, 34, 21, 23]. The program transformations used are loop distribution, loop fusion and loop alignment. Loop distribution is used to move a synchronization in a parallel loop between the parallel loops resulting from the distribution. Loop fusion is used to increase granularity and to reduce parallel loop ....
....into a loop independent dependence. To extend the loop alignment applicability it can also be coupled with statement replications; the problem is then NP Hard when minimizing the amount of replication code [2]. Quinn et al. [34] have proposed a greedy algorithm to minimize the number of synchronizations in a basic block. The algorithm is based on finding, for a set of intervals defined by the sink and source of the dependencies, the minimum number of points such that every interval spans at least one point. This problem ....
[Article contains additional citation context not shown here]
Quinn M. Hatcher P and Seevers B, Implementing a Data Parallel Language on a Tightly Coupled Multiprocessors, in Advances in Languages and Compilers for Parallel Processing, pp 385-401, Pitman.
....to satisfy the dependences present in the graph in column 3. 2.2 Related Work. There has been much previous work on compiler based approaches to synchronisation reduction, but most do not accurately determine whether a dependence requires synchronisation and generally assume a fork and join model [1, 6, 17]. Program transformations, such as loop distribution, loop fusion and loop alignment, have also been proposed to eliminate or decrease the cost of synchronisations [3, 4, 18, 2]. More recent work has moved away from the fork and join model and addressed the problem of synchronisation insertion for ....
....no interval in common and we cannot kill these two arcs with one barrier (see figure 5). Since we have essential arcs we will need at least barriers to kill these arcs. Therefore we have found the minimum number of barriers in n steps. This algorithm has similarities to the one described in [17], but has a smaller time and space complexity and is thus faster. The algorithm of [17] needs Theta(n e) steps to find the minimum number of synchronisation points, where n is the number of statements and e is the number of dependences. Our algorithm needs only n steps to find the minimum number ....
[Article contains additional citation context not shown here]
M. Quinn, P. Hatcher and B. Seevers, Implementing a data parallel language on a tightly coupled multiprocessors. In Advances in Languages and Compilers for Parallel Processing, pp.385-401. Pitman, London, 1991
....dependences at each stage chronisation points based on kill cover analysis is presented. It was shown that the problem of finding the minimal number of synchronisation points is NP complete [4]. The problem of eliminating redundant or covered dependences was also considered in [3]. Quinn et al. [11] have proposed an algorithm similar in spirit to that presented in this paper. Apart from a small technical error in their algorithm, they minimise the number of synchronisations in a basic block and attempt to extend the technique to loop nests. Crucially, however, there is no consideration of ....
M. Quinn, P. Hatcher and B. Seevers, Implementing a data parallel language on a tightly coupled multiprocessors. In Advances in Languages and Compilers for Parallel Processing, pp.385-401. Pitman, London, 1991.
.... there are only constant dependence distances and no conditional branches, the problem of finding a minimum set of dependences has been shown to be NP Hard [22]. Program transformations to eliminate or decrease the cost of synchronizations inside parallel and doacross loops have also been proposed [11, 2, 40, 34, 21, 23]. The program transformations used are loop distribution, loop fusion and loop alignment. Loop distribution is used to move a synchronization in a parallel loop between the parallel loops resulting from the distribution. Loop fusion is used to increase granularity and to reduce parallel loop ....
....creation overhead. Loop alignment can convert loop carried dependences into a loop independent dependence. To extend the loop alignment applicability it can also be coupled with statement replications; the problem is then NP Hard when minimizing the amount of replication code [2]. Quinn et al. [34] have proposed a greedy algorithm to minimize the number of synchronizations in a basic block. The algorithm is based on finding, for a set of intervals defined by the sink and source of the dependencies, the minimum number of points such that every interval spans at least one point. This problem ....
[Article contains additional citation context not shown here]
Quinn M. Hatcher P and Seevers B, Implementing a Data Parallel Language on a Tightly Coupled Multiprocessors, in Advances in Languages and Compilers for Parallel Processing, pp 385-401, Pitman.
....execution available on the machine originally targeted by the language. If a direct translation of a SIMD construct to code for a MIMD machine is performed, performance may be degraded significantly. However, some traditional synchronous languages have been successfully implemented on MIMD machines [QHS91, Sab92]. This is made possible by two things. The first is compiler technology that can remove unnecessary synchronization. The second is use of hardware to implement a barrier synchronization across processors in a MIMD machine, rather than using a large number of direct synchronizations ....
M. Quinn, P. Hatcher, and B. Seevers. Implementing a Data Parallel Language on a Tightly Coupled Multiprocessor. In A. Nicolau and D. Padua, editors, Advances in languages and compilers for parallel processing, pages 385--401. The MIT Press, Cambridge, Massachusetts, 1991.
....nodes holding a copy. For both versions, we assume that data is always fetched in cache line units. 2.2 Compilation of C* Programs. Our benchmarks are written in the C* data parallel language [TMC 90]. The C* compiler, based on a newer version of the work by Hatcher and Quinn [Quinn et al. 88, Hatcher et al. 91, Hatcher Quinn 91], generates mostly machine independent C code that is designed to be run, SPMD style, on each node. Communication and synchronization operations are implemented in machine specific runtime libraries and the compiler inserts appropriate calls in the generated code. The ....
P. J. Hatcher, M. J. Quinn, and B. K. Seevers. Implementing a data-parallel language on a tightly coupled multiprocessor. In Proc. 3rd Workshop Programming Languages Compilers Parallel Computers, 1991.
.... are only constant dependence distances and no conditional branches, the problem of finding a minimum set of data dependences has been shown to be NP Hard [29]. Program transformations to eliminate or decrease the cost of synchronizations inside parallel and doacross loops have also been proposed [14, 2, 52, 44, 28, 30]. The program transformations used are loop distribution, loop fusion, and loop alignment. Loop distribution is used to transform a synchronization within one loop nest to a synchronization between two loop nests. Loop fusion is used to increase granularity and to reduce parallel loop creation ....
....(cf. section 4.1). Loop alignment can convert loop carried dependences into a loop independent dependence. To extend the loop alignment applicability it can also be coupled with statement replications; the problem is then NP Hard when minimizing the amount of replication code [2]. Quinn et al. [44] have proposed a greedy algorithm to minimize the number of synchronizations in a basic block. For a set of intervals defined by the sink and source of the data dependences, the algorithm is based on finding the minimum number of points such that every interval spans at least one point. This ....
[Article contains additional citation context not shown here]
Quinn M. Hatcher P and Seevers B, Implementing a Data Parallel Language on a Tightly Coupled Multiprocessors, in Advances in Languages and Compilers for Parallel Processing, pp. 385-401, Pitman, August 1993.
....given a set of intervals spanned by flow, anti, and output dependencies, the minimal number of barrier synchronizations must be determined such that each interval spans at least one barrier. This problem can be solved in polynomial time [Gav72]: a simple greedy algorithm devised by Quinn and Hatcher [QHS91] extracts a minimal set of synchronizations from a set of dependency arcs represented as a synchronization graph. At each step the algorithm selects, of all arcs, the dependency arc whose head points to the earliest point in the program. A barrier is inserted before this point and the arc is ....
....bar O bar ) By analyzing the code fragment for dependency relationships, it is possible to reduce the barrier synchronizations to only those points where they are essential. Figure 10 shows the synchronization graph for the diffusion code fragment. By analysis similar to the one presented in [QHS91] the number of barrier synchronizations may be reduced as shown in figure 10, where dashed lines indicate barriers that are necessary. [Figure 10: Dependence graph for the diffusion program fragment; statements s1, s2, s4, s5, s6; barrier synch 1, 2, 3.] The following is the transformed ....
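Illustrative sketch (not part of the cited text): several excerpts on this page describe the greedy step as picking the dependence arc whose head is earliest, placing a barrier just before it, and discarding every arc that barrier satisfies. A minimal Python rendering of that interval-covering idea follows. The function name `min_barriers` and the encoding of arcs as `(source, sink)` statement indices within one basic block are assumptions for illustration, not the published algorithm's interface.

```python
def min_barriers(deps):
    """Return barrier positions for dependence arcs in a basic block.

    deps: iterable of (source, sink) statement indices, source < sink.
    A barrier at position p executes just before statement p and
    satisfies every arc with source < p <= sink.
    """
    barriers = []
    last = None  # position of the most recently placed barrier
    # Visit arcs in order of their head (sink), earliest first.
    for source, sink in sorted(deps, key=lambda d: d[1]):
        if source >= sink:
            raise ValueError("each arc needs source < sink")
        # Arcs are visited by increasing sink, so last <= sink always;
        # the arc is already satisfied exactly when source < last.
        if last is None or source >= last:
            last = sink
            barriers.append(sink)
    return barriers
```

For the arcs `[(0, 2), (1, 3), (3, 5), (4, 5)]`, the sketch places barriers before statements 2 and 5: the first barrier also satisfies `(1, 3)`, and the second also satisfies `(4, 5)`.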
[Article contains additional citation context not shown here]
M. Quinn, P. Hatcher, and B. Seevers. Implementing a Data Parallel Language on a Tightly Coupled Multiprocessor. In A. Nicolau, D. Gelernter, Gross T., and Padua D., editors, Advances in Languages and Compilers for Parallel Processing, chapter 20, pages 385--401. MIT Press, Cambridge, Massachusetts, 1991.