S. Midkiff and D. Padua, "Compiler Algorithms for Synchronization", IEEE Transactions on Computers, Vol. C-36, No. 12, 1987, pp. 1485-1495.
....by their compiler. In this thesis we are not considering the problem of extracting parallelism at run time, relying instead on compiler directives when static analysis cannot identify parallel loops. Interesting work in this area, mainly for shared address space multiprocessors, is described in [46, 74, 16, 52]. Recently, Rauchwerger [59, 58] proposed a very promising scheme to detect fully parallel loops. This scheme is being implemented in the context of the Polaris compiler project [12] at the University of Illinois. RUN TIME SUPPORT FOR IRREGULAR COMPUTATIONS In order to parallelize applications ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
....has two limitations: first, inspector and executor are tightly coupled and, thus, the inspector is not reused across invocations of the same loop, even if the dependences do not change. Second, the execution of consecutive reads to the same array entry is serialized (RAR). Midkiff and Padua [15] improve this strategy by allowing concurrent reads of the same entry. Saltz et al. [18] propose an alternative solution, restricted to the particular case of loops with no output dependences. In this strategy, inspector and executor are uncoupled. Leung and Zahorjan [12] extend the previous ....
S. P. Midkiff and D. A. Padua. Compiler Algorithms for Synchronization. IEEE Transactions on Computers, 36(12):1485--1495, 1987.
....forwarding path using intra-epoch control speculation and data dependence speculation in Section 4.2. 1.3 Related Work. Parallelization of a loop where the compiler synchronizes a loop-carried data dependence is known as a DOACROSS [8, 26] parallelization and has been exploited in previous work [6, 22, 38]. All schemes for TLS support include some form of DOACROSS synchronization, although few use the compiler to optimize this aspect of speculative execution. The most relevant related work is the Wisconsin Multiscalar [11, 27, 35] compiler, which performs synchronization and scheduling for ....
MIDKIFF, S. P., AND PADUA, D. A. Compiler algorithms for synchronization. IEEE Transactions on Computers C-36, 12 (1987), 1485--1495.
....operations, then releases the locks. 9.2 Automatically Parallelized Scientific Computations. Previous parallelizing compiler research in the area of synchronization optimization has focused almost exclusively on synchronization optimizations for parallel loops in scientific computations [14]. The natural implementation of a parallel loop requires two synchronization constructs: an initiation construct to start all processors executing loop iterations, and a barrier construct at the end of the loop. The majority of synchronization optimization research has concentrated on removing ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, 36(12):1485--1495, December 1987.
.... between data and locks, then check that the program holds the lock whenever it accesses the corresponding data [94, 28]. Other analyses trace the control transfers associated with the use of synchronization constructs such as the post and wait constructs used in parallel dialects of Fortran [71, 18, 36, 17], the Ada rendezvous constructs [95, 99, 33, 70, 35], or the Java wait and notify constructs [73, 74]. The goal is to determine that the synchronization actions temporally separate conflicting accesses to shared data. In some cases it may be important to recognize that parallel tasks access ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, 36(12):1485--1495, Dec. 1987.
....[7, 6] using dominant locks. A lock specifies a dependence relation, and a lock dominates another lock if enforcing the first lock ensures that the second lock is preserved. Three conditions for identifying dominant locks are provided, but they cover only very limited cases. Midkiff and Padua [10, 9] described schemes to generate synchronization instructions in a compiler using test and testset, which are very similar to the await and advance instructions used in the Alliant minisupercomputers [1]. They also introduced the Control Path Graph (CPG) to show the ordering imposed by the ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. on Computers, C-36(12), December 1987.
.... [Figure 1: Automatic Bytecode Parallelization by javab] To keep compile time limited, the prototype relies on relatively simple but generally also less expensive analysis, rather than doing, for instance, advanced data dependence analysis (see e.g. [3, 19, 22, 23, 24, 25]). Although currently an off-line bytecode-to-bytecode transformation is applied, keeping compile time limited may become more important if the techniques are used for some form of JIT parallelization, i.e. bytecode parallelization directly prior to execution. 2 javab Command. The bytecode tool ....
Samuel P. Midkiff and David A. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36:1485--1495, 1987.
....of the outermost loop in a loop nest among the processors, with synchronization instructions being inserted to take care of dependences carried by this loop. To reduce synchronization, transformations like loop interchange are carried out to move parallel loops outermost wherever possible [3, 5, 25, 37]. This approach does not perform any data management, so it is not suitable for generating good code on NUMA architectures. An alternative approach, implemented by the FORTRAN D system [12], is to give the programmer control over how data structures are distributed across the processors. The ....
....min(ub, p 1) S) a) wrapped distribution (b) blocked distribution Given this assignment of iterations to processors, we must generate synchronization instructions to take care of dependences carried by the outermost loop, and insert block transfers wherever possible. These steps are routine [11, 25, 30], and are omitted from this paper. 8 Empirical Results and Performance Analysis In this section, we report the performance of our techniques on routines from the BLAS (Basic Linear Algebra Subprograms) library. The target machine is a BBN Butterfly GP1000. On this machine, a processor can access ....
S. P. Midkiff and D. A. Padua. Compiler algorithms for synchronization. IEEE Transactions on computers, C-36:1485--1495, December 1987.
....such programs, it does not address the trade-off between lock overhead and waiting overhead. The goal is simply to minimize the lock overhead. 7.1 Parallel Loop Optimizations. Other synchronization optimization research has focused almost exclusively on parallel loops in scientific computations [Midkiff and Padua 1987]. The natural implementation of a parallel loop requires two synchronization constructs: an initiation construct to start all processors executing loop iterations, and a barrier construct at the end of the loop. The majority of synchronization optimization research has concentrated on removing ....
MIDKIFF, S. AND PADUA, D. 1987. Compiler algorithms for synchronization. IEEE Transactions on Computers 36, 12 (Dec.), 1485--1495.
....where compiler analysis is insufficient, software pipelining is not possible, because the safe initiation interval is unknown and often variable. Statically analyzable doacross loops have been exploited in multiprocessors through the use of hardware or software synchronization primitives [31, 49]. The compiler can place post and await instructions around the accesses that cause flow dependences. The speedup obtained with this technique is upper bounded by the size of the loop iterations that may be overlapped and the overhead of synchronizations. If the data access pattern of the loop is ....
....constructing execution schedules for partially parallel loops, i.e. loops whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Briefly, run time methods for parallelizing loops rely heavily on global synchronizations (communication) [13, 21, 26, 31, 35, 41, 43, 49], are applicable only to restricted types of loops [26, 41, 43], have significant sequential components [35, 41, 43], and/or do not extract the maximum available parallelism (they make conservative assumptions) [13, 26, 35, 41, 43, 49]. The only method that manages to combine the most advantageous ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
....cases where synchronization inserted to satisfy a dependence satisfies another as well, and implement their technique as a pass in the Parafrase source-to-source restructurer. Midkiff and Padua extend this to eliminate redundant synchronization operations from loops with arbitrary control flow [MP87]. Both algorithms can be applied to reduce the synchronization added when parallelizing loops. However, we have shown that the synchronization added by these algorithms is not optimal [RC97a]. Quinn, Hatcher and Seevers describe an algorithm to insert the minimal number of barriers when ....
S.P. Midkiff and D.A. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36(12):1485--1495, December 1987.
S. Midkiff and D. Padua, "Compiler Algorithms for Synchronization," IEEE Trans. Computers, vol. 36, no. 12, pp. 1,485--1,495, Dec. 1987.
....parallel loops, i.e. loops whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. These methods are centered around the extraction of an inspector loop that analyzes the data access pattern off line, i.e. without side effects [8, 19, 22, 26, 27, 28, 29, 35, 36]. The inspection phase of these schemes usually yields a partitioning of the set of iterations into subsets that can be executed in parallel. These subsets, sometimes called wavefronts, are scheduled sequentially by placing synchronization barriers between them. Unfortunately the distribution of ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
....pattern is input data dependent. For example, most dependence analysis algorithms conservatively assume dependences when presented with nonlinear or subscripted subscript expressions. During the past few years, techniques have been developed for the run time analysis and scheduling of loops [5, 9, 13, 17, 20, 23, 25, 26, 27, 28, 29, 30, 33, 34]. The majority of this work has concentrated on developing run time methods for constructing execution schedules for partially parallel loops, i.e. loops whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original, or source ....
....sequential code. Since compile time data dependence analysis techniques cannot be used on such programs, methods of performing the analysis at run time are required. Several techniques have been developed for the run time analysis and scheduling of loops with cross iteration dependences [5, 9, 13, 17, 20, 23, 28, 29, 30, 33, 34]. However, for various reasons, such techniques have not achieved widespread use in current parallelizing compilers. In the following we describe a new run time scheme for constructing a parallel execution schedule for the iterations of a loop. The general structure of our method is similar to ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
....conditions in parallel programs (see, e.g. [13, 24, 31]). However, these methods are generally not appropriate for run time loop parallelization since they are optimized for other purposes, e.g. for them minimizing memory requirements is more important than speed. i.e. without side effects [8, 20, 23, 26, 28, 29, 30, 37, 38, 12]. The inspection phase of these schemes usually yields a partitioning of the set of iterations into subsets that can be executed in parallel. These subsets, sometimes called wavefronts, are scheduled sequentially by placing synchronization barriers between them. Unfortunately the distribution of ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
....iteration to access any variable (e.g. array element) is found using atomic compare and swap synchronization primitives to record the minimum such iteration in a shadow version of that variable. By using separate shadow variables to process the read and write operations, Midkiff and Padua [27] improved this basic method so that concurrent reads from a memory location are allowed in multiple iterations. Recently, Chen, Yew and Torrellas [13] proposed another variant of the Zhu and Yew method which improves performance in the presence of hot spots (i.e. many accesses to the same memory ....
.... a suboptimal schedule since a new synchronization ....

Method                          obtains    contains    requires   restricts  privatizes
                                optimal    sequential  global     type of    or finds
                                schedule   portions    synchron   loop       reductions
Rauchwerger Amato Padua [31]    Yes        No          No         No         P,R
Zhu Yew [49]                    No (1)     No          Yes (2)    No         No
Midkiff Padua [27]              Yes        No          Yes (2)    No         No
Krothapalli Sadayappan [18]     No (3)     No          Yes (2)    No         P
Chen Yew Torrellas [13]         No (1;3)   No          Yes        No         No
Xu Chaudary [46, 45]            Yes        No          Yes        No         No
Saltz Mirchandaney [35]         No (3)     No          Yes        Yes (5)    No
Saltz et al. [37]               Yes        Yes (4)     Yes        Yes (5)    No
Leung Zahorjan [22]             Yes        No          Yes        Yes (5)    ....
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Trans. Comput., C-36(12):1485--1495, 1987.
S. Midkiff and D. Padua, Compiler algorithms for synchronization, IEEE Trans. Comput., 36(12):1485--1495, 1987.
....type system to translate a strongly typed functional program to an annotated functional program, where the annotation is used for stack allocation rather than for region allocation. Prior work on synchronization optimization has addressed the problem of reducing the amount of synchronization [13, 20, 21]. These approaches assume that the mutual exclusion .... [Figure 8: Percentage of user local objects allocated on the stack.] ....
S.P. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, C-36(12):1485--1495, December 1987.
S. Midkiff and D. Padua, "Compiler Algorithms for Synchronization", IEEE Transactions on Computers, Vol. C-36, No. 12, 1987, pp. 1485-1495.
S. P. Midkiff and D. A. Padua. Compiler algorithms for synchronization. IEEE Transactions on computers, C-36:1485--1495, December 1987.
S. P. Midkiff and D. A. Padua. Compiler algorithms for synchronization. IEEE Transactions on computers, C-36:1485--1495, December 1987.
S.P. Midkiff and D.A. Padua, "Compiler Algorithms for Synchronization", IEEE Trans. Comput., C-36(12), pp. 1485-1495, Dec. 87.
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, 36(12):1485--1495, December 1987.
S. Midkiff and D. Padua. Compiler algorithms for synchronization. IEEE Transactions on Computers, 36(12):1485--1495, Dec. 1987.
S P Midkiff and D A Padua. Compiler Algorithms for Synchronization. IEEE Transactions on Computers, 36(12):1485-1495, (1987).
CiteSeer.IST - Copyright Penn State and NEC