M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, CSRD Report 329, October 1982.
.... this problem by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [11, 18]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [10], DYNA 3D [17], GAUSSIAN [8], and CHARMM [1] ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
.... address this need by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [9, 22]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [8], DYNA 3D [21], GAUSSIAN [6], CHARMM [1]. In ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....presented by this case is the fact that the exact form of the RHS is not known statically. What is known, however, is the set of all possible RHS forms, which can be computed by following all potential paths in the control flow graph. A direct approach uses a gated static single assignment (GSSA) [5, 33] representation of the program. In such a representation, scalar variables are assigned only once. At the points of confluence of conditional branches a function of the form g ;1 ; is used (in the GSSA representation) to select one of the two possible definitions of a variable ( ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....Restructuring, or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades [22, 32], current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. This is an extremely important issue because a large class of complex simulations used in industry today have ....
.... none of them has all of these properties (a comparison to previous work is contained in Section 4). 2 Preliminaries. In order to guarantee the semantics of a loop, the parallel execution schedule for its iterations must respect the data dependence relations between the statements in the loop body [22, 15, 3, 32, 35]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....
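The three dependence types named in this excerpt can be illustrated with a small sketch. The function name `classify_dependence` and the string encoding of accesses are assumptions made for illustration only; they do not come from the cited work.

```python
from typing import Optional


def classify_dependence(first: str, second: str) -> Optional[str]:
    """Classify the dependence between two accesses to the same location.

    `first` and `second` are "read" or "write", given in sequential
    execution order (hypothetical encoding, chosen for this sketch).
    """
    if first == "write" and second == "read":
        return "flow"    # read after write
    if first == "read" and second == "write":
        return "anti"    # write after read
    if first == "write" and second == "write":
        return "output"  # write after write
    return None          # two reads of the same location do not conflict
```

For example, `classify_dependence("read", "write")` yields `"anti"`, the write-after-read case.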
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....provide accuracy. The first complication to dependence analysis is disambiguating array references, particularly in the context of loops. To test for dependence between array references, compilers have traditionally relied on several well known algorithms based on a set of Diophantine equations [45], [46]. More recently, techniques have been developed which are able to handle multidimensional arrays and more complex array subscripts [47], [48], [49], [50]. Although array dependence analysis has reached a fair level of maturity, current techniques may achieve inexact results due to ....
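One classical test of the Diophantine kind is the GCD test. The excerpt does not name the specific algorithms it has in mind, so the following is only a sketch under that assumption. It checks whether the references A(a*i + c1) and A(b*j + c2) can ever name the same element, which requires an integer solution of a*i - b*j = c2 - c1. The function name `gcd_test` and its parameters are hypothetical.

```python
from math import gcd


def gcd_test(a: int, c1: int, b: int, c2: int) -> bool:
    """Return True if A(a*i + c1) and A(b*j + c2) may touch the same element.

    The equation a*i - b*j = c2 - c1 has an integer solution exactly when
    gcd(a, b) divides c2 - c1. Loop bounds are ignored, so True only means
    "a dependence is possible", while False proves independence.
    """
    g = gcd(a, b)
    diff = c2 - c1
    if g == 0:
        # Both subscripts are constants: they collide only if equal.
        return diff == 0
    return diff % g == 0
```

Because bounds are ignored, the test is conservative, which is one way such techniques can produce inexact results.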
M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, 1982.
....Fellowships, and Army contract #DABT63 92C 0033. This work is not necessarily representative of the positions or policies of the Army or the Government. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [25, 36]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. One major reason for this inability to statically parallelize some programs is that the most effective ....
....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [6, 18, 25, 36, 39]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....in a while loop (assuming no cross-iteration dependences) is the amount of parallelism available in its dispatching recurrence. To aid our analysis of the dispatching recurrence, it is convenient to extract, at least conceptually, this recurrence from the original while loop by distributing [20] the original loop into two do loops with conditional exits: 1. A loop that evaluates the terms of the dispatcher (recurrence) and any termination condition that is strongly connected to the dispatcher. 2. A loop consisting of the remainder loop which uses the values of the recurrence (computed ....
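The distribution described in this excerpt can be sketched as two consecutive loops. This is a minimal illustration, assuming the termination condition depends only on the dispatcher value; the names `distribute_while`, `next_value`, `done`, and `body` are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def distribute_while(start: T,
                     next_value: Callable[[T], T],
                     done: Callable[[T], bool],
                     body: Callable[[T], R]) -> List[R]:
    """Run a while loop as a dispatcher loop followed by a remainder loop."""
    # Loop 1: evaluate the terms of the dispatching recurrence together with
    # the termination condition that is tied to it.
    terms: List[T] = []
    x = start
    while not done(x):
        terms.append(x)
        x = next_value(x)

    # Loop 2: the remainder uses the precomputed recurrence values. With no
    # cross-iteration dependences, these iterations are independent.
    return [body(t) for t in terms]
```

A call such as `distribute_while(1, lambda x: 2 * x, lambda x: x > 20, lambda x: x + 1)` first collects the dispatcher terms 1, 2, 4, 8, 16 and then applies the body to each of them.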
....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [15, 12, 2, 20, 23]. In related work, we have proposed run time techniques, called the Privatizing Doall (PD) test [16] and the more powerful LRPD test [17], for detecting the presence of cross iteration dependences in a loop. These techniques were developed to test at run time whether a do loop could be executed as ....
[Article contains additional citation context not shown here]
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [29], [43]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically unknown access pattern. Typical examples of applications containing such loops are complex simulations such as SPICE for circuit simulation, DYNA ....
....way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [7], [20], [29], [43], [50]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....are computation dependent and are modified from one execution phase to another, e.g., because of the changing interactions of the underlying physical phenomena they are simulating. Techniques addressing the issue of data dependence analysis have been studied extensively over the last two decades [34, 47], but parallelizing compilers cannot perform a meaningful data dependence analysis and extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Unfortunately, irregular programs, as previously defined, ....
....analysis fails, i.e., for irregular, dynamic applications. 2.1 Fully Parallel (Doall) Loops. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [5, 24, 34, 47, 50]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences are data producer and consumer dependences, i.e., they express a fundamental relationship ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
.... address this need by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [10, 19]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [9], DYNA 3D [18], GAUSSIAN [7], CHARMM [1]. In ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....must make careful judgement on whether a memory dependence can be removed. Such a decision is made by the memory dependence analyzer. A memory dependence analyzer determines the relation between memory references. For array references, many algorithms exist to perform data dependence analysis [6], [7], [8]. Three possible conclusions can be reached regarding the relation between a pair of memory references: 1) they always access the same location; 2) they never access the same location; or 3) they may access the same location. In the first case, in which the two references always access ....
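The three possible conclusions can be pictured with a deliberately simple analyzer. This is a sketch, not the analyzer from the cited work: it assumes each reference is an array name plus either a compile-time constant index or `None` for an unknown index, and it assumes distinct arrays never overlap. The names `Ref` and `relate` are hypothetical.

```python
from typing import Optional, Tuple

# (array name, constant index, or None when the index is unknown statically)
Ref = Tuple[str, Optional[int]]


def relate(r1: Ref, r2: Ref) -> str:
    """Return "always", "never", or "may" for a pair of memory references."""
    (base1, idx1), (base2, idx2) = r1, r2
    if base1 != base2:
        return "never"   # assumption: different arrays do not alias
    if idx1 is None or idx2 is None:
        return "may"     # the analysis cannot decide, so it stays conservative
    return "always" if idx1 == idx2 else "never"
```

Here `relate(("A", 3), ("A", None))` gives `"may"`, the case in which the dependence must conservatively be kept.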
M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, 1982.
....scheduling and the load balancing issues, but also the synchronization issues, which are usually much more difficult. Loop carried dependences can be categorized as lexically forward and lexically backward. Vector and SIMD machines can handle DOACROSS loops with only lexically forward dependences [1, 15, 22, 14]. One advantage of the MIMD (multiple instruction multiple data) machines is that they allow loops with backward loop carried dependences to be handled by DOACROSS execution (i.e., by delaying consecutive iterations to satisfy backward dependences). The speedup obtained in this way can be large and ....
M. J. Wolfe. Optimizing Compilers for Supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, 1982.
....Australian National University, Australia. ‡ Part of the work was done when this author was a visiting fellow at the Australian National University. He is currently visiting the University of Southern Queensland, Australia. .... detect parallelism and convert sequential programs into parallel forms [1,2]. Two statement instances 1 are said to be data dependent if they access the same data element and at least one of the accesses is a write. The direction of data dependence is determined by the sequential execution order of the statement instances of the program. The basic strategy of all ....
M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.
....of the index variable of the enclosing loops. Although loop bounds of such a form are rare in real programs, they are common after the loops are transformed by unimodular transformation [10,11], which incorporates a wide range of loop transformations such as loop interchange, loop skewing [12], and loop permutation [13]. Therefore, a statement enclosed in n loops with loop index variables $i_1, \ldots, i_n$ has as many instances as the integer grids in the n-dimensional convex set defined by the lower and upper bounds of the enclosed loops. Each grid in the convex set represents a loop ....
M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.
....or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [19, 24]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically unknown access pattern. Programs exhibiting this kind of behavior account for more than 50% of all Fortran applications [12] and encompass most C ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups. 1 Introduction. During the last two decades, compiler techniques for the automatic detection of parallelism have been studied extensively [17, 27]. From this work it has become clear that, for a class of programs, compile time analysis has to be complemented with run time techniques if a significant fraction of the implicit parallelism is to be detected [6, 8]. The main reason for this is that the access pattern of some programs cannot be ....
....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [3, 11, 17, 27, 30]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences are data producer and consumer dependences, i.e., they express a fundamental relationship ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....supported in part by Army contract #DABT63 92 C 0033. This work is not necessarily representative of the positions or policies of the Army or the Government. 1 Introduction. During the last two decades, compiler techniques for the automatic detection of parallelism have been studied extensively [29, 18, 7]. From this work it has become clear that, for a class of programs, compile time analysis must be complemented with run time techniques if a significant fraction of the implicit parallelism is to be detected. The main reason for this is that the access pattern of some programs cannot be determined ....
....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [18, 11, 3, 29, 32]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....presented by this case is the fact that the exact form of the RHS is not known statically. What is known, however, is the set of all possible RHS forms, which can be computed by following all potential paths in the control flow graph. A direct approach uses a gated static single assignment (GSSA) [5, 33] representation of the program. In such a representation, scalar variables are assigned only once. At the points of confluence of conditional branches a $\phi$ function of the form $\phi(B, X_1, X_2)$ is used (in the GSSA representation) to select one of the two possible definitions of a variable ($X_1$ ....
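The idea of a gating function at a confluence point, and of collecting every possible RHS form by following all paths, can be sketched as follows. The class `Phi`, the helpers `possible_forms` and `select`, and the convention that a true condition picks the first definition are assumptions made for illustration; the excerpt is truncated before it states the selection rule.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass(frozen=True)
class Phi:
    """Gating function: picks `if_true` when `cond` holds, else `if_false`."""
    cond: str
    if_true: "Expr"
    if_false: "Expr"


# An expression is either a plain RHS form (a string here) or a gated choice.
Expr = Union[str, Phi]


def possible_forms(e: Expr) -> Set[str]:
    """Collect every RHS form reachable along some control-flow path."""
    if isinstance(e, Phi):
        return possible_forms(e.if_true) | possible_forms(e.if_false)
    return {e}


def select(e: Expr, env: Dict[str, bool]) -> str:
    """Resolve the gating functions once the branch conditions are known."""
    while isinstance(e, Phi):
        e = e.if_true if env[e.cond] else e.if_false
    return e
```

With `x = Phi("B", "A(i)+1", Phi("C", "A(i)*2", "0"))`, `possible_forms(x)` is the static set of candidate forms, and `select(x, {"B": False, "C": True})` resolves to one of them once the conditions are known.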
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
....This work was supported in part by the Australian Research Council under Grant No. A49232251. 1 Introduction. Data dependences between statements have long been used by vectorizing and parallelizing compilers to detect parallelism and convert sequential programs into parallel forms [1, 2]. Two statement instances 1 are said to be data dependent if they access the same data element and at least one of the accesses is a write. The direction of the data dependence is determined by the order of sequential execution of the program. Enforcing all the data dependences between dependent ....
M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.
....$\delta_2$ is caused by the dependence between B(i) (in $S_2$) and B(i - 1) (in $S_1$). The process of deciding whether two given statements in a program are data dependent is called the data dependence analysis and has been a subject of much research since the mid-seventies. The studies reported in [1, 6, 10] represent only a subset of such investigations. 2.2. Parallel programs. In order to exploit the parallelism available in a shared memory multiprocessor, programs may be written with explicit parallel constructs or conventional sequential programs may be transformed into equivalent parallel ones ....
M. Wolfe, Optimizing compilers for supercomputers, Tech. Rep. UIUCDCS-R-82-1105, Department of Computer Science, University of Illinois at Urbana-Champaign, October 1982.
....goal. It is called the Dependence Distance Vector Optimization and deserves further study. If some of the entries in the dependence distance vectors (Equation (6.7)) are zeros, they correspond to DOALL loops and can be interchanged freely with other loop nests without violating dependence relations [Wol82]. In parallelization, it is desirable to have DOALL loops at the outer levels so that the granularity of the tasks can be increased to amortize the overhead. Therefore, we should try to move all DOALL loops to the outermost levels. If these DOALL loops provide large enough parallelism, we need ....
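The observation about zero entries can be sketched in a few lines. This illustration assumes each dependence is given as a distance vector with one entry per loop level, outermost first, and treats a level as DOALL when its entry is zero in every vector. The function names are hypothetical.

```python
from typing import List, Sequence


def doall_levels(distance_vectors: Sequence[Sequence[int]], depth: int) -> List[int]:
    """Return the loop levels whose distance entry is zero in every vector."""
    return [k for k in range(depth)
            if all(d[k] == 0 for d in distance_vectors)]


def doall_outermost_order(distance_vectors: Sequence[Sequence[int]],
                          depth: int) -> List[int]:
    """Permute loop levels so the DOALL levels come first (outermost)."""
    doall = doall_levels(distance_vectors, depth)
    rest = [k for k in range(depth) if k not in doall]
    return doall + rest
```

Moving an all-zero column to the front leaves the first nonzero entry of each vector unchanged, which is why this particular permutation keeps every dependence lexicographically positive.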
....7.2 Background on Run Time Parallelization Issues. In this section, we first present the basic concepts and then the previous work of runtime parallelization. 7.2.1 Loop Model and Basic Concept. Consider a loop as shown in Figure 7.1(a). In order to determine if there is any loop carried dependence [Wol82] between statements $s_p$ and $s_q$ across iterations and to compute its dependence distance, we have to solve the following integer equation: $f(i) = g(i')$. [Figure 7.1: (a) DO i=1,N ... s_p: A(f(i)) ... s_q: A(g(i)) ... ENDDO; (b) DO i=1,N ... s_p: A(I1(i)) ... s ....]
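For small iteration spaces, the integer equation relating the two subscript functions can be solved by direct enumeration. The sketch below assumes the equation is f(i) = g(i') with 1 <= i, i' <= N and reports the distances i' - i of the loop carried solutions; the function name `dependence_distances` is hypothetical, and `f` and `g` may be arbitrary Python callables, including lookups into index arrays.

```python
from typing import Callable, Dict, List


def dependence_distances(f: Callable[[int], int],
                         g: Callable[[int], int],
                         n: int) -> List[int]:
    """Return the sorted distances i' - i with f(i) == g(i'), i != i'."""
    # Index every element touched through g by the iterations that touch it.
    touched_by_g: Dict[int, List[int]] = {}
    for i2 in range(1, n + 1):
        touched_by_g.setdefault(g(i2), []).append(i2)

    distances = set()
    for i in range(1, n + 1):
        for i2 in touched_by_g.get(f(i), []):
            if i2 != i:  # only cross-iteration (loop carried) pairs count
                distances.add(i2 - i)
    return sorted(distances)
```

With `f = lambda i: i` and `g = lambda i: i - 2`, every solution has distance 2, so the call returns `[2]` for any `n` of at least 3.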
M. J. Wolfe. Optimizing Compilers for Supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, 1982.
No context found.
M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, CSRD Report 329, October 1982.
No context found.
M. Wolfe, Optimizing Compilers for Supercomputers. Boston, Mass.: The MIT Press, 1989.
No context found.
M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
No context found.
M. Wolfe, Optimizing Compilers for Supercomputers, The MIT Press, Boston, MA, 1989.