44 citations found.
M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, CSRD Report 329, October 1982.


This paper is cited in the following contexts:


The R-LRPD Test: Speculative Parallelization of Partially.. - Dang, Yu, Rauchwerger   (Correct)

.... this problem by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [11, 18]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [10], DYNA 3D [17], GAUSSIAN [8], and CHARMM [1] ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Run-time Parallelization Techniques for Sparse Applications - Lawrence   (Correct)

.... address this need by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [9, 22]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [8], DYNA 3D [21], GAUSSIAN [6], CHARMM [1]. In ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)  (Correct)

....presented by this case is the fact that the exact form of the RHS is not known statically. What is known, however, is the set of all possible RHS forms, which can be computed by following all potential paths in the control flow graph. A direct approach uses a gated static single assignment (GSSA) [5, 33] representation of the program. In such a representation, scalar variables are assigned only once. At the points of confluence of conditional branches a function of the form g ;1 ; is used (in the GSSA representation) to select one of the two possible definitions of a variable ( ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Run-Time Methods for Parallelizing Partially Parallel Loops - Rauchwerger, Amato, Padua (1995)   (18 citations)  (Correct)

....Restructuring, or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades [22, 32], current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. This is an extremely important issue because a large class of complex simulations used in industry today have ....

.... none of them has all of these properties (a comparison to previous work is contained in Section 4). 2 Preliminaries. In order to guarantee the semantics of a loop, the parallel execution schedule for its iterations must respect the data dependence relations between the statements in the loop body [22, 15, 3, 32, 35]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.
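
The three dependence types named in the context above (flow, anti, output) can be illustrated with a small sketch. This is not code from any of the cited papers; the function names `classify_dependence` and `cross_iteration_dependences`, and the trace format of `(iteration, op, address)` tuples, are assumptions made only for illustration.

```python
def classify_dependence(first_op, second_op):
    """Classify a pair of accesses to the same location.

    first_op and second_op are 'R' (read) or 'W' (write), given in
    sequential execution order. Returns None for read-after-read,
    which imposes no ordering constraint.
    """
    if first_op == "W" and second_op == "R":
        return "flow"    # read after write
    if first_op == "R" and second_op == "W":
        return "anti"    # write after read
    if first_op == "W" and second_op == "W":
        return "output"  # write after write
    return None


def cross_iteration_dependences(trace):
    """Report dependences between accesses made in different iterations.

    trace is a list of (iteration, op, address) tuples in sequential
    execution order (a hypothetical format). Every ordered pair of
    accesses is compared, so dependences that are implied by others
    are reported as well.
    """
    found = set()
    for a in range(len(trace)):
        it_a, op_a, addr_a = trace[a]
        for b in range(a + 1, len(trace)):
            it_b, op_b, addr_b = trace[b]
            if addr_a != addr_b or it_a == it_b:
                continue
            kind = classify_dependence(op_a, op_b)
            if kind is not None:
                found.add((kind, addr_a, it_a, it_b))
    return found


if __name__ == "__main__":
    # Accesses of the loop "A[i] = A[i-1] + 1" for i = 1, 2.
    trace = [(1, "R", 0), (1, "W", 1), (2, "R", 1), (2, "W", 2)]
    print(cross_iteration_dependences(trace))
```

A loop whose trace yields an empty set has no cross-iteration dependences under this simplified model.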


Memory Disambiguation To Facilitate Instruction-Level.. - Gallagher (1995)   (17 citations)  (Correct)

....provide accuracy. The first complication to dependence analysis is disambiguating array references, particularly in the context of loops. To test for dependence between array references, compilers have traditionally relied on several well known algorithms based on a set of Diophantine equations [45], [46]. More recently, techniques have been developed which are able to handle multidimensional arrays and more complex array subscripts [47], [48], [49], [50]. Although array dependence analysis has reached a fair level of maturity, current techniques may achieve inexact results due to ....

M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, 1982.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)  (Correct)

....Fellowships, and Army contract #DABT63 92C 0033. This work is not necessarily representative of the positions or policies of the Army or the Government. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [25, 36]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. One major reason for this inability to statically parallelize some programs is that the most effective ....

....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [6, 18, 25, 36, 39]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Parallelizing WHILE Loops for Multiprocessor Systems - Rauchwerger, Padua (1995)   (8 citations)  (Correct)

....in a while loop (assuming no cross-iteration dependences) is the amount of parallelism available in its dispatching recurrence. To aid our analysis of the dispatching recurrence, it is convenient to extract, at least conceptually, this recurrence from the original while loop by distributing [20] the original loop into two do loops with conditional exits: 1. A loop that evaluates the terms of the dispatcher (recurrence) and any termination condition that is strongly connected to the dispatcher. 2. A loop consisting of the remainder loop which uses the values of the recurrence (computed ....

....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [15, 12, 2, 20, 23]. In related work, we have proposed run time techniques, called the Privatizing Doall (PD) test [16] and the more powerful LRPD test [17], for detecting the presence of cross iteration dependences in a loop. These techniques were developed to test at run time whether a do loop could be executed as ....

[Article contains additional citation context not shown here]

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)  (Correct)

....or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [29], [43]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically unknown access pattern. Typical examples of applications containing such loops are complex simulations such as SPICE for circuit simulation, DYNA ....

....way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [7], [20], [29], [43], [50]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Run-Time Parallelization: Its Time Has Come - Rauchwerger (1998)   (3 citations)  (Correct)

....are computation dependent and are modified from one execution phase to another, e.g., because of the changing interactions of the underlying physical phenomena they are simulating. Techniques addressing the issue of data dependence analysis have been studied extensively over the last two decades [34, 47], but parallelizing compilers cannot perform a meaningful data dependence analysis and extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Unfortunately, irregular programs, as previously defined, ....

....analysis fails, i.e., for irregular, dynamic applications. 2.1 Fully Parallel (Doall) Loops. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [5, 24, 34, 47, 50]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences are data producer and consumer dependences, i.e., they express a fundamental relationship ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Techniques for Reducing the Overhead of Run-time Parallelization - Yu, Rauchwerger   (Correct)

.... address this need by detecting and exploiting parallelism in sequential programs written in conventional languages as well as parallel languages (e.g., HPF). Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [10, 19]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically insufficiently defined access pattern. Typical examples are complex simulations such as SPICE [9], DYNA 3D [18], GAUSSIAN [7], CHARMM [1]. In ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Data Preload For Superscalar And VLIW Processors - Chen, Jr. (1993)   (16 citations)  (Correct)

....must make careful judgement on whether a memory dependence can be removed. Such a decision is made by the memory dependence analyzer. A memory dependence analyzer determines the relation between memory references. For array references, many algorithms exist to perform data dependence analysis [6], [7], [8]. Three possible conclusions can be reached regarding the relation between a pair of memory references: 1) they always access the same location; 2) they never access the same location; or 3) they may access the same location. In the first case, in which the two references always access ....

M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, 1982.


Statement Re-Ordering for DOACROSS Loops - Chen, Yew (1994)   (2 citations)  (Correct)

....scheduling and the load balancing issues, but also the synchronization issues, which are usually much more difficult. Loop carried dependences can be categorized as lexically forward and lexically backward. Vector and SIMD machines can handle DOACROSS loops with only lexically forward dependences [1, 15, 22, 14]. One advantage of MIMD (multiple instruction multiple data) machines is that they allow loops with backward loop carried dependences to be handled by DOACROSS execution (i.e., by delaying consecutive iterations to satisfy backward dependences). The speedup obtained in this way can be large and ....

M. J. Wolfe. Optimizing Compilers for Supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, 1982.


Vectorization beyond Data Dependences - Peiyi Tang   (Correct)

....Australian National University, Australia. Part of the work was done when this author was a visiting fellow at the Australian National University. He is currently visiting the University of Southern Queensland, Australia. .... detect parallelism and convert sequential programs into parallel forms [1, 2]. Two statement instances are said to be data dependent if they access the same data element and at least one of the accesses is a write. The direction of data dependence is determined by the sequential execution order of the statement instances of the program. The basic strategy of all ....

M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.


Exact Side Effects for Interprocedural Dependence Analysis - Tang (1992)   (12 citations)  (Correct)

....of the index variable of the enclosing loops. Although the loop bounds of such form are rare in real programs, they are common after the loops are transformed by unimodular transformation [10, 11], which incorporates a wide range of loop transformations such as loop interchange, loop skewing [12], and loop permutation [13]. Therefore, a statement enclosed in n loops with loop index variables $i_1, \ldots, i_n$ has as many instances as the integer grids in the n dimensional convex set defined by the lower and upper bounds of the enclosed loops. Each grid in the convex set represents a loop ....

M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.


A Unified Approach to Speculative Parallelization of.. - Zhang, Rauchwerger.. (1998)   (Correct)

....or parallelizing, compilers address these problems by detecting and exploiting parallelism in sequential programs written in conventional languages. Although compiler techniques for the automatic detection of parallelism have been studied extensively over the last two decades (see, e.g., [19, 24]), current parallelizing compilers cannot extract a significant fraction of the available parallelism in a loop if it has a complex and/or statically unknown access pattern. Programs exhibiting this kind of behavior account for more than 50% of all Fortran applications [12] and encompass most C ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


The Privatizing DOALL Test: A Run-Time Technique for DOALL.. - Rauchwerger, Padua (1994)   (23 citations)  (Correct)

....We present some experimental results on loops from the PERFECT Benchmarks which confirm our conclusion that this test can lead to significant speedups. 1 Introduction. During the last two decades, compiler techniques for the automatic detection of parallelism have been studied extensively [17, 27]. From this work it has become clear that, for a class of programs, compile time analysis has to be complemented with run time techniques if a significant fraction of the implicit parallelism is to be detected [6, 8]. The main reason for this is that the access pattern of some programs cannot be ....

....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [3, 11, 17, 27, 30]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences are data producer and consumer dependences, i.e., they express a fundamental relationship ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Speculative Run-Time Parallelization of Loops - Rauchwerger, Padua (1994)   (1 citation)  (Correct)

....supported in part by Army contract #DABT63 92 C 0033. This work is not necessarily representative of the positions or policies of the Army or the Government. 1 Introduction. During the last two decades, compiler techniques for the automatic detection of parallelism have been studied extensively [29, 18, 7]. From this work it has become clear that, for a class of programs, compile time analysis must be complemented with run time techniques if a significant fraction of the implicit parallelism is to be detected. The main reason for this is that the access pattern of some programs cannot be determined ....

....not depend in any way upon the execution ordering of the data accesses from different iterations. In order to determine whether or not the execution order of the data accesses affects the semantics of the loop, the data dependence relations between the statements in the loop body must be analyzed [18, 11, 3, 29, 32]. There are three possible types of dependences between two statements that access the same memory location: flow (read after write), anti (write after read), and output (write after write). Flow dependences express a fundamental relationship about the data flow in the program. Anti and output ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1995)   (57 citations)  (Correct)

....presented by this case is the fact that the exact form of the RHS is not known statically. What is known, however, is the set of all possible RHS forms, which can be computed by following all potential paths in the control flow graph. A direct approach uses a gated static single assignment (GSSA) [5, 33] representation of the program. In such a representation, scalar variables are assigned only once. At the points of confluence of conditional branches a $\phi$ function of the form $\phi(B, X_1, X_2)$ is used (in the GSSA representation) to select one of the two possible definitions of a variable ($X_1$ ....

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Vectorization Using Reversible Data Dependences - Tang, Gao   (Correct)

....This work was supported in part by the Australian Research Council under Grant No. A49232251. 1 Introduction. Data dependences between statements have long been used by vectorizing and parallelizing compilers to detect parallelism and convert sequential programs into parallel forms [1, 2]. Two statement instances are said to be data dependent if they access the same data element and at least one of the accesses is a write. The direction of the data dependence is determined by the order of sequential execution of the program. Enforcing all the data dependences between dependent ....

M. Wolfe, Optimizing Compilers for Supercomputers. Cambridge, MA, MIT Press, 1989.


Cache-Based Data Distribution Constrained Scheduling - Min, Nam, Park (1994)   (Correct)

....$\delta_2$ is caused by the dependence between B(i) (in $S_2$) and B(i - 1) (in $S_1$). The process of deciding whether two given statements in a program are data dependent is called data dependence analysis and has been a subject of much research since the mid seventies. The studies reported in [1, 6, 10] represent only a subset of such investigations. 2.2. Parallel programs. In order to exploit the parallelism available in a shared memory multiprocessor, programs may be written with explicit parallel constructs or conventional sequential programs may be transformed into equivalent parallel ones ....

M. Wolfe, Optimizing compilers for supercomputers, Tech. Rep. UIUCDCS-R-82-1105, Department of Computer Science, University of Illinois at Urbana-Champaign, October, 1982.


Compiler Optimizations For Parallel Loops With Fine-Grained.. - Chen (1994)   (5 citations)  (Correct)

....goal. It is called the Dependence Distance Vector Optimization and deserves further study. If some of the entries in the dependence distance vectors, Equation (6.7), are zeros, they correspond to DOALL loops and can be interchanged freely with other loop nests without violating dependence relations [Wol82]. In parallelization, it is desirable to have DOALL loops at the outer levels so that the granularity of the tasks can be increased to amortize the overhead. Therefore, we should try to move all DOALL loops to the outermost levels. If these DOALL loops provide large enough parallelism, we need ....

....7.2 Background on Run Time Parallelization Issues. In this section, we first present the basic concepts and then the previous work of runtime parallelization. 7.2.1 Loop Model and Basic Concept. Consider a loop as shown in Figure 7.1(a). In order to determine if there is any loop carried dependence [Wol82] between statements $s_p$ and $s_q$ across iterations and to compute its dependence distance, we have to solve the following integer equation: $f(i) = g(i')$. [Figure 7.1(a): DO i=1,N ... $s_p$: A(f(i)) ... $s_q$: A(g(i)) ... ENDDO. The next panel, beginning DO i=1,N ... $s_p$: A(I1(i)), is truncated] ....

M. J. Wolfe. Optimizing Compilers for Supercomputers. PhD thesis, University of Illinois at Urbana-Champaign, 1982.
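
The integer equation $f(i) = g(i')$ mentioned in the context above can be illustrated with a brute-force sketch. It is not the method of the cited thesis; the function name `dependence_distances` and the choice to enumerate every iteration pair are assumptions for illustration, and the search is only practical for small N.

```python
def dependence_distances(f, g, n):
    """Enumerate loop-carried dependence distances by exhaustive search.

    f and g are the subscript functions of two references A(f(i)) and
    A(g(i)) in a loop "DO i = 1, n". A loop-carried dependence exists
    when f(i) == g(i_prime) for two different iterations i and i_prime;
    the distance recorded here is i_prime - i (a hypothetical sign
    convention).
    """
    distances = set()
    for i in range(1, n + 1):
        for i_prime in range(1, n + 1):
            if i != i_prime and f(i) == g(i_prime):
                distances.add(i_prime - i)
    return sorted(distances)


if __name__ == "__main__":
    # s_p touches A(i), s_q touches A(i - 2): element A(i) is touched
    # again two iterations later.
    print(dependence_distances(lambda i: i, lambda i: i - 2, 5))
    # Even versus odd subscripts never coincide.
    print(dependence_distances(lambda i: 2 * i, lambda i: 2 * i + 1, 5))
```

An empty result means no pair of distinct iterations touches the same element through these two references, which is the condition under which they do not prevent the loop from running as a DOALL.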


Memory Latency Reduction via Data Prefetching and Data Forwarding .. - Poulsen (1994)   (Correct)

No context found.

M. J. Wolfe, "Optimizing compilers for supercomputers," Ph.D. dissertation, Department of Computer Science, University of Illinois, Urbana, IL, CSRD Report 329, October 1982.


The LRPD Test: Speculative Run-Time Parallelization of.. - Rauchwerger, Padua (1999)   (57 citations)  (Correct)

No context found.

M. Wolfe, Optimizing Compilers for Supercomputers. Boston, Mass.: The MIT Press, 1989.


Speculative Parallelization of Partially Parallel Loops - Francis Dang Lawrence (2000)   (Correct)

No context found.

M. Wolfe. Optimizing Compilers for Supercomputers. The MIT Press, Boston, MA, 1989.


Run-Time Methods For Parallelizing Do Loops - Rauchwerger, Padua   (Correct)

No context found.

M. Wolfe, Optimizing Compilers for Supercomputers, The MIT Press, Boston, MA, 1989.



CiteSeer.IST - Copyright Penn State and NEC