25 citations found.
Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen and Wen-mei W. Hwu, "The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors", IEEE Transactions on Computers, Vol. 44, No. 3, pp. 353-370, Mar. 1995

This paper is cited in the following contexts:
Genetic Programming Applied to Compiler Heuristic.. - Stephenson, O'Reilly, .. (2003)   (3 citations)  (Correct)

....IMPACT, performs code profiling. Table 3 details the specific architecture over which we evolved. This model is similar to Intel's Itanium architecture. We enabled the following Trimaran compiler optimizations: function inlining, loop unrolling, backedge coalescing, acyclic global scheduling [5], modulo scheduling [20], hyperblock formation, register allocation, machine-specific peephole optimization, and several other classic optimizations. We built a GP loop around Trimaran and internally modified IMPACT by replacing its predication priority function (Equation 1) with our GP ....

P. Chang, D. Lavery, S. Mahlke, W. Chen, and W. Hwu. The Importance of Prepass Code Scheduling for Superscalar and Superpipelined processors. In IEEE Transactions on Computers, volume 44, pages 353-370, March 1995.


Issues in Instruction Scheduling - Schielke (1998)   (Correct)

....we would be happy with the code in Figure 13. However if that code causes too much spill code to be generated, the code in Figure 14 may be preferable. There are several different approaches for looking at this problem in the literature. Several authors have looked at the phase ordering problem [3, 6, 11, 22]. There have been several methods developed for integrating the register allocation and instruction scheduling phases. The amount of integration varies from making one phase sensitive to the needs of the other to completely combining the two phases. We will not try to categorize the level of ....

Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen, and Wen-mei W. Hwu. The importance of prepass code scheduling for superscalar and superpipelined processors. IEEE Transactions on Computers, 44(3):353--370, March 1995.


General-Purpose Architecture Instruction Scheduling Techniques - De Sutter (1998)   (Correct)

....could conclude that a pre-pass code scheduler that limits the register allocator preceding the actual scheduler is a good thing. While this is true for numeric applications, pre-pass code scheduling has almost no advantage for control-intensive applications. Chang et al. have demonstrated this in [29] and indicate at the same time that some extensions to modern architectures can increase possibilities for uncovering ILP and make a pre-pass code scheduler useful and indeed important. This is further discussed in section 4.6. 4.1.3 Combining Register Allocation and Instruction Scheduling ....

Chang, P., Lavery, D., Mahlke, S., Chen, W., and Hwu, W.-M. The importance of prepass code scheduling for superscalar and superpipelined processors. IEEE Transactions on Computers 44, 3 (March 1995), 353-370.


The Predictability of Libraries - Calder, Grunwald, Srivastava (1995)   (Correct)

....prediction studies. 1 Introduction Profile-guided code optimizations have been shown to be effective by several researchers. Among these optimizations are basic block and procedure layout optimizations to improve cache and branch behavior [3, 10, 12], register allocation, and trace scheduling [5, 6, 8, 11]. The technique that all these optimizations have in common is that they use profiles from a previous run of a given program to predict the behavior of a future run of the same program. However, many researchers believe that collecting profile information is too costly or time consuming, and that ....

Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen, and Wen-mei W. Hwu. The importance of prepass code scheduling for superscalar and superpipelined processors. IEEE Transactions on Computers, 44(3):353--370, 1995.


Split Point Selection and Recovery for Value Speculation.. - Fu, Knies, Conte   (Correct)

....to increase the effectiveness of value speculation. There are several major results presented in this paper. First, we describe a new ISA that is suitable for EPIC instruction sets and implement it in IA-64. Second, we provide a new recovery code generation scheme using tail duplication [10], [11], [12] to enable multiple predictions on dependence chains while keeping critical path length down. Third, we analyze our technique where profiling is used to identify good candidates for which the compiler inserts explicit predict and update instructions that use the best predictor for each ....

.... Algorithm The compiler used for these experiments is based on an experimental IA-64 compiler, with support added for the value prediction ISA and value speculation scheduling (VSS). In this study, value speculation scheduling is performed after pre-pass scheduling and before post-pass scheduling [11]. During pre-pass scheduling the compiler performs global code motion to exploit ILP via control speculation and data speculation. VSS uses the pre-pass schedule to find better split points to break flow dependencies. The value speculation scheduling algorithm utilizes the new PREDICT and UPDATE ....

[Article contains additional citation context not shown here]

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors," IEEE Transactions on Computers, vol. 44, No. 3, pp. 353-370, March 1995.


Systematic Compilation For Predicated Execution - August (2000)   (Correct)

....flow analysis is performed in a conservative manner. All code generation in the IMPACT compiler is performed at the Mcode level. The two largest components of code generation are the instruction scheduler and register allocator. Scheduling is performed via either acyclic global scheduling [57] [67] or software pipelining using modulo scheduling [12]. For acyclic global scheduling, code scheduling is applied both before register allocation (prepass scheduling) and after register allocation (postpass scheduling) to generate an efficient schedule. For software pipelining, loops targeted for ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, pp. 353-370, March 1995.


Compiler Support For Sparc Architecture Processors - Roland Ouellette Massachusetts (1994)   (6 citations)  (Correct)

....[5] shows how profile information may be used to make traditional code optimizations more effective. Reference [6] is a technical report containing a more thorough treatment of the material of [5]. Reference [7] describes control flow optimizations which the IMPACT compiler also used. Reference [8] is a technical report showing the advantages of scheduling code prior to register allocation. Reference [9] shows the advantages of scheduling superblocks, especially on superpipelined superscalar processors. Reference [10] shows the importance of function inlining in compiling C programs. ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.


Enhancing Instruction Level Parallelism Through.. - Bringmann (1995)   (5 citations)  (Correct)

....further exploit predicated execution support are available. All code generation in the IMPACT compiler is performed at the Lcode level. The two largest components of code generation are the instruction scheduler and register allocator. Scheduling is performed via either acyclic global scheduling [16, 17] or software pipelining using modulo scheduling [18]. For the acyclic global scheduling, code scheduling is applied both before register allocation (prepass scheduling) and after register allocation (postpass scheduling) to generate an efficient schedule. For software pipelining, loops targeted ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.


The Predictability of Branches in Libraries - Calder, Grunwald, Srivastava (1995)   (14 citations)  (Correct)

....of the same material. 1 Introduction Profile-guided code optimizations have been shown to be effective by several researchers. Among these optimizations are basic block and procedure layout optimizations to improve cache and branch behavior [3, 10, 12], register allocation, and trace scheduling [5, 6, 11, 7]. The technique that all these optimizations have in common is that they use profiles from a previous run of a given program to predict the behavior of a future run of the same program. However, many researchers believe that collecting profile information is too costly or time consuming, and that ....

Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen, and Wen-mei W. Hwu. The importance of prepass code scheduling for superscalar and superpipelined processors. IEEE Transactions on Computers, 44(3):353--370, 1995.


Resource Assignment in a Compiler for Transport Triggered .. - Hoogerbrugge, Corporaal (1996)   (Correct)

....in more detail. 3.1 Register and Register File Assignment Register and RF assignments to pseudo registers are made before scheduling. Making them during scheduling as proposed in [9] increases the engineering complexity significantly. Performing these assignments after scheduling as proposed in [3] (1) increases the register requirement, (2) is difficult in combination with predicated execution since predicated execution complicates live variable analysis required for register allocation [15], and (3) insertion of spill code into parallel (i.e. scheduled) code requires a postpass ....

CHANG, P. P., LAVERY, D. M., MAHLKE, S. A., CHEN, W. Y., AND HWU, W. W. The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors. IEEE Transactions on Computers 44, 3 (March 1995), 353--370.


Global Instruction Scheduling In Machine SUIF - Gang Chen (1997)   (2 citations)  (Correct)

....those in the backend, to make it very easy to reorder and repeat passes. We currently run instruction scheduling after register allocation, however, as Figure 1 shows, we could run instruction scheduling both as a pass before and as a pass after register allocation, as suggested by Hwu et al. [Chan95]. Structurally, the only difference between the pre and post scheduling passes is the heuristics used to drive the selection of instructions (e.g. pre-pass schedulers typically try to perform code motions without increasing register pressure). Global instruction scheduling consists of several ....

P. Chang, et al. "The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processor," IEEE Trans. on Computers, 44(3):353--370, Mar. 1995.


Dynamic Control Of Compile Time Using Vertical Region-Based.. - Braun   Self-citation (Hwu)   (Correct)

....the code generators are the register allocator and the instruction scheduler. These modules are common to all of the code generators in IMPACT. Register allocation is performed using graph coloring [15] [2]. Several different code scheduling models exist, including acyclic global scheduling [16] [17], software pipelining using modulo scheduling [18] [19], and sentinel scheduling [20]. A detailed machine description database, Mdes, is referenced throughout the compilation process by various IMPACT modules [21]. This database contains information such as the number and type of functional ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, pp. 353--370, March 1995.


Emulation Of The Intermediate Representation In The Impact Compiler - Olaniran   Self-citation (Hwu)   (Correct)

....architectures [7]. The most actively supported architectures are the Sun SPARC, the HP PA-RISC, and the Intel X86. The two main components of code generation are the instruction scheduler and the register allocator [15]. Several scheduling models exist, including acyclic global scheduling [16] [17], sentinel scheduling [18], and software pipelining using modulo scheduling [19]. The IMPACT and HPL Playdoh [20] architectures, two experimental instruction-level parallelism (ILP) architectures, are also supported. These experimental architectures provide the necessary framework for advanced ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, pp. 353--370, March 1995.


Structural And Static Analysis Techniques For Enhancing Compiler.. - Crozier (1999)   Self-citation (Hwu)   (Correct)

....predicated execution. IMPACT performs code generation at the Lcode level after block formation and optimization is complete. The two main components of code generation are the instruction scheduler and the register allocator. IMPACT can schedule code using either acyclic global scheduling [41] [42] or software pipelining using modulo scheduling [43]. Acyclic global scheduling involves two passes of the scheduler. Prepass scheduling is performed before register allocation, and postpass scheduling is performed after register allocation to generate the most efficient schedule. For modulo ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, no. 3, pp. 353--370, March 1995.


Memory Disambiguation To Facilitate Instruction-Level.. - Gallagher (1995)   (17 citations)  Self-citation (Hwu)   (Correct)

....into the target architecture's assembly language. Two of the most significant components of code generation are the instruction scheduler and register allocator, both of which are common modules shared by all code generators. Scheduling is performed via either global acyclic scheduling [22] [24] or software pipelining [17] [18]. Global acyclic scheduling is applied both before register allocation (prepass scheduling) and after register allocation (postpass scheduling) to generate an efficient schedule. Loops targeted for software pipelining are identified and marked at the Pcode level. ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.


Data Dependence Analysis For Fortran Programs In The Impact Compiler - Haab (1995)   (7 citations)  Self-citation (Chang Lavery Hwu)   (Correct)

....classic optimizations are applied [26]. Superblock [27] and hyperblock [28] compilation techniques are also performed using the Lcode IR. All code generation in the IMPACT compiler is also performed using the Lcode module. Scheduling is performed via either acyclic global scheduling [29], [30] or software pipelining using modulo scheduling [31]. Graph-coloring-based register allocation is utilized for all target architectures [32]. In addition, for each target architecture, a set of specially tailored peephole optimizations are performed. A detailed machine description database, ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.


Memory Profiling For Directing Data Speculative Optimizations And .. - Connors (1997)   (1 citation)  Self-citation (Lavery Mahlke Hwu)   (Correct)

....architectures. The most actively supported architectures are the Sun SPARC, the HP PA-RISC, and the Intel X86. The two main components of code generation are the instruction scheduler and the register allocator [10]. Several scheduling models exist, including acyclic global scheduling [11] [12], sentinel scheduling [13], and software pipelining using modulo scheduling [14]. In addition, a scheduling technique capable of exploiting architectural support for MCB data speculation exists [2] [3] [15]. The focus of this thesis is to obtain memory profile information for developing a more ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, March 1995, pp. 353--370.


Three Superblock Scheduling Models for Superscalar and.. - Pohua Chang Nancy (1991)   (6 citations)  Self-citation (Chang Hwu)   (Correct)

....3. dependence graph generation, and 4. list scheduling. Steps 3 and 4 are used for both prepass and postpass code scheduling. Prepass code scheduling is performed prior to register allocation to reduce the effect of artificial data dependencies that are introduced by register assignment [10] [6]. Postpass code scheduling is performed after register ....

....together. The general idea of the list scheduling algorithm is to pick, from a set of nodes (instructions) that are ready to be scheduled, the best combination of nodes to issue in a cycle. The best combination of nodes is determined by using heuristics which assign priorities to the ready nodes [6]. A node is ready if all of its parents in the dependence graph have been scheduled and the result produced by each parent is available. If the number of dependencies are reduced, a more efficient code schedule can be found. Of the data ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors," Center for Reliable and High-Performance Computing Technical Report, University of Illinois at Urbana-Champaign, May 1991.


Hyperblock Performance Optimizations For ILP Processors - August (1996)   (1 citation)  Self-citation (Mahlke Hwu)   (Correct)

....enhancing this set of hyperblock-specific optimizations. Code generation in the IMPACT compiler is performed at the Lcode level. The two largest components of code generation are the instruction scheduler and register allocator. Scheduling is performed via either acyclic global scheduling [24] [25] or software pipelining using modulo scheduling [26]. For the acyclic global scheduling, code scheduling is applied both before register allocation (prepass scheduling) and after register allocation (postpass scheduling) to generate an efficient schedule. For software pipelining, loops targeted ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, no. 3, pp. 353--370, March 1995.


Condition Awareness Support For Predicate Analysis And Optimization - Sias (1999)   (1 citation)  Self-citation (Hwu)   (Correct)

....regarding the applicability of transformations. As compilation advances, particularly in the scheduler and register allocator, the compiler relies more heavily on the Mdes to generate code appropriate for the target. Code is scheduled using either an acyclic global scheduling technique [21] [22] or modulo scheduling [23]. Both models support control and data speculation for aggressive enrichment of ILP [10] [24] [25]. Register allocation, sandwiched between a prepass and a postpass schedule in the acyclic model, is performed using a graph coloring approach [26]. Since predication is ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, no. 3, pp. 353--370, March 1995.


Three Architectural Models for Compiler-Controlled.. - Chang, Warter.. (1995)   (11 citations)  Self-citation (Chang Hwu)   (Correct)

....data dependences that are introduced by register assignment [19][20]. Postpass code scheduling is performed after register allocation. [Figure 1: C code segment.] The C code segment in Figure 1 will be used in this paper to illustrate the superblock scheduling algorithm. Compiling the C code segment for a load store architecture produces the assembly language shown in Figure 2. The assembly ....

....together. The general idea of the list scheduling algorithm is to pick, from a set of nodes (instructions) that are ready to be scheduled, the best combination of nodes to issue in a cycle. The best combination of nodes is determined by using heuristics which assign priorities to the ready nodes [20]. A node is ready if all of its parents in the dependence graph have been scheduled and the result produced by each parent is available. If the number of dependences are reduced, a more efficient code schedule can be found. Of the data dependences, only the flow dependences are true dependences. ....

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.
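
The list scheduling idea quoted in the excerpt above (each cycle, pick the best ready nodes, where a node is ready once all of its parents are scheduled and their results are available) can be sketched as follows. This is a minimal illustration, not the IMPACT implementation: the function name `list_schedule`, the dictionary-based dependence graph, the fixed `issue_width`, and the single numeric priority per node are all assumptions made for the example, and the graph is assumed to be acyclic.

```python
from typing import Dict, List


def list_schedule(
    latency: Dict[str, int],
    parents: Dict[str, List[str]],
    priority: Dict[str, int],
    issue_width: int,
) -> Dict[str, int]:
    """Return the issue cycle chosen for each node (hypothetical helper).

    Assumes the dependence graph is acyclic and every parent named in
    `parents` also appears in `latency`; otherwise the loop never ends.
    """
    issue_cycle: Dict[str, int] = {}
    cycle = 0
    while len(issue_cycle) < len(latency):
        # A node is ready when every parent has been scheduled and the
        # parent's result is available at the start of this cycle.
        ready = [
            n
            for n in latency
            if n not in issue_cycle
            and all(
                p in issue_cycle and issue_cycle[p] + latency[p] <= cycle
                for p in parents.get(n, [])
            )
        ]
        # Heuristic: highest priority first, node name breaks ties.
        ready.sort(key=lambda n: (-priority[n], n))
        for n in ready[:issue_width]:
            issue_cycle[n] = cycle
        cycle += 1
    return issue_cycle


# Two loads feed an add, which feeds a store (made-up latencies).
latency = {"ld1": 2, "ld2": 2, "add": 1, "st": 1}
parents = {"add": ["ld1", "ld2"], "st": ["add"]}
priority = {"ld1": 3, "ld2": 3, "add": 2, "st": 1}

wide = list_schedule(latency, parents, priority, issue_width=2)
narrow = list_schedule(latency, parents, priority, issue_width=1)
print(wide)
print(narrow)
```

With two issue slots both loads issue in cycle 0; with one slot the second load is delayed by a cycle and the dependent add and store slip with it. Removing edges from `parents`, as prepass scheduling does for the artificial dependences introduced by register assignment, gives the scheduler more ready nodes to choose from in each cycle.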


Unrolling-Based Optimizations for Modulo Scheduling - Lavery, Hwu (1995)   (17 citations)  Self-citation (Lavery Hwu)   (Correct)

....RecMII if it is already less than the ResMII and vice versa. As the ResMII and the RecMII are reduced, the number of physical processor registers can impose a third constraint on the MII. As more parallelism is exploited, more simultaneously live values are generated, requiring more registers [24, 25]. If more simultaneously live values exist than physical registers, spill code must be added and can significantly increase the achieved II of the loop. In this case, it may be possible to achieve a better final II by increasing the candidate II and attempting to schedule the original loop body ....

P. P. Chang, D. M. Lavery, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," IEEE Transactions on Computers, vol. 44, pp. 353--370, March 1995.


Scalar Program Performance on Multiple-Instruction-Issue .. - Mahlke, Chen, Chang, Hwu (1992)   (10 citations)  Self-citation (Chang Hwu)   (Correct)

....than simple postpass scheduling, and that with the integrated strategy little performance gains result from register files larger than 32 registers. Chang, Lavery, and Hwu show the effectiveness of prepass code scheduling for integer scalar programs on multiple instruction issue processors [4]. There has also been significant research evaluating other architectural features for multiple instruction issue processors. Butler et al. investigated the exploitable parallelism under varying hardware constraints including the dynamic window size, the number of functional units, and the branch ....

....are introduced when registers are recycled. For smaller register files, recycling occurs frequently, thereby significantly increasing the number of data dependencies. The number of additional data dependencies introduced can be reduced if code scheduling is performed before register allocation [2] [4] [3]. However, one should expect the performance of higher issue rate processors to suffer more performance loss when there are a large number of data dependencies. 1 For many of these processors, register allocators can often utilize parameter registers between function calls and several of the ....

[Article contains additional citation context not shown here]

P. P. Chang, D. M. Lavery, and W. W. Hwu, "The importance of prepass code scheduling for superscalar and superpipelined processors," Tech. Rep. CRHC-91-18, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, May 1991.


Analysis of Profiling Information for Cache Sensitive Scheduling - Lindenmaier (1999)   (Correct)

No context found.

Pohua P. Chang, Daniel M. Lavery, Scott A. Mahlke, William Y. Chen and Wen-mei W. Hwu, "The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors", IEEE Transactions on Computers, Vol. 44, No. 3, pp. 353-370, Mar. 1995
