S. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
....page zero as readable and is provided as a user option. Another example of directed speculation is to move a load (e.g., lw rx, k1(rz)) from a source block to the destination block containing lw ry, k2(rz) if it can be detected that register rz is not modified in between. Circular scheduling [5, 10] is a loop scheduling scheme by which instructions from the top portion of the loop body are moved to the bottom portion of the loop body. This can improve performance if 1) the instructions in the top portion form a critical set of instructions and moving them can remove unnecessary stalls, ....
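The circular scheduling idea in the snippet above can be illustrated with a small sketch. It assumes a loop body given as a plain list of instruction names and simply rotates the top k instructions to the bottom, with a prologue that runs them once for iteration 0. The function names `circular_schedule` and `dynamic_trace` are hypothetical and are not taken from the cited papers; real circular schedulers also handle dependences, register renaming, and resource limits, which are omitted here.

```python
from typing import List, Tuple


def circular_schedule(body: List[str], k: int) -> Tuple[List[str], List[str]]:
    """Move the top k instructions of a loop body to its bottom.

    Returns (prologue, rotated_body). The prologue runs the top k
    instructions once for iteration 0; inside the rotated body those same
    instructions now execute on behalf of the *next* iteration.
    """
    if not 0 <= k <= len(body):
        raise ValueError("k must be in [0, len(body)]")
    top, rest = body[:k], body[k:]
    return list(top), rest + top


def dynamic_trace(prologue: List[str], body: List[str], n: int, k: int) -> List[str]:
    """Unroll n loop iterations into a trace of op@iteration labels."""
    out = [f"{op}@0" for op in prologue]
    for i in range(n):
        for j, op in enumerate(body):
            # The last k ops of the rotated body belong to iteration i + 1.
            owner = i + 1 if j >= len(body) - k else i
            out.append(f"{op}@{owner}")
    return out


# Example: rotate the load at the top of a three-instruction loop.
prologue, rotated = circular_schedule(["load", "add", "store"], 1)
original = [f"{op}@{i}" for i in range(2) for op in ["load", "add", "store"]]
trace = dynamic_trace(prologue, rotated, 2, 1)
# trace is the original trace followed by one extra "load@2".
```

The rotated trace matches the original one except for a trailing load belonging to an iteration that never runs. That extra load executes speculatively, which is one reason such a load must not fault (compare the page-zero option mentioned in the snippet).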
S. Moon and K. Ebcioglu. Parallelizing Non-Numerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
....techniques from moving instructions to unused instruction slots. Whether a conditional branch is taken or not cannot be decided at compile time. Global scheduling techniques such as Trace Scheduling [6], PDG Scheduling [1], Dominator Path Scheduling [17], or Selective Scheduling [10] use various approaches to overcome this problem. However, investigations have shown that the conditions that must be fulfilled to safely move an instruction across basic block boundaries are very restrictive. Therefore, the scheduling algorithms fail to gain enough ILP for a good utilisation ....
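To make the "restrictive conditions" above concrete, here is a minimal sketch of a conservative safety test for hoisting an instruction above a conditional branch. It assumes the instruction is the first one in its successor block, so its source operands are already available above the branch. The `Instr` type and `can_hoist_above_branch` function are hypothetical illustrations, not the test used by any of the cited schedulers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]            # register written, or None
    srcs: FrozenSet[str]           # registers read
    may_fault: bool = False        # e.g. a load that may trap
    has_side_effect: bool = False  # e.g. a store or I/O


def can_hoist_above_branch(instr: Instr, live_on_other_path: FrozenSet[str]) -> bool:
    """Conservative safety test for moving `instr` from the top of one
    successor block of a conditional branch to above that branch, where it
    will also execute when the branch goes the other way."""
    # Stores and other side effects cannot be undone on the wrong path.
    if instr.has_side_effect:
        return False
    # A speculative fault (e.g. a bad load address) would be spurious.
    if instr.may_fault:
        return False
    # Overwriting a register that the other path still reads would corrupt it.
    if instr.dest is not None and instr.dest in live_on_other_path:
        return False
    return True


add = Instr(dest="r1", srcs=frozenset({"r2", "r3"}))
print(can_hoist_above_branch(add, frozenset({"r4"})))  # True
print(can_hoist_above_branch(add, frozenset({"r1"})))  # False: r1 live elsewhere
```

Each check rules out a whole class of candidates, which is why purely conservative code motion finds little ILP. The liveness check in particular is what register renaming is used to relax.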
S. M. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
.... information to accomplish PSSA (a predicate-sensitive form of SSA [13, 12]), which enables Predicated Speculation and Control Height Reduction for hyperblocks, techniques that had previously been examined only in the presence of the single path of control found in superblocks [26, 27, 28]. Moon and Ebcioglu [23] have implemented selective scheduling algorithms, which can schedule operations at their earliest possible cycle for non-predicated code. Our work extends theirs to predicated code by allowing earliest-possible-cycle scheduling using predicated renaming with full path predicates. ....
S. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
....unit). Conditional branches seriously prevent the scheduling techniques from moving instructions to unused instruction slots. Whether a conditional branch is taken or not cannot be decided at compile time. Global scheduling techniques such as PDG Scheduling [1] or Selective Scheduling [13] use various approaches to move instructions across branches and execute them speculatively. The performance gained by these speculative extensions is limited either by the rollback overhead of misprediction or by restrictions on speculative code motion. S3 can be ....
S. M. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
.... information to accomplish PSSA (a predicate-sensitive form of SSA [12, 11]), which enables Predicated Speculation and Control Height Reduction for hyperblocks, techniques that had previously been examined only in the presence of the single path of control found in superblocks [23, 24, 25]. Moon and Ebcioglu [20] have implemented selective scheduling algorithms, which can schedule operations at their earliest possible cycle for non-predicated code. Our work extends theirs to predicated code by allowing earliest-possible-cycle scheduling using predicated renaming with full path predicates. ....
S. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
....scheduling techniques from moving instructions to unused instruction slots. Whether a conditional branch is taken or not cannot be decided at compile time. Scheduling techniques such as Trace Scheduling [6], PDG Scheduling [1], Dominator Path Scheduling [19], or Selective Scheduling [13] use various approaches to overcome this problem. However, investigations have shown that the conditions that must be fulfilled to safely move an instruction across basic block boundaries are very restrictive. Therefore, the scheduling algorithms fail to gain enough ILP for a good utilization ....
S. M. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
No context found.
S.-M. Moon and K. Ebcioglu. Parallelizing Non-numerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
.... one is also indispensable for time-optimal execution, since it makes it possible to avoid output dependences among store operations that belong to different execution paths of a parallel instruction, as pointed out by Aiken et al. [14]. As a specific example architecture, we use the tree VLIW architecture model [3, 15], which satisfies the architectural requirements described above. In this architecture, a parallel VLIW instruction, called a tree instruction, is represented by a binary decision tree as shown in Fig. 1. A tree instruction can simultaneously execute ALU and memory operations as well as branch ....
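The tree instruction described above can be sketched as a small interpreter. This assumes a simplified model: every operation and branch test reads the register values from the start of the cycle, only the operations on the root-to-leaf path selected by the tests are committed, and each leaf names the next tree instruction. `TreeNode` and `execute_tree` are hypothetical names, and memory operations are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
# An operation reads the register file as it was at the start of the cycle
# and returns (destination register, value).
Op = Callable[[Regs], Tuple[str, int]]


@dataclass
class TreeNode:
    ops: List[Op] = field(default_factory=list)
    cond: Optional[Callable[[Regs], bool]] = None  # None marks a leaf
    taken: Optional[TreeNode] = None
    not_taken: Optional[TreeNode] = None
    target: Optional[str] = None  # next tree instruction, set on leaves


def execute_tree(root: TreeNode, regs: Regs) -> Tuple[Regs, Optional[str]]:
    """Follow the root-to-leaf path chosen by the branch tests and commit
    only the operations found on that path."""
    old = dict(regs)  # every op and test sees pre-cycle values
    writes: Regs = {}
    node = root
    while True:
        for op in node.ops:
            dest, value = op(old)
            writes[dest] = value
        if node.cond is None:
            break
        node = node.taken if node.cond(old) else node.not_taken
    new_regs = dict(regs)
    new_regs.update(writes)  # all selected writes commit together
    return new_regs, node.target


# Both leaves write r2; there is no conflict because only one path commits.
leaf_t = TreeNode(ops=[lambda r: ("r2", r["r1"] * 2)], target="L1")
leaf_f = TreeNode(ops=[lambda r: ("r2", 0)], target="L2")
root = TreeNode(ops=[lambda r: ("r1", r["r0"] + 1)], cond=lambda r: r["r0"] > 0,
                taken=leaf_t, not_taken=leaf_f)
regs, nxt = execute_tree(root, {"r0": 5, "r1": 10, "r2": 0})
# r2 becomes 20, computed from the old r1 (10), not the new r1 (6).
```

Because only the selected path commits, operations on different paths may target the same register or memory location, which is the property the snippet relates to avoiding output dependences between paths.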
....regions. The operations belonging to the same group (i.e., the same shaded region) are executed in parallel. A parallel tree VLIW program can easily be converted into a parallel program in the extended sequential representation with some local transformations on copy operations, and vice versa [15]. 3.3 Basic Terminology A program¹ is represented as a triple $\langle G = (N, E), O, \delta \rangle$. This representation is due to Aiken et al. [14]. The body of the program is a CFG G, which consists of a set of nodes N and a set of directed edges E. Nodes in N are categorized into assignment nodes that read ....
[Article contains additional citation context not shown here]
S.-M. Moon and K. Ebcioglu. Parallelizing Non-numerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
....code (without register windows) by the gcc compiler. The SPARC-based assembly code is scheduled into high-performance VLIW code targeting a tree VLIW architecture [11] by an aggressive scheduling compiler based on software pipelining, all-path code motion (without branch probabilities), and renaming [12]. The final VLIW code is simulated, producing results and statistics. Our benchmarks are composed of eight non-trivial integer programs listed in Table 1. We assumed a load latency of two cycles and unit latencies for other operations. We also assumed perfect instruction and data caches. Benchmarks Lines ....
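The renaming mentioned above can be sketched for the simple case of hoisting one operation above a conditional branch. The sketch assumes that when the operation's destination is live on the other path, the operation is given a fresh register above the branch and a copy back to the original destination is left at the original position. `Op`, `hoist_with_renaming`, and the free-register list are hypothetical, and this is not claimed to be the exact algorithm of the cited compiler.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    dest: str
    opcode: str
    srcs: Tuple[str, ...]


def hoist_with_renaming(
    op: Op, live_on_other_path: FrozenSet[str], free_regs: List[str]
) -> Tuple[Op, Optional[Op]]:
    """Move `op` above a conditional branch, renaming its destination when
    the original destination is live on the other path.

    Returns (op_to_place_above_branch, copy_left_at_original_spot or None).
    A register is taken from `free_regs` only when a rename is needed.
    """
    if op.dest not in live_on_other_path:
        return op, None  # no conflict: the op moves unchanged
    if not free_regs:
        raise RuntimeError("no free register for renaming")
    fresh = free_regs.pop(0)
    hoisted = replace(op, dest=fresh)                      # fresh = opcode(srcs), speculative
    copy = Op(dest=op.dest, opcode="copy", srcs=(fresh,))  # dest = fresh, original path only
    return hoisted, copy


add = Op("r1", "add", ("r2", "r3"))
free = ["r9"]
hoisted, copy = hoist_with_renaming(add, frozenset({"r1"}), free)
# hoisted computes into r9 above the branch; copy restores r1 on the original path.
```

The copy left behind is exactly the kind of copy that later register coalescing tries to eliminate.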
S.-M. Moon and K. Ebcioglu. Parallelizing Nonnumerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
....copy-related nodes) with a single register if they do not interfere. On the interference graph, this is achieved by coalescing the two nodes into a single node whose interference edges are the union of theirs. Since many optimization phases before register allocation, including instruction scheduling [4], store-to-copy promotion [5], and static single assignment (SSA) translation [6], leave behind many copies that would slow down program execution, it is essential to minimize these copies. Coalescing, on the other hand, may affect the colorability of the interference graph. Since a coalesced node ....
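Coalescing two non-interfering, copy-related nodes can be sketched on an adjacency-set interference graph. Since the snippet notes that coalescing may hurt colorability, the sketch also includes Briggs' conservative test, one standard heuristic for deciding when a merge is safe; the snippet does not say which test the citing paper uses. The graph representation and the names `coalesce` and `briggs_safe` are assumptions made for illustration.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected: b in g[a] iff a in g[b]


def briggs_safe(g: Graph, a: str, b: str, k: int) -> bool:
    """Briggs' test: merging a and b is safe for k colors if the merged
    node has fewer than k neighbors of significant degree (>= k)."""
    neighbors = (g[a] | g[b]) - {a, b}
    significant = 0
    for n in neighbors:
        deg = len(g[n])
        # A neighbor adjacent to both a and b loses one edge after merging.
        if a in g[n] and b in g[n]:
            deg -= 1
        if deg >= k:
            significant += 1
    return significant < k


def coalesce(g: Graph, a: str, b: str) -> Graph:
    """Merge b into a; the merged node gets the union of their edges."""
    if b in g[a]:
        raise ValueError("interfering nodes cannot be coalesced")
    merged = {n: set(adj) for n, adj in g.items() if n != b}
    merged[a] |= g[b]
    for n in g[b]:
        merged[n].discard(b)
        merged[n].add(a)
    return merged


# a and b are copy-related and do not interfere.
g = {"a": {"c"}, "b": {"d"}, "c": {"a"}, "d": {"b"}}
if briggs_safe(g, "a", "b", k=2):
    g = coalesce(g, "a", "b")  # now a interferes with both c and d
```

A conservative test like this may refuse merges that would in fact be colorable, trading some leftover copies for a guarantee that coalescing does not introduce spills.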
....on a SPARC-based VLIW testbed [16]. The input C code is compiled to optimized SPARC assembly code (without register windows) by the gcc compiler. The SPARC-based assembly code is scheduled into high-performance VLIW code by a software pipelining technique called enhanced pipeline scheduling (EPS) [4]. The final VLIW code is simulated, producing execution results. Our benchmarks are composed of seven non-trivial integer programs listed in Table 1. The resource constraint of the VLIW machine is 16 ALUs and 8-way branching. The machine is assumed to have 32 general-purpose registers, 16 ....
S.-M. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM TOPLAS, Vol 19, No. 6, pages 853--898, Nov. 1997.
....nodes) with a single register if they do not interfere. On the interference graph, this is implemented by coalescing the two nodes into a single node whose interference edges are the union of theirs. Since many optimization phases before register allocation, including instruction scheduling [4], store-to-copy promotion [5], and static single assignment (SSA) translation [6], leave behind many copies that slow down program execution, it is essential to minimize those copies. Coalescing may affect the colorability of the interference graph. Since a coalesced node will have the union of ....
....[13]. The input C code is compiled into optimized SPARC assembly code (without register windows) by the gcc compiler. The SPARC-based assembly code is scheduled into high-performance VLIW code by aggressive scheduling techniques such as software pipelining, all-path code motion, and renaming [4]. The final VLIW code is simulated, producing execution results. Our benchmarks are composed of seven non-trivial integer programs: eqntott, espresso, li, compress, yacc, sed, and gzip. The resource constraint of the VLIW machine is 16 ALUs and 8-way branching. The machine has 32 ....
S.-M. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM TOPLAS, Vol 19, No. 6, pages 853--898, Nov. 1997.
No context found.
S. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
No context found.
S. Moon and K. Ebcioglu. Parallelizing nonnumerical code with selective scheduling and software pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, November 1997.
No context found.
S.-M. Moon and K. Ebcioglu. Parallelizing Non-numerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
No context found.
S.-M. Moon and K. Ebcioglu. Parallelizing Non-numerical Code with Selective Scheduling and Software Pipelining. ACM Transactions on Programming Languages and Systems, 19(6):853--898, 1997.
CiteSeer.IST - Copyright Penn State and NEC