D. Pnevmatikatos and G. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, 1994.
....Both these factors severely limit the instruction level parallelism that can be exploited by superscalar processors. It is clear that to overcome the above bottleneck, either (i) the number of branches encountered by the hardware has to be reduced (through techniques such as guarding [6] [11]) or (ii) the target information has to be associated with higher level blocks (as opposed to associating them with the branches) and the outcomes of multiple branches need to be predicted in a cycle. This paper focuses on the second approach. The approach considered in [10] called control ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded Execution and Branch Prediction in Dynamic ILP Processors," Proceedings of the 21st International Symposium on Computer Architecture, 1994.
....one more complication that should be resolved for successfully filling an instruction of wd for the delay slot of the indirect jump. The RTLs shown in Figure 6.11(a) depict the restructured code after branch coalescing transformation occurs for the example C code. Note that hardware register r[24] contains the temporary value of wd. It appears that r[24] r[24] 1 cannot be filled for the delay slot, since r[24] is both set and referenced among targets of the indirect jump. However, r[24] r[24] 1 can be filled for the following reasons: • r[24] r[24] 1 has no dependency with other ....
....filling an instruction of wd for the delay slot of the indirect jump. The RTLs shown in Figure 6.11(a) depict the restructured code after branch coalescing transformation occurs for the example C code. Note that hardware register r[24] contains the temporary value of wd. It appears that r[24]=r[24] 1 cannot be filled for the delay slot, since r[24] is both set and referenced among targets of the indirect jump. However, r[24] r[24] 1 can be filled for the following reasons: • r[24] r[24] 1 has no dependency with other instructions within the indirect jump target block containing the ....
[Article contains additional citation context not shown here]
D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, April 1994.
.... be combined with other approaches to speculative execution like [15] and [12] and also integrates nicely with architectural support for conditional (guarded, predicated) execution of operations [10] which both of the above VLIW DSPs provide and which can be efficiently supported by compilers [11, 4]. We will first explain how MPP allows the average loop execution time to be minimized, then we will show how to apply the new speculation strategy to the VLIW DSPs as a source level transformation strategy. The results show that a significant reduction in memory size and/or a decrease in execution ....
D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In ISCA, pages 120--129, 1994.
....from modifying the processor state. The removal of branches yields performance benefits in the execution of the ILP code, the most notable of which is the elimination of branch misprediction penalties. In particular, the removal of frequently mispredicted branches yields large performance gains [13], [14], [15]. Predicated execution also provides an efficient mechanism by which a compiler can explicitly present the overlapping execution of multiple control paths to the hardware. In this manner, processor performance is increased by the compiler's ability to find ILP across distant multiple ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, April 1994, pp. 120-129.
....values are referenced, losing some or all of the benefits of control independence [103,113]. This is only true for designs without selective reissuing capability, e.g. large threads may preclude being selective. In a sense, this approach to control independence more closely resembles guarding [3,76], which converts control dependences into data dependences. This is not always the case: Vijaykumar, Breach, and Sohi [113] study several register forwarding strategies, including speculative register forwarding guided by (compiler) profiling to reduce the frequency of task squashes. The more ....
....managed in a fifo queue so CGCI is not explicitly exploited. A more sophisticated treatment of trace processor control independence is presented in [86] and is the direct basis for the control independence concepts presented in this thesis. 2.4.5 Predication and multi-path execution Predication [3,76,53,4] and selective multi-path execution [111,34,110,44,116,1] attempt to identify hard to predict branches, either through profiling or branch confidence estimators (respectively) and fetch both paths of these branches. In the case of multi-path execution, both paths are fully renamed and executed as ....
D. Pnevmatikatos and G. Sohi. Guarded Execution and Branch Prediction in Dynamic ILP Processors. 21st International Symposium on Computer Architecture, April 1994.
....when incorrect values are referenced, losing some or all of the benefits of control independence. This is only true for designs without selective reissuing capability, e.g. large threads may preclude being selective. In a sense, this approach to control independence more closely resembles guarding [36,37,8,9], which shifts the problem of control flow to data flow. But clearly these are not fundamental restrictions [38] conservatism reflects a simpler and perhaps more practical design. B.3 Other misprediction tolerant solutions B.3.1 Instruction reuse Instruction reuse [18] is a mechanism that ....
....repair models than those proposed in this report are possible. However, re-fetching may be necessary for maintaining high prediction accuracy; this was discussed in Appendix A.3.2 in terms of the need for re-predict sequences. B.3.2 Predication and selective multi-path execution Predication [36,37,8,9] and selective multi-path execution [2,3,4,5,6,7] attempt to identify hard to predict branches, either through profiling or branch confidence estimators (respectively) and fetch both paths of these branches. In the case of multi-path execution, both paths are fully renamed and executed as ....
D. Pnevmatikatos and G. Sohi. Guarded execution and branch prediction in dynamic ILP processors. 21st Intl. Symp. on Computer Architecture, April 1994.
....set of instructions. The most notable modification of predication to the instruction set encoding format is the addition of the predicate operand source for every instruction. The predicate operand increases the instruction size and has significant effects on overall program code size. One model [10] proposes a new set of predicate guarding instructions that would reduce the drawback of existing methods of specifying predicated execution through the use of predicate mask setting instructions. Although the mechanism is useful in reducing the predicate operand overhead, the general mechanism ....
D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, April 1994.
....predicates with the predicated instructions. One approach is to make the predicate an operand on the predicated instruction [11]. Another approach is to have some kind of mask register which holds the predicates and associates the predicates with the instructions based on instruction arrangement [12]. The first approach will be referred to as operand specified predicates (OSP) and the second approach will be referred to as mask specified predicates (MSP). OSP is the more flexible of the two approaches. This approach requires a register to hold the predicate and a source operand on the ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129, April 1994.
....may result from branches is becoming increasingly critical as microprocessor designs implement greater degrees of instruction level parallelism. There are several techniques for reducing branch penalties including guarded execution, basic block enlargement, and static and dynamic branch prediction [12, 7, 14, 4, 19, 11]. Among these, dynamic branch prediction is perhaps the most popular, because it yields good results and can be implemented without changes to the instruction set architecture or pre existing binaries. The strength of dynamic branch prediction is that it can track branch behavior closely at ....
Pnevmatikatos, D.N. and Sohi, G.S. Guarded Execution and Branch Prediction in Dynamic ILP Processors. Proc. of the 21st Ann. Int. Symp. on Computer Architecture, Apr. 1994, pp. 120-129.
....skip instructions improve application performance by 1% to 4%. This assumes that their compiler can make optimal use of these instructions, which it cannot do at this point. Currently, conditional skips can only be found in specialized, hand coded assembly routines. Pnevmatikatos and Sohi [6] present a method for the full support of predicated execution by only making a small extension to an existing instruction set architecture. They suggest using a GUARD instruction whose operands are a predicate and a predicate mask specifying which of the following instructions are guarded by that ....
....instruction set, 8 GUARD TRUE and GUARD FALSE instructions will be able to predicate about 20 instructions while the GUARD BOTH instruction will only be able to guard 12. But, they show that the GUARD BOTH introduces less overhead than the separate form of the instruction. A few studies [5][6][7] on the effects of predicated execution on branch prediction have been performed. Mahlke et al. [5] examine the effects of hyperblock techniques on the number of branches and mispredictions on three branch prediction schemes: a 1K entry branch target buffer (BTB) with 2 bit saturating counters, ....
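The mask-based guarding described in the excerpt above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the names `run_guarded` and `set_reg`, the mask encoding (bit i covers the i-th following instruction), and the rule that a guarded instruction is skipped when the predicate is false are all hypothetical and are not taken from the cited paper.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]
Instr = Callable[[Regs], None]


def set_reg(name: str, value: int) -> Instr:
    """Build a toy instruction that writes a constant into a register."""
    def op(regs: Regs) -> None:
        regs[name] = value
    return op


def run_guarded(regs: Regs, predicate: bool, mask: int,
                instrs: List[Instr]) -> Regs:
    """Run instrs after a hypothetical GUARD(predicate, mask).

    Assumed semantics: instruction i is guarded when bit i of mask is set.
    Guarded instructions take effect only if predicate is true; unguarded
    instructions always execute.
    """
    out = dict(regs)  # work on a copy so the caller's state is untouched
    for i, instr in enumerate(instrs):
        is_guarded = bool((mask >> i) & 1)
        if is_guarded and not predicate:
            continue  # nullified: behaves like a no-op
        instr(out)
    return out


# Three instructions follow the guard; the mask 0b011 guards the first two.
block = [set_reg("r1", 10), set_reg("r2", 20), set_reg("r3", 30)]
start = {"r1": 0, "r2": 0, "r3": 0}

taken = run_guarded(start, True, 0b011, block)
skipped = run_guarded(start, False, 0b011, block)
```

With a true predicate all three writes commit; with a false predicate only the unguarded write to `r3` commits, which is the effect a branch around the first two instructions would have had.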
[Article contains additional citation context not shown here]
D. N. Pnevmatikatos and G. S. Sohi. "Guarded Execution and Branch Prediction in Dynamic ILP Processors." In Proceedings of the 21st Annual Symposium on Computer Architecture. pp. 120-129. April 1994.
....field for each instruction, as was done in the Cydra 5. Another way is to introduce a special instruction which controls the conditional execution of following (non predicated) instructions. An example of this approach is seen in the guarded execution model proposed by Pnevmatikatos and Sohi [PnSo94], which includes special instructions whose execution specify whether following instructions should or should not be executed. This section will present 4 different existing predication models, those used in the Alpha processor, the HP RISC, guarded execution model and the Cydra 5. The Alpha ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded Execution and Branch Prediction in Dynamic ILP Processors", Proceedings of the 21st Annual Symposium on Computer Architecture, Chicago, Illinois (April 18-21, 1994), pp. 120-129.
....With specialized BTB support for indirect jumps [Chang et al. 1997] even better results should be obtained. Some machines provide other special architectural support for speculative execution of instructions dependent on branches, such as boosting [Smith et al. 1990] and predicated execution [Pnevmatikatos and Sohi 1994; Mahlke et al. 1994]. The relative cost of an indirect jump versus the set of branches it replaces will be affected by such support. The compiler writer must use appropriate cost estimates based on the architectural support available for branches and indirect jumps on the target machine. An ....
Pnevmatikatos, D. N. and Sohi, G. S. 1994. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129.
....allows the fetch engine to fetch straight line code. Recent work has characterized how extensively predication should be used [60], explored techniques for better analysis of predication opportunities [40], and explored the relationship between predication and branch prediction efficacy [59, 72, 101]. Compiler analysis also permits static branch prediction. When combined with profiling and/or branch history information like the history that dynamic two level predictors use, the compiler can often do an effective job of statically predicting branch outcomes, as ....
D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 120--29, Apr. 1994.
.... predicate defining instructions and predicating the original instructions appropriately [12], [13], [7]. By eliminating branches, if-conversion may lead to a substantial reduction in branch prediction misses and a reduced need to handle multiple branches per cycle for wide issue processors [14][15][16]. Even though predicated execution has been shown to greatly improve the branch characteristics of programs, many situations still exist in which branches remain problematic. In fact, the removal of some branches by if-conversion may adversely affect the predictability of other remaining ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129, April 1994.
....gets moved. Static speculation has been applied in scheduling for superscalar, VLIW and superpipelined machines. Techniques like global scheduling [15], percolation scheduling [76], trace scheduling [34], boosting [15], [92], hyperblock scheduling [16], superblock scheduling [17], guarded execution [82], sentinel scheduling [66] and modulo scheduling [83] employ varying degrees of static speculation. Utilizing speculative execution in instruction scheduling raises two issues: (i) the decision of when to perform computation under speculation, and (ii) recovery from incorrect speculation. The ....
D. Pnevmatikatos and G. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Conference Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 120--129. Association for Computing Machinery, Apr. 1994.
.... Predication allows the compiler to overlap the execution of independent control constructs without code explosion [12]. It also enables the compiler to reduce the frequency of branch instructions, to reduce branch mispredictions, and to perform sophisticated control flow optimizations [16][19][23]. Predication does this at the cost of increased fetch utilization. Control speculation allows the compiler to judiciously eliminate control dependences at the cost of increased register consumption and instruction overhead [14], [21]. Data dependence speculation enables the compiler to overcome ....
D. N. Pnevmatikatos and G. S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, April 1994.
....using the predicated representation. In addition, the removal of branches yields performance benefits in the executed code, the most notable of which is the removal of branch misprediction penalties. In particular, the removal of frequently mispredicted branches yields large performance gains [9][10][11]. Predicated execution also provides an efficient mechanism for a compiler to overlap the execution of multiple control paths on the hardware. In this manner, processor performance may be increased by exploiting ILP across multiple program paths. Another, more subtle, benefit of ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129, April 1994.
....supporting full or partial predicate support is clearly a choice that is available. Varying levels of partial predicate support provide options for extending an existing ISA. For example, introducing guard instructions which hold the predicate specifiers of subsequent instructions may be utilized [14]. 2 ISA Extensions In this section, a set of extensions to the instruction set architecture for both full and partial predicate support are presented. The baseline architecture assumed is generic ILP processor (either VLIW or superscalar) with in order issue and register interlocking. A generic ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129, April 1994.
....field for each instruction, as was done in the Cydra 5. Another way is to introduce a special instruction which controls the conditional execution of following (non predicated) instructions. An example of this approach is seen in the guarded execution model proposed by Pnevmatikatos and Sohi [PnSo94], which includes special instructions whose execution specify whether following instructions should or should not be executed. This section will present 4 different existing predication models, those used in the Alpha processor, the HP RISC, guarded execution model and the Cydra 5. The Alpha ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded Execution and Branch Prediction in Dynamic ILP Processors", Proceedings of the 21st Annual Symposium on Computer Architecture, Chicago, Illinois (April 18-21, 1994), pp. 120-129.
....execution paths are being combined. Also, the elimination of difficult to predict branches improves the performance of the branch prediction mechanism used by the processor. A detailed discussion of these effects is beyond the scope of this paper; the interested reader is referred to [34][35][36] for more details. 4 Case Study II: Memory Dependence Information for ILP Compilation Various techniques have been proposed to provide an accurate analysis of memory references. Data dependence analysis attempts to determine the nature of the dependence relationship between pairs of memory ....
D. N. Pnevmatikatos and G. S. Sohi, "Guarded execution and branch prediction in dynamic ILP processors," in Proceedings of the 21st International Symposium on Computer Architecture, pp. 120--129, April 1994.
....added more correlation. In a sense then, correlation is good in terms of our metric: it tends to concentrate the misses at a small subset of the branch sites. 5 Related Work There have been many studies of branch prediction behavior [Smi81], [YP92], many as part of papers describing new algorithms [PS93]. Our contribution in this regard is to verify the results for correlating branch predictors [PKR92] and for available misprediction windows (e.g. for trace scheduling) [PFS93]. We have also further characterized miss sites as individual locations in the source programs, as opposed to the aggregate ....
D.N. Pnevmatikatos and G.S. Sohi, "Guarded Execution and Branch Prediction in Dynamic ILP Processors." Univ. of Wisconsin-Madison Technical Report #1193. November, 1993.
....which are actually taken at run time. To improve performance for processors with a high issue rate on programs with a low prediction accuracy, one may consider using multiple path speculative execution to exploit ILP from all control paths. Predicated execution [11, 12, 15] or guarded execution [6, 13] has been used as a way to eliminate branches from instruction streams. In predicated execution, a conditional branch is converted into a predicate defining instruction. Instructions which are control dependent on the conditional branch are then guarded by the predicate. By eliminating branch ....
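The if-conversion idea summarized in the excerpt above (a branch becomes a predicate-defining instruction, and the control-dependent instructions are guarded by that predicate) can be sketched in Python as follows. This is a toy model, not the cited authors' mechanism: the function names and the `guarded_write` helper are hypothetical, and Python's conditional expression merely stands in for a hardware-nullified write.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    """Original form: the two assignments are control dependent on a branch."""
    if a > b:
        result = a - b
    else:
        result = b - a
    return result


def guarded_write(pred: bool, old: int, new: int) -> int:
    """Model a guarded register write: commit `new` only when `pred` holds."""
    return new if pred else old


def if_converted_abs_diff(a: int, b: int) -> int:
    """If-converted form: straight-line code with no control-flow split."""
    p = a > b                                     # predicate-defining instruction
    result = 0
    result = guarded_write(p, result, a - b)      # guarded by p
    result = guarded_write(not p, result, b - a)  # guarded by the complement of p
    return result
```

Both versions compute the same value; the second issues the instructions from both paths and lets the predicate decide which write takes effect, which is the trade of extra fetched instructions for fewer branches that the excerpts describe.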
D. N. Pnevmatikatos and G. S. Sohi. Guarded Execution and Branch Prediction in Dynamic ILP Processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, April 1994.
....was used in the Cydra 5 [18] architecture. Mahlke et al. [15] proposed hyperblocks, or superblock scheduling extended to support predicated architectures. Tyson [23] and Mahlke et al. [13] studied the potential benefits of predication on branch prediction accuracy. Pnevmatikatos and Sohi [16] proposed guarded execution, where a single instruction specified the predication information for subsequent instructions using a bit mask. Tyson studied both partial predication, where an architecture supports only a few, limited predicated instructions such as conditional moves, and full ....
D. N. Pnevmatikatos and G. S. Sohi. Guarded Execution and Branch Prediction in Dynamic ILP Processors. In 21st Intl. Symp. on Computer Architecture, pages 120--129, April 1994.
....instruction with a forward bit can be accomplished in a variety of ways. The two ways that appear most promising are (i) using special opcodes for such instructions (which we call op and send instructions) and (ii) using a specially inserted bit mask, similar to the GUARD instructions proposed in [6]. If the path of control in a task is complex, the situation becomes more involved. The instructions may be tagged with forward bits as before, but the determination of the create mask may be somewhat problematic. As the dynamic path through the task is unknown, the create mask must allow for all ....
D.N. Pnevmatikatos and G.S. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proc. of ISCA 21, pages 120--129, April 1994.
No context found.
D. Pnevmatikatos and G. Sohi. Guarded execution and branch prediction in dynamic ILP processors. In Proceedings of the 21st International Symposium on Computer Architecture, pages 120--129, 1994.