A. M. Holler. Optimization for a superscalar out-of-order machine. In Proceedings of the 29th International Symposium on Microarchitecture, pages 336--348, December 1996.
.... poorly for indirect jumps, since the target of an indirect jump can change with every dynamic instance of that branch [9, 30]. In fact, some compilers provide techniques that insert extra conditional branches that check for likely targets to avoid the execution of indirect jumps from a table [17] or indirect calls [7]. Most modern architectures seldom support indirect jumps in the BTB due to such poor misprediction ratios for indirect jumps. However, consider the results shown in Figure 7.1. An UltraSPARC 1 could execute about eight pairs of compare and branch instructions in the time ....
A. M. Holler. Optimization for a superscalar out-of-order machine. In Proceedings of the 29th International Symposium on Microarchitecture, pages 336--348, December 1996.
....optimizations have evolved to incorporate profile information in the optimization analysis. For example, profiles are used to choose between several optimization plans based on the expected run time benefit [1, 2, 13, 23] or to specify the optimization scope by directing procedure inlining [14] or scheduling [15, 20]. While code optimizations use profiles to complement static program analysis, most memory system optimizations that attempt to improve a program's data cache performance by changing the data layout rely primarily on profiles [4, 6, 7, 9, 17, 26]. The success of many ....
A. M. Holler, "Optimization for a superscalar out-of-order machine." In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture, 1996.
....
Operation | Interval a | Interval b | Infimum inf(c) | Supremum sup(c) | Example (a, b, c)
Addition | any | any | inf(a) + inf(b) | sup(a) + sup(b) | [2,3], [4,5], [6,8]
Subtraction | any | any | inf(a) - sup(b) | sup(a) - inf(b) | [2,3], [4,5], [-3,-1]
Multiplication | positive | positive | inf(a) × inf(b) | sup(a) × sup(b) | [2,3], [3,4], [6,12]
 | positive | negative | sup(a) × inf(b) | inf(a) × sup(b) | [2,3], [-2,-1], [-6,-2]
 | positive | crosses zero | sup(a) × inf(b) | sup(a) × sup(b) | [2,3], [-2,1], [-6,3]
 | negative | positive | inf(a) × sup(b) | sup(a) × inf(b) | [-3,-2], [3,4], [-12,-6]
 | negative | negative | sup(a) × sup(b) | inf(a) × inf(b) | [-3,-2] ....
....
 | positive | positive | inf(a) × inf(b) | sup(a) × sup(b) | [2,3], [3,4], [6,12]
 | positive | negative | sup(a) × inf(b) | inf(a) × sup(b) | [2,3], [-2,-1], [-6,-2]
 | positive | crosses zero | sup(a) × inf(b) | sup(a) × sup(b) | [2,3], [-2,1], [-6,3]
 | negative | positive | inf(a) × sup(b) | sup(a) × inf(b) | [-3,-2], [3,4], [-12,-6]
 | negative | negative | sup(a) × sup(b) | inf(a) × inf(b) | [-3,-2], [-2,-1], [2,6]
 | negative | crosses zero | inf(a) × sup(b) | inf(a) × inf(b) | [-3,-2], [-2,2], [-6,6]
 | crosses zero | positive | inf(a) × sup(b) | sup(a) × sup(b) | [-1,2], [3,4], [-4,8]
 | crosses zero | negative | sup(a) × inf(b) | inf(a) × inf(b) | ....
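The interval rules quoted in the context above can be checked with a small sketch. This is not code from the citing paper: the `Interval` class and its field names are hypothetical, and multiplication here takes the minimum and maximum of the four endpoint products rather than selecting endpoints by sign case as the quoted table does. The two approaches should give the same bounds for closed intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi]; lo plays the role of inf, hi of sup."""
    lo: float
    hi: float

    def __post_init__(self):
        if self.lo > self.hi:
            raise ValueError("infimum must not exceed supremum")

    def __add__(self, other):
        # inf(c) = inf(a) + inf(b), sup(c) = sup(a) + sup(b)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # inf(c) = inf(a) - sup(b), sup(c) = sup(a) - inf(b)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Assumption: instead of a sign-case table, take the extremes
        # of all four endpoint products.
        products = (
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        )
        return Interval(min(products), max(products))


if __name__ == "__main__":
    print(Interval(2, 3) + Interval(4, 5))
    print(Interval(2, 3) - Interval(4, 5))
    print(Interval(-3, -2) * Interval(-2, 2))
```

The sign-case table avoids computing all four products, which matters when each multiplication is costly; the min/max form trades that saving for brevity.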
[Article contains additional citation context not shown here]
A. M. Holler, "Optimization for a Superscalar Out-of-order Machine," Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 336-348, 1996.
....provided that such instructions are committed to memory in order, and runtime dependences are detected before instructions commit. If a runtime dependence violation is detected, then instructions are re-executed from the point where the dependence violation occurred, as with mispredicted branches [10]. This work supported in part by NSF grant ASC 9624987. 1 These advances in hardware have the goal of increasing the mean number of Instructions executed Per Cycle (IPC). The effect of some of these mechanisms is fairly obvious. For example, run-time register renaming eliminates the need for a ....
Anne Holler. Optimization for a superscalar out-of-order machine. In Micro-29, pages 336--348, 1996.
....more redundant computations can be removed than when we rely on a lexical analysis. By using names created through back substitution across multiple loop iterations, VNGPRE subsumes predictive commoning, an optimization aimed at removing common subexpressions recurring across loop iterations [Hol96, OHM 91]. VNG is also being used to carry out PRE of load store operations; it is worth noting that the popular (cf. [Hol96]) algorithm in [CK94] computes 11 data flow problems, while the VNG is able to subsume this optimization by solving only three problems. Besides supporting powerful ....
.... across multiple loop iterations, VNGPRE subsumes predictive commoning, an optimization aimed at removing common subexpressions recurring across loop iterations [Hol96, OHM 91]. VNG is also being used to carry out PRE of load store operations; it is worth noting that the popular (cf. [Hol96]) algorithm in [CK94] computes 11 data flow problems, while the VNG is able to subsume this optimization by solving only three problems. Besides supporting powerful value flow analysis, the approach of defining the name space with the analysis problem in mind allows distributive formulation of ....
Anne M. Holler. Optimization for a superscalar out-of-order machine. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 336--348, December 1996.
.... [Fig. 23. Code for Measuring the Execution Time for Loop Overhead and Loop Body] In fact, some compilers provide techniques that insert extra conditional branches that check for likely targets to avoid the execution of indirect jumps from a table [Holler 1996] or indirect calls [Calder and Grunwald 1994]. Most modern architectures seldom support indirect jumps with a BTB due to poor misprediction ratios for indirect jumps. However, consider the results shown in Table IV. An UltraSPARC 1 could execute about eight pairs of compare and branch instructions ....
Holler, A. M. 1996. Optimization for a superscalar out-of-order machine. In Proceedings of the 29th International Symposium on Microarchitecture, pp. 336--348.
.... a version of PRE that is more powerful than the best existing algorithms [30, 40]. By using names created through back substitution across multiple loop iterations, VNGPRE subsumes predictive commoning, an optimization aimed at removing common subexpressions recurring across loop iterations [26, 34]. VNG is also being used to carry out PRE of load store operations; it is worth noting that the popular (cf. [26]) algorithm in [11] computes 11 data flow problems, while the VNG is able to subsume this optimization by solving only three problems. We are currently considering the use of VNG for ....
.... back substitution across multiple loop iterations, VNGPRE subsumes predictive commoning, an optimization aimed at removing common subexpressions recurring across loop iterations [26, 34]. VNG is also being used to carry out PRE of load store operations; it is worth noting that the popular (cf. [26]) algorithm in [11] computes 11 data flow problems, while the VNG is able to subsume this optimization by solving only three problems. We are currently considering the use of VNG for array bound check optimizations and for synthesizing run time memory disambiguation. Distributive formulations of ....
Anne M. Holler. Optimization for a superscalar out-of-order machine. In Proceedings of the 29th Annual International Symposium on Microarchitecture, pages 336--348, December 1996.