| Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of the ACM SIGPLAN 1992. |
....several possible reasons for this: 1. GCC may be able to take advantage of the express lane transformation to perform its own optimizations (e.g. code layout [7]). 2. Reduction of the hot path graphs may result in poorer code layout that requires more unconditional jumps along critical paths [12]. 3. The more aggressive reduction strategies seek only to preserve decided branches, and may destroy data flow facts that show an expression to have a constant value. 4. The code layout for the reduced graph may interact poorly with the I-cache. The results shown in Tables 4 and 5 are often ....
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In PLDI92.
....redundancy that is apparent by examining multiple basic blocks along a path, as opposed to a redundancy due to a single basic block detected in their analysis. In addition, in ICBE, the analysis cost and the code growth incurred due to program restructuring can be controlled. Mueller and Whalley [MW92b] also investigated avoiding unconditional jumps by code replication. Krall [Kra94] developed code replication techniques to improve the accuracy of semi static branch prediction to the accuracy of dynamic prediction. 7.6.2 Other benefits of entry exit splitting The primary benefit of ICBE is ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. SIGPLAN Notices, 27(7):322--330, July 1992. Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation.
....needs to be separated. A simple form of restructuring is tail duplication [HMC+92], which separates frequently executed paths to improve scheduling by separating control flow merge points. Restructuring is also necessary when redundant operations are unhoistable, such as unconditional branches [MW92a] or conditional branches [MW95a, BGS97a]. Gupta et al. apply control speculation, which is a transformation that inserts computations onto paths that did not compute these computations in the unoptimized program. As a result, some paths are optimized and some are impaired. To control the ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In Programming Language Design and Implementation Conference, pages 322--330. ACM SIGPLAN, ACM Press, June 1992.
....to an optimized code layout for our database workload (PostgreSQL 6.3 running the TPC-D benchmark). Using code reordering techniques we can obtain streams of average length 14, more than enough to feed even the more aggressive superscalar processor. Plus, the use of code replicating techniques [4, 9] can lead to much longer streams, and completely solve the problem of stream length. 1.2 Instruction cache misses Code reordering techniques also reduce the number of instruction cache misses [1, 2, 5, 7] by almost an order of magnitude. The same code reordering we need to enlarge our streams ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 322-330, 1992.
....always predict the same outcome for a given branch. This prediction was obtained either using very simple heuristics [23], static analysis [1], or profile information [5, 4]. The accuracy of static branch predictors can be increased using code transformations, which usually imply code replication [14, 27, 9, 13, 16], and branch alignment [3]. This branch alignment is nothing but a code reordering optimization which targets an increase in the static branch prediction accuracy: knowing the branch outcome, it is aligned to follow the heuristic implemented by the static predictor. As the transistor budget in ....
....branch is usually not taken [23]. In this work we examine how code layout optimizations targeting the fetch engine affect both static and dynamic branch prediction. There have been other code transformations proposed to improve static branch prediction accuracy, usually implying code replication [14, 27, 9, 13, 16]. These code transformations are beyond the scope of this work. Basic branch prediction techniques can also be broadly classified in three groups: static, semi-static, and dynamic predictors. Static prediction techniques are based solely on static analysis and simple prediction strategies, and ....
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, pages 322--330, 1992.
....spirit to Baker's problem in that she only structures the parts of the program that correspond naturally to structured control constructs. Mueller et al. present a compiler back end optimization method that attempts to eliminate unconditional branches, whether they originate from gotos or not [MW92]. The method eliminates almost all the unconditional branches by performing code duplication. It replaces each unconditional jump with the shortest possible sequence of instructions, to minimize growth in code size. It is implemented on an RTL intermediate representation in the early stages of the ....
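The "shortest possible sequence of instructions" selection described in the snippet above can be illustrated with a small sketch. This is not taken from the cited paper: it assumes a control flow graph given as a hypothetical `blocks` map (block name to instruction count) and `succs` map (block name to successors), and it assumes replication may stop at any block in a caller-supplied `stops` set (for example, blocks ending in a return). It uses an ordinary Dijkstra-style search, chosen here for illustration only.

```python
import heapq


def shortest_replication_path(blocks, succs, target, stops):
    """Find the cheapest block sequence to copy in place of a jump to `target`.

    blocks: dict mapping block name -> instruction count (hypothetical input)
    succs:  dict mapping block name -> list of successor block names
    stops:  set of block names where replication may end (an assumption,
            e.g. blocks that end in a return)
    Returns (total_instructions, [block names]) or None if no stop is reachable.
    """
    # Each heap entry is (instructions copied so far, path of blocks).
    heap = [(blocks[target], [target])]
    best = {}
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in stops:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already reached this block at least as cheaply
        best[node] = cost
        for nxt in succs.get(node, []):
            if nxt not in path:  # do not copy the same block twice on one path
                heapq.heappush(heap, (cost + blocks[nxt], path + [nxt]))
    return None


# Example: a jump to A can reach the return block R through B or through C.
blocks = {"A": 3, "B": 10, "C": 2, "R": 1}
succs = {"A": ["B", "C"], "B": ["R"], "C": ["R"]}
result = shortest_replication_path(blocks, succs, "A", {"R"})
```

In this example the path through C is chosen because it copies fewer instructions than the path through B, which is the code-growth trade-off the snippet mentions.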
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 322--330, San Francisco, California, June 17--19, 1992. ACM SIGPLAN. SIGPLAN Notices, 27(7), July 1992.
....can be applied. A simple form of restructuring is tail duplication [HMC+92], which separates frequently executed paths to improve scheduling by separating control flow merge points. Restructuring is also necessary when redundant operations are unhoistable, such as unconditional branches [MW92] or conditional branches [MW95, BGS97a]. Gupta et al. apply control speculation, which is a form of code motion that may impair certain paths. They use program profile to control the impairment [GBF98]. A more complete PRE by means of control speculation was also explored in [HH97, CKL+98] ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In Programming Language Design and Implementation Conference, pages 322--330. ACM SIGPLAN, ACM Press, June 1992.
....replicates a piece of code, so that the branches in the replicated code pieces are more predictable than in the original code. This idea was inspired by the work of Pettis and Hanson [PH90] who use profiling for code positioning to improve cache behaviour, and by the work of Mueller and Whalley [MW92], who use code replication to avoid jumps. Chapter 2 presents existing branch prediction methods. Chapter 3 contains a description of our profiling tool and presents the results of profiling our benchmark suite. Chapter 4 describes the techniques for compacting the collected history information ....
....from correlated branches to the branch to be predicted. The correlated branch state machine is the set of those paths which give the lowest misprediction rate. One state covers the case where the control flow matches none of the paths. The code replication for correlated branches is similar to [MW92]. The difference is that our aim was to save information about the branch direction, whereas the aim of Mueller and Whalley was to avoid unconditional jumps. Table 4 gives the misprediction rate of correlated branches. We used a maximum path length of n - 1 for an n-state machine to keep the ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In 1992 SIGPLAN Conference on Programming Language Design and Implementation. ACM, June 1992.
....rearrange nonoptimal code sequences and identify potential lookup tables in order to generate efficient code. In addition to incorporating many standard optimizations found in traditional compilers, the BPF optimizer introduces a novel application of redundant predicate elimination [17, 22]. This optimization is rarely found in compilers for traditional languages like C or Java because redundant predicates do not occur very often and the optimization would not be very profitable. However, in the domain of packet filter compilation, BPF's naive code generator produces decision trees ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 322--330, June 1992.
....the improver that inserts these instructions can often be made machine independent. For example, most machines support register to register copies, register indirect addressing mode, and register to register ALU operations. A third advantage is that it is very simple to add new code improvements [MUEL92]. This advantage should not be taken .... [Listing residue: proc Improve is: BuildControlFlowGraph, ControlFlowTransformations, SetLocalLinks, InstructionSelection, EvaluationOrderDetermination, BuildDominatorTree, FindDominanceFrontiers, LiveVariableAnalysis, ....]
Mueller, F. and Whalley, D. B., 'Avoiding Unconditional Jumps by Code Replication,' Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, San Francisco, CA, June 1992, pp. 322--330.
.... [LFK+93]. The height and number of executed branches has also been reduced for while loops [SK95]. Optimizations for very specific code patterns have been used to eliminate conditional branches [GK92]. The number of executed conditional branches can also be reduced using code duplication [MW92, MW95, BGS95]. Finally, the number of executed branches can be reduced by reordering multi-way switch statements so commonly executed cases appear first [YUW98]. Scalar control CPR provides the potential for a compiler to more systematically treat a broader variety of scalar program branches ....
F. Mueller and D. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of PLDI92, pp. 322--330, June 1992.
....[21]. Instead of performing splitting in the front end, we could obtain similarly good code with back end optimization. Heintz [64] proposes such an optimization (called rerouting predecessors) for a Smalltalk compiler, and similar back end optimizations for conventional languages are well known [3, 10]. We have opted for splitting because it can achieve good code without extensive analysis and thus helps in building a fast compiler. 6.1.4 Uncommon traps In SELF, many operations have many different possible outcomes, and the compiler creates specialized code for the common cases in order to ....
Frank Mueller and David B. Whalley. Avoiding Unconditional Jumps by Code Replication. In Proceedings of the SIGPLAN '92 Conference on Programming Language Design and Implementation, p. 322-330. Published as SIGPLAN Notices 27(7), July 1992.
....interprocedural optimizations, has been described in various papers [9, 15, 22]. Loop unrolling [3] and software pipelining [17] are well known techniques for reducing loop overhead, improving register and data cache locality, and increasing instruction level parallelism. Mueller and Whalley [19] describe the elimination of unconditional jumps by code replication. They later extend these ideas to the elimination of conditional branches [20]. Chambers [6] describes the use of loop splitting to propagate operand types and hence aid optimization in the context of a dynamically typed ....
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 322--330, San Francisco, CA, June 1992.
....folding, and code motion permitted by path splitting. Note that the C continue statement is generally implemented by a jump to the bottom of the loop, where an exit condition is evaluated and a conditional branch taken to the top of the loop if the exit condition is false. Exploiting the ideas of [34], we save the additional jump to the loop test by duplicating the loop test at all points where a loop now ends. If the test evaluates to false, a jump is taken to the end of the entire loop tree so that execution can continue in the right place. There are other small advantages to be had from ....
....for potentially reducing loop overhead, improving register and data cache locality, and increasing instruction level parallelism. Code replication is fundamental in producing the long instruction sequences useful for compiling efficiently to VLIW architectures [2, 14, 16, 32]. Mueller and Whalley [34] describe the elimination of unconditional jumps by code replication. They present an algorithm for replacing unconditional jumps uniformly by replicating a sequence of instructions from the jump destination. This sequence is selected by following the shortest path from the destination to a ....
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 322--330, San Francisco, CA, June 1992.
....that is apparent by examining multiple basic blocks along a path, as opposed to a redundancy due to a single basic block detected in their analysis. In addition, in our technique, the analysis cost and the code growth incurred due to program restructuring can be controlled. Mueller and Whalley [18] also investigated avoiding unconditional jumps by code replication. Krall [16] developed code replication techniques to improve the accuracy of semi static branch prediction to the accuracy of dynamic prediction. This paper is organized as follows. The next section presents an example to motivate ....
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. SIGPLAN Notices, 27(7):322--330, July 1992. Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation.
....about the structure of a program, which contemporary compilers generally do not support for irreducible regions of code. In addition, aggressive global instruction scheduling, enhanced modulo scheduling [13], trace scheduling, and profile guided code positioning combined with code replication [8, 9] or applied during binary translation may result in branch reordering and code replication, which itself may introduce irreducible regions. This paper briefly discusses traditional loop splitting, contributes a new approach of optimized node splitting and reports on a performance study of these ....
....also measured the instruction cache performance for a 4kB and 512B direct mapped cache using VPO and EASE. The hit ratio did not change significantly (less than 1%) for the tested programs, regardless of the cache size. For changing code sizes, the cache work is often a more appropriate measurement [8], where a miss accounts for 10 cycles delay (for going to the next memory level) and a hit for one cycle. The methods of handling irreducible loops all resulted in reduced cache work for most cases, varying between a reduction of 6% and 28%. This reduction seems to indicate that execution in ....
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In ACM SIGPLAN Conf. on Programming Language Design and Impl., pages 322--330, June 1992.
....instruction scheduling may result in branch reordering and code replication, which itself may introduce irreducible regions. Other code optimization techniques, such as trace scheduling and profile guided code positioning for optimizing compilers [7, 9, 20], may be combined with code replication [18, 19] or applied during binary translation [3], which can result in irreducible loops. Common algorithms to detect loops only cover natural loops [1]. As a result, irreducible loops are generally missed and will be ignored during loop optimizations, which results in performance penalties for each loop ....
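The cache work measure in the snippet above weights each hit as one cycle and each miss as 10 cycles. A minimal sketch of that arithmetic follows; the function name `cache_work` and the sample counts are illustrative and do not come from the cited paper.

```python
def cache_work(hits, misses, hit_cycles=1, miss_cycles=10):
    """Total cycles spent on instruction fetches under the stated weights."""
    return hits * hit_cycles + misses * miss_cycles


# Illustrative counts only: 90 hits and 10 misses.
work = cache_work(90, 10)
```

Unlike the hit ratio, this measure grows with the total number of fetches, so it can register a change when the code size changes while the hit ratio stays nearly constant.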
F. Mueller and D. B. Whalley. Avoiding unconditional jumps by code replication. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 322-330, June 1992.
....also measured the instruction cache performance for a 4kB and 512B direct mapped cache using VPO and EASE. The hit ratio did not change significantly (less than 1%) for the tested programs, regardless of the cache size. For changing code sizes, the cache work is often a more appropriate measurement [17], where a miss accounts for 10 cycles delay (for going to the next memory level [12]) and a hit for one cycle. The methods of handling irreducible loops all resulted in reduced cache work for most cases, varying between a reduction of 6% and 28%. This reduction seems to indicate that execution in ....
F. Mueller. Avoiding unconditional jumps by code replication. Master's thesis, Dept. of CS, Florida State University, April 1991.
No context found.
Frank Mueller and David B. Whalley. Avoiding unconditional jumps by code replication. In Proceedings of the ACM SIGPLAN 1992.
No context found.
Frank Mueller and David B. Whalley, "Avoiding unconditional jumps by code replication," Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation, pp. 322--330, 1992.