Pohua Chang, et al., "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21, no. 12 (1991): 1301-1321.
....of the application. Several systems [26, 35, 34] have shown that performance can be improved substantially by exploiting invariant runtime values; however, these systems were not fully automatic and relied on programmer directives to identify regions of code to be optimized. Many researchers (e.g., [37, 40, 31, 18]) have focused on performing feedback-directed optimizations in which the application behavior is captured from a prior execution of the program because these profiling executions typically incur significant overhead. Although the resulting performance improvements are often promising, this ....
....frequencies of basic blocks can be easily derived from edge frequencies. Such profile information is useful for a variety of optimizations. It has been used offline in previous work for optimizations such as code reordering [40], instruction scheduling [31], and other classic code optimizations [18]. There are two subproblems involved in collecting edge profiles for the purpose of optimization: collecting the profiles, and making the profiles available to the client optimizations that use them. These two topics are discussed in the next two subsections, respectively. Collecting Edge ....
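The claim in the excerpt above, that basic-block frequencies follow from edge frequencies, can be illustrated with a minimal sketch. It assumes an edge profile stored as a dictionary from `(source, destination)` pairs to traversal counts; the function name `block_frequencies`, the block labels, and the separate `entry_count` argument are illustrative assumptions, not taken from any of the cited systems.

```python
from collections import defaultdict


def block_frequencies(edge_counts, entry, entry_count):
    """Derive basic-block execution counts from an edge profile.

    edge_counts maps (src, dst) -> number of times that edge was traversed.
    A block executes once for every traversal of one of its incoming edges.
    The entry block is also executed once per procedure invocation, which no
    intra-procedural edge records, so that count is supplied separately.
    """
    freq = defaultdict(int)
    for (src, dst), count in edge_counts.items():
        freq[src] += 0          # make sure blocks without incoming edges are listed
        freq[dst] += count      # each incoming traversal is one execution of dst
    freq[entry] += entry_count
    return dict(freq)


# Hypothetical diamond-shaped flow graph: A branches to B or C, both reach D.
edges = {("A", "B"): 70, ("A", "C"): 30, ("B", "D"): 70, ("C", "D"): 30}
blocks = block_frequencies(edges, entry="A", entry_count=100)
print(blocks)
```

The reverse derivation is not possible in general: block counts alone do not say how often each outgoing edge of a branch was taken, which is one reason edge profiles are the more common input to optimizers.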
Pohua P. Chang, Scott A. Mahlke, and Wen-Mei W. Hwu. Using profile information to assist classic code optimizations. Software---Practice and Experience, 21(12):1301--1321, December 1991.
....procedures, which can lead to more optimization opportunities [5] 2. We perform code transformation before performing data flow analysis. This allows us to use classic data flow analyses. 3. We guide path duplication using interprocedural path profiles. This point may sound redundant, but [7], for example, uses edge profiles to duplicate intraprocedural paths. The advantage of using interprocedural path profiles is that we get more accuracy in terms of which paths are important. 4. We perform interprocedural range analysis on the transformed graph. 5. We attempt to eliminate ....
....no reduction strategy is used to limit code growth. In fact, aggressive reduction strategies can destroy the performance gains. There are several possible reasons for this: 1. GCC may be able to take advantage of the express lane transformation to perform its own optimizations (e.g. code layout [7]) 2. Reduction of the hot path graphs may result in poorer code layout that requires more unconditional jumps along critical paths [12] 3. The more aggressive reduction strategies seek only to preserve decided branches, and may destroy data flow facts that show an expression to have a constant ....
P.P. Chang, S.A. Mahlke, and W.W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 21(12), Dec. 1991.
....were not fully automatic and relied on programmer directives to identify regions of code to be optimized. There exists a large body of work on collecting profiling information by performing instrumentation [25, 44, 4, 18, 17] as well as fully automatic optimizations based on instrumented profiles [34, 27, 46, 30, 31, 10, 62, 65, 53]. However, this work assumes the execution model where a profile can be collected offline, using a separate training run. Although the resulting speedups are often promising, this approach fails in scenarios where 1) it is impractical to collect a profile prior to execution, or 2) the application ....
....feedback directed optimizations (either offline or online) strives to improve performance by using profiling information to identify common execution patterns that can be exploited when performing optimizations. Many systems have demonstrated success using offline profiling data to perform FDO [26, 34, 27, 46, 30, 31, 10, 62, 65, 53]. However, online feedback directed optimization is more difficult than offline feedback directed optimization for several reasons. When using offline profile information, the profile is usually assumed to be free, accurate, and available prior to execution. With online profiling, none of these are ....
Pohua P. Chang, Scott A. Mahlke, and Wen-Mei W. Hwu. Using profile information to assist classic code optimizations. Software---Practice and Experience, 21(12):1301--1321, December 1991.
....4 followed by a description of how the proposed technique is applied in Section 5. Finally, Section 6 presents the experiments. 2. Related Work 2.1 Traditional Profiling Traditional profiling executes instrumented code and collects control flow characterization to assist in code optimization [2][16]. Profiling introduces significant performance overhead as evidenced by proposals for hardware assisted instrumentation or sampling based profiling schemes [9]. Such techniques rely upon execution of code on the actual system. Traditional profile data can be collected by executing a program, ....
P. Chang, S. Mahlke, W. Hwu, Using profile information to assist classic code optimization, Software Practice and Experience, V 21 n 12, Dec 1991
....this speedup. This means that we go beyond the best speedup observed with the naive approach while (at the same time) reducing by 75.4% the size of the compiled bytecode. 5 Related work Profiling information has been used for a long time to guide optimizers in static compilers. One trend of work [3, 4] uses profiling information to detect frequently executed scenarios and transform the program so as to be able to optimize the frequently executed code. More recently, profiling information has been used in dynamic optimization systems. Some work only uses it to detect so called hot spots [6, 10] ....
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 1991.
....defined by a set of states and a set of transitions. The basic blocks in a procedure represent the states of the Markov chain, and transitions are defined by the probabilities of branch outcomes in the control flow graph. These probabilities can be readily gleaned from the traditional edge profile [2] that measures the probability that one block flows to another assuming independent branch outcomes. For inter procedural control flow we model the effect of procedure calls with a restriction on transitions into and out of procedures such that a callee must return to its caller. In particular, ....
P. Chang, S. Mahlke, W. Hwu, Using Profile Information to Assist Classic Code Optimizations, Software Practice and Experience, vol. 21, no. 12, pp. 1301-1321, 1991.
....A Markov chain is defined by a set of states and transitions. The basic blocks in a procedure represent the states of the Markov chain, and transitions are defined by the probabilities of branch outcomes in the control flow graph. These probabilities can be gleaned from traditional edge profiling [2], which measures the frequency that one block flows to another. For inter procedural control flow, we summarize the effect of procedure calls when necessary by computing summaries for subroutines, or sets of mutually recursive functions. This is equivalent to restricting transitions into and out of ....
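The two excerpts above describe basic blocks as Markov-chain states whose transition probabilities come from an edge profile. A minimal sketch of that step follows, assuming the same `(source, destination) -> count` representation of the profile; `transition_probabilities` and `path_probability` are hypothetical helper names, and the independence of branch outcomes is the assumption stated in the first excerpt rather than a measured property.

```python
from collections import defaultdict


def transition_probabilities(edge_counts):
    """Normalize each block's outgoing edge counts into probabilities."""
    out_total = defaultdict(int)
    for (src, _dst), count in edge_counts.items():
        out_total[src] += count
    return {
        (src, dst): count / out_total[src]
        for (src, dst), count in edge_counts.items()
        if out_total[src] > 0   # skip blocks that were never executed
    }


def path_probability(probs, path):
    """Probability of following `path` once its first block is entered,
    treating every branch outcome as independent (the Markov assumption)."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= probs.get((src, dst), 0.0)
    return p


# Hypothetical profile: A branches to B (75 times) or C (25 times).
edges = {("A", "B"): 75, ("A", "C"): 25, ("B", "D"): 75, ("C", "D"): 25}
probs = transition_probabilities(edges)
print(probs[("A", "B")], path_probability(probs, ["A", "B", "D"]))
```

Because the chain only remembers the current block, correlated branches are not captured; path profiles, mentioned elsewhere in these excerpts, record that extra information.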
P. P. Chang, S. A. Mahlke, and W. Hwu. Using Profile Information to Assist Classic Code Optimizations. Software -- Practice and Experience, 21(12):1301--1321, 1991.
.... significant of the previous work in this area concerns the use of profile information to determine which function calls can be profitably inlined [14] The closest previously published work to our own is an examination by Chang et al. of how profiles can be used in classical code optimizations [1]. Indeed, Chang et al. gave one example where a partial redundancy elimination was performed. However, they did not formalize the technique or give a systematic algorithm for finding the opportunities for code motion. We believe that our approach is the first formalization of code motion ....
P.P. Chang, S.A. Mahlke, W.Y. Chen, and W.W. Hwu. "Using Profile Information to Assist Classic Code Optimizations," Software -- Practice & Experience 21, 12, Chichester, Dec. 1991, pp. 1301-1321.
....implementation. Dynamic Profile Guided Optimizations The StarJIT compiler supports dynamic profile guided optimization (DPGO) as part of its dynamic compilation framework. Modern static compilers have used profile-guided optimization (PGO) to achieve significant performance improvement [5] [7]. The performance benefit from PGO on the Itanium Processor Family architecture is even more profound, with a speedup of approximately 20% observed on certain integer benchmarks. Traditional static PGO requires an initial compilation and execution run to collect an execution profile for use in ....
P.P. Chang, S.A. Mahlke and W.W. Hwu, "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21(12), Dec. 1991, pp.1301-1321.
....above are the logic blocks and design of the interconnection network. This is designed specific to an application family. The methodology flow for this is shown in Figure 2. The input to the design is C source code for an application or a set of applications from a domain. First we use the IMPACT [10,7] compiler , which is part of the MESCAL compiler infrastructure, as the front end to do some pre processing, including performance profiling and loop detection. The preprocessing is done on scheduled and register allocated intermediate code (IMPACT lcode) The performance profiling information ....
P. Chang, S. Mahlke, W. W. Hwu, "Using Profile Information to Assist Classic Compiler Code Optimizations," Software Practice and Experience, Dec. 1991, Vol. 21, No. 12, pp. 1301-1321.
....optimization Profile driven optimization is a relatively new field. Some static compilers utilize profile information from prior test runs to perform better optimizations, for example, trace scheduling [79] improving cache locality [87, 117, 48, 128, 127] or traditional optimizations [35, 34, 36, 23]. Profile driven optimizations have shown up in recent commercial products [89, 88, 131] There has been work on using profile information in dynamic compilers for Scheme [18, 17, 19] Self [61] Cecil [26] and ML [96, 101] More recently, the upcoming HotSpot dynamic compiler for Java claims to ....
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 1991.
....shown in Figure 2 illustrate how the RPA can specify different profiling applications. Example a) Edge Profiling Edge profiles, counts of how often conditional branches are taken and not taken, are one of the most useful types of profiles, enabling or improving several important optimizations [2, 4, 26]. These profiles are also relatively easy to collect. Figure 2a shows the RPA assembly for the query. The query header indicates the PC and branch outcome (taken not taken) result as well as a branch misprediction flag (misc) for branch instructions should be collected. Random sampling is used to ....
P. P. Chang, S. A. Mahlke, W. W. Hwu, "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21, pp. 1301-1321, Dec. 1991.
....of a sparse matrix on a parallel machine at run, and automatically run a graph partitioner like Parmetis [36] to redistribute it to accelerate subsequent matrix vector multiplications. Feedback Directed Compilation This involves running the program, collecting profile and other information [29, 19, 7, 3] and recompiling with this information. We will make use of this mode through the explicit incorporation of a database of performance history information. 5.3 Applying SANS to Sparse Linear Algebra In dense linear algebra, efficient use of the memory hierarchy is probably the single most ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu. Using profile information to assist classic code optimizations. Software -- Practice & Experience, 21(12):1301--1321, December 1991.
....at which optimizations produce the greatest benefits. For example, branch probabilities can guide instruction generation and scheduling to reduce stalls on pipelined processors [Fisher 81, Hank 93] Block and function execution frequencies can identify program bottlenecks during optimization [Chang 91] or assist in performance analysis [Sarkar 89] Branch and function call frequencies help order code for instruction scheduling [Hwu 89, McFarling 89] or enhance memory reference locality [Pettis 90, Wu 92] Most previous work on profiling studied techniques for decreasing the cost of profiling ....
Chang, P.P., S.A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations." Software Practice and Experience, 21, 12 (Dec., 1991) pp. 1301-1321.
....has been shaped by two trends. First, profiles have become indispensable in a spectrum of advanced optimizations that include trace scheduling [20] and extend well beyond it: basic block and path profiles [6,43] identify hot spots in the program; call graph profiles [1] guide procedure inlining [3, 9, 10]; dynamic type profiling removes indirect calls in object oriented languages [25,26]; value invariance profiles lead to program specialization [8,11,32,36]; and memory conflict profiles between ....
Pohua P. Chang, Scott A. Mahlke, and Wen-Mei W. Hwu. Using profile information to assist classic code optimizations. Software: Practice and Experience, 21(12):1301--1321, December 1991.
....the infrequently executed paths moved to the outer loop contain procedure calls or stores, there are more opportunities for finding loop invariant expressions, holding values in registers in the inner loop, etc. This optimization is a generalization of the superblock loop optimizations from Chang [12]. After superblock formation the most common path through the loop is a superblock loop, which is a superblock with an edge from the bottom to the top. Chang describes specialized forms of loop invariant code removal, global variable migration, and loop induction variable elimination that apply ....
....frequently executed. Switch statements in C are implemented with a combination of tests and conditional branches and computed branches through a jump table. With profile information, the compiler inserts a test for the most frequently occurring case before any other tests or jump table lookups [12]. When evaluating && and || expressions in C, the compiler can swap the operands if a different evaluation order is likely to skip some operations. For example, if the profile indicates that the second operand of a && is usually evaluated, then it is known that the first operand is usually true. ....
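The switch-statement optimization described in the excerpt above can be sketched as follows. This is an illustration only, under the assumption that the profile is a dictionary of per-case execution counts; `hottest_case`, `dispatch`, and the handler table are hypothetical names, and the dictionary lookup merely stands in for the compiler's remaining tests or jump table.

```python
def hottest_case(case_counts):
    """Return the case label executed most often, or None if no case ran."""
    if not case_counts:
        return None
    label, count = max(case_counts.items(), key=lambda item: item[1])
    return label if count > 0 else None


def dispatch(value, hot_label, handlers, default):
    """Test the profiled hot case first, then fall back to the full lookup."""
    # The inserted test: one comparison covers the common case.
    # Assumes hot_label, when not None, is a key of `handlers`.
    if hot_label is not None and value == hot_label:
        return handlers[hot_label]()
    # Stand-in for the remaining tests or jump-table lookup.
    return handlers.get(value, default)()


# Hypothetical profile of a three-way switch.
profile = {"l": 12, "w": 950, "c": 38}
hot = hottest_case(profile)
handlers = {"l": lambda: "lines", "w": lambda: "words", "c": lambda: "chars"}
print(hot, dispatch("w", hot, handlers, lambda: "unknown"))
```

The result of `dispatch` is the same with or without the early test; only the number of comparisons on the common path changes, which is why the transformation is safe to drive from a profile that may not match every future run.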
P. Chang, S. Mahlke, and W. Hwu, "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21, no. 12 (1991): 1301-1321.
....software is already executing in the foreground, so that the speed of re compilation is not critical. This means that far more aggressive optimization strategies can be employed than would be possible in an interactive context. Run time profiling data can also be exploited during re compilation [Ing71, Han74, CMH91] so that successive iterations yield better and better code. Hence, iterative dynamic re compilation can provide the run time efficiency of a globally optimized monolithic application in the context of mobile objects, or even surpass it due to the fine tuning that profiling makes possible. It ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu; Using Profile Information to Assist Classic Code Optimizations; Software-Practice and Experience, 21:12, 1301-1321; 1991.
....takes. This means that far more aggressive optimization strategies can be employed than would be possible in an interactive context. Further, because re optimization occurs at run time, live execution profile data can be taken into account for certain optimizations [Ingalls 1971, Hansen 1974, Chang et al. 1991] This is why our model is continuous: although re optimization would strictly be necessary only whenever new components are added to the system, usage patterns among the existing components still shift over time. Re optimization at regular intervals makes it possible to take these shifts into ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu (1991); "Using Profile Information to Assist Classic Code Optimizations"; Software--Practice and Experience, 21:12, 1301-1321.
....coalescing, loop peeling, and loop unrolling, software pipelining, speculative execution, code motion, redundant code elimination, and interprocedural optimizations such as inlining, cloning, etc. The idea of using program execution profiles to guide performance optimizing transformations has been proposed in [3] and [4]. In most of these approaches, scheduling succeeds transformation application. We demonstrate that, in high level synthesis, scheduling information can guide the application of transformations and enhance their effectiveness. In addition, we support power optimizing ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice & Experience, vol. 21, pp. 1301--1321, Dec. 1991.
....to remove many of the conditional branches in the protocol code. This optimization is similar in approach to what Bershad et al. [9] did by hand for remote procedure calls. To further optimize the code on the frequent I/O path, we can apply the trace based ideas of global instruction scheduling [15] [26] [31] [66] [101]. To apply this previous work to I/O paths, we are extending these scheduling algorithms with more inter procedural analysis [91]. We are also investigating the applicability of loop fusion [110] and redundant load removal to the protocol stack code. In this way, the compiler ....
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Center for Reliable and High-Performance Computing Report CRHC-91-12, University of Illinois at Urbana-Champaign, Urbana, IL 61801, April 1991.
....important for high performance systems. 1 Introduction Advanced compilers perform optimizations across block boundaries to increase instruction level parallelism, enhance resource usage and improve cache performance. Many of these methods, such as trace scheduling [1] and superblock scheduling [2], either rely on or can benefit from information about dynamic program behavior. For example, traditional optimizations enhance performance by an additional 15% when combined with profile driven superblock formation [2]. Other examples include data preloading [3], improved function inlining ....
....of these methods, such as trace scheduling [1] and superblock scheduling [2], either rely on or can benefit from information about dynamic program behavior. For example, traditional optimizations enhance performance by an additional 15% when combined with profile driven superblock formation [2]. Other examples include data preloading [3], improved function inlining [4], and improved instruction cache performance [5]. There are several drawbacks to profile driven optimizations. Many of the techniques can result in code size explosion if they are performed too aggressively. Dynamic ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software--Practice and Experience, vol. 21, pp. 1301--1321, Dec. 1991.
....University as technical report TR 23 95. 2 affected and hence that component of the performance equation is unchanged. The true performance benefit of SCBP is dependent upon the effectiveness of compile-time optimizations, such as global instruction scheduling [15, 17] and code optimization [4], since these optimizations attempt to reduce the total number of cycles required to execute a program. Since the design and evaluation of sophisticated compile time optimizations is an endless task, we do not attempt to give a definitive answer to the question of how much SCBP can ultimately ....
P. Chang, S. Mahlke, and W. Hwu, "Using Profile Information to Assist Classic Compiler Code Optimizations," Software Practice and Experience, Vol. 21, No. 12, Dec. 1991.
....and executable code size that is possible with the interprocedural techniques, several methods have been devised. Each has the goal to avoid performing, or re performing, the work of analysis and optimization when possible. 2.4.1 Profile driven Analysis Profile driven analysis and optimization [3, 7, 8, 10, 17, 34, 35, 42, 62, 69] makes use of execution frequency information derived at run time to focus the attention of the compiler. An instrumented version of the program is created and run to gather a run time profile of the program's performance. This profile information can then be used to improve a subsequent compiler ....
....a subsequent compiler optimization of the program by, for example, placing constraints on compile time and space consumption, by optimizing the most frequently executed portions of the code first, and halting the optimization phase when the time or space constraints are exceeded. Path profiling [3, 7, 8, 17, 34, 35, 62] has been used successfully in compiler optimization. Other basic types of control flow profiling are edge profiling, which measures the execution frequency of each individual flow graph edge, and basic block profiling, which measures how many times each basic block is executed. Edge profiles are ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 21:1301--1321, Dec. 1991.
....the compiler to optimize code before and after the loop with the loop body. If a loop executes more than a few iterations, unrolling is better than peeling. The tracer uses flow edge counts to select a trace, which is a frequently executed path through the flow graph [11]. Trace selection [11] [12] starts with a seed flow graph edge. The trace is then grown forwards and backwards using the mutual most likely heuristic. The heuristic requires that for block A to be followed by block B in the trace, A must be B's most likely predecessor and B must be A's most likely successor. Design and ....
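The mutual-most-likely heuristic described in the excerpt above can be sketched in a few lines. The sketch assumes an edge profile stored as `(source, destination) -> count`; the function names, the tie-breaking rule (first edge seen wins), and the rule that a block may appear only once in a trace are illustrative assumptions, not details from the cited work.

```python
def most_likely(edge_counts):
    """Return (best_succ, best_pred): each block's most frequent neighbour."""
    best_succ, best_pred = {}, {}
    for (src, dst), count in edge_counts.items():
        if src not in best_succ or count > edge_counts[(src, best_succ[src])]:
            best_succ[src] = dst
        if dst not in best_pred or count > edge_counts[(best_pred[dst], dst)]:
            best_pred[dst] = src
    return best_succ, best_pred


def grow_trace(edge_counts, seed):
    """Grow a trace forwards and backwards from the seed edge (src, dst)."""
    best_succ, best_pred = most_likely(edge_counts)
    trace = list(seed)
    while True:  # grow forwards from the last block
        tail = trace[-1]
        nxt = best_succ.get(tail)
        # nxt joins only if tail is also nxt's most likely predecessor
        if nxt is None or nxt in trace or best_pred.get(nxt) != tail:
            break
        trace.append(nxt)
    while True:  # grow backwards from the first block
        head = trace[0]
        prev = best_pred.get(head)
        # prev joins only if head is also prev's most likely successor
        if prev is None or prev in trace or best_succ.get(prev) != head:
            break
        trace.insert(0, prev)
    return trace


# Hypothetical profile: the A-B-D-E path dominates, C is rarely taken.
edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90,
         ("C", "D"): 10, ("D", "E"): 100}
trace = grow_trace(edges, seed=("B", "D"))
print(trace)
```

In this example the cold block C is left out of the trace, so its edge into D remains as a side entrance, which is the situation tail duplication is later used to remove.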
....the infrequently executed paths moved to the outer loop contain procedure calls or stores, there are more opportunities for finding loop invariant expressions, holding values in registers in the inner loop, etc. This optimization is a generalization of the superblock loop optimizations from Chang [12]. After superblock formation, the most common path through the loop is a superblock loop, which is a superblock with an edge from the bottom to the top. Chang describes specialized forms of loop invariant code removal, global variable migration, and loop induction variable elimination that apply ....
P. Chang, S. Mahlke, and W. Hwu, "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21, no. 12 (1991): 1301-1321.
....prediction studies. 1 Introduction Profile guided code optimizations have been shown to be effective by several researchers. Among these optimizations are basic block and procedure layout optimizations to improve cache and branch behavior [3, 10, 12], register allocation, and trace scheduling [5, 6, 8, 11]. The technique that all these optimizations have in common is that they use profiles from a previous run of a given program to predict the behavior of a future run of the same program. However, many researchers believe that collecting profile information is too costly or time consuming, and that ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu. Using profile information to assist classic compiler code optimizations. Software Practice and Experience, 21(12):1301--1321, 1991.
....the citations given below help to place the work done in this thesis into its proper context. Reference [4] is a detailed description of the IMPACT compiler framework. Reference [5] shows how profile information may be used to make traditional code optimizations more effective. Reference [6] is a technical report containing a more thorough treatment of the material of [5]. Reference [7] describes control flow optimizations which the IMPACT compiler also used. Reference [8] is a technical report showing the advantages of scheduling code prior to register allocation. Reference [9] shows ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Tech. Rep. CRHC-91-12, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, April 1991.
....IMPACT compiler has been used by our group many times in published research. As a component in a large group project, the citations given below help to place the work done in this thesis into its proper context. Reference [4] is a detailed description of the IMPACT compiler framework. Reference [5] shows how profile information may be used to make traditional code optimizations more effective. Reference [6] is a technical report containing a more thorough treatment of the material of [5]. Reference [7] describes control flow optimizations which the IMPACT compiler also used. Reference [8] ....
....proper context. Reference [4] is a detailed description of the IMPACT compiler framework. Reference [5] shows how profile information may be used to make traditional code optimizations more effective. Reference [6] is a technical report containing a more thorough treatment of the material of [5]. Reference [7] describes control flow optimizations which the IMPACT compiler also used. Reference [8] is a technical report showing the advantages of scheduling code prior to register allocation. Reference [9] shows the advantages of scheduling superblocks especially on superpipelined ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, pp. 1301--1321, December 1991.
....use to the software business community. 1 Introduction Advanced compilers perform optimizations across block boundaries to increase instruction level parallelism, enhance resource usage and improve cache performance. Many of these methods, such as trace scheduling [1] [2], superblock scheduling [3], and software pipelining [4], either rely on or can benefit from information about dynamic program behavior. For example, traditional optimizations enhance performance by an additional 15% when combined with profile driven superblock formation [3]. Other examples include data preloading [5] ....
....as trace scheduling [1] [2], superblock scheduling [3], and software pipelining [4], either rely on or can benefit from information about dynamic program behavior. For example, traditional optimizations enhance performance by an additional 15% when combined with profile driven superblock formation [3]. Other examples include data preloading [5], improved function inlining [6], and improved instruction cache performance [8]. There are several drawbacks to profile driven optimizations. Many of the techniques can result in code size explosion if they are performed too aggressively. Dynamic ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software--Practice and Experience, vol. 21, pp. 1301--1321, Dec. 1991.
....in Section 4. In the Deco project at Harvard, we are interested in developing a feedback directed optimization approach that is able to obtain and exploit timely feedback information from any point in an application's execution Existing Approach / Example Systems: Profile driven compilation / IMPACT [4]; Off line optimization with continuous profiling / FX 32 [14], Morph [5]; Run time code generation (RTG) / C [16], DyC [12], Tempo [6]; Run time optimization in software / DAISY [9], Dynamo [3], Tinker's DR [7]; Run time optimization in hardware / out of order machines, trace processors [18]. Table 1: ....
P. Chang, S. Mahlke, and W. Hwu. "Using Profile Information to Assist Classic Compiler Code Optimizations," Software Practice and Experience, 21(12):1301--1321, Dec. 1991.
....a sequence of basic blocks. Superblocks are formed in two steps. Traces within a program (sets of basic blocks which tend to execute in sequence [8]) are first identified using execution profile information [24]. Tail duplication is then performed to eliminate any side entrances to the trace [25]. The basic blocks in a superblock need not be consecutive in the code. However, our implementation restructures the code so that all blocks in a superblock appear in consecutive order to the optimizer and scheduler. Formation of superblocks is best illustrated with an example. Figure 2.2(a) ....
....at every cycle in a superblock when there are more instructions to choose from. An important feature of superblock enlarging optimizations is that only the most frequently executed parts of a program are enlarged. This selective enlarging strategy keeps the overall code expansion under control [25]. Three superblock enlarging optimizations are described as follows. Branch Target Expansion. Branch target expansion expands the target superblock of a likely taken control transfer which ends a superblock. The target superblock is copied and appended to the end of the original superblock. 12 ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, pp. 1301--1321, December 1991.
....paths that would be impaired by the code motion. The restructuring is based on code duplication, as shown in Figure 1(d) where all optimization potential is exploited and no path is impaired. In prior research, profiling based optimization has been largely restricted to small program regions [CMmWH91]. Restructuring has been used without guarantees that it will enable optimization [HMC 92]. This research will integrate code motion, speculation, and restructuring to achieve a near-complete redundancy removal that will be practical by applying the costly transformations only when ....
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 1991.
....branch prediction exploits information that is obtained by static analysis of a program. The ability to predict the correct direction of the control flow stream at compilation time allows the compiler to perform various optimizations such as trace scheduling [8, 9], code reordering [10, 11, 12], inter procedural register allocation [13], optimization and scheduling of superblocks [14] and hyperblocks [15], and improved branching [16]. Many static branch prediction techniques have been proposed [17]. Since the predicted branch outcome is determined prior to run time, static branch ....
P. Chang, S. Mahlke, and W. Hwu. Using profile information to assist classic code optimizations. Technical Report CRHC-91-13, Center for Reliable and High-Performance Computing, Univ. of Illinois Urbana-Champaign, Urbana, IL, April 1991.
....a specialized suite of transformations is applied to it to take advantage of these new opportunities. In the context of superblock formation, the process of tail duplication has been employed to eliminate side entrances from a superblock so that the optimizer need not take them into consideration [27]. This same technique can also be applied to eliminate side entrances from arbitrary regions. Tail duplication duplicates all blocks within a region that are reachable from a side entrance, thus eliminating the entrance. There are seven levels of optimization available to the compile time ....
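The tail duplication step described in the excerpt above can be sketched for the simple case of a linear trace. The sketch assumes the flow graph is a dictionary from block name to successor list and that block names are strings; `tail_duplicate`, the prime-suffixed copy names, and the choice to copy the whole tail starting at the first side entrance are illustrative assumptions rather than the algorithm of any cited compiler.

```python
def tail_duplicate(cfg, trace):
    """Return a new flow graph in which `trace` has no side entrances.

    cfg maps block name -> list of successor names; trace is a list of block
    names forming a path. A side entrance is an edge into a trace block other
    than the head that does not come from the preceding trace block.
    """
    pos = {block: i for i, block in enumerate(trace)}

    def is_side_entrance(pred, dst):
        i = pos.get(dst)
        return i is not None and i >= 1 and pred != trace[i - 1]

    side = [(p, s) for p, succs in cfg.items() for s in succs
            if is_side_entrance(p, s)]
    if not side:
        return {block: list(succs) for block, succs in cfg.items()}

    # Copy every trace block from the first side entrance to the trace end.
    first = min(pos[dst] for _pred, dst in side)
    copy_name = {block: block + "'" for block in trace[first:]}

    new_cfg = {}
    for block, succs in cfg.items():
        # Redirect side-entrance edges to the copies; keep all other edges.
        new_cfg[block] = [copy_name[s] if is_side_entrance(block, s) else s
                          for s in succs]
    for block in trace[first:]:
        # Inside the duplicated tail, edges stay within the copies.
        new_cfg[copy_name[block]] = [copy_name.get(s, s) for s in cfg[block]]
    return new_cfg


# Hypothetical graph: trace A-B-D-E with a side entrance from C into D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
result = tail_duplicate(cfg, ["A", "B", "D", "E"])
print(result)
```

After the transformation the only way into D and E is through the trace itself, while the cold path through C runs the copies D' and E'. The price is code growth, which is why the excerpts stress applying such duplication only to frequently executed regions.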
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, pp. 1301--1321, December 1991.
....emulating intermediate representation in C include flexible profiling capabilities, cross-platform independence, and enhanced debugging capabilities. The most important of these benefits for the IMPACT compiler is profiling. Profiling has been shown to be successful in guiding code optimizations [1], [2] such as branch prediction strategies [3], [4], [5] and instruction-level-parallelism-enhancing optimizations. In order to get profiling information, probing instructions are inserted into the program code. Generally, an in-depth knowledge of the development architecture is required to ....
.... Local variable declarations unsigned tempvar; int P15; int IP0; int IR[10] outgoing parameter space declaration int OP[4] int previousOPptr; saving global OP pointer, and init globalOP previousOPptr = globalOPptr; globalOPptr = int ) unsigned) OP 12) CB1: IR[1] = P0; IR[2] P1; IR[3] P2; IR[4] P3; CB2: tempvar = P0; IR[5] char ) tempvar) if (IR[5] 0) goto CB9; CB3: tempvar = IR[1] IR[7] char ) tempvar) IR[1] IR[1] 1; if (IR[7] 108) goto CB4; CB12: if (IR[7] 119) goto CB8; CB13: if (IR[7] 99) goto ....
[Article contains additional citation context not shown here]
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, pp. 1301--1321, December 1991.
....optimizations require information about the direction of conditional branches and relative execution frequencies of basic blocks. Examples of these optimizations include superblock formation, superblock scheduling [20], hyperblock formation [21], register allocation [22], and function inlining [23]. Execution frequencies may be obtained from dynamic profile information or may be estimated using static techniques. Static estimates are derived by examining the statements and structure of the program. Examining the loop edges, conditional branch expressions, and the static call graph gives ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, no. 12, pp. 1301--1321, December 1991.
....the top, but may leave at one or more exit points. It is formed by identifying sets of basic blocks which tend to execute in sequence (called a trace) [30]. These blocks are coalesced to form the superblock. Tail duplication is then performed to eliminate any side entrances into the superblock [31]. The formation of superblocks is illustrated in Figure 2.2, taken from [23]. Figure 2.2(a) shows a weighted flow graph which represents a loop code segment. The nodes in the graph correspond to basic blocks and the arcs represent the possible control transfers. The number in each node represents ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice and Experience, vol. 21, pp. 1301--1321, December 1991.
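The trace idea quoted in the context above (blocks that tend to execute in sequence) can be illustrated with a small sketch. This is a simplified greedy heuristic, not the algorithm of the cited papers: the function name `grow_trace`, the `min_prob` threshold, and the example edge counts are all assumptions made for illustration.

```python
def grow_trace(edge_weights, seed, min_prob=0.6):
    """Grow a trace from `seed` by following the hottest outgoing edge.

    edge_weights maps block -> {successor: profiled edge count}.
    min_prob is a hypothetical cutoff: stop when the hottest edge
    carries less than this fraction of the block's outgoing weight.
    """
    trace = [seed]
    in_trace = {seed}
    current = seed
    while True:
        out = edge_weights.get(current, {})
        total = sum(out.values())
        if total == 0:
            break  # exit block or never executed
        succ, weight = max(out.items(), key=lambda kv: kv[1])
        if weight / total < min_prob or succ in in_trace:
            break  # branch is not biased enough, or we closed a loop
        trace.append(succ)
        in_trace.add(succ)
        current = succ
    return trace


# Hypothetical weighted flow graph for a loop: A is the loop header.
example_weights = {
    "A": {"B": 90, "C": 10},
    "B": {"D": 90},
    "C": {"D": 10},
    "D": {"A": 99, "E": 1},
    "E": {},
}
example_trace = grow_trace(example_weights, "A")
```

With these assumed counts the trace follows A, B, D and stops at the back edge to A. Block C is left outside the trace, so its edge into D is a side entrance that tail duplication would later remove.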
....scheduling schemes such as superblocks [4] and hyperblocks [1] would definitely benefit from productive data speculation. Profiling is one method for estimating optimization profitability. The use of profile information in relation to control flow based optimization has been studied previously [5]. These results illustrate the benefit of using profile information within compiler optimizations. Similarly, memory profiling can provide a method of triggering data speculation in both the compiler's optimization and scheduling domains. This thesis will illustrate how memory reference ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, Tech. Rep. CRHC-91-12, April 1991.
No context found.
Chang, P. P., Mahlke, S. A., and Hwu, W.-M. W. 1991. Using Profile Information to Assist Classic Code Optimizations. Software---Practice and Experience 21, 12 (Dec.), 1301--1321.
....hardware necessary to support data relocation and prefetching would be much easier to add to an existing processor than the hardware to support vectorization. CHAPTER 2: HIGH LEVEL PROFILING AND SIMULATION. In general, profiling information is very useful for code transformation and optimization [12], [13]. This chapter describes a high level profiling tool for loop iteration analysis and array reference analysis. The statistics of the profiled results for some benchmarks are also described. Finally, high level cache simulation is described with some results to motivate data relocation and ....
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using profile information to assist classic code optimizations," Software Practice & Experience, Dec. 1991.
.... code, profiling provides accurate branch prediction [13]. Once the direction of the branch is determined, blocks which tend to execute together can be grouped to form a trace [9], [3]. To reduce some of the bookkeeping complexity, the side entrances to the trace can be removed to form a superblock [5]. In dynamically and statically scheduled processors in which the scheduling scope is enlarged by predicting the branch direction, there are possible hazards to moving instructions across branches. An instruction that is moved above a conditional branch should not cause an exception which ....
....The second step of the superblock scheduling algorithm is to form superblocks. Superblocks avoid the complex repairs associated with moving code across side entrances by removing all side entrances from a trace. Side entrances to a trace can be removed using a technique called tail duplication [5]. A copy of the tail portion of the trace from the side entrance to the end of the trace is appended to the end of the function. All side entrances into the trace are then moved to the corresponding duplicate basic blocks. The remaining trace with only a single entrance is a ....
[Article contains additional citation context not shown here]
P. P. Chang, S. A. Mahlke, and W. W. Hwu, "Using Profile Information to Assist Classic Code Optimizations," Center for Reliable and High-Performance Computing Technical Report, University of Illinois at Urbana-Champaign, April, 1991.
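The tail duplication step described in the context above (copy the tail of the trace from the first side entrance onward, then move the side entrances to the copies) can be sketched on a toy control flow graph. This is a minimal illustration under stated assumptions, not the cited implementation: the dictionary representation of the graph, the function names, and the `'` suffix for duplicated blocks are all hypothetical.

```python
def side_entrances(cfg, trace):
    """Return edges (pred, block) that enter the trace below its head
    from anywhere other than the preceding trace block."""
    pos = {b: i for i, b in enumerate(trace)}
    return [(p, s) for p, succs in cfg.items() for s in succs
            if pos.get(s, 0) >= 1 and p != trace[pos[s] - 1]]


def tail_duplicate(cfg, trace, suffix="'"):
    """Return a new CFG in which `trace` has no side entrances.

    cfg maps every block name to its list of successors.
    """
    entrances = set(side_entrances(cfg, trace))
    if not entrances:
        return {b: list(s) for b, s in cfg.items()}
    pos = {b: i for i, b in enumerate(trace)}
    first = min(pos[s] for _, s in entrances)  # earliest side entrance
    tail = trace[first:]
    dup = {b: b + suffix for b in tail}        # names of the copies
    new_cfg = {}
    # Move every side entrance to the corresponding duplicate block.
    for p, succs in cfg.items():
        new_cfg[p] = [dup[s] if (p, s) in entrances else s for s in succs]
    # Append the copied tail; edges inside the tail stay inside the copy.
    for b in tail:
        new_cfg[dup[b]] = [dup.get(s, s) for s in cfg[b]]
    return new_cfg


# Hypothetical example: trace A, B, D, E with a side entrance C -> D.
example_cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
example_result = tail_duplicate(example_cfg, ["A", "B", "D", "E"])
```

In this example the side entrance from C is redirected to the copy D', which falls through to E', so the original trace A, B, D, E is entered only at A. The cost is code growth: every block from the first side entrance to the end of the trace is duplicated.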
No context found.
Pohua Chang, et al., "Using Profile Information to Assist Classic Code Optimizations," Software-Practice and Experience, vol. 21, no. 12 (1991): 1301-1321.
No context found.
P. P. Chang, S. A. Mahlke, and W.-m. W. Hwu. Using profile information to assist classic code optimizations. Software - Practice and Experience, 21(12):1301--1321, 1991.
No context found.
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Software - Practice and Experience, 21(12):1301--1321, 1991.
No context found.
P. P. Chang, S. A. Mahlke, and W.-M. W. Hwu. Using Profile Information to Assist Classic Code Optimizations. Software: Practice and Experience, 21(12):1301--1321, Dec. 1991.
No context found.
P. P. Chang, S. A. Mahlke, and W.-m. W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 1991.
No context found.
Pohua P. Chang, Scott A. Mahlke, and Wen-mei W. Hwu. Using profile information to assist classic code optimizations. Software Practice and Experience, 21(12):1301--1321, December 1991.
No context found.
P. P. Chang, S. A. Mahlke, and W. W. Hwu. Using profile information to assist classic code optimizations. Software -- Practice & Experience, 21(12):1301--1321, December 1991.
No context found.
P. P. Chang, S. A. Mahlke, and W.-M. W. Hwu, "Using profile information to assist classic code optimizations," Software---Practice and Experience, vol. 21, no. 12, pp. 1301--1321, Dec. 1991.
No context found.
P. P. Chang, S. A. Mahlke, and W. W. Hwu. Using profile information to assist classic code optimizations. Software - Practice and Experience, 21(12):1301--1321, 1991.
No context found.
P. P. Chang, S. A. Mahlke and W. W. Hwu, `Using profile information to assist classic code optimizations', Software---Practice and Experience, 21, (12), 1301--1321 (1991).