K. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software--Practice and Experience, 21(6):581-601, June 1991.
....limited expansion of code space. The gain in execution speed may be due to direct effects, such as the reduction in the number of call and return instructions executed, or due to indirect effects, such as cache and virtual memory behaviour and in-context optimizations permitted on the inlined code [DH92, CHT91, McF91]. For efficiency and ease of implementation, many compilers restrict the conditions under which a procedure may be inlined. In some compilers (e.g. [CMCH92]) the calls that may be inlined depend on the order in which the procedures are declared. Several diverse heuristics are used to restrict ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software--- Practice and Experience, 21:581--601, 1991.
.... the global method reachability DFA is updated incrementally when an edge is removed (the reachability vector is recomputed for the caller node and propagated to its predecessors until a fixed point is re-established). The inlining strategy is to make a single post-order traversal (as suggested by [6] and others) over the methods in the call graph and visit each of their sites in textual order, applying the inlining policy to make an inlining decision at each site encountered. In the absence of mutually recursive cycles, the post-order traversal constitutes a topological sorting of the call ....
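The traversal described in the excerpt above can be sketched as follows. This is only an illustrative sketch, not the cited authors' implementation: the call-graph encoding (a dict mapping each method to its call sites in textual order) and the names `post_order`, `plan_inlining`, and `should_inline` are assumptions made for the example, and the incremental reachability update is not modeled.

```python
from typing import Callable, Dict, List, Tuple


def post_order(call_graph: Dict[str, List[str]], roots: List[str]) -> List[str]:
    """Return methods in post order, so callees come before their callers."""
    visited = set()
    order: List[str] = []

    def visit(method: str) -> None:
        if method in visited:
            return  # already handled (this also stops recursion on cycles)
        visited.add(method)
        for callee in call_graph.get(method, []):
            visit(callee)
        order.append(method)

    for root in roots:
        visit(root)
    return order


def plan_inlining(
    call_graph: Dict[str, List[str]],
    roots: List[str],
    should_inline: Callable[[str, str], bool],  # hypothetical inlining policy
) -> List[Tuple[str, str]]:
    """Visit each method once in post order and apply the policy at every
    call site, taking the sites in textual order."""
    decisions = []
    for caller in post_order(call_graph, roots):
        for callee in call_graph.get(caller, []):
            if should_inline(caller, callee):
                decisions.append((caller, callee))
    return decisions


# Hypothetical call graph: main calls a and b; both call c.
graph = {"main": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
order = post_order(graph, ["main"])
plan = plan_inlining(graph, ["main"], lambda caller, callee: callee == "c")
```

When the graph has no mutually recursive cycles, every callee appears before its callers in `order`, which matches the topological-sort remark in the excerpt.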
....overhead reduces the proportional impact of the inlining operations and makes the size estimate appear inflated. Finally, the relation between source text size growth and object code size growth is usually nonlinear due to increased optimization opportunities created by inlining. Cooper et al. [6] confirmed this observation over a number of compilers in their source-to-source inliner. 8.3 Execution Time Speedup Figure 8.3.1 reports the effects of the two inlining heuristics on the execution time performance of the benchmarks. [Figure 8.3.1: Total Execution Time Speedup] ....
[Article contains additional citation context not shown here]
Cooper, Keith D., Hall, Mary W. and Torczon, Linda. "An Experiment with Inline Substitution", Software -- Practice and Experience, (June 1991).
....requires knowledge of the entire program, thus precluding run-time extensibility. Studies of inlining for more conventional languages like C or Fortran have found that it often does not increase execution speed but tends to increase code size significantly (e.g. [Davidson and Holler 1988], [Cooper et al. 1991], [Chang et al. 1992], [Hall 1991]). In contrast, inlining in SELF results in both significant speedups and only moderate code growth. The main reason for this striking difference is that SELF methods are much smaller on average than C or Fortran procedures, so that inlining can actually reduce ....
COOPER, K., HALL, M., AND TORCZON, L., 1991. An experiment with inline substitution. Software---Practice and Experience 21 (6), 581-601.
....parts in the unrolled code. Finally, for all our benchmarks, the combined effects of conditional fusion and recursion re-rolling always improve the running times as compared with the programs with recursion unrolling alone. 6 Related Work Procedure inlining is a classical compiler optimization [3, 2, 4, 6, 12, 7, 11]. The usual goal is to eliminate procedure call and return overhead and to enable further optimizations by exposing the combined code of the caller and callee to the intraprocedural optimizer. Some researchers have reported a variety of performance improvements from procedure inlining; others have ....
K. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software---Practice and Experience, 21(6):581--601, June 1991.
....from 508 to 186. The code optimization that is the most usually used in programming languages such as C is the inlining technique. Inlining involves a space/speed tradeoff, therefore it should be applied selectively. The effect of inlining on program performance has been extensively studied [CHT91, DH92, Hos95]. The issue of how function inlining can be automated has been addressed several times in previous research [DH92, Hos95]. Current low-level compilers feature inlining optimizations. Usually those compilers only inline simple and small functions. The heuristics used can not perform ....
....it involves a code size/code speed tradeoff. Inlining the wrong functions might have a bad impact on the i-cache performance and slow down program execution speed. As we cited before, the issue of how function inlining can be automated has been addressed several times in previous research [Hos95, CHT91, DH92]. If code size is not a direct constraint, inlining is beneficial in any of the following cases: INRIA HIPPCO: A High Performance Protocol Code Optimizer 33 • the function is only called once, • the size of the function is smaller or equal to the number of instructions required to call ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. In Software-Practice and Experience, June 1991.
....subroutine Foo. In this example, A and B map to X and Y respectively at the first call statement and to C and D respectively at the second call statement. Hence, the memory location represented by a formal parameter may vary from one invocation of the subroutine to another. Subroutine inlining [CHT91, CHT92, Hol91] can eliminate all formal parameters, and subroutine cloning [CHK92] can result in a unique virtual address for every formal parameter. However, inlining and cloning may make the resulting object code overly large and, thus, are performed to a limited extent in practice. When we ....
K. Cooper, M. Hall, and L. Torczon. An experiment with inline substitution. Software-- Practice and Experience, 21(6):581--601, June 1991.
....call disappears (IC decreases). Second, by removing the function boundary, more efficient optimizations can be performed by the low-level compiler. The penalty for inlining is a code size increase which may adversely affect the instruction cache hit rate and slow down program execution speed [13, 15, 26, 20]. If code size is not a direct constraint, inlining a function call is beneficial if the function is only called once, the size of the function is smaller or equal to the number of instructions required to call this function, and/or this function call is executed very often. Traditional compilers ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. In Software-Practice and Experience, June 1991.
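The three benefit conditions quoted in the excerpt above can be read as a simple predicate over a call site. The sketch below is a hedged illustration only: the record `CallSiteInfo`, its field names, and the `hot_threshold` value are hypothetical, since the excerpt does not say how "executed very often" is measured.

```python
from dataclasses import dataclass


@dataclass
class CallSiteInfo:
    static_call_count: int  # number of call sites that target the function
    body_size: int          # size of the function body, in instructions
    call_overhead: int      # instructions required to call the function
    exec_count: int         # how often this call executes (e.g. from a profile)


def inlining_is_beneficial(info: CallSiteInfo, hot_threshold: int = 1000) -> bool:
    """Return True if any of the quoted conditions holds.

    hot_threshold is an assumed cutoff for "executed very often"; the
    excerpt gives no concrete number.
    """
    called_once = info.static_call_count == 1
    cheaper_than_call = info.body_size <= info.call_overhead
    hot = info.exec_count >= hot_threshold
    return called_once or cheaper_than_call or hot


# A large, cold function called from several places: no condition holds.
cold = CallSiteInfo(static_call_count=5, body_size=200, call_overhead=6, exec_count=3)
# A tiny accessor whose body is no larger than the call sequence itself.
tiny = CallSiteInfo(static_call_count=5, body_size=4, call_overhead=6, exec_count=3)
```

The predicate assumes code size is not a direct constraint, as the excerpt states; a size budget would have to be checked separately.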
....of many research systems (e.g. [CM 92], [Hl94], [G 95]) as well as production compilers. Studies of inlining for procedural languages like C or Fortran have found that it often does not significantly increase execution speed but tends to significantly increase code size (e.g. [DH88], [HwC89], [CHT91], [CM 92], [Hall91]). Our results indicate that these previous results do not apply to C programs. In implementations of dynamic or object-oriented languages, profiling information has often been used to identify (and optimize for) common cases. For example, Lisp systems usually inline the ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software---Practice and Experience 21 (6): 581-601, June 1991.
....to adjust the amount of inlining dynamically for development (recompiling) or shrink-wrapping mode. But the automatic cross-module inlining schemes used to date have not treated free variables, nested scopes, higher-order functions, or link-time side effects from module-level initializers [DH88, CHT91, CMCH92]. They cannot move a function body from module A to module B if the function has a free variable that is not exported from A and cannot be copied into B. This limits the generality of existing approaches, especially when applied to higher-order functional languages. One might think of ....
....basis only to the compiler of client code; abstraction and modularity are never compromised at the source level. Furthermore, by tuning a few compile-time parameters one can adjust the aggressiveness of cross-module inlining or turn it off completely. In contrast to previous experiments [Sch77, CHT91] that did not explain how to preserve efficient separate compilation while inlining, our technique is fully integrated with SML/NJ's separate compilation system: it cleanly exports inlinable portions of one compilation unit through the binary object file into the importing module. Our approach ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software--- Practice and Experience, 21(6):581--601, June 1991.
....from 508 to 186. The code optimization that is the most usually used in programming languages such as C is the inlining technique. Inlining involves a space/speed tradeoff, therefore it should be applied selectively. The effect of inlining on program performance has been extensively studied [CHT91, DH92, Hos95]. The issue of how function inlining can be automated has been addressed several times in previous research [DH92, Hos95]. Current low-level compilers feature inlining optimizations. Usually those compilers only inline simple and small functions. The heuristics used can not perform ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. In Software-Practice and Experience, June 1991.
....function is saved (IC decreases). Next, eliminating the indirection allows the low-level compiler to apply more optimizations. The downside is an increase in code size, which could slow down program execution through its effect on the cache [6, 7, 16, 12]. Inlining a function is therefore beneficial if the function is called only once, or very frequently, or if the size of the function body is less than or equal to the number of instructions required to call the function. Traditional compilers inline ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software-Practice and Experience, pages 581-601, June 1991.
....call disappears (IC decreases). Second, by removing the function boundary, more efficient optimizations can be performed by the low-level compiler. The penalty for inlining is a code size increase which may adversely affect the instruction cache hit rate and slow down program execution speed [14, 16, 27, 21]. If code size is not a direct constraint, inlining a function call is beneficial if the function is only called once, the size of the function is smaller or equal to the number of instructions required to call this function, and/or this function call is executed very often. Traditional compilers ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software-Practice and Experience, pages 581-601, June 1991.
....to make good inlining decisions. The compiler uses a database to record the results of inlining experiments conducted in the past. The potential benefit of an inline is estimated by consulting the database. More research into inlining and related issues can be found in the work by Cooper et al. [18, 19, 17], Richardson et al. [34], Holler [28], and Allen et al. [10]. 1.4 Thesis Contributions We have focused on reducing the overhead associated with virtual method calls in Java bytecode in this thesis. We have adopted two distinct approaches to address this problem. First, we have tried to ....
....improvement in performance as a result of performing our optimization on a set of benchmarks in Chapter 4. This optimization has been implemented on the Jimple intermediate representation and the Soot framework is used to produce optimized class files. 3.1 Method Inlining Method inlining [14, 11, 31, 16, 20, 21, 18, 19, 17, 34, 28, 10] is an optimization technique that has been used by optimizing compilers traditionally for both procedural and object-oriented languages. The basic idea in method inlining is to statically replace a method invocation instruction by the code representing the body of the method that is the target of ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software Practice and Experience, 21(6):581-601, June 1991.
....we were surprised by the rather small performance increases as the threshold is increased beyond 2, and plan to investigate this further. 8 Related work There is a modest literature on inlining applied to imperative programming languages, such as C and FORTRAN; some recent examples are [DH92, CMCH92, CHT91, CHT92]. In these works the focus is exclusively on procedures defined at the top level. The benefits are found to be fairly modest (in the 10-20% range) but the cost in terms of code bloat is also very modest. Considerable attention is paid to the effect on register allocation of larger basic blocks, ....
KD Cooper, MW Hall, and L Torczon. An experiment with inline substitution. Software Practice and Experience, 21:581--601, June 1991.
.... this reason, most research on code transformation for UMA machines has focused on developing compiler techniques that are primarily for the frontend modules of parallelizing compilers, such as: effective dependence detection techniques [14, 30, 60, 61, 73], dependence elimination techniques [21, 59, 69], the optimization techniques for parallel job and loop scheduling [46, 57], and the techniques for optimizing data locality and reuse in caches [38, 40, 72]. The work on parallelizing compilers for UMA machines also has given us the foundations necessary to tackle the study of compiler ....
K. Cooper, M. Hall, and L. Torczon. An experiment with inline substitution. Technical Report COMP TR-90-128, Department of Computer Science, Rice University, August 1990.
....complex than intraprocedural analysis. One approach is to make conservative assumptions about subroutine calls, letting may ref = may def = L and must def = ∅. This is the simplest approach, which always yields a safe, albeit non-optimal result. More complex approaches include subroutine inlining [CHT91] and construction of an interprocedural summary graph [Cal88]. For the purposes of the analyses in this paper all subroutine calls have been inlined. • Substantial research has been devoted to improving the efficiency of solving data flow equations [CCF91, JP93]. These approaches are complex to ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software -- Practice & Experience, 21(6):581--601, June 1991.
....dependence analyzers. However, even if the whole program becomes no larger, the loop nest which contained the call may grow dramatically, causing explosive growth in resource requirements due to the nonlinearity of array dependence analysis and other single-procedure compilation algorithms [CHT91]. To gain some of the benefits of inline expansion without its drawbacks, we must find another representation for the effects of the called procedure. For dependence analysis, we are interested in the memory locations modified or used by a procedure. Given a call to procedure p at statement S 1 ....
.... performance using an adapted version of pfc [Por89]. Goff, Kennedy and Tseng studied the performance of dependence tests on Riceps and other benchmarks [GKT91]. Some Riceps and 23 Riceps candidate codes have also been examined in a study on the utility of inline expansion of procedure calls [CHT91]. The six programs studied here are two Riceps codes (linpackd and track) and four codes from the inlining study. 3 2.5.2 Precision The precision of regular sections, or their correspondence to the true access sets, is largely a function of the programming style being analyzed. Linpack is ....
[Article contains additional citation context not shown here]
K. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software---Practice and Experience, 21(6):581--601, June 1991.
....and Kingsley 9 Submitted to FTCS One approach is to make conservative assumptions about subroutine calls, letting may ref = may def = L and must def = ∅. This is the simplest approach, which always yields a safe, albeit non-optimal result. More complex approaches include subroutine inlining [CHT91] and construction of an interprocedural summary graph [Cal88]. For the purposes of the analyses in this paper all subroutine calls have been inlined. • Substantial research has been devoted to improving the efficiency of solving data flow equations [CCF91, JP93]. These approaches are complex to ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software -- Practice & Experience, 21(6):581--601, June 1991.
....to adjust the amount of inlining dynamically for development (recompiling) or shrink-wrapping mode. But the automatic cross-module inlining schemes used to date have not treated free variables, nested scopes, higher-order functions, or link-time side effects from module-level initializers [DH88, CHT91, CMCH92]. They cannot move a function body from module A to module B if the function has a free variable that is not exported from A and cannot be copied into B. This limits the generality of existing approaches, especially when applied to higher-order functional languages. Our new technique, ....
....basis only to the compiler of client code; abstraction and modularity are never compromised at the source level. Furthermore, by tuning a few compile-time parameters one can adjust the aggressiveness of cross-module inlining or turn it off completely. In contrast to previous experiments [Sch77, CHT91, CMCH92] that did not explain how to preserve efficient separate compilation while inlining, our technique is fully integrated with SML/NJ's separate compilation system: it cleanly exports inlinable portions of one compilation unit through the binary object file into the importing module. 2 ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software---Practice and Experience, 21(6):581--601, June 1991.
....not allocated their own clones. The increased object code size due to cloning could conceivably have a negative effect on caching and virtual memory. Inlining results in a similar but greater increase in object code size, but inlining apparently has little effect on caching and virtual memory. [CHT91] found no obvious evidence of either thrashing or instruction cache overflow due to inlining, and cited previous reports of similar results. While these studies involved inlining, they suggest that increased object code size due to cloning would likewise be free of significant performance ....
....observed any degradation attributable to combining filters into a single function in experiments integrating up to 15 (simple) layers. Combining filters in a function is similar to inlining, and in general, inlining seems to have little effect on caching and virtual memory. Experiments reported in [CHT91] showed no obvious evidence of either instruction cache overflow or thrashing, and the previous reports they cited showed similar results. 4.4 Reconciling Different Views of Data The preceding section addressed the problem of integrating arbitrary data manipulations in isolation, outside the ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software---Practice and Experience, 21(6):581--601, June 1991.
....a procedure call with a copy of the invoked procedure. In Figure 2(b) by inlining the calls to I) each version of I) may be fully optimized in the context of its caller. However, inlining can sometimes lead to unmanageable code explosion, resulting in prohibitive increases in compile time [47, 18]. Moreover, execution time performance may degrade when optimizers fail to exploit the additional context inlining exposes or when the code size exceeds register, cache or paging system limits [18, 34, 42, 35, 46]. In light of the limitations of both inline substitution and interprocedural ....
.... sometimes lead to unmanageable code explosion, resulting in prohibitive increases in compile time [47, 18]. Moreover, execution time performance may degrade when optimizers fail to exploit the additional context inlining exposes or when the code size exceeds register, cache or paging system limits [18, 34, 42, 35, 46]. In light of the limitations of both inline substitution and interprocedural data flow analysis, our research has explored additional techniques that provide some of the power of inlining but with less associated costs. (a) Interprocedural Analysis (b) Inline Substitution A (c) Procedure ....
K. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software--Practice and Experience, 21(6):581-601, June 1991.
....123 132 7.32 96; putb 43 44 2.33 38; getb 26 22 15.38 20; rffti1 24 19 20.83 20; slv2xy 11 9 18.18 11; pdiag 6 0 100.00 6. Our initial interest in this problem arose from several studies in which we examined code that resulted from automatic application of aggressive program transformations [10, 6, 14]. As these techniques become more widely applied, compilers will need to deal with their consequences. For this study, we focused on routines from the program wave5 in the SPEC95 benchmark suite. These routines had been transformed by the insertion of advisory prefetch instructions intended to ....
Keith D. Cooper, Mary W. Hall, and Linda Torczon. An experiment with inline substitution. Software -- Practice and Experience, 21(6):581--601, June 1991.
....At the merge points, the analyzer can only assume the set of facts that occur along all entering paths. This set is often weaker than the individual sets that enter the merge. We recently completed a study of the effectiveness of inline substitution in commercial FORTRAN optimizing compilers [9]. During the course of the study, we came across an example that demonstrates the kind of problems that can arise in the use of interprocedural transformations like inlining. Similar problems will arise in a compiler that bases optimization on the results of interprocedural data flow analysis. Our ....
....of its size. The inlined code had 2.37 times as many source code statements, but the executable produced by the MIPS compiler was nine percent smaller than the original executable. Both our study and Holler's study suggest that inlining rarely leads to thrashing or instruction cache overflow [9, 13].

      subroutine daxpy(n,da,dx,incx,dy,incy)
c
      double precision dx(1),dy(1),da
      integer i,incx,incy,ix,iy,m,mp1,n
      :
      do 30 i = 1,n
         dy(i) = dy(i) + da*dx(i)
   30 continue
      return
      end

Figure 3: abstracted code for daxpy

other floating point interlocks occurred during execution of the ....
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software -- Practice and Experience, 21(6):581--601, June 1991.
....We have been exploring the problem of deciding how to use inlining and cloning to create opportunities for other transformations. Our study of inline substitution (using commercial FORTRAN compilers) showed that secondary effects in the compilers often overshadowed any benefit from inlining [11]. Thus, our strategy is to use interprocedural transformations in a goal-directed way; that is, we identify a high-payoff transformation that can be helped by some combination of procedure cloning and inlining and use them to enable the high-payoff transformation [4]. We have used this ....
....results. Special case code generation. A natural extension of our work with procedure cloning is to be more aggressive about generating conditional code. In our inlining study, both the vectorizing compilers had cases where they incorrectly assumed that parallel execution of a loop was profitable [11]. The compilers should have inserted a run time test on the number of iterations and generated both a sequential and a parallel version of the loop. This would have led to better behavior on small data sets while keeping the parallelism for the larger cases where the granularity actually covered ....
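The run-time test suggested in the excerpt above (check the iteration count, then pick a sequential or a parallel version of the loop) can be illustrated with a small sketch. It is written by hand in Python rather than generated by a compiler, and the function name `scale`, the `parallel_threshold` value, and the worker count are all assumptions for the example. In CPython, threads will not actually speed up this arithmetic; the sketch shows only the dispatch structure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scale(values: List[float], factor: float,
          parallel_threshold: int = 10_000, workers: int = 4) -> List[float]:
    """Multiply every element by factor, choosing a loop version at run time."""
    n = len(values)
    if n < parallel_threshold:
        # Sequential version: too few iterations to cover the parallel overhead.
        return [v * factor for v in values]

    # Parallel version: split the iteration space into contiguous chunks.
    chunk = (n + workers - 1) // workers

    def work(start: int) -> List[float]:
        return [v * factor for v in values[start:start + chunk]]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves chunk order, so the result order matches the input.
        parts = list(pool.map(work, range(0, n, chunk)))
    return [x for part in parts for x in part]


small = scale([1.0, 2.0, 3.0], 2.0)  # below the threshold: sequential path
large = scale([float(i) for i in range(10)], 3.0, parallel_threshold=4, workers=3)
```

Both paths compute the same result; only the iteration count decides which one runs, which is the behavior the excerpt asks for on small data sets.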
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software -- Practice and Experience, June 1991.
No context found.
K. D. Cooper, M. W. Hall, and L. Torczon. An experiment with inline substitution. Software---Practice and Experience 21 (6): 581-601, June 1991.