Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In ACM SIGPLAN'94 Symposium on Principles of Programming Languages, pages 397-408, 1994.
....[Figure 11: Sequence diagram for ‖A‖₁ implemented at MA level with A dp df.] Method inlining [CG94, GDGC95, DGC95, Fer95, AH96, BS96, DA99, IKY+00, SHR+00] can be applied to eliminate the invocations by inserting the code of the invoked methods into the invoking methods. After applying method inlining, the statements inside the nested loop appear as shown in Figure 12. An analysis ....
Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In ACM SIGPLAN'94 Symposium on Principles of Programming Languages, pages 397-408, 1994.
....by design patterns. Objects interact using method invocations, which can be implemented either as direct calls or as virtual calls. Virtual calls defeat branch prediction (and thereby instruction pipelining) and inhibit inlining, which blocks subsequent traditional intraprocedural compiler optimizations [6, 15]. Thus, many compilers go to great lengths to replace virtual calls by direct calls [1, 13, 14, 26], some even using constrained specialization techniques [11, 20] such as customization [7]. Even so, virtual calls can only be completely eliminated when static analysis can safely determine that ....
B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In Conference Record of the Twenty-First Annual ACM SIGPLAN-SIGACT Symposium on Principles Of Programming Languages, pages 397--408. ACM Press, 1994.
.... jumps, since the target of an indirect jump can change with every dynamic instance of that branch [9, 30]. In fact, some compilers provide techniques that insert extra conditional branches that check for likely targets, so that the execution of indirect jumps from a table [17] or of indirect calls [7] is avoided. Most modern architectures seldom support indirect jumps in the BTB because of these poor misprediction ratios. However, consider the results shown in Figure 7.1. An UltraSPARC 1 could execute about eight pairs of compare and branch instructions in the time required to perform an ....
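The passage describes inserting a conditional branch that checks for a likely target before falling back to an indirect call. That transformation can be illustrated in C++. What follows is a minimal sketch under our own assumptions. The Shape, Square and Circle classes are hypothetical, and Square is assumed to be the receiver class that profiling found most often. The guard compares typeid values; a compiler would more likely compare the vtable pointer or the loaded target address directly.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) { assert(s >= 0.0); }
    double area() const override { return side * side; }
    double side;
};

struct Circle final : Shape {
    explicit Circle(double radius) : r(radius) { assert(radius >= 0.0); }
    double area() const override { return 3.141592653589793 * r * r; }
    double r;
};

// Original call site: always an indirect call through the vtable.
double area_virtual(const Shape& s) { return s.area(); }

// Transformed call site, assuming Square is the most likely receiver.
double area_predicted(const Shape& s) {
    if (typeid(s) == typeid(Square)) {
        // The qualified name Square::area suppresses virtual dispatch, so this
        // is a direct call that the compiler is free to inline.
        return static_cast<const Square&>(s).Square::area();
    }
    // Guard failed: fall back to ordinary dynamic dispatch.
    return s.area();
}
```

The transformed site replaces one indirect call with a compare, a conditional branch and a direct call on the common path. Per the passage above, hardware predicts conditional branches much better than indirect jumps.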
B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In Proceedings of the ACM Symposium on Principles of Programming Languages, pages 397--408, January 1994.
....to the language. However, this could significantly reduce the potential for static optimizations. Although the optimizations we described are all static ones, they do not prevent the integration of various dynamic optimizations in SmallEiffel. For example, the addition of profile-guided analysis [HCU91, CG94, AH96] would provide more information on the most commonly used types at execution time, and thus allow further optimization of dynamic dispatch sites. Acknowledgments We thank the anonymous reviewers for their helpful comments and suggestions. We are also especially grateful to Jean Michel Drouet who ....
Brad Calder and Dirk Grunwald. Reducing Indirect Function Call Overhead In C++ Programs. In 21st Annual ACM Symposium on the Principles of Programming Languages, pages 397-408, January 1994.
....[Fig. 11. Sequence diagram for ‖A‖₁ implemented at MA level with A dense and stored in dense format.] Method inlining [13, 21, 15, 18, 2, 6, 16, 25, 37] can be applied to eliminate the invocations by inserting the code of the invoked methods into the invoking methods. After applying method inlining, the statements inside the nested loop appear as shown in Figure 12: if (a instanceOf DenseProperty) /* first guard */ double aux; try if ....
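Guarded inlining for norm1 might look like the C++ sketch below. It rests on several assumptions. The Matrix interface, with virtual numRows, numColumns and element methods, is hypothetical, and so is the column-major DenseMatrix class. Indices are 0-based rather than following the 1-based loops in the figure. Here the type guard is hoisted out of the nested loop; in the inlined code described above, the guard appears inside the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical matrix abstraction: element access goes through virtual calls.
class Matrix {
public:
    virtual ~Matrix() = default;
    virtual std::size_t numRows() const = 0;
    virtual std::size_t numColumns() const = 0;
    virtual double element(std::size_t i, std::size_t j) const = 0;
};

// Hypothetical dense matrix stored column-major in one contiguous array.
class DenseMatrix final : public Matrix {
public:
    DenseMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}
    std::size_t numRows() const override { return rows_; }
    std::size_t numColumns() const override { return cols_; }
    double element(std::size_t i, std::size_t j) const override {
        return data_[j * rows_ + i];
    }
    void set(std::size_t i, std::size_t j, double v) {
        assert(i < rows_ && j < cols_);
        data_[j * rows_ + i] = v;
    }
    const double* raw() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;
};

// ||A||_1 = maximum absolute column sum; one virtual call per element.
double norm1_generic(const Matrix& a) {
    double best = 0.0;
    for (std::size_t j = 0; j < a.numColumns(); ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < a.numRows(); ++i) sum += std::fabs(a.element(i, j));
        best = std::max(best, sum);
    }
    return best;
}

// Guarded version: one type test, then element() inlined as direct array access.
double norm1_inlined(const Matrix& a) {
    if (const auto* d = dynamic_cast<const DenseMatrix*>(&a)) {
        const std::size_t rows = d->numRows(), cols = d->numColumns();
        const double* p = d->raw();
        double best = 0.0;
        for (std::size_t j = 0; j < cols; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < rows; ++i) sum += std::fabs(p[j * rows + i]);
            best = std::max(best, sum);
        }
        return best;
    }
    return norm1_generic(a);  // guard failed: keep dynamic dispatch
}
```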
Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In ACM SIGPLAN'94 Symposium on Principles of Programming Languages, pages 397-408, 1994.
....[Fig. 26. Cumulative impact of optimizations.] Figure 25 illustrates that most run-time polymorphic method invocations arise because more than one type of object is stored in a heap slot. Two techniques, explicit type tests [Calder and Grunwald 1994; Hölzle and Ungar 1994] and cloning or splitting combined with aggressive alias analysis, may be able to resolve these method invocations. Merges in control are another important cause of run-time polymorphism, especially for trestle, and can be resolved by code splitting and cloning ....
....executed as their performance metric and found that the more powerful alias analysis did not significantly improve performance. We observe more performance improvement due to RLE, which may be because we measure object-oriented programs as opposed to the C programs used by Cooper and Lu. Calder et al. [1994] show that C programs typically execute a smaller percentage of loads and stores than C++ programs. Debray et al. [1998] describe an alias analysis for executable code. They evaluate their algorithm by measuring the percentage of loads eliminated using loop-invariant code motion and PRE of loads. ....
[Article contains additional citation context not shown here]
CALDER, B. AND GRUNWALD, D. 1994. Reducing indirect function call overhead in C++ programs. In 21st Symposium on Principles of Programming Languages. ACM, Portland, Oregon, 397--408.
....in the last years. Some papers ([8], [11]) just report dynamic program characteristics of C++ programs and show that virtual calls are very common in the C++ programming style. These studies do not propose solutions, but they provide the basic motivation for subsequent research. Others, like [7], propose techniques for optimizing virtual function calls, e.g. by transforming them into conditional expressions, and analyze the effects of such optimizations, but do not provide concrete implementations. An interesting comparison of optimization techniques in the context of SELF can be found in ....
....Name. The analysis phase for this optimization consists of finding VFC sites for which only one destination exists. For each UN VFC candidate, a unique class name is identified. The literature suggests many methods for finding a UN candidate set. They range from simple algorithms ([3], [7]) to approximations using class hierarchy pruning ([4]) and complex data flow analysis ([12]). We implement a simple algorithm that exploits information on VFC sites and knowledge of the complete class hierarchy. For each VFC, we find all of the possible methods that can be called at that ....
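The simple approach described above can be sketched in C++. It uses the complete class hierarchy to enumerate every method a VFC site could reach, then keeps the sites that have exactly one target. This is an illustrative sketch, not the authors' implementation. The ClassInfo and Hierarchy types and all function names are hypothetical. Only single inheritance is modeled, and every subclass of the static receiver type is assumed to be a possible receiver.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical whole-program class hierarchy (single inheritance only).
struct ClassInfo {
    std::string base;               // direct base class; empty for a root class
    std::set<std::string> defines;  // virtual methods declared or overridden here
};
using Hierarchy = std::map<std::string, ClassInfo>;

// Nearest class at or above `cls` that defines `method`, as "Class::method";
// returns "" if no class on the path defines it.
std::string resolve(const Hierarchy& h, std::string cls, const std::string& method) {
    while (!cls.empty()) {
        assert(h.count(cls) && "class missing from hierarchy");
        const ClassInfo& info = h.at(cls);
        if (info.defines.count(method)) return cls + "::" + method;
        cls = info.base;
    }
    return "";
}

// True if `cls` is `root` or (transitively) derived from it.
bool derivesFrom(const Hierarchy& h, std::string cls, const std::string& root) {
    while (!cls.empty()) {
        if (cls == root) return true;
        cls = h.at(cls).base;
    }
    return false;
}

// All implementations reachable from a call `p->method()` whose static type is
// `staticType`, assuming any subclass of that type may be the receiver.
std::set<std::string> possibleTargets(const Hierarchy& h, const std::string& staticType,
                                      const std::string& method) {
    std::set<std::string> targets;
    for (const auto& entry : h) {
        if (!derivesFrom(h, entry.first, staticType)) continue;
        std::string target = resolve(h, entry.first, method);
        if (!target.empty()) targets.insert(target);
    }
    return targets;
}

// A VFC site is a unique-name (UN) candidate if exactly one target remains.
bool isUniqueName(const Hierarchy& h, const std::string& staticType,
                  const std::string& method) {
    return possibleTargets(h, staticType, method).size() == 1;
}
```

Once a site is flagged as a UN candidate, its virtual call can be replaced by a direct call to the single remaining target.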
B. Calder and D. Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages (POPL), pages 397--408, Portland, Oregon, January 1994. ACM Press, New York, New York.
.... an increasing focus on interprocedural optimizations in compilers for imperative and object-oriented languages and with an increasing use of results of interprocedural program analysis in software development environments, the problem of call graph construction has received significant attention [3, 5, 6, 10, 11, 13, 14, 18, 23, 22, 24, 25, 27, 29, 28, 30, 31, 33]. A call graph is a static representation of dynamic invocation relationships between procedures (or functions or methods) in a program. A node in this directed graph represents a procedure, and an edge (p, q) exists if the procedure p can invoke the procedure q [2]. In object-oriented languages, a ....
....is crucial for the results of interprocedural analysis. Therefore, call graph construction or dynamic call site resolution has been a focus of attention lately in the object-oriented compilation community [3, 6, 10, 11, 13, 22, 24, 30, 31]. A common characteristic of all the work in this area, however, is that it performs exhaustive analysis, i.e. given a code, it computes the set of procedures that can be called at each of the call sites in the entire code. We believe that with the increasing popularity of just-in-time or ....
(Footnote: This research was supported by NSF CAREER award ACI 9733520 and NSF grant CCR 9808522.)
[Article contains additional citation context not shown here]
Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 397--408, Portland, Oregon, January 1994.
....analysis, we would have preferred to start with a conservative solution and then add further elements. However, this is not feasible while performing demand analysis, because of the way the initial conservative call graph is constructed. The previous work in the area of call graph construction [2, 3, 4, 7, 8, 10, 11, 12, 15, 19, 18, 20, 21, 23, 25, 24, 26] has only focused on exhaustive analysis and has not considered demand-driven analysis. Similarly, the previous work in the area of demand-driven data flow analysis [9, 14] has assumed that a complete call graph has already been constructed before initiating the demand-driven analysis. ....
Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 397--408, Portland, Oregon, January 1994.
....analysis as discussed in Sec. 4.3 to choose those candidates that may yield useful optimizations. Our system focuses on code specialization, since that optimization is implemented in alto. If we change the optimizations used in alto, for instance to reduce indirect function call overhead [18, 19], we need not change the profiler. Since the optimizer shares the cost-benefit decisions with the profiler, the profiler would simply select different program points, based on the expected benefit. 4.1 A Cost Model for Value-Profile-Based Code Specialization Our approach uses value profiles ....
B. Calder and D. Grunwald, "Reducing Indirect Function Call Overhead in C++ Programs", Proc. 21st ACM Symposium on Principles of Programming Languages, Jan. 1994, pp. 397--408.
....([SPARC92]) only the total number of cycles can be an approximate measure, due to latencies involved in memory accesses and especially in branch instructions. Refilling the processor's pipeline with code from an unpredictable branch target address involves a multi-cycle execution delay; e.g. [CaGr94] reported that eliminating the branch instruction in the VTBL method could improve performance by up to 66% for C++ programs. Recently, [DHV95] conducted a detailed comparison of different dispatch methods. They showed that methods that contain no unpredictable branches and ....
Brad Calder, Dirk Grunwald: "Reducing indirect function call overhead in C++ Programs". In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM Press, pp. 397-408.
....for space-constrained systems such as embedded and pervasive computing systems. Most previous work on efficient C++ implementation has focused either on implementing just one language feature, such as multiple inheritance [24, 22, 9, 10], or on reducing time overhead, such as for virtual dispatch [4, 11, 3, 6, 21, 8, 2]. In this paper, we quantify and evaluate the space overhead for three language features: virtual dispatch, virtual inheritance, and dynamic typing. We also study the space overhead due to the interaction between these features. Computing the space overhead for a memory layout requires computing ....
Brad Calder and Dirk Grunwald. Reducing indirect function call overhead in C++ programs. In 21st Annual ACM SIGACT-SIGPLAN Symposium on the Principles of Programming Languages, pages 397--408, Portland, OR, Jan. 1994.
....affect some of their parameters, could be treated as a special case as well.

procedure DetectUnusedDataMembers(Program P)
begin
    mark all data members in P initially as dead;
    mark all classes in P initially as not visited;
    construct the call graph G of program P;
    for each statement s in each function f in call graph G do
        call ProcessStatement(s)
    end for
    for each union construct U in P do
        if (at least one of the members of U is marked live) then
            call MarkAllContainedMembers(U)
        end if
    end for
end;
....
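DetectUnusedDataMembers can be rendered in simplified form in C++. The sketch below is a hypothetical model, not the published algorithm. Each reachable function is reduced to the list of data members its statements access, which stands in for ProcessStatement. Call graph construction and the class-visited marks are omitted. The union step marks only the direct members of each union, rather than calling a recursive MarkAllContainedMembers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified program model.
struct Function {
    std::vector<std::string> accessedMembers;  // e.g. "Point::x"
};

struct Program {
    std::vector<std::string> dataMembers;          // every data member in P
    std::vector<Function> reachableFunctions;      // functions in the call graph G
    std::vector<std::vector<std::string>> unions;  // members of each union U
};

// Returns the data members that remain dead (never accessed).
std::set<std::string> detectUnusedDataMembers(const Program& p) {
    std::map<std::string, bool> live;
    for (const auto& m : p.dataMembers) live[m] = false;  // all dead initially

    // Stand-in for ProcessStatement: any access marks the member live.
    for (const auto& f : p.reachableFunctions)
        for (const auto& m : f.accessedMembers) {
            assert(live.count(m) && "access to undeclared member");
            live[m] = true;
        }

    // Union rule: members share storage, so one live member keeps all alive.
    for (const auto& u : p.unions) {
        bool anyLive = false;
        for (const auto& m : u) anyLive = anyLive || live[m];
        if (anyLive)
            for (const auto& m : u) live[m] = true;
    }

    std::set<std::string> dead;
    for (const auto& [member, isLive] : live)
        if (!isLive) dead.insert(member);
    return dead;
}
```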
....classes for which a constructor call occurs in the application) and the number of data members that occur in used classes. Several of these benchmarks have been studied previously in the literature for other purposes (e.g. experimentation with virtual function call elimination algorithms) [5, 9, 8, 6, 12, 3]. The programs of Table 1 range from 606 to 58,296 lines of code, and contain between 10 and 268 classes and between 22 and 1052 data members. Some benchmarks (e.g. taldict, simulate, and hotwire) use class libraries that have been developed independently from the application. Several other ....
Calder, B., and Grunwald, D. Reducing indirect function call overhead in C++ programs. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages (POPL'94) (Portland, Oregon, Jan. 1994), pp. 397--408.
....reduce object size.
• Creation and destruction of objects require less time, due to reduced object size. Time requirements may also be reduced through caching/paging effects.
• Specialization may create new opportunities for existing optimizations such as virtual function call resolution [4, 9, 3, 8, 5].
• Specialization may be of use in program understanding and debugging tools. For example, specialization can be ....
(To appear in the proceedings of the 12th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA '97), October 5-8, 1997, Atlanta, GA.)
CALDER, B., AND GRUNWALD, D. Reducing indirect function call overhead in C++ programs. Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages (January 1994), 397--408.
....the issue of accurate control flow prediction assumes significant importance. Using type determination to replace virtual invocations with direct function calls, when the target function is known at compile time, would yield benefits comparable to those obtained by profile-based prediction for C++ [CG94]. A uniquely resolved call site would eliminate pipeline stalls, as the target of the call is unambiguously known. A related empirical study of behavioral differences between C and C++ programs reveals that C++ functions are usually small (in terms of lines of code) and C++ programs ....
.... resolution of virtual functions seems to contradict the encouraged usage of polymorphism in object-oriented languages, run-time analyses have shown that polymorphism is used selectively, leaving much room for unique resolution of non-polymorphic calls as well as optimization of polymorphic calls [CG94]. Outline: Section 2 presents key concepts used in our analysis and defines the ICFG, our internal graph representation of the C++ program. We also state our theoretical complexity theorem and outline its proof. Section 3 introduces our approximation algorithm and compares it to the conditional ....
[Article contains additional citation context not shown here]
B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages, pages 397--408, January 1994.
....can be used to: (1) compact applications by removing methods that are never called, and (2) improve the efficiency and accuracy of subsequent interprocedural analyses. Of course, virtual method resolution is not a new problem. It has been widely studied for a variety of object-oriented languages [7, 8, 9, 10, 12, 14, 18, 16, 21, 24, 28, 29, 30]. The focus of this paper is the development and evaluation of a new simple and inexpensive technique for resolving virtual method calls in Java. A main design objective was to develop a technique that would produce a solution in one iteration and thus scale linearly in the size of the program. ....
B. Calder and D. Grunwald. Reducing indirect function call overhead in C++ programs. In 21st Symposium on Principles of Programming Languages, pages 397-408, Jan. 1994.
....cross-input stability of receiver class profiles in C++ and Cecil and found it good enough to be used for optimization. Until now, few profile-based techniques have been applied to hybrid, statically typed languages like Modula-3 or C++. Based on measurements of C++ programs, Calder and Grunwald [CG94] argued that type feedback would be beneficial for C++ and proposed (but did not implement) a weaker form of class hierarchy analysis to improve efficiency. Their estimate of the performance benefits of this optimization (2-24% improvements, excluding benefits from inlining) exceeds the ....
Brad Calder and Dirk Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In 21st Annual ACM Symposium on Principles of Programming Languages, p. 397-408, January 1994.
....shows that there is only one function implementation to be called, we can eliminate the indirect branch. Even if there exist multiple function implementations, we can replace the dispatch code with a series of run-time class tests, each branching to a direct procedure call implementing that case [44, 120]. Profile data can be used to issue tests for the most frequently visited targets of the virtual function call [19, 110]. In general, though, tracking the run-time pointer type requires some kind of type or alias analysis, which is known to be NP-complete [39]. The version of the Compaq C ....
....to only one target. A BTB is ineffective for predicting indirect branches that can jump to multiple targets. When the BTB mispredicts, the existing target address is replaced. A 2-bit counter can be used to delay the update of the target address until two consecutive mispredictions have occurred [120]. We will refer to this configuration as BTB2b. The BTB2b can produce a better branch prediction ratio for C++ applications by taking advantage of the locality exhibited by the targets of virtual function calls. In [35], Kaeli and Emma describe a mechanism which accurately predicts the targets of ....
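The BTB2b replacement policy described above can be modeled in a few lines of C++. This is a hypothetical software model for illustration, and the Btb2b class and its interface are invented. Entries are indexed by the low-order bits of the branch address, with no tag check. The 2-bit counter is modeled as a count of consecutive mispredictions: a hit clears it, and replacement happens when it reaches two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct-mapped BTB for indirect branches with 2-bit hysteresis (BTB2b model).
class Btb2b {
public:
    explicit Btb2b(std::size_t entries) : table_(entries) {
        assert(entries > 0 && (entries & (entries - 1)) == 0);  // power of two
    }

    // Predicts a target for `branchAddr`, then trains on `actualTarget`.
    // Returns true if the prediction was correct.
    bool predictAndUpdate(std::uint64_t branchAddr, std::uint64_t actualTarget) {
        Entry& e = table_[branchAddr & (table_.size() - 1)];  // low-order bits
        const bool hit = e.valid && e.target == actualTarget;
        if (hit) {
            e.misses = 0;  // a correct prediction resets the hysteresis count
        } else if (!e.valid || ++e.misses >= 2) {
            // Empty entry, or second consecutive misprediction: replace target.
            e.target = actualTarget;
            e.valid = true;
            e.misses = 0;
        }
        return hit;
    }

private:
    struct Entry {
        std::uint64_t target = 0;
        bool valid = false;
        unsigned misses = 0;  // consecutive mispredictions (never exceeds 2)
    };
    std::vector<Entry> table_;
};
```

Consider the target sequence A, A, B, A at one branch. A policy that replaces on every misprediction would also miss on the final A. Under the 2-bit policy the entry still holds A after the lone B, so the final A is predicted correctly.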
B. Calder and D. Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In Proceedings of the 21st Annual Symposium on Principles of Programming Languages, pages 397--408, January 1994.
....benefit analysis is of critical importance in reducing the amount of time and space used for profiling. Our system focuses on code specialization, since that optimization is implemented in alto. If we change the optimizations used in alto, for instance to reduce indirect function call overhead [25, 26], we need not change the profiler significantly. Since the optimizer shares the cost-benefit decisions with the profiler, the profiler would simply select different program points, based on the expected benefit. 4.1 A Cost Model for Value-Profile-Based Code Specialization Our approach uses value ....
B. Calder and D. Grunwald, "Reducing Indirect Function Call Overhead in C++ Programs", Proc. 21st ACM Symposium on Principles of Programming Languages, Jan. 1994, pp. 397--408.
.... very small cache size)
Table 8. Additional parameters influencing performance:
miss_LC = 250 (a): LC miss cost (find method in class dictionaries); conservative estimate based on data in [127]
h_IC = 95%: inline caching hit ratio; from [127] and [68]
miss_IC = 80 (a) L_LC: IC miss cost; from [68]
m = 66%: fraction of calls from monomorphic call sites (PIC); [68], [14]
k = 3.54: dynamic number of type tests per PIC stub (from SELF [71])
p = 10%: average branch misprediction rate (estimate from [65])
The performance of some dispatch techniques depends on additional parameters (listed in Table 8). In order ....
....programs will see different dispatch costs if their polymorphism characteristics (and thus their inline cache hit ratios) vary. The data presented so far assume a hit ratio of 95%, which is typical for Smalltalk programs [127] but may not represent other systems. For example, Calder et al. [14] report ....
(Footnote 1: Even though the number of cycles per dispatch increases, dispatch time will decrease since P97 will operate at a higher clock frequency. Thus, while the dispatch cost rises relative to the cost of other operations, its absolute performance still increases.)
[Article contains additional citation context not shown here]
Brad Calder and Dirk Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In 21st Annual ACM Symposium on Principles of Programming Languages, p. 397-408, January 1994.
....
• Instruction scheduling
• Register allocation
• Code hoisting
• Common subexpression elimination
• Partial redundancy elimination
• Resolving method invocations
• ...
1.2 Type Analysis
Many algorithms for type analysis have been proposed in the literature [1, 13, 7, 15, 19, 32, 47, 82, 85, 86, 84, 83] (see Chapter 9 for a review). They range from fast algorithms that only look at the types to exponential-time algorithms incorporating context-sensitive flow analysis. However, prior work has two major shortcomings. First, it ignores the situation when the full program is unavailable for ....
....[Figure 2.2: Instruction distribution between method invocations on the SPARC for M2toM3.] There are two ways to improve the control flow information in the compiler. First, program analysis may reduce the set of possible procedures called at each method invocation [1, 13, 7, 15, 19, 32, 39, 47, 82, 85, 86, 84, 83]. This analysis is effective only on monomorphic method invocations. Second, a program may be transformed so that the performance-critical method invocations can be converted to direct calls. An example is splitting, which duplicates code to reduce the number of types possible at each method ....
[Article contains additional citation context not shown here]
Calder, B. and Grunwald, D. Reducing indirect function call overhead in C++ programs. In 21st Symposium on Principles of Programming Languages, pages 397--408, Portland, Oregon, Jan. 1994. ACM.
....for hybrid languages: Calder & Grunwald, Aigner & Hölzle, Bacon & Sweeney, and Pande & Ryder have looked at optimizing C++, and Fernandez and Diwan, Moss & McKinley have looked at optimizing Modula-3. Calder and Grunwald consider several ways of optimizing dynamically dispatched calls in C++ [Calder & Grunwald 94]. They examined some characteristics of the class distributions of several C++ programs and found that although the potential polymorphism was high, the distributions seen at individual call sites were strongly peaked, suggesting that profile-guided receiver class prediction would pay off. ....
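Profile-guided receiver class prediction has to decide, for each site, whether the observed class distribution is peaked enough to justify a type test. The C++ sketch below shows one way to make that decision. It assumes a hypothetical profile format, a map from class name to call count for a single call site, and a hypothetical threshold parameter; neither comes from Calder and Grunwald's paper.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Receiver-class counts observed at one call site during a profiling run.
using ClassHistogram = std::map<std::string, unsigned long>;

// Returns the class to predict at this site if the most frequent class
// accounts for at least `threshold` (a fraction in (0, 1]) of all observed
// calls; otherwise std::nullopt (leave the site as a plain virtual call).
std::optional<std::string> predictedClass(const ClassHistogram& h, double threshold) {
    assert(threshold > 0.0 && threshold <= 1.0);
    unsigned long total = 0, bestCount = 0;
    std::string best;
    for (const auto& [cls, count] : h) {
        total += count;
        if (count > bestCount) {
            bestCount = count;
            best = cls;
        }
    }
    if (total == 0) return std::nullopt;  // site never executed in the profile
    if (static_cast<double>(bestCount) / static_cast<double>(total) >= threshold)
        return best;
    return std::nullopt;
}
```

A strongly peaked site, such as 95 Square receivers and 5 Circle receivers with a threshold of 0.9, gets a Square guard. An evenly split site is left alone.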
Brad Calder and Dirk Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 397--408, Portland, Oregon, January 1994.
.... be important in statically typed hybrid languages like C++ [Stroustrup 91]. Calder and Grunwald argue that C++ programs can be sped up with this optimization, particularly on modern processors where branch prediction hardware supports conditional branches much better than indirect procedure calls [Calder & Grunwald 94]. To perform receiver class prediction at a particular call site, some sort of expected class frequency distribution is needed for that call site. The Deutsch-Schiffman Smalltalk-80 system [Deutsch & Schiffman 84] and the Self-89 [Chambers & Ungar 89] and Self-91 [Chambers 92] systems in effect ....
Brad Calder and Dirk Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 397--408, Portland, Oregon, January 1994.
....is a last-value predictor; it uses the most recently seen target to predict the next target for a branch. The indexing function is usually formed by taking the lower-order bits of the branch address. An n-bit wraparound counter normally controls replacement in a direct-mapped BTB. Experiments in [14] showed that a 2-bit counter reduces the overall number of mispredictions compared to a configuration where a target is replaced on a single misprediction. The 2-bit counter provides hysteresis, allowing a target address to be replaced only after two consecutive mispredictions (we ....
B. Calder and D. Grunwald. Reducing Indirect Function Call Overhead in C++ Programs. In Proceedings of the 21st Annual Symposium on Principles of Programming Languages, pages 397--408, January 1994.
....the Java interpreter [74] showing that they can compile and produce fairly efficient native code from Java bytecodes. We are investigating more advanced optimizations such as code layout, data layout, and method optimizations, which we have researched in the past in other contexts [23, 25, 69, 24]. G.2.3.2 Operating System Support An operating system controls local resources, and allocates them to processes. When an agent arrives at an Active Web resource server, it will generally expect a certain level of performance, especially if it is willing to pay. For example, an agent may need to ....
Brad Calder and Dirk Grunwald, "Reducing indirect function call overhead in C++ programs," In Proceedings of the 21st Annual ACM Symposium on Principles of Programming Languages (POPL '94), pp. 397-408, Jan. 1994.
CiteSeer.IST - Copyright Penn State and NEC