S. McFarling, "Program Optimization for Instruction Caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 183--191, April 1989.
....with the goal of more efficiently using the instruction cache. Several compile-time code placement techniques have been proposed that use heuristics and profile information to reduce the number of conflict misses in the primary (first-level or L1) instruction cache by reordering the program code [3, 18, 26, 27, 36]. Most of this work uses cache parameters such as cache size and line size as well as procedure sizes to accurately model the cache mapping of the code. The code placement algorithms typically use some kind of profile information to find a cache mapping that reduces cache conflict misses. These ....
....use a WCG, which contains only information about pairs of procedures connected by a direct call and no information about the temporal ordering of procedure calls. As an example, Figure 2 shows two programs that result in the same WCG but have substantially different temporal behavior. McFarling [26] uses profile data that incorporates loop counts and probabilities for conditionals, but still retains the limitations mentioned above. Basic block transitions, used by Torrellas et al. [36], share these limitations. Our technique is based on a profiling scheme that captures important information ....
S. McFarling, "Program optimization for instruction caches." Proc. ASPLOS-III: 3rd Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, p.183, April 1989.
....the working set size and cache conflicts. There has been prior work in changing code layout at the function level at compile time as well as dynamically. There have also been efforts to exploit program locality dynamically at other levels of granularity. Pettis and Hansen [20], Scott McFarling [18], Hatfield and Gerald [11], and Gloy and Smith [15] have presented methods to rearrange the procedures, which comprise the executable, based on profile data to improve memory locality. Most of these use profile data in the form of a weighted call graph (WCG). In a WCG, there is a node for each ....
Scott McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191. ACM, 1989.
....the working set size and cache conflicts. There has been prior work in changing code layout at the function level at compile time as well as dynamically. There have also been efforts to exploit program locality dynamically at other levels of granularity. Pettis and Hansen [21], Scott McFarling [19], Hatfield and Gerald [13], and Gloy and Smith [15] have presented methods to rearrange the procedures, which comprise the executable, based on profile data to improve memory locality. Most of these use profile data in the form of a weighted call graph (WCG). In a WCG, there is a node for each ....
Scott McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183-191. ACM, 1989.
....in which blocks b and d are packed into the same cache line. The number of cache misses is greatly reduced in this case. Here we have illustrated the application of code layout optimization at the basic block level. Techniques for layout optimization at procedural level have also been developed [29]. 5. CONCLUDING REMARKS In this chapter we have identified optimization opportunities that may exist during program execution but cannot be exploited without the availability of profile data. Different types of profile data that are useful for code optimization were identified. The use of this profile ....
S. McFarling, "Program Optimization for Instruction Caches," The 3rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 183-191, April 1989.
....a varied set of methods for automatic cache control instruction insertion. 2.2 Memory Exploration in Embedded Systems Cache memory issues have been studied in the context of embedded systems. McFarling presents techniques of code placement in main memory to maximize instruction cache hit ratio [10, 14]. A model for partitioning an instruction cache among multiple processes has been presented [7]. Panda, Dutt and Nicolau present techniques for partitioning on-chip memory into scratchpad memory and cache [12]. The presented algorithm assumes a fixed amount of scratchpad memory and a fixed size ....
S. McFarling. Program Optimization for Instruction Caches. In Proceedings of the 3rd Int'l Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191, April 1989.
.... OF THE CACHE AREA AND LATENCY MODEL: A AREA, L LATENCY [13] conflict misses in large direct-mapped instruction caches has been proposed [3]. Static code repositioning by using cache line coloring at the procedure or basic block level has been an alternative approach proposed and evaluated in [12], [13] and [27]. A similar technique for profile-driven data repositioning has been proposed in [26]. III. PERFORMANCE MODELING In this section, we describe the hardware performance models for caches and processor cores. Three factors combine to influence system performance: cache miss rates, ....
S. McFarling, "Program optimization for instruction caches," in Proc. Int. Conf. Architectural Support for Programming Languages and Operating Systems, 1989, pp. 183--191.
....this work is to extend their work to static profiling. The advantages of the availability of static profiling are quite obvious. Many parts of optimizing compilers rely on profile data to perform good optimizations, for example trace scheduling [Fis81], register allocation [Wal86], and code motion [McF89]. In general these optimizations take advantage of locating those 10% of the program code in which 90% of the run time is spent. Up to now it is common practice to get profile information by running a program and measuring the interesting data (e.g. block counts) by an appropriate tool like prof, ....
Scott McFarling. Program optimization for instruction caches. In Third International Symposium on Architectural Support for Programming Languages and Operating Systems, April 1989. Published as Computer Architecture News 17(2).
....TPC-C benchmarks on AlphaServers. 6 Discussion and Related Work Code layout optimizations were originally proposed to reduce the working set size of applications for virtual memory [8, 11, 10]. More recent work has focused on the reduction of branch mispredicts and cache misses. McFarling [18] describes an algorithm that uses the loop and call structure of a program to determine which parts of the program should overlay each other in the cache and which parts should be assigned to non-conflicting addresses. Hwu and Chang [13] describe a profile-based algorithm which uses function ....
S. McFarling. Program optimization for instruction caches. Proceedings of the 3rd Intl. Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191, Apr. 1989.
....FSM predictors are used in a few areas of computer architecture, and summarize initial results for using automated FSM predictors to guide confidence estimation used to guide value prediction. 6.1 Cache Management Cache management schemes have been proposed that perform intelligent replacement [16] and cache exclusion [29], and they use a small FSM counter to determine when the optimization should be applied. In addition, prefetching architectures have used FSM predictors to determine when to initiate prefetching for a load and to guide stream buffer allocation [25]. 6.2 Power Control Manne ....
S. McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 183--191, April 1989.
....of the mispredict recovery time. These two factors have long been identified as fetch performance bottlenecks and are relatively well-researched topics. The design of instruction caches has been studied in great detail in order to lessen the impact of instruction cache misses on fetch bandwidth [31], [36]. Likewise, there have been many studies done to improve branch prediction accuracy [16], [25]. To date, the techniques developed to reduce instruction cache misses and increase branch prediction accuracy have been very successful in improving fetch bandwidth. However, as the issue rates for ....
S. McFarling, "Program Optimization for Instruction Caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, April 1989
....space to a small region in the cache (useful for memory-mapped devices). 4.2 Memory Exploration in Embedded Systems Cache memory issues have been studied in the context of embedded systems. McFarling presents techniques of code placement in main memory to maximize instruction cache hit ratio [8, 16]. A model for partitioning an instruction cache among multiple processes has been presented [6]. Panda, Dutt and Nicolau present techniques for partitioning on-chip memory into scratchpad memory and cache [11]. The presented algorithm assumes a fixed amount of scratchpad memory and a fixed size ....
S. McFarling. Program Optimization for Instruction Caches. In Proceedings of the 3rd Int'l Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191, April 1989.
....cache is explored. 2. If an outer loop does not fit in the cache there is no reuse from one iteration of this loop to the next. In this case, innermost loops still may exhibit reuses. 3. There is no instruction cache interference. This problem can be addressed independently with methods [13, 12, 25] to lay out the code such that no or few interferences occur. We only consider the first-level instruction cache. 4. Data cache misses are assumed invariant by loop unrolling. We could verify this assumption in our experiments. 5. It is assumed that the compiler generates the code for ....
S. McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 183--191, April 1989.
....counts, is the most important, especially for integer programs. The second use, delaying register reuse, is more important for floating point programs where scheduling for long operation latency is important. 3.7 Code layout Our code layout algorithm is essentially the same as Pettis and Hansen [4,5,6,9]. Its goal is to reduce instruction cache misses and improve instruction fetch by using profile information to guide the layout of code in memory. We found that the algorithm worked well, except in its handling of branches for .... [Figure 4: Function from xlisp where rarely called is useful]
S. McFarling, "Program Optimization for Instruction Caches," ASPLOS III Proceedings, Boston, Mass. (April 1989): 183-193.
....cloned program, something that currently consumes an inordinate amount of compile time. When we apply the profiling based code placement to larger programs, we will no doubt see its effectiveness diminished, but fortunately, several more sophisticated algorithms can be found in the literature [8, 7, 6]. We would also like to find a better heuristic method, since profile guided methods are less convenient to use. Also, we have not yet looked at aligning procedures at cache line boundaries which might further decrease the miss rate. ....
Scott McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191, 1989.
S. McFarling, "Program Optimization for Instruction Caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 183--191, April 1989.
MCFARLING, S. 1989. Program optimization for instruction caches. In Proceedings of the 3rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-III, Boston, MA, Apr. 3--6), J. Emer, Chair. ACM Press, New York, NY, 183--191.
S. McFarling. Program optimization for instruction caches. In 3rd International Conference on Architectural Support for Programming Languages and Operating Systems, volume 24, pp. 183--191, Boston, MA, April 1989.
S. McFarling. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), pages 183--191, April 1989.
S. McFarling. Program optimization for instruction caches. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'89), pages 183--191, Apr. 1989.
McFarling, "Program Optimization for instruction caches", Proceedings of ASPLOS III, 1989
S. McFarling. "Program optimization for instruction caches," Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, April 1989.
S. McFarling. Program optimization for instruction caches. In Third International Conference on Architectural Support for Programming Languages and Operating Systems, pages 183--191, Apr. 1989.
McFarling, S., "Program Optimization for Instruction Caches," Third International Conference on Architectural Support for Programming Language and Operating Systems (April 1989) pp. 183-191.
McFarling, S. Program optimization for instruction caches. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, ACM, 183-191, 1989.
S. McFarling, "Program Optimization for Instruction Caches," ASPLOS III Proceedings, Boston, Mass. (April 1989): 183--193.