12 citations found.
John Kalamatianos and David R. Kaeli, "Temporal-based procedure reordering for improved instruction cache performance," Proceedings of the 4th Intl. Conference on High Performance Computer Architecture, Feb. 1998.


This paper is cited in the following contexts:
Trends in High-Performance, Low-Power Cache Memory.. - Inoue, Moshnyaga, Murakami (2001)

....3.2.5 Optimizing Data Placement. Conflict misses take place when two data items compete for a cache location. If we can reallocate one of the competing data addresses, the conflict miss can be avoided. Data placement optimization is a static approach to reducing conflict misses [58] [44] [17] [29]. 3.3 Making Cache Miss Penalty Smaller. As explained in Section 2, there are at least three approaches to minimizing the miss penalty: 1) improving the DRAM access time, 2) reducing the amount of data to be replaced, and 3) increasing the memory bandwidth. The DRAM access time can be improved by ....

J. Kalamatianos and D. R. Kaeli, "Temporal-based Procedure Reordering for Improved Instruction Cache Performance," Proc. of the 4th International Symposium on High-Performance Computer Architecture, pp. 244--253, Jan./Feb. 1998.


Efficient Dynamic Procedure Placement - Scales (1998)   (3 citations)

....RT balls4 67.18s 65.13s
m88ksim 171.29s 170.31s
perl 69.93s 68.45s
gcc 10.40s 10.86s
go 150.48s 151.63s
Oracle DSS 1 8.77s 8.24s
Oracle DSS 2 4.91s 4.44s
Oracle DSS 3 3.28s 2.99s
Oracle DSS 4 2.48s 2.30s
Table 3: Application Run Times
5 Related Work. There has been a variety of work [2, 3, 4, 5, 6, 7, 8, 10, 12] in improving the instruction cache behavior of user applications, with most of it focusing on statically laying out the instructions of an application using profile information from a previous run. Samples and Hilfinger [10] did early work in using edge frequencies between basic blocks to lay out ....

....between basic blocks to lay out basic blocks, while McFarling [7] used basic block frequency counts. Pettis and Hansen [8] developed a simple algorithm for placing procedures based on procedure frequency counts that is frequently used. Several more complex but potentially more effective algorithms [4, 5, 6] have been developed recently, including some that make use of a full trace of procedure calls. Most of these systems have been run only on user level simulators, and the authors report the changes in instruction miss rates, but not changes in execution time. Several systems [3, 12] have recently ....

J. Kalamatianos and D. Kaeli. Temporal-Based Procedure Reordering for Improved Instruction Cache Performance. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, pages 244--253, Feb. 1998.


Software Trace Cache for Commercial Applications - Ramirez, Larriba-Pey..

....et al. method can be found in [19]. Gloy et al. [7] extend the Pettis and Hansen placement algorithm at the procedure level to consider the temporal relationship between procedures in addition to the target cache information and the size of each procedure. Hashemi et al. [8] and Kalamatianos et al. [13] use a cache line coloring algorithm inspired by the register coloring technique to map procedures so that the resulting number of conflicts is minimized. Techniques developed for VLIW processors, like Trace Scheduling [5], also identify the most frequent execution paths in a program. But these ....

John Kalamatianos and David R. Kaeli. Temporal-based procedure reordering for improved instruction cache performance. Proceedings of the 4th Intl. Conference on High Performance Computer Architecture, February 1998.


Code Reordering of Decision Support Systems for.. - Ramírez.. (1998)

....frequently referenced basic blocks. Gloy et al. [7] extend the Pettis and Hansen placement algorithm at the procedure level to consider the temporal relationship between procedures in addition to the target cache information and the size of each procedure. Hashemi et al. [8] and Kalamatianos et al. [13] use a cache line coloring algorithm inspired by the register coloring technique to map procedures so that the resulting number of conflicts is minimized. Their algorithm is based on either a dynamic profile of the code or on static estimations based on heuristics. For more aggressive ....

John Kalamatianos and David R. Kaeli, Temporal-based Procedure Reordering for Improved Instruction Cache Performance, Proceedings of the 4th Intl. Conference on High Performance Computer Architecture, February 1998.


Software Trace Cache - Ramírez, Larriba-Pey.. (1999)   (3 citations)

....University of Illinois at Urbana-Champaign, USA. blocks in consecutive memory positions and, therefore, increase the number of useful instructions fetched per access. Unfortunately, past work on code reordering techniques has largely focused on simply reducing the instruction cache miss rate [9, 11, 14, 6, 5, 8]. This approach made sense in the context of the simple, less aggressive processors for which past work was done. However, in the modern, wide-issue superscalars, ensuring that sequentially executed instructions are mapped in consecutive memory positions can be more crucial than keeping the number ....

....results across all tested setups. Gloy et al. [5] extend the Pettis and Hansen placement algorithm at the procedure level to consider the temporal relationship between procedures in addition to the target cache information and the size of each procedure. Hashemi et al. [6] and Kalamatianos et al. [8] use a cache line coloring algorithm inspired by the register coloring technique to map procedures so that the resulting number of conflicts is minimized. Their algorithm is based on either a dynamic profile of the code or on static estimations based on heuristics. To further increase the number ....

John Kalamatianos and David R. Kaeli. Temporal-based procedure reordering for improved instruction cache performance. Proceedings of the 4th Intl. Conference on High Performance Computer Architecture, February 1998.


A Comparison of Software Code Reordering and Victim Buffers - Bahar, Calder, Grunwald (1999)   (3 citations)

.... Recent work on procedure placement to improve instruction cache performance shows that further improvements in performance are achieved by keeping track of where cache lines procedures are placed to eliminate conflict misses, and by using temporal information to guide the placement algorithm [8, 7, 11]. This research showed that the cache miss rate is significantly reduced by taking the cache .... [Figure 1: Configuration of the memory hierarchy in the base simulator, showing the processor, L1 Inst Cache (iL1), iL1 buffer, L1 Data Cache (dL1), and Unified L2 cache (UL2)]

John Kalamatianos and David Kaeli. Temporal-based procedure reordering for improved instruction cache performance. In 4th Intl. Symp. on High Performance Computer Architecture, February 1998.


Microarchitectural and Compile-Time Optimizations for.. - Kalamatianos (2000)   (1 citation)  Self-citation (Kalamatianos)

....page, procedure and basic block. Traditionally, page repositioning algorithms have targeted the improvement of the average memory access time [62, 63, 64, 65]. Some of them require some form of operating system support. Procedure reordering also focuses on improving the memory access time [52, 54, 55, 66, 67, 68]. Basic block techniques can be roughly characterized as intra- or interprocedural. Intraprocedural techniques rearrange blocks strictly within the procedure boundaries, while interprocedural techniques move blocks globally. Branch alignment is a form of basic block positioning technique that attempts to minimize the ....

....guided by the control flow of the program (e.g. loops, procedure calls, etc.). It uses the liveness of a procedure's basic blocks to guide the conflict miss estimation algorithm. This worst-case behavior model weights edges in a procedure graph. We call this graph a Conflict Miss Graph (CMG) [68]. We use a CMG to place procedures in the cache address space so that conflict misses between critically interacting procedure pairs are minimized. Cache allocation is performed by a cache line coloring algorithm, similar to the algorithm introduced in [52]. The CMG edge weights determine the ....

[Article contains additional citation context not shown here]

J. Kalamatianos and D.R. Kaeli. Temporal-Based Procedure Reordering for Improved Instruction Cache Performance. In Proceedings of the International Conference on High Performance Computer Architecture, pages 244--253, February 1998.


Accurate Simulation and Evaluation of Code Reordering - Kalamatianos, Kaeli (2000)   Self-citation (Kalamatianos Kaeli)

....use either control flow analysis [12, 13] and/or profile data [18, 15, 7]. Recently, several systems have been proposed that attempt to reorder code at run time [16, 3]. Besides the issue of when to reorder, there remains the issue of what granule size to use when reordering. Procedure reordering [7, 6, 11], interprocedural basic block reordering [18, 9] and combined basic block and procedure reordering [15, 10, 4] are the most popular approaches. In this paper we present an approach that provides a single-pass simulation of multiple code reordering algorithms and their accurate cycle-based ....

....our framework is capable of weighting edges according to three different graph models: (a) a Call Graph (CG), (b) a Temporal Relationship Graph (TRG) and (c) a Conflict Miss Graph (CMG). All of these models have been successfully applied in procedure placement: CG in [15, 7], TRG in [6, 1] and CMG in [11]. Although the CG exploits interaction between procedures that directly call each other, the TRG and the CMG exploit temporal interaction between procedures that lie even further apart in the call chain, or even between procedures that lie in different call chains [5]. To capture this interaction, ....

J. Kalamatianos and D. Kaeli. Temporal-Based Procedure Reordering for Improved Instruction Cache Performance. In Proceedings of the International Conference on High Performance Computer Architecture, pages 244--253, February 1998.


Analysis of Temporal-Based Program Behavior for.. - Kalamatianos.. (1999)   (1 citation)  Self-citation (Kalamatianos Kaeli)

....Once coloring has been performed, each pruned node must be mapped. The nodes are laid out in the opposite order of their deletion. III. Conflict Miss Graphs. Next we consider cache misses which can occur between procedures many procedures away in the call graph, as well as on different call chains [14]. We capture temporal information by weighting the edges of a procedure graph with an estimation of the worst-case number of conflict misses that can occur between any two procedures. We then use the graph to apply cache line coloring to place procedures in the cache address space. We call this ....

....graph with an estimation of the worst-case number of conflict misses that can occur between any two procedures. We then use the graph to apply cache line coloring to place procedures in the cache address space. We call this graph a Conflict Miss Graph (CMG). The complete algorithm is described in [14]. We summarize it here and will contrast it with the CGO in Section V using Inter-Reference Gap analysis. A. Conflict Miss Graph Construction. The CMG is built using profile data. We assume a worst-case scenario where procedures completely overlap in the cache address space every time they ....

[Article contains additional citation context not shown here]

J. Kalamatianos and D.R. Kaeli, "Temporal-based Procedure Reordering for Improved Instruction Cache Performance," in Proceedings of the International Conference on High Performance Computer Architecture, February 1998.


Cache Line Coloring Using Real and Estimated Profiles - Hashemi, Kalamatianos..   Self-citation (Kalamatianos Kaeli)

....achieved using call edge profiles to guide the optimizations in order to eliminate first generation cache conflicts. We are currently investigating how to apply our algorithm to use full path profiling and other collection techniques in order to collect improved temporal locality information [17]. ....

J. Kalamatianos and D.R. Kaeli. Temporal-based procedure reordering for improved instruction cache performance. In Proceedings of the 4th International Conference on High Performance Computer Architecture, pages 244--253, Las Vegas, NV, February 1998.


Instruction Fetch Architectures and Code Layout.. - Ramirez, Larriba-Pey..

No context found.

John Kalamatianos and David R. Kaeli, "Temporal-based procedure reordering for improved instruction cache performance," Proceedings of the 4th Intl. Conference on High Performance Computer Architecture, Feb. 1998.


Analyzing the Working Set Characteristics of Branch Execution - Kim, Tyson (1998)   (7 citations)

No context found.

J. Kalamatianos and D. Kaeli, "Temporal-based Procedure Reordering for Improved Instruction Cache Performance," in Proceedings of the 4th International Symposium on High Performance Computer Architecture, 1998.

