Koopman, Philip J., Lee, Peter, and Siewiorek, Daniel. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14, 2 (April 1992) 265--297.
....collection. Wilson, Lam and Moher measured a bytecode Scheme implementation [19] that used a copying generational garbage collector, and simulated multiple cache configurations. They concluded that fitting the allocation area in cache would help locality greatly. Koopman, Lee and Siewiorek [12] evaluated various cache configurations for an SK combinator graph reduction language. Examining very small programs, they found that write allocate caches gave much better performance than write no allocate caches, because most data is referenced almost immediately after allocation. Diwan, ....
P. J. Koopman, Jr., P. Lee, and D. P. Siewiorek. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14(2):265-297, Apr. 1992.
....prior work in understanding the cache behavior of programs, we are not aware of any study that correlates cache behavior to high level properties such as types. Some prior work tries to understand and improve the cache behavior of heap loads by measuring the cache impact of garbage collection [12, 18, 23, 30, 32, 33]. Mowry and Luk [22] also attempt to improve the effectiveness of latency tolerance techniques by applying them only to cache misses. They identify instructions that are likely to miss in the cache using correlation profiling, which, for instance, predicts whether a load will hit or miss in the ....
P. J. Koopman, Jr., P. Lee, and D. P. Siewiorek. Cache behavior of combinator graph reduction. Transactions on Programming Languages and Systems, 14(2):265--277, Apr. 1992.
....placement caches allocate a cache line when a store instruction references a location not currently residing in the cache. This organization is used in current workstations (e.g. the DECstation 5000™ series) and has been shown to be effective for programs with intensive heap allocation [Koopman et al. 1992; Reinhold 1993; Diwan et al. 1995]. We do not use the original SPARCstation 2 cache configuration because it suffers from large variations in cache miss ratios caused by small differences in code and data positioning (we have observed variations of up to 15% of total execution time). With the ....
KOOPMAN, P., LEE, P., AND SIEWIOREK, D. 1992. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems 14 (2), 265-297.
....and found very low data cache overheads for the same cache organization (write allocate, subblock placement). Similar results have also been reported by Reinhold for Scheme programs [Rei93], by Jouppi for the SPEC benchmark suite [Jou93], and by Koopman et al. for combinator graph reduction [KLS92]. Such low data cache overheads leave little room for improvement through special cache features (e.g. [PS89, WW90]). [Figure 6. Instruction cache miss ratios of SELF programs (direct mapped cache, 32 byte lines)] ....
....small caches, and thus there is little need for special object oriented caches. Write allocate caches with subblock placement reduce read miss ratios by up to a factor of two, and write miss ratios by a factor of ten. These findings are consistent with other work for non object oriented languages [KLS92, Jou93, Rei93, DTM94]. Instruction cache size significantly impacts performance. For example, doubling the instruction cache from 32K to 64K improves performance by 15% on a SPARCstation 2. This improvement is higher than that of any OO specific architectural feature we considered. Our results contradict (and, we ....
Philip Koopman, Peter Lee, and Daniel Siewiorek. Cache behavior of combinator graph reduction. TOPLAS 14(2): 265-297, 1992.
....programs and found very low data cache overheads for the same cache organization (write allocate, subblock placement). Similar results have also been reported by Reinhold for Scheme programs [Rei93], by Jouppi for the SPEC benchmark suite [Jou93], and by Koopman et al. for combinator graph reduction [KLS92]. Such low data cache overheads leave little room for improvement through special cache features (e.g. [PS89, WW90]). Diwan et al. [DTM94] also observed that the data cache overhead of ML programs increased substantially with a write no allocate policy, i.e. with a cache that does not allocate ....
....small caches, and thus there is little need for special object oriented caches. Write allocate caches with subblock placement reduce read miss ratios by up to a factor of two, and write miss ratios by a factor of ten. These findings are consistent with other work for non object oriented languages [KLS92, Jou93, Rei93, DTM94]. Instruction cache size significantly impacts performance. For example, doubling the instruction cache from 32K to 64K improves performance by 15% on a SPARCstation 2. This improvement is higher than that of any OO specific architectural feature we considered. Our ....
Philip Koopman, Peter Lee, and Daniel Siewiorek. Cache behavior of combinator graph reduction. TOPLAS 14(2): 265-297, 1992.
....For caches with subblock placement, the data cache overhead was under 9% for a 64K or larger data cache; without subblock placement the overhead was often higher than 50%. 1 Introduction Heap allocation with copying garbage collection is widely believed to have poor memory subsystem performance [30, 38, 48, 49, 50]. To investigate this, we conducted an extensive study of memory subsystem performance of heap allocation intensive programs on memory subsystem organizations typical of many workstations. The programs, compiled with the SML NJ compiler [4], do tremendous amounts of heap allocation, allocating one ....
....performs poorly on machines whose caches are smaller than the allocation area of the programs (256K or larger for the benchmarks studied here) and do not have one or more of the features mentioned above; this includes most current workstations. Our work differs from previous reported work [30, 38, 48, 49, 50] on memory subsystem performance of heap allocation in two important ways. First, previous work used the overall miss ratio as the performance metric, which is a misleading indicator of performance. The overall miss ratio neglects the fact that read and write misses may have different costs. Also, ....
[Article contains additional citation context not shown here]
Koopman, Jr., P. J., Lee, P., and Siewiorek, D. P. Cache behavior of combinator graph reduction. Transactions on Programming Languages and Systems 14, 2 (Apr. 1992), 265--277.
....garbage collection, generational garbage collection, heap allocation, page mode, subblock placement, write buffer, write back, write miss policy, write policy, write through 1. INTRODUCTION Heap allocation with copying garbage collection is widely believed to have poor memory system performance [Koopman, Jr. et al. 1992; Peng and Sohi 1989; Wil A paper containing some of the results presented in this paper appeared in the 21st Annual Symposium on Principles of Programming Languages. This research is sponsored by the Defense Advanced Research Projects Agency, DoD, through ARPA Order 8313, and monitored by ....
....performs poorly on machines whose caches are smaller than the allocation area of the programs (256K or larger for the benchmarks studied here) and do not have one or more of the features mentioned above; this includes most current workstations. Our work differs from previous reported work [Koopman, Jr. et al. 1992; Peng and Sohi 1989; Wilson et al. 1990; Wilson et al. 1992; Zorn 1991] on memory system performance of heap allocation in two ways. First, previous work used the overall miss ratio as the performance metric, which is a misleading indicator of performance. The overall miss ratio neglects the fact ....
[Article contains additional citation context not shown here]
Koopman, Jr., P. J., Lee, P., and Siewiorek, D. P. 1992. Cache behavior of combinator graph reduction. Transactions on Programming Languages and Systems 14, 2 (April), 265--277.
.... garbage collection, heap allocation, cache memories, dynamic storage management, applicative (functional) programming, Standard ML, simulation, performance of systems 1 Introduction Heap allocation with copying garbage collection is widely believed to have poor memory subsystem performance [30, 37, 38, 23, 39]. To investigate this, we conducted an extensive study of memory subsystem performance of heap allocation intensive programs on memory subsystem organizations typical of many workstations. The programs, compiled with the SML NJ compiler [3] do tremendous amounts of heap allocation, allocating one ....
....word combined with write allocate on write miss, a write buffer and page mode writes, and cache sizes of 32K or larger. Heap allocation performs poorly on machines which do not have one or more of these features; this includes most current workstations. Our work differs from previous reported work [30, 37, 38, 23, 39] on memory subsystem performance of heap allocation in two important ways. First, previous work used overall miss ratios as the performance metric and neglected the potentially different costs of read and write misses. Overall miss ratios are misleading indicators of performance: a high overall ....
[Article contains additional citation context not shown here]
Philip J. Koopman, Jr., Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. Transactions on Programming Languages and Systems, 14(2):265--277, April 1992.
....fetching the contents of memory blocks in which every word is written before being read. Jouppi has demonstrated that write validate can yield significant performance improvements for C and Fortran programs [16]. Koopman et al. first noted the benefits of this policy for garbage collected programs [18]. The impact of fetch on write upon the performance of the test programs will be discussed briefly in §5. The temporal cost of writing data to main memory, which depends upon the write hit policy, is not analyzed in detail. Properties of practical memory systems and of the test programs ....
Philip J. Koopman, Jr., Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14(2):265--297, April 1992.
....that many machines support heap allocation poorly. However, with the appropriate memory subsystem organization, heap allocation can have good memory subsystem performance. 1 Introduction Heap allocation with copying garbage collection is widely believed to have poor memory subsystem performance [31, 38, 39, 24, 40]. To investigate this, we conducted an extensive study of memory subsystem performance of heap allocation intensive programs on memory subsystem organizations typical of many workstations. The programs, compiled with the SML NJ compiler [3] do tremendous amounts of heap allocation, allocating one ....
....word combined with write allocate on write miss, a write buffer and page mode writes, and cache sizes of 32K or larger. Heap allocation performs poorly on machines which do not have one or more of these features; this includes most current workstations. Our work differs from previous reported work [31, 38, 39, 24, 40] on memory subsystem performance of heap allocation in two important ways. First, previous work used overall miss ratios as the performance metric and neglected the potentially different costs of read and write misses. Overall miss ratios are misleading indicators of performance: a high overall ....
[Article contains additional citation context not shown here]
Philip J. Koopman, Jr., Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. Transactions on Programming Languages and Systems, 14(2):265--277, April 1992.
....essence, the main conclusion of these papers is that programs with dynamic heap allocation tend to have bad cache performance and either hardware techniques, or software techniques, or a combination of both, must be used in order to improve cache performance. More recent work, by Koopman et al. [27], Diwan et al. [14] and Reinhold [31, 32] has shown that some cache design features, already available on current machines, can eliminate all of the allocation misses. Moreover, Reinhold shows that sequential allocation, due to the fact that it tends to spread memory references uniformly across ....
....on a fetch on write cache divided by the number of reads. This curve is shown to give an idea of the fraction of misses that are eliminated by write validate and write around caches. Write validate is the best organization for fast allocating programs, a result already shown by Koopman et al. [27] and Diwan et al. [14]. It eliminates all of the write misses without adding many read misses, as can be seen from the comparison with the read miss ratio of fetch on write caches. On the write around caches, on the other hand, most write misses of the fetch on write cache simply become read ....
[Article contains additional citation context not shown here]
Philip J. Koopman, Jr., Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. ACM TOPLAS, 14(2):265--297, April 1992.
....to collect run time data during development and testing, and then use the collected profile information in optimizing the code for final delivery [Wal91]. Koopman and Lee obtained improvements in the performance of a lazy functional language by implementing graph reduction as self modifying code [KLS92]. And, of course, there have been countless other applications of self modifying code. In this paper, we report on our experience with a new approach to generating optimized code at run time. We have implemented a prototype compiler, which we call Fabius, that can automatically compile a general ....
....might be complicated by the fact that run time generated code may contain embedded pointers to other data and code objects; this can occur if pointers are inlined like other values during optimization. 7 Run time code generation and modification can interact poorly with modern memory hierarchies [KLS92]. Most modern architectures prefetch instructions into an instruction cache and many do not automatically invalidate cache entries when memory writes occur. Cache flushing may therefore be required when dynamically generating or modifying code [Kep91]. The regularity of code space allocation and ....
Philip J. Koopman, Jr., Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14(2):265--297, April 1992.
No context found.
Koopman, Philip J., Lee, Peter, and Siewiorek, Daniel. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14, 2 (April 1992) 265--297.
No context found.
Philip J. Koopman, Peter Lee, and Daniel P. Siewiorek. Cache behavior of combinator graph reduction. ACM Transactions on Programming Languages and Systems, 14(2):265--297, April 1992.