A. Diwan, D. Tarditi, and J. E. B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, Portland, OR, Jan. 1994.
....the fundamental structure of the cache; changes only affect how addresses are computed before being applied to the cache. Hardware changes will not affect any critical path. Performance benefits are possible for non-numerical codes; for example, heap allocation with copying garbage collection [9]. We demonstrate the effectiveness of our scheme by showing speedups of 1.4 to 2.6 on a set of representative, array-intensive loop nests. In the next section, we discuss previous work. In Section 3, we describe the local memory scheme in detail; we discuss the new instructions required, the ....
A. Diwan, D. Tarditi, and E. Moss. Memory subsystem performance of programs using copying garbage collection. In 21st Symposium on Principles of Programming Languages, pages 1--13. ACM, January 1994.
....as we do in this paper. There have also been a number of papers investigating the effect of heap organization on reference locality in garbage-collected languages [6, 13, 19], including several recent papers that specifically consider the effect of garbage collection on cache performance [7, 15, 20, 21]. This work differs from ours in its focus. While much of the related garbage collection work has investigated how generational garbage collection interacts with processor cache architecture, none of the previous work we are aware of has attempted to classify objects using profiles and segregate ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'94), pages 1--14, Portland, Oregon, January 17--21, 1994. ACM Press.
....locality as we do in this paper. There have also been a number of papers investigating the effect of heap organization on reference locality in garbage-collected languages [5, 14], including several recent papers that specifically consider the effect of garbage collection on cache performance [7, 21, 23]. This work differs from ours in its focus. While much of the related garbage collection work has investigated how generational garbage collection interacts with processor cache architecture, none of the previous work we are aware of has attempted to classify objects using profiles and segregate ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In ACM SIGPLAN-SIGACT POPL'94, pages 1--14, Portland, Oregon, January 17--21, 1994. ACM Press.
....the semantics of the virtual machine may not match the semantics of the language being compiled (e.g. the exception semantics). Even if the semantics happen to match, the engineering tradeoffs may differ dramatically. For example, functional languages like Haskell or Scheme allocate like crazy (Diwan, Tarditi, and Moss 1993), and JVM implementations are typically not optimised for this case. Finally, a virtual machine typically comes complete with a very large infrastructure (class loaders, verifiers and the like) that may well be inappropriate. Our intended level of abstraction is much, much lower. Our problem ....
Diwan, A, D Tarditi, and E Moss. 1993 (January). Memory subsystem performance of programs using copying garbage collection. In 21st ACM Symposium on Principles of Programming Languages (POPL'94), pages 1--14.
....locality as we do in this paper. There have also been a number of papers investigating the effect of heap organization on reference locality in garbage-collected languages [6, 13, 19], including several recent papers that specifically consider the effect of garbage collection on cache performance [7, 15, 20, 21]. This work differs from ours in its focus. While much of the related garbage collection work has investigated how generational garbage collection interacts with processor cache architecture, none of the previous work we are aware of has attempted to classify objects using profiles and segregate ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'94), pages 1--14, Portland, Oregon, January 17--21, 1994. ACM Press.
....will provide strong support for the thesis claim. I plan to use a combination of wall-clock time, heap size, instruction counts, and cache miss counts to answer the questions. The instruction counts and cache miss information will be collected via the simulation tools of Tarditi, Diwan, and Moss [21]. This information should serve as a predictor of performance for future architectures. • Representation of types: Is too much sharing lost by flattening a particular type? How is program time affected by choosing to flatten a type? How is program locality affected by choosing to ....
D. Tarditi, A. Diwan, and E. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the 21st Annual ACM Conference on Principles of Programming Languages, pages 1--14, Jan. 1994.
....are also compelling reasons for providing support for stacks. First, Appel and Shao's work did not consider imperative languages, such as Java, where the ability to share environments is greatly reduced, nor did it consider languages that do not require garbage collection. Second, Tarditi and Diwan [13, 12] have shown that with some cache architectures, heap allocation of continuations (as in SML/NJ) can have substantial overhead due to a loss of locality. Third, stack-based activation records can have a smaller memory footprint than heap-based activation records. Finally, many machine architectures ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, January 1994.
....invalid until overwritten (write-validate). Jouppi found that write-validate and write-around always outperform fetch-on-write. In general, write-validate outperforms write-around, since data just written is more likely to be accessed again soon than data read previously. Diwan, Tarditi, and Moss [3] show that garbage-collected programs run much faster on write-validate caches than on write-around caches, since all initializing writes are misses, and almost all such data will be read back soon (incurring a read miss on a write-around cache). The technique described in this paper is useful whenever: ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 1--14. ACM Press, 1994.
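The contrast drawn in the excerpt above (initializing writes that are read back soon hit in a write-validate cache but miss in a write-around cache) can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: a direct-mapped cache with one-word lines, and a synthetic trace in which each three-word object is written and then immediately read back. The names `simulate` and `allocate_then_read` are hypothetical and do not come from any of the cited papers.

```python
def simulate(trace, num_lines, policy):
    """Count read misses in a direct-mapped cache with one-word lines.

    policy "write-validate": a write miss claims the line without fetching.
    policy "write-around":   a write miss goes to memory; the cache is unchanged.
    """
    tags = [None] * num_lines          # one tag (here: the full address) per line
    read_misses = 0
    for op, addr in trace:
        index = addr % num_lines
        hit = tags[index] == addr
        if op == "w":
            if policy == "write-validate" and not hit:
                tags[index] = addr     # allocate on write, no fetch needed
        else:                          # op == "r"
            if not hit:
                read_misses += 1
                tags[index] = addr     # fill the line on a read miss
    return read_misses


def allocate_then_read(num_objects, words_per_object=3):
    """Synthetic trace (an assumption): initialize each fresh object, then read it."""
    trace = []
    for obj in range(num_objects):
        base = obj * words_per_object
        words = range(base, base + words_per_object)
        trace.extend(("w", a) for a in words)   # initializing writes
        trace.extend(("r", a) for a in words)   # data read back soon after
    return trace
```

On this trace, every read hits under write-validate, while under write-around every first read of a freshly written word is a miss. Real traces and caches with multi-word lines would behave less cleanly.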
....requirements imposed by the semantics of SML. Allocation rates: SML/NJ has extremely high allocation rates; as much as an order of magnitude higher than traditional Lisp or Smalltalk systems [Ung86, Zor89]. Tarditi and Diwan have measured allocation rates of one word per 6-10 instructions executed [DTM94]. This is primarily because of the absence of a stack; the role of stack frames is played by heap-allocated continuation closures [App92]. This high allocation rate has two important consequences: one, there is a guaranteed high mortality rate for newly allocated objects, and two, the mutator ....
Diwan, A., D. Tarditi, and E. Moss. Memory subsystem performance of programs using copying garbage collection. In To appear in POPL'94, January 1994.
....not have a significantly better locality of reference than heap-allocated activation records, even in a modern cache memory hierarchy. Stacks do have a much better write miss ratio, but not a much better read miss ratio. But on many modern machines, the write miss penalty is approximately zero [23, 16, 7]. 3. The amortized cost of collection can be very low [1, 7], especially with modern generational garbage collection techniques [36]. The major contribution of our paper is a safe-for-space closure conversion algorithm that integrates and improves most previous closure analysis techniques [26, ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 1--14. ACM Press, 1994.
....their lifetime; short-lived objects are treated separately from long-lived ones. Modern generational garbage collectors [App90] exhibit good overall performance, often with an overhead of under 40% (including the overhead of maintaining garbage collector invariants), and reasonable cache performance [DTM94]. However, it retains many of the problems mentioned above: pauses, memory consumption, and interoperability. There are a number of techniques for reducing or eliminating the garbage collection pauses, including the incremental mark-sweep collector of Dijkstra et al. [DLM 78] and the ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Proc. of the 21st Annual ACM Symposium on Principles of Programming Languages, pages 1--14, January 1994.
....However, a close look at our real-time measurements in Figures 14-17 suggests the opposite: Anomaly A: ukk and mcc look slightly superlinear, even the worse for increasing k. Anomaly B: On the contrary, lazyTree looks closer to linear, and even the better for increasing k. It is well known [DTM94] that on today's pipelined processors with multi-level caching, the performance of the memory subsystem can significantly affect a program's execution time. The size ratio between the resident page set and the on-chip cache determines the chance of cache hits; as this ratio increases, more ....
A. Diwan, D. Tarditi, and E. Moss. Memory Subsystem Performance of Programs Using Copying Garbage Collection. In Proceedings of the 21st ACM Symposium on Principles of Programming Languages, Portland, OR, January 1994, pages 1--14, 1994.
....access patterns in Standard ML of New Jersey. Darko Stefanovic, Department of Computer Science, University of Massachusetts, May 5, 1994. The increasing gap between processor speed and main memory speed has given rise to considerable interest in memory system performance of programs. A recent study [2] of programs written in Standard ML under the New Jersey implementation [1] pointed out that in a system with high allocation and few updates the allocated memory tends to be read back soon after allocation. The authors provide circumstantial evidence from cache simulation studies to support this ....
....pointed out that in a system with high allocation and few updates the allocated memory tends to be read back soon after allocation. The authors provide circumstantial evidence from cache simulation studies to support this assertion. This kind of access pattern favours different cache organisations [2] from a read-compute-write pattern that one may expect in traditional language settings. Since proper understanding of the access patterns is seen to be important, we undertook to measure it directly. Our apparatus consists of the Standard ML of New Jersey compiler, version 0.93 (with the Sparc ....
Amer Diwan, David Tarditi, and J. Eliot B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, Portland, Oregon, January 1994.
....reasons for providing support for stacks. First, almost all compilers use a stack-based architecture. Second, Tarditi and Diwan have shown that with the wrong kind of cache architecture, heap allocation of continuations (as in SML/NJ) can have substantial overhead due to a loss of locality [DTM95, DTM94]. Third, Appel and Shao do not consider imperative languages, such as Java, where the ability to share environments is greatly reduced, nor do they consider languages that do not require garbage collection. Finally, many machine architectures have hardware devices that expect programs to behave in ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, January 1994.
....in the expression serve as the extent marker for bindings. Opinions vary on the usefulness of stack-based structures. There are several claims that stack-based closure allocation does not outperform a heap-based closure allocation scheme that uses a sophisticated compile-time garbage collection [6, 39, 55]. The debate is far from resolved: Colby and Lee [22] explore the question of whether a compact closure allocation scheme can improve the effectiveness of the cache memory to the point of achieving better execution times. They investigate the behavior of several closure allocation strategies. On the ....
Diwan, A., Tarditi, D., and Moss, J. E. Memory subsystem performance of programs using copying garbage collection. In POPL'94 [111], pp. 1--14.
....also compelling reasons for providing support for stacks. First, Appel and Shao's work did not consider imperative languages, such as Java, where the ability to share environments is greatly reduced, nor did it consider languages that do not require garbage collection. Second, Tarditi and Diwan [14, 13] have shown that with some cache architectures, heap allocation of continuations (as in SML/NJ) can have substantial overhead due to a loss of locality. Third, stack-based activation records can have a smaller memory footprint than heap-based activation records. Finally, many machine ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, January 1994.
....a significantly better locality of reference than heap-allocated activation records, even in a modern cache memory hierarchy. Stacks do have a much better write miss ratio, but not a much better read miss ratio. But on many modern machines, the write miss penalty is approximately zero [Jou93, DTM94, Rei94]. 3. The amortized cost of collection can be very low ([App87]; also see Chapter 5), especially with modern generational garbage collection techniques [Ung86]. The major contribution of this chapter is a safe-for-space closure conversion algorithm that integrates and improves most ....
....read miss penalty is not too large and the write miss penalty is zero. 5.6.1 Write misses. The Standard ML of New Jersey compiler [AM91] uses no stack; all frames are allocated on the garbage-collected heap. If any system should have poor cache locality, this is the one. Diwan, Tarditi, and Moss [DTM94] simulated the memory hierarchy performance of SML/NJ on a DECstation 5000, and found two things: • SML/NJ program executions have an astoundingly high write miss ratio. • SML/NJ programs are not much delayed by cache misses. The reason these two statements are not inconsistent, they ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 1--14. ACM Press, 1994.
....if the read miss penalty is not too large and the write miss penalty is zero. 5.1 Write misses. The Standard ML of New Jersey compiler [7] uses no stack; all frames are allocated on the garbage-collected heap. If any system should have poor cache locality, this is the one. Diwan, Tarditi, and Moss [15] simulated the memory hierarchy performance of SML/NJ on a DECstation 5000, and found two things: • SML/NJ program executions have an astoundingly high write miss ratio. • SML/NJ programs are not much delayed by cache misses. The reason these two statements are not inconsistent, they ....
....marked as allocated but invalid. Thus, a write miss does not require reading the rest of the written cache line from memory. Subsequent (sequential) writes will fill the rest of the line. One-word cache line: The DECstation 5000 has a cache line size of one word, but four lines are read on a miss [15]. For some applications this is better than sub-block placement, but for sequential writes it is equally good. It is more expensive to implement, since it requires a full tag (not just a valid bit) for each word. Diwan et al. found excellent memory subsystem performance of SML/NJ on this ....
Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 1--14. ACM Press, 1994.
....associated with maintaining and using remembered sets. There have been a number of papers concerning this issue, but the work by Hosking [7, 6] is perhaps the most complete. In general, these studies complement our current work. There have been a number of studies relating cache performance and GC [12, 19, 4], but they are concerned with a different set of issues than we are. Zorn has compared the cost of copying and mark and sweep collection [20] and the cost of conservative collection to malloc and free allocation [21] but again he does not provide the same multiple language and platform context ....
A. Diwan, D. Tarditi, and J. E. B. Moss. Memory Subsystem Performance of Programs using Copying Garbage Collection. In Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages, ACM SIGPLAN Notices. ACM Press, Jan. 1994.
....version with heap-allocated frames, and a Stack version with stack-allocated frames. The simulations counted read misses, write misses, and total instruction count of SML programs compiled to the MIPS instruction set. Our simulations include the instructions and cache misses of garbage collection. Diwan et al. (1994) measured a heap-only ML system; Reinhold and Moss (1994) measured a stack-frame Scheme system. In order to make a more direct comparison, we measured stack frames vs. heap frames in the same ML system. We simulated only the primary data cache. We simulated direct-mapped caches of sizes ranging ....
....times (after adjustment) of several benchmarks using Heap and Stack frames, running in simulated caches of different sizes. We simulated a write-allocate cache with partial fill, and also a write-around cache. Jouppi (1993) simulated both kinds of cache for C programs without garbage collection; Diwan et al. (1994) simulated both caches for (almost) purely heap-allocating ML programs. By simulating both caches on stack and heap allocation for the same programs, we can compare more straightforwardly. The results are not too surprising: write-allocate is better on all programs than write-around; and heap ....
Diwan, Amer, David Tarditi, and Eliot Moss. 1994. Memory subsystem performance of programs using copying garbage collection. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 1--14. ACM Press.
....is very high: on a DECstation 5000/200, a typical value is 16 megabytes per second. A large part of allocation is due to function closure records and callee-save register records; less than a quarter is due to data records, and a large fraction of these are three words long, that is, cons cells [12]. There is no check for allocation overflow on individual allocations; instead, checks are performed at function entry points only. A function in the SML/NJ intermediate representation has one entry point, several exit points and no loops. The compiler can statically determine the maximum possible ....
Amer Diwan, David Tarditi, and J. Eliot B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, Portland, Oregon, January 1994.
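The entry-point overflow check described in the excerpt above might be pictured with the following toy bump allocator. It is a minimal sketch, not SML/NJ's actual runtime: the class `Heap` and its methods are hypothetical, and because the excerpt is truncated, the sketch assumes that the statically determined quantity is an upper bound on the number of words a function body can allocate.

```python
class Heap:
    """Toy bump-pointer heap with one overflow check per function entry."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.next_free = 0        # allocation pointer
        self.limit_checks = 0     # how many overflow checks were executed

    def enter_function(self, max_alloc_words):
        # Single check at the entry point. max_alloc_words is assumed to be
        # a compile-time upper bound for the whole (loop-free) function body.
        self.limit_checks += 1
        if self.next_free + max_alloc_words > len(self.words):
            raise MemoryError("heap limit reached; a collector would run here")

    def alloc(self, fields):
        # Bump allocation with no per-object overflow check.
        base = self.next_free
        self.words[base:base + len(fields)] = fields
        self.next_free += len(fields)
        return base


heap = Heap(16)
heap.enter_function(6)            # body allocates at most two 3-word records
first = heap.alloc([1, 2, 3])
second = heap.alloc([4, 5, 6])
```

Because a function has no loops, a single bound covers every path through the body, so the two allocations above cost one check rather than two.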
....Here I continued to work on understanding and improving the performance of garbage collection. I conducted two performance studies of programs compiled with the SML/NJ compiler. In the first study, I tested the commonly held belief that garbage collection leads to poor memory system performance (Diwan et al. 1994; Diwan et al. 1995). I showed this belief to be false: given the right memory system hardware, such as that in the DECstation 5000, garbage collection can have excellent memory system performance. In the second study, I itemized the cost of a garbage collection implementation into its components ....
Diwan, A., Tarditi, D., and Moss, J. E. B. (1994). Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, Portland, Oregon.
No context found.
A. Diwan, D. Tarditi, and J. E. B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, pages 1--14, Portland, OR, Jan. 1994.
No context found.
A. Diwan, D. Tarditi, and J. E. B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages, ACM SIGPLAN Notices. ACM Press, Jan. 1994.
No context found.
Amer Diwan, David Tarditi, and J. Eliot B. Moss. Memory subsystem performance of programs using copying garbage collection. In Conference Record of the Twenty-First ACM Symposium on Principles of Programming Languages, Portland, Oregon, January 1994.