| K. Bolland and A. Dollas. "Predicting and Precluding Problems with Memory Latency". IEEE Micro, vol. 14, no. 4, Aug. 1994, pp.59-67. |
....memory access latencies. Our evaluation of this mechanism shows that the miss rate of the cache is improved, on average, by 18% in addition to a significant reduction in the bandwidth requirement. 1. Introduction This paper introduces an approach to alleviate the growing memory latency problem [3] by better exploiting the spatial locality exhibited by applications. Spatial locality is the tendency of neighboring memory locations to be referenced close together in time. The traditional approach for exploiting spatial locality is to use a cache line (consisting of several words) to fetch ....
K. Bolland and A. Dollas. Predicting and precluding problems with memory latency. In IEEE Micro, pages 59--66, August, 1994.
....can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains. 3. 1 Introduction and Related Work The gap between main memory and processor clock speeds is growing at an alarming rate [36]. As a result, computer system performance is increasingly dominated by the latency of servicing memory accesses, particularly those accesses which are not easily predicted by the temporal and spatial locality captured by conventional cache memory organizations [37]. Conventional cache memories ....
K. Roland and A. Dollas. "Predicting and precluding problems with memory latency." IEEE Micro, 14(4):59--67, 1994.
....significant proportion of the cost of data cache misses can be eliminated or reduced with SPAID without unduly increasing memory traffic. 1. Introduction It is well known that processor clock speeds are increasing exponentially over time, while memory speeds are not increasing nearly as rapidly [RD94]. The computing industry has reached the point where system performance is dominated by the cost of servicing cache misses. To address this problem, several instruction set architectures (e.g. PowerPC [IBM93]) include non-blocking prefetch instructions that allow the hardware to overlap cache ....
K. Roland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, 1994.
....a 53-byte cell it will then take about 180 nanoseconds for the transmission. The switching delay is at most 180 nanoseconds since we assume that the switch is not the bottleneck. Assume a random access time of 70 nanoseconds to a CAS DRAM and 10 nanoseconds thereafter for each 4 bytes accessed [16][19]. For a 48-byte payload this gives a total of 190 nanoseconds. For a cache line of 256 bytes the access time will be 710 nanoseconds. Altogether, the round trip latency for one cell sums up to 2190 nanoseconds including the memory access time. This estimation is a bit conservative. It is likely ....
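The DRAM access figures quoted in the excerpt above are consistent with a simple linear model: a fixed 70 ns random access cost plus 10 ns for each 4-byte word transferred. The sketch below assumes that reading of the excerpt, chosen because it reproduces the quoted totals; the function and constant names are illustrative and do not come from the citing paper.

```python
# Assumed model: fixed random-access cost plus a per-word transfer cost.
RANDOM_ACCESS_NS = 70    # initial random access to the DRAM
TRANSFER_NS_PER_WORD = 10  # each 4-byte word accessed
WORD_BYTES = 4


def dram_access_time_ns(num_bytes: int) -> int:
    """Return the modeled DRAM access time in nanoseconds for num_bytes."""
    if num_bytes <= 0 or num_bytes % WORD_BYTES != 0:
        raise ValueError("num_bytes must be a positive multiple of 4")
    words = num_bytes // WORD_BYTES
    return RANDOM_ACCESS_NS + TRANSFER_NS_PER_WORD * words


print(dram_access_time_ns(48))   # 48-byte cell payload
print(dram_access_time_ns(256))  # 256-byte cache line
```

Under this model the two calls give 190 ns and 710 ns, matching the totals in the excerpt. The 2190 ns round trip figure depends on terms not visible in the excerpt, so it is not modeled here.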
K. Bolland and A. Dollas, Predicting and Precluding Problems with Memory Latency, IEEE Micro, August 1994, pp 59-67.
....Since the mid-1980s, the performance of microprocessor-based machines has improved at a rate of between 50% and 100% per year. In contrast, DRAM speeds have improved at only 7% per annum [16]. These figures translate into a doubling of the gap between CPU and DRAM performance once every 6.2 years [7]. [Figure 3.1: Performance trends for processors and DRAM storage hardware [15]. Axes: Year, Performance; curves: DRAM, CPU (fast), CPU (slow).] The CPU (fast) curve assumes a 100% per annum improvement from 1985 onwards, while the CPU (slow) curve ....
....performance, it hinges on the assumption that CPU performance improvements can be maintained at the (very high) rate of 80% per annum. This trend is attributable mainly to shrinkage of component sizes, which is in turn made possible by continued improvements in processor fabrication technology [7, 15]. Shrinkage of chip components has a twofold benefit: 1. Smaller components can be clocked at higher rates since signal propagation delays are reduced in smaller circuits; and 2. A larger number of components can be incorporated into a given area of silicon, and this facilitates the implementation ....
[Article contains additional citation context not shown here]
K. Boland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, August 1994.
....in Section 5. 2 The Problem and Related Work 2.1 Introduction Since the mid-1980s, CPU speeds have been improving at 50% to 100% per year, while DRAM latency has only been improving at 7% per year [HP96], resulting in a doubling of the time in instructions to access DRAM every 6.2 years [BD94]. Caches have traditionally been used to bridge the DRAM-CPU speed gap. However, the growing gap presents increasing challenges for cache designers. This section summarizes some improvements to the basic design of caches, as well as attempts at hiding the basic latency of DRAM by more ....
K. Boland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, August 1994.
....the dynamic reference analysis. Our technique is fully compatible with existing Instruction Set Architectures. Results from detailed simulations of several integer programs show significant speedups. 1 Introduction As improvements in processor performance outpace that of main memory performance [1], the cache miss penalty will dominate the cycle counts of many applications. The large improvements in processor performance are due both to better circuit design and fabrication technology, which reduce the cycle time, and to better Instruction Level Parallelism (ILP) techniques, which increase ....
K. Boland and A. Dollas, "Predicting and precluding problems with memory latency," IEEE Micro, pp. 59--66, August 1994.
....reference costs increase. Keywords: memory hierarchy, caches, paging, software-controlled replacement Computing Review Categories: B.3, C.4, D.4.2 1 Introduction There has been much discussion in recent years of the memory wall [28, 17, 27], a consequence of a growing CPU-DRAM speed gap [12, 5]. One approach to dealing with this growing CPU-DRAM speed gap is to focus on reduction of misses, even strategies which make each miss cost more. For example, past work on software-controlled caches [9] may be more relevant today than when it was first done, given the increased cost of misses. ....
....Since the mid-1980s, when load-store (RISC) microprocessors became common, the trend in CPU speed improvement has been 50% to 100% per year, while DRAM speed improvement has only been 7% per year [13], leading to a doubling of DRAM reference costs, relative to CPU speed, every 6.2 years [5]. While there have been many attempts at reducing or hiding the costs of DRAM references [26, 6, 18, 5], if current trends continue, it will not be long before miss costs of thousands of instructions are commonplace. For example, it is possible to buy a DEC Alpha system today running at 500MHz ....
[Article contains additional citation context not shown here]
K. Boland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, August 1994.
....DRAM reference costs increase. keywords: memory hierarchy, caches, paging, software-controlled replacement CR categories: B.3, C.4, D.4.2 1 Introduction There has been much discussion in recent years of the memory wall [WM95, Joh95, Wil95], a consequence of a growing CPU-DRAM speed gap [HJ91, BD94]. One approach to dealing with this growing CPU-DRAM speed gap is to focus on strategies for reducing misses, even if those strategies make each miss cost more. For example, work on software-controlled caches in the past [CSB86] may be more relevant today than when it was first done, given the ....
....Since the mid-1980s, when load-store (RISC) microprocessors became common, the trend in CPU speed improvement has been 50% to 100% per year, while DRAM speed improvement has only been 7% per year [HP95], leading to a doubling of DRAM reference costs, relative to CPU speed, every 6.2 years [BD94]. While there have been many attempts at reducing or hiding the costs of DRAM references [SF91, CB92, Jou90, BD94], if current trends continue, it will not be long before miss costs of thousands of instructions are commonplace. For example, it is possible to buy a DEC Alpha system today running ....
[Article contains additional citation context not shown here]
K. Boland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, August 1994.
....has increased by a factor of over 10 in approximately 6 years between the two generations of Silicon Graphics multiprocessor. This order of magnitude increase in miss cost does not contradict the prediction that it takes 6.2 years for the cost of memory access to double in terms of clock cycles [Boland and Dollas 1994]: there have been other changes in the memory system in the newer design, including doubling the cache block size from 64 bytes to 128 bytes, and a change in emphasis from lowering L2 miss costs to lowering L1 miss costs (in part, the change is necessary because L2 cache sizes have increased more ....
....has been in the range 50% to 100% per year (see figure 2.1). On the other hand, DRAM speed improvement over the same period has been approximately 7% per year [Hennessy and Patterson 1995]. On average, in recent years, it has taken 6.2 years for the cost of memory access in clock cycles to double [Boland and Dollas 1994]. [Figure 2.1: CPU Performance Trends. Axis: performance, 1965 to 1990; curves: supercomputers, mainframes, minicomputers, microprocessors; source: [Hennessy and Jouppi 1991].] As a consequence of these trends, caches ....
[Article contains additional citation context not shown here]
K Boland and A Dollas. Predicting and Precluding Problems with Memory Latency, IEEE Micro, vol. 14 no. 4 August 1994, pp 59--67.
....is growing steadily. Since the mid-1980s, performance of CPUs has tended to double every 12 to 18 months, while DRAM performance has only improved by about 7% per year [Hennessy and Patterson 1995]. Consequently, it takes 6.2 years for the cost of memory access to double in terms of clock cycles [Boland and Dollas 1994]. Cache sizes, especially second-level (L2) cache sizes, are increasing in an attempt at reducing misses to DRAM. Recent designs have L2 caches of over 1 Mbyte. Increases in cache sizes and miss costs may invalidate some of the underlying assumptions in cache design. For example, a cache is ....
Boland and A Dollas [1994] K Boland and A Dollas. Predicting and Precluding Problems with Memory Latency, IEEE Micro, vol. 14 no. 4 August 1994, pp 59--67.
....as is explained in 3.3. 3 The Problem and Related Work 3.1 Introduction Since the mid-1980s, CPU speeds have been improving at 50% to 100% per year, while DRAM latency has only been improving at 7% per year [HP96], resulting in a doubling of the time in instructions to access DRAM every 6.2 years [BD94]. Caches have traditionally been used to bridge the CPU-DRAM speed gap. However, the growing gap presents increasing challenges for cache designers. This section summarizes some improvements to the basic design of caches, as well as attempts at hiding the basic latency of DRAM by more ....
K. Boland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, August 1994.
....reduction of conflict and capacity misses by utilizing small blocks and small fetch sizes when spatial locality is absent, and the prefetching effect of large fetch sizes when spatial locality exists. 1 Introduction This paper introduces an approach to solving the growing memory latency problem [1] by intelligently exploiting spatial locality. Spatial locality refers to the tendency for neighboring memory locations to be referenced close together in time. Traditionally there have been two main approaches used to exploit spatial locality. The first approach is to use larger cache blocks, ....
K. Boland and A. Dollas, "Predicting and precluding problems with memory latency," IEEE Micro, pp. 59--66, August 1994.
....can effectively exploit value locality to collapse true dependencies, reduce average memory latency and bandwidth requirements, and provide measurable performance gains. 1. Introduction and Related Work The gap between main memory and processor clock speeds is growing at an alarming rate [RD94]. As a result, computer system performance is increasingly dominated by the latency of servicing memory accesses, particularly those accesses which are not easily predicted by the temporal and spatial locality captured by conventional cache memory organizations [Smi82]. Conventional cache ....
K. Roland and A. Dollas. Predicting and precluding problems with memory latency. IEEE Micro, 14(4):59--67, 1994.
No context found.
K. Bolland and A. Dollas. "Predicting and Precluding Problems with Memory Latency". IEEE Micro, vol. 14, no. 4, Aug. 1994, pp.59-67.