N. Jouppi, S. Wilton, "Tradeoffs in Two-Level On-Chip Caching," 21st ISCA, April 1994, pp. 34--45.
....first-level data cache. In Figure 3 we see that the average IPC for a 1-thread workload decreases as the associativity of the first-level data cache increases. 5.2 On-Chip Cache Hierarchy. The Alpha 21164 has a two-level on-chip cache hierarchy, based on research done by Wilton and Jouppi [3]. That arrangement enables both a fast L1 access time and high on-chip hit rates. This scheme makes particular sense when L1 latency is critical to performance. Although we are in the opposite scenario here, we can use a small L1 cache to hide the latency effects of an L2 on-chip cache designed for ....
N.P. Jouppi and S.J.E. Wilton. Tradeoffs in two-level on-chip caching. In 21st Annual International Symposium on Computer Architecture, pages 34--45, April 1994.
....proposed previously do not immediately target these subblocks for leakage optimization. Thus, the mechanism proposed in this paper can be applied in conjunction with other existing leakage control mechanisms. Two-level exclusive cache schemes have also been proposed for improving performance [10]. Our technique mimics exclusion by putting a duplicated copy into sleep mode. In this paper, we make the following major contributions: we present a circuit-level mechanism to implement state-preserving (data-retaining) leakage control at L2 subblock granularity and compare its effectiveness ....
N. Jouppi and S. Wilton. Tradeoffs in two-level on-chip caching. In ISCA-21, pp. 34--45. IEEE Computer Society Press, 1994.
....the processor on which the memory reference originates. Additional work has examined the efficiency of memory hierarchies and proposed mechanisms to balance processor and memory system performance. Jouppi et al. studied the best cache sizes for the two-level cache hierarchy of single-core processors [15]. That research explored the trade-offs between miss rates and latencies of various cache sizes. Their results indicated that two-level caches perform better than single-level caches occupying the same chip area. Farrens et al. studied the area efficiency of single-chip systems by comparing a single-core ....
N. P. Jouppi and S. J. Wilton. Tradeoffs in two-level on-chip caching. In The 21st Annual International Symposium on Computer Architecture, pages 34--45, April 1994.
....and concludes with future directions. 2 Related Research. Much work has been put into the front-end architecture in an effort to improve the rate of instruction delivery to the execution core. Techniques to reduce the impact of I-cache misses include multi-level instruction memory hierarchies [17] and instruction prefetch [40]. Techniques to reduce the impact of branch mispredictions include hybrid [21] and indirect [6] branch predictors, and recovery miss caches to reduce misprediction latencies [2]. A number of compiler-based techniques work to improve instruction delivery performance. ....
N. P. Jouppi and S. J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 34--45, April 1994.
....and main memory speeds is making it increasingly difficult to adequately service memory requests with just a single level of cache placed between the processor and main memory. Many memory systems, therefore, link a series of successively larger (but slower) cache memories into a memory hierarchy [Short88, Baer87, Baer88, Przybylski89, Przybylski90, Happel92, Kessler91, Olukotun91, Jouppi94, Wang89]. In such a system, the processor first references the fastest cache in the hierarchy and, on a miss, tries to find the requested data by referencing each of the successively larger caches in the hierarchy. The overall performance of such a multi-level cache hierarchy depends on the number of misses ....
Jouppi, N. and Wilton, S. Tradeoffs in two-level on-chip caching. In Proceedings of the 21st Annual International Symposium on Computer Architecture, Chicago, IL, IEEE Computer Society Press, 34-45, 1994.
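The lookup order in the snippet above works as follows: probe the fastest cache first, then try each successively larger cache on a miss. The sketch below illustrates that order. It is not code from any cited paper. The `Level` class, the `lookup` function, the per-level latencies, and the fill-on-miss policy are all hypothetical choices made for illustration. Capacity limits and replacement are deliberately ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    """One cache level (hypothetical model with unbounded capacity)."""
    name: str
    latency: int                                  # cycles to probe this level
    contents: set = field(default_factory=set)    # resident block addresses


def lookup(levels, block, memory_latency):
    """Probe levels fastest-first; return (where the block was found, total cycles).

    Each probe adds that level's latency. If every level misses, memory_latency
    is added as well. As an assumed fill-on-miss policy, the block is installed
    in every faster level that missed, so a repeat access hits closer to the core.
    """
    cycles = 0
    for i, level in enumerate(levels):
        cycles += level.latency
        if block in level.contents:
            for upper in levels[:i]:      # fill the faster levels that missed
                upper.contents.add(block)
            return level.name, cycles
    cycles += memory_latency
    for level in levels:                  # miss everywhere: fill all levels
        level.contents.add(block)
    return "memory", cycles
```

With `levels = [Level("L1", 1), Level("L2", 7)]` and a memory latency of 42, these illustrative latencies make the totals 1, 8, and 50 cycles for L1 hits, L2 hits, and memory accesses. The first `lookup(levels, 0x40, 42)` goes to memory, and the repeat access hits in L1.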
....buffer, or a combination of the two, a cache assist. A cache assist needs to have a high degree of associativity, and it needs to have an access time equal to that of the level of cache utilizing it, i.e., its access time is very small. This imposes a limit on the size of the cache assist memory. In [8] it is shown that for any CMOS process technology the cache size cannot be increased too much without causing an increase in cycle time and access time. When both a victim cache and a stream buffer are desirable, .... This work was supported in part by the DARPA ITO under Grant DABT63 98 C0045.
Norman P. Jouppi and Steven J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proc. 21st Annual Symposium on Computer Architecture, 1994.
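The snippet above names the victim cache as one kind of cache assist: a small, fully associative buffer next to a fast cache. The sketch below shows one way such a buffer can catch conflict misses in a direct-mapped cache. Several details are assumptions of this sketch and are not taken from the cited work: addresses are plain block numbers, the victim buffer uses LRU replacement, and a victim-buffer hit swaps the two blocks. The class `DirectMappedWithVictim` is hypothetical.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped cache backed by a small fully associative victim buffer.

    Blocks evicted from the main cache move into the victim buffer. When the
    buffer is full, its least recently inserted entry is discarded. A main-cache
    miss that hits in the victim buffer swaps the two blocks.
    """

    def __init__(self, num_sets, victim_entries):
        self.num_sets = num_sets
        self.lines = [None] * num_sets     # one block per set
        self.victim = OrderedDict()        # block -> None, oldest entry first
        self.victim_entries = victim_entries

    def access(self, block):
        idx = block % self.num_sets
        if self.lines[idx] == block:
            return "hit"
        evicted = self.lines[idx]
        self.lines[idx] = block
        if block in self.victim:           # swap: block leaves the victim buffer
            del self.victim[block]
            result = "victim_hit"
        else:
            result = "miss"
        if evicted is not None:            # displaced block enters the buffer
            self.victim[evicted] = None
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)
        return result
```

Suppose blocks 0 and 4 alternate in a four-set cache. Both map to set 0. With a two-entry victim buffer, everything after the first two accesses is serviced without going to the next level. With `victim_entries=0`, every access misses. This is the kind of conflict-miss reduction that justifies a very small assist.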
....and maintaining software tools that reflect that complexity, simulation infrastructure is now widely shared among both academic and industry researchers. A majority of the papers published in recent conferences use shared tools such as the SimpleScalar tools [4], RSim [17], SimOS [19], and CACTI [10]. While some tools, such as CACTI, have been validated against real hardware, none of the microarchitecture simulators have been subject to such scrutiny. This lack of performance validation may be consistently introducing error into experimental studies, which, if sufficiently large, may cause ....
N. P. Jouppi and S. J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proceedings of the 21st Annual International Symposium on Computer Architecture, April 1994.
....a single cycle. If the memory request hits in the L2 cache, it takes 8 cycles to satisfy the request. When the request misses both the L1 and L2 caches, the total access delay is 50 cycles. For set-associative caches, we assume that the cycle time is lengthened by up to 20%, as suggested in [11], and adjust the miss penalties accordingly. For the victim, column-associative, and group-associative caches, an extra delay is encountered when the requested data is present in an alternative location. Due to the fact that the processor pipeline is increasingly complex and difficult to turn ....
....and the alternative locations should be considered in evaluating the performance of the various cache organizations. Recall that in our simulation model, we assume that a hit to an alternative location in the victim, column-associative, and group-associative caches takes 3 cycles. Based on results in [11], we also assume that the set-associative design lengthens the cycle time by up to 20%. Figure 7 summarizes the average memory access time for the data references with various cache organizations. Note that all the results are normalized to the direct-mapped cycle time. Due to the longer cycle ....
N. Jouppi and S. Wilton, "Tradeoffs in Two-Level On-Chip Caching," Proc. 21st Int'l Symp. Comp. Arch., Chicago, IL, April 1994, pp. 34--45.
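The two snippets above describe a latency model that can be turned into an average-memory-access-time calculation, normalized to the direct-mapped cycle time. In that model, an L1 hit takes 1 cycle, an L2 hit takes 8 cycles, and a miss in both takes 50 cycles. Set-associative designs get a clock period up to 20% longer. The sketch below is hedged in several ways. The function name `amat_normalized` is hypothetical, and so are the hit fractions in the example. It also assumes that only the L1 probe stretches with the clock, while the time spent beyond L1 stays fixed in absolute terms. That is one reading of "adjust the miss penalties accordingly", not necessarily the cited authors' method. The 3-cycle alternative-location hit is left out to keep the sketch short.

```python
def amat_normalized(l1_hit_frac, l2_hit_frac, stretch=1.0,
                    l1_cycles=1, l2_total=8, mem_total=50):
    """Average memory access time in units of the direct-mapped cycle.

    l1_hit_frac, l2_hit_frac: fractions of all references satisfied by L1 and
    by L2; the remainder is satisfied by memory.
    stretch: clock-period multiplier (1.2 models a 20% longer cycle).
    Assumption: only the L1 probe scales with the clock; the extra time for
    L2 and memory service (l2_total - l1_cycles, mem_total - l1_cycles)
    is fixed in absolute terms.
    """
    mem_frac = 1.0 - l1_hit_frac - l2_hit_frac
    if mem_frac < -1e-12:
        raise ValueError("hit fractions exceed 1")
    t_l1 = l1_cycles * stretch
    t_l2 = t_l1 + (l2_total - l1_cycles)
    t_mem = t_l1 + (mem_total - l1_cycles)
    return l1_hit_frac * t_l1 + l2_hit_frac * t_l2 + mem_frac * t_mem


# Hypothetical hit fractions, for illustration only.
direct_mapped = amat_normalized(0.90, 0.08)              # 2.54
two_way = amat_normalized(0.93, 0.055, stretch=1.2)      # 2.32
```

With these made-up fractions, the set-associative cache's better hit rate outweighs its 20% slower clock. A smaller hit-rate gain would reverse the outcome, which is why the cycle-time penalty has to be modeled at all.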
....buffer, or a combination of the two, a cache assist. A cache assist needs to have a high degree of associativity, and it needs to have an access time equal to that of the level of cache utilizing it, i.e., its access time is very small. This imposes a limit on the size of the cache assist memory. In [8] it is shown that for any CMOS process technology the cache size cannot be increased too much without causing an increase in cycle time and access time. When both a victim cache and a stream buffer are desirable, their relative sizes have to be selected within the bounds of the (small) cache ....
Norman P. Jouppi and Steven J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proc. 21st Annual Symposium on Computer Architecture, 1994.
....problem, I-cache misses, is also a difficult one, but with fewer solutions proposed for solving it. One brute-force solution is to increase the primary I-cache size. This may not always be possible or desirable for an on-chip I-cache because the cycle time of the cache is determined by its size [JoWi94] and is a major factor in determining the CPU clock speed. This limits a typical I-cache size to between 8 and 32KB in the current generation. Fast processors, like the DEC Alpha 21164 [ERPR95], are bound to have small I-caches in this range and thus higher miss rates. The Alpha 21164's 8KB I-cache ....
Norman P. Jouppi and Steven J.E. Wilton, "Trade-offs in Two-level On-chip caching", International Symposium on Computer Architecture, pp. 34-45, April 1994.
....to the design of high-performance microarchitectures. 2 Related Research. Simulation-based computer architectural analysis has been a rich area of research, most of which has focused on the design of a single subsystem such as the processor (e.g., [10, 12, 21]) or the cache hierarchy (e.g., [11, 18, 19, 24, 28]). Most of these studies focus on a small portion of the design space, and only a small subset take technology and implementation constraints into account in the analysis. Olukotun's studies of primary cache design for a multi-chip module (MCM) based Gallium Arsenide microprocessor [18, 19] ....
....constraints into account in the analysis. Olukotun's studies of primary cache design for a multi-chip module (MCM) based Gallium Arsenide microprocessor [18, 19] include a linear equation for calculating the delay between the processor and primary cache chips on the MCM. Jouppi and Wilton [11] use a detailed cycle time model in addition to simulation to study the performance of two-level on-chip caching relative to a single level of on-chip cache. Uhlig [24] includes on-chip implementation-related parameters such as latency and bandwidth in analyzing various two-level on-chip cache ....
N.P. Jouppi and S.J.E. Wilton. Tradeoffs in two-level on-chip caching. Proceedings of the 21st International Symposium on Computer Architecture, pages 34--45, April 1994.
....cache will have the lowest hit rate. Hit rate data obtained from a trace-driven simulation (or some other means) must be included in the analysis before the various cache alternatives can be fairly compared. Similarly, a small cache has a lower access time, but will also have a lower hit rate. In [9], it was found that when the hit rate and cycle time are both taken into account, there is an optimum cache size between the two extremes. Appendix A: Obtaining and Using the CACTI Software. A program that implements the CACTI model described in this paper is available. To obtain the software, ....
N. P. Jouppi and S. J. Wilton, "Tradeoffs in two-level on-chip caching," in Proceedings of the 21st Annual International Symposium on Computer Architecture, pp. 34--45, April 1994.
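The trade-off in the snippet above can be illustrated with a sweep over cache sizes. Larger caches hit more often but take longer to access. Jouppi and Wilton's study of miss rates versus latencies explores the same tension. Every number in this sketch is hypothetical and was chosen only to show how an interior optimum arises. The assumed miss-rate curve falls as 1/sqrt(size), a common rule of thumb rather than a measured result. Access time is assumed to grow by a fixed amount per size doubling, and the miss penalty is assumed to be 20 ns. The function names are illustrative. None of this reproduces data from the cited papers.

```python
import math


def miss_rate(size_kb, base_rate=0.10, base_kb=4.0):
    # Hypothetical fit: miss rate falls as 1/sqrt(size), 10% at 4 KB.
    return base_rate * math.sqrt(base_kb / size_kb)


def access_time_ns(size_kb, t0=1.0, per_doubling=0.25):
    # Hypothetical: access time grows by a fixed amount per size doubling.
    return t0 + per_doubling * math.log2(size_kb)


def amat_ns(size_kb, miss_penalty_ns=20.0):
    # Average access time = hit time + miss rate * miss penalty.
    return access_time_ns(size_kb) + miss_rate(size_kb) * miss_penalty_ns


sizes = [2 ** k for k in range(11)]    # 1 KB .. 1024 KB
best = min(sizes, key=amat_ns)         # 32 KB under these assumptions
```

Under these made-up curves, the minimum average access time lands at 32 KB, well inside the range. The smallest caches are dominated by misses, and the largest by access time. Different curves move the optimum, so a real study needs simulated hit rates and a detailed timing model.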
No context found.
N. Jouppi, S. Wilton, "Tradeoffs in Two-Level On-Chip Caching," 21st ISCA, April 1994, pp. 34--45.
No context found.
Norman P. Jouppi and Steven J. E. Wilton. Tradeoffs in Two-Level On-Chip Caching. Proc. 21st Annual International Symposium on Computer Architecture, pp. 34--45, Chicago, IL, April 18--21, 1994.
No context found.
N. P. Jouppi and S. J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proceedings of the 21st Intl. Symposium on Computer Architecture, pages 34--45, 1994.
No context found.
N. P. Jouppi and S. J. Wilton. Tradeoffs in two-level on-chip caching. In The 21st Annual International Symposium on Computer Architecture, pages 34--45, April 1994.
No context found.
N. P. Jouppi and S. J. E. Wilton, "Tradeoffs in two-level on-chip caching," in Proceedings of the 21st Annual International Symposium on Computer Architecture (ISCA-21), pp. 34--45, 1994.
No context found.
Norman P. Jouppi and Steven J. E. Wilton. Tradeoffs in Two-Level On-Chip Caching. In Proceedings of the 21st International Symposium on Computer Architecture, pages 34--45, 1994.
No context found.
N. P. Jouppi and S. J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proceedings of the 21st Annual International Symposium on Computer Architecture, pages 34--45, April 1994.
No context found.
Norman P. Jouppi and Steven J. E. Wilton. Tradeoffs in two-level on-chip caching. In Proc. 21st Annual Symposium on Computer Architecture, 1994.
CiteSeer.IST - Copyright Penn State and NEC