Olukotun, K., Mudge, T. and Brown, R. Performance optimization of pipelined primary caches. In Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, IEEE, 181-190, 1992.
....to the design of high performance microarchitectures. 2 Related Research Simulation-based computer architectural analysis has been a rich area of research, most of which has focused on the design of a single subsystem such as the processor (e.g. [10, 12, 21]) or cache hierarchy (e.g. [11, 18, 19, 24, 28]). Most of these studies focus on a small portion of the design space, and only a small subset take technology and implementation constraints into account in the analysis. Olukotun's studies of primary cache design for a multi-chip module (MCM) based Gallium Arsenide microprocessor [18, 19] ....
....18, 19, 24, 28]). Most of these studies focus on a small portion of the design space, and only a small subset take technology and implementation constraints into account in the analysis. Olukotun's studies of primary cache design for a multi-chip module (MCM) based Gallium Arsenide microprocessor [18, 19] include a linear equation for calculating the delay between the processor and primary cache chips on the MCM. Jouppi and Wilton [11] use a detailed cycle time model in addition to simulation to study the performance of two-level on-chip caching relative to a single level of on-chip cache. Uhlig ....
K. Olukotun, T. Mudge, and R. Brown. Performance optimization for pipelined primary caches. Proceedings of the 19th International Symposium on Computer Architecture, pages 181--190, May 1992.
....critical path in many processors, the cycle time of those processors has to increase. Sometimes this tradeoff (increasing the cycle time for a decrease in cache miss rate) can yield better performance [PHH88]. An alternative to increasing the cycle time is to make the cache access multiple cycles [OMB92]. Instead of being one cycle, the cache access may have the latency of two instruction issue cycles. However, since the cache is on-chip, it can be pipelined and thus be able to accept an access per cycle, but still have a latency of two cycles (e.g. first-level caches on the MIPS R4000 [KH92]). A ....
Kunle Olukotun, T. Mudge, and R. Brown. Performance Optimization of Pipelined Primary Caches. In Conference Proceedings, The 19th International Symposium on Computer Architecture, pages 181--190, May 1992.
....the basis for comparing various cache designs. This reflects the recent trend of increasingly aggressive multiple-issue processors. The processor cycle time is determined by the longest stage of the pipeline, which is usually the memory access stage. Multiple-cycle cache access has been studied in [17] but is beyond the scope of this paper. Processor cycle time not only serves as one of the two components in determining TPI, but also has multiple impacts on the value of the other component, CPI. First, a shorter cycle time may imply larger miss penalties. Second, a shorter cycle time may ....
K. Olukotun, T. Mudge, and R. Brown, "Performance Optimization of Pipelined Primary Caches," Proc. of 19th Int'l Symp. on Computer Architecture, May 1992, pp. 181--190.
....the issue of an instruction and data reference each cycle was assumed. We further assumed that the processor cycle time is determined by the first-level cache cycle time. This results in a variation in machine cycle time of about 1.8X from processors with 1KB caches through 256KB caches. (Olukotun [6] has studied the effects of multi-cycle cache latency on processor performance; however, this is currently beyond the scope of this work.) We modeled the baseline CPI of the machine without cache misses as being 1. To attain this CPI in a real machine in the presence of non-unit latency functional ....
Kunle Olukotun, Trevor Mudge, and Richard Brown. Performance Optimization of Pipelined Primary Caches. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 181-190, May 1992.
....configurations. After quantifying the performance advantages of the proposed cache enhancements, we present our conclusions. 2 Background 2.1 Performance Tradeoff Methodologies Previous research in performance evaluation has largely been restricted to specific subsystems of the overall machine [9, 10, 11, 12, 19, 20, 21, 31, 33]. Although some studies have spanned multiple subsystems [3, 6, 18, 24, 25, 34], they have either not included cycle time considerations, have not taken into account technology and implementation considerations, or have been limited in scope. Despite these drawbacks, these approaches have been ....
Olukotun, K., Mudge, T., and Brown, R. Performance optimization for pipelined primary caches. 19th International Symposium on Computer Architecture, pages 181--190, May 1992.
....of Cache Hierarchy Design Cache hierarchy design has been a very active area of research. This section describes studies that attempt to encompass several elements of the design space, and thus are more applicable to the high-level architectural design problem addressed in this paper. Olukotun [26] analyzes the effect of the degree of primary cache pipelining for a multichip module (MCM) based GaAs microprocessor. Branch and load delay effects are varied, and the effect on cycles per instruction (CPI) is measured using trace-driven simulation. A simple linear equation for the cache cycle ....
....that multiple levels of abstraction (architecture, timing, and circuit levels) need to be simultaneously considered to optimize system performance. Although the authors make some headway in achieving their objective, their methodology is restricted to L1 cache design as with their previous study [26]. Tradeoffs in designing two-level on-chip cache hierarchies are examined by Jouppi and Wilton [15]. The purpose of the study is to compare the performance of this structure with a larger, single-level on-chip cache. The authors use trace-driven simulation in conjunction with a detailed cache ....
K. Olukotun, T. Mudge, and R. Brown, "Performance Optimization for Pipelined Primary Caches," 19th International Symposium on Computer Architecture, pp. 181-190, May 1992.
....as: CPM = latency (Eqn 4.33). As memory systems become more complex, it becomes increasingly difficult to accurately estimate values for CPM. For example, a memory system can prefetch instructions or data [Farrens89, Hill87, Smith78, Smith92, Pierce95], caches can be designed to pipeline accesses [Jouppi90, Olukotun92, Palcharla94], and process multiple outstanding misses in parallel (a nonblocking or lockup-free cache), and dirty write-backs can be queued and processed as a background operation [Smith82, Kessler91]. Each of these memory system optimizations can reduce the average value of CPM in a complex way that is ....
Olukotun, K., Mudge, T. and Brown, R. Performance optimization of pipelined primary caches. In Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, IEEE, 181-190, 1992.
....so that the full effect of cache pipelining can be assessed. The feasible cycle times of the different architectures are determined from detailed timing analyses of critical paths. We refer to the simultaneous consideration of two levels of machine abstraction as multilevel optimization [4][5]. The methodology is demonstrated on the design of a pipelined cache for a high-performance microprocessor, which is based on the MIPS instruction set architecture (ISA) [6] and was planned to be implemented in GaAs direct-coupled FET logic with multichip module (MCM) packaging [7]. The use of ....
O.A. Olukotun, T.N. Mudge, and R.B. Brown, "Performance Optimization of Pipelined Primary Caches," Proc. 19th Ann. Int'l Symp. Computer Architecture, pp. 181-190, Gold Coast, Australia, May 1992.
....studies on multi-issue implementations of the MIPS R3000 ISA. In particular, we have used the Aurora III for a case study. The Aurora III is a prototype superscalar processor being developed in the Advanced Computer Architecture Laboratory at The University of Michigan [119][120][130][131][132][133][185]. The Aurora III is a superscalar version of the MIPS R3000 ISA implemented in GaAs technology, and is scheduled for tape-out in the Fall of 1994. 12 We investigate the performance characteristics of a number of hardware features in the Aurora III, only some of which have been included in ....
K. Olukotun, T. Mudge, Performance Optimization of Pipelined Primary Caches, A. Gottliebs (Ed.), Proceedings of the 19th Annual International Symposium on Computer Architecture, 1992, vol. 20, pp. 181-190.
....delays, so that the full effect of cache pipelining can be assessed. The feasible cycle times of the different architectures are determined from detailed timing analyses of critical paths. We refer to the simultaneous consideration of two levels of machine abstraction as multilevel optimization [4][5]. The methodology is demonstrated on the design of a pipelined cache for a high-performance microprocessor, which is based on the MIPS instruction set architecture (ISA) [6] and was planned to be implemented in GaAs direct-coupled FET logic with multichip module (MCM) packaging [7]. The use of ....
O. A. Olukotun, T. N. Mudge, and R. B. Brown, "Performance optimization of pipelined primary caches," in 19th Annual Int. Symp. Computer Architecture, Gold Coast, Australia, May 1992, pp. 181-190.
....showing (shaded area) activity of pipeline during OE 1 clock. • Shared memory data and address buses were separated. A GaAs CPU needs all of the bus bandwidth for just the instruction cache. • The single-level cache was changed to a two-level system with a direct-mapped primary cache [13, 14]. • Integer multiply and divide functions were pushed into the floating-point accelerator (which has a parallel multiplier) to better utilize transistor resources; this improves performance. • Byte operations were not implemented; this allows the use of simple word-based SRAMs without ....
Kunle Olukotun, Trevor Mudge, and Richard Brown, "Performance Optimization of Pipelined Primary Caches," Proceedings of the 19th International Symposium on Computer Architecture, Gold Coast, Australia, pp. 181-190, May 19--21, 1992.