D. Kroft, "Lockup-free instruction fetch/prefetch cache organization," Proceedings of the 8th Annual International Symposium on Computer Architecture, pp. 81-87, 1981.
....high performance architectures contain several simple hardware mechanisms for hiding memory hierarchy access costs. Early cache designs allowed only a single outstanding memory access to occur. Thus, all memory accesses stalled the processor until completed. Kroft introduced lockup-free caches [62] to enable multiple concurrent memory accesses. Lockup-free caches permit non-blocking loads that do not stall the processor until a future instruction references the data. Lockup-free caches require a mechanism, such as miss status handling registers (MSHRs), to maintain information about pending ....
David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
....memory. This is a severe problem since the microprocessor could execute up to 200 floating-point instructions during that time. This problem is called the latency problem. Researchers have developed several techniques, like software and hardware prefetching [CKP91, MLG92, CB94], non-blocking caches [Kro81, SF91], stream buffers [Jou90, PK94], multithread .... [Table residue from the citing paper; columns: Processor, Bandwidth, Out of Order, Cache (I, D, L2). Sun Ultra 3: 4.8 Gbyte/s, none, 32 K, 64 K. Intel Pentium 4: 3.2 Gbyte/s, 126 ROPs, 12 K, 8 K, 256 K. Alpha 21264B: 2.7 Gbyte/s, 80 instr, 64 K, 64 K ....]
D. Kroft. Lockup--Free Instruction Fetch/Prefetch Cache Organisation. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
....in terms of the size of hardware structures dedicated to ILP exploitation. The heterogeneous ILP parameters investigated in this paper are issue rate, instruction window size, number of arithmetic (ALU), floating point (FPU), and address units, and maximum number of outstanding cache misses (MSHRs [9]). Heterogeneity in the memory subsystem is modeled in terms of the size and speed of caches. The HDSMs under study have three levels, with 2, 4 and 10 nodes in levels 1, 2 and 3, respectively. The machine is configured as a processor and memory hierarchy [1] the number of processing elements ....
Kroft, D. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proc. 8th International Symposium on Computer Architecture, 1981.
....DRAM core access latency continues to improve more slowly than increases in processor speed. Several microarchitecture techniques have been developed to hide or tolerate memory latency. Some of them are purely hardware based, such as lock-up free caches, multithreading, and value speculation [Kroft 81, Alverson 90, Lipasti 96]. These techniques can be quite successful for hiding small latencies such as those between the on-chip cache (L1) and a closely integrated second-level cache (L2). However, in a performance study of the Pentium Pro, Bhandarkar and Ding specifically point out that ....
D. Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
....with blocking caches, a second request cannot be issued until the first outstanding request is serviced. In processors with non-blocking caches, subsequent requests (secondary misses) to a block that already has a request outstanding for it (the primary miss) are merged with the primary miss [88]. We first reproduce the example shown above again in Figure 4 8. The coherence protocol chains for two cache blocks, A and B, are shown. The protocol chain for any coherence block is always rooted at a stable block; in the figure the stable state is the modified (M) state of the MOESI ....
David Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proceedings of the Eighth Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
....an identical machine with a 384-entry instruction window. 2. Related work Memory access is a very important long-latency operation that has concerned researchers for a long time. Caches [29] tolerate memory latency by exploiting the temporal and spatial reference locality of applications. Kroft [19] improved the latency tolerance of caches by allowing them to handle multiple outstanding misses and to service cache hits in the presence of pending misses. Software prefetching techniques [5, 22, 24] are effective for applications where the compiler can statically predict which memory ....
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, 1981.
....of other caching structures such as the TLBs. 5. RELATED WORK Related work falls into a group of studies conducted for reducing the negative effects of cache misses. Arguably the most important technique to reduce cache miss penalty is non-blocking caches, also called lock-up free caches [7]. Non-blocking caches do not block after a cache miss, being able to provide data to other requests. Sohi and Franklin [13] discuss a multi-port non-blocking L1 cache. Farkas and Jouppi [3] explore alternative implementations of non-blocking caches. Farkas et al. [4] study the usefulness of ....
D. Kroft. Lock-up Free Instruction Fetch/Prefetch Cache Organization. In Proc. of 8th International Symposium on Computer Architecture, May 1981.
....provides a cycle-accurate simulation environment for a modern out-of-order superscalar processor with 5-stage pipelines and a fairly accurate branch prediction mechanism. The memory extensions model the limitedness of non-blocking caches through finite miss status holding registers (MSHRs) [12]. Bus contention and arbitration at all levels are also taken into account. Table 1 gives the simulation parameters used in the experiments. The DVS extensions introduce a new speed setting instruction. The speed setting instruction takes as argument an integer that specifies the desired CPU ....
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th International Symposium on Computer Architecture, pages 81--87, May 1981.
....value of its result to all the consumers. The version numbers and the commit bits enable only version numbers and commit bits to be forwarded inside the grid, when the speculation is correct, instead of actual data values. Hence, to implement sharing speculation, the caches or the MSHRs [8] in the system would need logic to use the selective re-execution mechanism implemented in the GPA, to inject speculative values into the processor. It is worth noting that if mis-speculation recovery overhead is sufficiently low, as in the GPA, then it is always better to speculate, since waiting ....
David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the Eighth International Symposium on Computer Architecture, pages 81--87, May 1981.
No context found.
D. Kroft, "Lockup-free instruction fetch/prefetch cache organization," Proceedings of the 8th Annual International Symposium on Computer Architecture, pp. 81-87, 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In ISCA-8, pages 81--87, May 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Intl. Symposium on Computer Architecture, pages 81--87, 1981.
No context found.
D. Kroft, "Lockup-Free Instruction Fetch/Prefetch Cache Organization," Proc. Eighth Int'l Symp. Computer Architecture, pp. 81-87, May 1981.
No context found.
David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the Eighth International Symposium on Computer Architecture, pages 81--87, May 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proc. Eighth Symposium on Computer Architecture, pages 81--87, May 1981.
No context found.
David Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the Eighth International Symposium on Computer Architecture, pages 81--87. ACM, SIGARCH, May 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the Eighth Annual International Symposium on Computer Architecture, pages 81--87, May 1981.
No context found.
D. Kroft. Lockup-free Instruction Fetch/Prefetch Cache Organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, 1981.
No context found.
D. Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proc. of the 8th. Int. Symp. on Comp. Architecture. May 1981, pp 81-87.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In 8th Annual International Symposium of Computer Architecture, pages 81--87, May 1981.
No context found.
D. Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proc. ISCA-8, May 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In pages 81-87, Honolulu, Hawaii, May 1981.
No context found.
David Kroft. Lockup-Free Instruction Fetch/Prefetch Cache Organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81-87, May 1981.
No context found.
D. Kroft. Lockup-free instruction fetch/prefetch cache organization. In Proceedings of the 8th Annual International Symposium on Computer Architecture, pages 81--87, 1981.
No context found.
Kroft, D.: "Lockup-Free Instruction Fetch/Prefetch Cache Organization", 25 years of the International Symposia on Computer Architecture (selected papers), Association for Computing Machinery, August 1998, pages 20-21