David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge, and Richard Brown. Design tradeoffs for software-managed TLBs. In Proceedings of the 20th International Symposium on Computer Architecture. ACM, 1993.
....As a consequence, many modern applications have working sets larger than the TLB coverage. Section 6.3 shows that for many real applications, TLB misses degrade performance by as much as 30% to 60%, in contrast to the 4% to 5% reported in the 1980s [2, 24] or the 5% to 10% reported in the 1990s [17, 23]. Another trend that has contributed to this performance degradation is that machines are now usually shipped with on-board, physically addressed caches that are larger than the TLB coverage. As a result, many TLB misses require access to the memory banks to find a translation for data that is ....
R. Uhlig, D. Nagle, T. Stanley, T. Mudge, S. Sechrest, and R. Brown. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems, 12(3):175--205, Aug. 1994.
....for small embedded systems, it is desirable to support the latter because it can simplify the hardware, keep its cost low, and also provide the flexibility to support the paging-based management schemes mentioned in this paper. Design tradeoffs for software-managed TLBs are studied by the authors of [12]. With an MMU supporting 128-byte pages, it is possible to implement Paged Segmentation and Short Circuit Segment Tree. (Footnote: R3000 is a registered trademark of MIPS Technologies, Inc.) Intermediate-level Skip Multi-Size Paging can be implemented by calculating 128-byte physical page numbers ....
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, R. Brown, "Design Tradeoffs for Software-Managed TLBs", Proceedings of IEEE/ACM 20th Annual International Symposium on Computer Architecture, pages 27-38, May 1993.
....recognized, the TLB has been the target for several optimizations to reduce access latency, miss rates and miss handling overheads. With regard to TLB structures themselves, there have been investigations on suitable organizations in terms of size, associativities and multi-level organizations [31, 23, 6]. Superpaging is a concept that has been proposed to boost TLB coverage. Hardware and software techniques for supporting this mechanism have come under a lot of scrutiny [31, 32, 11]. Most prior work in TLB optimizations has targeted lowering miss rates or miss handling costs. It is only recently ....
....the lessons learnt from this study and outlines suggestions for future TLB design and optimization. 2. RELATED WORK As was mentioned earlier, many studies [7, 15, 1, 14, 27] have pointed out the importance of the TLB and the necessity of speeding up the miss handling process. Several studies [31, 15, 23] have looked at the organization of hardware TLB structures and their impact on system performance in terms of capacity and/or associativity. While some of these have focussed on single (monolithic) TLBs, there have been studies which have investigated the benefits of multi-level TLBs [6, 2]. There are ....
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown. Design Tradeoffs for Software Managed TLBs. In Computer Architecture, pages 27--38, 1993.
....like RAMpage, and it is likely that results from a number of new projects addressing the same issues will be published in the near future. 4.7 Further Reading The impact of the TLB on performance is an area in which some work has been published, but more could be done [Cheriton et al. 1993, Nagle et al. 1993]. Direct Rambus [Crisp 1997] is starting to move into the mainstream now that Intel has endorsed it, and is likely to appear in mass-market designs. The RAMpage project has its own web site [RAMpage 1997] and several papers on the subject have been published [Machanick 1996, Machanick and ....
D Nagle, R Uhlig, T Stanley, S Sechrest, T Mudge and R Brown. Design Trade-Offs for Software Managed TLBs, Proc. Int. Symp. on Computer Architecture, May 1993, pp 27--38.
....is replaced from the SRAM main memory, its entry (if it has one) in the TLB is flushed. Being able to service TLB misses without having to go to DRAM in most cases is potentially a big win, considering that some studies have shown that TLB misses can account for a large fraction of execution time [35, 7]. As with a conventional hierarchy, it is possible in principle to address the L1 cache virtually, in which case the TLB would only be needed on a miss to the SRAM main memory. Similar problems to those found in virtually addressed caches in a conventional hierarchy [21, 41, 35, 22] would apply. ....
....of execution time [35, 7]. As with a conventional hierarchy, it is possible in principle to address the L1 cache virtually, in which case the TLB would only be needed on a miss to the SRAM main memory. Similar problems to those found in virtually addressed caches in a conventional hierarchy [21, 41, 35, 22] would apply. This possibility is not explored in this paper. 2.4 DRAM Paging Device The DRAM paging device is a conventional DRAM, which can be implemented using the same range of design choices as a conventional DRAM main memory. Details of the DRAM are not changed from previous work [33] to ....
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown. Design tradeoffs for software-managed TLBs. In Proc. 20th Int. Symp. on Computer Architecture (ISCA '93), pages 27--38, San Diego, CA, May 1993.
....not pessimistic. 6 Related Work. 64-bit systems: Opal [Chase et al. 1994], Mungi [Heiser et al. 1993], Monads [Rosenberg et al. 1989] and [Carter et al. 1992]. Page table mechanisms: [Organick 1972; Cocke 1981; Huck and Hays 1993]. TLBs and caches: [Koldinger et al. 1992; Kaiser and Czaja 1992; Nagle et al. 1993; Chiueh and Katz 1992]. User-level mapping: [Appel and Li 1991; Hosking and Moss 1993]. OO systems: [Jul et al. 1988; Krakowiak et al. 1990]. 7 Conclusions. Besides that we never can be sure to consider all relevant factors in such an examination, there are remarkable factors of uncertainty: ....
Nagle, D., Uhlig, R., Stanley, T., Sechrest, S., Mudge, T., and Brown, R. 1993. Design tradeoffs for software managed TLBs. In 20th Annual International Symposium on Computer Architecture (ISCA), San Diego, CA, pp. 27--38.
....objects can be reached by two steps, i.e. one additional page table. A further method is to use two trees, one covering all mapped pages and another one holding only pages which need to be reached very fast. Some systems use special address translation (fewer levels) for accessing kernel data [17]. Guarded page tables permit this more generally. 2.3.3 Restricted Guards The maximum guard length together with the physical address width determine the space requirements per page table entry. Sixteen bytes (64 + 64 bits) are certainly enough, but in case of limited physical addresses ( 32bit) ....
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown. Design tradeoffs for software managed TLBs. In 20th Annual International Symposium on Computer Architecture, pages 27--38, San Diego, CA, May 1993.
....instruction caches exhibited few misses per event across all processors for both RC and SS and do not factor into the overall performance picture as data cache and TLB misses do. For detailed performance studies on TLBs and the virtual memory hierarchy, refer to Jacob and Mudge (1998) and Uhlig et al. (1994). [Figure 4: Primary Data Cache Misses per Event as a Function of the Number of Processors. Axes: Number of Processors vs. Primary Data Cache Misses Per Committed Event; series RC and SS.] [Following figure: Number of Processors vs. Secondary Data Cache Misses Per Committed Event ....]
Uhlig, R., D. Nagle, T. Stanley, T. Mudge 1994. Design Tradeoffs for Software-Managed TLBs. ACM Transactions on Computer Systems, volume 12, number 3, pages 175--205, August.
....published material that describes the performance aspects of the hat layer and the scalability of the implementations on multiprocessor systems. Other related material includes a large body of literature on MMU hardware structures (see [6] for a good survey) and software page table issues (e.g. [3, 8]). In general, previous work concentrated on the structure of the MMU as it affects performance on uniprocessor systems. With the notable exception of [6], previous works did not consider multi-user benchmarks to any great extent. 3 Requirements The structure and the interface of the hat layer ....
Nagle, David et al. "Design Tradeoffs for Software-Managed TLBs." Proceedings of the 20th International Symposium on Computer Architecture (May 1993).
....by the TLB cache, termed a TLB miss, is redirected to the miss handling subsystem which references the page translation tables. A recent trend in CPU design is software-only page table management. The removal of hardware support for page table traversal, resulting in higher miss cycle times [16], is motivated by greater flexibility for operating systems design and porting. Clark and Emer [5] reported TLB miss handling times in a VAX-11/780 as contributing between 5% and 8% to the runtime of a user program. A more recent study by Romer et al. [17] using a DEC Alpha 3000/700 has found that ....
....sets of current and future workloads. The case that current virtual memory techniques will no longer be valid for the next generation of machines has been argued by others [26]. Several ideas have been introduced to increase TLB reach. Examples include trading associativity for more TLB entries [4, 14, 16] to increase TLB reach (address coverage) using the same silicon resources, multiple page sizes [14, 26, 18] to reduce the number of contiguously mapped pages, and subblocking [25] to hold more translations in each TLB cell. The design tradeoffs of silicon resource requirements, TLB access ....
D Nagle, R Uhlig and T Stanley. Design tradeoffs for software-managed TLBs. In Proc. 20th ISCA, pages 27--38, 1993.
....versus the number of TLB entries. For instance, TLBs with high levels of associativity require more hardware resources to implement. This additional hardware complexity takes up silicon area on the die which could be allocated to build more entries. Studies by Khalidi et al. [7] and Nagle et al. [10] have indicated the potential for reducing TLB misses by decreasing associativity levels in return for a TLB with a larger number of entries while consuming similar die space. Khalidi showed a 256-entry set-associative TLB had a lower miss rate than a 64-entry fully associative TLB while ....
....[16, 11]. Further reductions in miss rates are possible if we are able to implement placement policies which can maximise the longevity of useful entries. The fully associative placement policy has been considered optimal in terms of miss ratios. However, as Khalidi et al. [7] and Nagle et al. [10] have shown, when die area is taken into account, TLBs with lower levels of associativity can have lower miss rates than fully associative TLBs because more entries can be built into the same silicon resource. Research has focused on selection policies within the set such as Least Recently Used ....
D Nagle, R Uhlig, T Stanley, S Sechrest, T Mudge and R Brown. Design tradeoffs for software-managed TLBs. In Proc. 20th ISCA, 1993.
....configurations. 1 Introduction Software simulation plays an integral role in the design and validation of high-performance uni- and multiprocessor systems. Two major simulation methodologies used are Trace Driven Simulation (TDS) and Execution Driven Simulation (EDS). In TDS, hardware probes [27, 30] or software instrumentation of the application [22, 24, 25] allow information like basic block or data addresses to be collected in a trace buffer during the application's execution. The generated information is later used to drive a simulator of the system under study. Given that an application ....
R. Uhlig, D. Nagle, T. Stanley, T. Mudge, S. Sechrest, and R. Brown. Design Tradeoff for Software-Managed TLBs. ACM Transactions on Computer Systems, 12(3):206--235, August 1995.
....that the basic event counts available at the end of a trap-driven simulation can be combined to compute all of the common memory performance metrics. Access constraints essentially define a filter for memory references. This observation is the basis of trap-driven adaptations of set sampling, time sampling and .... (Footnote 5: A collection of case studies performed by our group [Nagle93, Nagle94, Uhlig94b, Uhlig95] has shown that changes in operating system structure tend to have a greater impact on TLB and I-cache performance. Tapeworm's inability to consider D-caches was therefore not a major hindrance to these studies.)
....for implementing these primitives on existing hardware, each of which is based in some way on the unconventional use of certain privileged machine operations. We describe each in greater detail below.
Table columns: Method; References; Required Hardware Support; Simulation Type.
TLB Miss Redirection; [Nagle93] [Uhlig94b] [Talluri94]; Software-managed TLB; TLB.
Page Table Shadowing; [Lee94] [Uhlig94b]; Standard memory management hardware; TLB.
Instruction Shadowing; (no reference shown); Kernel trap for breakpoint instructions; I-cache.
Instruction Recoding; (no reference shown); Kernel trap for unimplemented instructions; I-cache.
Tagged Memory ....
Uhlig, R., Nagle, D., Stanley, T., Sechrest, S., Mudge, T. and Brown, R. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems (Fall): 1994.
....that the basic event counts available at the end of a trap-driven simulation can be combined to compute all of the common memory performance metrics. Access constraints essentially define a filter for memory references. This observation is the basis of trap-driven adaptations of set sampling, time sampling and .... (Footnote 5: A collection of case studies performed by our group [Nagle93, Nagle94, Uhlig94b, Uhlig95] has shown that changes in operating system structure tend to have a greater impact on TLB and I-cache performance. Tapeworm's inability to consider D-caches was therefore not a major hindrance to these studies.)
....methods for implementing these primitives on existing hardware, each of which is based in some way on the unconventional use of certain privileged machine operations. We describe each in greater detail below.
Table columns: Method; References; Required Hardware Support; Simulation Type.
TLB Miss Redirection; [Nagle93] [Uhlig94b] [Talluri94]; Software-managed TLB; TLB.
Page Table Shadowing; [Lee94] [Uhlig94b]; Standard memory management hardware; TLB.
Instruction Shadowing; (no reference shown); Kernel trap for breakpoint instructions; I-cache.
Instruction Recoding; (no reference shown); Kernel trap for unimplemented instructions; I-cache ....
Nagle, D., Uhlig, R., Stanley, T., Sechrest, S., Mudge, T. and Brown, R. Design tradeoffs for software-managed TLBs. In Proceedings of the 20th Annual International Symposium on Computer Architecture, San Diego, California, IEEE, 27-38, 1993.
David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge, and Richard Brown. Design tradeoffs for software-managed TLBs. In Proceedings of the 20th International Symposium on Computer Architecture. ACM, 1993.
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown, Design tradeoffs for software-managed TLBs, In Proc. of 20th Int. Symp. on Computer Architecture, pages 27--38. ACM, 1993.
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, R. Brown: "Design tradeoffs for software-managed TLBs" in Proceedings of the Twentieth International Symposium on Computer Architecture (ACM 1993), pp. 27-38.
R. Uhlig, D. Nagle, T. Stanley, T. Mudge, S. Sechrest, and R. Brown. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems, 12(3):175--205, Aug. 1994.
Uhlig, R., Nagle, D., Stanley, T., Mudge, T., Sechrest, S., and Brown, R. Design Tradeoffs for Software-Managed TLBs. ACM Transactions on Computer Systems 12, 3 (August 1994).
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown, Design tradeoffs for software-managed TLBs, In Proc. of 20th Int. Symp. on Computer Architecture, pages 27--38. ACM, 1993. The authors found that newer operating systems are changing the types of TLB misses. The TLB miss rate varied widely for the same application under different operating systems.
Nagle, D., et al., Design Trade-offs for Software Managed TLBs, Proceedings of the 20th Annual International Symposium on Computer Architecture, May 1993, pp. 27-38.
Richard Uhlig, David Nagle, Tim Stanley, Trevor Mudge, Stuart Sechrest, and Richard Brown. Design tradeoffs for software-managed TLBs. ACM Transactions on Computer Systems, August 1994.
David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge, and Richard Brown. Design tradeoffs for software-managed TLBs. In Proceedings of the 20th International Symposium on Computer Architecture, May 1993.
David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge and Richard Brown. Design Tradeoffs for Software-Managed TLBs. In Proceedings of the 21st International Symposium on Computer Architecture, pages 358-369, Chicago, IL, April 1994.
D. Nagle, R. Uhlig, T. Stanley, S. Sechrest, T. Mudge, and R. Brown. Design tradeoffs for software-managed TLBs. In Proc. 20th Int. Symp. on Computer Architecture (ISCA '93), pages 27--38, San Diego, CA, May 1993.