43 citations found.
K. Bala, M. F. Kaashoek, and W. E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on Operating Systems Design and Implementation, pages 243--253, 1994.


This paper is cited in the following contexts:


Virtual Clusters: Resource Management on Large Shared-Memory.. - Govil (2000)

....in the TLB. Translating physical addresses to machine addresses using the pmap at every software reload of the MIPS TLB could lead to high overheads. Cellular Disco reduces this overhead by maintaining for every VCPU a 1024-entry translation cache called the second-level software TLB (L2TLB) [6]. The entries in the L2TLB also contain the combined virtual-to-machine address translations, and servicing a TLB miss from the L2TLB is much faster than generating a virtual exception to be handled by the operating system inside the virtual machine. Figure 3.2 illustrates the mappings stored in ....

Kavita Bala, M. Frans Kaashoek, and William Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on Operating Systems Design and Implementation (OSDI), pages 243--253, November 1994.
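The Govil excerpt describes an L2TLB whose entries hold combined virtual-to-machine translations, so a hit avoids both the guest's exception handler and the pmap lookup. The following is a minimal Python sketch of that idea, not Cellular Disco's code: the dictionaries standing in for the guest page table and the pmap, the direct-mapped indexing, and the class and method names are all assumptions; only the 1024-entry size comes from the excerpt.

```python
L2TLB_ENTRIES = 1024  # entry count quoted in the excerpt


class L2TLB:
    """Hypothetical per-VCPU software TLB caching combined virtual-to-machine translations."""

    def __init__(self, guest_page_table, pmap, entries=L2TLB_ENTRIES):
        self.guest_page_table = guest_page_table  # virtual page -> physical page (guest OS view)
        self.pmap = pmap                          # physical page -> machine page (monitor view)
        self.size = entries
        self.slots = [None] * entries             # each slot: (vpn, mpn) or None
        self.hits = 0
        self.misses = 0

    def translate(self, vpn):
        slot = vpn % self.size  # assumed direct-mapped indexing
        entry = self.slots[slot]
        if entry is not None and entry[0] == vpn:
            self.hits += 1
            return entry[1]  # fast path: no virtual exception, no pmap lookup
        self.misses += 1
        # Slow path: these two lookups stand in for the guest OS handling a
        # virtual exception, followed by the pmap's physical-to-machine step.
        ppn = self.guest_page_table[vpn]
        mpn = self.pmap[ppn]
        self.slots[slot] = (vpn, mpn)  # cache the composed translation
        return mpn
```

Because each stored entry already composes the two mappings, a hit costs a single tag comparison; a miss pays for both lookups once and then caches the result.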


Itanium Page Tables and TLB - Chapman, Wienand, Heiser (2003)

....page table. Replacing the native Linux pagetable with the long-format VHPT would require extensive changes, since the multi-level pagetable paradigm is deeply entrenched. Instead we use the long-format VHPT as a pagetable cache, or essentially another (software-managed) level of TLB [BKW94]. On a TLB reload miss, the software handler inserts the entry into both the long-format VHPT and the TLB. A later TLB miss on the same entry is handled by the hardware walker reloading from the VHPT without invoking software. This approach has inherently more overheads, which could be eliminated ....

Kavita Bala, M. Frans Kaashoek, and William E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 243--253, Monterey, CA, USA, 1994. USENIX/ACM/IEEE.


Enhancing IA-64 Memory Management - Au, Heiser (2000)

....which can be associated with each process and are used in addition to the virtual page number to match an entry in the TLB. This negates the need for a TLB flush per context switch but still requires duplicate TLB entries for essentially the same mappings in different contexts (processes). Past studies [1, 2] have shown that TLB handling costs can take up a significant part of an application's processing time. TLB coverage is one of the major factors in determining the TLB miss rate and hence the impact of TLB costs on application performance [2]. TLB miss handling overhead is turning into a ....

Kavita Bala, M. Frans Kaashoek, and William E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the 1st USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 243--253, Monterey, CA, USA, 1994. USENIX/ACM/IEEE.


The Design and Implementation of the L4 Microkernel on the.. - Wiggins (1999)   (1 citation)

....cached data. This complicates handling somewhat, as in some circumstances the TLB must also be flushed, which wouldn't be necessary if the fault arose as the result of a TLB miss. The CPD therefore caches PD entries from several different address spaces, similar to a direct-mapped software TLB [BKW94]. As long as there are enough domains for all threads wishing to execute, no flushes are required. Caches and TLBs must be flushed whenever a valid CPD entry is replaced. However, the ARM only supports 16 domains, which is much less than the number of concurrently running tasks a general purpose ....

Kavita Bala, M. Frans Kaashoek, and William E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation, pages 243-253, Monterey, CA, USA, 1994. USENIX/ACM/IEEE.


Characterizing the d-TLB Behavior of SPEC CPU2000 Benchmarks - Kandiraju, Sivasubramaniam (2002)

....Superpaging is a concept that has been proposed to boost TLB coverage. Hardware and software techniques for supporting this mechanism have come under a lot of scrutiny [31, 32, 11]. Most prior work in TLB optimizations has targeted lowering miss rates or miss handling costs. It is only recently [28, 25, 3, 20] that the issue of prefetching TLB entries to hide all or some of the miss costs has started drawing interest. Many research findings on TLBs have also made their way into commercial offerings. A very nice survey of several of these TLB structures can be found in [15]. In all, a good deal of ....

....of the operating system on the software-managed MIPS R2000 TLB, and investigate the impact of size, associativity and partitioning of TLB entries (between OS and application). They point out that the operating system has a considerable influence on the number and nature of misses. Bala et al. [3] focus in specifically on interprocess communication activities, and illustrate software techniques for lowering miss penalties on software-managed TLBs. Superpaging is another well-investigated technique to boost the coverage of the TLB and better utilize its capacity [31, 32, 30, 12]. Studies ....

[Article contains additional citation context not shown here]

K. Bala, M. F. Kaashoek, and W. E. Weihl. Software Prefetching and Caching for Translation Lookaside Buffers. In Proceedings of the Usenix Symposium on Operating Systems Design and Implementation, pages 243--253, 1994.


The Impulse Memory Controller - Lixin Zhang Zhen (2001)   (4 citations)

.... TLB performance bottleneck range from changing the TLB structure to retain more of the working set (e.g., multi-level TLB hierarchies [1, 16]) to implementing better management policies (in software [21] or hardware [20]) to masking TLB miss latency by prefetching entries (again, in software [4] or hardware [41]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [30, 43], but more research is needed into how best to make general use of them. Khalidi [24] and Mogul [31] discuss the benefits of systems ....

K. Bala, F. Kaashoek, and W. Weihl. Software prefetching and caching for translation buffers. In Proceedings of the First Symposium on Operating System Design and Implementation, pages 243--254, Nov. 1994.


Linking Programs in a Single Address Space - Deller, Heiser (1999)   (2 citations)

....contents randomly, a significant number of TLB misses is expected, particularly in the traverse operations. Mungi is implemented on top of the L4 microkernel [EHL97]; hence TLB misses are handled by L4. The microkernel's TLB miss handler is highly optimised and loads the TLB from a software cache [BKW94, EHL99], which is big enough to hold all page table entries required for the benchmark. However, the need to support 64-bit address spaces makes L4's TLB miss handler inherently slower than what can be achieved in a system only supporting 32-bit address spaces. Slightly slower handling of TLB misses in ....

Kavita Bala, M. Frans Kaashoek, and William E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation, pages 243--253, Monterey, CA, USA, 1994. USENIX/ACM/IEEE.


A New Virtual Memory Implementation for L4/MIPS - Szmajda (1999)

....routine, including one instruction in the inner loop. Furthermore, the LPC trie loop is expected to execute fewer iterations in dense address spaces. 5.4 TLB Cache Refill. A technique to improve the average time required for TLB refill is to use a TLB cache, also called an L2 TLB or a software TLB [26, 27]. A TLB cache is a software-maintained cache of translations used by the refill handler to refill the hardware TLB. A true page table must also be maintained, to be looked up if the TLB cache misses. A direct-mapped 16,384-entry TLB cache forms the critical refill path of L4/MIPS. On a TLB cache ....

Kavita Bala, M. Frans Kaashoek, William E. Weihl, Software Prefetching and Caching for Translation Lookaside Buffers, Proceedings of the First Symposium on Operating System Design and Implementation (OSDI), pp. 243--253, November 1994.


Reevaluating Online Superpage Promotion with Hardware Support - Zhen Fang Lixin (2001)   (3 citations)

.... TLB performance bottleneck range from changing the TLB structure to retain more of the working set (e.g., multi-level TLB hierarchies [1, 8]) to implementing better management policies (in software [10] or hardware [9]) to masking TLB miss latency by prefetching entries (again, in software [2] or hardware [25]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [16, 28], but more research is needed into how best to make general use of them. Chen et al. [6] suggest the possibility of using variable page ....

K. Bala, F. Kaashoek, and W. Weihl. Software prefetching and caching for translation buffers. In Proc. of the First Symposium on Operating System Design and Implementation, pp. 243--254, Nov. 1994.


Extensible Operating Systems - Maheshwari (1994)

....default, any TLB misses are forwarded to the application through an upcall to a specified location. As an optimization, the kernel maintains a software TLB in memory to provide a bigger cache of PTEs than the hardware TLB; also, the application can prefetch entries into the TLB ahead of demand [BKW94]. The application manages its own page table in the user space to handle TLB misses. To avoid the upcall into the user space on a TLB miss, the application can install its own TLB miss handler into the kernel, which can then access the page table in the user space. Also, if the kernel wants to ....

K. Bala, M.F. Kaashoek, and W.E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proc. of the first Symp. on OSDI, June 1994.


Virtual Memory In A 64-Bit Microkernel - Elphinstone (1999)   (1 citation)

....approaches use an LVA for one level of page table entries, and store the page table entries for the LVA itself in physical memory, thus limiting nesting of TLB misses to one level. Examples of this approach include caching higher-level page table entries in a software TLB to reduce nesting [BKW94] or storing higher-level page table entries completely physically in an MPT or an IPT (as described in the following section). However, if the address space is sparsely allocated, the need to allocate page tables page-wise will lead to high memory overhead independent of nesting. 3.4 Inverted Page ....

....refill handler fails to find a matching entry, then a trap to a software routine is taken which can look up the entry in another page table, or follow an overflow chain if a hashed page table is used. A similar approach is taken on the PowerPC [Pow97]. A completely software-based approach [BKW94] is also possible. In this particular approach, the general exception handler was modified to detect kernel TLB misses early in the exception handler, which then used a direct-mapped software TLB to cache entries and reduce the kernel PTE refill ....

[Article contains additional citation context not shown here]

Kavita Bala, M. Frans Kaashoek, and William E. Weihl. Software prefetching and caching for translation lookaside buffers. In First Symposium on Operating Systems Design and Implementation, pages 243--253. Usenix, 1994.


Operating System Support for Persistent Systems: Past, Present .. - Dearle, Hulse (2000)   (5 citations)

....the implementer. In such systems, the implementer is free to represent the virtual-to-physical mappings using whatever data structure best suits the application domain. Many of the library operating systems have implemented facilities that expose the soft TLB architecture provided by the hardware [8]. However, on the Pentium platform that hosts Charm, the format of page tables is mandated by the architecture and little software control over the TLB is possible. In order to give arenas as much self-autonomy as possible and obviate the need to replicate metadata, each arena has read-only ....

K. Bala, M.F. Kaashoek, and W.E. Weihl, "Software Prefetching and Caching for Translation Lookaside Buffers", in Proceedings of First Symposium on Operating System Design and Implementation (OSDI), Monterey, California, 1994.


Associativity Revisited - A study of Set, Column, and.. - Channon, Lai, Koch

....use in the TLB's placement policy. We have found them to be more efficient than the standard set-associative design with the benefit of having a similar hardware complexity. In addition, we consider the application of these alternative techniques to the implementation of software secondary TLBs [3], which have a large number of entries and a small associative set. The policies have been adapted from data and instruction cache proposals. Agarwal and Pudar [2] have suggested the column-associative cache as a method to reduce conflict misses within a direct-mapped cache. Seznec [12] introduces ....

....replacement methods, instead of 2-way set associative, are superior to smaller fully associative TLBs using the same page size. We introduce the application benefits and costs of the skewed-associative replacement method for software secondary TLB (STLB) management. Software secondary TLBs [3] have a large number of entries with small set-associative policies. The application of the superior skewed-associative management allows the number of entries to be reduced while maintaining the same miss ratio. This improves STLB efficiency and reduces kernel memory consumption. 1.2 Rest of the ....

[Article contains additional citation context not shown here]

K Bala, M Kaashoek and W Weihl. Software prefetching and caching for translation lookaside buffers. In First Symposium on Operating Systems Design and Implementation (OSDI), 1994.
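The Channon, Lai, and Koch excerpts propose skewed-associative placement for software secondary TLBs. As a rough illustration only, here is a two-bank skewed-associative STLB in Python; the skewing function, the choice of two banks, and replacing the least recently used of the two candidate slots are assumptions made for this sketch, not necessarily the authors' design.

```python
class SkewedSTLB:
    """Hypothetical 2-way skewed-associative STLB: bank k uses its own hash of the VPN."""

    def __init__(self, sets_per_bank: int):
        assert sets_per_bank > 0 and sets_per_bank & (sets_per_bank - 1) == 0
        self.bits = sets_per_bank.bit_length() - 1
        self.mask = sets_per_bank - 1
        # Each slot holds (vpn, pfn, last_use) or None.
        self.banks = [[None] * sets_per_bank for _ in range(2)]
        self.clock = 0

    def _index(self, bank: int, vpn: int) -> int:
        if bank == 0:
            return vpn & self.mask  # conventional set index
        # Assumed skewing function: fold the next-higher VPN bits into the index.
        return (vpn ^ (vpn >> self.bits)) & self.mask

    def lookup(self, vpn: int):
        """Return the cached PFN, or None on an STLB miss."""
        self.clock += 1
        for bank in range(2):
            i = self._index(bank, vpn)
            entry = self.banks[bank][i]
            if entry is not None and entry[0] == vpn:
                self.banks[bank][i] = (vpn, entry[1], self.clock)  # refresh LRU stamp
                return entry[1]
        return None

    def insert(self, vpn: int, pfn: int) -> None:
        """Install a translation after a miss (caller guarantees it is absent)."""
        self.clock += 1
        candidates = [(bank, self._index(bank, vpn)) for bank in range(2)]

        def last_use(slot):
            entry = self.banks[slot[0]][slot[1]]
            return -1 if entry is None else entry[2]  # empty slots are chosen first

        bank, i = min(candidates, key=last_use)
        self.banks[bank][i] = (vpn, pfn, self.clock)


stlb = SkewedSTLB(sets_per_bank=4)
for vpn in (0, 4, 8):  # all three share set 0 in bank 0
    stlb.insert(vpn, vpn + 100)
print([stlb.lookup(v) for v in (0, 4, 8)])  # [100, 104, 108]
```

VPNs 0, 4, and 8 all index set 0 of the first bank, so a conventional 2-way set-associative table indexed the same way would have to evict one of them; the second bank's hash sends 4 and 8 to different slots, and all three stay resident.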


Level Two Translation Lookaside Buffers - Callaghan, Hoque, Rotenberg (1995)

....without the prohibitive cost in chip area. Another approach to the TLB problem is to tolerate L1 TLB misses but to reduce the penalty associated with TLB miss handling. Several papers within the last two years take this approach with some form of L2 TLB. DeLano et al. [4] and Bala et al. [2] both cached PTEs in a software L2 TLB. On an L1 TLB miss, this L2 TLB is searched first before a page table walk is initiated. An interesting result is that in addition to reducing miss handling time, an L2 TLB reduces the number of nested TLB misses which are apt to occur in hierarchical page ....

.... to reducing miss handling time, an L2 TLB reduces the number of nested TLB misses which are apt to occur in hierarchical page table organizations (all but the topmost level of the page table hierarchy are located in virtual space, requiring accesses to the page tables to go through the TLB) [2]. Borkenhagen et al. [15] describe a commercial processor implementation with a hardware L2 TLB (512-entry, 4-way associative) which has the advantage of only a 2-cycle penalty for an L2 hit. Taylor et al. [18] implemented a unique form of L1 TLB called the TLB slice. The cache in this ....

Kavita Bala, M. Frans Kaashoek and William E. Weihl. Software Prefetching and Caching for Translation Lookaside Buffers. In Proceedings of the First Symposium on Operating System Design and Implementation, November 1994.


A Comparison of Online Superpage Promotion Mechanisms - Fang, Zhang (1999)

.... growing TLB performance bottleneck range from changing the TLB structure to retain more of the working set (e.g., multi-level TLB hierarchies [1, 9]) to implementing better management policies (in software [12] or hardware [11]) to masking TLB miss latency by prefetching entries (in software [2] or hardware [23]). All of these approaches can be improved by exploiting superpages. Most TLBs now support superpages, and have for several years [16, 27], but more research is needed into how best to make general use of this capability. Chen et al. [7] suggest the possibility of using variable ....

K. Bala, F. Kaashoek, and W. Weihl. Software prefetching and caching for translation buffers. In Proc. of the First OSDI, pp. 243-254, Nov. 1994.


Page Tables for 64-Bit Computer Systems - Elphinstone, Heiser, Liedtke (1999)

....A single GPT tree can arbitrarily mix node sizes, even on the same level. Multiple page sizes can also easily be supported. 2.4 Software TLB cache. Instead of just using a page table for handling TLB misses, a software cache for TLB entries, called software TLB (STLB) or secondary TLB, can be used [Bala et al. 1994]. The TLB miss handler first attempts to load the missing entry from the STLB and only on a miss consults the proper page table. This can significantly speed up TLB miss handling when using forward-mapped page tables. Tagging the entries of the STLB with address space IDs allows sharing it ....

Bala, Kavita, Kaashoek, M. Frans, and Weihl, William E. (1994). Software prefetching and caching for translation lookaside buffers. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation, pages 243--253, Monterey, CA, USA. USENIX.
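The STLB behaviour described in the Elphinstone, Heiser, and Liedtke excerpt (check the STLB first, walk the real page table only on a miss, and tag entries with address-space IDs so one STLB can be shared) can be sketched as below. This is an illustrative Python model under stated assumptions: the dictionary page table, the ASID-mixing index function, and all names are hypothetical, and the default of 16,384 direct-mapped entries merely echoes the L4/MIPS excerpt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STLB_SIZE = 16384  # echoes the L4/MIPS figure; any power of two works here


@dataclass
class Entry:
    asid: int  # address-space ID tag
    vpn: int   # virtual page number
    pfn: int   # physical frame number


class SoftwareTLB:
    """Hypothetical direct-mapped, ASID-tagged STLB consulted before the real page table."""

    def __init__(self, page_tables: Dict[int, Dict[int, int]], size: int = STLB_SIZE):
        assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
        # Stand-in for the real (e.g. forward-mapped) page tables: asid -> {vpn: pfn}.
        self.page_tables = page_tables
        self.mask = size - 1
        self.slots: List[Optional[Entry]] = [None] * size
        self.page_table_walks = 0

    def _index(self, asid: int, vpn: int) -> int:
        # Assumed hash: mix the ASID in (arbitrary odd constant) so equal VPNs
        # from different address spaces do not always land in the same slot.
        return (vpn ^ (asid * 0x9E3779B1)) & self.mask

    def refill(self, asid: int, vpn: int) -> Optional[int]:
        """Return the PFN to load into the hardware TLB, or None on a page fault."""
        i = self._index(asid, vpn)
        entry = self.slots[i]
        if entry is not None and entry.asid == asid and entry.vpn == vpn:
            return entry.pfn  # STLB hit: no page-table walk needed
        self.page_table_walks += 1  # STLB miss: consult the proper page table
        pfn = self.page_tables.get(asid, {}).get(vpn)
        if pfn is None:
            return None  # no mapping: a genuine fault for the OS to handle
        self.slots[i] = Entry(asid, vpn, pfn)  # overwrite whatever held this slot
        return pfn
```

Because entries carry the ASID, a context switch need not flush this STLB; entries from different address spaces simply compete for slots, and a tag mismatch is treated as an ordinary miss.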


Fast Address-Space Switching on the StrongARM SA-1100 Processor - Wiggins, Heiser (2000)

....cached data. This complicates handling somewhat, as in some circumstances the TLB must also be flushed, which wouldn't be necessary if the fault arose as the result of a TLB miss. The CPD therefore caches PD entries from several different address spaces, similar to a direct-mapped software TLB [2]. As long as there are enough domains for all threads wishing to execute, no flushes are required. Caches and TLBs must be flushed whenever a valid CPD entry is replaced. However, the ARM only supports 16 domains, which is much less than the number of concurrently running tasks a general purpose ....

K. Bala, M. F. Kaashoek, and W. E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proc. 1st OSDI, pages 243--253, Monterey, CA, USA, 1994. USENIX/ACM/IEEE.


Exterminate All Operating System Abstractions - Engler, Kaashoek (1995)   (27 citations)  Self-citation (Kaashoek)

.... manner by (perhaps) saving a few scratch registers in some agreed-upon location in application space and then jumping to an application-specified PC address [24]. Of course, all of these operations can be sped up by downloading application code into the kernel [4, 9] or using a software TLB [14, 3] to cache translations. These implementation techniques aside, the full functionality provided by the underlying hardware should be exposed (e.g., reference bits, the ability to disable caching on a page basis, the ability to use different page sizes, etc.). Process: The only state that the ....

K. Bala, M.F. Kaashoek, and W.E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on OSDI, pages 243--253, June 1994.


AVM: Application-Level Virtual Memory - Dawson Engler Sandeep (1995)   (24 citations)  Self-citation (Kaashoek)

....system's TLB refill code into the kernel (this code can be sandboxed to ensure fault isolation [23]) or by reducing the number of TLB misses that the AVM system must handle. Aegis uses the latter approach: it overlays the hardware TLB with a large software TLB (STLB) to absorb capacity misses [3, 13]. On a TLB miss, Aegis first checks to see whether the required mapping is in the STLB; if so, Aegis installs it and resumes execution. Otherwise, the miss is forwarded to the application. Currently we use a unified STLB. It is direct-mapped, resides in unmapped physical memory, and on an STLB ....

K. Bala, M.F. Kaashoek, and W.E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on OSDI, pages 243--253, June 1994.


Transparent Operating System Support for Superpages - Navarro (2002)   (4 citations)

No context found.

K. Bala, M. F. Kaashoek, and W. E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on Operating Systems Design and Implementation, pages 243--253, 1994.


Dynamic Prefetching in the Virtual Memory Window of.. - Vuletic, Pozzi, Ienne

No context found.

K. Bala, M. F. Kaashoek, and W. E. Weihl. Software prefetching and caching for translation lookaside buffers. In the Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI, 1994), pages 243--53, Monterey, Calif., Nov. 1994. USENIX Assoc.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)

No context found.

Bala, K., Kaashoek, M. F. and Weihl, W. E. Software prefetching and caching for translation lookaside buffers. In Proceedings of the First Symposium on Operating Systems Design and Implementation, Monterey, CA, 243-253, 1994.


Maximizing Memory Bandwidth for Streamed Computations - McKee (1995)   (7 citations)

No context found.

K. Bala, M.F. Kaashoek, and W.E. Weihl, "Software Prefetching and Caching for Translation Lookaside Buffers", Proceedings of the Usenix First Symposium on Operating Systems Design and Implementation (OSDI), published as ACM Operating Systems Review, 28(5):243-253, Winter 1994.


A New Page Table for 64-bit Address Spaces - Talluri, Hill, Khalidi (1995)   (20 citations)

No context found.

Kavita Bala, M. Frans Kaashoek, William E. Weihl. Software prefetching and caching for translation lookaside buffers. In Proc. First Symposium on Operating System Design and Implementation (OSDI), pages 243--253, November 1994.



CiteSeer.IST - Copyright Penn State and NEC