J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.

This paper is cited in the following contexts:

Legba: Fast Hardware Support for Fine-Grained Protection - Wiggins, Winwood, Tuch.. (2003)   (1 citation)

....and since the TLB is on the processor core, TLB capacity is generally limited to, at most, a few hundred entries. Consequently, TLB coverage is inherently limited, and would be further degraded by smaller page sizes. The inadequate coverage of modern TLBs has been highlighted by several studies [23-26]. Several attempts have been made to address this, including superpages [22], sub-blocking [27], in-memory translation [28], virtually addressed memory hierarchies [29,30], in-cache translation [31], and even software-managed address translation [32]. However, all these studies focused on ....

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.


Itanium Page Tables and TLB - Chapman, Wienand, Heiser (2003)

....of reducing the consumption of TLB entries (or increase TLB coverage) in scenarios where sharing of pages is significant. TLB coverage has been identified as a potential bottleneck in system performance, with TLB miss handling overheads of 20-40% reported even on single-tasked benchmarks [CBJ92,HH93,Tal95,KS02] (although generally with software-loaded TLBs). Linux presently only supports the short-format VHPT. This means it cannot support sharing of TLB entries, which is supported by the architecture only with the long VHPT format. The long format is also required to support mixing ....

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.


Enhancing IA-64 Memory Management - Au, Heiser (2000)

....which can be associated with each process and are used in addition to the virtual page number to match an entry in the TLB. This negates the need for a TLB flush per context switch but still requires duplicate TLB entries for essentially same mappings in different contexts/processes. Past studies [1,2] have shown that TLB handling costs can take up a significant part of an application's processing time. TLB coverage is one of the major factors in determining the TLB miss rate and hence the impact of TLB costs on application performance [2]. TLB miss handling overhead is turning into a ....

....in different contexts/processes. Past studies [1,2] have shown that TLB handling costs can take up a significant part of an application's processing time. TLB coverage is one of the major factors in determining the TLB miss rate and hence the impact of TLB costs on application performance [2]. TLB miss handling overhead is turning into a bottleneck with real memory sizes increasing at a rapid rate, while TLB sizes are remaining essentially constant [11]. TLB coverage refers to how much memory the TLB can map. This in turn is directly related to the number of TLB entries and the page ....

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.


Characterizing the d-TLB Behavior of SPEC CPU2000 Benchmarks - Kandiraju, Sivasubramaniam (2002)

....recognized, the TLB has been the target for several optimizations to reduce access latency, miss rates and miss handling overheads. With regard to TLB structures themselves, there have been investigations on suitable organizations in terms of size, associativities and multi-level organizations [31, 23, 6]. Superpaging is a concept that has been proposed to boost TLB coverage. Hardware and software techniques for supporting this mechanism have come under a lot of scrutiny [31, 32, 11]. Most prior work in TLB optimizations has targeted lowering miss rates or miss handling costs. It is only recently ....

....[31, 15, 23] have looked at hardware TLB structures organization and their impact on system performance in terms of capacity and/or associativity. While some of these have focussed on single (monolithic) TLBs, there have been studies which have investigated the benefits of multi-level TLBs [6, 2]. There are also implementations of multi-level TLBs in commercial processors such as MIPS R4000, Hal's SPARC64, IBM AS/400 PowerPC. We would like to differentiate between the terms software TLB management and software TLB handling in this paper. We use the latter to denote that the miss ....

[Article contains additional citation context not shown here]

J. B. Chen, A. Borg, and N. P. Jouppi. A Simulation Based Study of TLB Performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, 1992.


The Impulse Memory Controller - Lixin Zhang Zhen (2001)   (4 citations)

....online software policies for dynamically remapping pages to improve cache performance [6, 39]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging, synchronization, and file cache management. Chen et al. [11] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.


Reevaluating Online Superpage Promotion with Hardware Support - Zhen Fang Lixin (2001)   (3 citations)

.... software policies for dynamically remapping pages to improve cache performance [3, 23]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging [26], synchronization [12], and file cache management [4]. Chen et al. [6] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks they ....

....by prefetching entries (again, in software [2] or hardware [25]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [16, 28], but more research is needed into how best to make general use of them. Chen et al. [6] suggest the possibility of using variable page sizes to improve TLB reach, but do not explore the implications of their use. Khalidi [13] and Mogul [17] discuss the benefits of systems that support superpages, and advocate static allocation via compiler or programmer hints. Talluri et al. [18] ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proc. of the 19th ISCA, pp. 114--123, May 1992.


WCDRAM: A fully associative integrated Cached-DRAM with wide.. - Kedem, Koganti (1997)   (1 citation)

....made independent of the block size, then a fully associative cache with large blocks would be very effective. This is supported by the fact that Translation Look-aside Buffers (TLBs), which cache page (1KB-8KB) translations, have very high hit rates even though they have very few entries (32-128) [5], when compared to data caches. A technique for reducing cache block transfer time, and thereby the miss penalty, is to place the SRAM cache and the DRAM memory on the same IC. By doing this, the internal bandwidth available within the IC can be used to transfer the data between the cache and the ....

J. Bradley Chen, Anita Borg and Norman P. Jouppi, "A Simulation Based Study of TLB Performance", WRL Research Report 91/2, Digital Western Research Laboratory, CA, May 1992, http://www.research.digital.com:80/wrl/publications/pubslist.html.


Virtual Memory In A 64-Bit Microkernel - Elphinstone (1999)   (1 citation)

....less than 5% [CE85] of overall runtime. However in recent studies, miss handling is not unknown to contribute 40% of application runtime [HH93]. Various methods have been proposed to combat increasing TLB miss ratios. Associativity trade-offs and changes [NUS 93, CLK97], micro-TLBs [CBJ92], variable page sizes [CBJ92, TKHP92, KTNW93, ROKB95], and subblocking [Tal95] have been examined and incremental improvements made in effective TLB coverage. TLBs have been removed altogether in some experimental systems [WEG 86, CSD86, JM97] which perform address translation in the cache; ....

....of overall runtime. However in recent studies, miss handling is not unknown to contribute 40% of application runtime [HH93]. Various methods have been proposed to combat increasing TLB miss ratios. Associativity trade-offs and changes [NUS 93, CLK97], micro-TLBs [CBJ92], variable page sizes [CBJ92, TKHP92, KTNW93, ROKB95], and subblocking [Tal95] have been examined and incremental improvements made in effective TLB coverage. TLBs have been removed altogether in some experimental systems [WEG 86, CSD86, JM97] which perform address translation in the cache; however, the tech ....

[Article contains additional citation context not shown here]

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In 19th International Symposium on Computer Architecture, May 1992.


Multi-Dimensional Translation Lookaside Buffers - Channon, Koch (1996)   (1 citation)

....sets of current and future workloads. The case that current virtual memory techniques will no longer be valid for the next generation of machines has been argued by others [26]. Several ideas have been introduced to increase TLB reach. For example, trading associativity for increasing TLB entries [4, 14, 16] to increase TLB reach (address coverage) using the same silicon resources, multiple page sizes [14, 26, 18] to reduce the number of contiguously mapped pages, and subblocking [25] to have more translations held for each TLB cell. The design tradeoffs of silicon resource requirements, TLB access ....

J B Chen, A Borg and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.


Attribute Caches - Richardson, Flynn (1995)

....along with application I/O requests. Understanding the nature of application I/O requests can drive I/O cache performance improvements or application I/O optimizations. A.1 Trace Collection A new version of the WRL tracing facilities collected traces on DECstation 5000s running ULTRIX [4, 5]. Its kernel-based approach traces all processes. The modified system logs system call information in a physically mapped trace buffer. On an I/O system call, the call type, process ID, and call parameters are entered in the buffer. On return from the system call, the return value, error status ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In The 19th Annual International Symposium on Computer Architecture, pages 114--123. IEEE Computer Society Press, May 1992.


A Comparison of Online Superpage Promotion Mechanisms - Fang, Zhang (1999)

.... software policies for dynamically remapping pages to improve cache performance [3, 21]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging [24], synchronization [14], and file cache management [5]. Chen et al. [7] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks they ....

....latency by prefetching entries (in software [2] or hardware [23]). All of these approaches can be improved by exploiting superpages. Most TLBs now support superpages, and have for several years [16, 27], but more research is needed into how best to make general use of this capability. Chen et al. [7] suggest the possibility of using variable page sizes to improve TLB reach, but do not explore the implications of their use. Khalidi et al. [15] and Mogul [17] discuss benefits of systems that support superpages, advocating static allocation via compiler or programmer hints. Talluri et al. [18] ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proc. of the 19th ISCA, pp. 114-123, May 1992.


Memory Reference Locality and Periodic Relocation in Main.. - Oksanen, Malmi

....by the TLB, a TLB miss has occurred and the mapping is resolved by referring to the data structures of the operating system. Current estimates for the cost of handling a TLB miss are between 20 and 40 instructions [10] but the overhead has been predicted to grow to 100 instructions in the future [4]. Currently TLBs contain from tens up to a few hundred TLB entries, each covering the mapping of one virtual memory page, typically 4 or 8 kB. A TLB can thus map at least several hundred kilobytes, often even megabytes. This is enough to cover all, or at least the most used part, of the working ....

J.B. Chen, A. Borg, and N. Jouppi, A Simulation Based Study of TLB Performance, Proceedings of the 19th Annual International Symposium on Computer Architecture, May 1992, pp. 114-123.


I/O Component Characterization for I/O Cache Designs - Richardson

....application I/O requests can drive I/O cache performance improvements or application I/O optimizations. This requires tracing application I/O requests and including information about each request. A new version of the WRL tracing facilities collected the traces on DECstation 5000s running ULTRIX [3, 4]. Its kernel-based approach traces all processes. Figure 1 shows the system configuration. The original system was designed primarily to study processor memory issues in a multiprogramming environment. The modified system logs system call information, rather than instruction and data addresses, in ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In The 19th Annual International Symposium on Computer Architecture, pages 114--123. IEEE Computer Society Press, May 1992.


Increasing TLB Reach Using Superpages Backed by Shadow Memory - Swanson, Stoller, Carter (1998)   (22 citations)

....Note that failures on write-backs, which would be more difficult to handle since the processor is not awaiting a response, cannot happen; the OS is required to flush the dirty data back to memory before swapping a page to disk and removing the corresponding mapping. 5. Related Work. Chen et al. [4] report the performance effects of various TLB organizations and sizes. Their results agree with our premise that the most important factor for minimizing the overhead induced by TLB misses is TLB reach. They studied several SPECmarks programs, which have much smaller memory requirements than our ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of tlb performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.


Tradeoffs in the Design of Single Chip Multiprocessors - Albonesi, Koren (1994)

....this amounts to 322500 rbe or less than the cost of one 64kB cache. Thus, we see that by removing one processor and its cache, we can increase the TLB in each of the remaining 15 processors to 512 entries. This should reduce the amount of TLB misses by over 50% from that of a 32-entry TLB [3]. In order to get a lower bound on the impact of this architectural change on performance, we use a technique called trace modification to conservatively modify the trace to emulate a trace taken from a processor with a 512-entry TLB. To accomplish this, we use a program that marks the TLB misses ....

J.B. Chen, A. Borg, and N.P. Jouppi, "A Simulation Based Study of TLB Performance," 19th International Symposium on Computer Architecture, pp. 114-123, May 1992.


Issues in Implementing Virtual Memory - Elphinstone, Russell, Heiser

....to minimise misses and increase coverage. 4.2.1 Software TLB management improvements can be divided into optimising replacement and placement of TLB entries, and minimising refill times. The likely improvements in performance that can be achieved by replacement policy are minor [UNS 94, CBJ92]. Hardware designers typically choose random replacement, as the cost involved in implementing other policies in hardware outweighs the performance benefits to be gained. Performance improvements through software involvement in replacement policy are doubtful. The costs of software misses are a ....

....4.2.2 Hardware Currently, much research effort has been directed towards increasing TLB coverage, the focus being on increasing the number of TLB entries, increasing the page size, or using multiple page sizes. Increasing the number of TLB entries is an obvious solution, as illustrated by Chen [CBJ92]. However, large fully associative structures are difficult to build. Reducing the associativity to increase the number of entries is a valid technique [UNS 94, TH94]. However, it is not clear whether the number of entries could be increased sufficiently to cover a significant proportion of ....

[Article contains additional citation context not shown here]

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In 19th International Symposium on Computer Architecture, May 1992.


Improving TLB performance - Youxin Gao

....TLB can be in the critical path of a memory access, good TLB performance is essential to good overall performance of a machine. An early study shows that TLB miss penalties consume 6% of all machine cycles and 4% of execution time, and hence can have a significant impact on machine performance [3]. This effect is even larger in today's modern computers, which have larger memory sizes. A recent work investigates both TLB and cache performance in different systems by running several workloads, e.g. program development, database and engineering [5]. Their study shows that the system spends ....

....system, and the situation becomes much worse for multi-processor. Reducing TLB misses and miss penalties becomes increasingly important to overall performance. Like cache misses, TLB misses may be classified into three different categories: capacity miss, compulsory miss and conflict miss. [3] has shown that TLB misses are primarily dominated by capacity misses, because the mapping size of the TLB, which is the product of the number of entries and the page size, is not big enough to map the entire working set of the program. Intuitively, capacity misses can be reduced by increasing the mapping ....

[Article contains additional citation context not shown here]

J.B. Chen, A. Borg and N.P. Jouppi, A Simulation-Based Study of TLB Performance, WRL Research Report, 91/2, 1991.


Efficient Address Translation Simulation - Channon, Koch, Hannaford (1995)

....becomes unable to map entire process working sets, MMU address translation performance becomes a very real limitation on real system performance [14]. Cache performance studies are normally conducted experimentally. Address translation evaluation techniques generally utilise a software simulator [2, 5] because of the cost, complexity and difficulty of exact physical measurements from hardware prototypes. We have developed a simulation package that allows software models of memory management strategies to be tested against program reference streams. The implementation design process used modern ....

....be consumed within the simulator. Simulation studies which are based on very long traces are not common as they can take months of CPU time. Gee et al. [10] studied the characteristics of the SPEC92 performance suite, which involved over seven months of simulation across many machines. Borg et al. [5] performed a series of very long trace-based TLB studies. This paper is often cited as no other similar study has since been conducted. The factors determining the availability of such work are storage and generation requirements of long traces, and more importantly the time period for a ....

J B Chen, A Borg, and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.


Improving TLB Miss Handling with Page Table Pointer Caches - Wu, Zwaenepoel (1997)

....a more costly hardware or software handler is invoked to load and insert the required page table entry into the TLB so that the access can proceed. For many programs, a TLB provides a very high hit rate and address translation overhead represents an insignificant part of the overall execution time [3]. However, programs with large working sets or applications that spend significant time communicating with other address spaces can spend as much as 50% of total execution time handling TLB misses [2, 6, 9]. In the future, TLB miss handling time can become more of a problem because both hardware ....

.... halted to handle an asynchronous interrupt (with precise exception handling). Most recent work in TLB performance enhancement has revolved around the use of superpages to increase the reach of a basic TLB with some very simple extensions of the basic design and significant operating system support [3, 7, 9, 10]. There are several drawbacks to this approach. Larger pages reduce efficiency because they increase internal fragmentation and make the unit of protection and sharing more coarse-grained. Attempts to remedy the situation using a combination of small pages and larger pages when possible have ....

[Article contains additional citation context not shown here]

J. Bradley Chen, A. Borg, and N. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.


Improving the Address Translation Performance of Widely Shared .. - Yousef Khalidi (1995)   (3 citations)

....a problem because the operating system and user programs usually can choose virtual addresses that work with the common mask scheme. It is important to note that a common mask TLB can still be used when forward-mapped page tables are used to share PTEs. TLB performance is also a well-studied area [9,21,28,29], and various schemes proposed to increase TLB reach include use of superpages [14] and subblocking [15] that many commercial architectures use in their TLBs. These techniques improve TLB performance by increasing the TLB reach of a single TLB entry by storing mappings for multiple virtual pages ....

Bradley Chen, Anita Borg and Norman Jouppi, "A Simulation Based Study of TLB Performance", 19th Annual International Symposium on Computer Architecture, May 1992, pp. 114-123.


Reducing TLB and Memory Overhead Using Online Superpage .. - Romer, Ohlrich.. (1995)   (26 citations)

....accordingly. Because modern systems typically incur a penalty of between 10 and 30 cycles per TLB miss [Kane Heinrich 92, Dutton et al. 92], any application with a working set larger than the TLB's coverage can spend a significant fraction of its time waiting for TLB misses to be serviced [Chen et al. 92, Bala et al. 94, Talluri et al. 92]. This research was sponsored by the Office of Naval Research through a Research Initiation Award, and by an equipment grant from Digital Equipment Corporation. Bershad was partially supported by a National Science Foundation Presidential Young Investigator Award ....

....online decisions that result in performance within a constant factor of an optimal offline algorithm. Prior research in this area has influenced, for example, the design of synchronization [Karlin et al. 91], paging [Sleator Tarjan 85], and cache management algorithms [Cao et al. 94]. Others [Chen et al. 92, Mogul 93, Khalidi et al. 93] have described the potential positive impact of a system that supports superpages, although they do not describe policies for promotion or demotion. Instead, they suggest that the programmer or compiler offer the operating system a hint about the appropriate page ....

Chen, J. B., Borg, A., and Jouppi, N. P. A Simulation-based Study of TLB Performance. In Proceedings of the 19th Annual Symposium on Computer Architecture, pages 114--123, May 1992.


Experimental Methodology: Issues and Practice - David Channon

....community. The common practice of generating experimental results for memory system analysis has produced well-established techniques and approaches to areas such as cache analysis. Cache and address translation evaluation techniques generally utilise a software simulator [Borg et al. 1990; Chen et al. 1992] because of the cost, complexity and difficulty of exacting physical measurements from hardware prototypes. Unfortunately, most studies conducted are difficult if not impossible to validate or duplicate accurately. Although researchers use simulators to gather results the ....

....[Fig. 1. (First Version) TLB behavior with the same reach. Axis: hits per miss; series: 1k pages, 2k pages; reach: 64 KB, 128 KB, 256 KB, 512 KB.] Chen et al. [Chen et al. 1992] performed a similar experiment using very long traces to generate their results. Comparing the results, the graph has the same general shape and identical trends. Is this sufficient to conclude that the experiment has been satisfactorily replicated? The differences in the experiments include Chen ....

[Article contains additional citation context not shown here]

Chen, J B, Borg, A, and Jouppi, N P (1992). A simulation based study of TLB performance. In Proceedings of the 19th ISCA, pages 114--123.


Design Tradeoffs for Software-Managed TLBs - Uhlig, Nagle, Stanley, Mudge.. (1993)   (62 citations)

....1985 study, Clark and Emer examined the cost of hardware TLB management by monitoring a VAX 11/780. For their workloads, 5 to 8% of a user program's run time was spent handling TLB misses [9]. More recent papers have investigated the TLB's impact on user program performance. Chen, Borg and Jouppi [6], using traces generated from the SPEC benchmarks, determined that, for a reasonable range of page sizes, the amount of the address space that could be mapped was more important than the page size chosen in determining TLB miss rate. Talluri et al. [25] have shown that although older TLBs (as in ....

....[Figure labels: TLB Miss Tapeworm Hooks; TLB Miss Handlers; Tapeworm Kernel Code (Unmapped Space); Simulated TLB (128 Slots); Page Tables (Mapped Space); Actual TLB (64 Slots); Policy Functions.] ....age resources. Some researchers have overcome the storage resource problem by consuming traces on the fly [2, 6]. This technique requires that system operation be suspended for extended periods of time while the trace is processed, thus introducing distortion at regular intervals. Third, trace-driven simulation assumes that address traces are invariant to changes in the structural parameters or management ....

[Article contains additional citation context not shown here]

Chen, J.B., A. Borg, and N.P. Jouppi. A simulation based study of TLB performance. in The 19th Annual International Symposium on Computer Architecture. 1992. Gold Coast, Australia: IEEE.


TSF: An Object Oriented Address Translation Simulation Framework - David Channon (1996)

....in this area. A number of researchers have proposed alternative ideas to the traditional forward-mapped page table combined with a fixed page size design. Several ideas have been introduced to increase TLB reach. Some alternative ideas are: • Trading associativity for increasing TLB entries [3] to increase TLB reach (address coverage) using the same silicon resources. • Multiple page sizes [17, 28, 22] to reduce the number of contiguously mapped pages. • Subblocking [27] to increase the number of translations held for each fully associative TLB cell. • Hash indexed page table ....

....mechanism, the MMU. The MMU is usually composed of TLB hardware to translate process virtual addresses to physical memory, and software managed page tables to deal with the limited capacity of TLB hardware. Address translation evaluation techniques generally utilise a software simulator [2, 3] because of the cost, complexity and difficulty of exact physical measurements from hardware prototypes. 2.1 Problem domain Although researchers use simulators to gather results the tools are not usually available to other researchers. To our knowledge there are no freely available specialised ....

[Article contains additional citation context not shown here]

J B Chen, A Borg and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.


High-Bandwidth Address Translation for Multiple-Issue Processors - Austin, Sohi (1996)   (11 citations)

....restrict all instructions fetched in a single cycle to be within the same virtual memory page, requiring at most one translation per cycle. Instruction fetch translation is well served by a single-ported instruction TLB or by a small micro-TLB implemented over a unified instruction and data TLB [CBJ92]. The rest of this paper is organized as follows: Section 2 describes our framework for address translation and qualitatively explores the impact that address translation latency and bandwidth have on system performance. Section 3 details the mechanisms proposed for high-bandwidth address ....

....multi-level TLBs; Hal's SPARC64 [Gwe95] and IBM's AS/400 64-bit PowerPC [BHIL94] processors both implement multi-level TLBs to meet the latency and bandwidth needs of their respective designs. Multi-level TLB designs have long been used for reducing the latency of instruction fetch translations [CBJ92]. 3.4 Piggyback Ports. Piggyback ports, shown in Figure 3a, exploit spatial locality in simultaneous address translation requests. When simultaneous requests arrive at a TLB port, requests with identical virtual page addresses may be satisfied by the same TLB access. To implement piggybacking, ....

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. Proc. of the 19th Annual International Symposium on Computer Architecture, 19(2):114--123, May 1992.


Using Virtual Memory to Improve Cache and TLB Performance - Romer (1998)   (3 citations)  (Correct)

....policies for selecting page sizes. Implementing the superpage construction policies described in this paper could use their mechanisms for controlling page sizes, but would require extensions to provide feedback on TLB performance to the application. • In Chen et al.'s study of TLB performance [Chen et al. 92], the authors suggest using compiler feedback to identify regions of the program text that could be mapped by larger pages, and tuning memory allocation algorithms to map frequently used data to contiguous regions in order to facilitate the use of large pages. They do not evaluate these ....

....spent between 5.2% and 41.4% of their time in the TLB miss handler. Improving TLB performance requires either reducing the number of TLB misses incurred by an application, or reducing the cost of individual TLB misses. While some recent research has focused on reducing the cost of TLB misses [Chen et al. 92, Bala et al. 94, Uhlig et al. 94, Austin Sohi 96], TLB miss paths are already highly optimized, with a single miss costing as little as 10-30 cycles [Kane Heinrich 92, Dutton et al. 92]. The other alternative for improving TLB performance is to reduce the number of TLB misses. One approach ....

J. B. Chen, A. Borg, and N. P. Jouppi. A Simulation Based Study of TLB Performance. In Proceedings of the 19th Annual Symposium on Computer Architecture, pages 114--123. IEEE, May 1992.


Hardware And Software Mechanisms For Reducing Load Latency - Austin (1996)   (1 citation)  (Correct)

....turned to alternative TLB organizations with better latency and bandwidth characteristics; for example, HaL's SPARC64 [Gwe95] and IBM's AS/400 64-bit PowerPC [BHIL94] processor both implement multi-level TLBs. Many processors implement multi-level TLBs for instruction fetch translation as well [CBJ92]. This chapter extends the work in high-bandwidth address translation design by introducing four designs with better latency and area characteristics than a multi-ported TLB. Using detailed timing simulation, the performance of the proposed high-bandwidth designs was compared to the performance ....

....restrict all instructions fetched in a single cycle to be within the same virtual memory page, requiring at most one translation per cycle. Instruction fetch translation is well served by a single-ported instruction TLB or by a small micro-TLB implemented over a unified instruction and data TLB [CBJ92]. The remainder of this chapter is organized as follows. Section 5.2 describes a performance model for address translation and qualitatively explores the impact of address translation latency and bandwidth on system performance. Section 5.3 details the proposed mechanisms for high-bandwidth ....

[Article contains additional citation context not shown here]

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. Proceedings of the 19th Annual International Symposium on Computer Architecture, 19(2):114--123, May 1992.


Software-Managed Address Translation - Jacob, Mudge (1997)   (19 citations)  (Correct)

....efforts [37]. For example, the Intel Pentium Processor User's Manual devotes 100 of its 700 pages to memory management structures [31], most of which exist for backward compatibility and are unused by today's system software. Typical virtual memory systems exact a run time overhead of 5-10% [4, 9, 41, 47], an apparently acceptable cost that has changed little in ten years [14] despite significant changes in cache sizes and organizations. However, several recent studies have found that the handling overhead of memory management hardware can get as high as 50% of application execution time [1, 28, ....

....register, points to the process control block of the active process. Virtual address causes a CacheMiss exception Virtual address causes a CacheMiss exception (physical cacheable address) 20 bits 5 Discussion. Many studies have shown that significant overhead is spent servicing TLB misses [1, 4, 9, 28, 41, 44, 47]. In particular, Anderson, et al. [1] show TLB miss handlers to be among the most commonly executed primitives, Huck and Hays [28] show that TLB miss handling can account for more than 40% of total run time, and Rosenblum, et al. [44] show that TLB miss handling can account for more than 80% of ....

J. B. Chen, A. Borg, and N. P. Jouppi. "A simulation based study of TLB performance." In Proc. 19th Annual International Symposium on Computer Architecture (ISCA 19), May 1992.


Design Tradeoffs for Software-Managed TLBs - Nagle, Uhlig, Stanley, Mudge.. (1993)   (62 citations)  (Correct)

....1985 study, Clark and Emer examined the cost of hardware TLB management by monitoring a VAX-11/780. For their workloads, 5 to 8% of a user program's run time was spent handling TLB misses [5]. More recent papers have investigated the TLB's impact on user program performance. Chen, Borg and Jouppi [6], using traces generated from the SPEC benchmarks, determined that the amount of physical memory mapped by the TLB is strongly linked to the TLB miss rate. For a reasonable range of page sizes, the amount of the address space that could be mapped was more important than the page size chosen. ....

....realistic system-wide address traces that account for multiprocess workloads and the operating system itself [5, 15]. Second, trace-driven simulation can consume considerable processing and storage resources. Some researchers have overcome the storage resource problem by consuming traces on the fly [6, 15]. This technique requires that system operation be suspended for extended periods of time while the trace is processed, thus introducing distortion at regular intervals. Third, trace-driven simulation assumes that address traces are invariant to changes in the structural parameters or management ....

[Article contains additional citation context not shown here]

Chen, J.B., A. Borg, and N.P. Jouppi. A simulation based study of TLB performance. in The 19th Annual International Symposium on Computer Architecture. 1992. Gold Coast, Australia: IEEE.


Computer architecture research in the Dept. of Computer.. - Channon, Koch, Hannaford   (Correct)

....in this area. A number of researchers have proposed alternative ideas to the traditional forward mapped page table combined with a fixed page size design. Several ideas have been introduced to increase TLB reach. Some alternative ideas are: Trading associativity for increasing TLB entries [3] to increase TLB reach (address coverage) using the same silicon resources. Multiple page sizes [15, 23, 20] to reduce the number of contiguously mapped pages. Subblocking [22] to increase the number of translations held for each fully associative TLB cell. Hash indexed page table [12, ....

....of different address translation paradigms and implementations. In addition TSF is designed to facilitate peer validation of results and methodology. Cache performance studies are normally conducted experimentally. Address translation evaluation techniques generally utilise a software simulator [1, 3] because of the cost, complexity and difficulty of exact physical measurements from hardware prototypes. TSF allows software models of memory management strategies to be tested against program reference streams. TSF uses modern software engineering techniques including design pattern principles ....

J B Chen, A Borg, and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.


A Caching Model of Operating System Kernel Functionality - Cheriton, Duda (1994)   (57 citations)  (Correct)

....thrash the second level data cache as well as the Cache Kernel memory mappings. Moreover, a program that has poorer page locality than we have hypothesized (i.e. less than four percent usage of pages) also suffers a significant performance penalty from TLB miss behavior on most architectures [3]. For example, we measured up to a 25 percent degradation in performance in the MP3D program mentioned above from processors accessing particles scattered across too many pages. The solution with MP3D was to enforce page locality as well as cache line locality by copying particles in some cases as ....

J.B. Chen, A. Borg, and N.P. Jouppi. A simulation based study of TLB performance. In Proc. 19th Annual Intl. Symposium on Computer Architecture, pages 114--123. ACM SIGARCH, IEEE Computer Society, May 1992.


Improving the Address Translation Performance of Widely.. - Khalidi, Talluri (1995)   (3 citations)  (Correct)

....is required in the operating system to keep track of the common mask regions. Liedtke also proposes a virtually indexed cache that supports unaligned aliases, SF cache [33], but does not share a single cache entry as our scheme does (Section 10.1). TLB performance is also a well studied area [9,21,28,29] and various schemes proposed to increase TLB reach include use of superpages [14] and subblocking [15] that many commercial architectures use in their TLBs. These techniques improve TLB performance by increasing the TLB reach of a single TLB entry by storing mappings for multiple virtual pages ....

Bradley Chen, Anita Borg and Norman Jouppi, "A Simulation Based Study of TLB Performance", 19th Annual International Symposium on Computer Architecture, May 1992, pp. 114--123.


An Efficient Technique for Tracking Nondeterministic Execution .. - Elnozahy May (1995)   (Correct)

....Based on the experience gained from implementing the initial phases of the project, the extensions and applicability of these techniques to distributed settings will be investigated. 5 Related Work. The effect of the memory hierarchy on system performance has long been an active area of research [1, 3, 6, 7, 8, 9, 16, 18, 19, 20]. A typical study consists of gathering a trace of memory references and feeding it to a simulator. Gathering the trace can be done in hardware [11, 17, 20], in firmware [2], or in software [4, 14]. In hardware techniques, a device is inserted on the bus to monitor the traffic and collect ....

J.B. Chen, A. Borg, and N.P. Jouppi. A simulation based study of tlb performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.


Implementation of Multiple Pagesize Support in HP-UX - Subramanian, Mather.. (1998)   (6 citations)  (Correct)

....carried out by looping over the 4KB based constituent VM data structures. Our system offers significant application performance improvement when using large pagesizes. 1 Introduction Translation Lookaside Buffer (TLB) misses can degrade the performance of applications with large working set sizes [2, 4, 18, 21, 23]. A TLB is a cache of recently accessed virtual to physical page translation information. The working set of a process is the memory actively referenced during a certain time interval [6] A typical TLB that performs translations using small pagesizes such as 4KB, cannot hold all the translations ....

J. B. Chen, A. Borg, and N. P. Jouppi. A Simulation Based Study of TLB Performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture (ISCA), pages 114--123, May 1992.


TSF: An Object Oriented Address Translation Simulation Framework - David Channon (1996)   (Correct)

....problem in this area. A number of researchers have proposed alternative ideas to the traditional forward mapped page table combined with a fixed page size design. Several ideas have been introduced to increase TLB reach. Some of which are: • Trading associativity for increasing TLB entries [2] to increase TLB reach (address coverage) using the same silicon resources. • Multiple page sizes [15, 26, 20] to reduce the number of contiguously mapped pages. • Subblocking [25] to have more translations held for each fully associative TLB cell. • Hash indexed page table [10, 19] to ....

....experimentation in this field usually create their own tools and techniques. This represents a significant duplication of effort resulting in most of the work not being validated. Address translation evaluation techniques, including cache performance studies, generally utilise a software simulator [1, 2] because of the cost, complexity and difficulty of exact physical measurements from hardware prototypes. The usual approach is the combination of collected memory reference trace data and a purpose coded cache simulation [23] Memory reference models [27] can be an alternative, but their accuracy ....

[Article contains additional citation context not shown here]

J B Chen, A Borg and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)  Self-citation (Chen)   (Correct)

No context found.

Chen, J. B., Borg, A. and Jouppi, N. P. A simulation based study of TLB performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, IEEE, 114-123, 1992.


The Impact of Software Structure and Policy on CPU and Memory.. - Chen (1994)   (5 citations)  Self-citation (Chen Borg)   (Correct)

....Finally, it allowed us to identify performance problems common to both systems, an indication of how current memory system designs are ill suited for operating system execution. Previous studies that consider operating system activity concentrate on variations in memory system structure [2, 3, 4, 16, 23, 57, 66]. Although several [2, 3, 4, 66] used address traces of both DEC Ultrix and DEC VMS operating systems, the data from the two systems was used primarily as a basis for identifying common behavior in both operating systems without drawing distinctions between them. Simulation of complete memory ....

....systems without drawing distinctions between them. Simulation of complete memory systems: Previous studies have concentrated on memory system components such as caches or TLBs in isolation, varying parameters such as cache size in an attempt to optimize behavior over a given workload set [3, 23, 38, 60]. Simulating a complete memory system has two important advantages. First, it permits an objective evaluation and comparison of operating system memory performance, considering the delays from all memory system components and not just one. Second, it permits a comparison of the relative importance ....

[Article contains additional citation context not shown here]

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A Simulation Based Study of TLB Performance. The Proceedings of the 19th Annual International Symposium on Computer Architecture, May, 1992, pp. 114-123.


Memory Behavior of an X11 Window System - Chen (1994)   (7 citations)  Self-citation (Chen)   (Correct)

....distinction between interactive workloads and more traditional benchmarks is their sensitivity to latency, which is the time required for the system to respond to a given input event. Analysis of memory system components such as caches and write buffers is common practice for throughput benchmarks [7, 8, 10]. However, interactive programs and client-server systems have received relatively little attention in recent research [3, 20]. This is unfortunate in that, for many computer users, quick response time for latency-critical interactive applications is more important than the throughput of batch ....

....systems such as X11 don't make good use of these new features, miss rates will go up, increasing the impact of the TLB on overall performance. Also, the penalty for a single TLB miss could increase. An earlier study used 100 cycles as an estimate of the TLB miss penalty for a futuristic machine [8]. As the balance of TLB to cache resources changes, TLB performance could become an important issue. X11 workloads require more TLB resources than popular benchmarks such as gcc. If computer systems are designed to optimize the performance of the popular benchmarks, systems such as X11 can be ....

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A Simulation Based Study of TLB Performance. The Proceedings of the 19th Annual International Symposium on Computer Architecture, May, 1992, pp. 114-123.


Software Methods for System Address Tracing: Implementation.. - Chen, Wall, Borg (1994)   (4 citations)  Self-citation (Chen Borg)   (Correct)

....our conclusions. 2. Previous Work. Software methods have been applied extensively to study user-only traces, yielding results in cache behavior [16, 17, 26, 28, 15], prefetching [6], the importance of long traces [5], the impact of context switches [20], and studies of TLB and page behavior [9, 18, 29]. These user-only studies are useful but limited, as system activity can have a large impact on overall performance [2, 12, 30]. More recent work documenting significant performance problems for system execution on RISC-based computer systems [3, 24] suggests that system behavior needs more ....

....instrumentation tools, trace formats and techniques that help insure trace quality, and measurements to establish that address traces reflect true system behavior. Traces from all three systems have already been applied to numerous problems in memory system and software design research [5, 7, 8, 9, 18]. ....

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A Simulation Based Study of TLB Performance. The Proceedings of the 19th Annual International Symposium on Computer Architecture, May, 1992, pp. 114-123.


A Selectively Accessing TLB for High Performance and Lower.. - Consumption Jung-Hi Min (2002)   (Correct)

No context found.

Anita Borg, J. Bradley Chen, Norman P. Jouppi, "A Simulation Based Study of TLB Performance," In ISCA, 1991.


Instruction History Management for High-Performance Microprocessors - Bhargava (2003)   (Correct)

No context found.

J. B. Chen, Anita Borg, and N. P. Jouppi. A simulation based study of tlb performance. In 19th International Symposium on Computer architecture, pages 114 -- 123, April 1992.


Transparent Operating System Support for Superpages - Navarro (2002)   (4 citations)  (Correct)

No context found.

J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proceedings the 19th Annual International Symposium on Computer Architecture, pages 114--123, Gold Coast, Australia, May 1992.


A Survey on the Interaction between Caching, Translation and.. - Wiggins (2003)   (Correct)

No context found.

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.


Level Two Translation Lookaside Buffers - Callaghan, Hoque, Rotenberg (1995)   (Correct)

No context found.

J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A Simulation Based Study of TLB Performance. In The 19th Annual International Symposium on Computer Architecture. May, 1992.


Options for Dynamic Address Translation in COMAs - Qiu, Dubois (1998)   (7 citations)  (Correct)

No context found.

J. Bradley Chen and Anita Borg. "A Simulation Based Study of TLB Performance," In Proceedings of the 19th Annual International Symposium on Computer Architecture(ISCA), pages 114-123, May 1992.


Virtual Memory Support for Multiple Pages - Khalidi, Talluri, Nelson, Williams (1993)   (1 citation)  (Correct)

No context found.

Chen, J. Bradley, Anita Borg, and Norman P. Jouppi. "A Simulation Based Study of TLB Performance." Proceedings of the 19th Annual International Symposium on Computer Architecture (May 1992): 114--123.


Virtual Memory Support for Multiple Page Sizes - Khalidi, Talluri, Nelson.. (1993)   (21 citations)  (Correct)

No context found.

J. Bradley Chen, Anita Borg, and Norman P. Jouppi, "A Simulation Based Study of TLB Performance", Proceedings of the 19th Annual International Symposium on Computer Architecture, pp. 114-123, May 1992.


CiteSeer.IST - Copyright Penn State and NEC