J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.
....and since the TLB is on the processor core, TLB capacity is generally limited to, at most, a few hundred entries. Consequently, TLB coverage is inherently limited, and would be further degraded by smaller page sizes. The inadequate coverage of modern TLBs has been highlighted by several studies [23-26]. Several attempts have been made to address this, including superpages [22], sub-blocking [27], in-memory translation [28], virtually addressed memory hierarchies [29,30], in-cache translation [31], and even software-managed address translation [32]. However, all these studies focused on ....
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.
....of reducing the consumption of TLB entries (or increasing TLB coverage) in scenarios where sharing of pages is significant. TLB coverage has been identified as a potential bottleneck in system performance, with TLB miss handling overheads of 20-40% reported even on single-tasked benchmarks [CBJ92,HH93,Tal95,KS02] (although generally with software-loaded TLBs). Linux presently only supports the short format VHPT. This means it cannot support sharing of TLB entries, which is supported by the architecture only with the long VHPT format. The long format is also required to support mixing ....
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.
....which can be associated with each process and are used in addition to the virtual page number to match an entry in the TLB. This negates the need for a TLB flush per context switch but still requires duplicate TLB entries for essentially the same mappings in different contexts/processes. Past studies [1,2] have shown that TLB handling costs can take up a significant part of an application's processing time. TLB coverage is one of the major factors in determining the TLB miss rate and hence the impact of TLB costs on application performance [2]. TLB miss handling overhead is turning into a ....
....in different contexts/processes. Past studies [1,2] have shown that TLB handling costs can take up a significant part of an application's processing time. TLB coverage is one of the major factors in determining the TLB miss rate and hence the impact of TLB costs on application performance [2]. TLB miss handling overhead is turning into a bottleneck with real memory sizes increasing at a rapid rate, while TLB sizes are remaining essentially constant [11]. TLB coverage refers to how much memory the TLB can map. This in turn is directly related to the number of TLB entries and the page ....
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.
....recognized, the TLB has been the target for several optimizations to reduce access latency, miss rates and miss handling overheads. With regard to TLB structures themselves, there have been investigations on suitable organizations in terms of size, associativities and multi-level organizations [31, 23, 6]. Superpaging is a concept that has been proposed to boost TLB coverage. Hardware and software techniques for supporting this mechanism have come under a lot of scrutiny [31, 32, 11]. Most prior work in TLB optimizations has targeted lowering miss rates or miss handling costs. It is only recently ....
....[31, 15, 23] have looked at hardware TLB structures organization and their impact on system performance in terms of capacity and/or associativity. While some of these have focussed on single (monolithic) TLBs, there have been studies which have investigated the benefits of multi-level TLBs [6, 2]. There are also implementations of multi-level TLBs in commercial processors such as MIPS R4000, Hal's SPARC64, IBM AS/400 PowerPC, We would like to differentiate between the terms software TLB management and software TLB handling in this paper. We use the latter to denote that the miss ....
J. B. Chen, A. Borg, and N. P. Jouppi. A Simulation Based Study of TLB Performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, 1992.
....online software policies for dynamically remapping pages to improve cache performance [6, 39]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging, synchronization, and file cache management. Chen et al. [11] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.
.... software policies for dynamically remapping pages to improve cache performance [3, 23]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging [26], synchronization [12], and file cache management [4]. Chen et al. [6] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks they ....
....by prefetching entries (again, in software [2] or hardware [25]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [16, 28], but more research is needed into how best to make general use of them. Chen et al. [6] suggest the possibility of using variable page sizes to improve TLB reach, but do not explore the implications of their use. Khalidi [13] and Mogul [17] discuss the benefits of systems that support superpages, and advocate static allocation via compiler or programmer hints. Talluri et al. [18] ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proc. of the 19th ISCA, pp. 114--123, May 1992.
....made independent of the block size, then a fully associative cache with large blocks would be very effective. This is supported by the fact that Translation Look-aside Buffers (TLBs), which cache page (1KB-8KB) translations, have very high hit rates even though they have very few entries (32-128) [5], when compared to data caches. A technique for reducing cache block transfer time, and thereby the miss penalty, is to place the SRAM cache and the DRAM memory on the same IC. By doing this, the internal bandwidth available within the IC can be used to transfer the data between the cache and the ....
J. Bradley Chen, Anita Borg and Norman P. Jouppi, "A Simulation Based Study of TLB Performance", WRL Research Report 91/2, Digital Western Research Laboratory, CA, May 1992, http://www.research.digital.com:80/wrl/publications/pubslist.html.
....less than 5% [CE85] of overall runtime. However, in recent studies, miss handling is not unknown to contribute 40% of application runtime [HH93]. Various methods have been proposed to combat increasing TLB miss ratios. Associativity trade-offs and changes [NUS 93, CLK97], micro-TLBs [CBJ92], variable page sizes [CBJ92, TKHP92, KTNW93, ROKB95], and subblocking [Tal95] have been examined and incremental improvements made in effective TLB coverage. TLBs have been removed altogether in some experimental systems [WEG 86, CSD86, JM97] which perform address translation in the cache; ....
....of overall runtime. However, in recent studies, miss handling is not unknown to contribute 40% of application runtime [HH93]. Various methods have been proposed to combat increasing TLB miss ratios. Associativity trade-offs and changes [NUS 93, CLK97], micro-TLBs [CBJ92], variable page sizes [CBJ92, TKHP92, KTNW93, ROKB95], and subblocking [Tal95] have been examined and incremental improvements made in effective TLB coverage. TLBs have been removed altogether in some experimental systems [WEG 86, CSD86, JM97] which perform address translation in the cache; however, the tech ....
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In 19th International Symposium on Computer Architecture, May 1992.
....sets of current and future workloads. The case that current virtual memory techniques will no longer be valid for the next generation of machines has been argued by others [26]. Several ideas have been introduced to increase TLB reach. For example, trading associativity for increasing TLB entries [4, 14, 16] to increase TLB reach (address coverage) using the same silicon resources, multiple page sizes [14, 26, 18] to reduce the number of contiguously mapped pages, and subblocking [25] to have more translations held for each TLB cell. The design tradeoffs of silicon resource requirements, TLB access ....
J B Chen, A Borg and N P Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA, pages 114--123, 1992.
....along with application I/O requests. Understanding the nature of application I/O requests can drive I/O cache performance improvements or application I/O optimizations. A.1 Trace Collection. A new version of the WRL tracing facilities collected traces on DECstation 5000s running ULTRIX [4, 5]. Its kernel-based approach traces all processes. The modified system logs system call information in a physically mapped trace buffer. On an I/O system call, the call type, process ID, and call parameters are entered in the buffer. On return from the system call, the return value, error status ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In The 19th Annual International Symposium on Computer Architecture, pages 114--123. IEEE Computer Society Press, May 1992.
.... software policies for dynamically remapping pages to improve cache performance [3, 21]. Competitive algorithms have been used to help increase the efficiency of other operating system functions and resources, including paging [24], synchronization [14], and file cache management [5]. Chen et al. [7] report on the performance effects of various TLB organizations and sizes. Their results indicate that the most important factor for minimizing the overhead induced by TLB misses is reach, the amount of address space that the TLB can map at any instant in time. Even though the SPEC benchmarks they ....
....latency by prefetching entries (in software [2] or hardware [23]). All of these approaches can be improved by exploiting superpages. Most TLBs now support superpages, and have for several years [16, 27], but more research is needed into how best to make general use of this capability. Chen et al. [7] suggest the possibility of using variable page sizes to improve TLB reach, but do not explore the implications of their use. Khalidi et al. [15] and Mogul [17] discuss benefits of systems that support superpages, advocating static allocation via compiler or programmer hints. Talluri et al. [18] ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proc. of the 19th ISCA, pp. 114-123, May 1992.
....by the TLB, a TLB miss has occurred and the mapping is resolved by referring to the data structures of the operating system. Current estimates for the cost of handling a TLB miss are between 20 and 40 instructions [10], but the overhead has been predicted to grow to 100 instructions in the future [4]. Currently, TLBs contain from tens up to a few hundred TLB entries, each covering the mapping of one virtual memory page, typically 4 or 8 kB. A TLB can thus map at least several hundred kilobytes, often even megabytes. This is enough to cover all, or at least the most used part, of the working ....
J.B. Chen, A. Borg, and N. Jouppi, A Simulation Based Study of TLB Performance, Proceedings of the 19th Annual International Symposium on Computer Architecture, May 1992, pp. 114-123.
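The arithmetic behind the excerpt above (entries times page size gives the memory a TLB can map) can be sketched as follows. This is only an illustration: the function name `tlb_reach_bytes` and the three configurations are assumptions chosen from the ranges the excerpt mentions, and sizes are computed in binary units (1 KiB = 1024 bytes).

```python
def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory mapped by a TLB in which every entry covers exactly one page."""
    return entries * page_size_bytes


KIB = 1024

# Hypothetical configurations, picked from "tens up to a few hundred entries"
# and "4 or 8 kB" pages; they do not describe any particular processor.
configs = [(64, 4 * KIB), (128, 8 * KIB), (512, 8 * KIB)]

for entries, page_size in configs:
    reach = tlb_reach_bytes(entries, page_size)
    print(f"{entries:4d} entries x {page_size // KIB} KiB pages -> {reach // KIB} KiB")
```

Under these assumed values the reach ranges from a few hundred kilobytes to a few megabytes, which matches the order of magnitude the excerpt describes.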
....application I/O requests can drive I/O cache performance improvements or application I/O optimizations. This requires tracing application I/O requests and including information about each request. A new version of the WRL tracing facilities collected the traces on DECstation 5000s running ULTRIX [3, 4]. Its kernel-based approach traces all processes. Figure 1 shows the system configuration. The original system was designed primarily to study processor-memory issues in a multiprogramming environment. The modified system logs system call information, rather than instruction and data addresses, in ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In The 19th Annual International Symposium on Computer Architecture, pages 114--123. IEEE Computer Society Press, May 1992.
....Note that failures on write-backs, which would be more difficult to handle since the processor is not awaiting a response, cannot happen: the OS is required to flush the dirty data back to memory before swapping a page to disk and removing the corresponding mapping. 5. Related Work. Chen et al. [4] report the performance effects of various TLB organizations and sizes. Their results agree with our premise that the most important factor for minimizing the overhead induced by TLB misses is TLB reach. They studied several SPECmarks programs, which have much smaller memory requirements than our ....
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of tlb performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 114--123, May 1992.
....this amounts to 322500 rbe or less than the cost of one 64kB cache. Thus, we see that by removing one processor and its cache, we can increase the TLB in each of the remaining 15 processors to 512 entries. This should reduce the number of TLB misses by over 50% from that of a 32-entry TLB [3]. In order to get a lower bound on the impact of this architectural change on performance, we use a technique called trace modification to conservatively modify the trace to emulate a trace taken from a processor with a 512-entry TLB. To accomplish this, we use a program that marks the TLB misses ....
J.B Chen, A. Borg, and N.P. Jouppi, "A Simulation Based Study of TLB Performance, " 19th International Symposium on Computer Architecture, pp. 114-123, May 1992.
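The comparison between a 32-entry and a 512-entry TLB in the excerpt above can be explored with a toy model. The sketch below is not the trace-modification technique the excerpt describes, nor the simulator of the cited study: it assumes a fully associative TLB with LRU replacement and a synthetic, uniformly random page trace, and the name `count_tlb_misses` is hypothetical. Real workloads have locality, so their miss counts will differ.

```python
import random
from collections import OrderedDict


def count_tlb_misses(page_trace, entries):
    """Count misses for a fully associative TLB with LRU replacement (assumed policy)."""
    tlb = OrderedDict()  # keys are virtual page numbers, ordered by recency of use
    misses = 0
    for vpn in page_trace:
        if vpn in tlb:
            tlb.move_to_end(vpn)  # hit: mark as most recently used
        else:
            misses += 1
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[vpn] = True
    return misses


# Synthetic trace (an assumption): 200,000 references spread over 1,024 pages.
rng = random.Random(0)
trace = [rng.randrange(1024) for _ in range(200_000)]

small = count_tlb_misses(trace, 32)
large = count_tlb_misses(trace, 512)
print(f"32 entries: {small} misses, 512 entries: {large} misses")
print(f"reduction: {1 - large / small:.0%}")
```

Because LRU is a stack algorithm, the larger TLB never incurs more misses than the smaller one on the same trace; how large the reduction is depends entirely on the trace.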
....to minimise misses and increase coverage. 4.2.1 Software. TLB management improvements can be divided into optimising replacement and placement of TLB entries, and minimising refill times. The likely improvements in performance that can be achieved by replacement policy are minor [UNS 94, CBJ92]. Hardware designers typically choose random replacement, as the cost involved in implementing other policies in hardware outweighs the performance benefits to be gained. Performance improvements through software involvement in replacement policy are doubtful. The costs of software misses are a ....
....4.2.2 Hardware. Currently, much research effort has been directed towards increasing TLB coverage, the focus being on increasing the number of TLB entries, increasing the page size, or using multiple page sizes. Increasing the number of TLB entries is an obvious solution, as illustrated by Chen [CBJ92]. However, large fully associative structures are difficult to build. Reducing the associativity to increase the number of entries is a valid technique [UNS 94, TH94]. However, it is not clear whether the number of entries could be increased sufficiently to cover a significant proportion of ....
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In 19th International Symposium on Computer Architecture, May 1992.
No context found.
Chen, J. B., Borg, A. and Jouppi, N. P. A simulation based study of TLB performance. In Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, IEEE, 114-123, 1992.
No context found.
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.
No context found.
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proc. 19th ISCA. ACM, 1992.
No context found.
Anita Borg, J. Bradley Chen, Norman P. Jouppi, "A Simulation Based Study of TLB Performance," In ISCA, 1991.
No context found.
J. B. Chen, Anita Borg, and N. P. Jouppi. A simulation based study of tlb performance. In 19th International Symposium on Computer architecture, pages 114 -- 123, April 1992.
No context found.
J. B. Chen, A. Borg, and N. P. Jouppi. A simulation based study of TLB performance. In Proceedings the 19th Annual International Symposium on Computer Architecture, pages 114--123, Gold Coast, Australia, May 1992.
No context found.
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the 19th International Symposium on Computer Architecture (ISCA). ACM, 1992.
No context found.
J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A Simulation Based Study of TLB Performance. In The 19th Annual International Symposium on Computer Architecture. May, 1992.