Intel Corporation, "Pentium Processor User's Manual."
....followed by 16 - 2i length-one instructions). All the length-two instructions in the Xi tests could be fully length-decoded using only the first byte. Test I0 consists of eight length-two instructions containing ModR/M length information in the second byte, complicating length calculation [15]. A noise problem with some instructions resulted in a violation of the setup time at the length decoder inputs, so we opted to use a single cache line for testing all instructions. The single cache line is repeatedly read from the head of the input FIFO, keeping the FIFO loop off the critical ....
Intel Corporation, "Pentium Processor User's Manual."
....and thus would as well occupy TLB entries when the address space would not be switched. 2 The x86 Memory Model The reader should be familiar with the segmentation model and the paging mechanisms of the x86 processor family. A detailed description can be found in the reference manuals of the 486 [Intel Corp. 1990] and Pentium processor [Intel Corp. 1993]. Segmentation and paging do not differ significantly between the two processors. 3 Address Space Switch on 486 L4 (as L3 [Liedtke 1993]) favours a flat memory model. On this processor, the virtual address space has a size of 4 GB. It consists of 3.5 GB of user space and 0.5 GB of kernel space, see ....
Intel Corp. 1993. Pentium Processor User's Manual, Volume 3: Architecture and Programming Manual. Intel Corp.
....for a bus-based multiprocessor system. The MESI (Modified, Exclusive, Shared, Invalid) write-invalidate cache coherence protocol with the write-back policy is considered as the basis in the following discussion. With minor variations, this protocol has been implemented in several commercial systems [Intel, 1994, Greenley et al. 1995, Levitan et al. 1995]. In this protocol, each cache line has an associated MESI state recorded in the cache directory (also called cache tag array). The definitions of the four states are given in Table 1. When a memory request from either the processor or the snooping bus ....
Intel Corp. 1994. Pentium Processor User's Manual, Vols. 1 & 2. Intel Corp. Order Numbers 241428 and 241429.
....in various ways. One obvious way is the use of double-speed memories as implemented in the IBM Power 2 [SmWe94], and another is to use an interleaved structure in order to access multiple banks simultaneously, provided there is no conflict. This latter solution is implemented in the Intel Pentium [Inte93] and the SGI TFP [Hsu94]. In the DEC Alpha 21164, 2 mirrored banks are implemented in order to feature 2 load ports [Dec 95]. Consequently, no access can be issued simultaneously with a store. As caches are not modeled, we do not attend to a specific cache structure. Thus, the only issuing ....
Intel, "Pentium Processor User's Manual", 1993
....state of having transferred some data and about to start an IPC to transfer more. The API for Fluke system calls is directly analogous to the interface of machine instructions that operate on large ranges of memory, such as the block move and string instructions on machines such as the Intel x86 [21]. The buffer addresses and sizes used by these instructions are stored in registers, and the instructions advance the values in these registers as they work. When the processor takes an interrupt or page fault during a string instruction, the parameter registers in the interrupted processor state ....
....kernels, and can introduce additional overhead on the kernel entry/exit paths, especially on architectures with the process-model bias discussed above. This is because such processors behave differently in a trap or interrupt depending on whether the interrupted code was in user or supervisor mode [21]; therefore each trap or interrupt handler in the kernel must now determine whether the interrupted code was a user thread, a process-model kernel thread, or the interrupt-model core kernel itself, and react appropriately in each case. In addition, the process-model stacks of kernel threads on ....
Intel Corp. Pentium Processor User's Manual, volume 3. Intel, 1993.
....executing, therefore referencing data in an alternate address space. There is direct support for large address spaces, as processes have free run of a 64-bit space. Superpages are supported in the UltraSPARC by a variable page size of 8KB, 64KB, 512KB, or 4MB, set in the TLB entry. 23 Pentium [6] The Pentium (Fig. 15) does not provide address space protection directly; there are no address space identifiers and the TLBs are typically flushed on context switch. The caches are physically tagged, and therefore do not need flushing. Protection is usually provided indirectly through ....
Intel. Pentium Processor User's Manual. Intel Corporation, Mt. Prospect IL, 1993.
....Hitachi S-3800 [KIS 94], this solution may not be practical for caches. Increasing the number of banks also implies deeper crossbar and thus longer cache access time, even though smaller banks are faster and could partially compensate for cache access time degradation. The Pentium data cache [Int93] has 8 banks, while the MIPS R8000 [MIP94] uses a dual-banked data cache. Thus, to design a data cache that can accept multiple requests simultaneously there are many cost/performance tradeoffs to address. The purpose of this study is to expose these tradeoffs for different cache structures and ....
....which corresponds to the number of data paths between the processor and the cache, and the number of cache ports or cell ports which corresponds to the number of ports of each cache SRAM cell. An n-bank cache may have only p processor-to-cache ports with p ≤ n. For example, the Pentium data cache [Int93] has 2 processor-to-cache ports, 8 banks and 1 cell port, i.e. n = 8 and p = 2. It is assumed that a p-port cache is always associated with a processor having p processor-to-cache ports. Benchmarks Ten benchmarks were used for the study, five SPECint92 codes and five SPECfp92 codes listed ....
Intel. Pentium Processor User's Manual, 1993.
....accesses per cycle to this cache space. On a smaller scale, the same issue has already been raised for superscalar processors. Up to now, different forms of multi-banking and multi-porting have been used. Two-banked caches (odd/even) are used in the MIPS TFP [14] and T5 [23], Intel Pentium [15] and HP PA 8000 [20], allowing two simultaneous accesses to two different cache banks. Data replication is used in the DEC 21164 (two identical banks: stores are serialized to preserve coherence) and virtual multi-porting in the Power2 [19] (the SRAM access time is half the processor clock) allows ....
....in the DEC 21164 (two identical banks: stores are serialized to preserve coherence) and virtual multi-porting in the Power2 [19] (the SRAM access time is half the processor clock) allows two cache accesses in a single processor clock cycle. True multi-porting has been used in the Intel Pentium's [15] and IBM Power2's D-TLBs, using true dual-ported memory. Actually, among all these solutions, it seems that only the multi-bank concept can scale up with the number of simultaneous cache accesses per cycle. For n ports, an SRAM with a clock 1/n th the processor clock is unlikely if n is large. ....
Intel Corporation. Pentium Processor User's Manual, 1993.
....for caches. Increasing the number of banks also implies deeper crossbar and thus longer cache access time, even though smaller banks are faster and could partially compensate for cache access time degradation. The MIPS R8000 [MIP94a] uses a dual-banked data cache, while the Pentium data cache [Int93] is an 8-bank cache. Thus, to design a data cache that can accept multiple requests simultaneously there are many cost/performance tradeoffs to address. The purpose of this study is to expose these tradeoffs for different cache structures and to provide performance hints for each structure. In ....
....corresponds to the number of data paths between the processor and the cache, and the number of cache ports or cell ports which corresponds to the number of ports of each cache SRAM cell. An n-bank cache may have only p processor-to-cache ports with p ≤ n. For example, the Pentium data cache [Int93] has 2 processor-to-cache ports, 8 banks and 1 cell port, i.e. n = 8 and p = 2. It is assumed that a p-port cache is always associated with a processor having p processor-to-cache ports. Benchmarks Ten benchmarks were used for the study, five SPECint92 codes and five SPECfp92 codes ....
Intel. Pentium Processor User's Manual, 1993.
.... architectures the hardware support for memory management is unnecessarily complicated, places constraints on the operating system, and often frustrates porting efforts [37]. For example, the Intel Pentium Processor User's Manual devotes 100 of its 700 pages to memory management structures [31], most of which exist for backward compatibility and are unused by today's system software. Typical virtual memory systems exact a run-time overhead of 5-10% [4, 9, 41, 47], an apparently acceptable cost that has changed little in ten years [14], despite significant changes in cache sizes and ....
....mechanisms associated with memory management that computer users have come to expect. These are found in nearly every modern microarchitecture and operating system (e.g. UNIX [3], Windows NT [15], OS/2 [16], 4.3BSD [34], DEC Alpha [17, 46], MIPS [23, 32], PA-RISC [25], PowerPC [29, 39], Pentium [31], and SPARC [52]) and include the following: Address space protection. User-level applications should not have unrestricted access to the data of other applications or the operating system. A common hardware assist uses address space identifiers (ASIDs) which extend virtual addresses and ....
Intel. Pentium Processor User's Manual. Intel Corporation, Mt. Prospect IL, 1993.
....switch between G and its scheduler and G' and its scheduler. To seamlessly switch, we introduce the notion of virtual processor. The general concept of a virtual processor is borrowed from IBM's VM operating system [71] and more recently, the v86 mode in Intel 80x86/Pentium processors [35]. Each virtual processor maintains its own set of schedulers, tasks, and task instances. Whenever P is to execute for some length of simulated time during the second pass, it must know which virtual processor it should use. Choosing the correct virtual processor is not difficult, since the ....
Intel Corporation. Pentium Processor User's Manual, 1993.
....followed by 16 - 2i length-one instructions). All the length-two instructions in the Xi tests could be fully length-decoded using only the first byte. Test I0 consists of eight length-two instructions containing ModR/M length information in the second byte, complicating length calculation [15]. A noise problem with some instructions resulted in a violation of the setup time at the length decoder inputs, so we opted to use a single cache line for testing all instructions. The single cache line is repeatedly read from the head of the input FIFO, keeping the FIFO loop off the critical ....
Intel Corporation, Pentium Processor User's Manual.
....GIPS. The sequence of arrows in the picture illustrates the 3.6 GIPS instruction flow through the Tag Units for a typical scenario with 5 length-3 instructions. 3.1 RAPPID Architecture The RAPPID test chip implements instruction length decoding for the Pentium® Processor instruction set [9]. Analysis showed that instructions longer than seven bytes are rare, and that certain instructions appear with much higher frequency in common programs. Our asynchronous design optimized for these common cases at the expense of unoptimizing rare instructions. Instructions longer than seven bytes ....
Intel Corporation. Pentium Processor User's Manual.