49 citations found.
Intel Corp. Pentium Pro Family Developer's Manual, 1996. Volume 1: Specifications.

This paper is cited in the following contexts:

Speculation-Based Techniques for Lockfree Execution of Lock-Based .. - Rajwar (2002)

....these locks demonstrate a simple silent store pair pattern (Section 3.6.2). By doing so, SLE does not rely on the knowledge of locks and can use silent store pair predictors. The simplicity and portability of test&test&set locks make them quite popular. Hardware architecture manuals recommend [28, 31, 54, 73] and database vendors are advised [83] to use these simple locks as portable locking mechanisms. The POSIX threads standard recommends synchronization be implemented in library calls such as pthread_mutex_lock(), and these calls implement the test&set or test&test&set locks. While test&set based ....

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual, January 1996.


Scalability and Resource Usage of an OLAP Benchmark on.. - Taufer, Stricker, Weber (2002)

....rather frequently) The prototype is based on LINUX proc file mechanism to write performance data. However, the information available in the proc file are significantly extended. The framework of performance modeling framework uses the hardware performance counters of the Pentium processor [10] that are made accessible through a library we have implemented ourselves. All information for performance analysis is gathered at every node of our distributed system in parallel and the performance modeling framework puts it all together into a global view of the parallel system. The monitoring ....

Intel Corporation. Pentium Pro Family Developer's Manual, 1996.


The QoSbox: A PC-Router for Quantitative Service.. - Christin, Liebeherr (2001)   (3 citations)

....to account for possible clock skews, and using microtime() generates the overhead of the system call (approximately 450 nanoseconds on a PentiumPro 200 MHz [10]). A more efficient solution is to directly read the timestamp counter (TSC) register available in the Pentium series processors [14], and compatible architectures such as recent AMD processors (e.g. Athlon). This register is an unsigned 64-bit precision integer, and gives the number of cycles elapsed since the machine has been turned on. Thus, the resolution of the counter is much finer than that provided by microtime(). A ....

Intel Corporation. Pentium Pro Family Developer's Manual. Volume III: Operating System Writer's Guide. 1995.


Speculative Lock Elision: Enabling Highly Concurrent.. - Rajwar, Goodman (2001)   (16 citations)

....TEST&SET operation. An example implementation of the TEST&TEST&SET sequence is shown in Figure 2. While numerous lock constructs, both hardware and software, have been proposed, the simplicity and portability of TEST&TEST&SET locks make them quite popular. Hardware architecture manuals recommend [8, 10, 33, 18], and database vendors are advised [22] to use these simple locks as a portable locking mechanism (of course, a few other software primitives are also used when circumstances dictate their use). The POSIX threads standard recommends synchronization be implemented in library calls such as ....

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual, January 1996.


Correctly Implementing Value Prediction in.. - Martin, Sorin.. (2001)   (3 citations)

....SC is straightforward, because the existing mechanisms are already sufficient. In the next section, we will explore the ramifications of adding value prediction to systems that exploit relaxed consistency models. 3 Value Prediction Relaxed Memory Models Many common instruction set architectures [18, 19, 20, 30, 33] do not require the strict semantics of sequential consistency. These systems are said to implement relaxed memory consistency models. Relaxed memory models allow the hardware to potentially employ optimizations such as store queues and write buffers, and they can simplify the implementation of ....

....store buffers by relaxing the order from a thread's write to its subsequent reads. The other class, generally referred to as weakly ordered models, allows much more reordering of reads and writes. 3.1 Processor Consistency PC models, such as SPARC Total Store Order (TSO) [33] and IA-32 [19], allow relaxation of the order from a thread's write to its subsequent reads. Since PC models do not allow relaxation of read to read program order, simple implementations must, in our example, execute r1 and r2 in program order. If, on the other hand, a more sophisticated implementation allows ....

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual, Jan. 1996.


The Impulse Memory Controller - Lixin Zhang Zhen (2001)   (4 citations)

....will increase in future microprocessors. Proposed solutions to this growing TLB performance bottleneck range from changing the TLB structure to retain more of the working set (e.g. multi-level TLB hierarchies [1, 16]) to implementing better management policies (in software [21] or hardware [20]) to masking TLB miss latency by prefetching entries (again, in software [4] or hardware [41]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [30, 43], but more research is needed into how best to make ....

Intel Corporation. Pentium Pro Family Developer's Manual, Jan. 1996.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....over a switched network. All experiments were run on Linux 2.2.12. 2.3. Characterizing Performance After casting our communication systems into the LogGP model, we break down the LogGP parameters into their architectural costs. Our approach is to use the processor's hardware event counters [24] to charge various hardware events to each parameter of the LogGP model. Specifically, we measure the following events: The number of instructions decoded. [Figure residue: LogGP model diagram; labels: interconnection network, P (processors), L (latency), g (gap), limited capacity (L g to or from a processor), o ....]

Intel Corporation, Santa Clara, CA. Pentium Pro family developer's manual, volume 3: Operating system writer's manual, 1996. Order number 242692.


Source Code Instrumentation and its Perturbation Analysis in.. - Um Ii May

....RDMSR and WRMSR instructions to program and reset the counters. Therefore, it is necessary to write some code that runs in kernel mode. For maximum flexibility and ease of installation, we wrote a loadable device driver module rather than modifying the booted kernel. See section 10.6.1 of [8] for PerfEvtSelx programming details and appendix A of [7] for the description of the events available for counting. 1 after a kernel mode instruction sets a model specific register bit. 5 3.3 Optimized Instrumentation In this section we will describe a few steps we have taken in order to ....

Intel Corporation. Pentium pro family developer's manual, order number 242692.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....driver over a switched network. All experiments were run on Linux 2.2.12. 2.3 Characterizing Performance After casting our communication systems into the LogGP model, we break down the LogGP parameters into their architectural costs. Our approach is to use the processor's hardware event counters [24] to charge various hardware events to each parameter of the LogGP model. Specifically, we measure the following events: # The number of instructions decoded. # The number of external bus accesses to memory space 1 . 1 The memory bus is only critical when the cache is not large enough to hold ....

Intel Corporation, Santa Clara, CA. Pentium Pro family developer's manual, volume 3: Operating system writer's manual, 1996. Order number 242692.


The Click Modular Router - Kohler (2000)   (64 citations)

....about 400,000 64-byte packets per second even under loads much higher than those we tested. 8.4 CPU time breakdown Table 8.1 breaks down the CPU time cost of forwarding a packet through the baseline Click IP router of Figure 5.1. Costs were measured in nanoseconds by Pentium III cycle counters [23]. Each measurement is the accumulated cost for all packets in a 10 second run divided by the number of packets forwarded. These measurements are larger than the true values as using Pentium III cycle counters has significant cost. Most of the tasks performed by a Click router's CPU are included in ....

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3, 1996. http://developer.intel.com/design/pro/manuals.


Towards A Simplified Database Workload For Computer.. - Keeton, Patterson (2000)   (1 citation)

....only after all previous instructions have been retired, and all of the instruction's constituent µops have completed. The Pentium Pro retires up to three µops per clock cycle, yielding a theoretical minimum cycles per µop (CPI) of 0.33. More information on the Pentium Pro can be found in [6] [11] [13] [21] [34]. Measurements were performed using the Pentium Pro hardware counters [13]. We present aggregate (user operating system) activity, factoring out the idle loop. On the uniprocessor, this technique is possible because NT implements the idle loop using the HALT instruction. The event ....

....constituent µops have completed. The Pentium Pro retires up to three µops per clock cycle, yielding a theoretical minimum cycles per µop (CPI) of 0.33. More information on the Pentium Pro can be found in [6] [11] [13] [21] [34]. Measurements were performed using the Pentium Pro hardware counters [13]. We present aggregate (user operating system) activity, factoring out the idle loop. On the uniprocessor, this technique is possible because NT implements the idle loop using the HALT instruction. The event counters are inactive during this idle loop, ensuring that we can reliably separate system ....

Intel Corporation. Pentium Pro family developer's manual, volume 3: Operating system writer's manual. Intel Corporation, 1996, Order number 242692.


Computer Architecture Support for Database Applications - Keeton (1999)   (3 citations)

....was measured for both Chapter 3 and Chapter 4. The I/O subsystems of these configurations are shown in Figure 2-2 through Figure 2-4. 18 sors active. Finally, the number of outstanding bus transactions was varied by changing a BIOS parameter to limit the I/O queue depth of the controller [50]. Ideally, we would like to limit each processor to a single outstanding bus transaction, to explore the effects of the non-blocking L2 cache. The BIOS only allows us, however, to limit the overall system (in other words, all four processors) to a single outstanding bus transaction. Thus, we ....

....The Pentium Pro retires up to three µops per clock cycle, yielding a theoretical minimum cycles per µop (CPI) of 0.33. Table 2-5 summarizes the characteristics of the Pentium Pro caches. More detailed descriptions of the Pentium Pro's architectural features can be found in [15] [25] [39] [50] [76]. We will also present additional details in subsequent sections, when discussing our measurement results. 2.5.2. Potential Sources of Pentium Pro Stalls In practice, the 0.33 theoretical minimum CPI is seldom achieved, due to stalls from cache misses, oversubscription of certain resources, ....

[Article contains additional citation context not shown here]

Intel Corporation. Pentium Pro family developer's manual, volume 3: Operating system writer's manual. Intel Corporation, 1996, Order number 242692.


Reevaluating Online Superpage Promotion with Hardware Support - Zhen Fang Lixin (2001)   (3 citations)

....traps will increase in future microprocessors. Proposed solutions to this growing TLB performance bottleneck range from changing the TLB structure to retain more of the working set (e.g. multi-level TLB hierarchies [1, 8]) to implementing better management policies (in software [10] or hardware [9]) to masking TLB miss latency by prefetching entries (again, in software [2] or hardware [25]). All of these approaches can be improved by exploiting superpages. Most commercial TLBs support superpages, and have for several years [16, 28], but more research is needed into how best to make ....

Intel Corporation. Pentium Pro Family Developer's Manual, Jan. 1996.


Commit-Reconcile Fences (CRF): A New Memory Model for.. - Shen, Arvind, Rudolph (1999)   (1 citation)

....explicitly or not (e.g. programmer centric models [16, 5, 19]). It is the task of the compiler to ensure that the semantics of a high level program is preserved when its compiled version is executed on an architecture with a certain low level memory model (e.g. architecture centric models [25, 18, 26, 14]). The essence of any memory model is the correspondence between each load instruction and the store instruction that supplies the value retrieved by the load. Unfortunately, at the architecture level, memory access operations often have some sophisticated implementation characteristics that make it ....

Intel, editor. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual. Intel Corporation, 1996.


Eager Writeback - a Technique for Improving Bandwidth.. - Lee, Tyson, Farrens (2000)   (4 citations)

....control packets and data packets can be pipelined and use separate busses. RDRAM address remapping [4] was modeled to reduce the rate of bank interference. The peak bandwidth that can be reached in our RDRAM model is 1.6 GB/sec. A simplified uncacheable write combining (or write coalescing) memory [2][3] was implemented as well for the purpose of correctly simulating our benchmark behavior. Whenever a data write to an uncacheable region results in an L1 cache miss, the write operation will immediately request access to the bus and drive data out to the system memory directly (skipping a ....

Intel Corporation. Pentium Pro Family Developer's Manual, volume 3: Operating System Writer's Manual. Intel Literature Centers, 1996.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....running the tulip driver. All experiments were run on Linux 2.2.12. 2.3 Characterizing Performance After casting our communication systems into the LogGP model, we break down the LogGP parameters into their architectural costs. Our approach is to use the processor's hardware event counters [24] to charge various hardware events to each parameter of the LogGP model. Specifically, we measure the following events: 1 We expect to get machines with 750 MHz AMD Athlon CPUs and 133 MHz system bus soon so will have measurements for 5 clock speeds for the final paper. 5 The number of ....

Intel Corporation. Pentium Pro family developer's manual, volume 3: Operating system writer's manual. Santa Clara, CA, 1996. Order number 242692.


Comparison of Memory System Behavior in Java and.. - Marden, Lu, Lai, Lipasti

....multiprocessor systems, using unmodified binary programs. To measure memory behavior, we wrote a memory system model that simulates a two level cache hierarchy and a cycle accurate multiprocessor split transaction bus. The bus protocols in our memory model are based on the Pentium II MESI protocol [7] and are tuned for characteristics of processors a few years in the future. Simics sends each memory request to our memory model, which analyzes the effects of the requests and sends the timing information back to Simics. To prevent our results from being skewed, the memory model detects and ....

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 1: Specification, 1996.


A Novel Renaming Scheme to Exploit Value Temporal.. - Jourdan, Ronen.. (1998)   (19 citations)

....which the processor maps logical registers into physical locations. Register renaming is used to remove register anti-dependencies and output dependencies and to recover from control speculation. The basic register renaming mechanism is well known and widely used (e.g. Intel Pentium Pro Processor [Inte96]). This section presents the most advanced combined register renaming and dependency-tracking scheme involving three structures: a Free List (FL), a Register Alias Table (RAT), and an Active List (AL). This scheme has been used in the MIPS R10000 and DEC 21264. The RAT maintains the latest ....

Intel Corporation, Pentium Pro Family Developer's Manual. Volume 2: Programmer's Reference Manual, 1996.


Minerva: An Adaptive Subblock Coherence Protocol for Improved .. - Rothman, Smith

No context found.

Intel Corp. Pentium Pro Family Developer's Manual, 1996. Volume 1: Specifications.


Kimberly Keeton - David Patterson Yong

No context found.

Intel Corporation. Pentium Pro family developer's manual, volume 3: Operating system writer's manual. Intel Corporation, 1996, Order number 242692.


Three Extensions to Register Integration - Roth, Al. (2002)   (1 citation)

No context found.

Intel Corporation. Pentium Pro Family Developer's Manual, 1996.


Quantifiable Service Differentiation for Packet Networks - Christin (2003)

No context found.

Intel Corporation. Pentium Pro Family Developer's Manual. Volume III: Operating System Writer's Guide, 1995.


Efficient Remapping Mechanisms for an Adaptable Memory System - Zhang (2002)

No context found.

Intel Corporation. Pentium Pro Family Developer's Manual. Palo Alto, CA USA, Jan. 1996.


Supporting Time-Sensitive Applications on a Commodity OS - Ashvin Goel Luca (2002)   (5 citations)

No context found.

Intel Corporation, editor. Pentium Pro Family Developer's Manual, chapter 7.4.15. Intel, December 1995.


Using Lightweight Checkpoint/Recovery to Improve the Availability.. - Sorin (2002)

No context found.

Intel Corporation. Pentium Pro Family Developer's Manual, Volume 3: Operating System Writer's Manual, January 1996.
