40 citations found.
D. W. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 1(1):24--37, February 1983.

This paper is cited in the following contexts:

An Analysis of Operating System Behavior on a.. - Redstone, Eggers, Levy (2000)   (9 citations)

....to the superscalar processor. 4. RELATED WORK In this section, we discuss previous work in three categories: characterizing OS performance, Web server behavior, and the SMT architecture. Several studies have investigated architectural aspects of operating system performance. Clark and Emer [8] used bus monitors to examine the TLB performance of the VAX-11/780; they provided the first data showing that OS code utilized the TLB less effectively than user code. In 1988, Agarwal, Hennessy, and Horowitz [1] modified the microcode of the VAX 8200 to trace both user and system references and ....

D. Clark and J. Emer. Measurement of the VAX-11/780 translation buffer: Simulation and measurement. ACM Transactions on Computer Systems, 3(1), February 1985.


The Influence of Caches on the Performance of Sorting - LaMarca, Ladner (1996)   (39 citations)

....the efficient comparison based sorting algorithms. 1 Introduction. Since the introduction of caches, main memory has continued to grow slower relative to processor cycle times. The time to service a cache miss to memory has grown from 6 cycles for the VAX-11/780 to 120 for the AlphaServer 8400 [3, 7]. Cache miss penalties have grown to the point where good overall performance cannot be achieved without good cache performance. As a consequence of this change in computer architectures, algorithms which have been designed to minimize instruction count may not achieve the performance of ....

D. Clark. Cache performance of the VAX-11/780. ACM Transactions on Computer Systems, 1(1):24--37, 1983.


Virtual Memory In A 64-Bit Microkernel - Elphinstone (1999)   (1 citation)

....in a nested TLB miss while servicing the original TLB miss, then either the nested miss is resolved via a page table entry elsewhere in another LVA, or the nesting misses eventually resolve as the root of the LVA is stored in physical memory. The classic example of an LVA is the VAX [LL82, CE85]. A simplified diagram is shown in Figure 3.3. The page table for user space is an array in kernel virtual space. The page table for kernel virtual space is an array in physical memory. (Figure 3.3 labels: Kernel Virtual Memory; User Virtual Memory; Physical Memory; Page table entries for kernel VM; Page table ....)

....of physical memory. The TLB aims to hold as many relevant virtual to physical translations as possible to cover as much of physical memory as possible. TLB coverage is not keeping pace with increasing physical memory sizes. In the past, TLB miss handling typically contributed less than 5% [CE85] of overall runtime. However, in recent studies, miss handling is not unknown to contribute 40% of application runtime [HH93]. Various methods have been proposed to combat increasing TLB miss ratios. Associativity trade-offs and changes [NUS 93, CLK97], micro TLBs [CBJ92], variable page sizes ....

Douglas Clark and Joel S. Emer. Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Transactions on Computer Systems, 3(1):31--62, February 1985.


Optimizing the Instruction Cache Performance of the.. - Torrellas, Xia, Daigle (1995)   (38 citations)

....heavy use of the operating system. Given that the operating system code has a complex functionality, a large size, and interrupt-driven transfers of control among its procedures, caches may be less effective in intercepting its accesses. Indeed, there is some evidence that backs this claim. Clark [11] reported a lower performance of the VAX-11/780 cache when operating system activity was taken into account. Similarly, Agarwal et al. [2] pointed out the many cache misses caused by the operating system. Torrellas et al. [23] reported that the operating system code causes a large fraction of the ....

D. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 1(1):24--37, February 1983.


Tradeoffs in Supporting Two Page Sizes - Talluri, Kong, Hill, Patterson (1992)   (43 citations)

....Cray Research Foundation and Digital Equipment Corporation; Patterson is supported in part by DARPA NASA Ames Research Center, Grant Number: NAG2 591. 1. Introduction. A translation lookaside buffer (TLB) is a fast buffer containing recently used virtual to physical address translations [ClE85, HeP90, SaB81, Smi82]. Most computers that support paged virtual memory [Den70] use TLBs to reduce average address translation time. Ten years ago, TLB miss handling was responsible for only a small fraction of a machine's cycles per instruction (CPI). A TLB could map a substantial fraction of main memory (e.g. ....

....Ten years ago, TLB miss handling was responsible for only a small fraction of a machine's cycles per instruction (CPI). A TLB could map a substantial fraction of main memory (e.g. 0.5MB), machines had large CPIs (e.g. 10 cycles), and programs had small working sets. For example, Clark and Emer [ClE85] report that the VAX-11/780 loses only 5% of its performance to TLB misses. Wood et al. [WEG86] report TLB miss rates to be around 0.03-3% for some machines built in the early 1980s. However, technological and architectural trends have led to increasing main memory sizes, decreasing CPIs, and ....

D. W. CLARK and J. S. EMER, Performance of the VAX-11/780 Translation Buffer: Simulation and Measurement, ACM Transactions on Computer Systems 3, 1 (February 1985), 31-62.


Performance Implications of Context Switches on Misses to DRAM - Meerdervoort (1999)

....and either roughly modeled the effects of multiprogramming and system references or excluded them altogether. This omission was not significant for small caches, as was shown by the simulation results of Smith [1982] which correspond reasonably well with the actual measurements of systems [Clark 1983]. As applications and cache sizes grew during the 1980s new techniques were needed to implement the larger simulations required to accurately model these systems. Moreover, the effects of multitasking on the larger caches could not be ignored anymore. Trace based simulation of larger caches ....

D Clark. Cache performance in the VAX-11/780, ACM Transactions on Computer Systems, vol. 1 no. 1, February 1983, pp 24-37.


An SRAM Main Memory Model - Salverda (1997)   (1 citation)

....new hierarchy in the face of an increasing CPU DRAM gap. For this purpose, it is necessary to obtain a measure of its overall performance. A variety of techniques exist to measure memory system performance, each suited to different requirements in speed and accuracy of measurement. Among these are [1, 11]: 1. Hardware measurement. Probes are attached directly to the hardware to measure memory system activity. 2. Analytical modelling. This involves the use of mathematical models, which incorporate the various organizational parameters and trade offs among them, to yield rough estimates of ....

D.W. Clark. Cache performance in the VAX-11/780. ACM Transactions on Computer Systems, 1(1):34--37, February 1983.


Static Cache Simulation and its Applications - Mueller (1994)   (10 citations)

....with dedicated instruments, e.g. a logic analyzer. The probes can be stored in a trace file. This technique requires additional, expensive hardware and some expertise to use this hardware. The technique is very fast since the program executes at its original speed and does not need to be modified [16, 15]. Yet, the method hides on-chip activities such as instruction or data references accessing primary caches. • Frequency Counting: Similar to inline tracing, the program is modified to include instrumentation code. But rather than generating a program trace, the execution frequency of code ....

D. W. Clark. Cache performance in the VAX-11/780. ACM Transactions on Computer Systems, 1(1):24--37, February 1983.


Design and Analysis of Cache-Conscious Programs - Spork (1999)

....more complex. In this thesis, I present the phenomenon and a model for predicting the performance of the memory hierarchy, and then propose a methodology for designing latency-conscious programs. 1.1 Algorithms and memory latency. Cache miss penalties have risen by a factor of two every year [3, 8, 10, 31]. The effect of latency times of a factor of 100 or more implies that predictions of execution times using traditional unit cost models unaware of these effects may give imprecise or even wrong results. The penalty for a page fault in virtual memory is much larger, but has little concern as long ....

D. W. CLARK, Cache Performance in the VAX-11/780, ACM Transactions on Computer Systems 1, 1 (1983), 24--37.


Page Tables for 64-Bit Computer Systems - Elphinstone, Heiser (1998)

....required, compared to the MPT. However, if the appropriate page table page is not mapped, a secondary fault is generated. In 32 bit systems the mapping for any page of the page table can be obtained from the root page which is held in unmapped memory. This approach has been used in the VAX [11]. For larger address spaces, multiple misses can occur, up to six cascaded faults for 64 bits. This is unavoidable, as it is infeasible to hold the complete LPT in physical memory. Hence, the LPT is faster than the MPT only as long as all required portions of the page table are mapped. As the cost ....

Douglas W. Clark and Joel S. Emer. Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Transactions on Computer Systems, 3:31--62, 1985.


The Interaction of Architecture and Operating System.. - Anderson, Levy, Bershad, .. (1991)   (107 citations)

....amount of information overlooked can be huge. In trace-driven studies gathered through a microcode-based tool, Agarwal et al. found that during the execution of two VAX Ultrix workloads, over 50% of the references were system references [Agarwal et al. 88]; worse, this study and others (such as [Clark Emer 85]) have shown operating system behavior to differ significantly from application behavior. Thus, the result of ignoring such a large execution component could be dramatic. • In those modern architectures where the needs of operating systems have been carefully considered, traditional Unix ....

....TLBs are poorly used by the operating system relative to user mode programs. For example, in a study of TLB performance on the VAX-11/780, Clark and Emer found that while the VMS operating system accounts for only one fifth of all references, it accounts for more than two thirds of all TLB misses [Clark Emer 85]. On the other hand, this system space structure has several problems with respect to modern operating systems. Because the unmapped region is accessed directly through a physical base register, there is no indirection and therefore no ability to specify page level protection or access control, ....

D. W. Clark and J. S. Emer. Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Transactions on Computer Systems, 3(1):31--62, February 1985.


The Evaluation Of Massively Parallel Array Architectures - Herbordt (1994)   (4 citations)

....then analytical techniques can be used to approximate the performance. Otherwise a platform must be constructed on which to run the test suite. The two common alternatives are prototyping and simulation. The advantage of using prototypes, especially those with built in instrumentation (see e.g.[42]) is that they guarantee accuracy with very fast turn around. However, it is readily apparent that building prototypes is expensive and time consuming and also that, once built, a prototype is difficult to modify. A cost effective alternative, especially early in the evaluation process, is ....

Clark, D. W. Cache performance in the VAX-11/780. ACM Transactions on Computer Systems 1, 1 (1983), 24--37.


The Influence of Caches on the Performance of Heaps - LaMarca, Ladner (1996)   (31 citations)

....machines of only ten years ago. We compare the performance of implicit heaps, skew heaps and splay trees and discuss the difference between our results and Jones's. 1 Introduction. The time to service a cache miss to memory has grown from 6 cycles for the VAX-11/780 to 120 for the AlphaServer 8400 [6, 14]. Cache miss penalties have grown to the point where good overall performance cannot be achieved without good cache performance. Unfortunately, many fundamental algorithms were developed without considering caching. In this paper, we perform a study of the cache performance of heaps [36]. The main ....

D. Clark. Cache performance of the VAX-11/780. ACM Transactions on Computer Systems, 1:1:24--37, 1983.


Synchronization, Coherence, and Consistency for High Performance .. - Dwarkadas (1992)

....due to time and storage constraints. Analysis using hardware measurement is, however, limited to an existing implementation of a cache based architecture. Exact reproducibility is also not possible with hardware measurement. A comprehensive set of hardware measurements is presented by Clark [21]. 2.4.2 Simulation. Since accurately modeling the characteristics of program behavior that determine the performance of caches has been found to be analytically intractable, simulation has been the prevailing technique used to yield accurate predictions [72]. Various methods of simulation in ....

....program are generated on the fly [28] during execution by the execution driven technique. This avoids the large space overhead incurred by trace driven simulation and the time involved in accessing these large traces from disk, as well as the specialized hardware needed to generate these traces [4, 21]. Execution driven simulation also avoids another drawback of trace driven simulation, where changes in the address trace due to architectural variations are not reflected in the simulations carried out. The execution driven technique achieves its goal of avoiding the high overhead associated with ....

D. W. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 1(1):24--37, February 1983.


Efficient Memory Simulation in SimICS - Magnusson, Werner (1995)   (22 citations)

....turned off. We would expect the penalties indicated in table 4 to be higher on faster versions. 9 Related Work. Traditionally, the approach used to gather complete address traces of multiprogram and OS workloads involved microcode or different forms of hardware monitoring or modifications [1, 13, 30, 41]. Recently, more flexible techniques have been developed that rely only on the manipulation of ECC bits in the host memory and clever modifications to the host operating system [32, 42]. In general, these techniques are unwieldy and inflexible. Instrumenting the program binary is a common ....

D. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 1:24-- 37, 1983.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

....for parallel machines [9, 184]. The following techniques have been used in the past. Hardware based tracing: these methods directly capture addresses as they are issued by the processor to an off-chip cache or memory. Such a technique was used in the VAX-11/780 to study its cache performance [48]. The main drawbacks of such techniques are complexity, cost and lack of flexibility. Also, only physical addresses that are emitted out of the processor are recorded. So addresses to an on-chip cache would not be recorded. These techniques are difficult to adapt to parallel machines. The ....

D. W. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 3:24--37, 1983.


BACH: BYU Address Collection Hardware, The Collection of.. - Kelly Flanagan (1992)   (6 citations)

....(ii) the absence of an executing operating system makes the tracing of multiprogramming workloads difficult, and (iii) a time dilation of 1000:1 or more is typical [8]. 2.5 Hardware Monitors. Hardware monitors collect address traces by passively monitoring the signals generated by the target CPU [10, 11]. These signals are stored in the hardware monitor's local memory until the buffer is full. When full, the buffer is emptied to secondary storage. Since the signals from the CPU are monitored in real time no time dilation is introduced. The resulting traces are complete, containing all CPU ....

Douglas W. Clark and Joel S. Emer, Performance of the VAX-11/780 Translation Buffer: Simulation and Measurement, ACM Transactions on Computer Systems, 3(1):31--62, February 1985.


Design Tradeoffs for Software-Managed TLBs - Uhlig, Nagle, Stanley, Mudge.. (1993)   (62 citations)

....mappings not in the TLB result in misses that must be serviced either by hardware or by software. In their 1985 study, Clark and Emer examined the cost of hardware TLB management by monitoring a VAX-11/780. For their workloads, 5 to 8% of a user program's run time was spent handling TLB misses [9]. More recent papers have investigated the TLB's impact on user program performance. Chen, Borg and Jouppi [6], using traces generated from the SPEC benchmarks, determined that, for a reasonable range of page sizes, the amount of the address space that could be mapped was more important than the ....

....coarser resolution system clock. It also avoids the problems inherent in the common method of improving system clock resolution by taking averages of repeated invocations [8]. 3.2 TLB Simulation with Tapeworm. Many previous TLB studies have used trace-driven simulation to explore design trade-offs [3, 9, 25]. However, there are a number of difficulties with trace-driven TLB simulation. First, it is difficult to obtain accurate traces. Code annotation tools like pixie [24] or AE [14] generate user-level address traces for a single task. However, more complex tools are required in order to obtain ....

[Article contains additional citation context not shown here]

Clark, D.W. and J.S. Emer, Performance of the VAX-11/780 translation buffer: Simulation and measurement. ACM Transactions on Computer Systems, 1985. 3(1): p. 31-62.


A Simulation-Based Study of TLB Performance - Chen, Borg, Jouppi (1991)   (48 citations)

....The many recent studies on memory system behavior and performance have concentrated almost exclusively on cache design [10, 9]. Little attention has been given to TLB performance. Early studies have shown that TLB miss penalties consume 6% of all machine cycles [4] and 4% of execution time [3], and hence can have a significant impact on machine performance. However, these results were for VAX computers with 512-byte page sizes, an order of magnitude smaller than is typical today, and main memory sizes two orders of magnitude smaller than those considered in this study. Wood [12, 11] ....

....space. They are less useful for behavior such as is seen with typical C programs, where memory activity is concentrated in the bottom of several segments. His methods also become less applicable when memory access times become large with respect to processor speed. Finally, previous TLB studies [3, 11] have considered set associative or direct mapped organizations. These were common when TLBs were made from discrete MSI and LSI RAMs. Recently, however, VLSI RISC microprocessors (e.g. [5, 7]) typically make use of fully associative TLBs, since these require about the same area as set associative ....

Douglas W. Clark and Joel S. Emer. Performance of the VAX 11/780 Translation Buffer: Simulation and Measurement. ACM Transactions on Computer Systems 3(1), February, 1985.


Cache Write Policies and Performance - Jouppi (1991)   (84 citations)  Self-citation (Performance)

....[11] includes write overheads in his analysis, but only considers the case of write back caches at all levels. Write miss policies have been even less investigated. Almost all of the known results in the literature have been for the combination of write allocate and fetch on write. The VAX-11/780 [2] and 8800 [3] were notable exceptions to this and used no write allocate. No known results in the literature compare the performance of different write miss policies. [Footnote 1: By uniprocessor we include non-coherency issues in multiprocessor cache memories, as well as uniprocessor cache memories.] ....

Clark, Douglas W. Cache Performance in the VAX 11/780. ACM Transactions on Computer Systems 1(1):24-37, February, 1983.


Emulation of a Virtual Shared Memory Architecture - Raina (1993)   (3 citations)

No context found.

D. W. Clark. Cache Performance in the VAX-11/780. ACM Transactions on Computer Systems, 3:24--37, 1983.


Using Set Sampling for Level Three Cache Studies - Thornock (1999)

No context found.

Douglas W. Clark, Cache performance in the VAX-11/780, ACM Transactions on Computer Systems, 1(1):24--37, February 1983.


A National Trace Collection and Distribution Resource - Flanagan (1998)

No context found.

Douglas W. Clark, Cache performance in the VAX-11/780, ACM Transactions on Computer Systems, 1(1):24--37, February 1983.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)

No context found.

Clark, D. Cache performance in the VAX-11/780. ACM Transactions on Computer Systems 1 : 24-37, 1983.


Quantitative Analysis of Hardware Support for Real-Time.. - Saurav Chatterjee (1996)

No context found.

D. Clark. and J. Emer. Performance of the VAX-11/780 Translation Buffer: Simulation and Measurements. ACM Transactions on Computer Systems, Feb 1985.

CiteSeer.IST - Copyright Penn State and NEC