| B. Bershad, R. Draves, and A. Forin. Using Microbenchmarks to Evaluate System Performance. In Proc. Third Workshop on Workstation Operating Systems, pp. 148-153. IEEE Technical Committee on Operating Systems, Key Biscayne, FL, Apr., 1992. |
....Microbenchmarks concisely identify the causes of performance problems, but they are not so good for predicting the performance of an entire system in actual use. Complex systems, especially, display performance interactions (such as cache conflicts) that microbenchmarks explicitly ignore [4]. Trying to fix this by using multiple microbenchmarks doesn't work, since it doesn't test their interactions. The use of an ad hoc selection of programs to measure system performance is not inherently evil. For example, a paper about how an operating system fails to support a particular class of ....
B. Bershad, R. Draves, and A. Forin. Using Microbenchmarks to Evaluate System Performance. In Proc. Third Workshop on Workstation Operating Systems, pp. 148-153. IEEE Technical Committee on Operating Systems, Key Biscayne, FL, Apr., 1992.
....over a network we implemented basic, source level, Mach kernel capability that uses a modified Gnu debugger. Designed and evaluated microkernel cache management policies to prevent caching side effects. Wrote guidelines for running microbenchmark tests on high performance RISC processors [Bershad et al. 92a]. Converted the first set of Mach course notes into distributable form. Simulated memory consistency models (such as sequential, weak, release, lazy) for various processor and network configurations and compared their performance. Compared Mach 3.0 and Ultrix and found that, for similar ....
Bershad, B., R.P. Draves, and A. Forin. Using Microbenchmarks to Evaluate System Performance. In Proceedings of the Third Workshop on Workstation Operating Systems. WWOS-3, April, 1992.
....of the OS service on an otherwise unloaded system. The principal motivations for these choices appear to have been the limited resolution of traditional hardware timers and a desire to distinguish OS overhead from hardware overhead by using warm caches, etc. The result, as Bershad et al. note [2], is that microbenchmarks have not been very useful in assessing the OS and hardware overhead that an application or driver will actually receive in practice. Most previous efforts to quantify the performance of personal computer and desktop workstation OSs have focused on average case values ....
....the USENIX Third Symposium on Operating Systems Design and Implementation (OSDI 99) 1999 Intel Corporation 2.2.4 Thread Pseudocode Procedure LatThreadFunc() KeSetPriorityThread(KeGetCurrentThread(), 24) loop (FOREVER) WaitForObject(gEvent, FOREVER) GetCycleCount( ghIRP ASB[2]) This completes the read, sending the data to the user mode app IoCompleteRequest(ghIRP) ghIRP = NULL loop 2.2.5 GetCycleCount code Because not all versions of the Visual C++ inline assembler recognize the Pentium RDTSC instruction, the following function is provided. ....
B. N. Bershad, R.P. Draves and A. Forin, "Using Microbenchmarks to Evaluate System Performance", Proc. 3rd Wkshop on Workstation Operating Systems, Key Biscayne, FL, April, 1992
....This rapid reuse is likely to result in most relevant data being in the L1 or L2 cache of the conventional machine. It is questionable whether this caching effect is representative of real invocations of process creation services. These concerns are similar to those expressed by Bershad et al. [6]; they suggest flushing caches to measure the impact of caches on microbenchmarks. This issue has larger ramifications for IRAMs: microbenchmarks will almost always underestimate the value of IRAM architectures, since the repetition in these benchmarks increases locality and artificially reduces ....
BERSHAD, B. N., DRAVES, R. P., AND FORIN, A. Using microbenchmarks to evaluate system performance. In Proceedings of the Fourth Workshop on Workstation Operating Systems (1992).
....acceptable. It is difficult to find comparable performance figures for other RPC systems, since we do not know of any other optimized local RPC results on the same architecture. Instead, Table 2 compares our performance to that of four other RPC implementations running on other hardware: Mach RPC [Bershad et al. 1992], SRC RPC [Schroeder Burrows 1990], LRPC [Bershad et al. 1990], and URPC [Bershad et al. 1991]. These are all optimized RPC implementations; commercial RPC implementations are frequently another order of magnitude slower. 7 Related Work The most closely related work to ours is Druschel and ....
Bershad, B., Draves, R., and Forin, A. Using Microbenchmarks to Evaluate System Performance. In Proceedings of the Third Workshop on Workstation Operating Systems, April 1992.
....mutex policies. Table 2 shows the performance of null request reply of RT IPC with various policies. The cost of RT IPC is more expensive than the cost of original Mach IPC. In Mach IPC, a null request reply is highly optimized by inline expansions in order to reduce the effect of cache collisions [1]. In our implementation, we need to add several function calls for policy mechanism separation so that the measured cost becomes higher than the additional cost of the real time resource manager. Next, we demonstrate the effectiveness of the real time resource manager using a benchmark. The ....
B. N. Bershad, R. P. Draves and A. Forin, "Using Microbenchmarks to Evaluate System Performance", In the proceedings of third workshop of Workstation Operating Systems, 1992
....linked at the normal address. However, in both in and out of kernel cases we use the same server, so this is not a factor. In general, we do not expect cache effects to be significant and consistent in this experiment. Other studies have shown that random changes can have significant cache effects [2]. 5 Limitations A serious limitation that arises from use of this framework is the loss of the protection that separate address spaces provides. Hardware protection is an important mechanism used to provide robustness in the face of unfriendly or malicious programs. Use can only be made of the ....
B. Bershad, R. Draves, and A. Forin. Using microbenchmarks to evaluate system performance. Technical report, Carnegie Mellon University, November 1991.
....described in [Forin et al. 91] allows the protocol server to also act as the network device driver, controlling the network interface directly, and avoiding the copy and the 1 A user to kernel RPC for Mach 3.0 on the DECstation 2100 takes between 100 and 300 μsecs, depending on cache effects [Bershad et al. 91], whereas a system call takes only about 30 μsecs. Little effort has been spent trying to reduce the latency of user to kernel RPC, whereas significant effort has gone towards reducing user to user RPC latency [Draves et al. 91]. UDP Implementation Time (ms) Unix server (IPC, no VM) 14.3 Unix ....
Bershad, B. N., Draves, R. P., and Forin, A. Using microbenchmarks to evaluate system performance. In Submitted to WWOS 92, December 1991.