B. Bershad. High performance cross-address space communication. Technical Report 90-06-02 (PhD Thesis), University of Washington, June 1990.
....systems. We are motivated by observations made by numerous researchers that traditional multiprocessor operating systems are not flexible enough to support high performance parallel applications. Poor integration between user level threads and kernel [3], rigid communication abstractions [5], and the lack of application specific information in process scheduling [9], for example, can severely limit the performance gain of running a parallel application on multiple processors. On an exokernel operating system, however, these limitations may not exist because the appropriate scheduling ....
....protocol, the critical section is either terminated or retried. 1.3 Related Work There is a plethora of research literature on multiprocessor operating systems, most focus on different synchronization strategies and techniques. Several papers influenced this thesis heavily. Anderson et al. [2, 3, 4, 5] discuss operating system support for multiprocessor computers. In particular, much emphasis has been given to the implications of different designs and implementations of thread, communication, and scheduling systems. Three contributions from their work influenced the design and implementation of ....
[Article contains additional citation context not shown here]
B. Bershad. High performance cross-address space communication. Technical Report 90-06-02 (PhD Thesis), University of Washington, June 1990.
....penalty. The primary cause of this is the increased cost of kernel to pager communication, since inflexible pagers built into the kernel use procedure calls to communicate but external pagers must use more expensive cross-domain calls. The efficiency of such communication has been optimized [Bershad 90] but it is still much slower than procedure calls. As a result, the default pager is integrated with the kernel in commercial systems based on Mach. An alternative to bypassing flexibility is to restructure flexible systems to improve efficiency. To avoid overheads associated with flexibility, ....
Brian N. Bershad. High Performance Cross-Address Space Communication. PhD dissertation, Department of Computer Science and Engineering, University of Washington, June 1990.
....MMS need not contain the entire CM stream; since CM streams are accessed sequentially, a small circular buffer suffices. MMSs avoid the overhead of page faults. Also, since data is released explicitly, page replacement algorithms are not needed. Finally, the URPC mechanism developed by Bershad [6] uses shared memory to reduce kernel interaction in local client server IPC on shared-memory multiprocessors. This is similar in spirit to MMS, though the setting is different. 7. CONCLUSION Existing operating systems incorporate design principles that are contrary to the needs of applications ....
B. Bershad, "High-Performance Cross-Address Space Communication", Technical Report No. 90-06-02, Dept. of Computer Science and Engineering, Univ. of Washington, June 1990.
....at the destination(s) messages are read from this segment. This approach has been used by a variety of commercial applications using the shared memory mapping facilities in Unix System V [2]. It has also been used in some research systems, including the Berkeley DASH project [18] and URPC [3]. Several trends suggest the need for further improvement of communication support. ....
....transfer time between processors. 7 Related Work Previous performance work on message systems and RPC has been dominated by efforts to reduce the cost as close as possible to the raw copy cost (e.g. V [5], Amoeba [16], and Taos [15]) and various efforts to reduce the copy cost (Mach [1] and URPC [3]). Mach takes the copy model of IPC and optimizes it using memory mapping techniques, whereas the memory based messaging approach takes the memory mapping model and extends it for efficient communication. The memory based message model is similar to that used in the Berkeley DASH project [18] ....
[Article contains additional citation context not shown here]
Brian N. Bershad. High Performance Cross-Address Space Communication. PhD thesis, University of Washington, Department of Computer Science and Engineering, June 1990.
....and other Unix like systems. Ultrix RPC 3 software has not been particularly optimized and thus, on a network that is over ten times faster, the performance of small packet exchanges does not change significantly. Unfortunately, small data exchange is a common occurrence in distributed systems [6]. In short, the advantages gained from increased network hardware performance can be almost completely masked without proper software design. Designing software mechanisms so that applications can efficiently use the next generation of high speed networks is a difficult challenge. This thesis ....
Brian N. Bershad. High Performance Cross-Address Space Communication. Ph.D. thesis, University of Washington, June 1990. Department of Computer Science and Engineering Technical Report 90-06-02.
....(same machine) RPCs can be made much faster, depending on how they are implemented in the operating system kernel. By carefully minimizing data copying and using shared memory, Bershad was able to reduce worst case overhead for local RPCs on DEC Firefly multiprocessors to about 150 microseconds [9]. Bershad's technique has the advantage of propagating the execution priority of the client to the server (the client task is mapped into the server's address space, and it effectively becomes an instance of the server for the duration of the RPC). To accomplish this trick, Bershad relied on ....
....creating proportionately many server tasks or creating serial bottlenecks. On our implementation platform, round trip RPCs consumed several milliseconds between CPUs on the local multiprocessor (we used Sun RPC). The special techniques for minimizing context switch overhead in RPCs described in [9, 46] were not available to us in the commercial real time operating system we used. Therefore, the theoretical limits of RPC overhead was a moot issue for us. On our platform, RPC was expensive. Instead of trying to reduce RPC overhead, which would have been difficult and would not have been portable, ....
[Article contains additional citation context not shown here]
B. N. Bershad, High Performance Cross-Address Space Communication, PhD thesis, University of Washington, June 1990.
....and commercial purposes. 2 Structuring an Operating System A multiprocessor operating system is typically large and complex. Its maintainability, expandability, adaptability, and portability strongly depend on its internal structure. Different techniques for structuring operating systems [25] are described in this section, along with a discussion of some of the effects of such structuring on the ease with which an operating system can be adapted for use with multiprocessor computing engines. The various structuring techniques described in this section are not mutually exclusive; ....
....the use of mixed language environments. Furthermore, in order to guarantee the integrity of a system based on language level decomposition, any executable code must be inspected by a trusted system entity to guarantee type safety at runtime, essentially requiring access to its source code [25]. Finally, all language based systems will still require the availability of lower level runtime system support for efficient program execution, as clearly apparent from current efforts to develop a threads like common runtime system layer for both high performance Fortran and concurrent C . It ....
[Article contains additional citation context not shown here]
Brian N. Bershad. High performance cross-address space communication. Technical Report 90-06-02, Dept. of Computer Science and Eng., University of Washington, June 1990. Ph.D. dissertation.
....designer. As an alternative to static generation of stubs, some projects have designed efficient remote evaluation mechanisms for heterogeneous applications. Distributed applications gain substantial performance improvements through the use of customized interface mechanisms like RPC or REV stubs [3, 14, 17, 6]. Stubs in these projects are often handwritten or rewritten from those generated automatically because their performance is critical in many systems and their design is often dependent upon the context of use in a configuration. In comparison, Polygen accommodates many of these approaches by ....
B. Bershad. High Performance Cross-Address Space Communication. Ph.D. Dissertation, University of Washington, Seattle, Technical Report 06-02, (1990).
[Figure 1: Two processes communicating through shared memory] ....used in some research systems, including the Berkeley DASH project [25] and URPC [3]. Several trends suggest the need for further improvement of communication support. First, several applications require high input output performance, placing demands on communication system facilities. For example, moving video from a network interface to a multimedia application and then onto a ....
....of the ParaDiGM implementation and measurements. As an example, we discovered that it was faster to invalidate a received cache line in software than to have the cache controller perform this task. The basic memory based message model is similar to that used in the Berkeley DASH project [25], URPC [3], and many commercial systems using shared memory for communication between processes. Our contribution has been the refinement of the signaling and consistency support and an efficient hardware and software implementation that further optimizes this communication model. The signaling mechanism has ....
[Article contains additional citation context not shown here]
Brian N. Bershad. High Performance Cross-Address Space Communication. PhD thesis, University of Washington, Department of Computer Science and Engineering, June 1990.
....have large virtual memory footprints. Of course, this is hardly a conclusive proof; see Section 5.2 for a discussion of the ExOS virtual memory system. 4.6 Protected Control Transfers Aegis provides a protected control transfer mechanism as a substrate for implementing efficient IPC mechanisms [6, 22, 29]. Operationally, a protected control transfer changes the program counter to an agreed upon value in the callee, donates the current time slice to the callee's processor environment, and installs required elements of the callee's processor context (addressing context identifier, address space tag, ....
....and (2) specialization and extensibility of these abstractions can result in substantial performance improvements. Due to space constraints we focus on IPC and virtual memory. 5.1 Fast IPC Abstractions Fast inter process communication is crucial for building efficient and decoupled systems [6, 22, 29]. As described in Section 4, the Aegis protected control transfer mechanism is an efficient substrate for implementing fast IPC mechanisms. We measure the efficiency of IPC primitives that are constructed in ExOS on top of the Aegis primitive. pipe: measures the time needed to send a word sized ....
B. N. Bershad. High performance cross-address space communication. Technical Report 90-06-02 (PhD Thesis), University of Washington, June 1990.
....expressed using such heavyweight processes must be coarse grained and is often not suitable for high performance parallel programs. Hence, in many contemporary operating system kernels, address space and threads are decoupled so that a single address space can have multiple execution threads [2, 4]. Threads are an emerging model for expressing concurrency within Unix processes [7]. In multiprocessors, threads are primarily used to simultaneously utilize all the available processors. Threads are even useful on a uniprocessor system for mapping asynchronous behavior into equivalent synchronous ....
....are an emerging model for expressing concurrency within Unix processes [7]. In multiprocessors, threads are primarily used to simultaneously utilize all the available processors. Threads are even useful on a uniprocessor system for mapping asynchronous behavior into equivalent synchronous behavior [7, 4]. Though such kernel level threads offer a general programming interface to an application, they are expensive and therefore are not used in fine grained parallel programs [4, 17]. Unlike kernel level threads, user level threads (also known as lightweight threads) are managed by runtime library ....
[Article contains additional citation context not shown here]
Brian N. Bershad. High performance cross-address space communication. Technical Report 90-06-02, Dept. of Computer Science and Eng., University of Washington, June 1990. Ph.D. dissertation.
....the implementation uses messages if the client and server are on different nodes. If the client and server are colocated, arguments and results can be passed through a small segment called a channel that is shared by the client and the server (this technique for lightweight RPC is described in [Bershad 90]). 2.2 Segments and Addressing In most operating systems, including segmented systems such as Multics [Daley Dennis 68], each protection domain has a private virtual address space; a segment may be mapped to a different virtual address range by each domain that attaches it. In contrast, Opal ....
B. N. Bershad. High Performance Cross-Address Space Communication. PhD dissertation, University of Washington, June 1990. Department of Computer Science and Engineering Technical Report 90-06-02.
....to the underlying control and data transfer mechanism. Historically, RPCs began by using an underlying message passing transport layer appropriate for network communications. In time, however, it became clear that the most common use of RPCs was for servers residing on the same machine [2]. The Mach 3 operating system takes advantage of this by providing a relatively fast local RPC mechanism that ignores network communication needs. But recent research into lightweight remote procedure calls indicates that existing local RPC facilities can gain further performance improvements by ....
....with the existing standard RPC interface, MiG (Mach interface Generator). The structure of Mach 3 leads to a different design for its LRPC facility than for the original LRPC facility implemented by Bershad in the TAOS operating system [3]. 2. LRPC Concepts Implementation Bershad showed in [2] that well over 95% of RPC calls in general settings are to servers on the same computer. This implies that the RPC functionality of providing transparent server access across a network of computers is unused in the common case. In addition, the high cost of network access masks further ....
[Article contains additional citation context not shown here]
Bershad, Brian N., High Performance Cross-Address Space Communication, Ph.D. dissertation, Department of Computer Science and Engineering, University of Washington, Technical Report No. 90-06-02, June 1990.
....philosophy to design so called nanokernels which are responsible for fewer functions than microkernels. The functionality not provided by the nanokernel is provided by libraries and application servers. The advantages of this approach are supported by results in cross address space communication [Bershad, 1990] and threads [Anderson, 1991] on multiprocessors. Other microkernel based systems include Chorus [Rozier, 1992], Amoeba [Renesse, 1992], and QNX [Hildebrand, 1992] and ARCADE [Delaney, 1989]. Each of these systems has its own strengths. Chorus provides binary UNIX compatibility; Amoeba makes ....
B. Bershad (1990) High Performance Cross-Address Space Communication, Ph.D. Dissertation, Tech.
....become more careful and more successful at building IPC mechanisms for speed, ruthlessly streamlining and optimizing the common cases. Using the Mach 3.0 microkernel as an example, IPC performance on a Microvax III (CVax processor) has gone from about 750 µsecs for a round trip RPC in 1989 [Bershad 90] to 497 µsecs in 1992 (measured recently using Mach 3.0 version MK68). The improvements were due to tightening the interface [Draves 90] and the implementation [Draves et al. 91]. The second reason why IPC has gotten faster faster than the rest of the operating system is that measured IPC ....
Bershad, B. N. High Performance Cross-Address Space Communication. PhD dissertation, University of Washington, Department of Computer Science and Engineering, Seattle, WA 98195, June 1990.
....and these machines are most efficient when dealing with per processor data structures [Anderson et al. 89]. We were also concerned about the latency of transferring control from one thread to another. Our experiences designing fast interprocess communication (IPC) systems [Draves 90, Bershad 90] taught us several important lessons regarding low latency control transfer. We wanted to apply these lessons in a general way to other kernel level control transfer paths. While low latency was important for the cross address space RPC path, especially since most of the operating system was ....
....another. Further, continuation based RPC maintains the logical separation between a client's thread and a server's. Threads remain fixed in their address space, eliminating many of the protection, debugging and garbage collection problems that occur when threads migrate between address spaces [Bershad 90]. A natural extension to the continuation model allows us to completely mimic the LRPC transfer protocol. By default, when a Mach thread traps into the kernel, it generates a continuation that will transfer control back to the same user level context in which the trap occurred. We are ....
Bershad, B. N. High Performance Cross-Address Space Communication. PhD dissertation, University of Washington, Department of Computer Science and Engineering, Seattle, WA 98195, June 1990.