Johnson, D. B., Zwaenepoel, W.: "The Peregrine High-Performance RPC System", Software -- Practice & Experience, vol. 23, no. 2, February 1993, pp. 201 - 221
....are critical. There are four major areas of related work: historical high performance RPC systems, fast network stacks, CORBA and RMI on high performance network layers, and other approaches for fast DCOM. Fast RPC Historical high performance RPC systems include Firefly [17] and Peregrine [9], the first systems to deliver RPC performance equal to the underlying network performance. Both systems used fast hardware for their era, the late 1980s and early 1990s, and surprisingly this hardware is closest in network/processor performance ratios to contemporary SAN systems. Both of these systems ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software-- Practice and Experience, 23(2):201--221, February 1993.
....been dedicated to optimize RPC, and many different techniques have been demonstrated. In this section, we discuss some of the research directions, and how they relate to this work. Virtual memory Some approaches use virtual memory primitives to optimize the transfer of large messages. Peregrine [23], which is based on the V system [6], uses page remapping to efficiently move the call arguments and results between the client's and server's address spaces. The DEC Firefly RPC facility [32] uses a global buffer pool, which resides in memory shared among all user address spaces, whereas in the ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software - Practice and Experience, 23(2):201--221, 1993.
....party's and system buffers. L3 [11] reduces the number of copies to one using a temporarily mapped communication window, but does not have anything analogous to user-to-user input alignment, reverse copyout, and page swapping, which may reduce or eliminate copying altogether. Local Peregrine RPC [10] comes close to the I/O-oriented IPC solution, passing data out of client buffers by copy-on-write and into and out of servers by mapping and unmapping. However, Peregrine passes data into client buffers by copying instead of input alignment and page swapping. I/O-oriented IPC's novel use of the ....
D. B. Johnson and W. Zwaenepoel. "The Peregrine High-Performance RPC System", in Software --- Practice and Experience, 23(2):201-221, Feb. 1993.
....3 Header Prediction Header prediction has often been suggested as a performance benefit for TCP[2] There are two distinct kinds of optimizations that are often called header prediction. The first, involving prefilling parts of the transport header, is a known optimization for lowering latency [11, 8], and is not discussed further here. The second technique involves exploiting traffic locality to predict the next incoming packet and to avoid the protocol control block (PCB) lookup cost. Others have studied using traffic locality to improve throughput for bulk data transfer protocols [1, 13] ....
David B. Johnson and Willy Zwaenepoel. "The Peregrine High-Performance RPC System." To appear in Software Practice and Experience.
....an abstract execution trace of a call to rmin. 1 Performance of RPC Communication using the RPC paradigm is at the root of many distributed systems. As such, the performance of this component is critical. As a result, a lot of research has been carried out on the optimization of this paradigm [32, 6, 17, 36, 15, 26]. Many studies have been carried out, but they often result in using new protocols that are incompatible with an existing standard such as the Sun RPC. The problem in reimplementing a protocol that is specified only by its implementation is that features (and even bugs) may be lost, resulting in ....
....such as specific RPC optimizations, kernel level optimizations, operating system structuring, and automatic program transformation. Let us outline the salient aspects of these research directions. General RPC optimizations. A considerable amount of work has been dedicated to optimize RPC (see [32, 17, 36, 25, 24]) In most of these studies, a fast path in the RPC is identified, corresponding to a performance critical, frequently used case. The fast path is then optimized using a wide range of techniques. The optimizations address different layers of the protocol stack, and are performed either manually ....
D.B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software - Practice And Experience, 23(2):201--221, February 1993.
....explanation can for example be found in [Knut68] Sec. 2.3.4.1. Our approach is based on the fact that the usage pattern of an interface description shows locality. Locality has already been used successfully in other work to improve the performance of communication software ([Clar89] [Jaco90] [John93]). 2.2 Stub generators Presentation conversion routines are often generated automatically by a so-called stub generator. The input to a stub generator is a definition of the data types exported at the interfaces between the distributed parts of the application. These data types are defined in a ....
Johnson, David and Willy Zwaenepoel. The Peregrine High-performance RPC System. Software Practice and Experience, 23, 2 (February 93), 201-221.
....take full advantage of SHRIMP's features. The result is a round trip null RPC latency of 9.5 microseconds, which is about one microsecond above the hardware minimum. 2 1 Introduction Much is known about how to optimize remote procedure call (RPC) mechanisms on traditional workstation networks [2, 14, 17, 20]. The main effort in previous work was to reduce or avoid copying, to make traps and context switches fast, and to take advantage of common case behavior. The emergence of new multiprocessor network interfaces opens new possibilities for constructing network software. It is not always clear, ....
....[21] approach allows an arbitrary handler to be invoked, using a fast path implementation but switching to a slower path if the handler blocks. Neither of these systems provides full RPC services, such as automatic stub generation or binding between untrusting parties. Several papers (e.g. [14, 17]) describe optimizations that dramatically improve the performance of RPC in traditional systems. This is generally done by avoiding copying, and reducing context switching overhead and network and RPC protocol overhead. 9 Discussion and conclusions Network interfaces can have a great impact on ....
D. Johnson, and W. Zwaenepoel, The Peregrine high performance RPC system. Tech. Rep. COMP TR91-152, Dept. of Computer Science, Rice Univ., 1991.
....a remote procedure look like a local one. A call to this procedure is done transparently on the local machine but the actual computation takes place on a distant machine. Performance is a key point in RPC. A lot of research has been carried out on the optimization of the layers of the protocol [23, 5, 16, 24, 14, 18]. Many studies have been proposed, but they necessitate the use of new protocols, incompatible with existing standard such as Sun RPC. The high genericity of the RPC implementation is an invitation to specialization. Our group is currently developing a partial evaluator for C, named Tempo [7] It ....
....which belong to the operating system's interface. This makes it impossible to separately specialize an application module, and, in particular, our XDR example. General RPC Optimizations. A considerable amount of work has been dedicated to optimize existing RPC implementations (see for example [23, 16, 24]). In these studies, a fast path in the RPC is identified, corresponding to a performance critical, frequently used case. The fast path is then optimized using a wide range of techniques. Some of these consist of manual optimizations on a specific layer of the RPC protocol stack. Our approach aims ....
D.B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software - Practice And Experience, 23(2):201--221, February 1993.
.... spend 80% of their execution time in 20% of their code (80/20 rule). Protocol software is no exception to this rule, as indicated by the amount of recent work that deals with optimization of the most frequent case in different protocols (header prediction, header templates ([Clar89] [Jaco90] [John93]), integrated layer processing ([Clar90] [Abbo92b]), Morpheus language [Abbo92a]). Generally speaking, the goal of these efforts is to implement a so-called fast path through a protocol, i.e. to minimize the number of instructions in the most frequently executed path of the protocol. Today, the ....
Johnson, David and Willy Zwaenepoel. The Peregrine High-performance RPC System. Software Practice and Experience, 23, 2 (February 93), 201-221.
....the buffer and store its address translations in the descriptor table or list. System calls are required to build the descriptors inside the OS. Autonet [41], for example, uses a chained list of descriptors and VAXClusters [28] uses a descriptor table. Page remapping is a method to avoid copying [30, 10, 15, 27, 6]. When transfers are properly aligned and the right length (i.e. a multiple of the physical page size) the OS swaps the virtual to physical mappings between the pages of the kernel buffer with those of the application buffer. This technique can achieve zero copy with buffer restrictions. ....
D.B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software: Practice and Experience, 23(2):201--221, February 1993.
....for implementing widespread distributed services such as NFS [32] and NIS [29] The RPC implementation used in this paper is the commercial, 1984 copyrighted version of Sun. Performance is a key point in RPC. A lot of research has been carried out on the optimization of the layers of the protocol [3,15,24,30,33]. However, many of these optimizations involve new protocols that are incompatible with an existing standard such as the Sun RPC. 2.1 RPC Main Features The RPC protocol makes a remote procedure look like a local one. A call to a remote procedure is done transparently on the local machine, but the ....
D.B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software - Practice And Experience, 23(2):201--221, February 1993.
....identical to that for a local call. There is really no marshaling or demarshaling per se; data moves directly from source memory to destination memory without unnecessary copying or buffering. Using writable stacks to simplify demarshaling costs can be done even without the remote memory model [15], however doing so is more complex. Once the stack is ready, the client activates the server by writing a flag word in the server, for which the server polls. Call by reference is straightforward to provide through the remote read and write primitives. In this case, the references placed on the ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software Practice and Experience, 23(2):201--221, February 1993.
....provides a fixed set of multicast protocols with no opportunity to modify or extend the multicast support. Several systems that are optimized for low latency RPC, such as active messages [28] or LRPC [3], take advantage of assumptions about the underlying communication architecture. Peregrine RPC [17] applies performance optimizations within traditional RPC semantics. These systems sacrifice modularity and extensibility in favor of performance. Thus, the communication mechanism is fast yet lacks the flexibility of more general communication models. Optimistic active messages [16] and Thekkath ....
D. Johnson and W. Zwaenepoel. The Peregrine High-Performance RPC System. Software: Practice and Experience, 23(2), February 1993.
....frame is identical to that for a local call. There is really no marshaling or demarshaling per se; data moves directly from source memory to destination memory without unnecessary copying or buffering. Using writable stacks to simplify demarshaling can be done even without the remote memory model [37], however doing so is more complex. Once the stack is ready, the client activates the server by writing a flag word in the server, for which the server polls. Call by reference is straightforward to provide through the remote read and write primitives. In this case, the references placed on the ....
....a simple name server application and from measurements of NFS [56]. 7.1 The Trouble with RPC RPC is the predominant communication mechanism between the components of contemporary distributed systems 1 . For this reason, an enormous amount of energy has been devoted to increasing its performance [37, 58, 66, 70]. Still, RPC times are substantial compared to the raw hardware speed. While this cost is due in part to the latency of network controllers and the software protocols used for network transfer, it is due as well to the semantics of RPC. RPC performs two conceptually simple functions: • It ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software -- Practice and Experience, 23(2):201--221, February 1993.
....out of) the cache. Data copying is expensive because memory access times are not scaling with processor speeds [4]. In a page cache, data can be transferred directly into the cache from the disk or network by remapping the system buffer page that receives the data into the client's address space [59, 61]. In an object cache, this is more difficult; objects are typically copied into the cache. Remapping might be possible in some cases where the amount of data transferred is close to a multiple of the page size. This is made difficult, however, because not all of the objects in the buffer are ....
D. B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software --- Practice and Experience, 23(2):201--221, February 1993.
....request for service from a client to a server process is structured to give synchronization semantics at the client similar to normal procedure call. Numerous examples of different RPC services and implementations exist, including Firefly RPC [19] Alphorn [3] lightweight RPC [4] and Peregrine [14]. Among the commercial RPC packages released have been Courier from Xerox [24] Sun RPC [22] Netwise RPC from Novell Netware, and NCA from Apollo [2] On the surface, the semantics of RPC seem very simple, yet the reality is that there are subtleties and variations. For example, there are many ....
D. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software Practice & Experience, 23(2):201--222, 1993.
....fundamental primitive in distributed and parallel computing; the performance of distributed systems and parallel programs is often determined by the performance of IPC primitives. Recent research has shown how to lower the cost of a round trip IPC [Bershad et al. 1989, Druschel and Peterson 1992, Johnson and Zwaenepoel 1993, Karger 1989]. For example, a local RPC can be performed in 7.7 µs (254 cycles on a 33 MHz 486) [Yarvin et al. 1993], an RPC across an Ethernet can be performed in 340 µs (8800 cycles on a 25 MHz MIPS) [Thekkath and Levy 1993], and an RPC across an ATM link can be performed in 93 µs (2300 ....
Johnson, D.B., and Zwaenepoel, W., "The Peregrine High-Performance RPC System," Software Practice and Experience, Vol. 23, No. 2, pp. 201-221, Feb. 1993.
....service from a client to a server process is structured to give synchronization semantics at the client similar to normal procedure call. Numerous examples of different RPC services and implementations exist, including Firefly RPC [SB90] Alphorn [AGH 91] lightweight RPC [BALL90] Peregrine [JZ93] and SUPRA RPC [Sto94] Among the commercial RPC packages released have been Courier from Xerox [Xer81] Sun RPC [Sun88] Netwise RPC from Novell Netware, NCA from Apollo [Apo89] and DCE RPC. On the surface, the semantics of RPC seem very simple, yet the reality is that there are subtleties and ....
D. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Software: practice & experience, 23(2):201--222, 1993.
....extrapolation is correct. Of the 22 µsecs required to remap another page, we found that the CPU was stalled waiting for cache fills approximately half of the time. The operation is likely to become more memory bound as the gap between CPU and memory speeds widens. Second, the Peregrine RPC system [JZ93] reduces RPC latency by remapping a single kernel page containing the request packet into the server's address space, to serve as the server thread's runtime stack. The authors report a cost of only 4 µsecs for this operation on a Sun 3/60. We suspect the reason for this surprisingly low number is ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software---Practice and Experience, 23(2):201--221, February 1993.
....to take full advantage of SHRIMP's features. The result is a round trip null RPC latency of 9.5 microseconds, which is about one microsecond above the hardware minimum. 1 Introduction Much is known about how to optimize remote procedure call (RPC) mechanisms on traditional workstation networks [2, 14, 17, 21]. The main effort in previous work was to reduce or avoid copying, to make traps and context switches fast, and to take advantage of common case behavior. The emergence of new multiprocessor network interfaces opens new possibilities for constructing network software. It is not always clear, ....
....[22] approach allows an arbitrary handler to be invoked, using a fast path implementation but switching to a slower path if the handler blocks. Neither of these systems provides full RPC services, such as automatic stub generation or binding between untrusting parties. Several papers (e.g. [14, 17]) describe optimizations that dramatically improve the performance of RPC in traditional systems. This is generally done by avoiding copying, and reducing context switching overhead and network and RPC protocol overhead. 10 Discussion and conclusions Network interfaces can have a great impact on ....
D. Johnson, and W. Zwaenepoel, The Peregrine high performance RPC system. Tech. Rep. COMP TR91-152, Dept. of Computer Science, Rice Univ., 1991.
.... shared between the client and the server [5, 6]. The DASH IPC mechanism used a reserved area of virtual space in processes to exchange IPC data through page remapping [67]. The Peregrine IPC system used copy-on-write for output, copy for input, and page remapping to move data between IPC endpoints [73]. Finally, the Mach IPC system uses copy maps, as described previously. Some research has produced special-purpose I/O systems to work around a specific problem. For example the MMBUF system is a system that avoids data copying between the filesystem and networking layer by using a special buffer ....
W. Zwaenepoel and D. Johnson. The Peregrine high-performance RPC system. Software - Practice and Experience, 23(2):201--221, February 1993.
....3 Header Prediction Header prediction has often been suggested as a performance benefit for TCP[4] There are two distinct kinds of optimizations that are often called header prediction. The first, involving prefilling parts of the transport header, is a known optimization for lowering latency[11, 15], and is not discussed further here. The second technique involves exploiting traffic locality to predict the next incoming packet to avoid the protocol control block (PCB) lookup cost. Others have studied using traffic locality to improve throughput for bulk data transfer protocols [2, 17] we ....
David B. Johnson and Willy Zwaenepoel. The Peregrine high-performance RPC system. Software -- Practice and Experience, 23(2):201--221, February 1993.
....2 r 1 C S CS (a) Parallelism in depth m 1 1 r m 2 r 2 C S1 S2 (b) Parallelism in breadth Figure 1: Basic topologies in client server model from the perspective of optimization. on improving RPC program performance have focused on reducing latency and transmission time within this pairwise form [16, 5, 14]. When this simple topology extends to a network of client server model computing, more advanced optimization other than just efficient pairwise hooking between client and server is called for. Figure 1 shows two basic topologies to form an application of networked client server computation. These ....
D. B. Johnson and W. Zwaenepoel. The Peregrine high-performance RPC system. Journal of Software Practice and Experience, Vol. 23(2):201--221, February 1993.
No context found.
Johnson, D. B., Zwaenepoel, W.: "The Peregrine High-Performance RPC System", Software -- Practice & Experience, vol. 23, no. 2, February 1993, pp. 201 - 221
No context found.
D. B. Johnson and W. Zwaenepoel, `The Peregrine high-performance RPC system', Software---Practice and Experience, 23, 201--221 (1993).