| V. Karamcheti and A. A. Chien, "Software overhead in messaging layers: Where does the time go?" in Proc. ASPLOS VI, San Jose, CA, Oct. 1994, pp. 51--60. |
....network in use, but also on the specific timing of distributed systems. Implementing mechanisms for reliable message delivery on top of an MSA that does not provide reliability has been shown to be inefficient in certain cases; this type of implementation can increase communication overhead up to 200 [9]. Thus, to avoid the problems associated with a large number of retransmissions, message delivery should be reliable in software DSMs. In-order delivery is not necessary for systems based on the request-reply model, since a node can issue only a single request at a time. Efficiently guaranteeing reliable ....
V. Karamcheti and A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of ASPLOS-VI, 1994.
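The request-reply argument in the software DSM context above (one outstanding request per node, so ordering comes for free while reliability does not) can be illustrated with a minimal stop-and-wait sketch. Everything here is an assumption for illustration: the lossy network is simulated with a seeded random generator, `Server` is a hypothetical node that caches its last reply per client so retransmissions are not re-executed, and `payload * 2` is a placeholder for real request handling. It is not the mechanism of any cited DSM.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that remembers its last reply to each client."""
    last_reply: dict = field(default_factory=dict)  # client_id -> (seq, reply)
    executions: int = 0  # how many requests were actually executed

    def handle(self, client_id: str, seq: int, payload: int) -> int:
        cached = self.last_reply.get(client_id)
        if cached is not None and cached[0] == seq:
            return cached[1]  # retransmitted request: resend reply, do not re-execute
        self.executions += 1
        reply = payload * 2  # placeholder for real request processing
        self.last_reply[client_id] = (seq, reply)
        return reply


def delivered(rng: random.Random, loss_rate: float) -> bool:
    """Simulate an unreliable network: True if the packet is not dropped."""
    return rng.random() >= loss_rate


def request(server, client_id, seq, payload, rng, loss_rate=0.3, max_tries=50):
    """Stop-and-wait: one outstanding request, retransmitted until a reply arrives."""
    for attempt in range(1, max_tries + 1):
        if not delivered(rng, loss_rate):
            continue  # request dropped; the timeout expires and we resend
        reply = server.handle(client_id, seq, payload)
        if delivered(rng, loss_rate):
            return reply, attempt
        # reply dropped: resend with the same seq so the server can deduplicate
    raise TimeoutError(f"no reply after {max_tries} attempts")


rng = random.Random(42)
srv = Server()
replies = [request(srv, "node1", seq, seq, rng)[0] for seq in range(10)]
print(replies)         # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(srv.executions)  # 10: each request executed once despite retransmissions
```

Because a node never has more than one request in flight, the server only needs to remember the last sequence number per client; no reordering state is required. The retries themselves are the cost that the cited work attributes to building reliability in software over an unreliable layer.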
....host CPU from the burden of communication [17] through the use of bus-mastering, DMA-enabled NICs. In this way, the CPU has more time to spend on useful application computation. When a (user-level) process needs to access a conventional network interface, overall communication is delayed [13], since, through a system call, the OS switches to kernel level and takes over copying data from user areas to kernel areas for protection. Nevertheless, modern network technologies (e.g., SCI, Myrinet) are mitigating this startup latency with optimized communication protocols (e.g., VIA) ....
V. Karamcheti and A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 51--60, Oct 1994.
....server load. However, none of these studies analyzes the costs associated with realizing these protocols or applications on top of the VI primitives. There has been extensive work examining protocol-layer costs over other user-level communication abstractions for high-performance systems [9, 21, 23, 39], and from these we draw much of our methodology. Additionally, we draw upon the lessons of related high-performance network architectures [15, 31, 32] and the benchmark techniques developed by Culler et al. in [13, 14] to analyze our implementations. Specific to the VI Architecture, Speight et al. ....
Vijay Karamcheti and Andrew A. Chien. Software overhead in messaging layers: where does the time go? ACM SIGPLAN Notices, 29(11):51--60, November 1994.
....When executing parallel programs on a PC cluster, TCP/IP over Ethernet is conventionally used as the communication infrastructure. However, because they are designed to guarantee reliable communication in WANs, TCP/IP and Ethernet carry miscellaneous overheads that are not acceptable for parallel computing [2]. That is, although PC clusters are organized in LAN or SAN environments, communication is performed using wide-area network hardware and protocol software, and thus cannot achieve the low-latency, high-throughput communication needed for parallel computing. Parallel applications ....
V. Karamcheti and A. Chien, Software Overhead in Messaging Layers: Where Does the Time Go?, Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), 1994.
....application context. Although the restrictions and limitations of previous interfaces [to Active Message systems] made their implementations simple and efficient, the same restrictions and limitations prevent them from supporting the broader spectrum of applications now required [Mainwaring95a]. [Karamcheti94] reported instruction counts (but no timings) for Active Messages on a CM-5 (CMAM) which are roughly comparable to ours, although they were measured on a SPARC processor and ours are for PA-RISC. The CMAM finite-sequence, multiple-packet delivery protocol seems to provide functionality that ....
....on a SPARC processor and ours are for PA-RISC. The CMAM finite-sequence, multiple-packet delivery protocol seems to provide functionality that approaches our simple datagram protocol: it does not support our group receive operations, but, like ours, it does handle out-of-order packet delivery. [Karamcheti94] quotes 397 instructions to do a 16-word (64-byte) unidirectional send. A send using Hamlyn's tagged remote write consumes 260 processor cycles (fewer instructions), most of which are consumed when the processor stalls while writing to the I/O bus, and a receive consumes 120 cycles. [Wallach95] ....
[Article contains additional citation context not shown here]
Vijay Karamcheti and Andrew A. Chien. Software overhead in messaging layers: where does the time go? Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (San Jose, CA). ACM, October 1994.
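The instruction counts quoted in these contexts can be put on a per-word basis. The arithmetic below uses only the 397-instruction figure for a 16-word send quoted above and the 6221-instruction figure for a 1024-word finite-sequence message quoted in a later context; treating the two as comparable measurements of the same CMAM protocol is an assumption.

```latex
\[
\frac{397\ \text{instructions}}{16\ \text{words}} \approx 24.8\ \text{instructions/word},
\qquad
\frac{6221\ \text{instructions}}{1024\ \text{words}} \approx 6.08\ \text{instructions/word}.
\]
```

Under that assumption, the roughly fourfold drop in per-word cost is consistent with a sizable fixed per-message component being amortized over longer messages.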
....to the above-mentioned issues, the communication libraries target a wide class of networks and computers, thereby complicating network-specific optimizations. The study of software costs has revealed that software communication overhead dominates hardware routing costs in most systems [32]. The software costs are due to the gap between the user's communication requirements and the features offered by the network. The features offered by the network are arbitrary network delivery, finite buffering, and limited fault handling. The user requirements are in-order delivery, end-to-end ....
Karamcheti, V., and Chien, A. Software overhead in messaging layers: Where does the time go? In Proceedings of ASPLOS-VI, San Jose, California (October 1994).
.... has seen tremendous efforts investigating high-performance messaging layers [6, 36, 42, 43]. In addition, several projects describe methods of providing a measure of classic abstractions on top of these layers [16, 37]. Many of these projects have provided detailed analysis or performance models [4, 13, 26]. However, these models were always in the context of specialized communication systems, making comparisons to general-purpose systems difficult. In a more general networking context, much work has been done to quantify the performance of existing IP protocol stacks [27, 28] or increase ....
V. Karamcheti and A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, Oct. 1994.
....like Fast Sockets [15] and Active Messages [16], it still costs 40-60 microseconds to send a message to the network. For instance, in the multi-packet delivery implementation of Active Messages on the CM-5, sending a 1024-word message with the finite-sequence protocol costs 6221 instructions [18]. Since all requests are actually sent to remote hosts through the network, every send and receive request incurs the communication interface overhead and results in high network traffic. To overcome these problems, we designed and implemented an I-Structure Software Cache (ISSC) [10], ....
.... 4 Conclusions. Do software caches really work? In this paper, we demonstrated that a software implementation of the I-Structure cache, i.e., ISSC, can deliver performance gains for most distributed memory systems that don't have extremely fast inter-node communication, such as networks of workstations [3, 9, 15, 18]. ISSC caches values obtained through split-phase transactions in the operation of an I-Structure. It also exploits spatial data locality by clustering individual element requests into blocks. Our experimental results show that the inclusion of ISSC in a parallel system that provides split-phase ....
V. Karamcheti and A. Chien. Software overhead in messaging layers: Where does the time go? In Proceedings of the 6th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), Oct. 5-7, 1994.
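The block-clustering idea in the ISSC conclusion above can be sketched as a small software cache that, on a miss, fetches the whole block containing the requested element. This is a minimal sketch, not ISSC itself: it assumes every I-structure element has already been written (a real I-structure read of an empty element must be deferred), and `fetch_block` is a hypothetical stand-in for a split-phase remote request. Because I-structure elements are write-once, a cached value that has been defined cannot later change.

```python
from typing import Callable, Dict, List


class BlockCache:
    """Sketch of a software cache that clusters element reads into block fetches."""

    def __init__(self, fetch_block: Callable[[int], List[int]], block_size: int):
        self.fetch_block = fetch_block  # hypothetical remote fetch of one block
        self.block_size = block_size
        self.blocks: Dict[int, List[int]] = {}  # block_id -> cached elements
        self.remote_requests = 0

    def read(self, index: int) -> int:
        block_id, offset = divmod(index, self.block_size)
        block = self.blocks.get(block_id)
        if block is None:
            # Miss: one remote request brings in block_size neighboring elements.
            self.remote_requests += 1
            block = self.fetch_block(block_id)
            self.blocks[block_id] = block
        return block[offset]


BLOCK = 8
remote = list(range(100, 164))  # stand-in for a fully written remote I-structure


def fetch(block_id: int) -> List[int]:
    return remote[block_id * BLOCK:(block_id + 1) * BLOCK]


cache = BlockCache(fetch, BLOCK)
values = [cache.read(i) for i in range(len(remote))]
print(values == remote)       # True
print(cache.remote_requests)  # 8 block fetches instead of 64 element requests
```

A sequential scan of 64 elements costs 8 remote requests here rather than 64, which is the spatial-locality benefit the conclusion describes; the block size trades fewer requests against fetching elements that may never be read.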
....discussed as the opposite of in-order message delivery. If all messages are routed adaptively, a reordering process is needed for a group of ordered messages [19]. Software and hardware approaches exist to solve the problem. The software reordering process generally incurs significant cost [15]. The hardware approach reduces the software overhead; however, hardware that performs the reordering itself is relatively expensive. The Triplex router addressed this problem by supporting both adaptive and oblivious routing [11], and the T3E separates adaptive and non-adaptive messages [20]. We use a similar ....
V. Karamcheti and A.A. Chien: "Software Overhead in Messaging Layers: Where Does the Time Go?", Proc. ASPLOS VI, pp.51--60 (1994).
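The software reordering process mentioned in the adaptive-routing context above can be sketched as a receive-side reorder buffer keyed by sequence number. This is an illustration under stated assumptions (a single sender, sequence numbers starting at 0 with no wraparound, duplicates silently dropped), not the mechanism of any cited router or machine; the per-packet buffering and bookkeeping it performs show where such software cost comes from.

```python
from typing import Dict, List


class ReorderBuffer:
    """Releases packets to the application strictly in sequence-number order."""

    def __init__(self) -> None:
        self.next_seq = 0                  # next sequence number to deliver
        self.pending: Dict[int, str] = {}  # early arrivals held back

    def receive(self, seq: int, payload: str) -> List[str]:
        """Accept one packet; return every message that is now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate of something already delivered or buffered
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
for seq, data in [(2, "c"), (0, "a"), (1, "b"), (3, "d")]:
    print(seq, buf.receive(seq, data))
# 2 []
# 0 ['a']
# 1 ['b', 'c']
# 3 ['d']
```

Packet 2 arrives first and is held until packets 0 and 1 fill the gap; a network that delivered in order would let the receiver skip this buffer entirely.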
....reduction operations [20]. Each routing and switching policy is best suited for traffic with particular characteristics and performance requirements. For example, adaptive routing can reduce end-to-end delay, but out-of-order packet arrival can complicate protocol processing at the receiving node [21]. Opportunities for adaptive routing vary depending on the topology, the distance a packet must travel, and network congestion. Similarly, wormhole switching achieves low latency without requiring packet buffers, but virtual cut-through and packet switching may achieve better throughput at high ....
V. Karamcheti and A. A. Chien, "Software overhead in messaging layers: Where does the time go?," in Proc. Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 51--60, October 1994.
....as a parallel computing system with an excellent cost-performance ratio [12, 13]. Most PC clusters so far have used WAN- or LAN-based network devices and protocols such as Ethernet and TCP/IP, due to their market availability. However, these do not always match communication patterns in clusters [5], thus degrading overall performance with extra overhead incurred at each layer: (1) communication software, (2) device handlers, and (3) network hardware. Performance improvements in communication software have been well studied in FM [15], PM [3], and BIP [6]. Therefore, we have aimed at ....
V. Karamcheti and A. Chien. Software overhead in messaging layers: Where does the time go? In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 51--60, 1994.
....has seen tremendous efforts investigating high-performance messaging layers [43, 6, 42, 36]. In addition, several projects [16, 37] describe methods of providing a measure of classic abstractions on top of these layers. Many of these projects have provided detailed analysis or performance models [4, 13, 26]. However, these models were always in the context of specialized communication systems, making comparisons to general-purpose systems difficult. In a more general networking context, much work has been done to quantify the performance of existing IP protocol stacks [27, 28] or increase their ....
Karamcheti, V., and Chien, A. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (Oct. 1994).
....does not introduce much extra delay when compared to an optimistic solution. Moreover, the optimistic scheme also introduces acknowledgement-related overhead on the LCP and the network. In their investigation of the sources of software overhead in messaging layers, Karamcheti and Chien [20] found that 23 messaging layers incur significant software overheads when implementing high-level features not provided by the network hardware (e.g., FIFO message ordering, deadlock safety, reliable delivery, and buffer management). We too have found that high-level services incur large ....
V. Karamcheti and A.A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1994.
....layer wouldn't need to fragment it into multiple packets, and one could imagine an implementation of the OSI stack specialized for small messages that omits the transport layer. Indeed, the pros and cons of layered protocol architecture have become a major topic of debate in recent years [CT87, AP93, KP93, KC94, BD95]. Kenneth P. Birman, Building Secure and Reliable Network Applications. Although the OSI layering is probably the best known, the notion of layering communication software is pervasive, and there are many other examples of layered architectures and layered software systems. Later in this ....
....popular as a simple but widely understood framework within which to discuss protocols. 1.5 Additional Reading. General discussion of network architectures and the ISO hierarchy: [Tan88, Com91, CS91, CS93, ANSA91a, ANSA91b, ANSA89, CD90, CDK94, XTP95]. Pros and Cons of layered architectures: [CT87, RST88, RST89, Ous90, AP93, KP93, KC94, BD95]. Reliable stream communication: [Rit84, Jac88, Tan88, Com91, CS91, CS93, CDK94]. Failure Models and Classification: [Lam78b, Lam84, Ske82b, FLP85, ST87, CD90, Mar90, Cri91a, CT91, CHT92, GR93, SM94]. 2. Communication ....
[Article contains additional citation context not shown here]
Vijay Karamcheti and Andrew A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (San Jose, CA; Oct. 1994). ACM.
....the packet size is increased, but this has no large impact on application performance, except for Radix on RX ni M ni and RX ni M h. In QR and SOR, performance is dominated by broadcast latency and sliding-window stalls, not by per-packet overhead. 7. RELATED WORK Karamcheti and Chien [16] studied the division of protocol tasks between network hardware and host software for CM-5 Active Messages, a messaging layer similar to LCI. They argue for higher-level services (ordering, reliability, flow control) in the network hardware to reduce costs in the software messaging layer. Our ....
V. Karamcheti and A. Chien. Software Overhead in Messaging Layers: Where Does the Time Go? In Proc. of the 6th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pp. 51--60, San Jose, CA, Oct. 1994.
....load. However, none of these studies analyzes the costs associated with realizing these protocols or applications on top of the VI primitives. There has been extensive work examining protocol-layer costs over other user-level communication abstractions for high-performance systems [6, 18, 20, 31], and from these we draw much of our methodology. Additionally, we use the benchmark techniques developed in [10, 11] to analyze our implementations. Specific to the VI Architecture, [22] demonstrated the feasibility of layering the distributed component object model (DCOM) protocol (essentially ....
Vijay Karamcheti and Andrew A. Chien. Software overhead in messaging layers: where does the time go? ACM SIGPLAN Notices, 29(11):51--60, November 1994.
....subsystem must deliver user-to-user performance at least comparable to that of MPPs. These new networks have exposed many problems with the traditional implementation of communication software; dramatic increases in hardware speed have not translated into high application performance [13]. For example, the Myrinet network is capable of delivering about two orders of magnitude higher bandwidth than traditional Ethernet networks. The Myrinet implementation of the Berkeley socket interface [20], however, delivers only one order of magnitude higher bandwidth than the Ethernet ....
....place. In BIP and VMMC, however, copies can be avoided in more cases. Reliable communication: As mentioned above, Myrinet does not provide the user with completely reliable hardware. Previous work has argued that reliable transmission is important for delivering high performance to the end user [13]. There are different levels at which the communication libraries can implement reliability. Of the libraries that support reliable communication, AM II and VMMC, AM II chooses to implement reliability in the library at the cost of an extra copy. VMMC avoids the copy by implementing reliable ....
V. Karamcheti and A. A. Chien. Software overhead in messaging layers: where does the time go? ACM SIGPLAN Notices, 29(11):51--60, Nov. 1994.
....newly available space. This scheme is simple and efficient, requires only a counter for each of the other nodes of the network, and has the benefit of preserving the order of the packets. Reliable and in-order delivery guarantees can be expensive if implemented in the higher-level messaging layers [16]. Their cost can be decreased if they are built directly into the lower-level layer, where there is an opportunity to take advantage of some useful features of the network. Through a careful design which exploits the characteristics of the Myrinet architecture, FM offers reliable and in-order guarantees ....
Vijay Karamcheti and Andrew Chien. Software overhead in messaging layers: Where does the time go? In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), 1994. Available from http://www-csag.cs.uiuc.edu/papers/asplos94.ps.
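The per-node counter scheme in the FM context above resembles sender-side credit-based flow control. A minimal sketch, assuming each receiver reserves a fixed number of buffer slots per sender and reports freed slots back to it (`CreditSender` and `refill` are hypothetical names, not FM's API): the sender keeps one credit counter per destination, transmits only while credits remain, and queues the rest in FIFO order, which is why packet order is preserved.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class CreditSender:
    """One credit counter per destination node; a packet is sent only if a credit is held."""

    def __init__(self, nodes: List[int], slots_per_sender: int) -> None:
        self.credits: Dict[int, int] = {n: slots_per_sender for n in nodes}
        self.backlog: Dict[int, Deque[str]] = {n: deque() for n in nodes}
        self.wire: List[Tuple[int, str]] = []  # packets handed to the network, in order

    def send(self, dest: int, pkt: str) -> None:
        self.backlog[dest].append(pkt)
        self._drain(dest)

    def refill(self, dest: int, freed: int) -> None:
        """Receiver reports `freed` newly available buffer slots."""
        self.credits[dest] += freed
        self._drain(dest)

    def _drain(self, dest: int) -> None:
        # FIFO backlog plus one counter per node: packets leave in send order.
        q = self.backlog[dest]
        while q and self.credits[dest] > 0:
            self.credits[dest] -= 1
            self.wire.append((dest, q.popleft()))


s = CreditSender(nodes=[1, 2], slots_per_sender=2)
for i in range(5):
    s.send(1, f"p{i}")
print(len(s.wire))  # 2: no credits left for node 1
s.refill(1, 3)
print([p for _, p in s.wire])  # ['p0', 'p1', 'p2', 'p3', 'p4']
```

The sender never overruns the receiver's reserved slots, so no packet is dropped for lack of buffer space, and the state per peer is a single integer plus a queue.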
....in high-speed networks. Their results indicate that modern distributed systems require both high throughput and low latency. As a result, support for both fast short messages and longer high-bandwidth data transfers is needed. Karamcheti and Chien studied software overhead in message passing [6]. They argue that the networking hardware should provide reliable, in-order message delivery (as VMMC assumes). They show that omitting these features can more than double the software overhead for message passing. The VMMC model provides similar functionality to Active Messages [12]. In this ....
V. Karamcheti and A. Chien. Software overhead in messaging layers: Where does the time go? In The 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 51--60, October 1994.
....a complex program due to limits on static estimation analysis [34] and also due to the complex run-time behaviors of the architectures that could not be easily captured. The communication overhead is especially tricky to model, and the complexity of modeling its software component has been shown [35]. The effect of the message-passing or shared-memory model on the performance characteristics of a parallel program was analyzed by Chandra et al. [36]. Some studies have indicated the potential of compiler analyses for performance prediction. In particular, Wall [34] discusses the relative merits ....
V. Karamcheti and A. Chien, Software Overhead in Messaging Layers: Where Does the Time Go?, Proceedings of the 6th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pp. 51--60.
....the network, avoiding buffering and copies. Streamed messages provide efficient gather/scatter, enabling header attachment and removal without data copies. FM also provides communication service guarantees. Analysis of the literature and our ongoing studies to support fine-grained parallel computing [10] led to the conclusion that a low-level messaging layer should provide the following key guarantees, or higher-level messaging layers will suffer a performance loss: reliable delivery, ordered delivery, and control over scheduling of communication work (decoupling). 3 MSRPC on Fast ....
Vijay Karamcheti and Andrew Chien. Software overhead in messaging layers: Where does the time go? In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 51--60, San Jose, California, October 1994. Association for Computing Machinery. Available from http://www-csag.cs.uiuc.edu/papers/asplos94.ps.
....performance networks, we undertook empirical studies of communication layers inside parallel computers. These studies identified the key guarantees a communication layer must provide to avoid incurring large software overhead at higher levels of the system. Our study of CM-5 Active Messages (CMAM) [12] measured the dynamic instruction count of CMAM assembly code and identified the overhead contributions of the range of guarantees provided by the communication layer (in-order delivery, buffer management, fault tolerance). Because the network of the CM-5 provided none of these features, the ....
....other hand, if a messaging layer's guarantees are too strong (i.e., they provide more functionality than is generally needed), the messaging layer's common-case performance may be needlessly degraded. Analysis of the literature and our ongoing studies to support fine-grained parallel computing [5, 12, 13, 14] have led to the conclusion that a low-level messaging layer should provide the following key guarantees: reliable delivery, in-order delivery, and control over scheduling of communication work (decoupling). As mentioned in the previous section, studies of communication software costs [12] ....
[Article contains additional citation context not shown here]
V. Karamcheti and A. Chien. Software overhead in messaging layers: Where does the time go? In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), pages 51--60, San Jose, California, October 1994. Association for Computing Machinery. Available from http://www-csag.cs.uiuc.edu/papers/asplos94.ps.
No context found.
V. Karamcheti and A. A. Chien, "Software overhead in messaging layers: Where does the time go?" in Proc. ASPLOS VI, San Jose, CA, Oct. 1994, pp. 51--60.
No context found.
V. Karamcheti and A.A. Chien. Software overhead in messaging layers: Where does the time go? In 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 51--60, October 1994.
CiteSeer.IST - Copyright Penn State and NEC