35 citations found.
Lauria, M., S. Pakin, and A. Chien (1998). Efficient Layering for High Speed Communication: Fast Messages 2.x. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing.

This paper is cited in the following contexts:

A Middleware Toolkit for Client-Initiated Service.. - Eisenhauer, Bustamante, .. (2000)   (3 citations)

....associated with event channels, sinks and sources, and PBIO uses these types to automatically handle heterogeneous data transfer issues. Building this functionality into ECho allows for efficient layering that nearly eliminates data copies during marshalling and unmarshalling. As others have noted[15], careful layering to minimize data copies is critical to delivering full network bandwidth to higher levels of software abstraction. The layering with PBIO is a key feature of ECho that makes it suitable for applications which demand high performance for large amounts of data. Base Type Handling ....

Mario Lauria, Scott Pakin, and Andrew A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Event Services for High Performance Computing - Eisenhauer, Bustamante, Schwan (2000)   (6 citations)

....to be associated with event channels, sinks and sources and will automatically handle heterogeneous data transfer issues. Building this functionality into the ECho using PBIO allows for efficient layering that nearly eliminates data copies during marshalling and unmarshalling. As others have noted[15], careful layering to minimize data copies is critical to delivering full network bandwidth to higher levels of software abstraction. The layering with PBIO is a key feature of ECho that makes it suitable for applications which demand high performance for large amounts of data. Base Type Handling ....

....the native data representation. At common 100Mbps network speeds, these additional data copy operations account for a relatively small fraction of the total exchange costs. However, minimizing data copies is critical to delivering full network bandwidth to higher levels of software abstraction[15]. As gigabit networks and specialized low latency communications mechanisms come into more common use, the additional copy operations imposed on even homogeneous communications by fixed wire formats will become a more important limitation on communication speeds, increasing ECho's performance ....

M. Lauria, S. Pakin, and A. A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Native Data Representation: An Efficient Wire Format.. - Eisenhauer.. (2002)

....and unpack operations. However, relegating these tasks to the communicating applications means that the communicating components must agree on the format of messages. In addition, the semantics of application side pack/unpack operations generally imply a costly data copy to or from message buffers [16, 10]. Other packages, such as MPI, support the creation of user defined data types for messages and fields and provide some marshalling and unmarshalling support for them. Although this provides some level of flexibility, MPI does not have any mechanisms for run time discovery of data types of unknown ....

M. Lauria, S. Pakin, and A. A. Chien, "Efficient layering for high speed communication: Fast messages 2.x," in Proceedings of the 7th High Performance Distributed Computing (HPDC-7), July 1998.


Contention-Aware Communication Schedule For High-Speed.. - Tam, Wang

....of our implementation on a 32 node cluster with various network configurations are examined and reported in this paper. 1 Introduction The performance problem related to the communication software has been an active research issue for the past decade. Currently, lightweight messaging systems [7, 18, 19, 25, 32, 35, 36] offer the best communication performance, as they create a fast communication path that bypasses the traditional in-kernel messaging protocol stack (e.g. TCP/IP) which is a serious obstacle in exploiting the high performance of modern network [1, 15, 16]. Although most of these lightweight ....

M. Lauria, S. Pakin, and A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing Conference (HPDC7), July 1998.


ENSEMBLE: A Communication Layer for Embedded.. - Cadot, Kuijlman..

....use of these primitives is often the task of the programmer, whereas we use a compiler based approach. When sending long messages, manual fragmentation and assembly to implement a pipeline is cumbersome. Therefore, thin message passing layers for fast networks like Fast Messages for Myrinet [7] provide a streaming interface where a message may be presented as many small parts; each part is written into the stream using a separate call. Although this efficiently supports scatter/gather message vectors used in many protocol stacks, the function call overhead is prohibitive for sending ....

M. Lauria, S. Pakin, and A. Chien. Efficient layering for high speed communication: Fast Messages 2.x. In 7th High Perf. Distributed Computing Conf. (HPDC7), Chicago, Illinois, July 1998.


Comparing the Communication Performance and Scalability of .. - Luecke, Raffin, Coyle   (5 citations)

....a 512 1024 byte second level cache for both data and instructions. Nodes were interconnected via a Myrinet network. The system was running Windows NT Server 4.0 and HPVM 1.1 for Myrinet clustered compute nodes developed by the Concurrent Systems Architecture Group at the University of Illinois [11, 12]. For all tests, version 5.0 DIGITAL Fortran compiler with the optimize:3 compiler option and the MPI FM library from HPVM 1.1 were used. The average performance from the data in figure 3 for sending a message from one processor to another was 27.5 microseconds for an 8 byte message and 49.7 ....

M. Lauria, S. Pakin, and A. A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing conference (HPDC7), Chicago, USA, July 1998.


Event Services in High Performance Systems - Eisenhauer, Bustamante, Schwan (2001)   (1 citation)

....to be associated with event channels, sinks and sources and will automatically handle heterogeneous data transfer issues. Building this functionality into ECho using PBIO allows for efficient layering that nearly eliminates data copies during marshalling and unmarshalling. As others have noted[18], careful layering to minimize data copies is critical to delivering full network bandwidth to higher levels of software abstraction. The layering with PBIO is a key feature of ECho that makes it suitable for applications which demand high performance for large amounts of data. Base Type Handling ....

....native data representation. At common 100Mbps network speeds, these additional data copy operations account for a relatively small fraction of the total exchange costs. However, minimizing data copies is critical to delivering full network bandwidth to higher levels of software abstraction [18]. As gigabit networks and specialized low latency communications mechanisms come into more common use, the additional copy operations imposed on even homogeneous communications by fixed wire formats will become a more important limitation on communication speeds, increasing ECho's performance ....

Mario Lauria, Scott Pakin, and Andrew A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Design and Evaluation of Communication Latency Hiding/Reduction.. - Afsahi (2000)

....destination without any intermediate buffering. However, applications at the send side do not know the final receive buffer addresses and, hence, the communication subsystems at the receiving end still copy messages at a temporary buffer. Several research groups have tried to avoid memory copying [79, 14, 106, 119, 118]. They have been able to remove the extra memory copying operations between the application user buffer space and the network interface at the send side. However, they haven't been able to remove the memory copying at the receiver sides. They may achieve a zero copy messaging at the receiver sides ....

....hence, the communication subsystems at the receiving end still copy messages unnecessarily from the network interface to a system buffer, and then from the system buffer to the user buffer when the receiving application posts the receive call. Some researchers have tried to avoid memory copying [48, 79, 106, 14, 119, 118]. While they have been able to remove the memory copying between the application buffer space and the network interface at the send side by using user level messaging techniques, they haven't been able to remove the memory copying at the receiver sides completely. They may achieve a zero copy ....

[Article contains additional citation context not shown here]

M. Lauria, S. Pakin and A. A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x", Proceedings of the 7th High Performance Distributed Computing (HPDC7) Conference, 1998.


Efficient Wire Formats for High Performance Computing - Bustamante, Eisenhauer.. (2000)   (4 citations)

.... and receiver have the same natural data representation, such as in exchanges between homogeneous architectures, this approach allows received data to be used directly from the message buffer, making it feasible for middleware to effectively utilize high performance communication layers like FM [13] or the zero copy messaging demonstrated by Rosu et al. [17] and Welsh et al. [19]. When conversion between formats is necessary, these DCG conversions are of the same order of efficiency as the compile time generated stub routines used by the fastest systems relying upon a priori agreements [14] ....

....order to change message formats can be a significant impediment to the integration, deployment and evolution of complex systems. In addition, the semantics of application side pack/unpack operations generally imply a data copy to or from message buffers, with a significant impact on performance [13, 17]. Packages which perform internal marshalling, such as MPI, could avoid data copies and offer more flexible semantics in matching fields provided by senders and receivers. However, existing packages have failed to capitalize on those opportunities. For example, MPI's type matching rules require ....

M. Lauria, S. Pakin, and A. A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Design and Performance of Maestro Cluster Network - Yamagiwa, Fukuda, Wada (2000)

....on the other hand form an aggregation as a routing unit, and therefore, each packet need not include full routing information. Furthermore, in TCP, the host machine performs message fragmentation as triggered by I/O interrupts, which thus incur more communication overhead. FM2.0 with Myrinet [7] realizes pipelined message transfer by providing a user with library functions that divide a message into packets and transfer each packet one by one. However, Myrinet needs to temporarily store packets in its SRAM. Due to this reason, no matter how small packet the host sends actually, it must ....

Mario Lauria et al. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7) conference, Chicago, Illinois, 1998.


MPI derived data types support in VIRTUS - Cristaldi, Iannello

....of system software to improve the integration between the basic components of clusters. A first step in this regard was the development of low level communication libraries capable to deliver to the applications the performance of modern Gigabit LANs: Active Messages (AM) [5], Fast Messages (FM) [12], U-Net [6], VMMC-2 [4], PM [14]. A second step was to integrate low level libraries into higher level, standard communication interfaces to provide programming environments on clusters similar to those available on commercial MPP machines. For instance, in the HPVM project the full bandwidth of ....

....method implies the processor is busy during the whole transfer, on modern I/O buses (PCI) it may achieve the same bandwidth and lower latency than DMA. As an additional advantage it does not require data to be stored in a pinned down buffer, leading to true 0-copy protocols. Fast Messages (FM) [12] is a low level, high performance communication library characterized by an accurate choice of the services to be provided at the library interface. By providing a few key services (buffer management, reliable and in-order delivery), the FM programming interface allows for a leaner, more ....

M. Lauria, S. Pakin, and A.A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x", Procs. of the 7th High Performance Distributed Computing Conference (HPDC7), Chicago, Illinois, July 28-31, 1998.


A scalable flow control algorithm for the Fast.. - Canonico, Cristaldi.. (1999)

....to the applications has been recognized as a key factor for the success of cluster architectures. A number of research projects have been started to study the design of high performance communication software for high speed, low latency networks: Active Messages (AM) [2], Fast Messages (FM) [10, 11], U-Net [3], VMMC-2 [1], BIP [12]. A distinguishing aspect of the FM project is an accurate choice of the services to be provided at the library interface. Relying on the favorable characteristics of the interconnection network used in the project, the Myrinet LAN [4], FM designers decided to ....

....4 we present experimental data confirming the effectiveness of the proposed approach and pointing out its practical limitations. In section 5 we report about related work, and in section 6 we conclude the paper. 2 Credit based flow control in FM According to the Fast Messages programming model [10], the parallel system consists of n nodes each running at most P independent processes (contexts in FM terminology). Messages can be sent to any process and they have an associated handler function, which is invoked on message reception as in the Active Messages model [2]. Message reception is ....

[Article contains additional citation context not shown here]

M. Lauria, S. Pakin, and A.A. Chien. "Efficient Layering for High Speed Communication: Fast Messages 2.x", Procs. of the 7th High Performance Distributed Computing Conference (HPDC7), Chicago, Illinois, July 28-31, 1998.


Transparent Network Connectivity in Dynamic Cluster Environments - Xiaodong Fu Hua

....for customizing application use of underlying resources. To assess the complexity and performance overheads of the network connectivity layer, we describe its implementation in the concrete context of two environments: an Ethernet LAN on top of TCP/IP, and a Myrinet LAN on top of Fast Messages [11, 9]. Our results show that the layer incurs minimal overheads and can effectively select the best substrate for implementing application communication requirements. The rest of this extended abstract is organized as follows. Section 2 presents relevant background and related approaches for ....

Lauria, M., Pakin, S., and Chien, A.A. Efficient layering for high speed communication: Fast Messages 2.x. In Proc. of the 7th High Performance Distributed Computing (HPDC7) conf., 1998.


Applying Patterns to Develop a Pluggable Protocols .. - Schmidt, O'Ryan.. (2000)   (1 citation)

.... I/O, shared memory pools, periodic I/O, and interface pooling, (4) enhancement of underlying communications protocols, e.g. provision of a reliable byte stream protocol over ATM, and (5) tight coupling between the ORB and efficient user space protocol implementations, such as Fast Messages [18]. 3.3.3 ORB Policy Control Component It is not possible to determine a priori all attributes defined by all protocols. Therefore, TAO's pluggable protocols framework provides an extensible policy control component, which implements the QoS framework defined in the CORBA Messaging [16] and ....

....that is tuned for high performance and real time application requirements. For example, TAO's pluggable protocols framework can be integrated with zero copy high speed network interfaces [23, 45, 20, 9], embedded systems [8], or high performance communication infrastructures like Fast Messages [18]. 7 Concluding Remarks To be an effective development platform for performance-sensitive applications, CORBA middleware must preserve end to end application QoS properties across the communication layer. It is essential, therefore, to define a pluggable protocols framework that allows custom ....

M. Lauria, S. Pakin, and A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x.," in Proceedings of the 7th High Performance Distributed Computing (HPDC7) conference, (Chicago, Illinois), July 1998.


Fast Heterogeneous Binary Data Interchange for.. - Plale, Eisenhauer.. (2000)

....components in order to change message formats can be a significant impediment to the integration, deployment and evolution of complex systems. In addition, the semantics of application side pack/unpack operations generally imply a data copy to or from message buffers. Such copies are known [16, 23] to have a significant impact on communication system performance. Packages which can perform internal marshalling, such as MPI, have an opportunity to avoid data copies and to offer more flexible semantics in matching fields provided by senders and receivers. However, existing packages have ....

Mario Lauria, Scott Pakin, and Andrew A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Efficient Wire Formats for High Performance Computing - Bustamante, Eisenhauer.. (2000)   (4 citations)

.... and receiver have the same natural data representation, such as in exchanges between homogeneous architectures, this approach allows received data to be used directly from the message buffer, making it feasible for middleware to effectively utilize high performance communication layers like FM [11] or the zero copy messaging demonstrated in [15, 17]. When conversion between formats is necessary, these DCG conversions are of the same order of efficiency as the compile time generated stub routines used by the fastest systems relying upon a priori agreements [12]. However, because the ....

....order to change message formats can be a significant impediment to the integration, deployment and evolution of complex systems. In addition, the semantics of application side pack/unpack operations generally imply a data copy to or from message buffers with a significant impact on performance [11, 15]. Packages which perform internal marshalling, such as MPI, could avoid data copies and offer more flexible semantics in matching fields provided by senders and receivers. However, existing packages have failed to capitalize on those opportunities. For example, MPI's type matching rules require ....

Mario Lauria, Scott Pakin, and Andrew A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Efficient Communication Using Message Prediction for.. - Afsahi, Dimopoulos (1999)

....hence, the communication subsystems at the receiving end still copy messages unnecessarily from the network interface to a system buffer, and then from the system buffer to the user buffer when the receiving application posts the receive call. Some researchers have tried to avoid memory copying [16, 24, 30, 5, 34, 33]. While they have been able to remove the memory copying between the application buffer space and the network interface at the sender side by using user level messaging techniques, they haven't been able to remove the memory copying at the receiver sides completely. They may achieve a zero copy ....

....posting to avoid the message copy in the fast socket buffer. If the message handler knows that the data's final memory destination is already known upon message arrival, the message is directly moved to the application user space. Otherwise, it has to be copied into the fast socket buffer. FM 2.x [24] uses a similar approach as fast sockets, namely layer interleaving. FM collaborates with the handler to direct the incoming messages into the destination buffer if the receive call has already been posted. Figure 1. Data transfers in a traditional messaging layer ....

M. Lauria, S. Pakin and A. A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x", Proceedings of the 7th High Performance Distributed Computing, HPDC7, Conference, 1998.


Fast Heterogeneous Binary Data Interchange - Eisenhauer, Daley (2000)   (4 citations)

....application components in order to change message formats can be a significant impediment to the integration, deployment and evolution of complex systems. In addition, the semantics of application side pack/unpack operations generally imply a data copy to/from message buffers. Such copies are known [8, 11] to have a significant impact on communication system performance. This paper describes PBIO (Portable Binary Input/Output) [3], a multi-purpose communication middleware. In developing PBIO we have not attempted to recreate various higher level communication abstractions offered by MPI or by the ....

Mario Lauria, Scott Pakin, and Andrew A. Chien. Efficient layering for high speed communication: Fast messages 2.x. In Proceedings of the 7th High Performance Distributed Computing (HPDC7), July 1998.


Applying Patterns to Design a High-performance.. - O'Ryan, Kuhns.. (1999)

.... I/O, shared memory pools, periodic I/O, and interface pooling, (4) enhancement of underlying communications protocols, e.g. provision of a reliable byte stream protocol over ATM, and (5) tight coupling between the ORB and efficient user space protocol implementations, such as Fast Messages [21]. 4.1.3 ORB Policy Control Component This component allows applications to control the QoS attributes of configured ORB transport protocols explicitly. It is not possible to determine a priori all attributes defined by all protocols. Therefore, TAO's pluggable protocols framework provides an ....

....that is tuned for high performance and real time application requirements. For example, TAO's pluggable protocols framework can be integrated with zero copy high speed network interfaces [29, 39, 7, 17], embedded systems [12], or high performance communication infrastructures like Fast Messages [21]. 7 Concluding Remarks To be an effective development platform for performance-sensitive applications, OO middleware must preserve communication layer QoS properties of applications end to end. It is essential, therefore, to define a pluggable protocols framework that allows custom inter ORB ....

[Article contains additional citation context not shown here]

M. Lauria, S. Pakin, and A. Chien, "Efficient Layering for High Speed Communication: Fast Messages 2.x.," in Proceedings of the 7th High Performance Distributed Computing (HPDC7) conference, (Chicago, Illinois), July 1998.


Porting MPICH ADI on GAMMA with Flow Control - Chiola, Ciaccio   (1 citation)

....error rate exhibited by 100Base-T star topologies adopting standard class 5 UTP cabling. The idea of associating acknowledgement packets to each data packet was therefore discarded. A credit based protocol was instead chosen for flow control purposes, similar to the one already adopted in FM2 [14]. Let us assume a 100 way GAMMA cluster, where each NIC may access a receive queue counting 1024 entries (1024 is a good choice, providing storage for 1024 full size incoming Ethernet packets with an acceptable memory occupancy of 1.5 MByte in kernel space). After setting the size of the NIC ....

M. Lauria, S. Pakin, and A. Chien. Efficient Layering for High Speed Communication: Fast Messages 2.x. In Proc. of the Seventh IEEE Int'l Symp. on High Performance Distributed Computing (HPDC-7), Chicago, Illinois, July 1998.
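
The receive-queue sizing quoted in the Chiola and Ciaccio context above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the 1536-byte slot size and the even split of queue entries among senders are assumptions made here, not details taken from the GAMMA or FM papers, and the function names are hypothetical.

```python
# Back-of-the-envelope check of the receive-queue figures quoted above.
# ASSUMPTIONS (not from the cited papers): each queue entry occupies a
# 1536-byte slot, enough for a full-size Ethernet frame, and the entries
# are divided evenly among all possible senders as credits.

QUEUE_ENTRIES = 1024   # receive-queue entries per NIC (from the excerpt)
CLUSTER_NODES = 100    # "100 way" cluster (from the excerpt)
SLOT_BYTES = 1536      # assumed slot size per queue entry


def queue_memory_bytes(entries: int, slot_bytes: int) -> int:
    """Kernel memory needed to back the whole receive queue."""
    return entries * slot_bytes


def credits_per_sender(entries: int, nodes: int) -> int:
    """Entries each remote sender may fill before it must wait for a refill."""
    senders = nodes - 1  # every node except the receiver itself
    return entries // senders


if __name__ == "__main__":
    mem = queue_memory_bytes(QUEUE_ENTRIES, SLOT_BYTES)
    print(f"queue memory: {mem / 2**20:.2f} MiB")
    print(f"credits per sender: {credits_per_sender(QUEUE_ENTRIES, CLUSTER_NODES)}")
```

Under these assumptions the queue occupies 1.5 MiB, which is consistent with the 1.5 MByte quoted in the excerpt, and an even split would leave each of the 99 possible senders with 10 credits.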


Design and Evaluation of an HPVM-based Windows NT.. - Chien, Lauria.. (1999)   (5 citations)  Self-citation (Lauria Pakin Chien)

....in other research projects exploring high performance communication. The implementation of the higher level APIs on top of FM (MPI [26], Global Arrays, Shmem [20]) turned into a study on efficient software layering, which resulted in 90% of the FM performance being delivered to the applications [27]. FM has been used as a testbed for research on novel mechanisms for coscheduling on network of workstations [36] and on the introduction of QoS guarantees in a wormhole routing interconnect [11]. HPVM is available for download on our web site at http://www-csag.ucsd.edu. 3 The Machine: HPVM ....

....to deliver high bandwidth (92 MB/s) with low overhead (a few μs) and low latency (9 μs), achieving a half power message size of 250 bytes while providing reliable and in order delivery, and flow control. HPVM also includes efficient implementations of standard scientific computing APIs (MPI [26, 27], Global Arrays, Shmem [20]) atop FM. Performance highlights include 13.3 μs min latency and 84.2 MB/s peak bandwidth for MPI FM, 14.4 μs min latency and 76 MB/s peak bandwidth for Shmem's shmem put operation. These APIs enable the easy porting and the high performance required to integrate the ....

Mario Lauria, Scott Pakin, and A. A. Chien. Efficient layering for high speed communication: Fast Messages 2.x. In Proceedings of High-Performance Distributed Computing Conference, 1998. Available from http://www-csag.cs.uiuc.edu/papers/hpdc7-lauria.ps.


Sandia Report - Unlimited Release Printed

No context found.

Lauria, M., S. Pakin, and A. Chien (1998). Efficient Layering for High Speed Communication: Fast Messages 2.x. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing.


High-Speed I/O: The Operating System as a Signalling Mechanism - Burnside, Keromytis (2003)

No context found.

M. Lauria, S. Pakin, and A. A. Chien. Efficient Layering for High Speed Communication: Fast Messages 2.x. In Proceedings of the 7th IEEE Symposium on High Performance Distributed Computing (HPDC), July 1998.

CiteSeer.IST - Copyright Penn State and NEC