22 citations found.
Yocum, K. G., Chase, J. S., Gallatin, A. J., and Lebeck, A. R. (1997). Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proc. of the Sixth IEEE International Symposium on High Performance Distributed Computing.


This paper is cited in the following contexts:
Building Firewalls with Intelligent Network Interface Cards - Friedman, Nagle (2001)   (6 citations)

....on TCP wrappers [39], which also runs on a host, uses pattern-based access control to network services. Previous work on intelligent I/O cards and NICs has focused on performance benefits [3]. Much of this work has used Myrinet NICs, which have a low-performance LANai processor and little memory [8, 16, 40]. The SPINE project [15] described offloading multimedia functionality to a processor on a NIC, although the focus is on safe code execution. The Auspex NFS server [18] handles NFS caching and communication on its Ethernet processors. The VMP network adapter board [22] discusses the possibility of ....

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: an exercise in low-latency messaging. IEEE International Symposium on High-Performance Distributed Computing (Portland, OR, 5-8 August 1997).


User-Level Communication in Cluster-Based Servers - Carrera, Rao, Iftode, Bianchini (2002)   (15 citations)

....way, the communicating parties can avoid intermediate copies of data, interrupts, and context switches in the critical path. The result of these optimizations is communication latency and bandwidth that approach the hardware limits. There has been extensive research in user-level communication, e.g., [35, 17, 27, 34, 7, 6, 36, 1, 16]. This research led to the proposal of the VIA [15, 13] industry standard by a group of leading computer companies. In VIA, each communicating process can open directly accessible interfaces to the network hardware. Each interface, called a Virtual Interface (VI), represents a communication ....

....large (96-node) clusters. 2.3 Related Work. The main contribution of this paper is to quantify the performance impact of user-level communication and several messaging characteristics on cluster-based servers. Previous studies of user-level communication were performed either for microbenchmarks [6, 35, 27, 36, 15, 10, 8, 32] or in the context of scientific applications [28, 23, 33]. This paper extends those studies to real, non-scientific applications, in particular to the now very popular cluster-based servers. As far as we are aware, in only one other paper [22] has the performance of a server been evaluated as a ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-Through Delivery in Trapeze: An Exercise in Low Latency Messaging. In Proceedings of the IEEE Symposium on High-Performance Distributed Computing, Portland, OR, August 1997.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....advances that would result in substantial improvements to messaging performance. 1. Introduction. This work considers the impact of expected architectural trends on messaging performance. In recent years, much research has focused on the design and implementation of specialized messaging systems [6, 36, 42, 45], reducing the fixed cost of sending and receiving messages to tens of instructions and a handful of bus operations. Increased performance, however, is typically gained only by trading off connectivity: the variety and number of potential entities a given messaging system can send to and receive ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-Through Delivery in Trapeze: An Exercise in Low Latency Messaging. In Symposium on High-Performance Distributed Computing (HPDC), Portland, OR, Aug. 1997.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....advances that would result in substantial improvements to messaging performance. 1 Introduction. This work considers the impact of expected architectural trends on messaging performance. In recent years, much research has focused on the design and implementation of specialized messaging systems [6, 36, 42, 45], reducing the fixed cost of sending and receiving messages to tens of instructions and a handful of bus operations. Increased performance, however, is typically gained only by trading off connectivity: the variety and number of potential entities a given messaging system can send to and receive ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-Through Delivery in Trapeze: An Exercise in Low Latency Messaging. In Symposium on High-Performance Distributed Computing (HPDC), Portland, OR, Aug. 1997.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)

....advances that would result in substantial improvements to messaging performance. 1 Introduction. This work considers the impact of expected architectural trends on messaging performance. In recent years, much research has focused on the design and implementation of specialized messaging systems [6, 36, 42, 45], reducing the fixed cost of sending and receiving messages to tens of instructions and a handful of bus operations. Increased performance, however, is typically gained only by trading off connectivity: the variety and number of potential entities a given messaging system can send to and receive ....

Yocum, K. G., Chase, J. S., Gallatin, A. J., and Lebeck, A. R. Cut-Through Delivery in Trapeze: An Exercise in Low Latency Messaging. In Symposium on High-Performance Distributed Computing (HPDC) (Portland, OR, Aug. 1997).


Software Distributed Shared Memory over Virtual Interface.. - And

.... on ideas similar to those of U-Net [12], virtual interfaces to the network from application device channels [8], and Virtual Memory Mapped Communication (VMMC) [9]. Other research that discusses user-level direct access to the network interface includes FM [27], AM [11], Hamlyn [34], PM [33], and Trapeze [36]. Prototype implementations of the VI Architecture have been developed on Myrinet and 100 Mb/s Ethernet. M-VIA [25] is a software emulation of VIA over various network interface cards, including Ethernet cards. Berkeley VIA [4] is an implementation of VIA over Myrinet. A performance study of VIA ....

K. Yocum. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the International Symposium on High Performance Distributed Computing, pages 243-252, 1997.


Realizing the Performance Potential of the Virtual.. - Speight, Abdel-Shafi, .. (1999)   (15 citations)

....of a user-level handler that is executed upon message arrival with the message body as an argument. This allows the programmer and compiler to overlap communication and computation, thereby hiding latency. Other work in the area of user-level communication includes PM [25], LFC [5], Trapeze [28], and BIP [20]. Several extant systems give the network interface access to virtual-to-physical address translation in order to facilitate user-level networking. In StarT [3], FLASH [17], and Typhoon [21], the network interface shares the TLB with the host processor. The Meiko CS-2 ....

K. Yocum, "Cut-through delivery in Trapeze: An exercise in low-latency messaging," presented at Proceedings of the International Symposium on High Performance Distributed Computing, pp. 243-252, 1997.


Performance Monitoring in a Myrinet-Connected Shrimp Cluster - Liao, Martonosi, Clark (1998)   (11 citations)

....these projects have studied application performance on their clusters, we know of no publications specifically describing performance tools for them, nor any using our firmware-based approach. Other work has used only microbenchmark and statistical methods [7] or high-level software measurements [28]. There are some other research projects on other programmable network interfaces [11, 24, 26]. They also study the placement of functionality between the host and the network interface. However, the primary focus of these projects is on reducing the software overhead of communication to achieve ....

K. Yocum, J. Chase, et al. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proc. 6th IEEE Intl. Symposium on High Performance Distributed Computing, Aug. 1997.


A Survey Of Messaging Software Issues And Systems.. - Weber, Santos..

....have recently been proposed and implemented for the Myrinet. We overview 9 of these MSAs and their various versions for a total of 13 systems: AM [7] and AM-II [15], FM [13] and FM 2.x [11], U-Net [1], BIP [14], PM [17] and PM 1.2 [18], VMMC [6] and VMMC-2 [5], BDM [9], MyriAPI [10], and Trapeze [21]. These MSAs differ substantially in terms of performance, allowing us to illustrate the impact of their design choices. The remainder of this paper is organized as follows. The next section describes the Myrinet hardware. Section 3 discusses the main issues involved in MSAs. Section 4 overviews ....

....performed by the kernel when a send or redirect command is posted and the buffer is not present in the UTLB. Since the UTLB is placed in host memory and transfers are effected by the NI DMA, the NI keeps a cache of each UTLB in its memory to speed up the UTLB lookup task. 4.7. Trapeze. Trapeze [21] was designed to support low-latency transfer of virtual memory pages by the operating system kernel. Each Trapeze message is composed of a 128-byte control message and an optional payload of up to 8 Kbytes (the page size). The control data is transferred by the host processor itself. The payload ....
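The message layout described in this excerpt can be modeled as a fixed-size control part plus an optional, bounded payload. The sketch below is a minimal illustration that assumes only the two sizes stated in the excerpt: a 128-byte control message and a payload of at most one 8 KB page. The class name `TrapezeMessage` and the method `total_bytes` are hypothetical. The sketch does not model how the host or the NIC moves either part, since the excerpt is cut off at that point.

```python
from dataclasses import dataclass
from typing import Optional

# Sizes stated in the survey excerpt; all names below are hypothetical.
CONTROL_BYTES = 128            # fixed-size control message
MAX_PAYLOAD_BYTES = 8 * 1024   # optional payload, at most one 8 KB page


@dataclass(frozen=True)
class TrapezeMessage:
    """Hypothetical model: fixed control message plus optional attached payload."""
    control: bytes
    payload: Optional[bytes] = None

    def __post_init__(self) -> None:
        # The control message always has the same fixed size.
        if len(self.control) != CONTROL_BYTES:
            raise ValueError(f"control must be exactly {CONTROL_BYTES} bytes")
        # The payload is optional but may not exceed one page.
        if self.payload is not None and len(self.payload) > MAX_PAYLOAD_BYTES:
            raise ValueError(f"payload exceeds {MAX_PAYLOAD_BYTES} bytes")

    def total_bytes(self) -> int:
        """Bytes carried by the message: control part plus any payload."""
        return CONTROL_BYTES + (len(self.payload) if self.payload is not None else 0)


if __name__ == "__main__":
    ctl = bytes(CONTROL_BYTES)
    print(TrapezeMessage(ctl).total_bytes())               # 128
    print(TrapezeMessage(ctl, bytes(4096)).total_bytes())  # 4224
```

Keeping the payload as a separate optional field mirrors the header splitting that other excerpts on this page attribute to Trapeze. Headers live in the control part, so code that rewrites headers never has to touch the page-sized payload.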

K. Yocum, J. Chase, A. Gallatin, and A. Lebeck. Cut-Through Delivery in Trapeze: An Exercise in Low Latency Messaging. In Proceedings of the 6th High Performance Distributed Computing Conference, August 1997.


PARNASS: Porting Gigabit-LAN components to a workstation cluster - Griebel, Zumbusch (1997)   (2 citations)

....special memory and memory-mapping OS calls is avoided. BullDog does not require OS calls for message passing. The system is not secure and is not multitasking-aware, because the user process controls the NIC. The syntax is very similar to some MPI primitives. The Trapeze message passing system [YCGL97] has been designed mainly for the fast transfer of memory pages. It is used in the implementation of a global memory management service, which extends an OS with remote paging and cooperative caching services. Such services may substitute full virtual shared memory support for a parallel ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. Technical report, Dept. of Computer Science, Duke Univ., Durham, NC, 1997.


Design Issues for User-Level Network Interface Protocols on.. - Bhoedjang, Rühl, Bal (1998)   (2 citations)

....interrupts and polling. Interrupts usually can be enabled or disabled by the receiver; sometimes the sender can also set a flag in each packet that determines whether an interrupt is to be generated when the packet arrives. To reduce the number of interrupts, LFC has implemented a polling watchdog [9] on the NI. This mechanism starts a timer when a message ....
[Figure: classification of U-Net, Trapeze, Hamlyn, FM, FM/MC, VMMC, LFC, AM-II, VMMC-2, PM, and BIP by reliability strategy, reliability protocol, and locus of flow control.]

....the programmable NI processor. The latter capability gives NIs much flexibility, which often compensates for the lack of hardware support present in the more advanced interfaces used by Massively Parallel Processors (MPPs). We have seen several examples in the paper: • The polling watchdog device [9] can easily be implemented in the NI control program, as shown by LFC. • Address translation can be implemented in software on the NI and the host (OS), giving functionality comparable to the address translation hardware of machines like the Meiko CS-2. • Programmable NIs enable efficient software ....

[Article contains additional citation context not shown here]

K. Yocum, J. Chase, A. Gallatin, and A. Lebeck. Cut-Through Delivery in Trapeze: An Exercise in Low-Latency Messaging. In The 6th Int. Symp. on High Performance Distributed Computing, Portland, OR, August 1997.


Modeling Communication Pipeline Latency - Wang, Krishnamurthy, Martin.. (1998)   (8 citations)

....AN2 network and DEC Alpha workstations [6]. Using the GMS pipeline parameters derived from that work (Table 3), we were able to confirm its optimal fragment size by applying the Fixed-sized Theorem. In Trapeze, the network interface implements pipelining in a manner that is transparent to the host [17]. Although successful for the specific packet sizes and configurations studied, these systems do not provide a general solution.

Table 3: GMS pipeline parameters.

stage i    g_i (µs)   G_i (µs/KB)
Srv DMA    2.1        25.6
Wire       4.0        60.1
Req DMA    2.1        25.6
Req CPU    92.8       26.2

The bottleneck shifts from the Req CPU ....
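The Table 3 parameters can be plugged into a simple fixed-size fragmentation model. This sketch makes three assumptions. First, g_i is a fixed per-fragment cost and G_i is a per-KB cost for stage i, which is how the column units read. Second, a message split into n equal fragments finishes after the first fragment passes through every stage, plus (n - 1) intervals of the slowest stage. Third, the best size is found by brute-force search over a few candidate sizes. This is a generic pipeline model, not the paper's Fixed-sized Theorem. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Table 3 (GMS pipeline parameters): (stage, g_i in µs, G_i in µs per KB).
# Assumption: g_i is a fixed per-fragment cost and G_i a per-KB cost.
STAGES: List[Tuple[str, float, float]] = [
    ("Srv DMA", 2.1, 25.6),
    ("Wire", 4.0, 60.1),
    ("Req DMA", 2.1, 25.6),
    ("Req CPU", 92.8, 26.2),
]


def stage_times(frag_kb: float) -> List[float]:
    """Time (µs) each stage spends on one fragment of frag_kb kilobytes."""
    return [g + G * frag_kb for _, g, G in STAGES]


def pipeline_latency(msg_kb: float, frag_kb: float) -> float:
    """Latency (µs) of a msg_kb message split into equal frag_kb fragments.

    Generic pipeline model: the first fragment passes through every stage,
    then each remaining fragment adds one interval of the slowest stage.
    """
    n_frags = round(msg_kb / frag_kb)
    times = stage_times(frag_kb)
    return sum(times) + (n_frags - 1) * max(times)


def bottleneck(frag_kb: float) -> str:
    """Name of the slowest stage for a given fragment size."""
    times = stage_times(frag_kb)
    return STAGES[times.index(max(times))][0]


def best_fragment(msg_kb: float, candidates: Sequence[float]) -> Tuple[float, float]:
    """Brute-force search: the (fragment size, latency) pair with lowest latency."""
    return min(((s, pipeline_latency(msg_kb, s)) for s in candidates),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    sizes = [0.5, 1.0, 2.0, 4.0, 8.0]
    for s in sizes:
        print(f"{s:4.1f} KB fragments: {pipeline_latency(8.0, s):7.1f} µs "
              f"(bottleneck: {bottleneck(s)})")
    print("best:", best_fragment(8.0, sizes))
```

Under this model, an 8 KB message does best with 2 KB fragments among the candidates tried. The bottleneck is the Wire stage for 4 KB and 8 KB fragments. At 2 KB and below it moves to Req CPU, whose large fixed cost g dominates once fragments become small.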

Yocum, K. G., Chase, J. S., Gallatin, A. J., and Lebeck, A. R. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proc. of the Sixth IEEE International Symposium on High Performance Distributed Computing (August 1997).


Experiences with Fast Forwarding on Myrinet - Yocum, Chase   Self-citation (Yocum Chase)

....checks, a complete discussion is beyond the scope of this paper, and the following assumes that only page-sized, page-aligned buffers are cached on the NIC. 3.1 The Trapeze Payload Cache. Trapeze supports simple header splitting by sending fixed-size control messages with optional attached payloads [11]; with the payload clearly separated from the control message, the payload cache implementation is simple. The control information is received and sent normally; packet headers fit in the control message and are easily modified. The control message's attached payload is cached on the NIC. The ....

....host-based forwarding latency, by removing a transfer across the I/O interconnect. Figure 8 shows the effect of removing an I/O transfer while using a LANai 9 to forward between two LANai 7s on Trapeze. While cut-through delivery is used to minimize latency for the point-to-point transfers in Trapeze [11], the forwarder imposes a significant store-and-forward delay. Payload caching reduces this forwarding delay by 15 to 20%.
[Figure 8: User-level forwarding latency with Trapeze using the LANai 9 on a 32-bit 33 MHz PCI .... (latency in microseconds; series: payload caching, forwarding)]

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Statement of Research and Teaching Progress - Chase (2001)   Self-citation (Chase)

....prototype became operational with GMS in the spring of 1997. Trapeze established new performance standards for page transfer latency and delivered network bandwidth in workstation clusters. It served as a basis for experimentation with novel NIC techniques and related OS structures [36, 33, 1, 4, 12, 34], as well as subsequent research on GMS [1, 32]. In the course of this research, we earned a patent for self-tuning NIC features to balance transfer latency and bandwidth, and produced open-source OS extensions now in wide use, including high-speed network drivers, FreeBSD Alpha platform support, ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Network I/O with Trapeze - Chase, Anderson, Gallatin, Lebeck..   Self-citation (Yocum Chase Gallatin Lebeck)

....to achieve high bandwidth is to use large transfers, reducing per-transfer overheads. On the other hand, a key technique for achieving low latency for large packets is to fragment each message and pipeline the fragments through the network, overlapping transfers on the network links and I/O buses [16, 14]. Since it is not possible to do both at once, systems must select which strategy to use. Table 1 shows the effect of this choice on Trapeze latency and bandwidth for 8 KB payloads, which are typical of block I/O traffic. The first two columns show measured one-way latency and bandwidth using ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, Aug. 1997.


Trapeze/IP: TCP/IP at Near-Gigabit Speeds - Gallatin, Chase, Yocum   (28 citations)  Self-citation (Yocum Chase Gallatin)

....and Myricom. tified. In most cases, published performance results are based on research prototypes using previous-generation technology. This paper presents experiences with high-speed TCP/IP networking on a gigabit-per-second Myrinet network [3]. Our work is based on the Trapeze messaging system [10, 5, 1, 9], which consists of a messaging library and custom firmware for Myrinet. Using Trapeze firmware, Myrinet delivers communication performance at the limit of I/O bus speeds on many platforms, closely approaching the full gigabit-per-second wire speed on the most powerful hosts. This makes ....

....endpoint for kernel-based TCP/IP networking. us to run TCP with MTUs of 64 KB or larger, yielding very low per-packet overheads in the networking code. • Adaptive message pipelining. The Trapeze firmware pipelines DMA transfers on the I/O bus and network link to minimize large-packet latency [10]. The pipelining scheme adaptively reverts to larger unpipelined DMA transfers in bandwidth-constrained scenarios [9]. This technique enables Trapeze to combine low large-packet latencies with high bandwidth under load. One item missing from this list is interrupt suppression. Handling of ....

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Potentials and Limitations of Fault-Based Markov.. - Bartels, Karlin.. (1999)   (1 citation)  Self-citation (Chase)

....and cooperative network memory systems [2, 5] offers a new opportunity for prefetching. When network memory is used as backing store instead of disk, the cost of transferring (or prefetching) a page over a gigabit network is low, perhaps 50 times lower than the cost of a disk fetch [6]. However, this reduction in I/O latency is a two-edged sword: (1) it reduces the cost of a prediction error, thereby increasing the potential for speculative prefetching (i.e., prefetching without future knowledge), but (2) it reduces the total I/O stall time for the program, thereby also ....

....prefetching. The Markov predictor, the best predictor for this application, had a prediction accuracy of 80% for wave5. We ran the experiment on a small network of 600 MHz DEC Alpha 21164 processors, connected by a 1 Gb/s Myrinet [1] switched network running Trapeze software and firmware [6] to accelerate page transfers. A page fault from remote memory in PGMS on this configuration takes less than 200 µs. In this environment, we measured only a 6% speedup for wave5. Given the parameters for this configuration and wave5, our model predicts a maximum speedup of 10%. However, ....

K. Yocum, J. Chase, A. Gallatin, and A. R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proc. of the IEEE International Symp. on High Performance Distributed Computing, 1997.


A Case for Buffer Servers - Anderson, Yocum, Chase (1999)   (3 citations)  Self-citation (Yocum Chase)

.... work deals with issues at the network, storage, operating system, and distributed system levels, combining I/O prefetching techniques [25], network memory management, low-overhead storage access interfaces in the operating system kernel, careful handling of data movement in the network interface [28, 27], and tighter integration of the network interface with the kernel I/O subsystem [2]. At the core of our approach is a network memory system (based on the Global Memory Service [9]), which defines a basic protocol and mechanisms for moving, caching, and locating pages and blocks in the network. ....

K. G. Yocum, J. S. Chase, A. J. Gallatin, and A. R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, Aug. 1997.


Adaptive Message Pipelining for Network Memory and.. - Yocum, Anderson.. (1998)   Self-citation (Yocum Chase Gallatin Lebeck)

....to evaluate its performance benefits for network storage and network memory systems.³ We find that this simple technique offers compelling benefits over pipelining strategies recently proposed in other work (see Section 2.4). In particular, it produces near-optimal pipeline schedules automatically, because it naturally adapts to different G and g values at ....
³ Note to reviewers: A previous paper [20] describes an early version of the Trapeze cut-through delivery technique, developed concurrently with the earliest work on Myrinet message pipelining by the Fast Messages group at UIUC.

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Cheating the I/O Bottleneck: Network Storage with Trapeze/Myrinet - Anderson, al. (1998)   (2 citations)  Self-citation (Yocum Chase Gallatin)

....network memory systems, and other distributed OS services that cooperatively share data across the cluster. Our broad goal is to use the power of the network to cheat the I/O bottleneck for data-intensive computing on workstation clusters. This paper describes use of the Trapeze messaging system [27, 5] for high-speed data transfer in a network memory system, the Global Memory Service (GMS) [14, 18]. Trapeze is a firmware program for Myrinet PCI adapters, and an associated messaging library for DEC AlphaStations running Digital Unix 4.0 and Intel platforms running FreeBSD 2.2. Trapeze ....

....stall time is determined primarily by the time to transfer the requested page on the network. On the other hand, bursts of page transfers (e.g., for read-ahead during sequential access) require high bandwidth. The Trapeze firmware employs a message pipelining technique called cut-through delivery [27] to balance low payload latency with high bandwidth under load. With this technique, the one-way raw Trapeze latency for a 4 KB page transfer is 70 µs on 300 MHz Pentium II 440LX systems with LANai 4.1 M2M-PCI32 Myrinet adapters. On these systems, Trapeze delivers 112 MB/s for a stream of 8 KB payloads; ....

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Implementing Cooperative Prefetching and Caching.. - Voelker.. (1998)   (31 citations)  Self-citation (Chase)

....attached to a full-crossbar SW8 Myrinet switch. The disks on which all experiments are performed are 7200 RPM ST32171W Seagate Barracuda drives. Pages and file blocks are 8 KB, and reading a random 8 KB page from disk takes an average of 13 ms. For optimum network performance, we used Trapeze [31, 2] firmware for the Myrinet adapters. Trapeze uses an adaptive message pipelining technique called cut-through delivery [31] to minimize transfer latencies on the network in a manner similar to GMS subpages [20]. Using Trapeze, GMS can perform an 8 KB page fault from remote memory in 165 µs on ....

....Seagate Barracuda drives. Pages and file blocks are 8 KB, and reading a random 8 KB page from disk takes an average of 13 ms. For optimum network performance, we used Trapeze [31, 2] firmware for the Myrinet adapters. Trapeze uses an adaptive message pipelining technique called cut-through delivery [31] to minimize transfer latencies on the network in a manner similar to GMS subpages [20]. Using Trapeze, GMS can perform an 8 KB page fault from remote memory in 165 µs on platforms capable of delivering the full bandwidth of the 33 MHz, 32-bit PCI bus. The Alcor is limited to 66 MB/s in the receiving ....

Kenneth G. Yocum, Jeffrey S. Chase, Andrew J. Gallatin, and Alvin R. Lebeck. Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proceedings of the Sixth IEEE International Symposium on High Performance Distributed Computing (HPDC-6), pages 243-252, August 1997.


Improving the I/O Performance and Correctness of Network File.. - Wang (1999)

No context found.

Yocum, K. G., Chase, J. S., Gallatin, A. J., and Lebeck, A. R. (1997). Cut-through delivery in Trapeze: An exercise in low-latency messaging. In Proc. of the Sixth IEEE International Symposium on High Performance Distributed Computing.


CiteSeer.IST - Copyright Penn State and NEC