Felten, E.W., Alpert, R.D., Bilas, A., Blumrich, M.A., Clark, D.W., Damianakis, S.N., Dubnicki, C., Iftode, L., Li, K.: Early experience with message-passing on the SHRIMP multicomputer. In: Proc. 23rd Symp. on Computer Architecture. (1996) 296--307
.... system area networks, called Virtual Interface Architecture (VIA) [2], which in turn was heavily inspired by the university research in user-level and memory-mapped communication [3] [4]. Programmable device controllers that incorporate a powerful processor and memory and can execute sophisticated I/O protocols have also been studied. In particular, programmable disk controllers have been proposed to offload the host CPU and reduce I/O communication [5] [6] [7] ....
.... disk controllers have been proposed to offload the host CPU and reduce I/O communication [5] [6] [7]. Programmable network interfaces have been in the marketplace for a while [8], but they have been studied almost solely as support for cluster interconnects in distributed shared memory [4] or distributed file systems [9]. Despite this extensive body of research, two major avenues that can decisively define the architecture of the next generation of high-performance network servers have not been addressed. First, there has been no research to investigate a server archi ....
Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li, "Early experience with message-passing on the shrimp multicomputer," in Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....was not favourable for such systems. Intelligent devices have been shown to be a promising innovation for servers, especially in the case of storage systems [16, 1, 9]. Intelligent network interfaces [30] have also been studied, but mostly for cluster interconnects in distributed shared memory [15] or distributed file systems [5]. Recently released network interface cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [3, 25, 2, 11, 14, 40, 19]. Some of these cards also provide support to offload networking protocol processing for network ....
Felten, E. W., Alpert, R. D., Bilas, A., Blumrich, M. A., Clark, D. W., Damianakis, S., Dubnicki, C., Iftode, L., and Li, K. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture (May 1996).
....in its own send queue with an appropriate destination identifier. Such messaging systems also try to minimize buffer copying, which contributes to a major part of the overhead. Examples of such messaging systems are Active Messages [19, 43], U-Net [42], Fast Messages (FM) [25], and SHRIMP [2, 8]. Let us examine how these lightweight messaging systems achieve message transfer. An application is typically linked to a communication library in the host, and a portion of the host memory is allocated for DMA to and from the network interface. A typical message transfer in these systems is ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.
....traffic elimination from the host CPU and memory subsystem while allowing frame producers to transfer frames to schedulers. 5. Related Work A number of NI-based research projects have focused on providing low-latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [8, 16, 23, 22] using intelligent NIs equipped with programmable co-processors [4, 11, 15, 19]. Our DVCM communication machine implementation on FORE SBA 200 (i960CA) cards allows run-time extension of NI functionality and enables computation directly on the NI [19]. The SPINE project at the University of ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.
....Figure 17: Multicast latency versus number of destinations for three different communication start-up times: 5.0, 10.0, and 20.0 microseconds. Currently researchers are exploring multiple directions to design efficient network interface architectures [21, 43] and messaging layers [13, 29, 44, 45] to reduce communication start-up time. In this context the current results indicate that message contention in multicast will gradually dominate with reduction in communication start-up time. Thus, algorithms like the CCO hold great promise for implementing multicast with reduced latency in ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.
....message in its own send queue with an appropriate destination identifier. Such messaging systems also try to minimize buffer copying, which contributes to a major part of the overhead. Examples of such messaging systems are Active Messages [23, 51], U-Net [50], Fast Messages (FM) [29], and SHRIMP [3, 10]. Let us examine how these lightweight messaging systems achieve message transfer. An application is typically linked to a communication library, and a portion of the host memory is allocated for DMA to and from the network interface. A typical message transfer in these systems is done in the ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.
....target commodity workstation clusters. Several recent research projects have explored the benefits of using programmable network interfaces provided by current gigabit networks [6, 3]. These benefits include the lower message overhead possible when interfaces are directly accessible at user level [7, 10, 21], the lower large-message latency possible when interfaces fragment and pipeline data transfers between host memory and the network [4, 8, 22], the higher throughput possible when fragmentation and pipelining are adaptive to message size [18, 20], and the lower overheads possible when interfaces ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the Shrimp multicomputer. In Proc. of the 23rd International Symposium on Computer Architecture, May 1996.
....streams fairly. This has already been demonstrated by us in [32, 31] for the case of a host-based scheduler implementation. 5 Related Work A number of NI-based research projects have focused on providing low-latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [7, 9, 17, 28, 27]. The network interfaces used in many cluster interconnects are intelligent and equipped with programmable co-processors [5, 12, 16, 22]. This makes it an attractive target to offload certain host tasks to allow tighter integration between computation and communication. Our DVCM communication ....
Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.
....traffic elimination from the host CPU and memory subsystem while allowing frame producers to transfer frames to schedulers. 5 Related Work A number of NI-based research projects have focused on providing low-latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [10, 19, 29, 28] using intelligent NIs equipped with programmable co-processors [6, 13, 18, 22]. The network interfaces used in many cluster interconnects are intelligent and equipped with programmable co-processors [6, 13, 18, 26]. This makes it an attractive target to offload certain host tasks to allow tighter ....
Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.
....Science The Ohio State University 2015 Neil Ave. Columbus, OH 43210 Contact Author: M. Banikazemi (banikaze@cis.ohio-state.edu) 1. Introduction Cluster computing is becoming increasingly popular for providing cost-effective and affordable parallel computing for day-to-day computational needs [2, 11, 16]. Such environments consist of clusters of workstations connected by Local Area Networks (LANs). The possibility of the incremental expansion of clusters by incorporating new generations of computing nodes and networking technologies is another factor contributing to the popularity of cluster ....
E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. International Symposium on Computer Architecture (ISCA), 1996.
No context found.
E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proc. 23rd Annual Symposium on Computer Architecture (ISCA), May 1996.
No context found.
E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....a powerful tool in designing systems for automated monitoring, healing, recovery and repair. Fortunately, the tool that makes this idea possible already exists. Remote memory communication (RMC) is a technology originally developed to lower the overhead of communication by reducing OS involvement [2, 11]. In addition to low-overhead send/receive operations that require no OS intervention, RMC provides remote DMA (RDMA) primitives that allow external access to the memory of a host, without using its CPU. RDMA read and write primitives are present in industrial RMC standards like VIA [10] ....
....host CPU is still involved in such transfers. In contrast, a remote DMA operation completely bypasses the CPU on the remote host. For this, the remote NIC performs a silent DMA to/from the host memory. RDMA write is the most common RMC operation, supported by practically all RMC implementations [11, 10, 15]. With RDMA write, the sender can write into a remote memory buffer without remote CPU intervention. The completion of the RDMA write can be determined by checking a completion queue in the network interface or through an application-specific flag in the area to be updated. RDMA read is a more ....
E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....evaluated. 2.3 New I/O Technology Intelligent devices have been shown to be a promising innovation for servers, especially in the case of storage systems [27, 1, 10]. Intelligent Network Interfaces [39] have also been studied, but mostly for cluster interconnects in distributed shared memory [26] or distributed file systems [4]. Recently released Network Interface Cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [3, 35, 2, 19, 24, 53, 32]. Some of these cards also provide support to offload network protocol processing for network ....
FELTEN, E. W., ALPERT, R. D., BILAS, A., BLUMRICH, M. A., CLARK, D. W., DAMIANAKIS, S., DUBNICKI, C., IFTODE, L., AND LI, K. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture (May 1996).
....goals of solving the bandwidth and CPU bottlenecks which occur when other solutions such as IP Tunneling or bridging are used to connect InfiniBand Fabrics to TCP/IP networks. Intelligent network interfaces [25] have been studied, but mostly for cluster interconnects in distributed shared memory [16] or distributed file systems [3]. Recently released network interface cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [1, 2, 11, 15, 18, 33]. Some of these cards also provide support to offload networking protocol processing for network attached ....
Felten, E. W., Alpert, R. D., Bilas, A., Blumrich, M. A., Clark, D. W., Damianakis, S., Dubnicki, C., Iftode, L., and Li, K. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium
....network bandwidth (Table 3). As a consequence, a roundtrip communication for either a page or lock transfer is at best on the order of a millisecond. Current network technologies [6, 13, 7] as well as aggressive software for fast interrupts, exceptions [30] and virtual memory mapped communication [10, 11] have brought such latencies down significantly to the neighborhood of a couple of microseconds. An interesting question is to what extent our results are specific to the Paragon architecture and how they would be affected by different architectural parameters. Fast interrupts and low-latency messages ....
E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
No context found.
E Felten, R Alpert, A Bilas, M Blumrich, D W Clark, S Damianakis, C Dubnicki, L Iftode, and K Li. Early Experience with Message-Passing on the Shrimp Multicomputer. In International Symposium on Computer Architecture XXIII, 1996.
....directly to memory. Hence, there is no explicit receive operation. CPU involvement in receiving data can be as little as checking a flag, although a hardware notification mechanism is also supported. Numbers for the latency and bandwidth delivered by the SHRIMP VMMC layer can be found in [13]. Notifications The notification mechanism is used to transfer control to a receiving process, or to notify the receiving process about external events. It consists of a message transfer followed by an invocation of a user-specified, user-level handler function. The receiving process can ....
E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S.N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of 23rd International Symposium on Computer Architecture, May 1996, pages 296--307.
....(VMMC) [25] is a communication model that provides direct data transfers between the sender's and receiver's virtual address spaces. This section provides a high-level overview of VMMC as implemented on Myrinet hardware. The model has been designed and implemented for the SHRIMP multicomputer [16, 14, 32, 25]. Since the SHRIMP network interface supports VMMC mostly in hardware, the implementation requires either no software overhead or only a few user-level instructions to transfer data between the separate virtual address spaces of two machines on a network. In short, VMMC on the customized network ....
....address spaces of two machines on a network. In short, VMMC on the customized network interface of SHRIMP has somewhat better performance, at the cost of more operating system modifications and substantially reduced flexibility. VMMC provides support for protected, user-level message passing [25, 32]. The main idea is to allow data to be transmitted directly from a source virtual memory to a destination virtual memory. For messages that pass data without passing control, the VMMC approach can completely eliminate software overheads associated with message reception. The VMMC model eliminates ....
E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....reflect the initial state of the implementation; we expect the performance to improve as we tune the prototype to fit our hardware and software environment. 3.1 Apparatus We took performance measurements of the shared logical disk running on two nodes of the Princeton SHRIMP multicomputer [5, 10]. SHRIMP consists of a number of ordinary Linux Pentium PCs connected by an Intel Paragon backplane. SHRIMP uses hardware support to provide protected, low-latency, user-level communication. The SHRIMP hardware has a raw user-to-user latency of about four microseconds; the SHRIMP stream sockets ....
Edward W. Felten, Richard Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early experience with message-passing on the SHRIMP multicomputer. In Proceedings of the 23rd International Symposium on Computer Architecture, 1996. To appear.
....and lower cost. With the support of such high-performance interconnection networks, multiple SHV servers can be connected to form a powerful supercomputing environment. In the past, various fast messaging mechanisms for clusters have been proposed, such as AM [4], FM [5], UNet [6], VMMC [13], and BIP [7]. These mechanisms have been ported to Fast Ethernet, ATM or Myrinet. Recently several prototype cluster communication systems using Gigabit networking have been built. For example, Berkeley's Linux VIA [9] is a high-performance implementation of the Virtual Interface Architecture ....
E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li, "Early Experience with Message-passing on the Shrimp Multicomputer", Proc. of the 23rd Annual Symposium on Computer Architecture, 1996.
....interrupts In addition to answering these questions, we discuss other lessons learned, including some things that consumed much of our design time, yet turned out not to matter. 2 The SHRIMP System The architecture of the SHRIMP system has been described in several previous publications [10, 11, 12, 22], notably [9], and will only be described in as much detail as necessary here. Specific details of the architecture and implementation will be described more thoroughly throughout this paper. 2.1 Architecture The SHRIMP system consists of sixteen PC nodes connected by an Intel routing ....
Edward W. Felten, Richard Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos N. Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the Shrimp Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, pages 296--307, May 1996.
No context found.
E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.
....sockets library. Experiments show that a hybrid spin-then-block strategy offers good performance in a wide variety of situations, and that speeding up the interrupt path significantly improves performance. 1. Introduction Many network interfaces can place incoming data directly in user memory [1, 7, 3, 2]. This capability enables the construction of very efficient network software since the network interface can deliver a burst of packets without any software intervention. On such an architecture, communication can be handled entirely in a user-level library. In message passing systems, software ....
....and be awakened later by an interrupt. This requires both a policy for when to poll and when to block, and a mechanism for efficient blocking. This paper considers the questions of which receive policy and which mechanism to use. We present an implementation on the prototype SHRIMP multicomputer [1, 7], and the results of experiments using our user-level sockets library [5] for micro-benchmarks, larger benchmarks, and for a distributed file system. Our results show that a hybrid spin-block policy is best in a wide range of situations, and that reducing the interrupt service overhead ....
[Article contains additional citation context not shown here]
E. W. Felten, R. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of 23rd International Symposium on Computer Architecture, May 1996.
No context found.
Felten, E.W., Alpert, R.D., Bilas, A., Blumrich, M.A., Clark, D.W., Damianakis, S.N., Dubnicki, C., Iftode, L., Li, K.: Early experience with message-passing on the SHRIMP multicomputer. In: Proc. 23rd Symp. on Computer Architecture. (1996) 296--307