44 citations found.
Felten, E.W., Alpert, R.D., Bilas, A., Blumrich, M.A., Clark, D.W., Damianakis, S.N., Dubnicki, C., Iftode, L., Li, K.: Early experience with message-passing on the SHRIMP multicomputer. In: Proc. 23rd Symp. on Computer Architecture. (1996) 296--307

This paper is cited in the following contexts:

Impact of Next-Generation I/O Architectures on the .. - Carrera.. (2002)   (1 citation)  (Correct)

.... system area networks, called Virtual Interface Architecture (VIA) [2], which in turn was heavily inspired by the university research in user-level and memory-mapped communication [3] [4]. Programmable device controllers that incorporate a powerful processor and memory and can execute sophisticated I/O protocols have also been studied. In particular, programmable disk controllers have been proposed to offload the host CPU and reduce I/O communication [5] [6] [7] .... [Author footnote in source: Department of Computer Science, Rutgers University, Email: vinicio,muralir,ricardob,ifV28 cs.rutgers.edu]

.... disk controllers have been proposed to offload the host CPU and reduce I/O communication [5] [6] [7]. Programmable network interfaces have been in the marketplace for a while [8], but they have been studied almost solely as support for cluster interconnects in distributed shared memory [4] or distributed file systems [9]. Despite this extensive body of research, two major avenues that can decisively define the architecture of the next generation of high performance network servers have not been addressed. First, there has been no research to investigate a server archi ....

Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li, "Early experience with message-passing on the shrimp multicomputer," in Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


TCP Servers: Offloading TCP Processing in Internet .. - Rangarajan.. (2002)   (3 citations)  (Correct)

....was not favourable for such systems. Intelligent devices have been shown to be a promising innovation for servers, especially in the case of storage systems [16, 1, 9]. Intelligent network interfaces [30] have also been studied, but mostly for cluster interconnects in distributed shared memory [15] or distributed file systems [5]. Recently released network interface cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [3, 25, 2, 11, 14, 40, 19]. Some of these cards also provide support to offload networking protocol processing for network ....

Felten, E. W., Alpert, R. D., Bilas, A., Blumrich, M. A., Clark, D. W., Damianakis, S., Dubnicki, C., Iftode, L., and Li, K. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture (May 1996).


Where to Provide Support for Efficient Multicasting.. - Sivaram, Kesavan.. (1998)   (3 citations)  (Correct)

....in its own send queue with an appropriate destination identifier. Such messaging systems also try to minimize buffer copying which contributes to a major part of the overhead. Examples of such messaging systems are Active Messages [19, 43], U-Net [42], Fast Messages (FM) [25], and SHRIMP [2, 8]. Let us examine how these lightweight messaging systems achieve message transfer. An application is typically linked to a communication library in the host, and a portion of the host memory is allocated for DMA to and from the network interface. A typical message transfer in these systems is ....

E. W. Felten, R. A. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.


A Network Co-processor-Based Approach to Scalable.. - Krishnamurthy.. (2000)   (3 citations)  (Correct)

....traffic elimination from the host CPU and memory subsystem while allowing frame producers to transfer frames to schedulers. 5. Related Work. A number of NI based research projects have focused on providing low latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [8, 16, 23, 22] using intelligent NIs equipped with programmable CoProcessors [4, 11, 15, 19]. Our DVCM communication machine implementation on FORE SBA 200 (i960CA) cards allows run time extension of NI functionality and enables computation directly on the NI [19]. The SPINE project at the University of ....

E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.


Efficient Multicast on Irregular Switch-based Networks with.. - Kesavan, Panda   (Correct)

.... Figure 17: Multicast latency versus number of destinations for three different communication start up times: 5.0, 10.0, and 20.0 microseconds. Currently researchers are exploring multiple directions to design efficient network interface architectures [21, 43] and messaging layers [13, 29, 44, 45] to reduce communication start up time. In this context the current results indicate that message contention in multicast will gradually dominate with reduction in communication start up time. Thus, algorithms like the CCO hold great promise for implementing multicast with reduced latency in ....

E. W. Felten, R. A. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.


Architectural Support for Efficient Multicasting in.. - Sivaram, Kesavan.. (2001)   (Correct)

....message in its own send queue with an appropriate destination identifier. Such messaging systems also try to minimize buffer copying which contributes to a major part of the overhead. Examples of such messaging systems are Active Messages [23, 51], U-Net [50], Fast Messages (FM) [29], and SHRIMP [3, 10]. Let us examine how these lightweight messaging systems achieve message transfer. An application is typically linked to a communication library, and a portion of the host memory is allocated for DMA to and from the network interface. A typical message transfer in these systems is done in the ....

E. W. Felten, R. A. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In International Symposium on Computer Architecture (ISCA), pages 296--307, 1996.


Using Embedded Network Processors to Implement Global.. - Coady, Ong, Feeley (1999)   (6 citations)  (Correct)

....target commodity workstation clusters. Several recent research projects have explored the benefits of using programmable network interfaces provided by current gigabit networks [6, 3]. These benefits include the lower message overhead possible when interfaces are directly accessible at user level [7, 10, 21], the lower large-message latency possible when interfaces fragment and pipeline data transfers between host memory and the network [4, 8, 22], the higher throughput possible when fragmentation and pipeline are adaptive to message size [18, 20], and the lower overheads possible when interfaces ....

E. W. Felten, R. D. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the Shrimp multicomputer. In Proc. of the 23rd International Symposium on Computer Architecture, May 1996.


A Network Co-processor-Based Approach to Scalable.. - Krishnamurthy.. (2000)   (3 citations)  (Correct)

....streams fairly. This has already been demonstrated by us in [32, 31] for the case of a host based scheduler implementation. 5 Related Work. A number of NI based research projects have focused on providing low latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [7, 9, 17, 28, 27]. The network interfaces used in many cluster interconnects are intelligent and equipped with programmable co processors [5, 12, 16, 22]. This makes it an attractive target to offload certain host tasks to allow tighter integration between computation and communication. Our DVCM communication ....

Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.


A Network Co-Processor-Based Approach to Scalable.. - Krishnamurthy.. (2000)   (3 citations)  (Correct)

....traffic elimination from the host CPU and memory subsystem while allowing frame producers to transfer frames to schedulers. 5 Related Work. A number of NI based research projects have focused on providing low latency message passing over cluster interconnects like ATM, Myrinet, FDDI and HIPPI [10, 19, 29, 28] using intelligent NIs equipped with programmable CoProcessors [6, 13, 18, 22]. The network interfaces used in many cluster interconnects are intelligent and equipped with programmable co processors [6, 13, 18, 26]. This makes it an attractive target to offload certain host tasks to allow tighter ....

Edward W. Felten, Richard D. Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of the 23rd International Symposium on Computer Architecture, May 1996.


Profile-Based Load Balancing for Heterogeneous Clusters - Banikazemi, Prabhu..   (Correct)

....Science, The Ohio State University, 2015 Neil Ave., Columbus, OH 43210. Contact Author: M. Banikazemi (banikaze@cis.ohio-state.edu). 1. Introduction. Cluster computing is becoming increasingly popular for providing cost effective and affordable parallel computing for day to day computational needs [2, 11, 16]. Such environments consist of clusters of workstations connected by Local Area Networks (LANs). The possibility of the incremental expansion of clusters by incorporating new generations of computing nodes and networking technologies is another factor contributing to the popularity of cluster ....

E. W. Felten, R. A. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. N. Damianakis, C. Dubnicki, L. Iftode and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. International Symposium on Computer Architecture (ISCA), 1996.


Nonintrusive Remote Healing Using Backdoors - Florin Sultan Aniruddha (2003)   Self-citation (Iftode)   (Correct)

No context found.

E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proc. 23rd Annual Symposium on Computer Architecture (ISCA), May 1996.


Nonintrusive Failure Detection and Recovery for.. - Sultan, Bohra..   Self-citation (Iftode)   (Correct)

No context found.

E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


Using Remote Memory Communication for Self-Healing Systems - Sultan, Bohra, Neamtiu..   Self-citation (Iftode)   (Correct)

....a powerful tool in designing systems for automated monitoring, healing, recovery and repair. Fortunately, the tool that makes this idea possible already exists. Remote memory communication (RMC) is a technology originally developed to lower the overhead of communication by reducing OS involvement [2, 11]. In addition to low overhead send/receive operations that require no OS intervention, RMC provides remote DMA (RDMA) primitives that allow external access to the memory of a host, without using its CPU. RDMA read and write primitives are present in industrial RMC standards like VIA [10] ....

....host CPU is still involved in such transfers. In contrast, a remote DMA operation completely bypasses the CPU on the remote host. For this, the remote NIC performs a silent DMA to/from the host memory. RDMA write is the most common RMC operation, practically supported by all RMC implementations [11, 10, 15]. With RDMA write, the sender can write into a remote memory buffer without remote CPU intervention. The completion of the RDMA write can be determined by checking a completion queue in the network interface or through an application specific flag in the area to be updated. RDMA read is a more ....

E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


TCP Servers: A TCP/IP Offloading Architecture For Internet.. - Banerjee (2002)   Self-citation (Iftode)   (Correct)

....evaluated. 2.3 New I/O Technology. Intelligent devices have been shown to be a promising innovation for servers, especially in the case of storage systems [27, 1, 10]. Intelligent Network Interfaces [39] have also been studied, but mostly for cluster interconnects in distributed shared memory [26] or distributed file systems [4]. Recently released Network Interface Cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [3, 35, 2, 19, 24, 53, 32]. Some of these cards also provide support to offload network protocol processing for network ....

FELTEN, E. W., ALPERT, R. D., BILAS, A., BLUMRICH, M. A., CLARK, D. W., DAMIANAKIS, S., DUBNICKI, C., IFTODE, L., AND LI, K. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture (May 1996).


MemNet: Memory-Mapped Networking for Servers - Rangarajan, Banerjee, Iftode (2002)   Self-citation (Iftode)   (Correct)

....goals of solving the bandwidth and CPU bottlenecks which occur when other solutions such as IP Tunneling or bridging are used to connect InfiniBand Fabrics to TCP/IP networks. Intelligent network interfaces [25] have been studied, but mostly for cluster interconnects in distributed shared memory [16] or distributed file systems [3]. Recently released network interface cards have been equipped with hardware support to offload the TCP/IP protocol processing from the host [1, 2, 11, 15, 18, 33]. Some of these cards also provide support to offload networking protocol processing for network attached ....

Felten, E. W., Alpert, R. D., Bilas, A., Blumrich, M. A., Clark, D. W., Damianakis, S., Dubnicki, C., Iftode, L., and Li, K. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium


Performance Evaluation of Two Home-Based Lazy Release.. - Zhou, Iftode, Li (1996)   (65 citations)  Self-citation (Iftode Li)   (Correct)

....network bandwidth (Table 3). As a consequence, a roundtrip communication for either a page or lock transfer is at best on the order of a millisecond. Current network technologies [6, 13, 7] as well as aggressive software for fast interrupts, exceptions [30] and virtual memory mapped communication [10, 11] have brought such latencies down significantly to the neighborhood of a couple of microseconds. An interesting question is to what extent our results are specific to the Paragon architecture and how they would be affected by different architectural parameters. Fast interrupts and low latency messages ....

E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


cBSP: Zero-Cost Synchronization in a Modified BSP Model - Alpert, Philbin (1997)   (4 citations)  Self-citation (Alpert)   (Correct)

No context found.

E Felten, R Alpert, A Bilas, M Blumrich, D W Clark, S Damianakis, C Dubnicki, L Iftode, and K Li. Early Experience with Message-Passing on the Shrimp Multicomputer. In International Symposium on Computer Architecture XXIII, 1996.


Network Interface - Angelos Bilas Edward   Self-citation (Felten Bilas)   (Correct)

....directly to memory. Hence, there is no explicit receive operation. CPU involvement in receiving data can be as little as checking a flag, although a hardware notification mechanism is also supported. Numbers for the latency and bandwidth delivered by the SHRIMP VMMC layer can be found in [13]. Notifications The notification mechanism is used to transfer control to a receiving process, or to notify the receiving process about external events. It consists of a message transfer followed by an invocation of a user specified, user level handler function. The receiving process can ....

E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S.N. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. Proceedings of 23rd International Symposium on Computer Architecture, May 1996, pages 296--307.


Improving the Performance of Shared Virtual Memory on System Area.. - Bilas (1998)   (5 citations)  Self-citation (Bilas)   (Correct)

....(VMMC) [25] is a communication model that provides direct data transfers between the sender's and receiver's virtual address spaces. This section provides a high level overview of VMMC as implemented on Myrinet hardware. The model has been designed and implemented for the SHRIMP multicomputer [16, 14, 32, 25]. Since the SHRIMP network interface supports VMMC mostly in hardware, the implementation requires either no software overhead or only a few user level instructions to transfer data between the separate virtual address spaces of two machines on a network. In short, VMMC on the customized network ....

....address spaces of two machines on a network. In short, VMMC on the customized network interface of SHRIMP has somewhat better performance, at the cost of more operating system modifications and substantially reduced flexibility. VMMC provides support for protected, user level message passing [25, 32]. The main idea is to allow data to be transmitted directly from a source virtual memory to a destination virtual memory. For messages that pass data without passing control, the VMMC approach can completely eliminate software overheads associated with message reception. The VMMC model eliminates ....

E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


Simplifying Distributed File Systems Using a Shared Logical Disk - Shillner, Felten (1996)   (6 citations)  Self-citation (Felten)   (Correct)

....reflect the initial state of the implementation; we expect the performance to improve as we tune the prototype to fit our hardware and software environment. 3.1 Apparatus. We took performance measurements of the shared logical disk running on two nodes of the Princeton SHRIMP multicomputer [5, 10]. SHRIMP consists of a number of ordinary Linux Pentium PCs connected by an Intel Paragon backplane. SHRIMP uses hardware support to provide protected, low latency, user-level communication. The SHRIMP hardware has a raw user to user latency of about four microseconds; the SHRIMP stream sockets ....

Edward W. Felten, Richard Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early experience with message-passing on the SHRIMP multicomputer. In Proceedings of the 23rd International Symposium on Computer Architecture, 1996. To appear.


High Performance Communication Subsystem for Clustering.. - Zhu, Lee, Wang (2000)   Self-citation (Li)   (Correct)

....and lower cost. With the supports of such high performance interconnection networks, multiple SHV servers can be connected to form a powerful supercomputing environment. In the past, various fast messaging mechanisms for clusters have been proposed, such as AM [4], FM [5], UNet [6], VMMC [13], and BIP [7]. These mechanisms have been ported on Fast Ethernet, ATM or Myrinet. Recently several prototype cluster communication systems using Gigabit networking have been built. For example, Berkeley's Linux VIA [9] is a high performance implementation of the Virtual Interface Architecture ....

E. Felten, R. Alpert, A. Bilas, M. Blumrich, D. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li, "Early Experience with Message-passing on the Shrimp Multicomputer", Proc. of the 23rd Annual Symposium on Computer Architecture, 1996.


Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert, Chen.. (1998)   (12 citations)  Self-citation (Felten Alpert Blumrich Clark Damianakis Dubnicki Iftode Li)   (Correct)

....interrupts In addition to answering these questions, we discuss other lessons learned, including some things that consumed much of our design time, yet turned out not to matter. 2 The SHRIMP System. The architecture of the SHRIMP system has been described in several previous publications [10, 11, 12, 22], notably [9], and will only be described in as much detail as necessary here. Specific details of the architecture and implementation will be described more thoroughly throughout this paper. 2.1 Architecture. The SHRIMP system consists of sixteen PC nodes connected by an Intel routing ....

Edward W. Felten, Richard Alpert, Angelos Bilas, Matthias A. Blumrich, Douglas W. Clark, Stefanos N. Damianakis, Cezary Dubnicki, Liviu Iftode, and Kai Li. Early Experience with Message-Passing on the Shrimp Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, pages 296--307, May 1996.


Home-based Shared Virtual Memory - Iftode (1998)   (30 citations)  Self-citation (Iftode)   (Correct)

No context found.

E.W. Felten, R.D. Alpert, A. Bilas, M.A. Blumrich, D.W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early Experience with Message-Passing on the SHRIMP Multicomputer. In Proceedings of the 23rd Annual Symposium on Computer Architecture, May 1996.


Reducing Waiting Costs in User-Level Communication - Damianakis, Chen, Felten (1997)   (8 citations)  Self-citation (Felten Damianakis)   (Correct)

....sockets library. Experiments show that a hybrid spin-then-block strategy offers good performance in a wide variety of situations, and that speeding up the interrupt path significantly improves performance. 1. Introduction. Many network interfaces can place incoming data directly in user memory [1, 7, 3, 2]. This capability enables the construction of very efficient network software since the network interface can deliver a burst of packets without any software intervention. On such an architecture, communication can be handled entirely in a user level library. In message passing systems, software ....

....and be awakened later by an interrupt. This requires both a policy for when to poll and when to block, and a mechanism for efficient blocking. This paper considers the questions of which receive policy and which mechanism to use. We present an implementation on the prototype SHRIMP multicomputer [1, 7], and the results of experiments using our user level sockets library [5] for micro benchmarks, larger benchmarks, and for a distributed file system. Our results show that a hybrid spin block policy is best in a wide range of situations, and that reducing the interrupt service overhead ....

[Article contains additional citation context not shown here]

E. W. Felten, R. Alpert, A. Bilas, M. A. Blumrich, D. W. Clark, S. Damianakis, C. Dubnicki, L. Iftode, and K. Li. Early experience with message-passing on the shrimp multicomputer. In Proceedings of 23rd International Symposium on Computer Architecture, May 1996.


Initial Evaluation of a User-Level Device Driver - Framework Kevin Elphinstone   (Correct)

No context found.

Felten, E.W., Alpert, R.D., Bilas, A., Blumrich, M.A., Clark, D.W., Damianakis, S.N., Dubnicki, C., Iftode, L., Li, K.: Early experience with message-passing on the SHRIMP multicomputer. In: Proc. 23rd Symp. on Computer Architecture. (1996) 296--307

CiteSeer.IST - Copyright Penn State and NEC