Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, pages 11--20, February 1995.

This paper is cited in the following contexts:
High-Performance All-Software Distributed Shared Memory - Johnson (1995)   (9 citations)  (Correct)

....transfer bandwidth roughly in half, it provides significantly reduced latencies for small transfers by avoiding the need for prenegotiation with the receiving node. Networks of workstations with interprocessor communication performance rivaling that of the CM-5 are rapidly becoming reality [7, 56, 77, 80]. For example, Thekkath et al. [78] describe the implementation of a specialized data transfer mechanism implemented on a pair of 25 MHz DECstations connected with a first generation FORE ATM network. They report round trip times of 45 microseconds (1125 cycles) to read 40 bytes of data from a ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, pages 11--20, February 1995.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)  (Correct)

....used in zero copy, is complex. Thus, zero copy only becomes attractive compared to single copy when the per-page overhead of page table manipulations is less than the per-page copy cost [40]. We call this crossover point the OS page budget. The final approach is shared memory communication [6, 11, 20, 34]. The basic idea behind this approach is that the sender and the receiver agree to transparently share a block of memory. Communication then occurs via writes, DMA, or sometimes reads, into the shared region. While shared memory approaches deliver excellent performance, they have several ....

....sometimes by substantial amounts. These approaches can be summarized as: integrate messaging into the processor [14]; integrate messaging into the cache controller [1, 21]; integrate the messaging unit into the memory system [35]. A more radical approach integrates it into the DRAM slots [34]. The key challenge for future high performance networking will be to maintain a high degree of connectivity while increasing performance. This means either working within the confines of the kernel and I/O busses or completely replacing the standards which form the underlying communication ....

R. Minnich, D. Burns, and F. Hady. The Memory Integrated Network Interface. IEEE Micro, Feb. 1995.


Quantifying the Impact of Architectural Scaling on.. - Heath, Kaur, Martin.. (2001)   (1 citation)  (Correct)

....used in zero copy, is complex. Thus, zero copy only becomes attractive compared to single copy when the per-page overhead of page table manipulations is less than the per-page copy cost [40]. We call this crossover point the OS page budget. The final approach is shared memory communication [6, 11, 20, 34]. The basic idea behind this approach is that the sender and the receiver agree to transparently share a block of memory. Communication then occurs via writes, DMA, or sometimes reads, into the shared region. While shared memory approaches deliver excellent performance, they have several ....

....sometimes by substantial amounts. These approaches can be summarized as: integrate messaging into the processor [14]; integrate messaging into the cache controller [1, 21]; integrate the messaging unit into the memory system [35]. A more radical approach integrates it into the DRAM slots [34]. The key challenge for future high performance networking will be to maintain a high degree of connectivity while increasing performance. This means either working within the confines of the kernel and I/O busses or completely replacing the underlying communication substructure. Recent work in ....

Minnich, R., Burns, D., and Hady, F. The Memory Integrated Network Interface. IEEE Micro (Feb. 1995).


High Performance Messaging on Workstations: - Illinois Fast Messages   (Correct)

....computing on workstation clusters has largely been limited to coarse-grained applications. Attempts to improve performance based on specialized hardware can achieve dramatically higher performance, but generally require specialized components and interfacing deep into a computer system design [16, 18, 19]. This increases cost, and decreases the potential market (and hence sale volume) of the network hardware. The goal of the Illinois Fast Messages (FM) project is to deliver a large fraction of the network's physical performance (latency and bandwidth) to the user at small packet sizes. 1 ....

....megabyte versus 128 kilobytes for Myrinet) on the interface card. This is a key difference which affects the buffering protocols feasible in the two systems. A number of other researchers have explored the development of special hardware to achieve low latency high bandwidth communication (MINI [16], FUNet [18], VUNet [19], etc.). However, these hardware approaches have the drawback that they depend on specific memory bus interfaces, and require significant hardware investment. FM demonstrates that decent performance can be achieved without moving the interfaces closer to the host ....

F. Hady, R. Minnich, and D. Burns. The Memory Integrated Network Interface. In Proceedings of the IEEE Symposium on Hot Interconnects, 1994.


Design and Evaluation of an HPVM-based Windows NT.. - Chien, Lauria.. (1999)   (5 citations)  (Correct)

....Sistemistica, Università di Napoli Federico II, via Claudio 21, 80125 Napoli, Italy. Visiting at time of writing. x Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Ave. Urbana, IL 61801, USA 1 U-Net [40], VMMC-2 [14], PM [38], BIP [32], MINI [22], and Osiris [16]. These efforts have forged a consensus on core requirements for network interfaces to deliver high communication performance to application programs. This consensus includes the following key features which have recently been incorporated in the Intel Compaq Microsoft standard ....

....to inexpensive general purpose parallel computers. Over the past four years, the research community has produced dramatic progress in delivering hardware communication performance to applications (Fast Messages (FM) [30], Active Messages (AM) [9], U-Net [40], VMMC-2 [14], PM [38], BIP [32], MINI [22], and Osiris [16]). These efforts have forged a consensus on core requirements for network interfaces that has eventually been crystallized in the Intel Compaq Microsoft standard for cluster interfaces, the Virtual Interface Architecture [1]. Some of these and other efforts have also produced ....

F. Hady, R. Minnich, and D. Burns. The Memory Integrated Network Interface. In Proceedings of the IEEE Symposium on Hot Interconnects, 1994.


The ParaStation Project: Using Workstations as Building Blocks.. - Warschko (1996)   (10 citations)  (Correct)

....software architecture (section 4). Various benchmarks are presented in section 5. 2 Related Work There are several approaches targeting efficient parallel computing on workstation clusters which can be classified as shared memory and distributed memory systems. Shared-memory systems such as MINI [11], SHRIMP [4], SCI based SALMON [10, 13], Digital's MemoryChannel [15], and Sun's S-Connect [12] support memory mapped communication, allowing user processes to communicate without expensive buffer management and without system calls across the protection boundary separating user processes from the ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, 15(1):11--20, February 1995.


Lightning: A Scalable Dynamically Reconfigurable.. - Dowd, Perreault.. (1995)   (2 citations)  (Correct)

....Internet traffic, and a common trait is the bimodal distribution of packet size. Essentially the traffic consists of either big packets or small packets with little traffic in between [6, 7]. This traffic characteristic is also true with the distributed shared memory environment we are studying [8]. The traffic, generated by a memory coherence protocol needed to achieve DSM, has two major forms: memory consistency control packets (such as memory block requests, invalidations, acknowledgments) and memory block packets [9]. Memory consistency control packets are very small while the memory ....

....achieve DSM, has two major forms: memory consistency control packets (such as memory block requests, invalidations, acknowledgments) and memory block packets [9]. Memory consistency control packets are very small while the memory blocks can be up to 8 Kbytes. In a non-broadcast scheme, like MNFS [8], there can be multiple memory consistency control packets for every memory block that needs to be transferred. The FatMAC protocol has been designed to exploit this characteristic. This protocol reserves access on a pre-allocated channel through control packets. Transmission on each channel ....

[Article contains additional citation context not shown here]

F. Hady, R. Minnich, and D. Burns, "The memory integrated network interface," in ACM Hot Interconnects Symposium, (San Jose, CA), 1994.


CRL: High-Performance All-Software Distributed Shared Memory - Johnson, Kaashoek, Wallach (1995)   (135 citations)  (Correct)

....for applications requiring more expressive programming environments than PVM or MPI. We have developed an implementation of CRL for Thinking Machines' CM-5 family of multiprocessors. Because today's networks of workstations offer interprocessor communication performance rivaling that of the CM-5 [6, 30, 41, 43], we believe that the performance of our CRL implementation for the CM-5 is indicative of what should be possible for an implementation targeting networks of workstations using current technology. Using the CM-5 implementation of CRL, we have run applications on systems with up to 128 processors. ....

....transfer bandwidth roughly in half, it provides significantly reduced latencies for small transfers by avoiding the need for prenegotiation with the receiving node. Networks of workstations with interprocessor communication performance rivaling that of the CM-5 are rapidly becoming reality [6, 30, 41, 43]. For example, Thekkath et al. [42] describe the implementation of a specialized data transfer mechanism implemented on a pair of 25 MHz DECstations connected with a FORE ATM network. They report round trip times of 45 microseconds (1125 cycles) to read 40 bytes of data from a remote processor. ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, pages 11--20, February 1995.


LIGHTNING Network and Systems Architecture - Dowd, al. (1996)   (14 citations)  (Correct)

....is to take place. The unique configuration of the memory interface, achieving a zero copy interface, enhances the capability of the system level software. For example, TCP and NFS can be supported natively with a single buffer copy. The system software aspect of this project has been described in [25]. 2.3.2 Synchronization This section briefly described the frame and slot synchronization strategies used in the prototype testbed. Due to space constraints, the performance analysis of the schemes along with the implementational details have been removed from this paper but may be found in [26] ....

F. Hady, R. Minnich, and D. Burns, "The memory integrated network interface," in ACM Hot Interconnects Symposium, (San Jose, CA), 1994.


Distributed Shared Memory Using Reflective Memory: The LAM System - Roger Denton   (Correct)

....media. Reports from the SHRIMP project [Blu95] show that virtual memory mapped communication can reduce the send latency overhead by as much as 78 percent. In addition to extending processing times, this overhead also serves to effectively limit the useful bandwidth of a communication medium. [Min95] noted that a network interface capable of 48 MB/s was only able to drive 20 MB/s; the other 28 MB/s was lost to overhead associated with the protocol and data path to the interface. An interface that is mapped into the address space of the process avoids virtually all of the overhead associated ....

R. Minnich, D. Burns, F. Hady, "The Memory-Integrated Network Interface," IEEE Micro, vol. 15, no. 1, February 1995, pp 11--20.


Software-Based Communication Latency Hiding for Commodity.. - Strumpen (1996)   (2 citations)  (Correct)

....calls effectively block the entire pod [collection of threads sharing an address space] 80 90 on Ethernet and Internet. However, for future high performance networks, lifting the communication protocol into user space is the natural step towards delivering the bandwidth of such networks [19, 26]. 3 Implementation Issues This section describes the design of a user level implementation for communication latency hiding. Since a single-threaded application is a special case of a multithreaded application with respect to our scheduling policy, multithreaded applications are treated as the ....

Ron Minnich, Dan Burns, and Frank Hady. The memory-integrated network interface. IEEE Micro, 15(1):11--20, February 1995.


Networking Support For High-Performance Servers - Nahum (1997)   (Correct)

....single byte packets. Our work, in contrast, separates the benefits of branch prediction from code repositioning, and shows that the latter has at least as much of an effect as the former. Much research has been done supporting high speed network interfaces, both in the kernel and in user space [8, 14, 15, 33, 38, 39, 41, 80]. A common theme throughout this body of work is the desire to reduce the number of data copies as much as possible, as naive network protocol implementations can copy packet data as much as five times. As a consequence, single copy and even zero copy protocol stacks have been demonstrated [28, ....

....computation to overlap with load instructions. 4.4.2 Impact of Copying Data Our baseline results are for the case where there is no copying of data between buffers or across address spaces. Avoiding or eliminating copies is a well known technique for improving network protocol performance [8, 14, 15, 33, 38, 41, 80]. In certain cases, however, this copy may be unavoidable, due to insufficient operating system or device support. In this section we evaluate how copying data affects network protocol performance. Our protocols run in user space, and do not incur any additional overhead other than the raw ....

Minnich, R., Burns, D., and Hady, F. The memory-integrated network interface. IEEE Micro, 15(1):11--20, Feb. 1995.


Perspectives for High Performance Computing in.. - Strumpen, Ramkumar, ..   (Correct)

....a higher availability of the CPU for other useful (interleaved) computation. The numbers reported are similar to the communication latency hiding capability of stock workstation hardware reported in [29]. The effects of reducing the copies in the communication protocol's data path are reported in [2, 6, 7, 23, 31, 32]. All these approaches have a common objective: Increasing performance while maintaining existing protection mechanisms or boundaries. We argue that these approaches tackle only the tip of the iceberg. Abandoning nearly all kernel protection will allow for substantial performance improvements. ....

Ron Minnich, Dan Burns, and Frank Hady. The memory-integrated network interface. IEEE Micro, 15(1):11--20, February 1995.


On Tailoring Thread Schedules in Protocol Design.. - Gomez, Rego, al.   (Correct)

....of the future. Traditional software architectures for distributed systems are based on in-kernel protocols and are not capable of delivering to an application the low latency and high bandwidth offered by current network technology. New software and hardware architectures have been proposed [3, 4, 5, 6] to overcome performance limitations. These are alternatives which depart from the traditional approach by attempting to move the OS kernel out of the critical communication path. In general, this is accomplished by giving an application direct access to a network interface device, through ....

R. Minnich, D. Burns, and F. Hady. The Memory Integrated Network Interface. IEEE Micro, pages 11--20, February 1995.


The ParaPC / ParaStation Project: Efficient Parallel.. - Thomas Warschko (1996)   (1 citation)  (Correct)

....ScaLAPACK equation solver and others) execute with nearly linear speedup on a wide range of different problem sizes. 2 Related Work There are several projects targeting low latency and high throughput parallel computing on workstation clusters. MINI (Memory Integrated Network Interface) [MBH95] targets a 1 Gbps bandwidth with 1.2 µs latency interconnect using an ATM network. Communication in MINI is based on channels between participating processes using ATM's virtual channel concept. Performance figures (ATM cell round trip time of 3.9 µs at 10 Mbytes/s) are based on VHDL ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, 15(1):11--20, February 1995.


High Performance Messaging on Workstations: Illinois Fast.. - Pakin, Lauria, Chien (1995)   (275 citations)  (Correct)

....computing on workstation clusters has largely been limited to coarse-grained applications. Attempts to improve performance based on specialized hardware can achieve dramatically higher performance, but generally require specialized components and interfacing deep into a computer system design [16, 18, 19]. This increases cost, and decreases the potential market (and hence sale volume) of the network hardware. The goal of the Illinois Fast Messages (FM) project is to deliver a large fraction of the network's physical performance (latency and bandwidth) to the user at small packet sizes. 1 ....

....megabyte versus 128 kilobytes for Myrinet) on the interface card. This is a key difference which affects the buffering protocols feasible in the two systems. A number of other researchers have explored the development of special hardware to achieve low latency high bandwidth communication (MINI [16], FUNet [18], VUNet [19], etc.). However, these hardware approaches have the drawback that they depend on specific memory bus interfaces, and require significant hardware investment. FM demonstrates that decent performance can be achieved without moving the interfaces closer to the host processor. ....

F. Hady, R. Minnich, and D. Burns. The Memory Integrated Network Interface. In Proceedings of the IEEE Symposium on Hot Interconnects, 1994.


CLAM: Connection-less, Lightweight, and Multiway.. - Gomez, Rego, Sunderam   (Correct)

....Traditional distributed computing software architectures that are based on in kernel protocols cannot fully deliver to an application the low latencies and high bandwidths offered by current network technology. To overcome this limitation, new software and hardware architectures have been proposed [2, 15, 22, 32] that depart from the traditional approach by moving the kernel out of the critical path in communication. The idea is to implement protocols in user space, with user level processes having direct access to network devices. As a result, such architectures have reduced message latencies from ....

....is to implement protocols in user space, with user level processes having direct access to network devices. As a result, such architectures have reduced message latencies from milliseconds to tens of microseconds, and in some cases, round trip times in the order of microseconds have been reported [22]. There are several reasons for implementing communication protocols in user space: to keep the kernel simple [29] (i.e. Mach, Amoeba), to obtain functionality not available with traditional in-kernel protocols (e.g. many-to-many communication [24]), to increase scalability, and to make protocols ....

R. Minnich, D. Burns, and F. Hady. The Memory Integrated Network Interface. IEEE Micro, pages 11--20, February 1995.


Efficient Parallel Computing on Workstation Clusters - Warschko (1995)   (6 citations)  (Correct)

....are presented in section 6. 2 Related Work There are several approaches with related goals targeting low latency and high throughput parallel computing on workstation clusters. A special issue of IEEE Micro (Feb. 95) presented the most advanced ones. MINI (Memory Integrated Network Interface) [MBH95] targets a 1 Gbps bandwidth with 1.2 µs latency interconnect using an ATM network. Communication in MINI is based on channels between participating processes using ATM's virtual channel concept. Presented performance figures (ATM cell round trip time of 3.9 µs at 10 Mbytes/s) are based on ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, 15(1):11--20, February 1995.


ParaStation - Efficient Parallel Computing by Clustering.. - Warschko, Blum, Tichy (1997)   (2 citations)  (Correct)

....others) execute with nearly linear speedup on a wide range of problem sizes. 2 Related Work There are several approaches targeting efficient parallel computing on workstation clusters which can be classified as shared memory and distributed-memory systems. Shared memory systems such as MINI [MBH95], SHRIMP [BDF 95], SCI based SALMON [IEE92,Oma95], Digital's MemoryChannel [Ros95], and Sun's S-Connect [NBKP95] support memory mapped communication, allowing user processes to communicate without expensive buffer management and without system calls across the protection boundary separating ....

Ron Minnich, Dan Burns, and Frank Hady. The Memory-Integrated Network Interface. IEEE Micro, 15(1):11--20, February 1995.
