61 citations found.
D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, Oct. 1992.

This paper is cited in the following contexts:

Integrating User-Level Networks with SMT - Parker, Davis, Hsieh

....quickly by the processor, and acts as a staging area for outgoing messages. A zero copy message protocol allows messages to be delivered directly to user space without copying. Not all of these ideas are new. For example, previous research has explored the use of user level network interfaces [3,9,11,13,18]. However, this specific combination of features is unique, in that it exposes interrupts directly to user level programs. The important aspect of our architecture lies in its support for user level messaging (for both interprocessor communication and I/O) in a general purpose operating system ....

....message arrival notification. Sends, receives, and notifications all make passes through operating system code. Since the operating system code is unlikely to reside in the cache, these system calls result in cache misses. Figure 1: Anatomy of a message for a kernel mode NI User level interfaces[3,9,11,13,18] and zero copy protocols[5,7] significantly reduce the overhead of message sends and receives by eliminating operating system and copying overhead on the message send and receive sides. Notifications still have significant opportunity for optimization, as they remain the performance and ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the 5th International ASPLOS, October 1992, pp. 111-122.


The Optimistic Direct Access File System: Design and Network.. - Magoutis

....overall file access latency using client initiated RDMA is 20 to 40 lower than with using RPC. We note that this is a rough estimate based on measurements of a prototype with few optimizations. We plan for a more thorough evaluation in the future. NICs that are tightly coupled with the host [29], [30], [31], [14], [32] aim at lowering the NIC overhead as well as the overhead of the NIC interaction with the host for control and data transfer. Previous research [31] has pointed to the importance of NIC design for low latency RPC communication. Scheduling delays included in TNullRPC can be ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface," in Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating System (ASPLOS), Boston, MA, October 1992, pp. 111--121.


Evaluation of Design Choices for Gang Scheduling Using . . . - Feitelson, al. (1996)   (16 citations)

....as load increases, and fosters support for interactive response times. In addition, the fact that interacting processes are guaranteed to execute simultaneously allows them to access hardware communication devices in user mode, without the overheads associated with operating system protection [29, 39, 23]. A distributed hierarchical control (DHC) scheme for supporting gang scheduling has been proposed previously [16] DHC defines a control structure over the parallel machine and combines time slicing with a buddy system partitioning scheme. Given the DHC framework, this paper investigates several ....

Henry, D. S., and Joerg, C. F. A tightly coupled processor-network interface. Fifth Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst., Sep. 1992, pp. 111--122.


Dynamic Computation Migration in Distributed Shared Memory Systems - Hsieh (1995)   (6 citations)

....in the register file, so that marshaling and unmarshaling would be cheaper. We did not directly simulate the extra registers, but simply changed the accounting for the marshaling and unmarshaling costs. The performance advantages of such an organization have been explored by Henry and Joerg [42]. Static Computation Migration in Prelude: To evaluate the benefits of computation migration, we implemented static computation migration in the Prelude system. This chapter describes the Prelude implementation, and summarizes some of our performance results. A more complete description ....

D.S. Henry and C.F. Joerg. "A Tightly-Coupled Processor-Network Interface". In Proceedings of the 5th Conference on Architectural Support for Programming Languages and Systems, pages 111--122, October 1992.


Integration of Message Passing and Shared Memory.. - Heinlein.. (1994)   (40 citations)

....mapped or register based. The Connection Machine CM 5 provides access to the network through a memory mapped interface [21] Register based approaches provide tighter coupling by moving the network interface into the processor and providing direct access to the interface through special registers [5, 9]. One of the problems with the above systems is that they are typically optimized for short messages, thus limiting the achievable bandwidth for large transfers. Another drawback is that the compute processor handles the complete transfer, thus taking cycles away from the main computation. ....

Dana S. Henry and Christopher F. Joerg. A tightly coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, September 1992.


Evaluation of Design Choices for Gang Scheduling using.. - Feitelson, Rudolph (1996)   (16 citations)

....gradually as load increases, and fosters support for interactive response times. And the fact that interacting processes are guaranteed to execute simultaneously allows them to access hardware communication devices in user mode, without the overheads associated with operating system protection [29, 39, 23]. A distributed hierarchical control (DHC) scheme for supporting gang scheduling has been proposed previously [16] DHC defines a control structure over the parallel machine and combines time slicing with a buddy system partitioning scheme. Given the DHC framework, this paper investigates several ....

D. S. Henry and C. F. Joerg, "A tightly coupled processor-network interface". In 5th Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst., pp. 111--122, Sep 1992.


Architectural Support for an Efficient Implementation of a.. - Grahn, Stenström (1995)

....[Figure 3: Processor node organization for a software only directory protocol. Labeled components include the first and second level caches, memory, network interface, FLWB, SLWB, Interrupt Buffer (IB), Send buffer (SB), and local bus module.] interfaced to the second level cache bus as proposed in [12] where the first entry in the buffer (last for SB) is accessible from software. However, we propose to access the buffers through memory-mapped addresses instead of register mapped as in [12] to adjust to mainstream processor designs. Coherence interrupts, also called high availability interrupts, ....

....for a software only directory protocol. interfaced to the second level cache bus as proposed in [12] where the first entry in the buffer (last for SB) is accessible from software. However, we propose to access the buffers through memory-mapped addresses instead of register mapped as in [12] to adjust to mainstream processor designs. Coherence interrupts, also called high availability interrupts, need to be precise as pointed out in [14] to avoid protocol deadlock; we cannot delay the software handler execution until a pending load completes. To see this, consider two nodes, i and ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface," In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pages 111-122, October, 1992.


Design and Evaluation of Communication Latency Hiding/Reduction.. - Afsahi (2000)

....major sources of the communication overhead. The communication hardware aspect includes the architecture and placement of the network interface, and the interconnection network and its services. Many architectures have been proposed for the network interfaces. They are classified as (1) direct [52, 7, 63, 80, 97, 88] and (2) memory based [48, 112, 126, 23] Direct network interfaces allow a processor to directly access the network queue. However, they mostly ignore the issue of multiprogramming. That is, a single thread can only use the network interface at a time. Memory based interfaces provide protection ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface", Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992.


Network Interface for Message-Passing Parallel Computation on a.. - Hoe (1994)   (5 citations)

....a network interface for stock workstations can only communicate with the processor through a bus. A straightforward message passing interface could be implemented as memory mapped registers such as in CM 5 [14] or as a packet sized array of memory mapped registers as suggested by Joerg and Henry [9] (Figure 1). These interfaces are passive devices that only respond to the processor's direct manipulation through memory mapped operations. A user program composes an outbound packet by writing the content of the packet, with its header, to the registers through memory mapped writes. The content ....

....parallel processing occurs in frequent and small size messages. The communication overhead must be further minimized by giving the user processes direct control of the network interface. These low-overhead user level network interface designs can be found in many contemporary MPP architectures [9, 14]. However, these designs typically involve the support of custom system or CPU design. In most contemporary workstation designs, the RISC microprocessors are optimized for cached accesses while the bus architectures are optimized for blocked transfers. The network interface design must take these ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of ASPLOS, October 1992.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)

....Most tightly coupled interface designs use special purpose message instructions (e.g. a send command) in which general purpose processor registers are the operands. Some examples include the Message Driven Processor (MDP) [22], the Caltech Mosaic [24], the Henry Joerg network interface [29], and the Start (*T) [30] network interface. An exception is iWarp from CMU [23], whose systolic communication model is based on operands rather than operators. A send command is constructed by using a message register as the destination of an arithmetic operation; a receive command is constructed ....

Dana S. Henry and Christopher F. Joerg. A tightly coupled processor-network interface. Proc. of the 5th ASPLOS, October 1992, pp. 111-122.


Infrastructure for Research towards Ubiquitous.. - Grosz, Kung.. (1994)

....appropriately distribute among the compiler, operating system, and hardware the functionality that is needed to efficiently support I/O bound applications. One approach to a tighter coupling of the network and processor is to directly map the network ports into the processor's register file [10][49]. This approach makes it efficient for a compiler to send a message since the compiler simply needs to build a message in the processor's register file. A shortcoming of this approach is that it bypasses the operating system, and we want to take advantage of the operating system functions that ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)

....message passing for streamlining data and control transfer between workstations on a local area network. Though our approach is similar in nature, our emphasis lies on user-level communication and compiler support for parallel programming; their emphasis is on distributed applications. In [Henry Joerg 92b] the authors propose a network interface design that provides special support for Id [Nikhil 90] programs that have been compiled to Berkeley's Threaded Abstract Machine [Culler et al. 91a]. They reduce communication overhead by implementing message dispatching, forwarding and replying in ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)

....interface. 2.1 Processing Nodes A fundamental decision in designing the processing node is whether to use commodity or custom processors. A custom processor design can improve communication performance; for example, the architect can integrate the network interface more tightly with the CPU [Henry Joerg 92a] or include special purpose communication instructions in the CPU. The Kendall Square KSR 1 shared memory computer [KSR 92] uses both approaches; its processors provide instructions for prefetching or post storing cache lines, plus a host of instructions that control the memory system, ....

....be useful to examine how the communication requirements for other programming models differ from those of data parallel languages. For example, Henry and Joerg designed a NI for use with the TAM [Culler et al. 91b] model of execution and reached conclusions that are somewhat different from ours [Henry Joerg 92a] While this is a fairly extreme example, in that their programming model is radically different from the data parallel model, it shows the tight connection between the choice of a programming model and the design of the communication architecture. 7.2 Conclusions We have identified ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Do Faster Routers Imply Faster Communication? - Karamcheti, Chien (1994)   (6 citations)

....transfers, and is implemented using CMAM xfer function which splits up the transfer into a sequence of hardware packets at the source, and CMAM handle left xfer function which reassembles the packets at the destination. 1 While this is not the most efficient type of network interface [13, 8, 4], it has the significant virtue that no changes to the processor are required. Many researchers believe that this type of interface is basically representative of future network interfaces. 2 The CM 5 NI also supports an interrupt driven interface for reception; however, the cost is very high ....

....exploring what impact advanced network features (adaptive routing, virtual channels) have on network interface complexity and software overhead. Our work addresses some of these issues. Research on network interfaces has focused primarily on reducing message injection (and reception) overhead [13, 8, 19, 4] or offloading the communication onto a coprocessor [14, 16, 3] Such efforts are complementary to our goal of software protocol overhead reduction. Improvements in network interface can reduce the basic communication cost in our studies. While reducing the basic cost is important, as can be seen ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, 1992.


Network Interface Support for User-Level Buffer Management - Dubnicki, Li, Mesarina (1994)   (4 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [2, 6, 3]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a nonstandard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


UTLB: A Mechanism for Address Translation on Network Interfaces - Angelos (1998)   (8 citations)

....to remote virtual memory buffers. Digital's Memory Channel [20] uses an approach that is similar to PRAM on the sending side and to SHRIMP's automatic update on the receiving side. Another direct approach allows applications to compose and retrieve messages using network interface registers [22, 42]. In addition to network interface registers, the Cray T3E [42] supports remote memory accesses with an approach similar to UDMA. It uses complete page tables to describe global communication segments and all communication pages are pinned in memory. A network interface typically can hold only a ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert, Chen.. (1998)   (12 citations)

....also supports user level message passing, but places more burden on application programs by requiring them to construct their own message headers [15]. Some previous machines have worked to streamline the hardware software interface by mapping network interface FIFOs into processor registers [14, 24, 37]. Such approaches go against SHRIMP's goal of using commodity CPUs. A slightly less integrated approach, mapping FIFOs to memory rather than registers, was employed in the CM 5 [42]. CM 5 implementation restrictions limited the degree of multiprogramming, however, and applications were still ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Li.. (1994)   (238 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [5, 11, 7]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a non standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Software Overhead in Messaging Layers: Where Does the Time Go? - Karamcheti, Chien (1994)   (32 citations)

....Second, the nodes, NI, and the network all have finite buffering, so software buffer management is required. Third, the CM 5 network provides error detection at the packet level, but no error correction, requiring a software protocol to ensure reliable delivery. (Footnote 1: While this is not the most efficient type of network interface [12, 6], it requires no changes to the processor. Many researchers believe that this type of interface is representative of future network interfaces.) And finally, the CM 5 network hardware only supports packets with five 32 bit words, so a typical message is broken ....

....remains significant over the range of packet sizes. For finite sequence multi packet deliveries, the messaging overhead is lower, but still significant, accounting for 9 11 of the total cost. Improved network interfaces and DMA hardware If network interfaces can be integrated on chip, as in [12, 6], the basic cost of communication can be reduced, but this will not reduce protocol costs in the messaging layer on which our study focuses. If the base cost is reduced, that increases the importance of the costs in the rest of the messaging layer. Similarly, while DMA hardware can reduce the cost ....

[Article contains additional citation context not shown here]

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, 1992.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [3, 8, 4]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a non standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Design and Evaluation of Network Interfaces for System Area.. - Mukherjee (1998)

....No T Zero [103] Partial Partial No SHRIMP [12] Yes Write Through No DI Multicomputer [23] No No Network Interface Table 3.5: Comparison of CNI with other network interfaces 80 communicate through the cachable memory accesses, for which most processors and buses are optimized. Henry and Joerg [50] and Dally, et al. [34] advocate changes to a processor's registers. MIT Alewife [2] and Fugu [72] rely on a custom cache controller. MIT StarT NG [22] requires a co processor interface at the same level as the L2 cache. AP1000 [110] requires integrated cache and DMA controllers. Stanford FLASH ....

....of buffering using a synthetic workload and concluded that buffering messages in virtual memory can occur only rarely for realistic applications. However, in contrast I found that for two of my seven macrobenchmarks, buffering can play a significant role in improving performance. Henry and Joerg [50] compared the performance of three NIs mapped respectively to the processor registers, L1 cache bus, and an off chip L2 cache bus. However, unlike my study, they did not examine the impact of buffering on the performance of these NIs. 5.6 Conclusions In this chapter I have systematically ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)

....the sender and the receiver, and eliminating the dedicated memory for message arrival, as is found on the J Machine [8]. Register mapped network interfaces have been used previously in the Mars Machine [2], J Machine, and iWarp [4], and have been described by *T [26] as well as Henry and Joerg [15]. However, none of these systems provide protection for user level messages. Systems, like the J Machine, that provide user access to the network interface without atomicity must temporarily disable interrupts to allow the sending process to complete the message. The M Machine's atomic SEND ....

Henry, D. S., and Joerg, C. F. A tightly-coupled processor-network interface. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V) (Oct. 1992), ACM, pp. 111--122.


The Performance Potential of an Integrated Network.. - Binkert, Dreslinski.. (2004)

No context found.

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, Oct. 1992.


Analyzing NIC Overheads in Network-Intensive Workloads - Binkert, Hsu, Saidi.. (2004)

No context found.

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)

No context found.

D. S. Henry and C. F. Joerg, "A tightly-coupled processor-network interface," in Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 111--122, Oct. 1992.

CiteSeer.IST - Copyright Penn State and NEC