61 citations found.
D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, Oct. 1992.

This paper is cited in the following contexts:

Integrating User-Level Networks with SMT - Parker, Davis, Hsieh   (Correct)

....quickly by the processor, and acts as a staging area for outgoing messages. A zero-copy message protocol allows messages to be delivered directly to user space without copying. Not all of these ideas are new. For example, previous research has explored the use of user-level network interfaces [3,9,11,13,18]. However, this specific combination of features is unique, in that it exposes interrupts directly to user-level programs. The important aspect of our architecture lies in its support for user-level messaging (for both interprocessor communication and I/O) in a general-purpose operating system ....

....message arrival notification. Sends, receives, and notifications all make passes through operating system code. Since the operating system code is unlikely to reside in the cache, these system calls result in cache misses. [Figure 1: Anatomy of a message for a kernel-mode NI] User-level interfaces [3,9,11,13,18] and zero-copy protocols [5,7] significantly reduce the overhead of message sends and receives by eliminating operating system and copying overhead on the message send and receive sides. Notifications still have significant opportunity for optimization, as they remain the performance and ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the 5th International ASPLOS, October 1992, pp. 111-122.


The Optimistic Direct Access File System: Design and Network.. - Magoutis   (Correct)

....overall file access latency using client initiated RDMA is 20 to 40 lower than with using RPC. We note that this is a rough estimate based on measurements of a prototype with few optimizations. We plan for a more thorough evaluation in the future. NICs that are tightly coupled with the host [29], [30], [31], [14], [32] aim at lowering the NIC overhead as well as the overhead of the NIC interaction with the host for control and data transfer. Previous research [31] has pointed to the importance of NIC design for low latency RPC communication. Scheduling delays included in TNullRPC can be ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface," in Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating System (ASPLOS), Boston, MA, October 1992, pp. 111--121.


Evaluation of Design Choices for Gang Scheduling Using . . . - Feitelson, al. (1996)   (16 citations)  (Correct)

....as load increases, and fosters support for interactive response times. In addition, the fact that interacting processes are guaranteed to execute simultaneously allows them to access hardware communication devices in user mode, without the overheads associated with operating system protection [29, 39, 23]. A distributed hierarchical control (DHC) scheme for supporting gang scheduling has been proposed previously [16]. DHC defines a control structure over the parallel machine and combines time slicing with a buddy system partitioning scheme. Given the DHC framework, this paper investigates several ....

Henry, D. S., and Joerg, C. F. A tightly coupled processor-network interface. Fifth Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst., Sep. 1992, pp. 111--122.


Dynamic Computation Migration in Distributed Shared Memory Systems - Hsieh (1995)   (6 citations)  (Correct)

....in the register file, so that marshaling and unmarshaling would be cheaper. We did not directly simulate the extra registers, but simply changed the accounting for the marshaling and unmarshaling costs. The performance advantages of such an organization have been explored by Henry and Joerg [42]. Static Computation Migration in Prelude To evaluate the benefits of computation migration, we implemented static computation migration in the Prelude system. This chapter describes the Prelude implementation, and summarizes some of our performance results. A more complete description ....

D.S. Henry and C.F. Joerg. "A Tightly-Coupled Processor-Network Interface". In Proceedings of the 5th Conference on Architectural Support for Programming Languages and Systems, pages 111--122, October 1992.


Integration of Message Passing and Shared Memory.. - Heinlein.. (1994)   (40 citations)  (Correct)

....mapped or register based. The Connection Machine CM-5 provides access to the network through a memory-mapped interface [21]. Register-based approaches provide tighter coupling by moving the network interface into the processor and providing direct access to the interface through special registers [5, 9]. One of the problems with the above systems is that they are typically optimized for short messages, thus limiting the achievable bandwidth for large transfers. Another drawback is that the compute processor handles the complete transfer, thus taking cycles away from the main computation. ....

Dana S. Henry and Christopher F. Joerg. A tightly coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, September 1992.


Evaluation of Design Choices for Gang Scheduling using.. - Feitelson, Rudolph (1996)   (16 citations)  (Correct)

....gradually as load increases, and fosters support for interactive response times. And the fact that interacting processes are guaranteed to execute simultaneously allows them to access hardware communication devices in user mode, without the overheads associated with operating system protection [29, 39, 23]. A distributed hierarchical control (DHC) scheme for supporting gang scheduling has been proposed previously [16]. DHC defines a control structure over the parallel machine and combines time slicing with a buddy system partitioning scheme. Given the DHC framework, this paper investigates several ....

D. S. Henry and C. F. Joerg, "A tightly coupled processor-network interface". In 5th Intl. Conf. Architect. Support for Prog. Lang. & Operating Syst., pp. 111--122, Sep 1992.


Architectural Support for an Efficient Implementation of a.. - Grahn, Stenström (1995)   (Correct)

....Figure 3: Processor node organization for a software-only directory protocol. ....interfaced to the second-level cache bus as proposed in [12] where the first entry in the buffer (last for SB) is accessible from software. However, we propose to access the buffers through memory-mapped addresses instead of register-mapped as in [12] to adjust to mainstream processor designs. Coherence interrupts, also called high availability interrupts, ....

....for a software-only directory protocol. ....interfaced to the second-level cache bus as proposed in [12] where the first entry in the buffer (last for SB) is accessible from software. However, we propose to access the buffers through memory-mapped addresses instead of register-mapped as in [12] to adjust to mainstream processor designs. Coherence interrupts, also called high availability interrupts, need to be precise as pointed out in [14] to avoid protocol deadlock; we cannot delay the software handler execution until a pending load completes. To see this, consider two nodes, i and ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface," In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pages 111-122, October, 1992.


Design and Evaluation of Communication Latency Hiding/Reduction.. - Afsahi (2000)   (Correct)

....major sources of the communication overhead. The communication hardware aspect includes the architecture and placement of the network interface, and the interconnection network and its services. Many architectures have been proposed for the network interfaces. They are classified as (1) direct [52, 7, 63, 80, 97, 88] and (2) memory based [48, 112, 126, 23] Direct network interfaces allow a processor to directly access the network queue. However, they mostly ignore the issue of multiprogramming. That is, a single thread can only use the network interface at a time. Memory based interfaces provide protection ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface", Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, 1992.


Network Interface for Message-Passing Parallel Computation on a.. - Hoe (1994)   (5 citations)  (Correct)

....a network interface for stock workstations can only communicate with the processor through a bus. A straightforward message passing interface could be implemented as memory-mapped registers such as in CM-5 [14] or as a packet-sized array of memory-mapped registers as suggested by Joerg and Henry [9] (Figure 1). These interfaces are passive devices that only respond to the processor's direct manipulation through memory-mapped operations. A user program composes an outbound packet by writing the content of the packet, with its header, to the registers through memory-mapped writes. The content ....

....parallel processing occurs in frequent and small size messages. The communication overhead must be further minimized by giving the user processes direct control of the network interface. These low-overhead user-level network interface designs can be found in many contemporary MPP architectures [9, 14]. However, these designs typically involve the support of custom system or CPU design. In most contemporary workstation designs, the RISC microprocessors are optimized for cached accesses while the bus architectures are optimized for blocked transfers. The network interface design must take these ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of ASPLOS, October 1992.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)   (Correct)

....Most tightly coupled interface designs use special purpose message instructions (e.g. a send command) in which general purpose processor registers are the operands. Some examples include the Message Driven Processor (MDP) [22], the Caltech Mosaic [24], the Henry Joerg network interface [29], and the Start (*T) [30] network interface. An exception is iWarp from CMU [23], whose systolic communication model is based on operands rather than operators. A send command is constructed by using a message register as the destination of an arithmetic operation; a receive command is constructed ....

Dana S. Henry and Christopher F. Joerg. A tightly coupled processor-network interface. Proc. of the 5th ASPLOS, October 1992, pp. 111-122.


Infrastructure for Research towards Ubiquitous.. - Grosz, Kung.. (1994)   (Correct)

....appropriately distribute among the compiler, operating system, and hardware the functionality that is needed to efficiently support I/O-bound applications. One approach to a tighter coupling of the network and processor is to directly map the network ports into the processor's register file [10][49]. This approach makes it efficient for a compiler to send a message since the compiler simply needs to build a message in the processor's register file. A shortcoming of this approach is that it bypasses the operating system, and we want to take advantage of the operating system functions that ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)  (Correct)

....message passing for streamlining data and control transfer between workstations on a local area network. Though our approach is similar in nature, our emphasis lies on user-level communication and compiler support for parallel programming; their emphasis is on distributed applications. In [Henry Joerg 92b] the authors propose a network interface design that provides special support for Id [Nikhil 90] programs that have been compiled to Berkeley's Threaded Abstract Machine [Culler et al. 91a]. They reduce communication overhead by implementing message dispatching, forwarding and replying in ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)  (Correct)

....interface. 2.1 Processing Nodes A fundamental decision in designing the processing node is whether to use commodity or custom processors. A custom processor design can improve communication performance; for example, the architect can integrate the network interface more tightly with the CPU [Henry Joerg 92a] or include special purpose communication instructions in the CPU. The Kendall Square KSR 1 shared memory computer [KSR 92] uses both approaches; its processors provide instructions for prefetching or post storing cache lines, plus a host of instructions that control the memory system, ....

....be useful to examine how the communication requirements for other programming models differ from those of data parallel languages. For example, Henry and Joerg designed a NI for use with the TAM [Culler et al. 91b] model of execution and reached conclusions that are somewhat different from ours [Henry Joerg 92a] While this is a fairly extreme example, in that their programming model is radically different from the data parallel model, it shows the tight connection between the choice of a programming model and the design of the communication architecture. 7.2 Conclusions We have identified ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Do Faster Routers Imply Faster Communication? - Karamcheti, Chien (1994)   (6 citations)  (Correct)

....transfers, and is implemented using CMAM xfer function which splits up the transfer into a sequence of hardware packets at the source, and CMAM handle left xfer function which reassembles the packets at the destination. 1 While this is not the most efficient type of network interface [13, 8, 4], it has the significant virtue that no changes to the processor are required. Many researchers believe that this type of interface is basically representative of future network interfaces. 2 The CM 5 NI also supports an interrupt driven interface for reception; however, the cost is very high ....

....exploring what impact advanced network features (adaptive routing, virtual channels) have on network interface complexity and software overhead. Our work addresses some of these issues. Research on network interfaces has focused primarily on reducing message injection (and reception) overhead [13, 8, 19, 4] or offloading the communication onto a coprocessor [14, 16, 3] Such efforts are complementary to our goal of software protocol overhead reduction. Improvements in network interface can reduce the basic communication cost in our studies. While reducing the basic cost is important, as can be seen ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, 1992.


Network Interface Support for User-Level Buffer Management - Dubnicki, Li, Mesarina (1994)   (4 citations)  (Correct)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [2, 6, 3]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a nonstandard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


UTLB: A Mechanism for Address Translation on Network Interfaces - Angelos (1998)   (8 citations)  (Correct)

....to remote virtual memory buffers. Digital's Memory Channel [20] uses an approach that is similar to PRAM on the sending side and to SHRIMP's automatic update on the receiving side. Another direct approach allows applications to compose and retrieve messages using network interface registers [22, 42]. In addition to network interface registers, the Cray T3E [42] supports remote memory accesses with an approach similar to UDMA. It uses complete page tables to describe global communication segments and all communication pages are pinned in memory. A network interface typically can hold only a ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert, Chen.. (1998)   (12 citations)  (Correct)

.... also supports user-level message passing, but places more burden on application programs by requiring them to construct their own message headers [15]. Some previous machines have worked to streamline the hardware-software interface by mapping network interface FIFOs into processor registers [14, 24, 37]. Such approaches go against SHRIMP's goal of using commodity CPUs. A slightly less integrated approach, mapping FIFOs to memory rather than registers, was employed in the CM-5 [42]. CM-5 implementation restrictions limited the degree of multiprogramming, however, and applications were still ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Li.. (1994)   (238 citations)  (Correct)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [5, 11, 7]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a non standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Software Overhead in Messaging Layers: Where Does the Time Go? - Karamcheti, Chien (1994)   (32 citations)  (Correct)

....Second, the nodes, NI, and the network all have finite buffering, so software buffer management is required. Third, the CM-5 network provides error detection at the packet level, but no error correction, requiring a software protocol to ensure reliable delivery. [Footnote 1: While this is not the most efficient type of network interface [12, 6], it requires no changes to the processor. Many researchers believe that this type of interface is representative of future network interfaces.] And finally, the CM-5 network hardware only supports packets with five 32-bit words, so a typical message is broken ....

....remains significant over the range of packet sizes. For finite sequence multi packet deliveries, the messaging overhead is lower, but still significant, accounting for 9 11 of the total cost. Improved network interfaces and DMA hardware If network interfaces can be integrated on chip, as in [12, 6], the basic cost of communication can be reduced, but this will not reduce protocol costs in the messaging layer on which our study focuses. If the base cost is reduced, that increases the importance of the costs in the rest of the messaging layer. Similarly, while DMA hardware can reduce the cost ....

[Article contains additional citation context not shown here]

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, 1992.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)  (Correct)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [3, 8, 4]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low latency communication, it requires the use of a non standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Design and Evaluation of Network Interfaces for System Area.. - Mukherjee (1998)   (Correct)

....No T Zero [103] Partial Partial No SHRIMP [12] Yes Write Through No DI Multicomputer [23] No No Network Interface Table 3.5: Comparison of CNI with other network interfaces ....communicate through the cachable memory accesses, for which most processors and buses are optimized. Henry and Joerg [50] and Dally, et al. [34] advocate changes to a processor's registers. MIT Alewife [2] and Fugu [72] rely on a custom cache controller. MIT StarT NG [22] requires a co-processor interface at the same level as the L2 cache. AP1000 [110] requires integrated cache and DMA controllers. Stanford FLASH ....

....of buffering using a synthetic workload and concluded that buffering messages in virtual memory can occur only rarely for realistic applications. However, in contrast I found that for two of my seven macrobenchmarks, buffering can play a significant role in improving performance. Henry and Joerg [50] compared the performance of three NIs mapped respectively to the processor registers, L1 cache bus, and an off chip L2 cache bus. However, unlike my study, they did not examine the impact of buffering on the performance of these NIs. 5.6 Conclusions In this chapter I have systematically ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)  (Correct)

.... the sender and the receiver, and eliminating the dedicated memory for message arrival, as is found on the J-Machine [8]. Register-mapped network interfaces have been used previously in the Mars Machine [2], J-Machine, and iWarp [4] and have been described by *T [26] as well as Henry and Joerg [15]. However, none of these systems provide protection for user-level messages. Systems, like the J-Machine, that provide user access to the network interface without atomicity must temporarily disable interrupts to allow the sending process to complete the message. The M-Machine's atomic SEND ....

Henry, D. S., and Joerg, C. F. A tightly-coupled processor-network interface. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V) (Oct. 1992), ACM, pp. 111--122.


Efficient Strategies for Software-Only Directory Protocols.. - Grahn, Stenström (1995)   (7 citations)  (Correct)

....request. When a coherence request is present, the compute processor is interrupted and executes the corresponding software handler. Active messages [18] can be used to rapidly select which coherence routine the processor will execute. IB is interfaced to the second level cache bus as proposed in [11], i.e. it has the same access time as the second level cache and the first request in IB is accessible through memory mapped addresses. In addition, a Send Buffer (SB) that is interfaced to the second level cache bus in the same way as IB allows the processor to efficiently send messages in ....

D. S. Henry and C. F. Joerg, "A Tightly-Coupled Processor-Network Interface", In Proceedings of ASPLOS-V, pages 111-122, October, 1992.


Design Choices in the SHRIMP System: An Empirical Study - Matthias Blumrich (1998)   (12 citations)  (Correct)

.... also supports user-level message passing, but places more burden on application programs by requiring them to construct their own message headers [15]. Some previous machines have worked to streamline the hardware-software interface by mapping network interface FIFOs into processor registers [14, 25, 38]. Such approaches go against SHRIMP's goal of using commodity CPUs. A slightly less integrated approach, mapping FIFOs to memory rather than registers, was employed in the CM-5 [43]. CM-5 implementation restrictions limited the degree of multiprogramming, however, and applications were still ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Early Experience with Message-Passing on the SHRIMP.. - Felten, Alpert.. (1996)   (23 citations)  (Correct)

....build packet headers. They have not tried to implement message-passing libraries using the underlying communication mechanism. Several projects have tried to lower overhead by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [11, 21, 13]. While this is efficient for fine grain, low latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. The Connection Machine CM-5 implements user-level communication through memory mapped ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Differences Between Distributed and Parallel Systems - Riesen, Brightwell, al. (1998)   (Correct)

.... Most provide direct memory access (DMA) to local node memory (if not the capability to transfer data to and from the cache or even the registers of the CPU). This tight integration with the CPU and memory subsystem on the node allows for low latency and high bandwidth access to the network (see [18], [23] or [17] for examples). Figure 2: Network interface (NI) integration: On modern MP systems the network interface is either integrated into the CPU or is tightly coupled with the memory and CPU on a ....

....the service partition operating system on the Intel Paragon is a version of OSF/1 AD running on Mach. It provides a single system image that gives processes on any service node the illusion of a single file system and a single, large node. Several papers describe MP systems and their networks [18, 7, 11, 21]. The need to lower software overhead to access MP networks as well as in distributed systems with high performance network interfaces has been noted in several papers. A thorough treatment of this topic can be found in [32]. A theoretical model that takes real world aspects of message passing, ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 111--122, Boston, MA, Sept. 1992. ACM Press, New York. Published as SIGPLAN Notices, volume 27, number 9.


Polling Watchdog: Combining Polling and Interrupts for Efficient .. - Maquelin (1996)   (33 citations)  (Correct)

....If parallel architects have to give in and use off-the-shelf microprocessors, then it is important to study how best to take advantage of their capabilities. Considerable gains in communication performance can be achieved by making the network interface accessible by user-level code [8] and by using low latency message passing mechanisms such as Active Messages [20]. As designers try to improve communication performance further, especially for short messages, ways must be found for the system to react to network events quickly and cost effectively. We believe that the ....

Dana S. Henry and Christopher F. Joerg, "A Tightly-Coupled Processor-Network Interface," in Proc. of the Fifth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Mass., pp. 111--122, Oct. 1992.


Computation Structures Group Progress Report 1991-92 - Computation Structures   (Correct)

....related to our overall project developing parallel architectures and languages. 10.1 Network interface studies Dana Henry and Chris Joerg have been studying network interfaces. Their proposed interface architecture typically achieves a threefold improvement over the best existing interfaces [10]. Most of the performance gain comes from simple, low cost hardware support mechanisms for fast dispatching on, forwarding of, and replying to messages. The remaining improvement is gained by mapping the network interface directly to the processor's register file rather than its memory. These ....

D. Henry and C. Joerg. A Tightly Coupled Processor Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992.


Stream Sockets on SHRIMP - Damianakis, Dubnicki, Felten (1996)   (6 citations)  (Correct)

....build packet headers. They have not tried to implement message-passing libraries using the underlying communication mechanism. Several projects have tried to lower overhead by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [8, 16, 9]. While this is efficient for fine grain, low latency communication, it requires the use of a non-standard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. The Connection Machine CM-5 implements user-level communication through memory mapped ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


The Persistent Relevance of IPC Performance: New.. - Hsieh, Kaashoek, Weihl (1993)   (5 citations)  (Correct)

....of parallel and distributed systems. In custom multiprocessors a great deal of money is invested in making networks fast [Agarwal et al. 1991, Leiserson et al. 1992] network delays are on the order of a few tens of processor cycles. Mapping the hardware interface into registers or user space [Henry and Joerg 1992, Dally et al. 1991, Leiserson et al. 1992] can further lower the cost of sending messages. In distributed systems, network performance is approaching the performance of networks in custom multiprocessors. For example, the Fore ATM network interface (which can be mapped into user space) can ....

Henry, D.S., and Joerg, C.F., "A Tightly-Coupled Processor-Network Interface," Proc. 5th Symposium on Architectural Support for Programming Languages and Operating Systems, pp. 111-122, Boston, MA, Oct. 1992.


Message Passing Support for Multi-grained.. - Ang, Chiou, Rudolph.. (1996)   (Correct)

....a general purpose system requires mechanisms that adequately address the three M s of message passing: multi granularity, multithreading and multi tasking. While the performance of certain aspects of message passing has improved with the introduction of direct user level access to network interface[43, 23, 39] and the exploitation of host system capabilities such as coherence on the memory bus[34] architectural support for the three M s remains inadequate. Parallel and distributed applications exhibit a multitude of communication granularity. At one extreme, a control message that conveys a simple ....

....write or read. In order to maximize the data sent by an Express message, low order address bits are used for the logical destination address and additional data, permitting almost 5 bytes of data to be sent with a single uncached 4 byte write. Such address bit stealing has been proposed before[23], but we are familiar with only one other machine (Cray T3E) that uses it. A major challenge of the Express message design is to maximize the amount of data transported by a message while keeping each compose and launch to a single, uncached memory access. The Express message mechanism assumes ....

C. F. Joerg and D. S. Henry. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, pages 111--122, Oct. 1992.


Exploiting Two-Case Delivery for Fast Protected Messaging - Mackenzie, Kubiatowicz, .. (1998)   (14 citations)  (Correct)

....then accesses the message buffers in memory. Although a definitive conclusion awaits further research, past research indicates that direct interfaces tend to be more efficient than memory based interfaces. Direct interfaces that can be accessed at cache speeds offer even better performance [14]. For example, the CNI paper [24] showed that a direct, cache level ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992.


Virtual Memory Mapped Network Interface for the.. - Blumrich, Alpert.. (1993)   (238 citations)  (Correct)

....is still hundreds of CPU instructions. In addition, the node is complex and expensive to build. Several projects have taken the approach of lowering communication latency by bringing the network all the way into the processor and mapping the network interface FIFOs to special processor registers [5, 10, 6]. Writing and reading these registers queues and dequeues data from the FIFOs respectively. While this is efficient for fine grain, low-latency communication, it requires the use of a nonstandard CPU, and it does not support the protection of multiple contexts in a multiprogramming environment. ....

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


Message Passing Support on StarT-Voyager - Ang, Chiou, Rudolph, Arvind (1996)   (15 citations)  (Correct)

....Any change in these pointers is taken as a control signal. Another option is to use a full empty bit in each message buffer for coordination. In order to increase the amount of data transmitted by each memory operation, it is possible to use address bits to pass data from the processor to the NES[23]. Specific lower order (to minimize impact on TLB entry utilization) address bits are viewed as data bits by the NES while the higher order bits are constant to mark this special region of memory. When the NES sees memory operations to that region of memory, it extracts the additional data from ....

C. F. Joerg and D. S. Henry. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Boston, MA, pages 111--122, Oct. 1992.


UDM: User Direct Messaging for General-Purpose.. - Mackenzie.. (1996)   (4 citations)  (Correct)

....mechanisms including DMA for bulk transfer and hardware synthesized messages for accelerating shared memory. [Figure 1: Direct message timing. Labels: Sender, Network, Receiver; transit time ( 1 uS), compose (7 cycles), receive occupancy (65 cycles, trap; 9 cycles, poll).] .... memory system [17, 6, 11, 8, 1]. Low overhead and latency are achieved by avoiding the memory system, so that message overheads scale with processor performance rather than with memory performance. Figure 1 represents the one way message latency and overheads predicted for our prototype system, FUGU, operating at 20MHz. By ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Fifth International Architectural Support for Programming Languages and Operating Systems (ASPLOS V), Boston, October 1992. ACM.


An Efficient Virtual Network Interface in the FUGU Scalable.. - Mackenzie (1998)   (1 citation)  (Correct)

....then accesses the message buffers in memory. Although a definitive conclusion awaits further research, past research indicates that direct interfaces tend to be more efficient than memory based interfaces. Direct interfaces that can be accessed at cache speeds offer even better performance [28]. For example, the CNI paper [56] showed that a direct, cache level interface exhibited 50 higher bandwidth than their best interface placed on the memory bus. Direct interfaces are challenging to protect without sacrificing efficiency or seriously impairing the multiprogramming model. Therefore, ....

....4 4) show that the base message passing costs in FUGU are comparable to the costs of an unprotected, direct interface on a single user machine built on the same hardware base. Others have shown that tightly coupled, direct interfaces tend to be more efficient than indirect, memory based interfaces [28, 56]. Thus FUGU s peak performance is high. Two case delivery and virtual buffering provide good system performance in terms other than speed as well. Virtual buffering enables low memory consumption compared to a system that must provide a fixed amount of physical buffering per application. The ....

[Article contains additional citation context not shown here]

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992.


The Stanford FLASH Multiprocessor Page 1 - The Stanford   (Correct)

....paper discusses them only at a high level, giving the instruction set additions but no hardware blocks. T also does not have effective support of cache coherence as a goal, and consequently does not discuss the implications of their design for that issue. In a paper related to T, Henry and Joerg [HJ92] discuss issues in the design of the network node interface. They argue that most protocol processing for handling messages can be done by a general purpose processor. Hardware support is needed in only a couple of places, namely for fast dispatch based on type of incoming message and for ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111-22, Boston, MA, October 1992.


Mechanisms for Efficient, Protected Messaging - Lee   (Correct)

....guarded pointer facility in the M Machine serves as a enforcement mechanism for these requirements, most Active Message implementations do not have comparable provisions. Instead, they must rely on cooperative behaviors in the software. The register mapped interface is inspired by Henry and Joerg [36], who describe an interface that scrolls a register window s worth of message words into or out of the network. Variable size messages are supported by way of successively scrolling the message words in or out, one window at a time. In contrast, for starvation avoidance, M Machine messages are ....

....words are used exactly once by the handler 2 , the M Machine adopts a more efficient streaming extraction interface that is optimized to automatically remove each word from the message queue as it is consumed. Although it does not provide random access to the message body through a window as in [36], no negative effect is expected as most instructions cannot simultaneously use more than two operands anyway, while the arrival order of message words can be re arranged so that those needed first also arrive first. The M Machine network employs four virtual channels [37] for performance and ....

Dana S. Henry, Christopher F. Joerg, "A Tightly-Coupled Processor Network Interface", in ASPLOS V, 1992, pp. 111--122.


A Comparison of Architectural Support for Messaging in the TMC.. - Karamcheti (1995)   (27 citations)  (Correct)

....all these areas, evaluating the hardware support and messaging protocols required to provide robust performance for a range of dynamic and irregular traffic patterns. Research on specialized hardware support for messaging has focused primarily on integrating message processing within the processor [10, 14, 1, 25]. These approaches are effective in reducing point to point costs, but provide no solutions for network and output contention. In contrast, we have investigated messaging atop shared address space primitives and demonstrated that it can deliver performance robust over output contention. Research ....

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, 1992.


Parallel Overhead in Executing Data-Parallel Programs on *T - Andrew Shaw   (Correct)

.... of 400 Megabytes second [4] The modifications to the processor are aimed at lowering the time to access the network from the processor additional register sets are incorporated into the processor which tightly couple the sequential core of the processor with the high bandwidth network [3]. 3 Dataparallel C The data parallel language we used for the measurements was Dataparallel C (DPC) 2] DPC is a variant of C , and it has the advantage that its source code is publically available, and that it was designed with portability in mind. The DPC compiler takes DPC code, compiles it ....

D. Henry and C. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 111--122, October 1992.


The Impact of Data Transfer and Buffering Alternatives on.. - Mukherjee (1998)   (6 citations)  (Correct)

....of buffering using a synthetic workload and concluded that buffering messages in virtual memory can occur only rarely for realistic applications. However, in contrast we found that for two of our seven macrobenchmarks, buffering can play a significant role in improving performance. Henry and Joerg [18] compared the performance of three NIs mapped respectively to the processor registers, L1 cache bus, and an off chip L2 cache bus. However, unlike our study, they did not examine the impact of buffering on the performance of these NIs. 8 Conclusions In this paper we have systematically ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


The Paros Operating System Microkernel - Labarta, Girona, Cortes, Gimenez, .. (1994)   (Correct)

....be done through a procedural interface [GEIS93] [BOYL87] usually implemented on top of operating system services. In order to achieve efficient message passing systems, the communication mechanism should be closely integrated with the machine language [HENR92] in the line of the transputer [HOME87] approach. It is nevertheless very important that these mechanisms are designed in such a way that they offer efficient communication to the user program, and possibility of control to the operating system. With the duality between shared memory and message ....

Dana S. Henry and Christopher F. Joerg "A Tightly-Coupled Processor-Network Interface" ASPLOS V, October 1992 p. 111-122


The W-Network: A Low-Cost Fault-Tolerant Multistage.. - Theobald (1995)   (Correct)

.... concerned with networks for large scale parallel machines exploiting fine grain parallelism, such as dataflow architectures [19] or the EARTH multithreaded project [10] Fine grain parallelism is important to such machines, because in many applications, large grain sizes constrain parallelism [8]. As more processors are added to parallel machines, the trend has been for computation grain sizes to decrease [4] If grain sizes are small, then the messages passed between them will be short and frequent, on average. For such systems, a low latency network is important [9] Although there are ....

....such as multithreading [10] achieving good processor utilization requires surplus parallelism, which may not always be available, especially when there are many processors. Therefore, interprocessor communication systems have been moving toward low latency networks transmitting short messages [8, 21]. In many problems, the communication patterns can be highly variable. For this reason, the network should be packet switched, meaning that each message contains a header that ....

Dana S. Henry and Christopher F. Joerg, "A Tightly-Coupled Processor-Network Interface, " Proc. of the Fifth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Mass., pp. 111--122, Oct. 1992.


Coherent Network Interfaces for Fine-Grain Communication - Mukherjee, Falsafi, al. (1996)   (25 citations)  (Correct)

....low latency networks is rapidly making NIs a bottleneck. Rather than try to explore the entire NI design space here, we focus our efforts three ways: First, we concentrate on NIs that reside on memory or I/O buses. In contrast, other research has examined placing NIs in processor registers [5,15,21], in the level one cache controller [1] and on the level two cache bus [10]. Our NIs promise lower cost than the other alternatives, given the economics of current microprocessors and higher integration level we expect in the future. Nevertheless, closer integration is desirable if it can be made ....

....Unlike many other NIs, our implementation of CNIs does not require changes to an SMP board or other standard components. Yet they enable processors and network interfaces to communicate through the cachable memory accesses, for which most processors and buses are optimized. Henry and Joerg [21] and Dally, et al. [15] advocate changes to a processor's registers. MIT Alewife [1] and Fugu [32] rely on a custom cache controller. MIT StarT NG [10] requires a co processor interface at the same level as the L2 cache. AP1000 [41] requires integrated cache and DMA controllers. Stanford FLASH ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


The M-Machine Multicomputer - Fillo, Keckler, Dally (1995)   (22 citations)  (Correct)

.... the sender and the receiver, and eliminating the dedicated memory for message arrival, as is found on the J Machine [8]. Register-mapped network interfaces have been used previously in the Mars Machine [2] J Machine, and iWarp [4] and have been described by T [26] as well as Henry and Joerg [15]. However, none of these systems provide protection for user level messages. Systems, like the J Machine, that provide user access to the network interface without atomicity must temporarily disable interrupts to allow the sending process to complete the message. The M Machine's atomic SEND ....

HENRY, D. S., AND JOERG, C. F. A tightly-coupled processor-network interface. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V) (Oct. 1992), ACM, pp. 111--122.


Application-Specific Protocols for User-Level Shared.. - Falsafi, Lebeck.. (1994)   (75 citations)  (Correct)

....but specify Stache32 or some other customized protocol for a data structure that exhibits false sharing. 2.3 Implementing Tempest Tempest's messaging and virtual memory support are largely conventional. Active message abstractions can be implemented very efficiently with custom hardware [6, 18], but also have reasonable performance on existing machines [25]. Tempest's virtual memory mechanisms can be implemented as a user-level library on a system that provides mmap() and munmap() or with custom kernel modifications [19]. The key challenge in implementing Tempest is sup3 Appears in: ....

Dana S. Henry and Christopher F. Joerg. A Tightly-Coupled Processor-Network Interface. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


Effect of Short Term Scheduling on Message Passing.. - Sergi Girona   (Correct)

....for message passing which frequently has to be done through a procedural interface [GEIS93] [BOYL87] usually implemented on top of operating system services. In order to achieve efficient message passing systems, the communication mechanism should be closely integrated with the machine language [HENR92] in the line of the transputer [HOME87] approach. It is nevertheless very important that these mechanisms are designed in such a way that they offer efficient communication to the user program, and possibility of control to the operating system. With the duality between shared memory and message ....

Dana S. Henry and Christopher F. Joerg "A Tightly-Coupled Processor-Network Interface" ASPLOS V, October 1992 p. 111-122


The Performance Potential of an Integrated Network.. - Binkert, Dreslinski.. (2004)   (Correct)

No context found.

D. S. Henry and C. F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, Oct. 1992.


Analyzing NIC Overheads in Network-Intensive Workloads - Binkert, Hsu, Saidi.. (2004)   (Correct)

No context found.

Dana S. Henry and Christopher F. Joerg. A tightly-coupled processor-network interface. In Proc. Fifth Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS V), pages 111--122, October 1992.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)  (Correct)

No context found.

D. S. Henry and C. F. Joerg, "A tightly-coupled processor-network interface," in Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 111--122, Oct. 1992.
