29 citations found.
Edward W. Felten. Protocol compilation: high-performance communication for parallel programs. PhD dissertation, University of Washington, Dept. of CSE, Sept. 1993, UW-CSE-TR 93-09-09.


This paper is cited in the following contexts:


Relaxed Synchronization Message Passing - Alpert, Philbin

....not in any obvious or easily computable way. This behavior could be predicted completely, but the cost of doing so would be significant: run the application using a set of identical input data while observing the communication behavior, a technique that has proven useful in some circumstances [Fel93]. In spite of the irregularity and unpredictability of Barnes's communication patterns, in the RSMP version the number of incoming messages typically was still known by receivers in 28.8% of the supersteps. Ocean Ocean is a fluid dynamics application that simulates large scale ocean movements. ....

Edward W Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, August 1993.


Reducing Communication Overhead in Asynchronous Distributed.. - Chethur (1998)

....microprocessors. Though this is a very powerful mechanism, security aspects of the operating systems may limit its versatility [61]. 4.2.2 Protocol Compilation Felten identified the problem of the high overhead associated with communication primitives, a problem he calls the communication gap [19]. He noticed that the software cost for sending a message (for a highly optimized message library) was two orders of magnitude higher than the hardware overhead for sending the message. This gap is due to the message send protocol being inefficient (because it is not specific to the application) ....

Felten, E. W. Protocol compilation: High-performance communication for parallel programs. Tech. rep., University of Washington --- Dept. of Computer Science, 1993.


Optimizing Communication in Time-Warp Simulators - Chetlur, Abu-Ghazaleh.. (1998)   (8 citations)

....the communication, and if the communication behavior is statically known. In addition, it requires considerable effort on the part of the application writer. Felten identified the problem of the high overhead associated with communication primitives, a problem he calls the communication gap [4]. Felten investigated protocol compilation, where a communication protocol specific to the application is compiled with it for improved performance. For several applications, an average speedup of 7 was reported. Carothers et al. studied the effect of communication on the efficiency of Time Warp ....

E. W. Felten. Protocol compilation: High-performance communication for parallel programs. Technical report, University of Washington --- Dept. of Computer Science, 1993.


Synthesizing a Usable Global Time from Multiple Independent Clocks - Alpert

....generated during program execution. Using this method, however, some application programs are characterized by some recv operations completing before the corresponding send operations began. The search for a method of calculating usable global time again escalated. In his dissertation research [3], Ed Felten applied this technique iteratively to obtain a usable global time. The time is corrected using this method. The corrected times then are corrected, and those times corrected, either for a specified number of iterations or until corrected times converge. Rather than apply that ....

....In Hypercube Clock Synchronization [2], T H Dunigan extensively analyzes algorithms for time synthesis on Intel and Ncube hypercube multiprocessors. The technique is similar to that described at the beginning of section 3.3 (see also Figure 6). Iterative application of this technique (as in [3]) probably would have synthesized a usable global time. title: Wait Free Clock Synchronization authors: Shlomi Dolev, Jennifer L. Welch title: Optimal Clock Synchronization under Different Delay Assumptions authors: Hagit Attiya, Amir Herzberg, Sergio Rajsbaum 5 Conclusion Acknowledgements ....

Edward W Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, August 1993.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)

....10 8. The application program at the receiving processing node is notified that the message has arrived, whereupon code is dispatched to perform any necessary post processing of the message. Arbitrarily complicated message protocols can be built on top of simple messages. A message protocol [10] is an agreement between the sender and the receiver concerning the size, format and sequence of the message. Typically, message passing systems provide a number of different message protocols for the application programmer. One example is a bi directional, synchronous message protocol that ....

....if the network interface does not use DMA. An active message includes the address of the receiving node's software handler routine in the message itself, and the overhead of dispatch is thereby reduced to a small fixed cost. Protocol compilers address the protocol choice problem. Parachute [10] is one such protocol compiler that analyzes message passing patterns in a parallel program; it automatically generates a new program where the optimal protocol is selected for each message in the program. A protocol compiler can only choose from the existing protocols in the message passing ....

[Article contains additional citation context not shown here]

Edward W. Felten. Protocol compilation: high-performance communication for parallel programs. PhD dissertation, University of Washington, Dept. of CSE, Sept. 1993, UW-CSE-TR 93-09-09.


Packet Routing in Multiprocessor Networks - Chinn (1995)

....is today. As machines get bigger, messages will have to travel through a greater number of nodes to reach their destinations, incurring a greater delay. Also, the time it takes to create a message (a one time cost per message) will decrease as more sophisticated techniques are employed (e.g. see [Fel93]). The interconnection network is composed of nodes, which usually correspond to processing elements, and links, the wires that connect nodes. In one time step, a node can transmit one message along each of its links. Messages travel from node to node, and each node decides how to send messages ....

E. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, Seattle, WA, September 1993.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)

....all the communication work and (assuming nonblocking caches) the CPU can proceed with local computations while the communication is in progress. In contrast, CPU overhead is very high in traditional message passing machines, often over an order of magnitude higher than the interconnect latency [Felten 93b]. However, we have seen in the previous chapter that the message passing model has many desirable features. Our goal in this chapter is to design a network interface for distributed memory architectures that achieves low CPU overhead comparable to shared-memory machines, while at the same time ....

E. W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD dissertation, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, September 1993. Available as technical report 93-09-09.


Architectural Support for Compiler-Generated Data-Parallel Programs - Klaiber (1994)   (1 citation)

....largely focused on reducing the high per message overhead typically found in message passing systems. For example, active messages [von Eicken et al. 92] are a low level transport mechanism that achieves low latency by efficiently dispatching to a message handler on the receiving node. Felten [Felten 93a] proposes using a protocol compiler to custom generate message passing protocols for a given program and thus reduce protocol overhead. The above two approaches rely entirely on software techniques; however, hardware approaches have been suggested as well. For example, the Shrimp architecture ....

....communication model also has several drawbacks. First, traditional message passing incurs run time protocol overhead, e.g. for managing the buffers required for the just in time delivery semantics. Felten has shown that protocol overhead degrades the performance of message passing codes [Felten 93a]. Since the C communication operations read and write parallel variables that are already allocated by the compiler, most of the buffer management overhead is completely unnecessary. Second, in a message passing model of communication, data transfer and synchronization are always combined, even ....

[Article contains additional citation context not shown here]

E. W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD dissertation, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, September 1993. Available as technical report 93-09-09.


Network Interface Support for User-Level Buffer Management - Dubnicki, Li, Mesarina (1994)   (4 citations)

....and analysis indicate that moving communication buffer management out of the kernel to the user level can greatly reduce the software overhead of message passing. Felten found that, using a compiled, application-tailored runtime library for message passing, the latency can be improved by 30% [4]. With a non traditional design of network interface, the software overhead of message passing primitives for common cases can be reduced to less than 10 instructions [1] without sacrificing protection. An interesting question is whether it is possible to support user level buffer management with ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, Dept. of Computer Science and Engineering, University of Washington, August 1993. Available as technical report 93-09-09.


Improving the Communication Subsystem Performance of WARPED - Rajasekaran (1998)   (1 citation)

....In general, communication overhead is present in many parallel applications. There have been optimizations which increase the performance of the communication subsystems of parallel applications. These optimizations are discussed in this section. 3.2.1 Special Communication Protocols Felten [28, 29] observed that message passing applications have a communication gap, which is the gap between the software and hardware in performing the communication, otherwise called the protocol overhead. The communication gap is caused by three factors: • Protection Boundaries: Communication is often done ....

Felten, E. W. Protocol compilation: High-performance communication for parallel programs. Tech. rep., University of Washington --- Dept. of Computer Science, 1993.


Addressing Communication Latency Issues on.. - Kumar..

....network links. The nature of the communication interface in a message passing environment is such that the software overhead (time required for the preparation and authentication of the message) is significantly higher than the hardware overhead (network setup and message propagation time) [5]. Under such circumstances, an important lesson that has been learned is to minimize the frequency of communication, but not necessarily the size of the messages in order to arrive at an efficient implementation. However, a large class of distributed applications tend to have fine grained ....

....simulator) Specifically, our efforts in reducing the communication latency along three of the four aforementioned dimensions are detailed and discussed. As rewriting and restructuring the application code for less frequent communication is relatively a well studied and application-specific method [5], we will not discuss it here in this paper. However, each of the other three dimensions are dealt in detail and supported by empirical analysis. The remainder of this paper is organized as follows. Section 2 overviews parallel discrete event simulation (which is our domain of interest) and ....

Felten, E. W. Protocol compilation: High-performance communication for parallel programs. Tech. rep., University of Washington --- Dept. of Computer Science, 1993.


System Support for Efficient Network Communication - Thekkath (1994)   (4 citations)

....we believe that with little additional mechanism, it is possible to adequately support the needs of parallel applications. Ideally, we do not expect programmers to directly use the network access model to write their parallel applications on the cluster. That is the task of a protocol compiler [29]. This notion is similar in spirit to the idea of using RPC stub generators to hide the details of marshaling from users. In general terms, a protocol compiler analyses a parallel application for communication and computation phases. It then generates communication instructions, e.g. remote reads ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. Ph.D. thesis, University of Washington, September 1993. Department of Computer Science and Engineering Technical Report 93-09-09.


An Efficient Virtual Network Interface in the FUGU Scalable.. - Mackenzie (1998)   (1 citation)

....such as data transfer time. More important than emulation, a low level model offers the promise that network traffic and protocol overhead may be reduced over that required by a higher level model by programmer specialization [78] or through automatic, compile time analysis and specialization [23, 34]. An ideal low level model provides a complete set of communication operations and exposes fundamental costs. The programmer is thus given the ability to craft communication protocols tailored to the application and to minimize communication costs using application specific knowledge. The ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, Department of Computer Science and Engineering, 1993.


Cranium: An Interface for Message Passing on Adaptive Packet.. - Mckenzie (1994)   (10 citations)

....of the older generation of network interfaces considered message passing to be an operating system service. The latency of message passing on a system with direct user level access to the network is reduced by one to two orders of magnitude compared with a similar system level interface. Felten [5] outlines the basic strategy for safe user level communication that Cranium supports: system partitioning, hardware validation for message destinations, gang scheduling, saving and restoring the network state, and separate user and kernel level communication. The third requirement is the ability ....

....logic technology such as PALs and FPGAs for the DMA engine, bus interface and network link interface. A higher-speed version may involve a semi custom solution using sea of gates technology. The current design of Cranium uses the standard gang scheduling model for safe user level access [5, 13]. Recently, there has been interest in studying general processor scheduling on multicomputers that have user level support for message passing [11, 14]. The idea is to let multiple user processes execute at the same time inside the same machine partition. The interaction between adaptive packet ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD dissertation, University of Washington, Dept. of CSE, Sept. 1993. Available as UW CSE technical report TR 93-09-09.


Triplex Router: A Versatile Torus Routing Algorithm - Melanie Fulgham (1996)

....both wormhole or packet switched flow control, including virtual cut through [KK79] and store and forward techniques. The choice of class may be specified at system boot up, dynamically, or individually by message. Dynamic class selection may be useful when compile time information is available [Fel93] enabling the system to select the best class for the expected traffic. Individual selection is useful, for example, when some messages require in order delivery, while other messages prefer the increased flexibility of adaptive routing. It is obvious that a multi class router is unlikely to be ....

....may be reasonable, since routing possibilities can be computed before the selection bits arrive. Individual mode selection might be useful when in order delivery is required for certain messages and others prefer the flexibility of adaptive routing, or when compile time information is available [Fel93] and a particular type of routing is preferred for the expected traffic. 4. Comparisons Performance of the Triplex algorithm is explored by simulation. The algorithm is compared to three other routers: the Dally Seitz oblivious router [DS87], Duato [GPBS94, Dua93], and the Chaos [KS94] router ....

E.W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, Seattle, WA, 1993.


Where is Time Spent in Message-Passing and Shared-Memory.. - Chandra, Larus, Rogers (1994)   (57 citations)

....paid a high cost in moving data in and out of the network. This cost appeared both in increased computation time to manage buffers and the 3-42% of program time spent in communication library routines. Many alternatives such as faster hardware [2], faster libraries, and protocol compilers [7] could reduce this overhead. Software overhead in processing low latency (fast-turnaround) messages is a major weakness of the CM5 message passing system. These messages are fundamental to performing reductions or broadcasts in software (i.e. in Gauss). Hardware implementations of these ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. Technical Report 93-09-09, Department of Computer Science, University of Washington, September 1993.


Implementing Network Protocols at User Level - Thekkath (1993)   (98 citations)

....application requirements, a specialized variant of a standard protocol is used rather than the standard protocol itself. A different application would use a slightly different variant of the same protocol. Language based protocol implementations such as Morpheus [1] as well as protocol compilers [9, 10] are two recent attempts at exploiting user specified constraints to generate efficient implementations of communication protocols. The general idea of using partial evaluation to gain better I/O performance in systems has been used elsewhere as well [16]. In particular, the notion of specializing ....

....hardware packet demultiplexing mechanism is difficult to exploit because there is no separate connection setup phase that can negotiate the BQIs. There is much evidence to support the claim that application specific knowledge can be exploited to achieve highly efficient communication. For example, [1, 10] are some of the more recent systems that use application specific knowledge to generate communication protocols. By providing language level support for generating protocols, these systems go beyond providing a set of pre defined options to fine tune a protocol. In contrast to traditional ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. Ph.D. thesis, Department of Computer Science and Engineering, University of Washington, July 1993.


Improving Protocol Performance by Dynamic Control of.. - Ivan-Rosu, Schwan (1996)   (1 citation)

....like end to end latency, delay jitter or loss probability, are as important as total communication throughput. For instance, the performance of scientific parallel codes has been shown to improve when computations and their associated communications are scheduled jointly, using compiler information [6] or using runtime knowledge of communication delays [25]. Similarly, both the performance and the predictability of real time applications are improved if bounds on ....

E. Felten. Protocol compilation: High-performance communication for parallel programs. U. of Washington, Dept. Computer Science and Eng., TR 93-09-09, 1993.


A Portable Collective Communication Library using Communication.. - Rühl, Bal

....algorithms must be respectively employed. The building blocks in iCC are higher level than our unicast and broadcast primitives. Each building block can be optimized, but no optimizations can be performed over multiple blocks. Our work is also targeted at a wider range of architectures. Parachute [9] is a protocol compiler that uses a pattern description of the whole application to determine the most optimal communication strategy. Its primary focus is on reducing protocol overhead, such as buffer management. The prototype compiler only accepts data parallel programs and generates a tailored ....

E. W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, Univ. of Washington, 1993.


Performance of User-Level Communication on Distributed-Memory.. - Lee (1993)

....to be either all computing or all communicating. 6 Related Work There is a large body of literature related to improving the performance of communication systems. Below we compare our work with four other systems that either have similar goals or use similar techniques. Protocol compilation [Felten 93] is a technique in which a compiler generates tailored protocols to reduce the cost of protocol processing. The key observation of this approach is that the compiler, unlike processors, may have global knowledge of how a parallel program runs, and this knowledge may be used to reduce the cost of ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, Department of Computer Science & Engineering, University of Washington, Technical Report 93-09-09, 1993.


Design and Implementation of NX Message Passing Using.. - Alpert, Dubnicki.. (1996)   (3 citations)  Self-citation (Felten)

....In addition, buffer management is still done at kernel level. Another way to improve NX message passing performance is to take advantage of application specific communication patterns to perform buffer management at compile time instead of run time to reduce the overhead of message passing [11]. This method works only for applications that have fixed communication patterns and requires a compiler to detect and generate application specific message passing libraries. MPI FM [1] is a high performance implementation of the MPI message passing interface, based on a port of the portable ....

Edward W Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, University of Washington, August 1993.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)  Self-citation (Felten)

No context found.

E. W. Felten, Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, Dept. of Computer Science and Engineering, University of Washington, Aug. 1993. Available as technical report 93-09-09.


Two Virtual Memory Mapped Network Interface Designs - Blumrich, Dubnicki.. (1994)   (5 citations)  Self-citation (Felten)

....and analyses indicate that moving communication buffer management out of the kernel to the user level can greatly reduce the software overhead of message passing. By using a compiled, application tailored runtime library, the latency of multicomputer message passing can be improved by about 30% [6]. In addition, virtual memory mapped communication takes advantage of the protection provided by virtual memory systems. Since mappings are established at the virtual memory level, virtual address translation hardware guarantees that an application can only use mappings created by itself. This ....

Edward W. Felten. Protocol Compilation: High-Performance Communication for Parallel Programs. PhD thesis, Dept. of Computer Science and Engineering, University of Washington, August 1993. Available as technical report 93-09-09.


Merl -- A Mitsubishi Electric Research Laboratory - Http Www Merl (1998)

No context found.

Edward W. Felten. Protocol compilation: high-performance communication for parallel programs. PhD dissertation, University of Washington, Dept. of CSE, Sept. 1993, UW-CSE-TR 93-09-09.



CiteSeer.IST - Copyright Penn State and NEC