Results 11 - 20 of 72
Language constructs for distributed real-time programming
- in Proc. IEEE Real-Time Syst. Symp.
, 1985
Abstract - Cited by 59 (36 self)
For many distributed applications, it is not sufficient for programs to be logically correct. In addition, they must satisfy various timing constraints. This paper discusses primitives that support the construction of distributed real-time programs. Our discussion is focused on two areas: timing specification and communication. To allow the specification of timing constraints, we introduce language constructs for defining temporal scope and specifying message deadline. We also identify communication primitives needed for real-time programming. The issues underlying the selection of the primitives are explained, including the handling of timing exceptions. The primitives will eventually be provided as part of a distributed programming system that will be used to construct distributed multi-sensory systems.
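The abstract's central constructs, temporal scope with a message deadline and a timing exception raised on overrun, can be illustrated with a small sketch. The names (`TemporalScope`, `DeadlineExceeded`, `check`) are illustrative assumptions, not the paper's actual syntax; this models the idea as a cooperatively checked deadline block:

```python
import time

class DeadlineExceeded(Exception):
    """Timing exception: a temporal scope overran its deadline."""

class TemporalScope:
    """Hypothetical temporal scope: a block of work bounded by a deadline.

    Enforcement here is cooperative: the body calls check() at convenient
    points, and the scope raises DeadlineExceeded once the deadline passes.
    """
    def __init__(self, deadline_s):
        self.deadline_s = deadline_s

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # let timing exceptions propagate to a handler

    def remaining(self):
        return self.deadline_s - (time.monotonic() - self.start)

    def check(self):
        if self.remaining() <= 0:
            raise DeadlineExceeded("temporal scope deadline passed")

# Usage: work that misses its deadline triggers the timing exception.
try:
    with TemporalScope(0.01) as scope:
        for _ in range(10):
            time.sleep(0.005)   # simulated work step
            scope.check()       # cooperative deadline check
except DeadlineExceeded:
    print("timing exception handled")
```

A real-time runtime would instead enforce the deadline preemptively; the cooperative check keeps the sketch self-contained.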
Profiling and reducing processing overheads in TCP/IP
- IEEE/ACM Transactions on Networking
, 1996
Abstract - Cited by 50 (1 self)
This paper presents detailed measurements of processing overheads for the Ultrix 4.2a implementation of TCP/IP network software running on a DECstation 5000/200. The performance results were used to uncover throughput and latency bottlenecks. We present a scheme for improving throughput when sending large messages by avoiding most checksum computations in a relatively safe manner. We also show that for the implementation we studied, reducing latency (when sending small messages) is a more difficult problem because processing overheads are spread over many operations; gaining a significant savings would require the optimization of many different mechanisms. This is especially important because, when processing a realistic workload, we have found that non-data-touching operations ...
Efficient Support for Multicomputing on ATM Networks
, 1993
Abstract - Cited by 47 (3 self)
The emergence of a new generation of networks will dramatically increase the attractiveness of loosely-coupled multicomputers based on workstation clusters. The key to achieving high performance in this environment is efficient network access, because the cost of remote access dictates the granularity of parallelism that can be supported. Thus, in addition to traditional distribution mechanisms such as RPC, workstation clusters should support lightweight communication paradigms for executing parallel applications. This paper describes a simple communication model based on the notion of remote memory access. Applications executing on one host can perform direct memory read or write operations on user-defined remote memory buffers. We have implemented a prototype system based on this model using commercially available workstations and ATM networks. Our prototype uses kernel-based emulation of remote read and write instructions, implemented through unused processor opcodes; thus, applica...
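The remote-memory-access model the abstract describes, direct reads and writes on user-defined remote buffers, can be sketched in miniature. The class and method names below are assumptions for illustration; a dictionary stands in for the network and the kernel-based opcode emulation the prototype actually used:

```python
# Sketch (assumed API) of the remote memory access model: a host
# exports named buffers, and peers perform direct read/write
# operations on them by (buffer name, offset).

class RemoteMemory:
    def __init__(self):
        self.exports = {}  # buffer name -> backing storage

    def export(self, name, size):
        """Register a user-defined buffer that remote hosts may access."""
        self.exports[name] = bytearray(size)

    def write(self, name, offset, data):
        """Remote write: place bytes directly into the exported buffer."""
        buf = self.exports[name]
        buf[offset:offset + len(data)] = data

    def read(self, name, offset, length):
        """Remote read: fetch bytes directly from the exported buffer."""
        return bytes(self.exports[name][offset:offset + length])

# Usage: one host exports a buffer; a peer writes and reads it directly,
# with no RPC-style marshalling on the data path.
host = RemoteMemory()
host.export("shared", 64)
host.write("shared", 8, b"hello")
assert host.read("shared", 8, 5) == b"hello"
```

The point of the model is that the data path is a plain memory copy; protection and addressing are set up once at export time rather than per message.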
The J-machine network
- in Proceedings of the International Conference on Computer Design (ICCD
, 1992
Abstract - Cited by 43 (4 self)
Network throughput can be increased by dividing the buffer storage associated with each network channel into several virtual channels [DalSei]. Each physical channel is associated with several small queues, virtual channels, rather than a single deep queue. The virtual channels associated with one physical channel are allocated independently but compete with each other for physical bandwidth. Virtual channels decouple buffer resources from transmission resources. This decoupling allows active messages to pass blocked messages using network bandwidth that would otherwise be left idle. Simulation studies show that, given a fixed amount of buffer storage per link, virtual-channel flow control increases throughput by a factor of 3.5, approaching the capacity of the network.
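The virtual-channel idea in the abstract, several shallow per-channel queues sharing one physical link so a blocked message cannot idle the link, can be shown with a toy simulation. The `Link` class and its methods are illustrative assumptions, not the J-machine's hardware interface:

```python
from collections import deque

# Toy sketch of virtual-channel flow control: one physical link with
# several shallow virtual-channel queues. A blocked VC exerts
# backpressure only on itself; other VCs can still use the link.

class Link:
    def __init__(self, num_vcs, depth):
        # Several small queues instead of one deep queue.
        self.vcs = [deque() for _ in range(num_vcs)]
        self.depth = depth

    def enqueue(self, vc, flit):
        """Buffer a flit on one virtual channel; False means backpressure."""
        if len(self.vcs[vc]) < self.depth:
            self.vcs[vc].append(flit)
            return True
        return False  # only this VC is full; others are unaffected

    def transmit(self, blocked):
        """Send one flit from the first ready, unblocked virtual channel."""
        for i, q in enumerate(self.vcs):
            if q and i not in blocked:
                return q.popleft()
        return None  # link idles only if every VC is empty or blocked

# Usage: VC 0 is blocked downstream, yet VC 1's traffic still advances
# using bandwidth that a single-queue channel would leave idle.
link = Link(num_vcs=2, depth=2)
link.enqueue(0, "A0")
link.enqueue(1, "B0")
print(link.transmit(blocked={0}))  # B0
```

With a single deep queue, the head-of-line message "A0" would stall everything behind it; the per-VC queues are what let "B0" overtake it.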
Hamlyn - an Interface for sender-based communications
, 1992
Abstract - Cited by 42 (6 self)
This paper uses a characterization of three different types of interconnect traffic to drive the development of an innovative high-speed interconnect interface. The interface uses sender-controlled message placement at the recipient, which greatly reduces the cost and complexity of message handling. The contributions of this work are in (a) elucidating the traffic model; (b) defining the sender-driven communication scheme; and (c) describing in detail an efficient, protected interface to the interconnect hardware that allows applications running in nonprivileged mode to access the interconnect directly, without operating system intervention. This version of the paper contains a complete high-level design for the first version of Hamlyn---a hardware interface that accommodates all the Hamlyn functionality. Future work on the protocol stacks and implementation work will doubtless improve and modify this interface. Until then, this description serves as a functionally complete snapshot of the Hamlyn approach.
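Sender-controlled placement, the core mechanism in the abstract, can be sketched as follows. The `ReceiveRegion` class and its slot-based layout are assumed semantics for illustration, not the actual Hamlyn interface:

```python
# Toy sketch of sender-based message placement (assumed semantics):
# the receiver pre-registers a buffer region, and each incoming
# message carries the destination slot chosen by the SENDER, so
# arrival needs no receive-side demultiplexing or extra copying.

class ReceiveRegion:
    def __init__(self, slots, slot_size):
        self.slot_size = slot_size
        self.mem = bytearray(slots * slot_size)  # pre-registered memory

    def deliver(self, slot, payload):
        """Place an arriving message at the sender-chosen slot."""
        base = slot * self.slot_size
        # Single copy into a location both sides agreed on in advance.
        self.mem[base:base + len(payload)] = payload

# Usage: the sender addressed slot 2, so the message lands exactly
# where the receiving application expects to find it.
region = ReceiveRegion(slots=4, slot_size=16)
region.deliver(slot=2, payload=b"req-7")
assert bytes(region.mem[32:37]) == b"req-7"
```

Because placement is decided before the message arrives, the receiver's CPU does no per-message buffer management, which is where the cost reduction the abstract claims comes from.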
Performance of the World's Fastest Distributed Operating System
- Operating Systems Review
, 1988
Abstract - Cited by 34 (3 self)
Distributed operating systems have been in the experimental stage for a number of years now, but few have progressed to the point of actually being used in a production environment. It is our belief that the reason lies primarily with the performance of these systems---they tend to be fairly slow compared to traditional single computer systems. The Amoeba system has been designed with high performance in mind. In this paper some performance measurements of Amoeba are presented and comparisons are made with UNIX on the SUN, as well as with some other interesting systems. In particular, short remote procedure calls take 1.4 msec and long data transfers achieve a user-to-user bandwidth of 677 kbytes/sec. Furthermore, the file server is so fast that it is limited by the communication bandwidth to 677 kbytes/sec. The real speed of the file server is too high to measure. To the best of our knowledge, these are the best figures yet reported in the literature for the class of hard...
The Performance of the Amoeba Distributed Operating System
- Software: Practice and Experience
, 1989
Abstract - Cited by 28 (9 self)
Amoeba is a capability-based distributed operating system designed for high performance interactions between clients and servers using the well-known RPC model. The paper starts out by describing the architecture of the Amoeba system, which is typified by specialized components such as workstations, several services, a processor pool, and gateways that connect other Amoeba systems transparently over wide-area networks. Next the RPC interface is described. The paper presents performance measurements of the Amoeba RPC on unloaded and loaded systems. The time to perform the simplest RPC between two user processes has been measured to be 1.4 msec. Compared to SUN 3/50’s RPC, Amoeba has 1/9 the delay, and over 3 times the throughput. Finally we describe the Amoeba file server. The Amoeba file server is so fast that it is limited by the communication bandwidth. To the best of our knowledge this is the fastest file server yet reported in the literature for this class of hardware.
Early Experience with Message-Passing on the SHRIMP Multicomputer
- in Proceedings of the 23rd Annual Symposium on Computer Architecture
, 1996
Abstract - Cited by 28 (13 self)
The SHRIMP multicomputer provides virtual memory-mapped communication (VMMC), which supports protected, user-level message passing, allows user programs to perform their own buffer management, and separates data transfers from control transfers so that a data transfer can be done without the intervention of the receiving node CPU. An important question is whether such a mechanism can indeed deliver all of the available hardware performance to applications which use conventional message-passing libraries. This paper ...
Software Environments for Cluster-based Display Systems
- First IEEE/ACM International Symposium on Cluster Computing and the Grid
, 2001
Abstract - Cited by 25 (3 self)
An inexpensive way to construct a scalable display wall system is to use a cluster of PCs with commodity graphics accelerators to drive an array of projectors. A challenge is to bring off-the-shelf sequential applications to run on such a display wall efficiently without using expensive, high-performance interconnects. This paper studies two execution models for a scalable display wall system: master-slave and synchronized execution models. We have designed and implemented four software tools, two for each execution model, including VDD (Virtual Display Driver), GLP (GL-DLL Replacement), SSE (System-level Synchronized Execution), and ASE (Application-level Synchronized Execution). In order to support the synchronized execution model, we have also designed a broadcast, speculative file cache to provide scalable I/O performance. The paper reports our experimental results with several 3D applications on the display wall to understand the performance implications and tradeoffs of these methods.
Constructing a Configurable Group RPC Service
- In Proceedings of the 15th International Conference on Distributed Computing Systems
, 1995
Abstract - Cited by 24 (14 self)
Current Remote Procedure Call (RPC) services implement a variety of semantics, with many of the differences related to how communication and server failures are handled. The list increases even more when considering group RPC, a variant of RPC often used for fault-tolerance where an invocation is sent to a group of servers rather than one. This paper presents an approach to constructing group RPC in which a single configurable system is used to build different variants of the service. The approach is based on implementing each property as a separate software module called a micro-protocol, and then configuring the micro-protocols needed to implement the desired service together using a software framework based on the x-kernel. The properties of point-to-point and group RPC are identified and classified, and the general execution model is described. An example consisting of a modular implementation of a group RPC service is given to illustrate the approach. Dependency issues that restrict c...
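The composition model in this abstract, each RPC property as a separate micro-protocol configured into one service through an event-driven framework, can be sketched in a few lines. The framework API and micro-protocol names below are illustrative assumptions, not the paper's x-kernel-based interface:

```python
# Hedged sketch of the micro-protocol idea: each property is a small
# module that registers event handlers, and a framework composes the
# configured set into one RPC service variant.

class Framework:
    def __init__(self):
        self.handlers = {}  # event name -> list of registered handlers

    def register(self, event, fn):
        self.handlers.setdefault(event, []).append(fn)

    def raise_event(self, event, msg):
        # Deliver the event to every micro-protocol that subscribed.
        for fn in self.handlers.get(event, []):
            fn(msg)

# Two illustrative micro-protocols, each implementing one property.
def retransmit_protocol(fw):
    fw.register("msg_sent",
                lambda m: m.setdefault("trace", []).append("retransmit:armed"))

def ordering_protocol(fw):
    fw.register("msg_recv",
                lambda m: m.setdefault("trace", []).append("ordering:checked"))

# Configuration step: choosing which micro-protocols to load selects
# the semantics of the resulting RPC service.
fw = Framework()
for micro in (retransmit_protocol, ordering_protocol):
    micro(fw)

msg = {}
fw.raise_event("msg_sent", msg)
fw.raise_event("msg_recv", msg)
print(msg["trace"])  # ['retransmit:armed', 'ordering:checked']
```

Swapping the tuple of loaded micro-protocols yields a different service variant without touching the framework, which is the configurability the paper is after.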