Results 1 - 10
of
623
Reliable Multicast Transport Protocol (RMTP)
"... This paper presents the design, implementation and performance of a reliable multicast transport protocol called RMTP. RMTP is based on a hierarchical structure in which receivers are grouped into local regions or domains and in each domain there is a special receiver called a Designated Receiver (D ..."
Abstract
-
Cited by 654 (10 self)
- Add to MetaCart
(Show Context)
This paper presents the design, implementation and performance of a reliable multicast transport protocol called RMTP. RMTP is based on a hierarchical structure in which receivers are grouped into local regions or domains and in each domain there is a special receiver called a Designated Receiver (DR) which is responsible for sending acknowledgments periodically to the sender, for processing acknowledgements from receivers in its domain and for retransmitting lost packets to the corresponding receivers. Since lost packets are recovered by local retransmissions as opposed to retransmissions from the original sender, end-to-end latency is significantly reduced, and the overall throughput is improved as well. Also, since only the DRs send their acknowledgments to the sender, instead of all receivers sending their acknowledgments to the sender, a single acknowledgement is generated per local region, and this prevents acknowledgement implosion. Receivers in RMTP send their acknowledgments to the DRs periodically, thereby simplifying error recovery. In addition, lost packets are recovered by selective repeat retransmissions, leading to improved throughput at the cost of minimal additional buffering at the receivers. This paper also describes the implementation of RMTP and its performance on the Internet.
The process group approach to reliable distributed computing
- Communications of the ACM
, 1993
"... The difficulty of developing reliable distributed softwme is an impediment to applying distributed computing technology in many settings. Expeti _ with the Isis system suggests that a structured approach based on virtually synchronous _ groups yields systems that are substantially easier to develop, ..."
Abstract
-
Cited by 572 (19 self)
- Add to MetaCart
The difficulty of developing reliable distributed softwme is an impediment to applying distributed computing technology in many settings. Expeti _ with the Isis system suggests that a structured approach based on virtually synchronous _ groups yields systems that are substantially easier to develop, exploit sophisticated forms of cooperative computation, and achieve high reliability. This paper reviews six years of resemr,.hon Isis, describing the model, its impl_nentation challenges, and the types of applicatiom to which Isis has been appfied. 1 In oducfion One might expect the reliability of a distributed system to follow directly from the reliability of its con-stituents, but this is not always the case. The mechanisms used to structure a distributed system and to implement cooperation between components play a vital role in determining how reliable the system will be. Many contemporary distributed operating systems have placed emphasis on communication performance, overlooking the need for tools to integrate components into a reliable whole. The communication primitives supported give generally reliable behavior, but exhibit problematic semantics when transient failures or system configuration changes occur. The resulting building blocks are, therefore, unsuitable for facilitating the construction of systems where reliability is impo/tant. This paper reviews six years of research on Isis, a syg_,,m that provides tools _ support the construction of reliable distributed software. The thesis underlying l._lS is that development of reliable distributed software can be simplified using process groups and group programming too/_. This paper motivates the approach taken, surveys the system, and discusses our experience with real applications.
Group Communication Specifications: A Comprehensive Study
- ACM COMPUTING SURVEYS
, 1999
"... View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are for ..."
Abstract
-
Cited by 370 (15 self)
- Add to MetaCart
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
Transis: A Communication Sub-System for High Availability
, 1992
"... This paper describes Transis, a communication sub-system for high availability. Transis is a transport layer package that supports a variety of reliable multicast message passing services between processors. It provides highly tuned multicast and control services for scalable systems with arbitrary ..."
Abstract
-
Cited by 363 (47 self)
- Add to MetaCart
(Show Context)
This paper describes Transis, a communication sub-system for high availability. Transis is a transport layer package that supports a variety of reliable multicast message passing services between processors. It provides highly tuned multicast and control services for scalable systems with arbitrary topology. The communication domain comprises of a set of processors that can initiate multicast messages to a chosen subset. Transis delivers them reliably and maintains the membership of connected processors automatically, in the presence of arbitrary communication delays, of message losses and of processor failures and joins. The contribution of this paper is in providing an aggregate definition of communication and control services over broadcast domains. The main benefit is the efficient implementation of these services using the broadcast capability. In addition, the membership algorithm has a novel approach in handling partitions and remerging; in allowing the regular flow of messages...
The Transis Approach to High Availability Cluster Communication
- Communications of the ACM
, 1996
"... Introduction In the local elections system of the municipality of "Wiredville" 1 , several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, ..."
Abstract
-
Cited by 252 (14 self)
- Add to MetaCart
(Show Context)
Introduction In the local elections system of the municipality of "Wiredville" 1 , several computers were used to establish an electronic town hall. The computers were linked by a network. When an issue was put to a vote, voters could manually feed their votes into any of the computers, which replicated the updates to all of the other computers. Whenever the current tally was desired, any computer could be used to supply an up-to-the-moment count. On the night of an important election, a room with one of the computers became crowded with lobbyists and politicians. Unexpectedly, someone accidentally stepped on the network wire, cutting communication between two parts of the network. The vote counting stopped until the network was repaired, and the entire tally had to be restarted from scratch. This would not have happened if the vote-counting system had been built with partitions in mind. After the unexpected severance, vote counting could have continued at all t
Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail
- IN SEARCH OF THE HOLY GRAIL. DISTRIBUTED COMPUTING
, 1994
"... The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concern ..."
Abstract
-
Cited by 230 (3 self)
- Add to MetaCart
(Show Context)
The paper shows that characterizing the causal relationship between significant events is an important but non-trivial aspect for understanding the behavior of distributed programs. An introduction to the notion of causality and its relation to logical time is given; some fundamental results concerning the characterization of causality are presented. Recent work on the detection of causal relationships in distributed computations is surveyed. The issue of observing distributed computations in a causally consistent way and the basic problems of detecting global predicates are discussed. To illustrate the major difficulties, some typical monitoring and debugging approaches are assessed, and it is demonstrated how their feasibility is severely limited by the fundamental problem to master the complexity of causal relationships.
Total order broadcast and multicast algorithms: Taxonomy and survey
- ACM COMPUTING SURVEYS
, 2004
"... ..."
A High Performance Totally Ordered Multicast Protocol
, 1994
"... This paper presents the Reliable Multicast Protocol (RMP). RMP provides a totally ordered, reliable, atomic multicast service on top of an unreliable multicast datagram service such as IP Multicasting. RMP is fully and symmetrically distributed so that no site bears an undue portion of the communica ..."
Abstract
-
Cited by 195 (4 self)
- Add to MetaCart
This paper presents the Reliable Multicast Protocol (RMP). RMP provides a totally ordered, reliable, atomic multicast service on top of an unreliable multicast datagram service such as IP Multicasting. RMP is fully and symmetrically distributed so that no site bears an undue portion of the communication load. RMP provides a wide range of guarantees, from unreliable delivery to totally ordered delivery, to K-resilient, majority resilient, and totally resilient atomic delivery. These QoS guarantees are selectable on a per packet basis. RMP provides many communication options, including virtual synchrony, a publisher/subscriber model of message delivery, a client/server model of delivery, an implicit naming service, mutually exclusive handlers for messages, and mutually exclusive locks.
Extended Virtual Synchrony
- in Proceedings of the IEEE 14th International Conference on Distributed Computing Systems
, 1994
"... . We formulate a model of extended virtual synchrony that defines a group communication transport service for multicast and broadcast communication in a distributed system. The model extends the virtual synchrony model of the Isis system to support continued operation in all components of a partitio ..."
Abstract
-
Cited by 188 (43 self)
- Add to MetaCart
(Show Context)
. We formulate a model of extended virtual synchrony that defines a group communication transport service for multicast and broadcast communication in a distributed system. The model extends the virtual synchrony model of the Isis system to support continued operation in all components of a partitioned network. The significance of extended virtual synchrony is that, during network partitioning and remerging and during process failure and recovery, it maintains a consistent relationship between the delivery of messages and the delivery of configuration changes across all processes in the system and provides well-defined self-delivery and failure atomicity properties. We describe an algorithm that implements extended virtual synchrony and construct a filter that reduces extended virtual synchrony to virtual synchrony. 1 Introduction In many applications in distributed systems messages must be disseminated to multiple destinations. To achieve better performance, protocols have been deve...