Group Communication Specifications: A Comprehensive Study
- ACM COMPUTING SURVEYS
, 1999
Cited by 370 (15 self)
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
Sinfonia: a new paradigm for building scalable distributed systems
- In SOSP
, 2007
Cited by 153 (12 self)
We propose a new paradigm for building scalable distributed systems. Our approach does not require dealing with message-passing protocols—a major complication in existing distributed systems. Instead, developers just design and manipulate data structures within our service called Sinfonia. Sinfonia keeps data for applications on a set of memory nodes, each exporting a linear address space. At the core of Sinfonia is a novel minitransaction primitive that enables efficient and consistent access to data, while hiding the complexities that arise from concurrency and failures. Using Sinfonia, we implemented two very different and complex applications in a few months: a cluster file system and a group communication service. Our implementations perform well and scale to hundreds of machines.
Optimistic Total Order in Wide Area Networks
- In Proc. 21st IEEE Symposium on Reliable Distributed Systems
, 2002
Cited by 47 (13 self)
Total order multicast greatly simplifies the implementation of fault-tolerant services using the replicated state machine approach. The additional latency of total ordering can be masked by taking advantage of spontaneous ordering observed in LANs: A tentative delivery allows the application to proceed in parallel with the ordering protocol. The effectiveness of the technique rests on the optimistic assumption that a large share of correctly ordered tentative deliveries offsets the cost of undoing the effect of mistakes. This paper proposes a simple technique which enables the usage of optimistic delivery also in WANs with much larger transmission delays where the optimistic assumption does not normally hold. Our proposal exploits local clocks and the stability of network delays to reduce the mistakes in the ordering of tentative deliveries. An experimental evaluation of a modified sequencer-based protocol is presented, illustrating the usefulness of the approach in fault-tolerant database management.
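The tentative-delivery idea in this abstract can be illustrated with a minimal sketch. It assumes a replica that applies messages in their arrival order as a guess and undoes the guess when the final order disagrees. The class name `OptimisticReplica` and its methods are hypothetical and are not taken from the paper; the clock-based technique the paper proposes for WANs is not shown.

```python
class OptimisticReplica:
    """Toy replica: applies messages tentatively, corrects on final order."""

    def __init__(self):
        self.confirmed = []   # messages whose final order is known
        self.tentative = []   # messages applied optimistically
        self.undo_count = 0   # number of ordering mistakes that needed an undo

    def tentative_deliver(self, msg):
        # Apply the message in spontaneous (arrival) order as a guess.
        if msg not in self.confirmed and msg not in self.tentative:
            self.tentative.append(msg)

    def final_deliver(self, msg):
        # The ordering protocol has decided that msg comes next.
        if self.tentative and self.tentative[0] == msg:
            self.tentative.pop(0)          # the guess was right, nothing to undo
        else:
            self.undo_count += 1           # the guess was wrong: undo and reorder
            if msg in self.tentative:
                self.tentative.remove(msg)
        self.confirmed.append(msg)

    @property
    def state(self):
        # Confirmed messages first, then the still-tentative ones.
        return self.confirmed + self.tentative


replica = OptimisticReplica()
replica.tentative_deliver("m1")
replica.tentative_deliver("m2")
replica.final_deliver("m2")   # final order disagrees with the arrival order
replica.final_deliver("m1")
```

The fewer mismatches between arrival order and final order, the fewer undo operations are paid for, which is the optimistic assumption the abstract refers to.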
Mencius: Building Efficient Replicated State Machines for WANs
Cited by 46 (3 self)
We present a protocol for general state machine replication – a method that provides strong consistency – that has high performance in a wide-area network. In particular, our protocol Mencius has high throughput under high client load and low latency under low client load, even as the wide-area network environment and client load change. We develop our protocol as a derivation from the well-known protocol Paxos. Such a development can be changed or further refined to take advantage of specific network or application requirements.
Ricochet: Lateral error correction for time-critical multicast
- In Submission
, 2007
Cited by 35 (13 self)
Ricochet is a low-latency reliable multicast protocol designed for time-critical clustered applications. It uses IP Multicast to transmit data and recovers from packet loss in end-hosts using Lateral Error Correction (LEC), a novel repair mechanism in which XORs are exchanged between receivers and combined across overlapping groups. In datacenters and clusters, application needs frequently dictate large numbers of fine-grained overlapping multicast groups. Existing multicast reliability schemes scale poorly in such settings: their packet recovery latency depends inversely on the data rate within a single group, so the lower the data rate, the longer it takes to recover lost packets. LEC is insensitive to the rate of data in any one group and allows each node to split its bandwidth among hundreds to thousands of fine-grained multicast groups without sacrificing timely packet recovery. As a result, Ricochet provides developers with a scalable, reliable and fast multicast primitive to layer under high-level abstractions such as publish-subscribe, group communication and replicated service/object infrastructures. We evaluate Ricochet on a 64-node cluster with up to 1024 groups per node: under various loss rates, it recovers almost all packets using LEC in tens of milliseconds and the remainder with reactive traffic within 200 milliseconds.
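The XOR principle behind the repair mechanism can be sketched as follows. This is only an illustration of how one XOR packet lets a receiver rebuild a single missing packet, assuming equal-length packets. The function names are hypothetical, and the way Ricochet selects packets across overlapping groups and schedules repairs is not shown.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: list) -> bytes:
    """Combine several equal-length packets into one XOR repair packet."""
    return reduce(xor_bytes, packets)


def recover(repair: bytes, received: list) -> bytes:
    """Rebuild the one packet covered by `repair` that was not received."""
    return reduce(xor_bytes, received, repair)


# One receiver saw all three packets and sends their XOR to a peer.
packets = [b"aaaa", b"bbbb", b"cccc"]
repair = make_repair(packets)

# The peer lost the second packet but can rebuild it from the repair packet.
recovered = recover(repair, [packets[0], packets[2]])
```

A repair packet of this kind can restore at most one missing packet out of the set it covers; losing two of the covered packets requires another repair or a retransmission.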
When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication
- In Proc. of International Conference on Distributed Systems
, 2012
Cited by 29 (16 self)
In this article we introduce GMU, a genuine partial replication protocol for transactional systems, which exploits an innovative, highly scalable, distributed multiversioning scheme. Unlike existing multiversion-based solutions, GMU does not rely on a global logical clock, which represents a contention point and can limit system scalability. Also, GMU never aborts read-only transactions and spares them from distributed validation schemes. This makes GMU particularly efficient in the presence of read-intensive workloads, which are typical of a wide range of real-world applications. GMU guarantees the Extended Update Serializability (EUS) isolation level. This consistency criterion is particularly attractive as it is sufficiently strong to ensure correctness even for very demanding applications (such as TPC-C), but is also weak enough to allow efficient and scalable implementations, such as GMU. Further, unlike several relaxed consistency models proposed in the literature, EUS has simple and intuitive semantics, thus being an attractive, scalable consistency model for ordinary programmers. We integrated the GMU protocol in a popular open source in-memory transactional data grid, namely Infinispan. On the basis of a large-scale experimental study performed on heterogeneous experimental platforms and using industry-standard benchmarks (namely TPC-C and YCSB), we show that GMU achieves linear scalability and that it introduces negligible overheads (less than 10%), with respect to solutions ensuring non-serializable semantics, in a wide range of workloads.
Ring Paxos: A high-throughput atomic broadcast protocol
- DSN
, 2010
Cited by 28 (7 self)
Atomic broadcast is an important communication primitive often used to implement state-machine replication. Despite the large number of atomic broadcast algorithms proposed in the literature, few papers have discussed how to turn these algorithms into efficient executable protocols. Our main contribution, Ring Paxos, is a protocol derived from Paxos. Ring Paxos inherits the reliability of Paxos and can be implemented very efficiently. We report a detailed performance analysis of Ring Paxos and compare it to other atomic broadcast protocols.
Tombstone Transformation Functions for Ensuring Consistency in Collaborative Editing Systems
- Proceedings of the 2nd International Conference on Collaborative Computing: Networking, Applications and Worksharing
, 2006
Cited by 27 (16 self)
In collaborative editing, maintaining the consistency of the copies of shared data is a critical issue. In the last decade, the Operational Transformation (OT) approach has emerged as a suitable mechanism for maintaining consistency. Unfortunately, none of the published propositions relying on this approach are able to satisfy the mandatory correctness properties TP1 and TP2 defined in Ressel's framework. This paper addresses this correctness issue by proposing a new way to model shared state by retaining tombstones when elements are removed. An instantiation of the proposed model for a linear data structure and the related transformation functions are provided.
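A minimal sketch of the tombstone idea for a linear structure is given below. It assumes that a deleted element is only marked invisible, so positions in the underlying model never shift when something is removed. The class `TombstoneSequence` and its methods are hypothetical illustrations; the paper's transformation functions and the TP1 and TP2 properties are not implemented here.

```python
class TombstoneSequence:
    """Linear structure that keeps deleted elements as tombstones."""

    def __init__(self):
        self._cells = []  # each cell is [character, visible flag]

    def insert(self, model_pos, char):
        # Positions refer to the model, which includes tombstones.
        self._cells.insert(model_pos, [char, True])

    def delete(self, model_pos):
        # Mark the element as a tombstone instead of removing it.
        self._cells[model_pos][1] = False

    def view(self):
        # What the user sees: only the visible characters.
        return "".join(char for char, visible in self._cells if visible)

    def model_length(self):
        # Length of the model, tombstones included.
        return len(self._cells)


doc = TombstoneSequence()
for position, char in enumerate("abc"):
    doc.insert(position, char)

doc.delete(1)        # "b" becomes a tombstone; "c" stays at model position 2
doc.insert(2, "X")   # an insert addressed at model position 2 is still valid
```

Because deletions never shift model positions, an operation issued concurrently against an older state still refers to the same cells, which is the property that retaining tombstones is meant to provide.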
Architecture-Based Autonomous Repair Management: An Application to J2EE Clusters
- In 24th IEEE Symposium on Reliable Distributed Systems (SRDS-2005)
, 2005
Cited by 26 (5 self)
This paper presents a component-based architecture for autonomous repair management in distributed systems, and a prototype implementation of this architecture, called JADE, which provides repair management for J2EE application server clusters. The JADE architecture features three major elements, which we believe to be of wide relevance for the construction of autonomic distributed systems: (1) a dynamically configurable, component-based structure that exploits the reflective features of the FRACTAL component model; (2) an explicit and configurable feedback control loop structure that manifests the relationship between managed system and repair management functions; (3) an original replication structure for the management subsystem itself, which makes it fault-tolerant and self-healing.
Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms
- In Proc. Int’l Conf. on Dependable Systems and Networks
, 2003
Cited by 25 (11 self)
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to analyze their performance, especially the performance of their fault tolerance mechanisms. In this paper, we present a performance evaluation methodology that can be generalized to analyze many kinds of fault-tolerant algorithms. We use the methodology to compare two atomic broadcast algorithms with different fault tolerance mechanisms: unreliable failure detectors and group membership. We evaluated the steady state latency in (1) runs with no crashes and no suspicions, (2) runs with crashes and (3) runs with no crashes in which correct processes are wrongly suspected to have crashed, as well as (4) the transient latency after a crash. We found that the two algorithms have the same performance in Scenario 1, and that the group membership based algorithm has an advantage in terms of performance and resiliency in Scenario 2, whereas the failure detector based algorithm offers better performance in the other scenarios. We discuss the implications of our results to the design of fault tolerant distributed systems.