  Failure detectors: Implementation issues and impact on consensus performance (1999) [4 citations — 1 self]

by Nicole Sergent, Xavier Defago, Andre Schiper
http://lsewww.epfl.ch/Documents/postscript/SDS99.ps

Abstract:

By their nature, distributed systems are vulnerable to failures of some of their parts. Conversely, distribution also provides a way to increase the fault tolerance of the overall system. However, achieving fault tolerance is not a simple problem and requires complex techniques. An agreement problem known as the consensus problem is at the heart of most problems encountered during the design of a fault-tolerant system. This problem is, however, not solvable in the asynchronous system model unless the model is augmented with adequate failure detectors. The resulting system model is a time-free model, since all timing issues are abstracted by the characteristics of the failure detectors. It is sometimes claimed that time-based system models are more realistic than time-free models for solving distributed agreement problems. The goal of this paper is to show that solving consensus in the asynchronous system model augmented with failure detectors does not preclude considering timing issues. We consider the consensus algorithm with various implementations of failure detectors, and we analyse their impact on the termination time of the consensus algorithm. This study shows that the design of fault-tolerant distributed algorithms in the asynchronous system model augmented with failure detectors is orthogonal to the issue of implementing the actual failure detectors. This nicely decouples logical issues (proof of safety and liveness of an algorithm) from engineering issues (e.g., performance and timing constraints).

Citations

1074 Impossibility of distributed consensus with one faulty process – Fischer, Lynch, et al. - 1985
683 Unreliable Failure Detectors for Reliable Distributed Systems – Chandra, Toueg - 1996
310 The Weakest Failure Detector for Solving Consensus – Chandra, Hadzilacos, et al. - 1996
51 Consensus service: a modular approach for building agreement protocols in distributed systems – Guerraoui, Schiper - 1996
45 Fail-awareness: An approach to construct fail-safe applications – Fetzer, Cristian - 1997
20 A performance comparison of asynchronous atomic broadcast protocols – Cristian, Beijer, et al. - 1994
13 Optimization Techniques for Replicating CORBA Objects – Defago, Felber, et al. - 1999
11 Reducing the cost for non-blocking in atomic commitment – Guerraoui, Larrea, et al. - 1996
11 Soft Real-Time Analysis of Asynchronous Agreement Algorithms Using Petri Nets – Sergent - 1998
6 Evaluating latency of distributed algorithms using Petri nets – Sergent - 1997
3 Hard real-time communication in multiple-access networks – Malcolm, Zhao - 1995