| T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the tenth annual ACM symposium on Principles of distributed computing, pages 325--340. ACM Press, 1991. |
....and remove obstacles and distractions due to the usage of different terminology and notation. 2 Related Work A failure detector [6, 5] is a mechanism introduced to provide processes with information about failures of processes or communication links. Failure detectors were first defined in [4] in the context of the FLP model [14] and can be viewed as an extension of the FLP model. Failure detectors are defined in a very general way. The output range of a failure detector can be any set. In particular, the output of a failure detector does not have to be the set of suspected processes ....
CHANDRA, T., AND TOUEG, S. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing (Aug 1991), pp. 325--340.
....instruction after the deadline (of its watchdog). X. RELATED WORK Failure detectors have received considerable research interest since Chandra, Hadzilacos and Toueg published their seminal paper about the weakest failure detector for solving consensus [3]. Failure detectors were originally defined [4] to augment purely asynchronous systems [12] such that consensus becomes solvable. Perfect failure detectors are implementable neither in purely asynchronous systems nor in partially synchronous systems [18]. If it were possible to implement them in purely asynchronous systems, one could solve ....
CHANDRA, T., AND TOUEG, S. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing (Aug 1991), pp. 325--340.
....of time from the membership, a crashed process has to be removed from the membership eventually. This does not solve the impossibility of implementing services like a membership service in a time free asynchronous system [2] (unless one introduces an additional mechanism like a failure detector [3]). We proposed a different way to change the specification of a synchronous service such that it becomes implementable in timed asynchronous systems: fail awareness [10]. A server is required to provide its standard synchronous semantics as long as the failure frequency is within some given bound, ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.
....timely. However, progress assumptions only require that infinitely often there exists a majority set of processes that for a certain minimum amount of time are timely and can communicate with each other in a timely manner. Progress assumptions also have a certain similarity with failure detectors [3], which are mechanisms to strengthen the timefree model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [3] and the timed model are the following: 1) the timed model allows ....
....assumptions also have a certain similarity with failure detectors [3], which are mechanisms to strengthen the timefree model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [3] and the timed model are the following: 1) the timed model allows messages to be dropped and processes to recover after a crash, and 2) the timed model provides processes with access to hardware clocks while the model of [3] provides processes with access to a failure detector. Note that hardware ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.
....processes maintain an up to date group. The timewheel membership protocol is fail aware in the sense that a process knows at any point in time if its current group is up to date. The group membership problem or the atomic broadcast problem is not solvable in a time free asynchronous system model [30, 11]. However, existing asynchronous systems have typically enough synchronism to allow a deterministic solution of the group membership or the atomic broadcast problem. For example, a typical execution of a system consists of long periods in which the system is stable interleaved by relatively ....
....assumption allows a deterministic solution of consensus [23]. Since the consensus problem is as hard as the group membership or the atomic broadcast problem [11], it also allows a deterministic solution of the group membership or the atomic broadcast problem. (Footnote: The delivery order of updates and membership changes is not necessarily the same as the order of the ordinals associated with them.) 5 Timewheel Atomic Broadcast To broadcast an update at a given local time with a given order and atomicity, a member of a group disseminates a proposal message ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the Tenth ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.
....Related Work Many problems such as consensus [17] and weak membership [3] cannot be solved in asynchronous systems. Several approaches to overcome this problem have been proposed: (a) the usage of randomization [2], (b) the introduction of partially synchronous models [12, 14], failure detectors [4], and progress assumptions [15], and (c) the investigation of weaker problems [13, 1]. Fail awareness is a method for transforming problems into weaker problems such that they become implementable in timed asynchronous systems. However, fail awareness can be combined with progress assumptions to ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.
....systems, 2) not necessarily perfect, and 3) can be used to solve the election problem. In particular, we show that there exists a fail aware failure detector that allows the election problem to be solved and which is strictly weaker than a Perfect failure detector. 1 Introduction Failure detectors [2] are a mechanism for adding synchronism to the time free asynchronous system model [8]. Processes of such systems have access to local failure detector modules which maintain a set of processes that are suspected to have crashed. Failure detectors typically satisfy An earlier version of this ....
....its local failure detector module to derive that other processes might wrongly suspect . Fail aware failure detectors provide such a knowledge and therefore allow a solution of the election problem even though some of them are strictly weaker than a Perfect failure detector. Failure detectors [2, 1, 3] are not the only basic distributed service that can be used to solve the election problem in asynchronous systems. General purpose asynchronous group membership protocols such as the one round and the three round protocols of [5] can be used to provide deterministic solutions to the election ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.
....ways to solve it nevertheless have emerged: Weakening the Model. Various models have been proposed for a system that behaves realistically, but that offers enough synchrony to solve the Byzantine agreement problem [DLS88, VA95, CF95]. For most recent implementations, the failure detector approach [CT91] has been chosen. Most practical protocols implemented in a failure detector model deal only with crash failures, as failure detectors in this model are much easier to handle. Recently, several groups started moving the failure detector approach into the Byzantine setting [Rei95, KMMS97, DS98] ....
....synchronous rounds to reach agreement. Furthermore, as an authenticated source is assumed, at least one (transferable) authentication has to be verified. Using Failure Detectors. Instead of assuming fixed timeouts, we can also implement the optimistic part of the protocol using failure detectors [CT91]. In this case, all timeouts are removed from the protocol; a party broadcasts the pessimism message as soon as a failure detector suspects some other party of being faulty prior to decision. In this model, nothing changes for the optimistic case; the efficiency of the pessimistic case becomes ....
T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, pages 325-340, Montreal, Quebec, Canada, 19-21 August 1991.
....results all stem from the FLP result [8], which proves that there is no protocol by which an asynchronous system of processes can agree on a binary value, even with only one faulty process. To provide a taxonomy of the complexity of the class of consensus protocols, Chandra and Toueg [4] proposed extending the network with failure detectors. However, the leader election problem can be solved if and only if a perfect failure detector is available, one that suspects no alive processes, and eventually suspects every faulty one [20]. [6] discusses several weakened system models and ....
T.D. Chandra, S. Toueg, "Unreliable failure detectors for asynchronous systems", Proc. 10th Annual ACM Symp. Principles of Distributed Computing, 1991, pp. 325-340.
....results all stem from the FLP result [12], which proves that there is no protocol by which an asynchronous system of processes can agree on a binary value, even with only one faulty process. To provide a taxonomy of the complexity of the class of consensus protocols, Chandra and Toueg [6] proposed extending the network with failure detectors. For example, the leader election problem can be solved if and only if a perfect failure detector is available, one that suspects no alive processes, and eventually suspects every faulty one [24]. [10] discusses several weakened system models ....
T.D. Chandra, S. Toueg, "Unreliable failure detectors for asynchronous systems", Proc. 10th Annual ACM Symp. Principles of Distributed Computing, 1991, pp. 325-340.
....synchrony to the time free asynchronous model so that deterministic solutions become possible. One way of weakening the time free asynchronous model is to study the necessary conditions for a deterministic solution to consensus [Dolev & Dwork 1987]. The work on unreliable failure detectors in [Chandra & Toueg 1991, Chandra & Toueg 1995] and the partially synchronous models in [Dwork et al. 1988] has a similar objective. It should be noted that these theoretical studies seek to find limiting assumptions that enable consensus to be solved deterministically, without considering real implementations that ....
....of reliable failure detection in asynchronous systems. Chandra and Toueg have investigated another way of weakening the time free asynchronous model so that consensus becomes possible. They introduced the concept of failure suspectors or unreliable failure detectors that can make mistakes [Chandra Toueg 1991, Chandra Toueg 1995] and investigated which detectors could be used to solve the consensus problem with crash failures. Each process is assumed to possess a local implementation of such failure detector. Although the detector is unreliable, the mistakes it can make should not prevent any ....
T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Asynchronous Systems", in 10th ACM Symp. on Principles of Distributed Computing, (Montreal, Canada), pp. 325-340, 1991.
....(if such a member exists). Detection of Unfair Links: A group member that doesn't crash will eventually report with a suspect() downcall any member of its view for which the link between the two members is unfair. These two properties are so called completeness properties (in the terminology of [CT93]), since they only specify in which cases an object must be reported as faulty, but do not set any bounds on accuracy of failure detection. For example, it is possible that a group member will be suspected and removed from the group view even though it didn't crash and is well connected to other ....
....which will perhaps merge back together when they can communicate again. It is thus inherent in the partitionable membership model that multiple concurrent views of the same group can simultaneously exist in the system (Figure 4.5). Since failure detection is realistically assumed to be unreliable [CT93] and it is often not possible to distinguish crash failures from link failures or network partitions (which all manifest themselves as performance failures), a group component cannot automatically determine whether it is the only active view in the system or whether other group members are ....
T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM, 1993.
....their approach treats all partitions as minority ones and effectively halts system processing. Our approach differs in two important respects. We focus on link failures, and we never halt the system. The fact that distributed consensus is impossible in the presence of arbitrary link failures [7, 5, 10], a frequent event in ad hoc networks, makes our job particularly difficult. In fixed networks, one way of dealing with link failures and network partitions is to assume that they are benign and short lived and that the system has enough resources to resolve any data discrepancies after the ....
T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of ACM, 43(2):225--267, 1996.
....some extensions in order to be able to provide early delivery to all participants of the communication group. The idea of using a failure detection service together with the communication protocols, which we address in the present paper, can also be related to work done by the Isis group [10, 3] to solve the problem of consensus in an asynchronous system [7]. The major difference between the two scenarios is that we want to reach agreement in a known bounded time. This puts some special requirements on the failure detection service: it must be implemented using a synchronous channel. ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). Technical report, Department of Computer Science, Cornell University, Ithaca, USA, July 1991.
....point. In our paper we introduced extensions in order to be able to provide early delivery to all participants of the communication group. The use of a failure detection service together with the communication protocols, which we address in [3], can also be related to work done by the Isis Group [24, 8] to solve the problem of consensus in an asynchronous system [17]. The major difference between the two scenarios is that we want to reach agreement in a known bounded time and also detect timing failures. This puts some special requirements on the failure detection service: it must be ....
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). Technical report, Department of Computer Science, Cornell University, Ithaca, USA, July 1991.
....Motivation of State Definitions In this section, we provide some intuition on the definitions given in Section 5 and relate them to the agreement protocol of Chandra and Toueg [3] and the E3PC protocol of Keidar and Dolev [7]. Note, however, that this section is intended to give some intuition, and not to cover all possible cases. Both the E3PC and the Chandra and Toueg protocols consist of rounds, each of which, if all goes well, requires 3 phases. In the first phase, ....
T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM. To appear, previous version in PODC 1991 pp. 325-340.
No context found.
T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the tenth annual ACM symposium on Principles of distributed computing, pages 325--340. ACM Press, 1991.
No context found.
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. Journal of the ACM, 43(2):225--267, 1996.
No context found.
T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM, 43(4):685--722, July 1996.
No context found.
T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In PODC91 Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, pages 325-340, 1992.
No context found.
T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the 10th annual ACM symposium on Principles Of Distributed Computing, pages 325--340. ACM Press, 1991.
No context found.
T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In Proc. 10th Annual ACM Symposium on Principles of Distributed Computing, pages 325--340, 1991.
No context found.
T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of ACM, 43(2):225--267, 1996.
No context found.
T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. Journal of the ACM, 43(2):225--267, 1996.
No context found.
T. D. Chandra, S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Proc. 10th ACM Symposium on Principles of Distributed Computing, pp. 325-340, Montreal, August 1991.