81 citations found.
T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the tenth annual ACM symposium on Principles of distributed computing, pages 325--340. ACM Press, 1991.

This paper is cited in the following contexts:

A Comparison of Timed Asynchronous Systems and Asynchronous.. - Fetzer (1999)

....and remove obstacles and distractions due to the usage of different terminology and notation. 2 Related Work A failure detector [6, 5] is a mechanism introduced to provide processes with information about failures of processes or communication links. Failure detectors were first defined in [4] in the context of the FLP model [14] and can be viewed as an extension of the FLP model. Failure detectors are defined in a very general way. The output range of a failure detector can be any set. In particular, the output of a failure detector does not have to be the set of suspected processes ....

CHANDRA, T., AND TOUEG, S. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing (Aug 1991), pp. 325--340.


Perfect Failure Detection in Timed Asynchronous Systems - Fetzer (2000)   (5 citations)

....instruction after the deadline (of its watchdog) X. RELATED WORK Failure detectors have received considerable research interest since Chandra, Hadzilacos and Toueg published their seminal paper about the weakest failure detector for solving consensus [3]. Failure detectors were originally defined [4] to augment purely asynchronous systems [12] such that consensus becomes solvable. Perfect failure detectors are implementable neither in purely asynchronous systems nor in partially synchronous systems [18]. If it were possible to implement them in purely asynchronous systems, one could solve ....

CHANDRA, T., AND TOUEG, S. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing (Aug 1991), pp. 325--340.


FORTRESS: A System to Support Fail-Aware Real-Time Applications - Fetzer, Cristian (1997)   (1 citation)

....of time from the membership, a crashed process has to be removed from the membership eventually. This does not circumvent the impossibility of implementing services like a membership service in a time-free asynchronous system [2] unless one introduces an additional mechanism like a failure detector [3]. We proposed a different way to change the specification of a synchronous service such that it becomes implementable in timed asynchronous systems: fail awareness [10]. A server is required to provide its standard synchronous semantics as long as the failure frequency is within some given bound, ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


The Timed Asynchronous Distributed System Model - Cristian, Fetzer (1999)   (78 citations)

....timely. However, progress assumptions only require that infinitely often there exists a majority set of processes that for a certain minimum amount of time are timely and can communicate with each other in a timely manner. Progress assumptions also have a certain similarity with failure detectors [3], which are mechanisms to strengthen the time-free model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [3] and the timed model are the following: 1) the timed model allows ....

....assumptions also have a certain similarity with failure detectors [3], which are mechanisms to strengthen the time-free model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [3] and the timed model are the following: 1) the timed model allows messages to be dropped and processes to recover after a crash, and 2) the timed model provides processes with access to hardware clocks while the model of [3] provides processes with access to a failure detector. Note that hardware ....

[Article contains additional citation context not shown here]

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


The Timewheel Group Communication System - Mishra, Fetzer, Cristian (2002)   (3 citations)

....processes maintain an up-to-date group. The timewheel membership protocol is fail aware in the sense that a process knows at any point in time if its current group is up to date. The group membership problem or the atomic broadcast problem is not solvable in a time-free asynchronous system model [30, 11]. However, existing asynchronous systems typically have enough synchronism to allow a deterministic solution of the group membership or the atomic broadcast problem. For example, a typical execution of a system consists of long periods in which the system is stable interleaved by relatively ....

....assumption allows a deterministic solution of consensus [23]. Since the consensus problem is as hard as the group membership or the atomic broadcast problem [11], it also allows a deterministic solution of the group membership or the atomic broadcast problem. (Footnote: The delivery order of updates and membership changes is not necessarily the same as the order of the ordinals associated with them.) 5 Timewheel Atomic Broadcast: To broadcast an update at local time with order and atomicity, a member in group disseminates a proposal message ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the Tenth ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


Fail-Awareness in Timed Asynchronous Systems - Fetzer, Cristian (2003)   (18 citations)

....Related Work Many problems such as consensus [17] and weak membership [3] cannot be solved in asynchronous systems. Several approaches to overcome this problem have been proposed: (a) the usage of randomization [2], (b) the introduction of partially synchronous models [12, 14], failure detectors [4], and progress assumptions [15], and (c) the investigation of weaker problems [13, 1]. Fail awareness is a method for transforming problems into weaker problems such that they become implementable in timed asynchronous systems. However, fail awareness can be combined with progress assumptions to ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


Fail-Aware Failure Detectors - Fetzer, Cristian (1996)   (5 citations)

....systems, 2) not necessarily perfect, and 3) can be used to solve the election problem. In particular, we show that there exists a fail aware failure detector that allows one to solve the election problem and which is strictly weaker than a Perfect failure detector. 1 Introduction Failure detectors [2] are a mechanism for adding synchronism to the time-free asynchronous system model [8]. Processes of such systems have access to local failure detector modules which maintain a set of processes that are suspected to have crashed. Failure detectors typically satisfy .... (Footnote: An earlier version of this ....)

....its local failure detector module to derive that other processes might wrongly suspect . Fail aware failure detectors provide such a knowledge and therefore allow a solution of the election problem even though some of them are strictly weaker than a Perfect failure detector. Failure detectors [2, 1, 3] are not the only basic distributed service that can be used to solve the election problem in asynchronous systems. General purpose asynchronous group membership protocols such as the one round and the three round protocols of [5] can be used to provide deterministic solutions to the election ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


Optimistic Asynchronous Byzantine Agreement - Kursawe (1999)   (1 citation)

....ways to solve it nevertheless have emerged: Weakening the Model. Various models have been proposed for a system that behaves realistically, but that offers enough synchrony to solve the Byzantine agreement problem [DLS88, VA95, CF95]. For most recent implementations, the failure detector approach [CT91] has been chosen. Most practical protocols implemented in a failure detector model deal only with crash failures, as failure detectors in this model are much easier to handle. Recently, several groups have started moving the failure detector approach into the Byzantine setting [Rei95, KMMS97, DS98] ....

....synchronous rounds to reach agreement. Furthermore, as an authenticated source is assumed, at least one (transferable) authentication has to be verified. Using Failure Detectors. Instead of assuming fixed timeouts, we can also implement the optimistic part of the protocol using failure detectors [CT91]. In this case, all timeouts are removed from the protocol; a party broadcasts the pessimism message as soon as a failure detector suspects some other party of being faulty prior to decision. In this model, nothing changes for the optimistic case; the efficiency of the pessimistic case becomes ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, pages 325-340, Montreal, Quebec, Canada, 19-21 August 1991.


A Probabilistically Correct Leader Election Protocol for .. - Gupta, van Renesse.. (2000)   (3 citations)

....results all stem from the FLP result [8], which proves that there is no protocol by which an asynchronous system of processes can agree on a binary value, even with only one faulty process. To provide a taxonomy of the complexity of the class of consensus protocols, Chandra and Toueg [4] proposed extending the network with failure detectors. However, the leader election problem can be solved if and only if a perfect failure detector is available: one that suspects no alive processes, and eventually suspects every faulty one [20]. [6] discusses several weakened system models and ....

T.D. Chandra, S. Toueg, "Unreliable failure detectors for asynchronous systems", Proc. 10th Annual ACM Symp. Principles of Distributed Computing, 1991, pp. 325-340.


A Probabilistically Correct Leader Election Protocol for .. - Gupta, van Renesse..   (3 citations)

....results all stem from the FLP result [12], which proves that there is no protocol by which an asynchronous system of processes can agree on a binary value, even with only one faulty process. To provide a taxonomy of the complexity of the class of consensus protocols, Chandra and Toueg [6] proposed extending the network with failure detectors. For example, the leader election problem can be solved if and only if a perfect failure detector is available: one that suspects no alive processes, and eventually suspects every faulty one [24]. [10] discusses several weakened system models ....

T.D. Chandra, S. Toueg, "Unreliable failure detectors for asynchronous systems", Proc. 10th Annual ACM Symp. Principles of Distributed Computing, 1991, pp. 325-340.


Consensus and Membership in Synchronous and Asynchronous.. - Galleni, Powell (1996)   (2 citations)

....synchrony to the time-free asynchronous model so that deterministic solutions become possible. One way of weakening the time-free asynchronous model is to study the necessary conditions for a deterministic solution to consensus [Dolev & Dwork 1987]. The work on unreliable failure detectors in [Chandra & Toueg 1991, Chandra & Toueg 1995] and the partially synchronous models in [Dwork et al. 1988] has a similar objective. It should be noted that these theoretical studies seek to find limiting assumptions that enable consensus to be solved deterministically, without considering real implementations that ....

....of reliable failure detection in asynchronous systems. Chandra and Toueg have investigated another way of weakening the time-free asynchronous model so that consensus becomes possible. They introduced the concept of failure suspectors or unreliable failure detectors that can make mistakes [Chandra & Toueg 1991, Chandra & Toueg 1995] and investigated which detectors could be used to solve the consensus problem with crash failures. Each process is assumed to possess a local implementation of such a failure detector. Although the detector is unreliable, the mistakes it can make should not prevent any ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Asynchronous Systems", in 11th ACM Symp. on Principles of Distributed Computing, (Montreal, Canada), pp.325-40, 1991.


Building Reliable Interoperable Distributed Objects With The.. - Vaysburd (1998)   (15 citations)

....(if such a member exists). Detection of Unfair Links: A group member that doesn't crash will eventually report with a suspect() downcall any member of its view for which the link between the two members is unfair. These two properties are so-called completeness properties (in the terminology of [CT93]), since they only specify in which cases an object must be reported as faulty, but do not set any bounds on the accuracy of failure detection. For example, it is possible that a group member will be suspected and removed from the group view even though it didn't crash and is well connected to other ....

....which will perhaps merge back together when they can communicate again. It is thus inherent in the partitionable membership model that multiple concurrent views of the same group can simultaneously exist in the system (Figure 4.5). Since failure detection is realistically assumed to be unreliable [CT93] and it is often not possible to distinguish crash failures from link failures or network partitions (which all manifest themselves as performance failures), a group component cannot automatically determine whether it is the only active view in the system or whether other group members are ....

[Article contains additional citation context not shown here]

T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM, 1993.


Relying on Safe Distance to Ensure Consistent Group.. - Qingfeng Huang Christine (2001)   (1 citation)

....their approach treats all partitions as minority ones and effectively halts system processing. Our approach differs in two important respects. We focus on link failures, and we never halt the system. The fact that distributed consensus is impossible in the presence of arbitrary link failures [7, 5, 10], a frequent event in ad hoc networks, makes our job particularly difficult. In fixed networks, one way of dealing with link failures and network partitions is to assume that they are benign and short lived and that the system has enough resources to resolve any data discrepancies after the ....

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of ACM, 43(2):225--267, 1996.


Timing Failure Detection and Real-Time Group.. - Quasi-Synchronous..

....some extensions in order to be able to provide early delivery to all participants of the communication group. The idea of using a failure detection service together with the communication protocols, which we address in the present paper, can also be related to work done by the Isis group [10, 3] to solve the problem of consensus in an asynchronous system [7]. The major difference between the two scenarios is that we want to reach agreement in a known bounded time. This puts some special requirements on the failure detection service: it must be implemented using a synchronous channel. ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). Technical report, Department of Computer Science, Cornell University, Ithaca, USA, July 1991.


Using Light-Weight Groups to Handle Timing Failures in.. - Almeida et al. (1998)   (1 citation)

....point. In our paper we introduced extensions in order to be able to provide early delivery to all participants of the communication group. The use of a failure detection service together with the communication protocols, which we address in [3], can also be related to work done by the Isis Group [24, 8] to solve the problem of consensus in an asynchronous system [17]. The major difference between the two scenarios is that we want to reach agreement in a known bounded time and also detect timing failures. This puts some special requirements on the failure detection service: it must be ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). Technical report, Department of Computer Science, Cornell University, Ithaca, USA, July 1991.


Synchronous System and Perfect Failure Detector.. - Charron-Bost.. (2000)   (3 citations)

.... atomic commit) are at the heart of such systems [3, 4, 14], one needs to consider models that are strong enough to circumvent the impossibility result of [13]. To this end, two main approaches have been proposed: the timing-based approach [11, 12, 10, 16, 9] and the failure detector approach [6, 5, 1, 2]. The first approach consists in providing processes with information about time: the resulting models are called timing-based models. For example, message delays and relative process speeds are bounded, and these bounds are known in the perfect timing-based model, namely the synchronous ....

....models are those in which timing information is partial or inexact. The second approach, i.e. the failure detector approach, is based on the observation that the impossibility results in the asynchronous model stem from the inherent lack of reliable failure detection. Chandra and Toueg [6] propose to augment the asynchronous model with an external failure detection mechanism, which may make mistakes. Instead of focusing on timing features, the models of [6] are defined according to axiomatic properties of failure detectors. Failure detectors are classified in a hierarchy according ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. Journal of the ACM, 43(2):225--267, March 1996.


The Timewheel Group Membership Protocol - Mishra, Fetzer, Cristian (1998)

....current group has been out of date for time units, it is excluded from all up-to-date groups, and (5) an up-to-date group contains at least a majority of the processes. The group membership problem or the atomic broadcast problem is not solvable in the time-free asynchronous system model [18, 4]. However, existing asynchronous systems typically have enough synchronism to allow a deterministic solution of the group membership or the atomic broadcast problem. For example, a typical execution of a system consists of long periods in which the system is Δ-stable interleaved by ....

....We formalize this observation by a progress assumption [14]: we assume that the system will be infinitely often Δ-stable. This progress assumption allows a deterministic solution of consensus [14]. Since the consensus problem is as hard as the group membership or the atomic broadcast problem [4], it also allows a deterministic solution of the group membership or the atomic broadcast problem. 4 Protocol Description 4.1 Overview This protocol is based on the ideas developed in the membership protocols described in [20, 1, 21, 2]. Informally, the key idea in these protocols is that the ....

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the Tenth ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


The Timed Asynchronous Distributed System Model - Cristian, Fetzer (1999)   (78 citations)

....timely. However, progress assumptions only require that infinitely often there exists a majority set of processes that for a certain minimum amount of time are timely and can communicate with each other in a timely manner. Progress assumptions also have a certain similarity with failure detectors [15], which are mechanisms to strengthen the time-free model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [15] and the timed model are the following: 1) the timed model allows ....

....assumptions also have a certain similarity with failure detectors [15], which are mechanisms to strengthen the time-free model: certain failure detector classes provide their desired behavior based on the observation that the system eventually stabilizes. The main differences between the model of [15] and the timed model are the following: 1) the timed model allows messages to be dropped and processes to recover after a crash, and 2) the timed model provides processes with access to hardware clocks while the model of [15] provides processes with access to a failure detector. Note that hardware ....

[Article contains additional citation context not shown here]

T. Chandra and S. Toueg, "Unreliable failure detectors for asynchronous systems," in Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, Aug 1991, pp. 325-340.


The Unified Structure of Consensus: a Layered Analysis Approach - Moses, Rajsbaum (1998)   (1 citation)

....providing a characterization of solvability of decision problems in the style of [8], which, for some of the models, is given for the first time. 1 Introduction For almost two decades now, the consensus problem has played a central role in the study of fault-tolerant distributed computing, e.g. [23, 13, 12, 10, 14, 20, 16, 8, 9]. It has clearly received the greatest amount of attention in the theoretical literature on distributed computing, and has been studied in a large variety of models and under many types of failure assumptions. Work on different variants often in .... (Footnote: This work has been supported by a Helen and Milton ....)

T. Chandra and S. Toueg, "Unreliable failure detectors for asynchronous systems," in Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing, pages 257-272, 1991.


Structured Derivations of Consensus Algorithms for Failure.. - Yang, Neiger, Gafni (1998)   (8 citations)

....Recognizing this, researchers have considered ways in which asynchronous systems might realistically be strengthened to allow consensus to be achieved. In a seminal paper [5], Chandra and Toueg studied the use of unreliable failure detectors for asynchronous message-passing systems. A failure detector is an oracle that gives processors some information about failures in the system; in the simplest cases, it gives a processor a list of other processors that it ....

....considers algorithms in which the number of failures is bounded. As noted above, processors may fail by stopping (or crashing). Normally, other processes are not directly aware of these failures. Failure detectors allow processors to gain additional knowledge about failures. Chandra and Toueg [5] showed that, in order to solve consensus, this additional knowledge need not be perfect or even correct. A failure detector is defined by specifying, for each failure pattern (i.e., which processes crash at what times), a set of allowable failure detector histories. Each of these indicates, for ....

[Article contains additional citation context not shown here]

CHANDRA, T. D., AND TOUEG, S. Unreliable failure detectors for asynchronous systems. J. ACM 43, 2 (Mar. 1996), 225--267.


Real-Time Fault-Tolerant Atomic Broadcast - Delporte-Gallet, Fauconnier

....Synchronized Phase System (SPS) and then we develop, in this model, an algorithm for Uniform Atomic Broadcast. This model is sufficiently powerful to overcome the impossibility result of [13], and sufficiently abstract to be implemented in different systems (asynchronous systems with failure detectors [6], timed asynchronous distributed systems [9], or more generally partially synchronous systems [11, 12]). Moreover, we had to be able, after embedding in a timed system like ATR, to ensure real-time properties of the algorithms. Note that SPS can be used to solve other Consensus ....

....$Phase_p$, a vector of integers (called a phases vector) indexed by the set of processes P. The order relation for the phases is the order defined on the vectors by: $V \le V' \iff \forall p \in P,\ V[p] \le V'[p]$. A process uses a failure detector. This concept was introduced by Chandra and Toueg in [6] for asynchronous systems. The failure detector for the process p gives information about the (supposed) crashed processes. This information may be wrong. We suppose that the failure detector of process p maintains a list, $D_p$, of trusted processes. A process is in this list if it is ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. J. ACM, 43(2):225--267, Mar. 1996.
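The phases-vector order quoted in the Delporte-Gallet and Fauconnier entry above is the usual componentwise order, which is only a partial order. The following is a minimal Python sketch, assuming phase vectors are dictionaries keyed by process name. The function names `phases_leq` and `comparable` and the example process set are hypothetical and do not come from the cited paper. The sketch shows that two phase vectors need not be comparable.

```python
from typing import Mapping


def phases_leq(v: Mapping[str, int], w: Mapping[str, int]) -> bool:
    """V <= V' iff V[p] <= V'[p] for every process p in P (componentwise order)."""
    if v.keys() != w.keys():
        raise ValueError("phase vectors must be indexed by the same process set P")
    return all(v[p] <= w[p] for p in v)


def comparable(v: Mapping[str, int], w: Mapping[str, int]) -> bool:
    """The order is only partial: some pairs of phase vectors are incomparable."""
    return phases_leq(v, w) or phases_leq(w, v)


# Hypothetical process set P = {"p1", "p2", "p3"}
a = {"p1": 1, "p2": 2, "p3": 0}
b = {"p1": 1, "p2": 3, "p3": 1}
c = {"p1": 2, "p2": 0, "p3": 0}
print(phases_leq(a, b))  # True
print(comparable(b, c))  # False: b is ahead on p2, c is ahead on p1
```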


Greatest Lower Bounds for Consensus Using Unreliable.. - Delporte-Gallet..

....message chain of all algorithms using a strong failure detector is greater than the number of processes, no matter the number of faulty processes tolerated by the algorithm. Key words: distributed algorithm, fault tolerance, unreliable failure detectors, complexity measures. 1 Introduction In [3], Chandra and Toueg showed that, by augmenting the asynchronous system model with unreliable failure detectors, the consensus problem becomes solvable. Beyond the theoretical significance of this result, from a practical point of view, an evaluation of the effectiveness of this approach is ....

....correct. Moreover, each process p has access to a local failure detector module FD_p that provides to the process a list of suspected processes. Information given by the failure detector may be wrong. The precise definition and the classification of unreliable failure detectors can be found in [3]. All failure detectors we consider here satisfy the completeness property: Eventually, every crashed process is permanently suspected by every correct process. If no process is suspected before it crashes, the failure detector is in the class P (Perfect). If some correct process is never ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. J. ACM, 43(2):225--267, Mar. 1996.
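The completeness and accuracy properties quoted in the Greatest Lower Bounds entry above can be made concrete on a single recorded run. The following is an illustrative Python checker and is not code from either paper. It assumes a history in which `history[q][t]` is the set of processes suspected by q at integer time t, plus a map of crash times. All names are hypothetical. Strong completeness is an "eventually" property, so a finite trace can only approximate it. Here the approximation tests whether every crashed process is suspected by every correct process at the last sampled time. Class P pairs completeness with strong accuracy (no process is suspected before it crashes). Class S pairs completeness with weak accuracy (some correct process is never suspected), following Chandra and Toueg.

```python
from typing import Dict, List, Optional, Set

Process = str
# history[q][t]: set of processes suspected by q's failure detector module at time t
History = Dict[Process, List[Set[Process]]]
# crash time of each process; None means the process never crashes in this run
CrashTimes = Dict[Process, Optional[int]]


def crashed_by(crash_time: CrashTimes, t: int) -> Set[Process]:
    """F(t): the processes that have crashed at or before time t."""
    return {p for p, ct in crash_time.items() if ct is not None and ct <= t}


def horizon(history: History) -> int:
    """Number of sampled time steps (all modules are assumed to have equal length)."""
    return len(next(iter(history.values())))


def suspected_while_alive(p: Process, history: History, crash_time: CrashTimes) -> bool:
    """True if some live process suspects p at a time when p has not crashed yet."""
    for t in range(horizon(history)):
        alive = set(crash_time) - crashed_by(crash_time, t)
        if p in alive and any(p in history[q][t] for q in alive):
            return True
    return False


def strong_accuracy(history: History, crash_time: CrashTimes) -> bool:
    """No process is suspected before it crashes (accuracy condition of class P)."""
    return not any(suspected_while_alive(p, history, crash_time) for p in crash_time)


def weak_accuracy(history: History, crash_time: CrashTimes) -> bool:
    """Some correct process is never suspected (accuracy condition of class S)."""
    last = horizon(history) - 1
    correct = set(crash_time) - crashed_by(crash_time, last)
    return any(not suspected_while_alive(p, history, crash_time) for p in correct)


def completeness_at_end(history: History, crash_time: CrashTimes) -> bool:
    """Finite-trace stand-in for strong completeness: at the last sampled time,
    every crashed process is suspected by every correct process."""
    last = horizon(history) - 1
    faulty = crashed_by(crash_time, last)
    correct = set(crash_time) - faulty
    return all(faulty <= history[q][last] for q in correct)


def classes_on_trace(history: History, crash_time: CrashTimes) -> Set[str]:
    """Which of the classes P and S this single finite trace is consistent with."""
    classes: Set[str] = set()
    if completeness_at_end(history, crash_time):
        if strong_accuracy(history, crash_time):
            classes.add("P")
        if weak_accuracy(history, crash_time):
            classes.add("S")
    return classes


# Example run: p3 crashes at time 2; times 0..3 are sampled.
crash = {"p1": None, "p2": None, "p3": 2}
h_perfect = {
    "p1": [set(), set(), set(), {"p3"}],
    "p2": [set(), set(), {"p3"}, {"p3"}],
    "p3": [set(), set(), set(), set()],  # output after p3's crash is never consulted
}
# Same run, except p1 wrongly suspects p2 at time 1.
h_mistaken = dict(h_perfect, p1=[set(), {"p2"}, set(), {"p3"}])
print(sorted(classes_on_trace(h_perfect, crash)))   # ['P', 'S']
print(sorted(classes_on_trace(h_mistaken, crash)))  # ['S']
```

A single trace can only refute membership in a class, never prove it. The checker is therefore best read as "consistent with P or S on this run".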


Fault-Tolerant Genuine Atomic Multicast - Delporte-Gallet, Fauconnier

....are involved in the protocol needed to deliver the message [11]. Clearly, inside a destination group, the atomic multicast has to ensure an atomic broadcast restricted to this group. However, atomic broadcast and consensus are equivalent problems in asynchronous systems prone to crash failures [4], and the impossibility result of [8] also implies the impossibility of atomic broadcast restricted to a group in which at least one process can crash. To circumvent this impossibility result, following [4] we can, for example, use, inside each group, unreliable failure detectors giving ....

....and consensus are equivalent problems in asynchronous systems prone to crash failures [4], and the impossibility result of [8] also implies the impossibility of atomic broadcast restricted to a group in which at least one process can crash. To circumvent this impossibility result, following [4], we can, for example, use, inside each group, unreliable failure detectors giving information about the crashes of processes in the group. More abstractly, we suppose that, inside each group, we can solve consensus. LIAFA, Université Denis Diderot, 2, pl. Jussieu, F75251 Paris Cedex 05. ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. J. ACM, 43(2):225--267, Mar. 1996.


Lower Bounds with Unreliable Failure Detectors - Delporte-Gallet, Fauconnier (1999)

....message chain of all algorithms using a strong failure detector is greater than the number of processes, no matter the number of faulty processes tolerated by the algorithm. Key words: distributed algorithm, fault tolerance, unreliable failure detector, complexity measures. 1 Introduction In [2], Chandra and Toueg showed that, by augmenting the asynchronous system model with unreliable failure detectors, consensus becomes solvable. Beyond the theoretical significance of this result, from a practical point of view, an evaluation of the effectiveness of this approach is necessary. For this, ....

....one latency degree, a slight variation of the latency degree defined by Schiper [7], is essentially a best-case analysis and the second one, CTC, a worst-case analysis. With the help of these measures, we study the consensus problem and give lower bounds for the main unreliable failure detectors of [2]. The main result is the lower bound concerning the class of strong failure detectors. Finally, in conclusion we propose other complexity measures more suited to the most interesting failure detector classes ◇S and ◇P. 2 Definitions We consider ....

[Article contains additional citation context not shown here]

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. J. ACM, 43(2):225--267, Mar. 1996.


The Design and Implementation of Lansis/E - Rodeh

....Following the Relacs and Phoenix projects, we define reachability between processes. We say that process q is reachable from process p if and only if p's messages eventually reach q (we denote this by p ⇝ q). Reachability changes with time and network conditions. Typically, a failure detector [CT91] is used by each process to decide which processes are reachable at a given time instance. A failure detector usually works via timeouts and I'm alive messages. Denote the failure detector at process p by FD_p. FD_p broadcasts an ImAlive message at each timeout. A process q from which FD_p ....

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In 10th ACM Symp. on Principles of Distributed Computing (PODC), pages 325--340, 1991.
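The timeout-and-ImAlive mechanism described in the Lansis/E excerpt can be sketched from the receiving side as follows. This is a minimal illustration, not Lansis/E's code: the class name `HeartbeatDetector`, the `timeout` parameter, and the convention that every peer counts as heard from at time 0 are assumptions, and time is passed in explicitly so the behavior is deterministic.

```python
class HeartbeatDetector:
    """Hypothetical timeout-based failure detector FD_p kept by one process p.

    p treats q as reachable if an ImAlive message from q arrived within the
    last `timeout` time units. A slow but live q can be wrongly suspected,
    which is why such detectors are unreliable.
    """

    def __init__(self, peers, timeout):
        self.timeout = timeout
        # Assumption: every peer counts as heard from at time 0.
        self.last_heard = {q: 0.0 for q in peers}

    def on_im_alive(self, q, now):
        # Record that an ImAlive message from q arrived at time `now`.
        self.last_heard[q] = now

    def reachable(self, now):
        # Peers heard from within the timeout window.
        return {q for q, t in self.last_heard.items() if now - t <= self.timeout}

    def suspected(self, now):
        # Everyone else is currently suspected (possibly by mistake).
        return set(self.last_heard) - self.reachable(now)


fd = HeartbeatDetector(peers=["q1", "q2", "q3"], timeout=5.0)
fd.on_im_alive("q1", now=3.0)
fd.on_im_alive("q2", now=8.0)
print(sorted(fd.reachable(now=9.0)))  # ['q2']
print(sorted(fd.suspected(now=9.0)))  # ['q1', 'q3']
```

A longer `timeout` makes false suspicions of slow processes rarer but delays the detection of real crashes; in an asynchronous system no choice of timeout removes that trade-off.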


Using Optimistic Atomic Broadcast in Transaction.. - Kemme, Pedone, Alonso, .. (1999)   (2 citations)  (Correct)

....primitives are often proposed as a mechanism to increase fault tolerance in distributed systems. These primitives use different ordering semantics to provide a very flexible framework in which to develop distributed systems. One example of the available semantics is the Atomic Broadcast primitive [CT91, BSS91], which guarantees that all sites deliver all messages in the same order. Unfortunately, it is also widely recognized that group communication systems suffer from scalability problems [BC94, FvR95]. While performance characteristics depend on the implementation strategy, the fundamental ....

....in the same order. Furthermore, reliability is provided in the sense that all sites decide on the same set of messages to deliver. Sites that have crashed will deliver the messages after recovering from the failure. Although there exist many different ways to implement total order delivery [CT91, BSS91, DM96, MMSA+96, vRBM96], all of them require some coordination between sites to guarantee that all messages are delivered in the same order at the different sites. However, when network broadcast (e.g., IP multicast) is used, there is a high probability that the messages arrive at all ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proc. of the 10th ACM Symp. on Principles of Distributed Computing, pages 325--340, August 1991.


Collective Consistency - Dwork, Ho, Strong (1996)   (1 citation)  (Correct)

....desirable and possible to achieve is the reduction of the window (time period) during which a process failure can cause some or all other processes to block. Specifically, we propose the following approach: 1. Augment the collective calls to the transport layer with a simple failure detector [CT, CHT], so that no process blocks during the collective communication. 2. Run a (preferably simple and brief) collective consistency protocol (CCP) during which some participants may block, allowing those processes that return to share a consistent view of the set of non-failed processes. ....

.... in the worst case makes no global decisions (no coordination) and allows all the work to be done by each of the participants [BW, W], or depends on the timing or randomization assumptions we associate with consensus above [DHW, KS]. Strong collective consistency has been studied before, e.g., by [CT, DLS, R]. Rabin called the problem choice coordination, and studied it in the context of shared memory with atomic test-and-set (the atomicity of test-and-set can be viewed as an additional timing assumption). The relevant results of Chandra and Toueg and Dwork, Lynch, and Stockmeyer discuss conditions ....

[Article contains additional citation context not shown here]

T. Chandra and S. Toueg, Unreliable failure detectors for asynchronous systems, Proc. 10th Ann. ACM Symp. on PODC, pages 325-340, 1991, to appear in J. ACM.


Impossibility of (Repeated) Reliable Broadcast - Ricciardi (1995)   (Correct)

....and Leader Election are concrete examples. We show that, in the absence of infinite storage capacity, perfect failure detectors are needed to solve Repeated Distributed Consensus (RDC), even though eventually weak failure detectors suffice for solving a single instance of Distributed Consensus [4]. This points out a subtle assumption (that of infinite storage capacity) made in using Consensus to solve Atomic Broadcast [4]. We generalize these results to any problem that requires all correct processes to reach the same set of decisions (possibly in the same order). An interesting ....

.... are needed to solve Repeated Distributed Consensus (RDC), even though eventually weak failure detectors suffice for solving a single instance of Distributed Consensus [4]. This points out a subtle assumption (that of infinite storage capacity) made in using Consensus to solve Atomic Broadcast [4]. We generalize these results to any problem that requires all correct processes to reach the same set of decisions (possibly in the same order). An interesting corollary of our results shows that the commonly stated reason behind the impossibility of Distributed Consensus, the inability to know ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In Tenth PODC, pages 325--340. ACM, 1991.


Processing Transactions over Optimistic Atomic.. - Kemme, Pedone, Alonso, .. (1999)   (4 citations)  (Correct)

....that employs the new atomic broadcast primitive in such a way that the coordination phase of the atomic broadcast is fully overlapped with the execution of transactions, providing high performance without relaxing transaction correctness. 1. Introduction and Motivation. Atomic Broadcast [6, 5] primitives are a well-known mechanism to increase fault tolerance and to provide a semantically rich framework to develop distributed systems. Unfortunately, it is also recognized that atomic broadcast suffers from scalability problems [4, 8], as it involves coordination between sites before ....

....transactions. 2.1. Atomic Broadcast with Optimistic Delivery Communication is based on atomic broadcast providing an ordering of all messages in the system, i.e. all sites receive all messages in the same order. Although there exist many different approaches on how to implement total order [6, 5, 7, 14, 21], all of them require some coordination between sites to guarantee that all messages are delivered in the same order at the different sites. However, when network broadcast (e.g. IP multicast) is used, there is a high probability that messages arrive at all sites spontaneously totally ordered ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proc. of the 10th ACM Symp. on Principles of Distributed Computing, pages 325--340, August 1991.


Impossibility of (Repeated) Reliable Broadcast - Ricciardi (1996)   (Correct)

.... repeatedly, given the reality of finite storage space and the need to maintain information pertaining to decisions for all correct processes (i.e., processes that never crash). We first consider Repeated Reliable Broadcast (RRB) and show that solutions require either perfect failure detectors [5] or infinite storage capacity. While falling media prices may make it seem that storage space is, for all intents and purposes, "infinite", sliding-window protocols, which are the typical implementations of reliable channels, accommodate on the order of ten distinct messages [9]. Our results point ....

....channels (communication channels with the property that each message has a non-zero probability of reaching its destination) suffice. We then discuss the implications with respect to implementing strong failure detectors from weak ones, for solving Consensus, and for solving Atomic Broadcast [5]. We show that some chance channels suffice in the first two cases, but not the last. That is, when storage space is considered in the computing model, Consensus is weaker than both RRB and Atomic Broadcast, because, in the absence of reliable channels, it can still be solved with an eventually weak ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In Tenth PODC, pages 325--340. ACM, 1991. (also Cornell University TR93-1374).


Deciding in Partitionable Networks - Friedman, Keidar, Malki, Birman.. (1995)   (5 citations)  (Correct)

....In the present paper, we consider the problem of reaching agreement decisions in an asynchronous environment with all possible benign failure types: crash and recovery, message omission failures, and network partitions and re-merges. We extend the definitions of failure detectors given in [3] to such environments. By these definitions, correct processes are alive and incorrect processes are crashed. Thus, there may be two processes considered correct that are unable to exchange messages because of communication problems. The first part of the paper concludes that agreement is ....

....in the component. Informally, the first condition states a majority-connected condition, and the second and third conditions characterize an eventually strong failure detector. These conditions are defined formally in the paper. The eventually strong failure detector is analogous to ◇S in [3], but is defined for the extended failure model. Similar but stronger conditions for reaching agreement on the membership of a group of processes are given in the timed asynchronous model of Cristian and Schmuck [4]. In our terminology, protocols that achieve agreement in such an environment are ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM. To appear, previous version in PODC 1991 pp. 325-340.
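In Chandra and Toueg's classic crash-stop model, the eventually strong class (written ◇S there) requires strong completeness (eventually every crashed process is permanently suspected by every correct process) and eventual weak accuracy (eventually some correct process is never suspected by any correct process). The sketch below checks both on a recorded trace. It does not model the extended failure model of the Friedman et al. excerpt, and the function name, the trace layout, and the `stable_from` cutoff are illustrative assumptions: a finite trace cannot prove an "eventually" property, so the check only covers the suffix after an assumed stabilization time.

```python
def satisfies_eventually_strong(history, processes, crashed, stable_from):
    """Check the two properties of the eventually strong class on a finite trace.

    history:     dict time -> dict process -> set of processes it suspects
                 (every correct process needs an entry at every time)
    processes:   set of all processes
    crashed:     set of processes that crash in this run
    stable_from: assumed time after which the detector has stabilized
    """
    correct = processes - crashed
    suffix = [history[t] for t in sorted(history) if t >= stable_from]

    # Strong completeness: every crashed process is suspected by every correct process.
    completeness = all(crashed <= snap[p] for snap in suffix for p in correct)

    # Eventual weak accuracy: some correct process is suspected by no correct process.
    accuracy = any(
        all(c not in snap[p] for snap in suffix for p in correct)
        for c in correct
    )
    return completeness and accuracy


history = {
    0: {1: {2}, 2: set()},   # early mistakes: 1 wrongly suspects 2, nobody suspects 3
    1: {1: {3}, 2: {1, 3}},  # 2 still suspects correct process 1, which is allowed
    2: {1: {3}, 2: {1, 3}},
}
print(satisfies_eventually_strong(history, {1, 2, 3}, {3}, stable_from=1))  # True
print(satisfies_eventually_strong(history, {1, 2, 3}, {3}, stable_from=0))  # False
```

The example shows why the class is "unreliable": correct process 1 stays suspected by process 2 forever, yet the trace still qualifies because process 2 is trusted by everyone after time 1.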


The Rampart Toolkit for Building High-Integrity Services - Reiter (1995)   (75 citations)  (Correct)

....view can be guaranteed only if there exists a correct group member whose removal is not requested for sufficiently long (and thus who is reachable for sufficiently long) by more than two thirds of the current group members. Advances on reaching consensus in the presence of crash failures (e.g. [6]) hint that it may be possible to somewhat weaken this requirement; this is a topic for future research. Changing the group membership (i.e. generating a new group view) in our present implementation is a heavyweight operation. For instance, if RSA keys with 512 bit moduli are used, then the ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug. 1991.


Self-Stabilization Bibliography: Access Guide - Herman (1998)   (2 citations)  (Correct)

.... Several authors take a critical view of self-stabilization [JT90, BG95]; however, it should be noted that one aspect of self-stabilization, that a system eventually reaches a legitimate state or behavior, is seen with increasing frequency in the literature of distributed computing [CT92] (many recent papers could as well be cited, but they are outside the scope of the bibliography). 4. Rings. The ring topology, useful and beloved in general for distributed computing and network protocols, is the starting point for many of the bibliography's entries. Citations in this section are ....

....sections, is likely to appear in future versions of this bibliography. We hope to see growing connections between self-stabilization and other areas of fault tolerance. As noted in Section 3, there appears to be growing appreciation of programs and services that make eventuality guarantees [CT92, FGL+96]; such work may offer new applications of self-stabilization beyond currently recognized potential application domains (communication protocols and load balancing). The paper [AK95] shows how self-stabilization can be useful even in fault-tolerant systems with more stringent ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In PODC91 Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, pages 325--340, 1991.


A Framework for Partitionable Membership Service - Dolev, Malki, Strong (1995)   (45 citations)  (Correct)

....In practice, this deficiency does not limit our approach, since failure detection in asynchronous environments is inaccurate anyway. However, adding such restrictions would allow us to analyze the framework in environments extended with failure detectors (e.g., the failure detectors discussed in [6]). We are currently investigating the extension of our framework to incorporate such restrictions. 7. Conclusions. In a world of growing dependency on computers, the ability to continue operation in a dynamic environment is crucial. Algorithms that completely depend on the existence of a primary ....

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In proc. 10th annual ACM Symposium on Principles of Distributed Computing, pages 325--340, 1991.


A Configurable Membership Service - Hiltunen, Schlichting (1994)   (18 citations)  (Correct)

.... one that detects a change only if the change has indeed occurred (i.e., no false detections), while a live membership service is one that is guaranteed to detect all changes eventually [BG93]. In an asynchronous system, it is impossible to have a membership service that is both live and accurate [CT91, FLP85]. We can further distinguish between properties related to detecting failures and those related to detecting recoveries. For example, in most systems, failure detection is live, while recovery detection is accurate. Confidence. The confidence property is the degree of certainty in a ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In Proceedings of the 10th ACM Symposium on Principles of Distributed Computing, pages 325--340, Aug 1991.


Towards a Unified Comparison of Synchronous and Asynchronous.. - Galleni, Powell (1995)   (2 citations)  (Correct)

....value that is related to the proposed values. In an asynchronous system, the progress property requires that consensus is reached eventually (liveness property); in a synchronous system, this condition requires that consensus is reached in bounded time (timeliness property). It has been shown [12] that the consensus problem is equivalent to the atomic broadcast problem, that is, given a model in which there exists a protocol solving one problem, it is possible to reduce it to a protocol solving the other one. [Footnote 4: Like in Cristian's timed asynchronous system [20].] Atomic broadcast in an ....

....that is, given a model in which there exists a protocol solving one problem, it is possible to reduce it to a protocol solving the other one. [Footnote 4: Like in Cristian's timed asynchronous system [20].] Atomic broadcast in an asynchronous system satisfies the following four properties (as defined in [12]): 1) validity: if a correct processor broadcasts a message m, then all correct processors deliver m (progress); 2) agreement: if a correct processor delivers a message m, then all correct processors deliver m (progress); 3) uniform integrity: for any message m, each processor delivers m at ....

[Article contains additional citation context not shown here]

T. D. Chandra and S. Toueg, "Unreliable Failure Detectors for Asynchronous Systems", in Proc. 10th Symp. on Principles of Distributed Computing, pp. 325-340, ACM Press, August 1991.
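The atomic broadcast properties listed in the Galleni and Powell excerpt can be checked mechanically against the delivery logs of a finished run. The excerpt is cut off after uniform integrity; the fourth property, total order, is stated here as in Chandra and Toueg's definition (two correct processes deliver their common messages in the same order). A minimal sketch, assuming a hypothetical function name and a run given as plain dictionaries; it does not check that a message was broadcast *before* it was delivered, only that it was broadcast at all.

```python
from itertools import combinations


def check_atomic_broadcast(broadcast, deliveries, correct):
    """Check atomic broadcast properties on a finished, finite run.

    broadcast:  dict process -> set of messages it broadcast
    deliveries: dict process -> list of messages in delivery order
                (must contain every correct process)
    correct:    set of processes that never crashed
    """
    all_sent = set().union(*broadcast.values())
    delivered = {p: set(seq) for p, seq in deliveries.items()}

    # Validity: a message broadcast by a correct process is delivered by every correct process.
    validity = all(
        m in delivered[q]
        for p in correct
        for m in broadcast.get(p, set())
        for q in correct
    )
    # Agreement: a message delivered by one correct process is delivered by all of them.
    agreement = all(delivered[p] <= delivered[q] for p in correct for q in correct)
    # Uniform integrity: any process, even a faulty one, delivers a message at most
    # once, and only if some process broadcast it.
    integrity = all(
        len(seq) == len(set(seq)) and set(seq) <= all_sent
        for seq in deliveries.values()
    )

    # Total order: two correct processes deliver their common messages in the same order.
    def same_order(p, q):
        common = delivered[p] & delivered[q]
        return [m for m in deliveries[p] if m in common] == [
            m for m in deliveries[q] if m in common
        ]

    total_order = all(same_order(p, q) for p, q in combinations(sorted(correct), 2))
    return {
        "validity": validity,
        "agreement": agreement,
        "uniform_integrity": integrity,
        "total_order": total_order,
    }


correct = {"p", "q"}
broadcast = {"p": {"a"}, "q": {"b"}, "r": {"c"}}  # r crashes after broadcasting c
good = {"p": ["a", "b"], "q": ["a", "b"], "r": ["a"]}
bad = {"p": ["a", "b"], "q": ["b", "a"], "r": []}
print(check_atomic_broadcast(broadcast, good, correct))  # every property is True
print(check_atomic_broadcast(broadcast, bad, correct)["total_order"])  # False
```

Note that the faulty process r never has to deliver its own message c: validity and agreement only constrain correct processes, while uniform integrity applies to every process.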


A Motivation of State Definitions - In This Section   Self-citation (Chandra Toueg)   (Correct)

.... Motivation of State Definitions. In this section, we provide some intuition on the definitions given in Section 5 and relate them to the agreement protocol of Chandra and Toueg [3] and the E3PC protocol of Keidar and Dolev [7]. Note, however, that this section is intended to give some intuition, and not to cover all possible cases. Both the E3PC and the Chandra and Toueg protocols consist of rounds, each of which, if all goes well, requires 3 phases. In the first phase, ....

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM. To appear, previous version in PODC 1991 pp. 325-340.


Thrifty Generic Broadcast - Aguilera, Delporte-Gallet.. (2000)   (9 citations)  Self-citation (Toueg)   (Correct)

....with generic broadcast. Similarly, abroadcast and adeliver are associated with atomic broadcast. Even though one can implement failure detectors that are fairly accurate in practice [14, 6], they may have bad periods of time when they make too many mistakes to be useful. For example, in [5] there is an atomic broadcast algorithm that never delivers messages out of order, but message delivery is delayed if the algorithm happens to rely on the failure detector during one of its bad periods. .... unlimited number of times. This motivates our second definition. An implementation of ....

....time t. We assume that if no process ever uses the oracle (all queries in H are ⊥), then the oracle never gives any answer (all answers in H are ⊥). An oracle O is a function that takes a failure pattern F and returns a set O(F) of oracle histories. Oracles of interest include failure detectors [5], an atomic broadcast black box, and a consensus black box. For example, an atomic broadcast black box can be modeled as an oracle that accepts broadcast(m) queries and outputs deliver(m) answers, where the queries and answers satisfy the usual specification of atomic broadcast (see Section ....

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. J. ACM, 43(2):225--267, Mar. 1996.


Synchronization in Massive Multiplayer Online Games - Stefano   (Correct)

No context found.

T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the tenth annual ACM symposium on Principles of distributed computing, pages 325--340. ACM Press, 1991.


An Architecture for Dynamic Scalable Self-Managed.. - Anceaume, Friedman, .. (2004)   (Correct)

No context found.

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. Journal of the ACM, 43(2):225--267, 1996.


On the Benefits of the Functional Modular Approach to.. - FRIEDMAN, RAYNAL (2004)   (Correct)

No context found.

T. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of the ACM, 43(2):225--267, March 1996.


Using JavaGroups for protocols comparison - Miedes, Galdámez (2003)   (Correct)

No context found.

T. D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. In PODC91 Proceedings of the Tenth Annual ACM Symposium on Principles of Distributed Computing, pages 325-340, 1991.


The DARX Framework: Adapting Fault Tolerance for Agent Systems - Marin (2003)   (Correct)

No context found.

T.D. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the 10th annual ACM symposium on Principles Of Distributed Computing, pages 325--340. ACM Press, 1991.


Practical Impact of Group Communication Theory - Schiper (2003)   (1 citation)  (Correct)

No context found.

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. In proc. 10th annual ACM Symposium on Principles of Distributed Computing, pages 325--340, 1991.


Spatiotemporal Multicast And Partitionable Group Membership Service - Huang (2003)   (Correct)

No context found.

T. D. Chandra and S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Journal of ACM, 43(2):225--267, 1996.


An Architecture for Dynamic Scalable Self-Managed Distributed.. - Anceaume, al. (2004)   (Correct)

No context found.

T. Chandra and S. Toueg. Unreliable failure detectors for asynchronous systems. Journal of the ACM, 43(2):225--267, 1996.


The BCG Membership Service Performance Analysis - Greve, Macedo   (Correct)

No context found.

T. D. Chandra, S. Toueg. Unreliable Failure Detectors for Asynchronous Systems. Proc. 10th ACM Symposium on Principles of Distributed Computing, pp. 325-340, Montreal, August 1991.


Programming Partition-Aware Network Applications - Babaoglu, Bartoli, Dini (1999)   (3 citations)  (Correct)

No context found.

Chandra, T.D., Toueg, S.; Unreliable Failure Detectors for Asynchronous Systems. In: Proc. of the 10th ACM Symp. on Princ. of Distr. Comp. (1991) 325-340.


Abstractions for Mobile Computation - Cardelli (1998)   (67 citations)  (Correct)

No context found.

Chandra, T.D., S. Toueg, Unreliable failure detectors for asynchronous systems. ACM Symposium on Principles of Distributed Computing, 325-340. 1991.



CiteSeer.IST - Copyright Penn State and NEC