Unreliable Failure Detectors for Reliable Distributed Systems
Journal of the ACM, 1996
Cited by 1089 (19 self)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
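The abstract does not say how a failure detector is built. As a rough illustration only, the sketch below shows one common timeout-based approach: a process is suspected when its heartbeat is overdue, and a wrong suspicion is retracted (with a longer timeout) when a late heartbeat arrives. The class name `HeartbeatDetector`, its methods, and the doubling rule are assumptions made for this example, not constructions taken from the paper.

```python
class HeartbeatDetector:
    """Timeout-based suspicion list for a set of peers (illustrative only)."""

    def __init__(self, peers, initial_timeout=1.0):
        # Per-peer timeout, so a slow peer does not penalise the others.
        self.timeout = {p: initial_timeout for p in peers}
        self.last_seen = {p: 0.0 for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer, now):
        """Record a heartbeat; retract a wrong suspicion if there was one."""
        self.last_seen[peer] = now
        if peer in self.suspected:
            # The detector made a mistake: the peer was only slow.
            self.suspected.discard(peer)
            self.timeout[peer] *= 2  # assumed back-off rule

    def check(self, now):
        """Suspect every peer whose last heartbeat is older than its timeout."""
        for peer, seen in self.last_seen.items():
            if now - seen > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)


fd = HeartbeatDetector(["p", "q"])
fd.on_heartbeat("p", now=0.5)
fd.on_heartbeat("q", now=0.5)
first = fd.check(now=2.0)      # both peers are overdue at this point
fd.on_heartbeat("p", now=2.1)  # p was merely slow; q stays silent
second = fd.check(now=3.0)
```

In this run the detector wrongly suspects `p` once and later corrects itself, while the silent `q` stays suspected. This loosely mirrors the two properties named in the abstract: completeness (crashed processes end up suspected) and accuracy (limits on wrong suspicions).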
Consensus in the presence of partial synchrony
Journal of the ACM, 1988
Cited by 521 (19 self)
The concept of partial synchrony in a distributed system is introduced. Partial synchrony lies between the cases of a synchronous system and an asynchronous system. In a synchronous system, there is a known fixed upper bound Δ on the time required for a message to be sent from one processor to another and a known fixed upper bound Φ on the relative speeds of different processors. In an asynchronous system no fixed upper bounds Δ and Φ exist. In one version of partial synchrony, fixed bounds Δ and Φ exist, but they are not known a priori. The problem is to design protocols that work correctly in the partially synchronous system regardless of the actual values of the bounds Δ and Φ. In another version of partial synchrony, the bounds are known, but are only guaranteed to hold starting at some unknown time T, and protocols must be designed to work correctly regardless of when time T occurs. Fault-tolerant consensus protocols are given for various cases of partial synchrony and various fault models. Lower bounds that show in most cases that our protocols are optimal with respect to the number of faults tolerated are also given. Our consensus protocols for partially synchronous processors use new protocols for fault-tolerant "distributed clocks" that allow partially synchronous processors to reach some approximately common notion of time.
Atomic Broadcast: From Simple Message Diffusion to Byzantine Agreement
Information and Computation, 1985
Cited by 247 (15 self)
In distributed systems subject to random communication delays and component failures, atomic broadcast can be used to implement the abstraction of synchronous replicated storage, a distributed storage that displays the same contents at every correct processor as of any clock time. This paper presents a systematic derivation of a family of atomic broadcast protocols that are tolerant of increasingly general failure classes: omission failures, timing failures, and authentication-detectable Byzantine failures. The protocols work for arbitrary point-to-point network topologies, and can tolerate any number of link and process failures up to network partitioning. After proving their correctness, we also prove two lower bounds that show that the protocols provide in many cases the best possible termination times. Keywords and phrases: Atomic Broadcast, Byzantine Agreement, Computer Network, Correctness, Distributed System, Failure Classification, Fault-Tolerance, Lower Bound, Real-Time Syste...
One-way accumulators: A decentralized alternative to digital signatures
1993
Cited by 153 (0 self)
This paper describes a simple candidate one-way hash function which satisfies a quasi-commutative property that allows it to be used as an accumulator. This property allows protocols to be developed in which the need for a trusted central authority can be eliminated. Space-efficient distributed protocols are given for document time stamping and for membership testing, and many other applications are possible.
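To make the quasi-commutative property concrete, here is a minimal sketch that uses modular exponentiation as the accumulator function, so that accumulating items in either order gives the same value. The modulus `N`, the base `x0`, and the helper names are toy assumptions chosen for readability; a modulus this small offers no security.

```python
def accumulate(x, y, n):
    """One accumulation step: fold item y into the running value x, modulo n."""
    return pow(x, y, n)


def accumulate_all(x, items, n):
    """Fold every item into the starting value x, one step at a time."""
    for y in items:
        x = accumulate(x, y, n)
    return x


N = 11 * 23  # toy modulus (assumption); far too small for real use
x0 = 3       # agreed starting value (assumption)

# Quasi-commutativity: the order in which items are folded in does not matter.
a = accumulate(accumulate(x0, 5, N), 7, N)
b = accumulate(accumulate(x0, 7, N), 5, N)

# Membership testing: the witness for an item is the accumulation of all the
# other items; folding the item into its witness must reproduce the total.
items = [5, 7, 13]
total = accumulate_all(x0, items, N)
witness_for_7 = accumulate_all(x0, [5, 13], N)
is_member = accumulate(witness_for_7, 7, N) == total
```

Because the final value does not depend on the order of accumulation, each participant can hold only its own item and witness and still check membership against the shared total, which is what removes the need for a central authority in this style of protocol.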
Programming Simultaneous Actions Using Common Knowledge
Algorithmica, 1988
Cited by 99 (32 self)
This work applies the theory of knowledge in distributed systems to the design of efficient fault-tolerant protocols. We define a large class of problems requiring coordinated, simultaneous action in synchronous systems, and give a method of transforming specifications of such problems into protocols that are optimal in all runs: for every possible input to the system and faulty processor behavior, these protocols are guaranteed to perform the simultaneous actions as soon as any other protocol could possibly perform them. This transformation is performed in two steps. In the first step, we extract directly from the problem specification a high-level protocol programmed using explicit tests for common knowledge. In the second step, we carefully analyze when facts become common knowledge, thereby providing a method of efficiently implementing these protocols in many variants of the omissions failure model. In the generalized omissions model, however, our analysis shows that testing for common knowledge is NP-hard. Given the close correspondence between common knowledge and simultaneous actions, we are able to show that no optimal protocol for any such problem can be computationally efficient in this model. The analysis in this paper exposes many subtle differences between the failure models, including the precise point at which this gap in complexity occurs.
Studies in Secure Multiparty Computation and Applications
1996
Cited by 88 (9 self)
Consider a set of parties who do not trust each other, nor the channels by which they communicate. Still, the parties wish to correctly compute some common function of their local inputs, while keeping their local data as private as possible. This, in a nutshell, is the problem of secure multiparty computation. This problem is fundamental in cryptography and in the study of distributed computations. It takes many different forms, depending on the underlying network, on the function to be computed, and on the amount of distrust the parties have in each other and in the network. We study several aspects of secure multiparty computation. We first present new definitions of this problem in various settings. Our definitions draw from previous ideas and formalizations, and incorporate aspects that were previously overlooked. Next we study the problem of dealing with adaptive adversaries. (Adaptive adversaries are adversaries that corrupt parties during the course of the computation, based on...
Fault-tolerance in collaborative sensor networks for target detection
IEEE Transactions on Computers, 2004
Cited by 86 (4 self)
Collaboration in sensor networks must be fault-tolerant due to the harsh environmental conditions in which such networks can be deployed. This paper focuses on finding algorithms for collaborative target detection that are efficient in terms of communication cost, precision, accuracy, and number of faulty sensors tolerable in the network. Two algorithms, namely, value fusion and decision fusion, are identified first. When comparing their performance and communication overhead, decision fusion is found to become superior to value fusion as the ratio of faulty sensors to fault-free sensors increases. As robust data fusion requires agreement among nodes in the network, an analysis of fully distributed and hierarchical agreement is also presented. The impact of hierarchical agreement on communication cost and system failure probability is evaluated and a method for determining the number of tolerable faults is identified. Index Terms: Collaborative target detection, decision fusion, fault tolerance, sensor networks, value fusion.
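The difference between the two fusion styles can be sketched in a few lines. In this simplified reading, value fusion averages the raw sensor readings and thresholds the result, while decision fusion lets each sensor decide locally and thresholds the fraction of positive votes. The function names, thresholds, and readings are assumptions for illustration; the paper's algorithms may include further steps (such as discarding extreme values) that are not modelled here.

```python
def value_fusion(readings, threshold):
    """Average the raw readings, then compare the mean with a threshold."""
    return sum(readings) / len(readings) >= threshold


def decision_fusion(readings, local_threshold, vote_threshold=0.5):
    """Each sensor decides locally; the target is declared if enough vote yes."""
    decisions = [1 if r >= local_threshold else 0 for r in readings]
    return sum(decisions) / len(decisions) >= vote_threshold


# Hypothetical scenario: no target is present, but one faulty sensor reports
# an absurdly large value.
readings = [0.1, 0.2, 0.1, 0.2, 100.0]

value_result = value_fusion(readings, threshold=1.0)
decision_result = decision_fusion(readings, local_threshold=1.0)
```

In this scenario the single outlier drags the mean above the threshold, so value fusion raises a false alarm, whereas the faulty sensor contributes only one vote under decision fusion. That is one intuition for why decision fusion can cope better as the share of faulty sensors grows.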
Fast Asynchronous Byzantine Agreement with Optimal Resilience
1998
Cited by 81 (0 self)
It is known that, in both asynchronous and synchronous networks, no Byzantine Agreement (BA) protocol for n players exists if $\lceil n/3 \rceil$ of the players are faulty (in other words, no BA protocol is $\lceil n/3 \rceil$-resilient). The only known asynchronous $(\lceil n/3 \rceil - 1)$-resilient BA protocol runs in expected exponential time, and the best resilience achieved by an asynchronous protocol with polynomial complexity is $(\lceil n/4 \rceil - 1)$. The question whether there exists an asynchronous $(\lceil n/3 \rceil - 1)$-resilient BA protocol with polynomial complexity remained open.
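As a quick arithmetic check, the sketch below computes the two resilience levels for a few network sizes, on the assumption that the bounds in the abstract are ceiling(n/3) - 1 (optimal) and ceiling(n/4) - 1 (the earlier polynomial-time result). The function names are invented for this example.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def optimal_resilience(n):
    """Largest number of faulty players tolerable under the n/3 bound."""
    return ceil_div(n, 3) - 1


def quarter_resilience(n):
    """Resilience under the weaker n/4 bound."""
    return ceil_div(n, 4) - 1


# (n, tolerated faults at optimal resilience, tolerated faults at n/4 bound)
table = [(n, optimal_resilience(n), quarter_resilience(n)) for n in (4, 10, 13)]
```

For 10 players, for example, this gives 3 tolerated faults at optimal resilience against 2 under the weaker bound, and in each case the optimal value t satisfies n > 3t.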
Understanding Protocols for Byzantine Clock Synchronization
1987
Cited by 80 (0 self)
All published fault-tolerant clock synchronization protocols are shown to result from refining a single paradigm. This allows the different clock synchronization protocols to be compared and permits presentation of a single correctness analysis that holds for all. The paradigm is based on a reliable time source that periodically causes events; detection of such an event causes a processor to reset its clock. In a distributed system, the reliable time source can be approximated by combining the values of processor clocks using a generalization of a "fault-tolerant average", called a convergence function. The performance of a clock synchronization protocol based on our paradigm can be quantified in terms of the two parameters that characterize the behavior of the convergence function used: accuracy and precision.
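One familiar example of a convergence function is a fault-tolerant midpoint: discard the f highest and f lowest clock readings, then take the midpoint of what remains. The sketch below is an illustration of that general idea, not necessarily the function analysed in the paper; the function name and the sample readings are assumptions.

```python
def fault_tolerant_midpoint(clock_values, f):
    """Midpoint of the readings left after trimming f values from each end."""
    if len(clock_values) <= 2 * f:
        raise ValueError("need more than 2f readings to tolerate f faults")
    ordered = sorted(clock_values)
    trimmed = ordered[f:len(ordered) - f]
    return (trimmed[0] + trimmed[-1]) / 2


# Hypothetical readings: three clocks roughly agree, one is wildly off.
readings = [100.0, 101.0, 102.0, 999.0]
new_clock = fault_tolerant_midpoint(readings, f=1)
```

Trimming f values from each end means that up to f arbitrarily wrong clocks cannot pull the result outside the range spanned by the correct ones, which is why the result here stays near 101 despite the outlier.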
Fully polynomial Byzantine agreement for n > 3t processors in t + 1 rounds
SIAM Journal on Computing, 1998