• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations

Implementing fault-tolerant services using the state machine approach: a tutorial (1990)

by F B Schneider
Venue:ACM Computing Surveys
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 975
Next 10 →

Unreliable Failure Detectors for Reliable Distributed Systems

by Tushar Deepak Chandra, Sam Toueg - Journal of the ACM , 1996
"... We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with ..."
Abstract - Cited by 1094 (19 self) - Add to MetaCart
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
(Show Context)

Citation Context

...es, so that they agree on the set of messages they deliver and the order of message deliveries. Applications based on these paradigms include SIFT [Wensley et al. 1978], State Machines [Lamport 1978; =-=Schneider 1990-=-], Isis [Birman and Joseph 1987; Birman et al. 1990], Psync [Peterson et al. 1989], Amoeba [Mullender 1987], Delta-4 [Powell 1991], Transis [Amir et al. 1991], HAS [Cristian 1987], FAA [Cristian et al...

Securing ad hoc networks

by Lidong Zhou, Zygmunt J. Haas
"... Ad hoc networks are a new wireless networking paradigm for mobile hosts. Unlike traditional mobile wireless networks, ad hoc networks do not rely on any fixed infrastructure. Instead, hosts rely on each other to keep the network connected. The military tactical and other security-sensitive operation ..."
Abstract - Cited by 1064 (15 self) - Add to MetaCart
Ad hoc networks are a new wireless networking paradigm for mobile hosts. Unlike traditional mobile wireless networks, ad hoc networks do not rely on any fixed infrastructure. Instead, hosts rely on each other to keep the network connected. The military tactical and other security-sensitive operations are still the main applications of ad hoc networks, although there is a trend to adopt ad hoc networks for commercial uses due to their unique properties. One main challenge in design of these networks is their vulnerability to security attacks. In this paper, we study the threats an ad hoc network faces and the security goals to be achieved. We identify the new challenges and opportunities posed by this new networking environment and explore new approaches to secure its communication. In particular, we take advantage of the inherent redundancy in ad hoc networks — multiple routes between nodes — to defend routing against denial of service attacks. We also use replication and new cryptographic schemes, such as threshold cryptography, to build a highly secure and highly available key management service, which forms the core of our security framework.
(Show Context)

Citation Context

...ong the servers suces to achieve the guarantee of the service because of the intersection property of Byzantine quorum systems. In [2], Castro and Liskov extend the replicated state-machine approach =-=[41]-=- to achieve Byzantine fault tolerance. They use a three-phase protocol to mask away disruptive behavior of compromised servers. A small portion of servers may be left behind, but can recover by commun...

The part-time parliament

by Leslie Lamport , 1998
"... ..."
Abstract - Cited by 855 (12 self) - Add to MetaCart
Abstract not found

Practical Byzantine fault tolerance

by Miguel Castro, Barbara Liskov , 1999
"... This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbi ..."
Abstract - Cited by 673 (15 self) - Add to MetaCart
This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3 % slower than a standard unreplicated NFS.
(Show Context)

Citation Context

...lty nodes to exhibit Byzantine (i.e., arbitrary) behavior, Byzantine-fault-tolerant algorithms are increasingly important. This paper presents a new, practical algorithm for state machine replication =-=[17, 34]-=- that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most b n;1 3 c out of a total of n replicas are simultaneously faulty. This means that clients eventually re...

Small Byzantine Quorum Systems

by Jean-Philippe Martin, Lorenzo Alvisi, Michael Dahlin - DISTRIBUTED COMPUTING , 2001
"... In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in ..."
Abstract - Cited by 468 (49 self) - Add to MetaCart
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
(Show Context)

Citation Context

...rage can also be implemented using Rampart [30], a toolkit for distributed applications in Byzantine environments. Rampart does not use quorum systems but instead relies on the state machine approach =-=[35]-=- to implement the abstraction of highly available distributed shared memory. Castro and Liskov [10] also attacked the problem of reliable storage under Byzantine failures. They 18implement a Byzantin...

Practical Byzantine fault tolerance and proactive recovery

by Miguel Castro, Barbara Liskov - ACM Transactions on Computer Systems , 2002
"... Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, B ..."
Abstract - Cited by 410 (7 self) - Add to MetaCart
Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2 % faster to 24 % slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the

Group Communication Specifications: A Comprehensive Study

by Roman Vitenberg, Idit Keidar, Gregory V. Chockler, Danny Dolev - ACM COMPUTING SURVEYS , 1999
"... View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are for ..."
Abstract - Cited by 370 (15 self) - Add to MetaCart
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...

The Time-Triggered Architecture

by Hermann Kopetz, Günther Bauer - PROCEEDINGS OF THE IEEE , 2003
"... The time-triggered architecture (TTA) provides a computing infrastructure for the design and implementation of dependable distributed embedded systems. A large real-time application is decomposed into nearly autonomous clusters and nodes, and a fault-tolerant global time base of known precision is g ..."
Abstract - Cited by 275 (10 self) - Add to MetaCart
The time-triggered architecture (TTA) provides a computing infrastructure for the design and implementation of dependable distributed embedded systems. A large real-time application is decomposed into nearly autonomous clusters and nodes, and a fault-tolerant global time base of known precision is generated at every node. In the TTA, this global time is used to precisely specify the interfaces among the nodes, to simplify the communication and agreement protocols, to perform prompt error detection, and to guarantee the timeliness of real-time applications. The TTA supports a two-phased design methodology, architecture design, and component design. During the architecture design phase, the interactions among the distributed components and the interfaces of the components are fully specified in the value domain and in the temporal domain. In the succeeding component implementation phase, the components are built, taking these interface specifications as constraints. This two-phased design methodology is a prerequisite for the composability of applications implemented in the TTA and for the reuse of prevalidated components within the TTA. This paper presents the architecture model of the TTA, explains the design rationale, discusses the time-triggered communication protocols TTP/C and TTP/A, and illustrates how transparent fault tolerance can be implemented in the TTA.

Building Secure and Reliable Network Applications

by Kenneth Birman , 1996
"... ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably deliv ..."
Abstract - Cited by 230 (16 self) - Add to MetaCart
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...

Total order broadcast and multicast algorithms: Taxonomy and survey

by Xavier Défago, André Schiper, Péter Urbán - ACM COMPUTING SURVEYS , 2004
"... ..."
Abstract - Cited by 230 (25 self) - Add to MetaCart
Abstract not found
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University