Results 1 - 10
of
975
Unreliable Failure Detectors for Reliable Distributed Systems
- Journal of the ACM
, 1996
"... We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with ..."
Abstract
-
Cited by 1094 (19 self)
- Add to MetaCart
(Show Context)
We introduce the concept of unreliable failure detectors and study how they can be used to solve Consensus in asynchronous systems with crash failures. We characterise unreliable failure detectors in terms of two properties — completeness and accuracy. We show that Consensus can be solved even with unreliable failure detectors that make an infinite number of mistakes, and determine which ones can be used to solve Consensus despite any number of crashes, and which ones require a majority of correct processes. We prove that Consensus and Atomic Broadcast are reducible to each other in asynchronous systems with crash failures; thus the above results also apply to Atomic Broadcast. A companion paper shows that one of the failure detectors introduced here is the weakest failure detector for solving Consensus [Chandra et al. 1992].
Securing ad hoc networks
"... Ad hoc networks are a new wireless networking paradigm for mobile hosts. Unlike traditional mobile wireless networks, ad hoc networks do not rely on any fixed infrastructure. Instead, hosts rely on each other to keep the network connected. The military tactical and other security-sensitive operation ..."
Abstract
-
Cited by 1064 (15 self)
- Add to MetaCart
(Show Context)
Ad hoc networks are a new wireless networking paradigm for mobile hosts. Unlike traditional mobile wireless networks, ad hoc networks do not rely on any fixed infrastructure. Instead, hosts rely on each other to keep the network connected. The military tactical and other security-sensitive operations are still the main applications of ad hoc networks, although there is a trend to adopt ad hoc networks for commercial uses due to their unique properties. One main challenge in design of these networks is their vulnerability to security attacks. In this paper, we study the threats an ad hoc network faces and the security goals to be achieved. We identify the new challenges and opportunities posed by this new networking environment and explore new approaches to secure its communication. In particular, we take advantage of the inherent redundancy in ad hoc networks — multiple routes between nodes — to defend routing against denial of service attacks. We also use replication and new cryptographic schemes, such as threshold cryptography, to build a highly secure and highly available key management service, which forms the core of our security framework.
Practical Byzantine fault tolerance
, 1999
"... This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbi ..."
Abstract
-
Cited by 673 (15 self)
- Add to MetaCart
(Show Context)
This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3 % slower than a standard unreplicated NFS.
Small Byzantine Quorum Systems
- DISTRIBUTED COMPUTING
, 2001
"... In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in ..."
Abstract
-
Cited by 468 (49 self)
- Add to MetaCart
(Show Context)
In this paper we present two protocols for asynchronous Byzantine Quorum Systems (BQS) built on top of reliable channels---one for self-verifying data and the other for any data. Our protocols tolerate Byzantine failures with fewer servers than existing solutions by eliminating nonessential work in the write protocol and by using read and write quorums of different sizes. Since engineering a reliable network layer on an unreliable network is difficult, two other possibilities must be explored. The first is to strengthen the model by allowing synchronous networks that use time-outs to identify failed links or machines. We consider running synchronous and asynchronous Byzantine Quorum protocols over synchronous networks and conclude that, surprisingly, "self-timing" asynchronous Byzantine protocols may offer significant advantages for many synchronous networks when network time-outs are long. We show how to extend an existing Byzantine Quorum protocol to eliminate its dependency on reliable networking and to handle message loss and retransmission explicitly.
Practical Byzantine fault tolerance and proactive recovery
- ACM Transactions on Computer Systems
, 2002
"... Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, B ..."
Abstract
-
Cited by 410 (7 self)
- Add to MetaCart
Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2 % faster to 24 % slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the
Group Communication Specifications: A Comprehensive Study
- ACM COMPUTING SURVEYS
, 1999
"... View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are for ..."
Abstract
-
Cited by 370 (15 self)
- Add to MetaCart
View-oriented group communication is an important and widely used building block for many distributed applications. Much current research has been dedicated to specifying the semantics and services of view-oriented Group Communication Systems (GCSs). However, the guarantees of different GCSs are formulated using varying terminologies and modeling techniques, and the specifications vary in their rigor. This makes it difficult to analyze and compare the different systems. This paper provides a comprehensive set of clear and rigorous specifications, which may be combined to represent the guarantees of most existing GCSs. In the light of these specifications, over thirty published GCS specifications are surveyed. Thus, the specifications serve as a unifying framework for the classification, analysis and comparison of group communication systems. The survey also discusses over a dozen different applications of group communication systems, shedding light on the usefulness of the p...
The Time-Triggered Architecture
- PROCEEDINGS OF THE IEEE
, 2003
"... The time-triggered architecture (TTA) provides a computing infrastructure for the design and implementation of dependable distributed embedded systems. A large real-time application is decomposed into nearly autonomous clusters and nodes, and a fault-tolerant global time base of known precision is g ..."
Abstract
-
Cited by 275 (10 self)
- Add to MetaCart
The time-triggered architecture (TTA) provides a computing infrastructure for the design and implementation of dependable distributed embedded systems. A large real-time application is decomposed into nearly autonomous clusters and nodes, and a fault-tolerant global time base of known precision is generated at every node. In the TTA, this global time is used to precisely specify the interfaces among the nodes, to simplify the communication and agreement protocols, to perform prompt error detection, and to guarantee the timeliness of real-time applications. The TTA supports a two-phased design methodology, architecture design, and component design. During the architecture design phase, the interactions among the distributed components and the interfaces of the components are fully specified in the value domain and in the temporal domain. In the succeeding component implementation phase, the components are built, taking these interface specifications as constraints. This two-phased design methodology is a prerequisite for the composability of applications implemented in the TTA and for the reuse of prevalidated components within the TTA. This paper presents the architecture model of the TTA, explains the design rationale, discusses the time-triggered communication protocols TTP/C and TTP/A, and illustrates how transparent fault tolerance can be implemented in the TTA.
Building Secure and Reliable Network Applications
, 1996
"... ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably deliv ..."
Abstract
-
Cited by 230 (16 self)
- Add to MetaCart
ly, the remote procedure call problem, which an RPC protocol undertakes to solve, consists of emulating LPC using message passing. LPC has a number of "properties" -- a single procedure invocation results in exactly one execution of the procedure body, the result returned is reliably delivered to the invoker, and exceptions are raised if (and only if) an error occurs. Given a completely reliable communication environment, which never loses, duplicates, or reorders messages, and given client and server processes that never fail, RPC would be trivial to solve. The sender would merely package the invocation into one or more messages, and transmit these to the server. The server would unpack the data into local variables, perform the desired operation, and send back the result (or an indication of any exception that occurred) in a reply message. The challenge, then, is created by failures. Were it not for the possibility of process and machine crashes, an RPC protocol capable of overcomi...
Total order broadcast and multicast algorithms: Taxonomy and survey
- ACM COMPUTING SURVEYS
, 2004
"... ..."