Fault-scalable Byzantine fault-tolerant services
- In Proceedings of the 20th ACM Symposium on Operating Systems Principles
, 2005
Cited by 92 (6 self)
A fault-scalable service can be configured to tolerate increasing numbers of faults without significant decreases in performance. The Query/Update (Q/U) protocol is a new tool that enables construction of fault-scalable Byzantine fault-tolerant services. The optimistic quorum-based nature of the Q/U protocol allows it to provide better throughput and fault-scalability than replicated state machines using agreement-based protocols. A prototype service built using the Q/U protocol outperforms the same service built using a popular replicated state machine implementation at all system sizes in experiments that permit an optimistic execution. Moreover, the performance of the Q/U protocol decreases by only 36% as the number of Byzantine faults tolerated increases from one to five, whereas the performance of the replicated state machine decreases by 83%.
A Comparison of Bus Architectures for Safety-Critical Embedded Systems
, 2001
Cited by 78 (4 self)
Embedded systems for safety-critical applications often integrate multiple “functions” and must generally be fault-tolerant. These requirements lead to a need for mechanisms and services that provide protection against fault propagation and ease the construction of distributed fault-tolerant applications. A number of bus architectures have been developed to satisfy this need. This paper reviews the requirements on these architectures, the mechanisms employed, and the services provided. Four representative architectures (SAFEbus™, SPIDER, TTA, and FlexRay) are briefly described.
Efficient Byzantine-Tolerant Erasure-Coded Storage
- Proceedings of the International Conference on Dependable Systems and Networks, June 2004
, 2004
Cited by 73 (12 self)
This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol to efficiently provide linearizability and wait-freedom of read and write operations to erasure-coded data in asynchronous environments with Byzantine failures of clients and servers. By exploiting versioning storage-nodes, the protocol shifts most work to clients and allows highly optimistic operation: reads occur in a single round-trip unless clients observe concurrency or write failures. Measurements of a storage system prototype using this protocol show that it scales well with the number of failures tolerated, and its performance compares favorably with an efficient implementation of Byzantine-tolerant state machine replication.
Ursa Minor: versatile cluster-based storage
, 2005
Cited by 56 (30 self)
No single encoding scheme or fault model is optimal for all data. A versatile storage system allows them to be matched to access patterns, reliability requirements, and cost goals on a per-data item basis. Ursa Minor is a cluster-based storage system that allows data-specific selection of, and on-line changes to, encoding schemes and fault models. Thus, different data types can share a scalable storage infrastructure and still enjoy specialized choices, rather than suffering from "one size fits all." Experiments with Ursa Minor show performance benefits of 2-3× when using specialized choices as opposed to a single, more general, configuration. Experiments also show that a single cluster supporting multiple workloads simultaneously is much more efficient when the choices are specialized for each distribution rather than forced to use a "one size fits all" configuration. When using the specialized distributions, aggregate cluster throughput nearly doubled.
Byzantine Agreement with Authentication: Observations and Applications in Tolerating Hybrid and Link Faults
- In Dependable Computing for Critical Applications 5
, 1995
Cited by 47 (7 self)
We show that the assumptions required of the authentication mechanism in Byzantine agreement protocols that use "signed messages" are stronger than generally realized, and require more than simple digital signatures. The protocols may fail if these assumptions are violated. We then present new protocols for Byzantine agreement that add authentication to "oral message" protocols so that additional resilience is obtained with authentication, but with no assumptions required about the security of authentication when the number and kind of faults present are within the resilience of the unauthenticated protocol. Our analysis is performed under a "hybrid" fault model that admits manifest (e.g., crash) and symmetric faults as well as arbitrary (i.e., Byzantine) faults. We also extend the classical signed messages protocol to this fault model, and show that its fault tolerance is matched by one of our new protocols. We then explore the behavior of these various protocols under the combinatio...
Attested append-only memory: Making adversaries stick to their word
- In Proc. of SOSP
, 2007
Cited by 45 (7 self)
Researchers have made great strides in improving the fault tolerance of both centralized and replicated systems against arbitrary (Byzantine) faults. However, there are hard limits to how much can be done with entirely untrusted components; for example, replicated state machines cannot tolerate more than a third of their replica population being Byzantine. In this paper, we investigate how minimal trusted abstractions can push through these hard limits in practical ways. We propose Attested Append-Only Memory (A2M), a trusted system facility that is small, easy to implement and easy to verify formally. A2M provides the programming abstraction of a trusted log, which leads to protocol designs immune to equivocation – the ability of a faulty host to lie in different ways to different clients or servers – which is a common source of Byzantine headaches. Using A2M, we improve upon the state of the art in Byzantine-fault tolerant replicated state machines, producing A2M-enabled protocols (variants of Castro and Liskov’s PBFT) that remain correct (linearizable) and keep making progress (live) even when half the replicas are faulty, in contrast to the previous upper bound. We also present an A2M-enabled single-server shared storage protocol that guarantees linearizability despite server faults. We implement A2M and our protocols, evaluate them experimentally through micro- and macro-benchmarks, and argue that the improved fault tolerance is cost-effective for a broad range of uses, opening up new avenues for practical, more reliable services.
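The trusted-log idea in the abstract above can be illustrated with a minimal sketch. This is not the A2M interface from the paper: the class `AppendOnlyLog`, its `append` and `lookup` methods, and the helper `min_replicas` are hypothetical names chosen for illustration, and the sketch omits the signed attestations a real trusted facility would need. It only shows why a log whose entries can never be overwritten leaves a host with a single answer per sequence number, and how the replica bound changes from more than three times the number of faults to more than twice.

```python
import hashlib


class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a trusted append-only log."""

    def __init__(self):
        self._entries = []            # list of (value, cumulative_digest)
        self._digest = b"\x00" * 32   # digest representing the empty log

    def append(self, value: bytes) -> int:
        """Append a value and return its sequence number.

        There is no method to modify or remove an entry, so a sequence
        number is bound to exactly one value for the life of the log.
        """
        self._digest = hashlib.sha256(self._digest + value).digest()
        self._entries.append((value, self._digest))
        return len(self._entries) - 1

    def lookup(self, seq: int):
        """Return (seq, value, cumulative digest) for an existing entry."""
        if seq < 0 or seq >= len(self._entries):
            raise IndexError("no such log entry")
        value, digest = self._entries[seq]
        return seq, value, digest


def min_replicas(f: int, trusted_log: bool) -> int:
    """Smallest replica count that tolerates f Byzantine replicas.

    Without trusted components, fewer than a third may be faulty (3f + 1).
    With an equivocation-free log, fewer than half may be faulty (2f + 1).
    """
    return 2 * f + 1 if trusted_log else 3 * f + 1


if __name__ == "__main__":
    log = AppendOnlyLog()
    seq = log.append(b"commit x")
    # Two different clients asking about the same slot get the same answer.
    print(log.lookup(seq) == log.lookup(seq))
    print(min_replicas(1, trusted_log=False), min_replicas(1, trusted_log=True))
```

In this sketch the cumulative digest ties each entry to everything logged before it, so two parties comparing digests for the same sequence number would notice if they had been shown different histories.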
A formally verified algorithm for interactive consistency under a hybrid fault model
- In Fault Tolerant Computing Symposium 23
, 1993
Formally Verified On-Line Diagnosis
- IEEE Transactions on Software Engineering
, 1997
Cited by 31 (9 self)
A reconfigurable fault tolerant system achieves the attributes of dependability of operations through fault detection, fault isolation and reconfiguration, typically referred to as the FDIR paradigm. Fault diagnosis is a key component of this approach, requiring an accurate determination of the health and state of the system. An imprecise state assessment can lead to catastrophic failure due to an optimistic diagnosis, or conversely, result in underutilization of resources because of a pessimistic diagnosis. Differing from classical testing and other off-line diagnostic approaches, we develop procedures for maximal utilization of the system state information to provide for continual, on-line diagnosis and reconfiguration capabilities as an integral part of the system operations. Our diagnosis approach, unlike existing techniques, does not require administered testing to gather syndrome information but is based on monitoring the system message traffic among redundant system fu...
An Overview of Formal Verification for the Time-Triggered Architecture
, 2002
Cited by 22 (3 self)
We describe formal verification of some of the key algorithms in the Time-Triggered Architecture (TTA) for real-time safety-critical control applications.
GUARDS: A Generic Upgradable Architecture for Real-Time Dependable Systems
- IEEE Transactions on Parallel and Distributed Systems
, 1999
Cited by 21 (5 self)
Personal use of the material in this paper is permitted. However, permission to reprint or republish this material for advertising or promotional purposes or for creating new works for resale or redistribution, or to reuse any copyrighted component of this work in other works must be obtained from the authors of this paper. This paper has appeared in special issue on Real-Time Systems.

