Zyzzyva: Speculative byzantine fault tolerance
- In Symposium on Operating Systems Principles (SOSP), 2007
- Cited by 78 (10 self)
We present Zyzzyva, a protocol that uses speculation to reduce the cost and simplify the design of Byzantine fault tolerant state machine replication. In Zyzzyva, replicas respond to a client’s request without first running an expensive three-phase commit protocol to reach agreement on the order in which the request must be processed. Instead, they optimistically adopt the order proposed by the primary and respond immediately to the client. Replicas can thus become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima.
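The client-side consistency check described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes n = 3f + 1 replicas and the commonly described Zyzzyva thresholds (all 3f + 1 speculative replies matching lets the client accept at once; at least 2f + 1 matching requires an extra commit step), neither of which is stated in the abstract itself. The function name `classify_responses` and the reply tuple layout are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def classify_responses(responses: Iterable[Hashable], f: int) -> str:
    """Classify speculative replies gathered by a client.

    Each reply is assumed (hypothetically) to be a hashable tuple such as
    (sequence_number, history_digest, result). Replies that compare equal
    are treated as consistent with the same total order.
    """
    counts = Counter(responses)
    if not counts:
        return "retry"
    # Size of the largest group of mutually matching replies.
    _, top = counts.most_common(1)[0]
    if top == 3 * f + 1:
        return "complete"  # every replica agrees: accept the result
    if top >= 2 * f + 1:
        return "commit"    # enough agreement to run an extra commit step
    return "retry"         # too much divergence: resend the request
```

For example, with f = 1, four identical replies yield "complete", while three identical replies and one divergent reply yield "commit".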
On the declarativity of declarative networking
- SIGOPS Oper. Syst. Rev
- Cited by 6 (0 self)
Initiated by the declarative networking project, rule-based declarative programming languages have gained increasing popularity in building complex networked systems across multiple application domains. This paper investigates the declarativity of those systems. First, by analyzing the language semantics, we classify rules into deductive rules and Event-Condition-Action (ECA) rules, and reveal their different levels of declarativity. Then, we use case studies to show that ECA rules, which are less declarative, are dominantly used in most of the proposed systems. As a result, the benefit of declarative programming is undermined. We identify the key factors that cause the low declarativity effect, and present our ongoing work towards addressing those challenges.
FSR: Formal Analysis and Implementation Toolkit for Safe Inter-domain Routing
- Cited by 3 (3 self)
Inter-domain routing stitches the disparate parts of the Internet together, making protocol stability a critical issue to both researchers and practitioners. Yet, researchers create safety proofs and counter-examples by hand, and build simulators and prototypes to explore protocol dynamics. Similarly, network operators analyze their router configurations manually, or using home-grown tools. In this paper, we present a comprehensive toolkit for analyzing and implementing routing policies, ranging from high-level guidelines to specific router configurations. Our Formally Safe Routing (FSR) toolkit performs all of these functions from the same algebraic representation of routing policy. We show that routing algebra has a natural translation to both integer constraints (to perform safety analysis with SMT solvers) and declarative programs (to generate distributed implementations). Our extensive experiments with realistic topologies and policies show how FSR can detect problems in an AS’s iBGP configuration, prove sufficient conditions for BGP safety, and empirically evaluate convergence time.
ZZ and the Art of Practical BFT
- 2009
- Cited by 3 (2 self)
The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. We present ZZ, a new approach that reduces the replication cost of BFT services from 2f + 1 to practically f + 1. The key insight in ZZ is to use f + 1 execution replicas in the normal case and to activate additional replicas only upon failures. In shared hosting data centers where multiple applications share a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, thereby improving throughput and response times. ZZ relies on virtualization, a technology already employed in modern data centers, for fast replica activation upon failures, and enables newly activated replicas to immediately begin processing requests by fetching state on demand. A prototype implementation of ZZ using the BASE library and Xen shows that, when compared to a system with 2f + 1 replicas, our approach yields lower response times and up to 33% higher throughput in a prototype data center with four BFT web applications. We also show that ZZ can handle simultaneous failures and achieve sub-second recovery.
ZZ and the Art of Practical BFT Execution
- Cited by 3 (0 self)
The high replication cost of Byzantine fault-tolerance (BFT) methods has been a major barrier to their widespread adoption in commercial distributed applications. We present ZZ, a new approach that reduces the replication cost of BFT services from 2f + 1 to practically f + 1. The key insight in ZZ is to use f + 1 execution replicas in the normal case and to activate additional replicas only upon failures. In data centers where multiple applications share a physical server, ZZ reduces the aggregate number of execution replicas running in the data center, improving throughput and response times. ZZ relies on virtualization, a technology already employed in modern data centers, for fast replica activation upon failures, and enables newly activated replicas to immediately begin processing requests by fetching state on demand. A prototype implementation of ZZ using the BASE library and Xen shows that, when compared to a system with 2f + 1 replicas, our approach yields lower response times and up to 33% higher throughput in a prototype data center with four BFT web applications. We also show that ZZ can handle simultaneous failures and achieve sub-second recovery.
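The replica-count arithmetic in this abstract can be made concrete with a small sketch. The function name `execution_replicas` and its parameters are hypothetical and chosen only for illustration; the sketch counts normal-case execution replicas only, and does not model the additional replicas that ZZ activates after a failure.

```python
def execution_replicas(f: int, apps: int = 1, on_demand: bool = True) -> int:
    """Count execution replicas running in the normal (failure-free) case.

    f         -- number of Byzantine faults each application tolerates
    apps      -- number of BFT applications sharing the data center
    on_demand -- True for the f + 1 scheme described in the abstract,
                 False for the 2f + 1 baseline it is compared against
    """
    if f < 0 or apps < 0:
        raise ValueError("f and apps must be non-negative")
    per_app = f + 1 if on_demand else 2 * f + 1
    return apps * per_app


# Four applications, each tolerating one fault (f = 1):
baseline = execution_replicas(1, apps=4, on_demand=False)  # 4 * (2*1 + 1) = 12
reduced = execution_replicas(1, apps=4)                    # 4 * (1 + 1) = 8
```

Under these assumptions, the aggregate number of always-on execution replicas drops from 12 to 8 for four applications with f = 1.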
Efficient Byzantine Fault Tolerance for Scalable Storage and Services
- 2009
- Cited by 2 (1 self)
Distributed systems experience and should tolerate faults beyond simple component crashes as such systems grow in size and importance. Unfortunately, tolerating arbitrary faults, also known as Byzantine faults, poses several challenges to system designers, often limiting performance, requiring additional hardware, or both. This dissertation presents new protocols that provide substantially better performance than previously demonstrated. The Byzantine fault-tolerant erasure-coded block storage protocol proposed in this thesis provides 40% higher write throughput than the best prior approach. The Byzantine fault-tolerant replicated state machine provides a factor of 2.2–2.9 times
Prime: Byzantine Replication Under Attack
- Cited by 1 (1 self)
Existing Byzantine-resilient replication protocols satisfy two standard correctness criteria, safety and liveness, even in the presence of Byzantine faults. The runtime performance of these protocols is most commonly assessed in the absence of processor faults and is usually good in that case. However, in some protocols faulty processors can significantly degrade performance, limiting the practical utility of these protocols in adversarial environments. This paper demonstrates the extent of performance degradation possible in some existing protocols that do satisfy liveness and that do perform well absent Byzantine faults. We propose a new performance-oriented correctness criterion that requires a consistent level of performance, even when the system exhibits Byzantine faults. We present a new Byzantine fault-tolerant replication protocol that meets the new correctness criterion and evaluate its performance in fault-free executions and when under attack.
Improving Server Applications with System Transactions
- Cited by 1 (0 self)
Server applications must process requests as quickly as possible. Because some requests depend on earlier requests, there is often a tension between increasing throughput and maintaining the proper semantics for dependent requests. Operating system transactions make it easier to write reliable, high-throughput server applications because they allow the application to execute non-interfering requests in parallel, even if the requests operate on OS state, such as file data. By changing fewer than 200 lines of application code, we improve performance of a replicated Byzantine Fault Tolerant (BFT) system by up to 88% using server-side speculation, and we improve concurrent performance by up to 80% for an IMAP email server by changing only 40 lines. Achieving these results requires substantial enhancements to system transactions, including the ability to pause and resume transactions, and an API to commit transactions in a pre-defined order.
An Attack-Resilient Architecture for Large-Scale Intrusion-Tolerant Replication
- 2009
This paper presents the first architecture for large-scale, wide-area intrusion-tolerant state machine replication that is specifically designed to perform well even when some of the servers are Byzantine. The architecture is hierarchical and runs attack-resilient state machine replication protocols within and among the wide-area sites. Given the constraints of the wide-area environment, we explore the challenges and tradeoffs of building inter-site communication protocols that use wide-area bandwidth efficiently yet can resist attempts to degrade performance. The paper provides evidence that the optional use of simple dependable components, whose compromise or malfunction cannot cause inconsistency in the replicated service, can significantly improve performance when the system is under attack.
The Next 700 BFT Protocols
- Rachid Guerraoui
Modern Byzantine fault-tolerant state machine replication (BFT) protocols involve about 20,000 lines of challenging C++ code encompassing synchronization, networking and cryptography. They are notoriously difficult to develop, test and prove. We present a new abstraction to simplify these tasks. We treat a BFT protocol as a composition of instances of our abstraction. Each instance is developed and analyzed independently. To illustrate our approach, we first show how, with our abstraction, the benefits of a BFT protocol like Zyzzyva could have been obtained with much less pain. Namely, we develop AZyzzyva, a new protocol that mimics the behavior of Zyzzyva in best-case situations (for which Zyzzyva was optimized) using less than 24% of the actual code of Zyzzyva. To cover worst-case situations, our abstraction makes it possible to compose AZyzzyva with any existing BFT protocol, typically a classical one like PBFT, which has been proved correct and widely tested. We then present Aliph, a new BFT protocol that outperforms previous BFT protocols both in terms of latency (by up to 30%) and throughput (by up to 360%). Development of Aliph required two new instances of our abstraction. Each instance contains less than 25% of the code needed to develop state-of-the-art BFT protocols.

