  Combining Abstraction with Byzantine Fault-Tolerance

by Barbara Liskov, Rodrigo Seromenho Miragaia Rodrigues
http://pmg.csail.mit.edu/~rodrigo/thesis.ps.gz

Abstract:

This thesis describes a technique to build replicated services that combines Byzantine fault tolerance with work on abstract data types. Tolerating Byzantine faults is important because software errors are a major cause of outages and they can make faulty replicas behave arbitrarily. Abstraction hides implementation details to enable the reuse of existing service implementations and to improve the ability to mask software errors. We improve resilience to software errors by enabling the recovery of faulty replicas using state stored in replicas with distinct implementations; using an opportunistic N-version programming technique that runs distinct, off-the-shelf implementations at each replica to reduce the probability of common mode failures; and periodically repairing each replica using an abstract view of the state stored by the correct replicas in the group, which improves tolerance to faults due to software aging. We have built two replicated services that demonstrate the use of this technique. The first is
