  The cost of recovery in message logging protocols (1998) [13 citations — 3 self]

by Sriram Rao, Lorenzo Alvisi, Harrick M. Vin
Proceedings of the 17th Symposium on Reliable Distributed Systems
http://www.cs.utexas.edu/ftp/pub/techreports/tr98-02.ps.Z

Abstract:

Message logging is a popular technique for building low-overhead protocols that tolerate process crash failures. Past research in message logging has focused on the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. We discover that, if a single failure is to be tolerated, pessimistic and causal protocols perform best, because they avoid rollbacks of correct processes. For multiple failures, however, the dominant factor in determining performance becomes where the recovery information is logged (i.e., at the sender, at the receiver, or replicated at a subset of the processes in the system) rather than when this information is logged (i.e., whether logging is synchronous or asynchronous). From our results, we distil a few lessons that can guide the design of message-logging protocols that combine low overhead during failure-free executions with fast recovery.

Citations

592 Time, clocks, and the ordering of events in a distributed system – Lamport - 1978
572 Implementing fault-tolerant services using the state machine approach: A tutorial – Schneider - 1990
501 Virtual time and global states of distributed systems – Mattern - 1989
329 A Survey of Rollback-Recovery Protocols in Message-Passing Systems – Elnozahy, Alvisi, et al. - 1999
253 Optimistic recovery in distributed systems – Strom, Yemini - 1985
209 Libckpt: Transparent checkpointing under Unix – Plank, Beck, et al. - 1995
194 Recovery in distributed systems using optimistic message logging and checkpointing – Johnson, Zwaenepoel - 1990
170 The performance of consistent checkpointing – Elnozahy, Johnson, et al. - 1992
162 Manetho: Transparent rollback-recovery with low overhead, limited rollback, and fast output commit – Elnozahy, Zwaenepoel - 1992
118 Sender-Based Message Logging – Johnson, Zwaenepoel - 1987
112 A Message System Supporting Fault Tolerance – Borg, Baumbach, et al. - 1983
110 Monitors, Messages, and Clusters: The p4 Parallel Programming System – Butler, Lusk - 1994
93 Efficient distributed recovery using message logging – Sistla, Welch - 1989
93 The Rio File Cache: Surviving Operating System Crashes – Chen, Ng, et al. - 1996
92 PUBLISHING: A Reliable Broadcast Communication Mechanism – Powell, Presotto - 1983
78 Message logging: pessimistic, optimistic, causal and optimal – Alvisi, Marzullo - 1998
68 Checkpointing and its applications – Wang, Huang, et al. - 1995
67 Volatile logging in n-fault-tolerant distributed systems – Strom, Bacon, et al. - 1988
61 Nonblocking and Orphan-Free Message Logging Protocols – Alvisi, Hoppe, et al. - 1994
55 On the Use and Implementation of Message Logging – Elnozahy, Zwaenepoel - 1994
49 The recovery box: Using fast recovery to provide high availability in the UNIX environment – Baker, Sullivan - 1992
43 Crash recovery with little overhead – Juang, Venkatesan - 1991
43 Efficient transparent optimistic rollback recovery for distributed application programs – Johnson - 1993
42 Distributed System Fault Tolerance Using Message Logging and Checkpointing – Johnson - 1989
36 How to recover efficiently and asynchronously when optimism fails – Damani, Garg - 1996
27 Egida: An extensible toolkit for low-overhead fault-tolerance – Rao, Alvisi, et al. - 1999
23 Manetho: Fault Tolerance in Distributed Systems Using Rollback-Recovery and Process Replication – Elnozahy - 1993
21 MPI: The Complete Reference. Scientific and Engineering Computation Series – Snir, Otto, et al. - 1996
17 Trade-offs in implementing optimal message logging protocols – Alvisi, Marzullo - 1996
16 On the relevance of communication costs of rollback-recovery protocols – Elnozahy - 1995
14 Efficient Algorithms for Optimistic Crash Recovery – Venkatesan, Juang - 1994
9 Message Logging – Alvisi, Marzullo - 1998
7 A non-blocking recovery algorithm for causal message logging – Mitchell, Garg - 1998