156 citations found.
D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, 1990.

This paper is cited in the following contexts:

Duplex: A Reusable Fault Tolerance Extension.. - Sharma, Chen, Li.. (2003)   (3 citations)

....nondeterministic events, such as changes made to the device configuration. In the event of a failure, the logged events are replayed in their original order to recreate the device's pre-failure state. Duplex's logging mechanism achieves the best of both pessimistic logging [2, 4] and optimistic logging [11]. Pessimistic logging requires the service to block until the log is written to stable storage. Optimistic logging allows the log message to be flushed asynchronously to stable storage while the process goes ahead with other operations. The concept of recursive restartability has recently gained ....

D.B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, Sep. 1990.
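
The contrast drawn in the excerpt above (pessimistic logging blocks until the record reaches stable storage; optimistic logging flushes it asynchronously) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from Duplex or from the cited paper: StableStorage is an in-memory stand-in for a disk, and PessimisticLogger, OptimisticLogger, and drain are illustrative names only.

```python
import queue
import threading


class StableStorage:
    """Hypothetical stand-in for stable storage: an in-memory list behind a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def write(self, record):
        with self._lock:
            self.records.append(record)


class PessimisticLogger:
    """log() returns only after the record has been written to storage."""

    def __init__(self, storage):
        self.storage = storage

    def log(self, event):
        # The caller is blocked for the duration of the write.
        self.storage.write(event)


class OptimisticLogger:
    """log() returns immediately; a background thread flushes records in order."""

    def __init__(self, storage):
        self.storage = storage
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def log(self, event):
        # Only enqueue; the write happens later on the worker thread.
        self._pending.put(event)

    def _flush_loop(self):
        while True:
            event = self._pending.get()
            self.storage.write(event)
            self._pending.task_done()

    def drain(self):
        # Wait until every queued record has been flushed (illustrative helper).
        self._pending.join()
```

In this sketch, any record still sitting in the queue when the process crashes is lost, which is the risk optimistic protocols accept in exchange for not blocking the caller.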


Detection of Global Predicates: Techniques and their Limitations - Chase, Garg (1998)   (22 citations)

....state for which linear is true. By considering the dual predicate of linearity, we get a necessary and sufficient condition for a given set of global states to be a lattice. This generalizes many earlier results. For example, the fact that the set of all recoverable states form a lattice [18] is an easy consequence of our result. Similarly, the monotonicity condition on channel predicates [11] is also a special case of linearity. We define a larger class of predicates, semi linear that includes linear, observer independent and stable predicates as proper subclasses. We describe the ....

....dual versions for post linear predicates. Thus, is a post linear predicate iff C is a sup semilattice. Combining these results, we get: Theorem 4.7 C is a lattice iff is linear and post linear. As an application of Theorem 4.7, we consider the problem of recovery in distributed systems [18]. We call a local state recoverable if, after a failure, the state can be recovered from the disk using a checkpoint and the message log. A cut is called recoverable if all states belonging to that cut are recoverable and the cut is consistent. The following is an easy corollary of the Theorem ....

[Article contains additional citation context not shown here]

D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, September 1990.


Causality Tracking in Causal Message-Logging Protocols - Alvisi, Bhatia, Marzullo (2002)   (1 citation)

....of a crashed process. Thus, message logging protocols guarantee either through careful logging or through a somewhat complex recovery protocol that after recovery no process is an orphan. Message logging protocols can be pessimistic (for example, [5, 11, 17, 24]), optimistic (for example, [12, 22, 23, 26]) or causal [4]. Like pessimistic protocols, causal protocols [3, 10] never create orphans, and, like optimistic protocols, they do not log synchronously to stable storage. They are able to do this by piggybacking information onto the ambient message traffic. Causal message logging protocols ....

D.B. Johnson and W. Zwaenepoel. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing. Journal of Algorithms, 11:462--491, 1990.


Characterization of Message Ordering Specifications and Protocols - Murty, Garg (1996)   (1 citation)

....as synchronization primitives, or are embedded in some protocol. Our results show that all these protocols must use additional control messages for implementation. Many asynchronous consistent cut protocols [25], such as global snapshot algorithms [7, 11, 17], checkpointing and rollback recovery [10, 15, 14], and deadlock detection [5], require special messages to find consistent cuts in a computation. These protocols require some form of inhibition of the special messages in order to guarantee correctness. The inhibition of the messages can also be viewed as a restriction on the set X. In [22] ....

D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing, pages 171--181. ACM, 1988.


Understanding The Message Logging Paradigm For Masking Process.. - Alvisi (1996)   (4 citations)

....As we noted above, the main problem in message logging derives from the difficulty of satisfying the consistency condition that must hold upon recovery of a faulty process in a way that is both simple and efficient. Yet, in spite of the numerous message logging protocols [SY85,JZ87,JV87,SBY88,SW89,JZ90,Eln93,VJ94], no precise specification of what such a consistency condition requires has been presented in the literature. In this dissertation, we present the first formal specification of a necessary and sufficient condition for avoiding orphan processes. After showing how pessimistic and ....

....caching messages, and checkpointing process states. 2. A recovery component, which uses the information saved by the logging component to recover the application to a state that satisfies the no orphan consistency condition. Checkpoints can be independent [SY85,KT87,SW89,JZ87,SBY88,JV87,JZ90,Joh93,WF92,EZ92,VJ94] or coordinated [EZ94]. The storage abstraction implemented by the logging component is called a log. To emphasize the fundamentally different roles played by message determinants and message contents, we explicitly distinguish between storing messages and storing ....

[Article contains additional citation context not shown here]

D.B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11:462--491, 1990.


Trade-Offs in Implementing Causal Message Logging Protocols - Alvisi, Marzullo (1996)   (4 citations)

....of a crashed process. Thus, in the terminology of message logging, message logging protocols must guarantee that there are no orphan processes, either through careful logging or through a somewhat complex recovery protocol. The two main approaches to message logging are optimistic (for example, [18, 17, 11, 20]) and pessimistic (for example, [7, 15, 10, 19]). We have recently defined a third approach that we call causal [3]. There are two published causal message logging protocols: Family Based Logging (FBL) [4] and Manetho [9]. In the same paper we defined a message logging protocol to be optimal if ....

D.B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11:462--491, 1990.


Transparent Fault Tolerance for Web Services based.. - Dialani, Miles.. (2002)

....process to the previous checkpoint will not affect dependent processes. Optimistic independent checkpointing requires that dependencies are explicitly recorded somewhere in the system, so that on rollback of a process, dependent processes will be informed appropriately and possibly also rolled back [6, 11, 20]. The pessimistic approach places more restrictions on a process's autonomy in checkpointing and may require more checkpointing than optimistic approaches. Optimistic mechanisms, on the other hand, will have more overhead in rollback. However, it should be noted that no single mechanism is ....

David B. Johnson and Willy Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. In Proc. 7th Annual ACM Symp. on Principles of Distributed Computing, pages 171--181, Toronto (Canada), 1988.


Causality Considerations in Distributed.. - Vaughan, Dearle.. (1994)   (1 citation)

....in the system. It therefore contains all the information needed to formulate a consistent cut. [Figure 6: Consistent cuts and vector time] Vector time may be used to characterise consistent cuts. Johnson and Zwaenepoel [11] show a method which involves the construction of a dependency matrix, which is a matrix whose rows correspond to the vector times of all nodes in a system. For example, the matrix for the cut 8 in Figure 6 above is: 100 201 001 They show that a dependency matrix M represents a consistent cut ....

....is aware of its causal dependencies and can control appropriate recovery after failure. Recovery consists of finding some earlier checkpoint and replaying messages where possible. An implementation of this scheme based on the Mach operating system is described in [8]. Johnson and Zwaenepoel [11] provide an extended treatment which uses checkpoints and message logs to find a maximal recoverable state applicable to both optimistic and pessimistic logging protocols. The algorithms presented by Johnson and Zwaenepoel form the basis for recovery control in Grasshopper. 4. Grasshopper ....

Johnson, D. and Zwaenepoel, W. "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing", Journal of Algorithms, vol 11, 3, pp.462-491, 1990.
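
The first Vaughan, Dearle excerpt above breaks off before stating when a dependency matrix represents a consistent cut. As a hedged illustration, the sketch below applies the standard vector-time criterion: with row i holding the vector time of node i, the cut is consistent when no node has seen more of node i's history than node i itself has recorded, that is, M[j][i] <= M[i][i] for all i and j. That this matches the condition in the truncated text is an assumption, and is_consistent_cut and the sample matrices are hypothetical names and data, not taken from the excerpt or its Figure 6.

```python
def is_consistent_cut(matrix):
    """Check the standard vector-time consistency criterion.

    matrix[i] is the vector time of node i at the cut. The cut is treated as
    consistent when, for every pair (i, j), node j's knowledge of node i
    (matrix[j][i]) does not exceed node i's own component (matrix[i][i]).
    This criterion is an assumption about what the truncated excerpt states.
    """
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if matrix[j][i] > matrix[i][i]:
                # Node j depends on an event of node i that lies beyond the cut.
                return False
    return True


# Illustrative data only (not from the excerpt).
consistent = [
    [1, 0, 0],
    [1, 2, 0],
    [0, 0, 1],
]
inconsistent = [
    [1, 0, 0],
    [2, 1, 0],  # node 1 has seen event 2 of node 0, but node 0's cut is at 1
    [0, 0, 1],
]
```

Under these assumptions, is_consistent_cut(consistent) returns True and is_consistent_cut(inconsistent) returns False.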


Fast Cluster Failover Using Virtual Memory-Mapped Communication - Zhou, Chen, Li (1999)   (8 citations)

....the primary to backup either automatically or explicitly using virtual memory mapped communication mechanism. Checkpointing and log and replay [10] are the two main techniques for reconstructing the state of a failed process. Checkpointing has been used for many years [6, 26] and in many systems [31, 25, 29, 17]. In this paper, we checkpoint to remote memory for fast failover. PERSEAS is another work [28] similar to our study. It is a transaction library based on reliable main memory provided by mirroring the data at the remote memory. It differs from our work because it does not support failover of ....

David Johnson and Willy Zwaenepoel. Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing. In PODC'88.


A Non-blocking Recovery Algorithm for Causal Message Logging - Roger Mitchell Vijay (1998)   (4 citations)

....received. By assuming that receive ordering is the only source of non determinism, execution is recoverable using this ordering. Pessimistic message logging [4, 11] forces a process to wait before sending any message while the message log is written to stable storage. Optimistic logging methods [9, 12, 13, 15] (and the similar sender based logging [8, 14]) assume failures are rare and therefore allow ordering information to be lost in a failure. (That is, a message is logged in the background while execution proceeds.) Consequently, received messages and any sends that depend on them may not be ....

....order of not recovered While it may seem that the solution is to have m'' include the incarnation of P1 as well as P2, there is a less costly solution, which we will present in the next section. First, however, we will discuss previous solutions. Alvisi [1] and Johnson and Zwaenepoel [9] avoid this problem using the following idea: processes that respond with determinant information are required to first write this information to stable storage. These processes must then also block the receive of any messages until after a special message has been received from the recovering ....

[Article contains additional citation context not shown here]

D. Johnson, W. Zwaenepoel, "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing," Journal of Algorithms, Vol. 11, Sept. 1990, pp.462-491.


Checkpointing and Rollback of Wide-Area Distributed.. - Cao, Chan, Jia, Dillon (2001)

....the discussion of our future work. 2. System Model and the MACR Framework Rollback recovery can be either checkpoint based or log based [7]. In checkpoint based rollback recovery, recovery relies solely on saved checkpoints. In log based rollback recovery, both checkpointing and logging are used [14, 9]. In this paper, we are concerned with only checkpoint based approaches. There exist three primary approaches to checkpoint-based checkpointing and rollback in distributed systems: 1. Independent checkpointing [16, 2, 15]. Processes do not coordinate their actions for checkpointing during normal ....

D.B. Johnson and W. Zwaenepoel, "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing", Journal of Algorithms, 11, 1990, pp462-491.


Minimizing Timestamp Size for Completely Asynchronous.. - Smith, Johnson (1996)   Self-citation (Johnson)

....number of times; this pathology arose from the lack of the Immediate Rollback property described in Section 1.2. Strom and Yemini used timestamps of size O(n log s L ) bits. Some subsequent work in optimistic recovery minimized the number of rollbacks by sacrificing asynchrony during recovery [13, 20, 24, 7], and some of these even reduced the timestamp size to O(log s L ) bits [13, 24]. Smith, Johnson, and Tygar. Our earlier protocol [27] achieves fully asynchronous recovery while also minimizing rollbacks and wasted computation. However, we obtained this result by using large timestamps. 1 We ....

....described in Section 1.2. Strom and Yemini used timestamps of size O(n log s L ) bits. Some subsequent work in optimistic recovery minimized the number of rollbacks by sacrificing asynchrony during recovery [13, 20, 24, 7], and some of these even reduced the timestamp size to O(log s L ) bits [13, 24]. Smith, Johnson, and Tygar. Our earlier protocol [27] achieves fully asynchronous recovery while also minimizing rollbacks and wasted computation. However, we obtained this result by using large timestamps. 1 We introduced a second level of partial order time, separating the user computation ....

D. B. Johnson and W. Zwaenepoel. "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing." Journal of Algorithms. 11: 462--491. September 1990.


Consistent Main-memory Database Federations under Deferred.. - Schmidt, Pedone (2005)

No context found.

D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, 1990.


Libra: A Library for Reliable Distributed Applications - Jinsong Ouyang And

No context found.

D. B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11:462--491, 1990.


Dependable High Performance Computing on a Parallel.. - Blochinger, Bündgen, al. (2000)   (1 citation)

No context found.

D.B. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, September 1990.


Functional Grid Programming with ConCert - VII (2004)

No context found.

David B. Johnson and Willy Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. In Proc. 7th Annual ACM Symp. on Principles of Distributed Computing, pages 171--181, Toronto (Canada), 1988.


Extendible, Long-Lived Transaction Processing on Distributed and.. - Gore (2001)

No context found.

Johnson, D. B., and Zwaenepoel, W. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms 11(3) (September 1990), 462-491.


Services For Networks With Mobile Hosts - Arup Acharya Graduate

No context found.

Johnson, D. B., and Zwaenepoel, W. Recovery in distributed systems using optimistic message logging and checkpointing. In 7th ACM Symposium on Principles of Distributed Computing (1988).


Flashback: A Lightweight Extension for Rollback and .. - Srinivasan.. (2004)   (1 citation)

No context found.

D. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. In Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, pages 171-181, Aug. 1988.


Memory Management for Networked Servers - Zhou (2000)

No context found.

David Johnson and Willy Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. In Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, pages 171--181, August 1988.


A Debugger for Distributed Programs - Side, Shoja (1994)   (3 citations)

No context found.

D. B. Johnson and W. Zwaenepoel, `Recovery in distributed systems using optimistic message logging and checkpointing', Journal of Algorithms, 11, (3), 462--491 (1990).


Guaranteed Deadlock Recovery: Deadlock Resolution with.. - Wang, Merritt.. (1995)

No context found.

D. B. Johnson and W. Zwaenepoel, "Recovery in distributed systems using optimistic message logging and checkpointing," J. Algorithms, Vol. 11, pp. 462--491, 1990.


K. Venkatesh, T. Radhakrishnan, and H.F. Li. Global.. - Friedemann Mattern.. (1993)

No context found.

D. Johnson and W. Zwaenepoel. Recovery in distributed systems using optimistic message logging and checkpointing. Journal of Algorithms, 11(3):462--491, 1990.


Reduced State Space Markov Decision Process and the Dynamic.. - Yu (1997)

No context found.

Johnson, D. B. and Zwaenepoel, W. "Recovery in Distributed Systems Using Optimistic Message Logging and Checkpointing", J. of Algorithms, Vol. 11, pp. 462-491, 1990

CiteSeer.IST - Copyright Penn State and NEC