J. Cao and K.C. Wang, "Efficient Synchronous Checkpointing in Distributed Systems," Australian Computer Science Communications, 14:1 (1992), pp. 165-179.
....fundamental concepts, and implementation issues of rollback recovery protocols in distributed systems. The coverage excludes the use of rollback recovery in many related fields such as hardware-level instruction retry, distributed shared memory [38], real-time systems, and debugging [37]. The coverage also excludes the issues of using rollback recovery when failures could include Byzantine modes or are not restricted to the fail-stop model [51]. Also excluded are rollback recovery techniques that rely on special language constructs such as recovery blocks [47] and transactions. ....
....proceeds until an event of interest occurs, at which time the content of the counter is sampled, and the number of instructions executed since the time the counter was set is computed and logged. The use of instruction counters has been suggested for debugging shared-memory parallel programs [37]. Instruction counters can be used in rollback recovery to track the number of instructions that occur between asynchronous interrupts [54]. These instruction counts are logged as part of the log that describes the nondeterministic events. During recovery, the system recovers the instruction ....
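The instruction-counter idea in the passage above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Machine` class, its toy "instruction", and the `run_and_log` / `replay` helpers are hypothetical names invented for illustration, and a software counter stands in for the hardware counter the passage describes.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Toy deterministic machine with a simulated instruction counter (assumption)."""
    state: int = 1
    icount: int = 0

    def step(self):
        # One deterministic "instruction".
        self.state = (self.state * 31 + 7) % 1_000_003
        self.icount += 1

    def interrupt(self, payload):
        # Nondeterministic event: its effect depends on when it arrives.
        self.state ^= payload


def run_and_log(total_steps, interrupts):
    """Normal execution. `interrupts` maps an instruction count to a payload
    and stands in for asynchronous arrivals."""
    m = Machine()
    log = []
    last = 0
    while m.icount < total_steps:
        if m.icount in interrupts:
            payload = interrupts[m.icount]
            # Log the number of instructions executed since the last logged event.
            log.append((m.icount - last, payload))
            last = m.icount
            m.interrupt(payload)
        m.step()
    return m.state, log


def replay(total_steps, log):
    """Recovery: re-execute and deliver each logged event after the same
    number of instructions as in the original run."""
    m = Machine()
    for delta, payload in log:
        for _ in range(delta):
            m.step()
        m.interrupt(payload)
    while m.icount < total_steps:
        m.step()
    return m.state
```

Replaying the log reproduces the original final state because every interrupt is delivered at the same instruction count as before.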
[Article contains additional citation context not shown here]
J. Cao. "Efficient synchronous checkpointing in distributed systems." In Proceedings of the 15th Australian Computer Science Conference, pp. 165-179, Jan. 1992.
....do not coordinate their actions for checkpointing during normal execution and each process takes its local checkpoint independently. When a failure occurs, consistent global checkpoints are established by dependency tracking and possibly message logging. 2. Synchronous (coordinated) checkpointing [10, 5, 3, 5, 6]: The processes coordinate their checkpointing actions in such a way that the set of local checkpoints taken is consistent. Whenever a process P requests to take a checkpoint, a set of processes, called the cohorts set of P, must be checked and some of them may also need to take their checkpoints ....
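The cohort-set idea in the passage above can be sketched as follows. This is a minimal single-threaded simulation, not the algorithm of the cited paper: it assumes instantaneous message delivery, takes the cohorts of a process to be the senders it has received from since its last checkpoint, and requires a cohort to checkpoint only if it has sent to the requester since the cohort's own last checkpoint. The names `Process`, `send`, and `coordinated_checkpoint` are hypothetical.

```python
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.state = 0
        self.received_from = set()  # senders heard from since our last checkpoint
        self.sent_to = set()        # receivers we sent to since our last checkpoint
        self.checkpoints = []

    def take_checkpoint(self):
        """Save local state and return the cohort set (assumed definition)."""
        self.checkpoints.append(self.state)
        cohorts = set(self.received_from)
        self.received_from.clear()
        self.sent_to.clear()
        return cohorts


def send(src, dst, value):
    # Instantaneous delivery is an assumption; there is no channel state.
    src.sent_to.add(dst.pid)
    dst.received_from.add(src.pid)
    dst.state += value


def coordinated_checkpoint(initiator_pid, procs):
    """Checkpoint the initiator, check each cohort, and checkpoint those whose
    latest checkpoint does not yet record a send to the requester.
    Returns the pids that checkpointed, in order."""
    taken = []
    pending = [initiator_pid]
    while pending:
        pid = pending.pop()
        if pid in taken:
            continue
        cohorts = procs[pid].take_checkpoint()
        taken.append(pid)
        for q in sorted(cohorts):
            if pid in procs[q].sent_to:
                pending.append(q)
    return taken
```

A cohort that has already checkpointed after its last send is checked but skipped, which matches the passage's remark that only some cohorts need to take checkpoints.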
J. Cao and K.C. Wang, "Efficient Synchronous Checkpointing in Distributed Systems," Australian Computer Science Communications, 14:1 (1992), pp. 165-179.
....with the previous checkpoint in software, and writing the difference in a new checkpoint [46]. The storage and computation overhead required to perform such a comparison may negate the benefit of incremental checkpointing. Another variation on this technique is to use probabilistic checkpointing [40]. The unit of checkpointing in this scheme is a memory block that is typically much smaller than a memory page. Changes to a memory block are detected by computing a signature and comparing it to the corresponding signature in the previous checkpoint. Probabilistic checkpointing is portable, and ....
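The signature comparison described above can be sketched as follows. This is an illustration under assumptions, not the scheme of [40]: the 64-byte block size, the 8-byte BLAKE2b signature, and the function names are all chosen here for the example. Because distinct block contents can share a signature, a changed block may go undetected with small probability, which is what makes the approach probabilistic.

```python
import hashlib

BLOCK_SIZE = 64  # assumed block size, far smaller than a typical memory page


def signatures(memory, block_size=BLOCK_SIZE):
    """Compute one short signature per memory block."""
    return [
        hashlib.blake2b(memory[i:i + block_size], digest_size=8).digest()
        for i in range(0, len(memory), block_size)
    ]


def incremental_checkpoint(memory, prev_sigs, block_size=BLOCK_SIZE):
    """Return the blocks whose signature differs from the previous checkpoint,
    together with the new signature list."""
    new_sigs = signatures(memory, block_size)
    changed = {}
    for idx, sig in enumerate(new_sigs):
        if idx >= len(prev_sigs) or sig != prev_sigs[idx]:
            start = idx * block_size
            changed[idx] = bytes(memory[start:start + block_size])
    return changed, new_sigs
```

Only the signature list from the previous checkpoint has to be kept for the comparison, not the previous contents of memory.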
....Parallel and Distributed Systems, 8(9):959-969, Sep. 1997. [39] G. Muller, M. Hue and N. Peyrouz. Performance of consistent checkpointing in a modular operating system: Results of the FTM Experiment. In Lecture Notes in Computer Science: Dependable Computing, EDCC-1, pp. 491-508, Oct. 1994. [40] H.C. Nam, J. Kim, S.J. Hong and S. Lee. Probabilistic checkpointing. In Proceedings of the Twenty-Seventh International Symposium on Fault-Tolerant Computing (FTCS-27), pp. 48-57, Jun. 1997. [41] R.B. Netzer and J. Xu. Necessary and sufficient conditions for consistent global snapshots. In ....
J. Cao and K. C. Wang. "Efficient synchronous checkpointing in distributed systems." Technical Report 91/6, Department of Computer Science, James Cook University of North Queensland, Australia, Dec. 1991.