by Jie Xu, Brian R
IEEE International Conference on Parallel and Distributed Systems
http://www.laas.research.ec.org/deva/trs/../papers/02.ps
Abstract:
Roll-forward checkpointing schemes [8][10] are developed in order to avoid rollback in the presence of independent faults and to increase the possibility that a task completes within a tight deadline. However, despite the adoption of roll-forward recovery, these schemes are not necessarily appropriate for time-critical applications, because interactions with the external environment and communications between processes must be deferred during checkpoint validation steps (typically, two checkpoint intervals) until the fault-free processors are identified. The deadlines on providing services may thus be violated. In this paper we present and discuss two alternative roll-forward recovery schemes, especially for time-critical and interaction-intensive applications, that deliver correct, timely results even when checkpoint validation is required.
Key Words: checkpoint validation, dynamic redundancy, fault tolerance, forward error recovery, real-time
Citations
438 | System Structure for Software Fault Tolerance | Randell | 1975
43 | Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing | Bernstein | 1988
37 | Rollback and recovery strategies for computer programs | Chandy, Ramamoorthy | 1972
22 | Roll-forward checkpointing scheme: A novel fault tolerant architecture | Pradhan, Vaidya | 1994
17 | Fault Tolerance: Principles and Practice, Second Edition | Lee, Anderson | 1990
14 | Fault tolerance in embedded real-time systems | Jahanian | 1994
14 | Hardware and software fault tolerance: Definition and analysis of architectural solutions | Laprie, Arlat, et al. | 1987
13 | Implementing forward recovery using checkpoints in distributed systems | Long, Fuchs, et al. | 1992
12 | Roll-forward checkpointing scheme: Concurrent retry with nondedicated spares | Pradhan, Vaidya | 1992
11 | Spare Capacity as a Means of Fault Detection and Diagnosis in Multiprocessor Systems | Dahbura, Sabnani, et al. | 1989
11 | Roll-forward and rollback recovery: Performance-reliability trade-off | Pradhan, Vaidya | 1994
5 | A fault-tolerant mechanism for simple controllers | Silva, Silva, et al. | 1994
2 | A forward recovery strategy using checkpointing in parallel systems | Long, Fuchs, et al. | 1990
2 | Chapter 1: The evolution of the recovery block concept | Randell, Xu | 1994