by Ashis Tarafdar, Vijay K. Garg
In Proceedings of the 13th Symposium on Distributed Computing (DISC)
http://www.ece.utexas.edu/~garg/dist/disc99.ps.Z
Abstract:
Concurrent programs often encounter failures, such as races, owing to the presence of synchronization faults (bugs). One existing technique to tolerate synchronization faults is to roll back the program to a previous state and re-execute, in the hope that the failure does not recur. Instead of relying on chance, our approach is to control the re-execution in order to avoid a recurrence of the synchronization failure. The control is achieved by tracing information during an execution and using this information to add synchronizations during the re-execution. The approach gives rise to a general problem, called the off-line predicate control problem, which takes a computation and a property specified on the computation, and outputs a "controlled" computation that maintains the property. We solve the predicate control problem for the mutual exclusion property, which is especially important in synchronization fault tolerance.
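The following is a minimal sketch of off-line control for mutual exclusion, written under simplifying assumptions. It is an illustration, not the authors' algorithm. It assumes that tracing records each critical section as the vector timestamps of its entry and exit events. The names `CriticalSection`, `happened_before` and `control_mutex` are hypothetical.

The sketch relies on one observation about added synchronizations. Adding "exit(C) before entry(D)" creates a cycle only if entry(D) already happened before exit(C). So whenever entry(C) happened before exit(D), C must be placed ahead of D. The sketch topologically sorts the critical sections under that constraint. It then chains consecutive sections with added synchronizations, and reports failure if the constraints are cyclic.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class CriticalSection:
    """Hypothetical trace record: one critical section from a traced run."""
    name: str
    entry: tuple[int, ...]  # vector timestamp of the entry event
    exit: tuple[int, ...]   # vector timestamp of the exit event


def happened_before(a: tuple[int, ...], b: tuple[int, ...]) -> bool:
    """Vector-clock test: event a happened before event b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def control_mutex(sections):
    """Order critical sections and list the synchronizations to add.

    Returns (order, added_edges), where each edge (c, d) means
    "delay entry of d until exit of c", or None if no ordering exists.
    """
    # preds[d] = sections that must precede d, because entry(c) -> exit(d)
    preds = {s.name: set() for s in sections}
    for c in sections:
        for d in sections:
            if c is not d and happened_before(c.entry, d.exit):
                preds[d.name].add(c.name)
    try:
        order = list(TopologicalSorter(preds).static_order())
    except CycleError:
        # Cyclic constraints: no order of the sections avoids a cycle.
        return None
    # Chain consecutive sections: exit(order[i]) -> entry(order[i + 1]).
    added = [(order[i], order[i + 1]) for i in range(len(order) - 1)]
    return order, added


# P0: entry(a), send m1, exit(a).  P1: entry(b), receive m1, exit(b).
a = CriticalSection("a", entry=(1, 0), exit=(3, 0))
b = CriticalSection("b", entry=(0, 1), exit=(2, 3))
print(control_mutex([a, b]))  # a is forced first: (['a', 'b'], [('a', 'b')])

# Messages cross inside both sections, so they cannot be separated.
a2 = CriticalSection("a2", entry=(1, 0), exit=(4, 2))
b2 = CriticalSection("b2", entry=(0, 1), exit=(2, 4))
print(control_mutex([a2, b2]))  # None
```

Two critical sections on the same process are automatically kept in program order, because the earlier section's entry always happens before the later section's exit. When the sections are fully concurrent, any order is acceptable, and the sketch simply takes the one that `TopologicalSorter` produces.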