Software Fault Tolerance of Concurrent Programs Using Controlled Re-execution (1999) [14 citations, 11 self-citations]

by Ashis Tarafdar, Vijay K. Garg
In Proceedings of the 13th Symposium on Distributed Computing (DISC), 1999
http://www.ece.utexas.edu/~garg/dist/disc99.ps.Z

Abstract:

Concurrent programs often encounter failures, such as races, owing to the presence of synchronization faults (bugs). One existing technique to tolerate synchronization faults is to roll back the program to a previous state and re-execute, in the hope that the failure does not recur. Instead of relying on chance, our approach is to control the re-execution in order to avoid a recurrence of the synchronization failure. The control is achieved by tracing information during an execution and using this information to add synchronizations during the re-execution. The approach gives rise to a general problem, called the off-line predicate control problem, which takes a computation and a property specified on the computation, and outputs a "controlled" computation that maintains the property. We solve the predicate control problem for the mutual exclusion property, which is especially important in synchronization fault tolerance.
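The abstract states the off-line predicate control problem for mutual exclusion but not how it is solved. The Python sketch below illustrates one way to attack it under simplifying assumptions, and is not presented as the authors' algorithm. It assumes that a traced computation is a DAG of events `(process, index)` linked by process order and send -> receive edges. It also assumes that each critical section is an interval of events on one process, and that an added synchronization is an edge from the end of one critical section to the start of the next. The names `control_mutex`, `build_graph` and `reaches` are hypothetical. Section `a` must come before section `b` whenever `start(a)` happened-before `end(b)`, since placing `b` first would add `end(b) -> start(a)` and close a cycle. The sketch topologically sorts that precedence relation and chains consecutive sections.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Event = Tuple[int, int]                  # (process id, position on that process)
Section = Tuple[int, int, int]           # (process id, first event, last event)


def build_graph(proc_lengths: List[int],
                messages: List[Tuple[Event, Event]]) -> Dict[Event, Set[Event]]:
    """Direct happened-before edges: process order plus send -> receive."""
    graph: Dict[Event, Set[Event]] = {}
    for p, n in enumerate(proc_lengths):
        for i in range(n):
            graph[(p, i)] = {(p, i + 1)} if i + 1 < n else set()
    for send, recv in messages:
        graph[send].add(recv)
    return graph


def reaches(graph: Dict[Event, Set[Event]], src: Event, dst: Event) -> bool:
    """True if src ->* dst (reflexive), using depth-first search."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def control_mutex(proc_lengths: List[int],
                  messages: List[Tuple[Event, Event]],
                  sections: List[Section]) -> Optional[List[Tuple[Event, Event]]]:
    """Hypothetical helper: return synchronization edges (end of one section ->
    start of the next) that totally order all critical sections, or None if
    no such ordering exists under this model."""
    graph = build_graph(proc_lengths, messages)
    k = len(sections)
    start = [(p, lo) for p, lo, _ in sections]
    end = [(p, hi) for p, _, hi in sections]
    # a must precede b if start(a) ->* end(b); ordering b first would
    # add end(b) -> start(a) and create a cycle.
    succ = [[b for b in range(k) if b != a and reaches(graph, start[a], end[b])]
            for a in range(k)]
    indeg = [0] * k
    for a in range(k):
        for b in succ[a]:
            indeg[b] += 1
    # Kahn's topological sort of the critical-section precedence graph.
    queue = deque(a for a in range(k) if indeg[a] == 0)
    order: List[int] = []
    while queue:
        a = queue.popleft()
        order.append(a)
        for b in succ[a]:
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    if len(order) < k:
        return None  # cycle: some pair of sections cannot be separated
    # Chain consecutive sections in topological order.
    return [(end[a], start[b]) for a, b in zip(order, order[1:])]
```

For two processes with one critical section each and no messages, `control_mutex([3, 3], [], [(0, 0, 1), (1, 0, 1)])` returns `[((0, 1), (1, 0))]`, so process 1 waits for process 0 to leave its critical section. Chaining in topological order cannot add a cycle. Any cycle through the added edges would need the positions of the sections in that order to increase strictly and yet return to where they started. A cycle in the precedence graph, by contrast, means no total order of the sections can be enforced, and the sketch then returns `None`.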
