@MISC{Gray85whydo,
    author = {Jim Gray},
    title = {Why Do Computers Stop And What Can Be Done About It?},
    year = {1985}
}
Abstract
An analysis of the failure statistics of a commercially available fault-tolerant system shows that administration and software are the major contributors to failure. Various approaches to software fault-tolerance are then discussed -- notably process-pairs, transactions and reliable storage. It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process-pairs provides fault-tolerant execution -- the key to software fault-tolerance.
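
The abstract's central idea, that a soft fault can be survived by aborting the transaction and letting a backup process retry, can be illustrated with a toy sketch. This is not the paper's design or any vendor's implementation; `Store`, `TransientFault`, `make_debit`, and `run_with_process_pair` are hypothetical names invented for illustration, and the "processes" are simulated by a simple loop within one program.

```python
import copy


class TransientFault(Exception):
    """Stands in for a soft fault that does not recur on retry (assumption)."""


class Store:
    """Toy storage with all-or-nothing transactions (hypothetical)."""

    def __init__(self):
        self.data = {"balance": 100}

    def run_transaction(self, work):
        snapshot = copy.deepcopy(self.data)  # state to restore on abort
        try:
            work(self.data)
        except Exception:
            self.data = snapshot  # abort: undo any partial updates
            raise


def make_debit(amount, faults_remaining):
    """Build a unit of work that fails transiently a fixed number of times."""
    state = {"faults": faults_remaining}

    def debit(data):
        data["balance"] -= amount  # partial update happens before the fault
        if state["faults"] > 0:
            state["faults"] -= 1
            raise TransientFault("soft fault mid-transaction")

    return debit


def run_with_process_pair(store, work, processes=("primary", "backup")):
    """Try the work on each process in turn; a failed attempt is rolled back."""
    for name in processes:
        try:
            store.run_transaction(work)
            return name  # name of the process that committed
        except TransientFault:
            continue  # takeover: the next process retries from clean state
    raise RuntimeError("all processes failed")


store = Store()
winner = run_with_process_pair(store, make_debit(30, faults_remaining=1))
print(winner, store.data["balance"])  # prints "backup 70"
```

The primary's partial debit is undone when its transaction aborts, so the backup starts from a consistent state and the debit is applied exactly once. If the fault were hard rather than transient, every process would fail and the store would be left unchanged.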