Abstract: An analysis of the failure statistics of a commercially available fault-tolerant system
shows that administration and software are the major contributors to failure. Various
approaches to software fault-tolerance are then discussed -- notably process-pairs,
transactions and reliable storage. It is pointed out that faults in production software are
often soft (transient) and that a transaction mechanism combined with persistent process-pairs
provides fault-tolerant execution -- the key to...
Cited by:
Embracing Failure: - Case For Repair-Centric (2001)
Recovering Device Drivers - Michael Swift Muthukaruppan (2004)
Active Server Availability Feedback - James Hamilton Microsoft (2003)
Active bibliography (related documents):
1.2: Fault Tolerance In Tandem Computer Systems - Bartlett, Gray, Horst (1986)
0.5: Fail-Stop Processors: An Approach to Designing.. - Schlichting, Schneider (1983)
0.5: Tandem TR 85.5 - Distributed Computer Systems
Similar documents based on text:
0.1: An Approach To Decentralized Computer Systems - Gray (1985)
0.1: Critical Issues in the Design of a Fault-Tolerant.. - Hvasshovd.. (1991)
0.0: Fine grained Process Modelling: An Experiment at British.. - Emmerich, Bandinelli
Related documents from co-citation:
18: Measuring System and Software Reliability Using an Automated Data Collection Pro.. - Murphy, Gent - 1995
13: System structure for software fault tolerance - Randell - 1975
12: Software defects and their impact on system availability --- a study of field fa.. - Sullivan, Chillarege - 1991
BibTeX entry:
Jim Gray. Why do computers stop and what can be done about it? In Proc. Fifth Symposium on Reliability in Distributed Software and Database Systems, pages 3--12, 1986. http://citeseer.ist.psu.edu/gray85why.html
@inproceedings{gray86why,
  author = "Jim Gray",
  title = "Why Do Computers Stop and What Can Be Done About It?",
  booktitle = "Symposium on Reliability in Distributed Software and Database Systems",
  pages = "3--12",
  year = "1986",
  url = "citeseer.ist.psu.edu/gray85why.html"
}
Citations (may not include all citations):
177: Fail-Stop Processors, an Approach to Designing Fault-Toleran.. - Schlichting, Schneider - 1983
98: A Message System Supporting Fault Tolerance - Borg, Baumbach et al. - 1984
68: A Nonstop Kernel - Bartlett - 1981
60: Probabilistic Logics and the Synthesis of Reliable Organisms.. - von Neumann - 1956
19: Exception Handling and Software Fault Tolerance - Cristian - 1982
9: Robustness to Crash in a Distributed Database: A Non SharedM.. - Borr - 1984
7: Optimizing Preventative Service of Software Products - Adams - 1984
3: Highly Available Systems for Database Applications - Kim - 1984
3: Transaction Monitoring in ENCOMPASS - Borr - 1981
3: The Reliability of the IBM/XA Operating System - Mourad, Andrews - 1985
2: Lecture Notes in Computer Science Vol - Lampson - 1982
2: Aspects of a High Volume Production Online Banking System - Burman - 1985
2: Principles of Transaction-Oriented Database Recovery - Haerder, Reuter - 1983
1: Distributed Database Systems -- Four Case Studies - Gray, Anderton
1: DP2 Performance Analysis - Enright - 1985
1: PhD Thesis - for, Networks - 1981
Documents on the same site (http://research.microsoft.com/~Gray/JimGrayPublications.htm):
Unknown - (2002)
A Quick Look at Serial ATA (SATA) Disk Performance - Barclay, Chong, Gray (2003)
A "Measure of Transaction Processing" 20 Years Later - Gray (2005)