Why Do Computers Stop And What Can Be Done About It? (1985)  (79 citations)
Jim Gray
Symposium on Reliability in Distributed Software and Database Systems




Abstract: An analysis of the failure statistics of a commercially available fault-tolerant system shows that administration and software are the major contributors to failure. Various approaches to software fault-tolerance are then discussed -- notably process-pairs, transactions and reliable storage. It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process-pairs provides fault-tolerant execution -- the key to...

Cited by:
Embracing Failure: - Case For Repair-Centric (2001)
Recovering Device Drivers - Michael Swift Muthukaruppan (2004)
Active Server Availability Feedback - James Hamilton Microsoft (2003)

Active bibliography (related documents):
1.2:   Fault Tolerance In Tandem Computer Systems - Bartlett, Gray, Horst (1986)
0.5:   Fail-Stop Processors: An Approach to Designing.. - Schlichting, Schneider (1983)
0.5:   Tandem TR 85.5 - Distributed Computer Systems

Similar documents based on text:
0.1:   An Approach To Decentralized Computer Systems - Gray (1985)
0.1:   Critical Issues in the Design of a Fault-Tolerant.. - Hvasshovd.. (1991)
0.0:   Fine grained Process Modelling: An Experiment at British.. - Emmerich, Bandinelli

Related documents from co-citation:
18:   Measuring System and Software Reliability Using an Automated Data Collection Pro.. - Murphy, Gent - 1995
13:   System structure for software fault tolerance - Randell - 1975
12:   Software defects and their impact on system availability --- a study of field fa.. - Sullivan, Chillarege - 1991

BibTeX entry:

Jim Gray. Why do computers stop and what can be done about it? In Proc. Fifth Symposium on Reliability in Distributed Software and Database Systems, pages 3--12, 1986. http://citeseer.ist.psu.edu/gray85why.html

@inproceedings{gray86why,
    author = "Jim Gray",
    title = "Why Do Computers Stop and What Can Be Done About It?",
    booktitle = "Symposium on Reliability in Distributed Software and Database Systems",
    pages = "3--12",
    year = "1986",
    url = "citeseer.ist.psu.edu/gray85why.html"
}
Citations (may not include all citations):
177   Fail-Stop Processors, an Approach to Designing Fault-Toleran.. - Schlichting, Schneider - 1983
98   A Message System Supporting Fault Tolerance - Borg, Baumbach et al. - 1984
68   A Nonstop Kernel - Bartlett - 1981
60   Probabilistic Logics and the Synthesis of Reliable Organisms.. - von Neumann - 1956
19   Exception Handling and Software Fault Tolerance - Cristian - 1982
9   Robustness to Crash in a Distributed Database: A Non SharedM.. - Borr - 1984
7   Optimizing Preventive Service of Software Products - Adams - 1984
3   Highly Available Systems for Database Applications - Kim - 1984
3   Transaction Monitoring in ENCOMPASS - Borr - 1981
3   The Reliability of the IBM/XA Operating System - Mourad, Andrews - 1985
2   Lecture Notes in Computer Science Vol - Lampson - 1982
2   Aspects of a High Volume Production Online Banking System - Burman - 1985
2   Principles of Transaction-Oriented Database Recovery - Haerder, Reuter - 1983
1   Distributed Database Systems -- Four Case Studies - Gray, Anderton
1   DP2 Performance Analysis - Enright - 1985
1   PhD Thesis - for, Networks - 1981



Documents on the same site (http://research.microsoft.com/~Gray/JimGrayPublications.htm):
Unknown - (2002)
A Quick Look at Serial ATA (SATA) Disk Performance - Barclay, Chong, Gray (2003)
A "Measure of Transaction Processing" 20 Years Later - Gray (2005)

CiteSeer.IST - Copyright Penn State and NEC