I. Lee and R. Iyer (1993). Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN operating system. In Proceedings of the International Symposium on Fault-Tolerant Computing.


This paper is cited in the following contexts:
Evaluating the Impact of Communication.. - Nagaraja.. (2003)   (Correct)

....appropriate and so have proposed many alternative messaging based protocols, e.g. [28, 33]. The key difference from the MPP and SAN networks is that like TCP, these protocols viewed packet loss as signaling congestion. There has been extensive work in analysing faults and how they impact systems [13, 22, 34]. However, the focus of these studies was not on the communication system. Studies benchmarking system behavior under fault loads include [20, 24]. However, these works do not provide a good understanding of how one would estimate overall system availability under a given fault load. System ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN90 Operating System. In Proceedings of International Symposium on Fault-Tolerant Computing (FTCS-23), pages 20--29, 1993.


Using Fault Injection and Modeling to Evaluate the .. - Nagaraja, Li.. (2003)   (2 citations)  (Correct)

....main group to cause the automatic reboot of that node. While this is an extreme example of FME, it does improve the availability of PRESS substantially, as well as reduces the need for operator coverage. 8 Related Work There has been extensive work in analyzing faults and how they impact systems [11, 31, 17]. Studies benchmarking system behavior under fault loads include [15, 19]. Unfortunately, these works do not provide a good understanding of how one would estimate overall system availability under a given fault load. There has also been a large number of system availability studies. Two ....

I. Lee and R. Iyer. Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system. In Int. Symp. on Fault-Tolerant Computing (FTCS-23), pages 20--29, 1993.


Quantifying and Improving the Availability of.. - Nagaraja.. (2003)   (Correct)

....have studied the problem of fault tolerance extensively. A full treatment of this body of work is beyond the scope of this paper. Instead, we concentrate on efforts that have focused on improving the availability of cluster based services. Of course, work analyzing how faults impact systems [14, 19, 31, 32], as well as empirical measurement of actual fault rates [2, 16, 23, 18, 24] are necessary background for a model based quantification effort such as ours. Our methodology and infrastructure seem to be the first directed to quantifying the availability impact of a range of techniques as applied ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN90 Operating System. In Proceedings of International Symposium on Fault-Tolerant Computing (FTCS-23), pages 20-29, 1993.


Evaluating the Impact of Communication.. - Nagaraja.. (2002)   (Correct)

....appropriate and so have proposed many alternative messaging based protocols, e.g. [28, 33]. The key difference from the MPP and SAN networks is that like TCP, these protocols viewed packet loss as signaling congestion. There has been extensive work in analysing faults and how they impact systems [13, 22, 34]. However, the focus of these studies was not on the communication system. Studies benchmarking system behavior under fault loads include [20, 24]. However, these works do not provide a good understanding of how one would estimate overall system availability under a given fault load. System ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN90 Operating System. In Proceedings of International Symposium on Fault-Tolerant Computing (FTCS-23), pages 20--29, 1993.


Evaluating the Impact of Communication.. - Nagaraja.. (2003)   (Correct)

....alternative messaging based protocols. A few of the originals include [12, 36, 40]. The key difference from the MPP and SAN networks models is that these protocols, like TCP, viewed packet loss to signal congestion. There has been extensive work in analysing faults and how they impact systems [17, 41, 28]. However, the focus of these studies was not on the communication system. Studies benchmarking system behavior under fault loads include [25, 30]. However, these works do not provide a good understanding of how one would estimate overall system availability under a given fault load. System ....

I. Lee and R. Iyer. Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system. In Int. Symp. on Fault-Tolerant Computing (FTCS-23), pages 20--29, 1993.


Classification of Software Defects in Parallel Programs - Henryk Krawczyk Bogdan (1994)   (Correct)

....the source of a fault, and the immediate effect is a local error in some system component. When it propagates across the component's interface, then that component fails from the system point of view. A sequence of interactions: failure, fault, error, failure, etc. is called an error propagation [18]. The systematic approach requires classification of attributes for such basic entities as failures, errors, and faults [23]. They are shown in Table 2. Tab.2: Attributes of entities Attributes Definitions Location Where is the entity Timing When did the event occur Mode What was ....

....errors Ambiguous syntax of a programming language Documentation Incomplete description, mistakes in text Other Unclear causes software metrics used to improve and control the design process. These two categorizations of faults conform to the categorization given in Table 3, except two classes [18]: microcode defects and so called unclear faults. The former can be classified as either data or code error in Table 3, while the latter as other error with unclear causes. It should be noted that the categories characterized above tend to overlap. Study of some faults including network ....

Lee, I., Iyer, R.K.: Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system. Proc. Int. Symp. FTCS-23, Toulouse, France, 1993, pp.20-29.


An Empirical Study of Operating Systems Errors - Chou, Yang, Chelf, Hallem.. (2001)   (34 citations)  (Correct)

....regardless of whether a particular workload triggered them. We consider a representative sample of these studies below. Gray surveyed outages in Tandem systems between 1985 and 1990, using manually gathered bug reports to classify the causes of outages [10]. In a subsequent study, Lee and Iyer [15] looked at 200 memory dumps of field software failures in the Tandem GUARDIAN 90 operating system collected over 1 year. They focused on the effectiveness of fault detection and recovery, and classifying errors by type (uninitialized variables, race conditions). Sullivan and Chillarege [23] ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), 1993.


An Empirical Study of Operating Systems Errors - Chou, Yang, Chelf, Hallem.. (2001)   (34 citations)  (Correct)

....regardless of whether a particular workload triggered them. We consider a representative sample of these studies below. Gray surveyed outages in Tandem systems between 1985 and 1990, using manually gathered bug reports to classify the causes of outages [10]. In a subsequent study, Lee and Iyer [15] looked at 200 memory dumps of field software failures in the Tandem GUARDIAN 90 operating system collected over 1 year. They focused on the effectiveness of fault detection and recovery, and classifying errors by type (uninitialized variables, race conditions). Sullivan and Chillarege [23] ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), 1993.


An Empirical Study of Operating Systems Errors - Chou, Yang, Chelf, Hallem.. (2001)   (34 citations)  (Correct)

....regardless of whether a particular workload triggered them. We consider a representative sample of these studies below. Gray surveyed outages in Tandem systems between 1985 and 1990, using manually gathered bug reports to classify the causes of outages [10]. In a subsequent study, Lee and Iyer [15] looked at 200 memory dumps of field software failures in the Tandem GUARDIAN 90 operating system collected over 1 year. They focused on the effectiveness of fault detection and recovery, and classifying errors by type (uninitialized variables, race conditions). Sullivan and Chillarege [23] ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), 1993.


An Evaluation of the Recovery-Related Properties of Software Faults - Chandra (2000)   (Correct)

....it in the case of a failure. 3.4 Fault Model Faults injected into the applications form the second part of the workload. Our primary goal in designing these faults is to generate a wide variety of software crashes. Our models are derived from studies of commercial databases and operating systems [Sullivan92, Sullivan91, Lee93] and from prior models used in fault injection studies [Barton90, Kao93, Kanawati95, Chen96, Ng97]. The faults we inject range from low level faults such as flipping bits in memory to high level software faults such as memory management errors and uninitialized variables. We classify injected ....

....of this study is to provide information to guide research on how to survive application faults. In particular, we wish to test the hypothesis that generic recovery techniques, such as process pairs, can survive a majority of application faults. Our methodology differs from that of prior studies [Lee93]. We reason from bug reports and source code as to whether a purely generic recovery system would have recovered from application faults, while past studies examine the field behavior of implemented, mostly generic recovery systems. This comparison is valuable because of our focus on purely ....

[Article contains additional citation context not shown here]

Inhwan Lee and Ravishankar K. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), pages 20--29, 1993.


Cost of Ensuring Safety in Distributed Database.. - Sabaratnam..   (Correct)

....the software faults causing overlay errors are hard to trace and their impact is much higher than that of regular errors. Common software faults causing overlay errors are: assignment faults, initialization faults, wild pointers, copy overflow, type mismatch, memory allocation, and undefined state [14, 15, 11]. Overlay errors may corrupt any part of code or data. The data buffer corruption is a subset of the errors resulting from the software faults causing overlays. We inject errors directly into the data buffer area in order to accelerate the database corruption, instead of injecting faults that ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. Proc. FTCS-23, pages 20--29, 1993.


Integrating Reliable Memory in Databases - Ng, Chen (1998)   (6 citations)  (Correct)

....exist at the point at which the crash occurred. 6.1 Fault models This section describes the types of faults we inject. Our primary goal in designing these faults is to generate a wide variety of database crashes. Our models are derived from studies of commercial databases and operating systems [Sullivan92, Sullivan91b, Lee93] and from prior models used in fault injection studies [Barton90, Kao93, Kanawati95, Chen96]. [Fig. 7. Possible states of buffers in a protected persistent database buffer cache.] The ....

....of faults imitate specific programming errors in the database [Sullivan91b]. These are more targeted at specific programming errors than the previous fault category. We inject an initialization fault by deleting instructions responsible for initializing a variable at the start of a procedure [Kao93, Lee93]. We inject pointer corruption by (1) finding a register that is used as a base register of a load or store and (2) deleting the most recent instruction before the load/store that modifies that register [Sullivan91b, Lee93]. We do not corrupt the stack pointer register, as this is used to access ....

[Article contains additional citation context not shown here]

Lee I, Iyer RK (1993) Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In: Proceedings of the 1993 International Symposium on Fault-Tolerant Computing (FTCS), pp 20--29


Evaluating the Effectiveness of Fault Tolerance in .. - Sabaratnam.. (1999)   (Correct)

....with a 200 MHz UltraSPARC processor, running ClustRa v.1.1, as illustrated in Figure 4. 3.2. Fault Model Extensive field error reports and surveys are not available in young products like ClustRa, therefore we have to base our fault model on studies conducted in operating systems and databases [10, 11, 8]. Common software faults causing overlay errors are: assignment faults: e.g. code line PreviousLog = tmp; instead of PreviousTrLog = tmp; corrupts both PreviousLog and PreviousTrLog variables, initialization faults: wrong or forgotten initialization of variables, wild pointers: assignment or ....

I. Lee and R. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. Proc. FTCS-23, pages 20--29, 1993.


Perspectives for High Performance Computing in.. - Strumpen, Ramkumar, ..   (Correct)

....that ensures security, and is capable of motivating owners of machines to participate in a very large configuration, can this problem be solved. Reliability. Usually, reliability is being viewed as an absolute characteristic of systems that must be delivered at any cost. Tandem systems [13, 17] are an example of this approach. Another notion of reliability is the probabilistic performability measure [22] which combines performance and dependability, and models the accomplishment (reward) made by the system at user level. Up to now, fault tolerance issues have been treated extensively at ....

Inhwan Lee and Ravishankar K. Iyer. Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN90 operating system. In 23rd International Symposium on Fault-Tolerant Computing, pages 20--29, 1993.


The Systematic Improvement of Fault Tolerance in the Rio File Cache - Ng, Chen (1999)   (5 citations)  (Correct)

....2.1 Description of Faults This section describes the types of faults we inject into the operating system. Our primary goal in designing these faults is to generate a wide variety of operating system crashes. Our models are derived from studies of commercial operating systems and databases [Sullivan92, Sullivan91, Lee93] and from prior models used in fault injection studies [Barton90, Kao93, Kanawati95, Chen96]. The faults we inject range from low level hardware faults such as flipping bits in memory to high level software faults such as memory allocation errors. We concentrate on software faults as studies have ....

....faults imitate specific programming errors in the operating system [Sullivan91]. These are targeted more at specific programming errors than the previous fault category. We inject an initialization fault by deleting instructions responsible for initializing a variable at the start of a procedure [Kao93, Lee93]. We inject pointer corruption by corrupting the addressing bytes of instructions which access operands in memory [Sullivan91, Lee93]. We either flip a bit within the addressing form specifier byte (ModR/M) or the scale, ....

[Article contains additional citation context not shown here]

Inhwan Lee and Ravishankar K. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In Proceedings of the 1993 International Symposium on Fault-Tolerant Computing (FTCS), pages 20--29, 1993.


Measuring Memory's Resistance to Operating System Crashes - Ng   (Correct)

....into common programming mistakes that cause software to fail, events that cause latent errors in programs to surface in the field, and failure symptoms. Lee and Iyer study and classify software failures in Tandem's Guardian operating system using memory dumps collected from field software failures [Lee93a, Lee93b]. They also identify the effects of software faults on the system, and trace the propagation of the effects to other subsystems. Their results indicate that the fault tolerant Tandem system tolerates 82% of the reported field software faults, 72% of reported field software failures are recurrences ....

....majority of the faults (82%) are either quickly detected or do not propagate to other subsystems. These studies provide valuable information about failures in production environments; in fact many of the fault types in Section 3 were inspired by the major error categories from [Sullivan91] and [Lee93a]. However, they do not provide specific information about how often system crashes corrupt the permanent data in memory. 2.2 Using Software to Inject Faults Software fault injection is a popular technique for evaluating how prototype systems behave in the presence of hardware and software ....

[Article contains additional citation context not shown here]

Inhwan Lee and Ravishankar K. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), pages 20--29, 1993.


Rio: Storing Files Reliably in Memory - Chen, Aycock, Ng, Rajamani.. (1995)   (Correct)

....they would trust the contents of memory after a system crash, most of them would likely give a resounding "no". This intuition is backed by field studies of MVS and Guardian (Tandem's operating system) which show that between 1/4 to 1/2 of all software induced system crashes could corrupt memory [Sullivan91, Lee93]. It is not yet clear how often these crashes corrupt the file cache; we are conducting experiments to measure this. Memory is vulnerable to software corruption because writes to memory invoke no protocols and hence are not scrutinized by any error checking: a simple store instruction by any ....

Inhwan Lee and Ravishankar K. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), pages 20--29, 1993.


Unknown - Measuring Relative Attack   (Correct)

No context found.

I. Lee and R. Iyer (1993). Faults, symptoms, and software fault tolerance in the Tandem GUARDIAN operating system. In Proceedings of the International Symposium on Fault-Tolerant Computing.


An Attack Surface Metric - Pratyusa Manadhata Jeannette (2005)   (Correct)

No context found.

I. Lee and R. Iyer, Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System, Proceedings of International Symposium on Fault-Tolerant Computing (1993).


An Attack Surface Metric - Pratyusa Manadhata Jeannette (2005)   (Correct)

No context found.

I. Lee and R. Iyer, Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System, Proceedings of International Symposium on Fault-Tolerant Computing (1993).


Software Fault Tolerance: A Tutorial - Torres-Pomales (2000)   (Correct)

No context found.

Inhwan Lee and Ravishankar K. Iyer, Faults, Symptoms, and Software Fault Tolerance in Tandem GUARDIAN90 Operating System, Digest of Papers: The Twenty-Third International Symposium on Fault-Tolerant Computing (FTCS-23), Toulouse, France, June 22--24, 1993, pp. 20--29.


The Rio File Cache: Surviving Operating System Crashes - Chen, Ng, Rajamani, Aycock (1996)   (60 citations)  (Correct)

No context found.

Inhwan Lee and Ravishankar K. Iyer. Faults, Symptoms, and Software Fault Tolerance in the Tandem GUARDIAN Operating System. In International Symposium on Fault-Tolerant Computing (FTCS), pages 20--29, 1993.


CiteSeer.IST - Copyright Penn State and NEC