Y. Huang, C. Kintala, Software fault tolerance in application layer, in: M. Lyu (Ed.), Software Fault Tolerance, Wiley, New York, 1995, Ch. 10. 59
....implement a primary and backups for both GA and GR. The primary periodically checkpoints its state on the backups, and when the primary fails, a backup takes over as the primary. This approach has been successfully used in many practical distributed computing environments using reusable components [20] that provide the needed fault tolerance without extensive programming effort. The computational overhead for providing this kind of fault tolerance varies from 1 to 7 [20]. More details on our approach can be found in [2]. Note that the backup, by watching its primary, relieves an application ....
....This approach has been successfully used in many practical distributed computing environments using reusable components [20] that provide the needed fault tolerance without extensive programming effort. The computational overhead for providing this kind of fault tolerance varies from 1 to 7 [20]. More details on our approach can be found in [2]. Note that the backup, by watching its primary, relieves an application from having to itself try the secondary when the primary fails. The burden of implementing the fault tolerance is thus not on the application. 4.2. Inter domain operations ....
Y. Huang, C. Kintala, Software fault tolerance in application layer, in: M. Lyu (Ed.), Software Fault Tolerance, Wiley, New York, 1995, Ch. 10. 59
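The primary/backup scheme described in the contexts above (periodic checkpoints to the backups, takeover by a backup when the primary fails) can be illustrated with a small in-memory sketch. This is not the cited authors' implementation: the names Replica, ReplicaGroup, checkpoint and watch are hypothetical, and failure is simulated with a boolean flag rather than detected over a network.

```python
class Replica:
    """One replica of a service (hypothetical name); holds the state it knows."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True  # failure is simulated by setting this to False


class ReplicaGroup:
    """A primary plus backups; the first replica starts as the primary."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.primary = self.replicas[0]

    def backups(self):
        # Every live replica other than the current primary.
        return [r for r in self.replicas if r is not self.primary and r.alive]

    def update(self, key, value):
        # Clients only ever modify the primary's state.
        self.primary.state[key] = value

    def checkpoint(self):
        # Periodic step: copy the primary's state onto each live backup.
        for backup in self.backups():
            backup.state = dict(self.primary.state)

    def watch(self):
        # Step run on behalf of the backups: promote one if the primary is down.
        if not self.primary.alive:
            candidates = self.backups()
            if not candidates:
                raise RuntimeError("no live backup to take over")
            self.primary = candidates[0]
        return self.primary


group = ReplicaGroup([Replica("a"), Replica("b")])
group.update("x", 1)
group.checkpoint()           # backup "b" now holds {"x": 1}
group.update("y", 2)         # not yet checkpointed
group.primary.alive = False  # simulated crash of "a"
new_primary = group.watch()  # "b" takes over with the last checkpoint
```

In this sketch the update to "y" is lost at takeover because it happened after the last checkpoint, which is the usual trade-off of periodic checkpointing. The caller never retries against a secondary itself; the takeover happens inside watch.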
....or processes in spite of node or communication failures. Some work has been done in implementing specific tools for monitoring and controlling distributed applications, such as the Meta Toolkit [12], the Megascope tool [15] within the Project Pilgrim [14], and the tools developed by Huang and Kintala [7]. However, until now, we have not seen any system supporting the availability management of DCE based applications. Sampa, which stands for System for Availability Management of Process based Applications, will be a decentralized and fault tolerant system intended to support the management of ....
.... support for checkpointing, its fault detection and recovery capabilities are constrained to the automatic detection and recovery of the faults mentioned above, and to periodic checkpointing and recovery of some of the program's internal state, which corresponds to level 2 in Huang and Kintala's [7] classification of fault tolerance facilities. The main application areas for this kind of fault tolerance are systems with higher demands on availability than on strong data consistency, such as telephone switching systems or information retrieval systems. In this paper we will focus on the ....
Y. Huang and C. Kintala. Software Fault Tolerance in the Application Layer, chapter 10. John Wiley & Sons, 1995.
....to model different fault classes. We illustrate this in Figure 3. The task Processing contains a single hardware resource HW, with five modes: four failure modes and one success mode. The different failure modes trigger different fault tolerance mechanisms, as described by Huang and Kintala [9]. If the fault tolerance task (FT1 to FT4) fails, the path ends in the Failed sink, otherwise in the Succeeded sink. The fault classes model illustrates the kind of analyses we want to perform using action models. We are interested in the influence of fault tolerance mechanisms on ....
Y. Huang and C. Kintala. Software fault-tolerance in the application layer. In B. Krishnamurthy and M. R. Lyu, editors, Software Fault Tolerance, volume 3 of Trends in Software, chapter 10, pages 231--248. John Wiley & Sons, New York, 1995.
....polling requests with "I am alive" messages immediately. If the answer is missing three times consecutively, the crash manager assumes that this process has crashed. As another example, consider the algorithm in Fig. 3, which is similar to the watchd process implemented on Unix and reported in [HK95]. It is tempting to implement failure detectors using the algorithm in Fig. 3 since it only requires the failure detector to listen for incoming messages during the timeout interval. This algorithm, however, does not satisfy IO accuracy. It may show some process P_i to be suspected at all times ....
Y. Huang and C. Kintala. Software fault tolerance in the application layer. In Michael Lyu, editor, Software Fault Tolerance, pages 249--278. Wiley, Trends in Software, 1995.
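The polling rule quoted in the context above (a process is assumed crashed once its answer is missing three times in a row) can be sketched as a simple counter. This is an illustration only, not the algorithm of Fig. 3 and not the code of any cited system: CrashManager, register and record_poll are hypothetical names, and making the suspicion permanent is an assumption chosen to show how a correct but slow process can stay suspected.

```python
class CrashManager:
    """Counts consecutive unanswered polls per process (hypothetical sketch)."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed  # three misses, as in the quoted context
        self.missed = {}              # pid -> consecutive unanswered polls
        self.suspected = set()        # pids assumed to have crashed

    def register(self, pid):
        self.missed[pid] = 0

    def record_poll(self, pid, answered):
        """Record one polling round; return True if pid is now suspected."""
        if pid in self.suspected:
            # Assumption: once suspected, a process is never polled again,
            # so a late "I am alive" answer does not clear the suspicion.
            return True
        if answered:
            self.missed[pid] = 0
        else:
            self.missed[pid] += 1
            if self.missed[pid] >= self.max_missed:
                self.suspected.add(pid)
        return pid in self.suspected


manager = CrashManager()
manager.register("p1")
# Two misses followed by an answer: the counter resets, no suspicion.
early = [manager.record_poll("p1", a) for a in (False, False, True)]
# Three misses in a row: p1 is assumed crashed.
late = [manager.record_poll("p1", a) for a in (False, False, False)]
# A correct but slow p1 answering afterwards remains suspected.
after = manager.record_poll("p1", True)
```

Under that assumption a process that was merely slow for three rounds is suspected forever, which is the kind of behaviour an accuracy property on failure detectors is meant to rule out.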
....earlier the algorithm does not guarantee properties of ◇W. Our contribution lies in formalizing the exact properties guaranteed by that algorithm and showing its usefulness in asynchronous systems. We also show that some other natural timeout implementation of failure detectors for Unix [HK95] and other systems [Bec91] do not satisfy the properties of an IO detector. Although no algorithm can solve the consensus problem in an asynchronous system for all runs, it is desirable that the processes reach agreement in finite time for those runs which satisfy partial synchrony condition. ....
....that does not satisfy IO accuracy. The crash detection manager in [Bec91] does not satisfy infinitely often accuracy since it may permanently suspect a correct process. As another example, consider the algorithm in Fig. 3, which is similar to the watchd process implemented on Unix and reported in [HK95]. It is tempting to implement failure detectors using the algorithm in Fig. 3 since it only requires the failure detector to listen for incoming messages during the timeout interval. This algorithm, however, does not satisfy IO accuracy. It may suspect some process P_i at all times even when P ....
Y. Huang and C. Kintala. Software fault tolerance in the application layer. In Michael Lyu, editor, Software Fault Tolerance, pages 249--278. Wiley, Trends in Software, 1995.
....on the automatic enforcement of availability policies. Some work has been done in implementing specific tools for monitoring and controlling distributed applications, such as the Meta Toolkit [12], the Megascope tool [15] within the Project Pilgrim [14], and the tools developed by Huang and Kintala [8]. However, until now, there is no system that supports the management of DCE based applications with respect to fault tolerance and availability requirements. Sampa, which stands for System for Availability Management of Process based Applications, has been designed to support the management of ....
.... limited support for checkpointing, its fault detection and recovery capabilities are constrained to the automatic detection and recovery of the faults mentioned above, and periodic checkpointing and recovery of some of the program's internal state, which corresponds to level 2 in Huang and Kintala's [8] classification of fault tolerance facilities. Therefore, the main application areas for such kind of fault tolerance are systems with higher demands on availability than on strong data consistency, such as telephone switching systems or information retrieval systems. In this paper we will focus ....
Y. Huang and C. Kintala. Software Fault Tolerance in the Application Layer, chapter 10. John Wiley & Sons, 1995.