R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....sends the result in its buffer to the user, and the algorithm is terminated. 4 Discussion. Replication and majority voting are the conventional methods for achieving fault tolerance in distributed systems. Distributed voting has become the strategy of choice, and has had a number of incarnations [12, 13, 14]. Heavy reliance has been placed on the two-phase commit protocol [8], in which the voters first exchange votes and independently determine the majority result, and then one arbitrary voter within the majority commits this value to the user. This method is widely advocated in designing ....
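The voting-plus-commit pattern described above can be sketched as follows. This is a minimal illustration, not code from any of the cited systems: the function names are hypothetical, votes are plain hashable values, and every vote is assumed to reach every voter intact, so all voters reach the same local decision. The lowest-numbered voter in the majority stands in for the "arbitrary" committing voter.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def local_majority(votes: Sequence[Hashable]) -> Optional[Hashable]:
    """Value held by a strict majority of the votes, or None if there is none."""
    if not votes:
        return None
    value, count = Counter(votes).most_common(1)[0]
    return value if 2 * count > len(votes) else None


def two_phase_vote(results: Sequence[Hashable]) -> Optional[Tuple[int, Hashable]]:
    """Decentralized voting followed by a single-voter commit.

    results[i] is the value computed by voter i.  Assumes every vote reaches
    every voter intact during the exchange (no Byzantine voters).
    Returns (committing_voter, value), or None if no majority exists.
    """
    n = len(results)
    # Phase 1: voters exchange votes; each one determines the majority on its own.
    decisions = [local_majority(results) for _ in range(n)]
    # Phase 2: one voter within the majority commits the result to the user.
    # The lowest-numbered member stands in for the "arbitrary" choice.
    for i, decision in enumerate(decisions):
        if decision is not None and results[i] == decision:
            return i, decision
    return None


print(two_phase_vote([3, 7, 7]))  # (1, 7): voter 1 commits 7
```

Because only one voter performs the commit, the delivered result still depends on that voter behaving correctly in the final step, which is the single point of dependence such schemes accept.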
Kieckhafer, R., Walter, C., Finn, A., Thambidurai, P., "The MAFT Architecture for Distributed Fault Tolerance," IEEE Transactions On Computers, Vol. 37, No. 4, April 1988, pp. 398-405.
....of the system as a whole revolves around the dependability of that one voter. Current distributed voting schemes assume that the probability of failure during that last stage is low enough that this does not become a problem. This voting methodology has had several embodiments [3, 4, 5, 6] in the development of fault-tolerant computing. More recently, distributed voting has been used for fault diagnosis in linear processor arrays [7], where, in the absence of a centralized voter, the array elements share error flags stemming from output comparisons performed between connected ....
Kieckhafer, R., Walter, C., Finn, A., Thambidurai, P., "The MAFT Architecture for Distributed Fault Tolerance," IEEE Transactions On Computers, Vol. 37, No. 4, April 1988, pp. 398-405.
.... real-time computer architectures for safety-critical applications started more than thirty years ago with the design of the STAR computer [2] and the two projects SIFT [3] and FTMP [4]. These projects were carefully evaluated and gave rise to new designs about ten years later: FTPP [5], MAFT [6], and the architectural concepts of the AIRBUS flight control system [7]. In 1992 the first paper on SAFEbus [8], the architecture that was later deployed in the Boeing 777 aircraft for flight control, became available. In excellent publications by Lala [9] and Avizienis [10], and the books by Rechtin ....
R. Kieckhafer, C. Walter, A. Finn, and P. Thambidurai. The MAFT Architecture for Distributed Fault Tolerance. IEEE Transactions on Computers, 37(4):398--405, 1988.
.... The interactive consistency algorithm of SPIDER has been formally verified by Miner (similar to the verification previously performed for the Draper FTP architecture [LR94]). Its diagnosis algorithm has also recently been formally verified by Geser; it is similar to the algorithms developed for MAFT [KWFT88], whose verification is described by Walter, Lincoln, and Suri [WLS97]. Formal verification of the SPIDER clock synchronization algorithm is in progress. One of the main goals of the SPIDER project is to serve as a demonstration study for certification under the DO-254 guidelines for airborne ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....Symmetric faults deliver wrong values but do so consistently. Manifest faults are those that can be detected by all nonfaulty receivers. [IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 21, NO. 2, FEBRUARY 1995, pp. 107-125] .... including one (called MAFT) by a manufacturer of flight-control systems [6]. These fault-tolerant architectures must be able to withstand multiple faults, and it can require an excessive amount of redundancy to do this if failed channels are left operating (e.g., seven channels are required to withstand two simultaneously active Byzantine faults). Reconfiguration to ....
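The redundancy figure quoted above follows from standard lower bounds, computed below as a sketch. The classical oral-messages bound n ≥ 3m + 1 gives seven channels for two Byzantine faults. The hybrid bound n > 2a + 2s + b + r (with r ≥ a rounds, where a, s, and b count asymmetric, symmetric, and manifest faults) is, as we recall it, the condition Lincoln and Rushby state for their corrected hybrid-fault algorithm; treat it as an assumption to check against that work. The function names are illustrative.

```python
def min_channels_byzantine(m: int) -> int:
    """Classical oral-messages bound: m arbitrary faults need n >= 3m + 1."""
    return 3 * m + 1


def min_channels_hybrid(a: int, s: int, b: int, r: int) -> int:
    """Smallest n satisfying n > 2a + 2s + b + r, with r >= a rounds.

    a: asymmetric (Byzantine) faults, s: symmetric faults,
    b: manifest faults, r: rounds of message exchange.
    """
    if r < a:
        raise ValueError("the hybrid bound assumes r >= a")
    return 2 * a + 2 * s + b + r + 1


print(min_channels_byzantine(2))                # 7, as in the text
print(min_channels_hybrid(a=1, s=1, b=1, r=1))  # 7
```

The second call shows why the hybrid model matters: one fault of each kind needs seven channels, whereas treating all three as Byzantine would require 3·3 + 1 = 10.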
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, vol. 37, no. 4, pp. 398-405, Apr. 1988.
....2 can be drawn into this framework to provide robotic fault tolerance. Finally, Chapter 7 gives a brief summary of the results and future extensions. Chapter 2: Previous Work in Fault Tolerance. Many fault-tolerant systems have been developed for computer, airplane, and industrial systems [8, 15, 21, 26, 41, 42]. Several of these techniques have provided models for robotic fault-tolerance schemes such as those presented in [35]. However, the trend in robotics seems to be to use only those schemes that rely on physical redundancy of components. Many methods of fault tolerance exist which do not alter the ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. MAFT Architecture for Distributed Fault Tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....and analysis of DSA are also given in this paper. KEY WORDS: Fault tolerant, real-time task, efficient scheduling, performance analysis, heuristics, distributed systems. 1 INTRODUCTION. Many hard real-time systems must meet their deadlines despite the presence of hardware and software failures [1][2][3]. These hard real-time systems are mission-critical systems with two characteristics: they must guarantee that real-time applications meet their timing constraints, and they must continue the specified operations even when hardware or software errors occur. Many fault-tolerant mechanisms have been ....
Kieckhafer, R. M., C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT Architecture for distributed fault tolerance. IEEE Transactions on Computers, 1988, 37(4) : 398-405.
....because the resources are better used, but in general more effort is needed at run time. Therefore, existing parallel systems fix most details of the schedule at compile time (see, e.g., [8]). For a more dynamic, and also fault-tolerant, approach see, for example, the MAFT project [13, 12]. In many cases, the set of tasks that have to be executed is not precisely known at compile time. El-Rewini and Ali have introduced a parallel program model that allows a suitable data representation for static scheduling algorithms [4]. The representation is based on two directed graphs: the ....
R. Kieckhafer, C. Walter, A. Finn, P. Thambidurai, The MAFT Architecture For Distributed Fault-Tolerance, IEEE Trans. Computers, April 1988, 398-405.
....methods for achieving fault tolerance in distributed systems. Decentralized voting, in which the replicated voters independently determine the majority rather than relying on a central server to tally the results, has become the strategy of choice, and has had a number of incarnations [7, 8, 9]. Most of these systems have used the two-phase commit protocol to implement the voting scheme. In this protocol, the replicated voters first exchange their votes and independently determine the majority result. Once a final result has been calculated, one of the voters is arbitrarily ....
Kieckhafer, R., Walter, C., Finn, A., Thambidurai, P., "The MAFT Architecture for Distributed Fault Tolerance," IEEE Transactions On Computers, Vol. 37, No. 4, April 1988, pp. 398-405.
....being given to software-based, application-level implementations. Very tight clock synchronization can be obtained by implementing the time-stamping mechanism in hardware, usually in a special-purpose network interface. This approach is used, for example, in the MARS [6], [7] and MAFT [8], [9] systems. Kopetz and Ochsenreiter [6] estimate that the error associated with the MARS time-stamping mechanism is about 1 µs. Other sources of uncertainty contribute to the message delay variation, such as the variable propagation delay of messages and the granularity of local clocks. In the case ....
....of local clocks. In the case of the MARS system, the message delay variation is about 4 µs [7], leading to a worst-case clock synchronization tightness of 10 µs. Although no analysis of message delay variation for MAFT is given, the clock synchronization tightness was reported to be 66 µs [8], [9]. If special hardware assistance is not available, an alternative approach consists of implementing the clock synchronization algorithm in the kernel of the operating system. An example of such an implementation is TEMPO [10]. Kopetz and Ochsenreiter [6] estimate that the message delay variation ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn and P. M. Thambidurai, "The MAFT Architecture for Distributed Fault-Tolerance," Proceedings of the 19th Fault-Tolerant Computing Symposium, June 1989, pp. 142-149.
....much more in order to check the equivalence of successive executions. 6 Related work. Many run-time supports for distributed real-time applications have been developed, for example Spring [28] and Maruti [20]. Some of them provide fault-tolerance mechanisms, for instance FTMP [16], MAFT [17], MARS [18], and Delta-4 [3, 10]. While Hades shares many similarities with all these projects, for space considerations we focus below only on the environments most similar to Hades: ARMADA [1], MARS [18], and GUARDS [25]. Hades shares many goals with the ARMADA project [1]. It aims at developing ....
R.M. Kieckhafer, C.J. Walter, A.M. Finn, and P.M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....involves extending the system model of REACT to represent (and analyze) other architectures. REACT currently evaluates the dependability of tightly coupled multiprocessor systems. Although many fault-tolerant architectures are multiprocessor-based, several have been realized as distributed systems [68, 71, 90], which cannot be modeled within the existing framework of REACT. Given the advantages of distributed computing coupled with recent advances in networking technology, one would expect a growing number of distributed systems to be employed in life- and cost-critical applications in the near future. ....
Kieckhafer, R. M., Walter, C. J., Finn, A. M., and Thambidurai, P. M., "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, vol. 37, no. 4, pp. 398--405, Apr. 1988.
....transmitted and received messages, which may reduce the message delay variation by about 5 orders of magnitude when compared to a time-stamping mechanism implemented in software. Examples of systems that rely on special hardware support for clock synchronization are MARS [5], [12] and MAFT [13], [14]. The interactive convergence functions used by these systems are FTAA and FTMA, respectively. Worst-case clock synchronization tightness in MARS and MAFT is on the order of 10 µs and 66 µs, respectively. Kernel-based implementations can be used alternatively if special hardware assistance is not ....
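The convergence functions named above discard extreme clock readings before combining the rest. The sketch below assumes each node holds a list of estimated clock offsets (one per node) and tolerates up to k faulty clocks by dropping the k smallest and k largest readings; the fault-tolerant average then averages the survivors and the fault-tolerant midpoint takes their midpoint. This is a textbook rendering with hypothetical names, not the MARS or MAFT implementation, and convergence guarantees for these functions generally require n ≥ 3k + 1, whereas the code only checks that some readings survive trimming.

```python
from typing import List, Sequence


def _trimmed(readings: Sequence[float], k: int) -> List[float]:
    """Sort the readings and drop the k smallest and k largest."""
    if len(readings) <= 2 * k:
        raise ValueError("need more than 2k readings to tolerate k faults")
    ordered = sorted(readings)
    return ordered[k:len(ordered) - k]


def fault_tolerant_average(readings: Sequence[float], k: int) -> float:
    """FTA: mean of the readings that survive trimming."""
    kept = _trimmed(readings, k)
    return sum(kept) / len(kept)


def fault_tolerant_midpoint(readings: Sequence[float], k: int) -> float:
    """FTM: midpoint of the smallest and largest surviving readings."""
    kept = _trimmed(readings, k)
    return (kept[0] + kept[-1]) / 2


offsets = [10.0, 11.0, 11.5, 14.0, 100.0]  # one reading is wildly off
print(fault_tolerant_average(offsets, k=1))
print(fault_tolerant_midpoint(offsets, k=1))  # 12.5
```

Either way, the single wild reading (100.0 here) cannot pull the correction outside the range of the good clocks; the midpoint form needs only the two surviving extremes.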
R. M. Kieckhafer, C. J. Walter, A. M. Finn and P. M. Thambidurai, "The MAFT Architecture for Distributed Fault-Tolerance," Proceedings of the 19th Fault-Tolerant Computing Symposium, June 1989, pp. 142-149.
....along with processors and memory components, the interconnection network is an important component. One of the goals in the design of the interconnection network is to provide reliable communication in the presence of component failures. The approach taken in multicomputers like SIFT [16] and MAFT [26] is to provide a fully connected network, where each node is connected to every other node with dedicated point-to-point links. Although this method is extremely reliable, it does not scale and can be used only in systems with a small number of nodes. For larger systems, it is necessary to use ....
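The scaling limit of a fully connected network is easy to quantify: n nodes need n(n-1)/2 dedicated links and n-1 link interfaces per node. The snippet below simply evaluates these counts for a few illustrative sizes; the function names are hypothetical.

```python
def fully_connected_links(n: int) -> int:
    """Dedicated point-to-point links needed to connect n nodes pairwise."""
    return n * (n - 1) // 2


def ports_per_node(n: int) -> int:
    """Each node needs a separate link interface for every other node."""
    return n - 1


for n in (4, 8, 64):
    print(n, fully_connected_links(n), ports_per_node(n))
# 4 6 3
# 8 28 7
# 64 2016 63
```

At eight nodes the network already needs 28 links; at 64 nodes it needs 2016, and every node needs 63 interfaces, which is the quadratic growth behind the scalability limit.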
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, vol. C-37, no. 4, pp. 398--405, April 1988.
....processors (nodes) communicating through an interconnection network. Since, in these applications, communication between nodes is vital even in the presence of failures, direct links between all pairs of nodes and/or redundant broadcast buses have customarily been used as the interconnection network [3, 4, 6, 9]. Although these two interconnection networks are very reliable, they do not scale well to large systems because of their bandwidth limitations. Thus, distributed systems with point-to-point interconnection networks such as hypercubes or meshes have recently gained considerable attention. ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Trans. Comput., vol. 37, no. 4, pp. 398--405, April 1988.
....to D3; therefore, it is necessary to vote on the value being written. There is clearly a trade-off between the frequency of performing comparisons between channels and the early detection of errors, and the overheads (and scheduling constraints) imposed by such comparisons. For example, in MAFT (Kieckhafer, Walter et al. 1988; Hugue and Stotts 1991), which supports a cooperative computational model, the scheduling table for each site is replicated. Each local scheduler, as well as selecting the next thread for execution on its site, also mirrors the operations performed by the other schedulers. Every time a thread is ....
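One way to picture a replicated scheduling table in which each local scheduler also mirrors the others is sketched below. Everything here is a hypothetical simplification: the table layout, the round-robin selection rule, and the function names are assumptions, since the text does not say how MAFT's schedulers choose threads. The point is only that a deterministic rule applied to an identical table lets every site recompute, and therefore check, every other site's choice.

```python
from typing import Dict, List

# Hypothetical replicated scheduling table: site name -> cyclic list of threads.
Table = Dict[str, List[str]]


def select_next(table: Table, site: str, frame: int) -> str:
    """Assumed deterministic rule: pick threads round-robin by frame number."""
    threads = table[site]
    return threads[frame % len(threads)]


def check_announcements(table: Table, frame: int,
                        announced: Dict[str, str]) -> List[str]:
    """Mirror every site's scheduler locally and return the sites whose
    announced selection disagrees with the recomputed one."""
    return sorted(site for site, thread in announced.items()
                  if select_next(table, site, frame) != thread)


table = {"A": ["t1", "t2"], "B": ["t3", "t4", "t5"]}
print(check_announcements(table, 4, {"A": "t1", "B": "t5"}))  # ['B']
```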
Kieckhafer, R. M., C. J. Walter, et al. (1988). "The MAFT architecture for distributed fault tolerance." IEEE Transactions on Computers 37(4): 398-405.
.... Transient Recovery. Our focus is fault tolerance through active replication of components that may exhibit Byzantine (i.e., uncontrolled or arbitrary) failures, using what is sometimes called the state machine approach [22] in the form introduced by SIFT [28] and subsequently refined by MAFT [11].³ A frame-synchronous architecture based on state machine replication operates as follows. There are n .... [Footnote 3: Schneider's tutorial [22] describes the state machine approach in its client-server form. Here, we use the original SIFT form (which is suited to control applications) where both clients ....]
....components diagnosed as faulty to continue operation on probation, so that transiently faulty components can be allowed to recover and to repair their state, but in such a way that they do not degrade the fault tolerance of clock synchronization and interactive consistency. The MAFT architecture [11, 25] took a similar overall approach to that proposed here, with its operating set being similar to our core, but without our extended treatment for interactive consistency. Because diagnosis is uncertain, the fault tolerance achieved by the proposed architecture is a stochastic property unlike ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....(e.g., clock synchronization) already require four or more processors, there seems no compelling reason to use written-message protocols. In fact, there is an argument against these protocols, which Chris Walter, one of the developers of the MAFT architecture for fault-tolerant flight control [15], expressed to us as follows: "you have to assume that digital signatures satisfy the requirements for written messages, and in life-critical systems we prefer to make as few assumptions as possible." It turns out that this caution is justified. In the rest of the paper, we first describe the ....
.... Byzantine agreement protocols: secure systems that must maintain coordination in the face of capture and active subversion of system components (e.g., the AT&T Rampart architecture [24]), and safety-critical embedded control systems (e.g., the MAFT architecture for aircraft flight control [15]). Sophisticated cryptographic and other attacks are a given in the first class of applications, so our concern about the security of authentication needs no further justification here (the literature is replete with broken cryptographic protocols [1, 21]). Intelligent malicious attack is not ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....are given in Section 4. The other algorithms required are outlined in Section 5, and conclusions are presented in Section 6. 2 State Machine Replication with Transient Recovery. Our focus is the approach to fault-tolerant computing first implemented in SIFT [27] and subsequently refined by MAFT [12] and several other projects (e.g., [13]). This has become known as the state machine approach to fault-tolerant computing [21].² A frame-synchronous architecture based on state machine replication operates as follows. There are n replicated major components, generally called channels, that are ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....Designing systems that support such demanding applications is hard because a triple goal must be reached. First, such systems have to be highly reliable since, for example, the maximum acceptable probability of failure typically ranges from 10⁻⁴ to 10⁻¹⁰ per hour [RSL95, KWFT88]. Second, these systems have to meet strict timeliness requirements since their typical response times range from 1 ms to 100 ms, depending on the criticality of the application [MRS+90, RSL95]. Third, these systems must enforce data consistency due to the concurrent executions of multiple ....
....the particular requirements of their application domain. Such solutions are thus specific, inflexible, and dedicated to a single application domain. Furthermore, they are often based on specialized and costly hardware, making the implemented software seldom reusable in a different context [RSL95, KWFT88]. To significantly decrease the global cost of such solutions, HADES provides a toolkit offering basic services needed by most distributed safety-critical real-time applications. The services offered implement basic functionalities (e.g., scheduling policy, time-bounded and reliable ....
R. Kieckhafer, C. Walter, A. Finn, and P. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
.... speaking, these techniques rely on one of the following two approaches [10]. One approach is to have adequate spare capacity in the system so that tasks can be reassigned or re-executed on fault-free processors upon detection of a failure without violating the deadline constraints of any task [11]. The main drawback of this approach is that system resources are often underutilized when no faults are present. The other approach is to invoke an overload management technique upon detection of a failure. For example, one can prioritize tasks based on their importance to the application ....
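The spare-capacity approach can be sketched as a reassignment check run when a processor fails. The model is an assumption made for illustration: each task is summarized by its utilization, a processor is deemed schedulable while its total utilization stays at or below 1.0 (true, for example, of preemptive EDF with deadlines equal to periods), and the failed processor's tasks are placed first-fit decreasing. The function and processor names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model: processor name -> utilizations (execution time / period)
# of the tasks assigned to it.
Assignment = Dict[str, List[float]]


def reassign_on_failure(assign: Assignment, failed: str,
                        bound: float = 1.0) -> Optional[Assignment]:
    """First-fit-decreasing reassignment of a failed processor's tasks.

    A processor is treated as schedulable while its total utilization stays
    at or below `bound` (an assumed admission test).  Returns the new
    assignment, or None when the spare capacity is insufficient.
    """
    survivors = {p: list(ts) for p, ts in assign.items() if p != failed}
    for u in sorted(assign[failed], reverse=True):  # largest tasks first
        for p in sorted(survivors):
            if sum(survivors[p]) + u <= bound:
                survivors[p].append(u)
                break
        else:
            return None  # no surviving processor has room for this task
    return survivors


plan = {"P1": [0.5], "P2": [0.25], "P3": [0.25, 0.25]}
print(reassign_on_failure(plan, "P3"))  # {'P1': [0.5, 0.25, 0.25], 'P2': [0.25]}
```

A plan that passes this check has enough slack to absorb the failure, and that same slack is the idle capacity identified above as the approach's main drawback.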
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, vol. 37, pp. 398--405, April 1988.
....where the system activities are initiated as a consequence of the occurrence of external or internal events. Event-triggered real-time architectures are assumed to provide a high degree of flexibility and have therefore received considerable attention in the literature (FTPP [4], ARTS [27], MAFT [7]). Because of their event-triggered nature, however, an excessive number of possible behaviors must be analyzed in order to establish timeliness guarantees. Furthermore, the implementation of active redundancy by the replication of components is hard because of the issue of replica determinism ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, Apr. 1988.
....because the resources are better used, but in general more effort is needed at run time. Therefore, existing parallel systems fix most details of the schedule at compile time (see, e.g., [7]). For a more dynamic, and also fault-tolerant, approach see, for example, the MAFT project [12, 11]. In many cases, the set of tasks that have to be executed is not precisely known at compile time. El-Rewini and Ali have introduced a parallel program model that allows a suitable data representation for static scheduling algorithms [3]. The representation is based on two directed graphs: the ....
R. Kieckhafer, C. Walter, A. Finn, P. Thambidurai, The MAFT Architecture For Distributed Fault-Tolerance, IEEE Trans. Computers, April 1988, 398-405.
....Symmetric faults deliver wrong values but do so consistently. Manifest faults are those that can be detected by all nonfaulty receivers. .... including one (called MAFT) by a manufacturer of flight-control systems [6]. These fault-tolerant architectures must be able to withstand multiple faults, and it can require an excessive amount of redundancy to do this if failed channels are left operating (e.g., seven channels are required to withstand two simultaneously active Byzantine faults). Reconfiguration to ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, vol. 37, no. 4, pp. 398--405, Apr. 1988.
....automotive equipment. Designing systems that support such demanding applications is difficult because they must reach a triple goal. First, these systems have to be highly reliable to meet a maximum acceptable probability of failure ranging from 10⁻⁴ to 10⁻¹⁰ failures per hour [22, 14]. Second, they have to meet strict timeliness requirements to guarantee response times typically ranging from 1 .... [Footnote: Proc. of the 2nd IEEE International Symposium on Object-Oriented Real-Time Distributed Computing, May 2-5, 1999, Saint-Malo, France. This work is partially supported by the French ....]
....in terms of resource consumption, specific scheduling needs, or fault-tolerance needs. Existing solutions are therefore dedicated to a single application domain, and often based on specialized (and costly) hardware, making the implemented software seldom reusable in a different context [22, 14]. The HADES project (Highly Available Distributed Embedded System) [2, 20], developed at IRISA, addresses these issues and provides an environment for the development and execution of distributed, dependable, hard real-time applications. Application development in HADES relies on the use of an ....
R. Kieckhafer, C. Walter, A. Finn, and P. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Trans. on Computers, 37(4):398--405, Apr. 1988.
.... Byzantine failures was recognized, and a Byzantine agreement protocol was developed to deal with it [11, 16]. Following SIFT, several systems, such as FTP (Fault-Tolerant Processor) [9, 10], FTPP (Fault-Tolerant Parallel Processors) [3], and MAFT (Multicomputer Architecture for Fault Tolerance) [6, 15], implemented a Byzantine-resilient core to handle malicious failures. However, each of these existing approaches has some of the following limitations: 1) a significant overhead in the number of messages required by the execution protocol, 2) a relatively complex recovery protocol, 3) the ....
....tightly synchronized (TS) or loosely synchronized (LS) operation. Considering processors and memory units independently, we have TS-TS systems, such as C.vmp [14] and FTMP [4], that require both the processors and memory units to operate in lock-step; LS-LS systems, such as SIFT [16] and MAFT [6], that allow both to operate in a loosely synchronized way; TS-LS systems that require tightly synchronized processors but loosely synchronized memory operations; and LS-TS systems that require loosely synchronized processors but tightly synchronized memory units. Due to the potential of having ....
R.M. Kieckhafer, C.J. Walter, A.M. Finn, and P.M. Thambidurai, "The MAFT architecture for distributed fault tolerance," IEEE Trans. on Computers, Vol. 37, No. 4, April 1988, pp. 398-405.
....elements. Thus, both network faults and processing-element faults can be overcome by this architecture. Some multiprocessor architectures with processor fault tolerance have been developed for various uses, including real-time computing. One such architecture, MAFT, was developed by Kieckhafer et al. [11]. MAFT was designed to provide high performance and reliability for a wide range of real-time applications. The performance requirements for a commercial flight control system were the minimum design goals for this architecture. This design separates executive functions (such as internode ....
....meaning that as new data is received, it is compared with data already at the node. When the paper was published, two prototypes were being implemented. Four of six planned nodes had been assembled and operated as a system for one version of MAFT. The architecture will support up to eight nodes [11]. There are several differences between our design and MAFT. One important difference is that MAFT uses replicated memory for fault tolerance, together with message passing. This memory architecture incurs the difficulty and overhead of communicating data between nodes. Also, due to memory replication, ....
R.M. Kieckhafer, C.J. Walter, A.M. Finn, and P.M. Thambidurai. The MAFT Architecture for Distributed Fault Tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....focus validation obligations on a minimum set of critical components; 2. reuse of already validated components in different instances; and 3. the support of application components of different criticalities. Drawing on experience from systems such as SIFT (Melliar-Smith and Schwartz, 1982), MAFT (Kieckhafer et al. 1988), FTPP (Harper and Lala, 1990), and Delta-4 (Powell (Ed.), 1991), the generic architecture is defined along three axes (see Figure 1) (Powell, 1997): • the channel axis (C): channels provide the primary hardware fault-containment regions; it is possible to configure instances of the architecture ....
....consistency agreement protocols can be significant. There is therefore a need to keep their use to a minimum. There is clearly a trade-off between the frequency of performing comparisons between channels and the overheads (and scheduling constraints) imposed by such comparisons. In MAFT (Kieckhafer et al. 1988; Hugue and Stotts, 1991), which supports a cooperative computational model, the scheduling table for each site is replicated for fault tolerance. Each local scheduler, as well as selecting the next .... [Footnote 2: This is adequate because it is assumed that the number of hosts within a channel is small.]
Kieckhafer, R., Walter, C., Finn, A. and Thambidurai, P. (1988). The MAFT architecture for distributed fault tolerance, IEEE Transactions on Computers 37(4): 398--405.
....Langley Research Center, under contract NAS1-18969. 1 Introduction. The general design outline of a reliable computing platform for ultra-critical applications was established in the late 1970s and early 1980s by the SIFT architecture [8, 32] and later refined in the FTP [12] and MAFT [11] architectures: the system workload is executed by several independent processors in approximate synchrony, and the results are subjected to exact-match majority voting. Clock synchronization, and also the distribution of single-source data such as sensor samples, is performed in a manner that is ....
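Exact-match majority voting of the kind described above can be sketched per frame as follows. The sketch assumes each channel runs a deterministic copy of the control law on the same agreed-upon input, which is why the distribution of single-source data matters: without identical inputs, correct channels could disagree bit-for-bit. The channel names, the toy control law, and the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Optional, Tuple


def exact_match_vote(outputs: Dict[str, Hashable]
                     ) -> Tuple[Optional[Hashable], List[str]]:
    """Vote on channel outputs by exact equality.

    Returns the strict-majority value (None if there is none) and the
    channels whose output differs from it, as candidates for diagnosis.
    """
    if not outputs:
        return None, []
    value, count = Counter(outputs.values()).most_common(1)[0]
    if 2 * count <= len(outputs):
        return None, []
    dissenters = sorted(ch for ch, out in outputs.items() if out != value)
    return value, dissenters


def run_frame(channels: Dict[str, Callable[[int], int]],
              sample: int) -> Tuple[Optional[Hashable], List[str]]:
    """One frame: every channel applies its copy of the control law to the
    same agreed-upon input, then the outputs are voted."""
    outputs = {name: law(sample) for name, law in channels.items()}
    return exact_match_vote(outputs)


channels = {
    "c1": lambda x: 2 * x,
    "c2": lambda x: 2 * x,
    "c3": lambda x: 2 * x + 1,  # simulated faulty channel
}
print(run_frame(channels, 5))  # (10, ['c3'])
```

Dissenting channels are returned alongside the voted value so they can feed a diagnosis step; integer outputs are used so that exact equality is meaningful.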
....of authentication when within the competence of the basic algorithm. .... comes from the C. S. Draper Laboratories [12], and the hybrid fault model was developed at the Aerospace Technology Center of AlliedSignal for their MAFT (Multicomputer Architecture for Fault Tolerance) architecture [11]. Prototypes were constructed for both architectures, and they and their successors are being considered, evaluated, or used for safety and control applications in nuclear plants, aircraft, helicopters, submarines, and rockets. Although the architecture, algorithm, and fault model investigated ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....limited fault coverage. Therefore, active replication is not used for mission-critical applications. On the other hand, passive replication masks faults by removing their effects. Faults are masked by executing voting algorithms that select the most reliable response from the replicated computers [12, 7]. Redundancy management is necessary to synchronize the execution of multiple computers to a common clock and to vote on data to detect and mask faults. However, managing the redundancy requires overhead to keep consistency between replicas, and this overhead can increase the complexity of the ....
....to keep consistency between replicas, and this overhead can increase the complexity of the application development process. The AlliedSignal research team has developed the Multicomputer Architecture for Fault Tolerance (MAFT) to support the development of real-time mission-critical applications [12, 18]. The philosophy used in the MAFT architecture is to separate redundancy management and fault-tolerance support from the applications (e.g., control functions), so that the overall development complexity and effort of dependable systems can be reduced. The architecture is scalable to support ....
R. Kieckhafer, C. Walter, A. Finn, and P. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, pages 398--405, April 1988.
....Such redundancy can be a replica used in case of failure to supply the same function. A technique known as passive replication is used to mask faults by removing their effects. Faults are masked by executing voting algorithms that select the most reliable response from the replicated computers [10]. Redundancy management is necessary to synchronize the execution of multiple computers to a common clock and to vote on data to detect and mask faults. However, managing the redundancy requires overhead to keep consistency between replicas, and this overhead can increase the complexity of the ....
....to keep consistency between replicas, and this overhead can increase the complexity of the application development process. The AlliedSignal research team has developed the Multicomputer Architecture for Fault Tolerance (MAFT) to support the development of real-time mission-critical applications [10, 15]. The philosophy used in the MAFT architecture is to separate redundancy management and fault-tolerance support from the applications (e.g., control functions), so that the overall development complexity and effort of dependable systems can be reduced. The architecture is scalable to support ....
R. Kieckhafer, C. Walter, A. Finn, and P. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, pages 398--405, April 1988.
....this paper, we concentrate only on the fault-tolerant scheduling of non-preemptive tasks in real-time systems. For this reason, in the above short survey, we have not discussed systems that have been built with the aim of providing real-time fault tolerance but do not study scheduling specifically [33, 10, 12]. For the same reason, we have not discussed scheduling techniques that provide fault tolerance for preemptive real-time tasks [26, 2, 22]. 3 The Fault-Tolerant Scheduling Problem. In this section we describe the system, fault, and task models used in this paper. We also introduce the approach ....
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....as replicas execute in parallel, failures do not incur an additional overhead. It seems that one could use active replication with the piecewise-deterministic computation model by reaching a consensus [BMD93] on events. In fact, several systems employ this method; see, e.g., SIFT [WLG 78], MAFT [KWFT88], or MARS [KFG 93]. However, these are special-purpose, reliable, hard real-time systems, not general systems. They are synchronous systems and use severely constrained tasking models, where inter-task communication is strongly reduced and task scheduling is restricted or even performed ....
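The idea of combining active replication with a piecewise-deterministic model can be sketched as follows: if every replica applies the same sequence of events, deterministic replicas end in the same state. The sketch assumes the agreed sequence has already been produced by some consensus protocol (not implemented here); the `Replica` class and its `add`/`mul` events are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # hypothetical event format: (operation, operand)


class Replica:
    """A deterministic replica: its state depends only on the events it applies."""

    def __init__(self) -> None:
        self.state = 0

    def apply(self, event: Event) -> None:
        op, arg = event
        if op == "add":
            self.state += arg
        elif op == "mul":
            self.state *= arg
        else:
            raise ValueError(f"unknown operation {op!r}")


def run(replicas: List[Replica], agreed_log: List[Event]) -> List[int]:
    # Every replica applies the same agreed sequence, as a consensus protocol would deliver it.
    for event in agreed_log:
        for replica in replicas:
            replica.apply(event)
    return [replica.state for replica in replicas]


log = [("add", 3), ("mul", 4), ("add", -2)]
print(run([Replica() for _ in range(3)], log))  # [10, 10, 10]

# Without agreement on event order, replicas can diverge.
a, b = Replica(), Replica()
for e in [("add", 3), ("mul", 4)]:
    a.apply(e)
for e in [("mul", 4), ("add", 3)]:
    b.apply(e)
print(a.state, b.state)  # 12 3
```

The second half shows why the consensus step matters: the same events applied in a different order leave the replicas in different states, so their outputs could no longer be voted on directly.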
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....Tho79] and majority weighted voting [Gif79]. In this technique, the computation must be deterministic, or agreement protocols have to be implemented between replicas to ensure that all replicas reach the same decisions. Some examples of such replication techniques are CIRCUS [Coo84], MAFT [KTWF88], and SIFT [LWGG 78]. [Figure 3.2: Active Replication. Labels: replicated application, inputs, outputs, time, replicas i, j, and k, voter, computation, synchronization task.] The active replication technique can detect and mask all kinds of failures, including Byzantine failures. The voter is a single ....
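The weighted voting cited above as [Gif79] assigns votes to copies and requires read and write quorums to overlap. Below is a minimal sketch of the two quorum conditions, assuming a vote assignment `votes`, a read quorum `r`, and a write quorum `w` (all hypothetical values). It checks only the arithmetic conditions and does not implement the version-number handling of a full protocol.

```python
from typing import Dict, Set


def valid_quorums(votes: Dict[str, int], r: int, w: int) -> bool:
    """Check the weighted-voting quorum conditions.

    r + w > total: every read quorum overlaps every write quorum,
                   so a read always reaches at least one up-to-date copy.
    2 * w > total: any two write quorums overlap, so two writes
                   always share at least one copy.
    """
    total = sum(votes.values())
    return r + w > total and 2 * w > total


def has_quorum(votes: Dict[str, int], replies: Set[str], needed: int) -> bool:
    # A set of responding copies forms a quorum if its votes reach the threshold.
    return sum(votes[name] for name in replies) >= needed


votes = {"A": 2, "B": 1, "C": 1}          # hypothetical assignment, total = 4
print(valid_quorums(votes, r=2, w=3))     # True:  2 + 3 > 4 and 6 > 4
print(valid_quorums(votes, r=1, w=3))     # False: 1 + 3 is not > 4
print(has_quorum(votes, {"A"}, 2))        # True:  copy A alone carries 2 votes
print(has_quorum(votes, {"B", "C"}, 3))   # False: B and C carry only 2 votes
```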
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....faults. Thambidurai and Park present a modification to the Oral Messages algorithm of Lamport, Pease, and Shostak [6] that achieves consensus under their fault model; this algorithm is employed in the MAFT (Multicomputer Architecture for Fault Tolerance) system for flight control applications [4]. Unfortunately, the algorithm and its proof of correctness are flawed (though its implementation in MAFT is not). The flaw was detected through a failed attempt at formal verification by Lincoln and Rushby, who then developed a corrected algorithm [8] and a mechanically checked formal ....
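For reference, the unmodified Oral Messages algorithm OM(m) of Lamport, Shostak, and Pease, on which the Thambidurai and Park variant builds, can be simulated in a few lines. This sketch uses the classic model in which every faulty node may behave arbitrarily and n > 3m nodes are needed; it does not implement the hybrid fault model, the Thambidurai and Park modification, or the Lincoln and Rushby correction. The traitor behaviour in `send` (a value chosen by receiver parity) and the default value `RETREAT` are arbitrary choices for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

DEFAULT = "RETREAT"  # value used when no majority exists


def send(sender: int, receiver: int, value: str, traitors: Set[int]) -> str:
    # A loyal sender relays faithfully; a traitor (hypothetical behaviour)
    # sends conflicting values depending on the receiver's id.
    if sender in traitors:
        return "ATTACK" if receiver % 2 == 0 else "RETREAT"
    return value


def majority(values: List[str]) -> str:
    value, count = Counter(values).most_common(1)[0]
    return value if count * 2 > len(values) else DEFAULT


def om(m: int, commander: int, lieutenants: List[int], value: str,
       traitors: Set[int]) -> Dict[int, str]:
    """Return the value each lieutenant decides for the commander's order under OM(m)."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays what it received, acting as commander in OM(m-1).
    sub: Dict[int, Dict[int, str]] = {}
    for j in lieutenants:
        others = [lt for lt in lieutenants if lt != j]
        sub[j] = om(m - 1, j, others, received[j], traitors)
    # Step 3: each lieutenant takes the majority of its own value and the relayed values.
    decisions = {}
    for i in lieutenants:
        votes = [received[i]] + [sub[j][i] for j in lieutenants if j != i]
        decisions[i] = majority(votes)
    return decisions


print(om(1, 0, [1, 2, 3], "ATTACK", traitors={3}))  # {1: 'ATTACK', 2: 'ATTACK', 3: 'ATTACK'}
print(om(1, 0, [1, 2, 3], "ATTACK", traitors={0}))  # {1: 'RETREAT', 2: 'RETREAT', 3: 'RETREAT'}
```

With four nodes and m = 1, the loyal lieutenants follow the loyal commander's order despite one traitorous lieutenant, and when the commander itself is the traitor, all lieutenants still agree on a common value.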
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
....Each node participates in both the cluster-level and inter-cluster-level convergence and consistency functions. The architecture supports a variety of synchronization primitives, including those described in [3, 4, 12, 15, 11]. We assume variants of the algorithms used in MAFT [9]. Virtually all existing convergence algorithms are based on fully connected structures, in which each non-faulty participant derives the same set of clock values for all other nodes. The achievable synchronization skew for an N-node system is δ. The communication cost is O(N^2) and each ....
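Convergence functions of the kind mentioned above combine the clock readings that a node collects from all other nodes (N(N-1) messages per round in a fully connected system, hence the O(N^2) cost). Below is a minimal sketch of one well-known convergence function, the fault-tolerant midpoint. The snippet does not say which function MAFT uses, so treat this as a generic example rather than MAFT's algorithm; the function name `ft_midpoint` and the sample readings are hypothetical.

```python
from typing import Sequence


def ft_midpoint(readings: Sequence[float], f: int) -> float:
    """Fault-tolerant midpoint: drop the f largest and f smallest clock readings,
    then return the midpoint of the remaining extremes.

    With at most f faulty readings and len(readings) > 3f (the usual bound
    for arbitrary faults), the result stays within the range of correct readings.
    """
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2f readings to discard f from each end")
    trimmed = sorted(readings)[f:len(readings) - f]
    return (trimmed[0] + trimmed[-1]) / 2


# Four hypothetical clock readings (ms), one from a faulty clock:
print(ft_midpoint([10.0, 10.5, 9.75, 250.0], f=1))  # 10.25
```

Discarding the f extreme values on each side ensures that a single wildly wrong clock (250.0 here) cannot pull the correction outside the range of the correct readings.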
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
R. M. Kieckhafer, C. J. Walter, A. M. Finn, and P. M. Thambidurai. The MAFT architecture for distributed fault tolerance. IEEE Transactions on Computers, 37(4):398--405, April 1988.
CiteSeer.IST - Copyright Penn State and NEC