Powell, D., "Distributed Fault Tolerance: Lessons from Delta-4", IEEE Micro, Feb. 1994, pp. 36-47.
.... system kernel and employing memory mapping techniques (such as copy-on-write or incremental checkpointing) to make it efficient [1, 2]. The second approach is to implement checkpointing services as user-level routines, which can then be tailored to copy only the critical data of the application [3, 4, 5]. Although both approaches are complementary, there is no implementation that combines their advantages, since current operating system kernels do not provide suitable support for implementing checkpointing and rollback recovery at the application layer. As it is suggested in [6] incremental and ....
....of replicated data, other requirements of fault-tolerant applications should also be considered. In particular, when using modular redundancy, tasks such as voting, selection or reordering of messages are performed either by library code at senders or receivers, or by intermediate processes [14, 5, 15, 2, 16]. Hence, it is important that IMs can be handled efficiently by these intermediate agents. Furthermore, reordering of messages by library code is not possible with IMs as they have been described so far, since the data copy is updated when the message is received. To cope with these requirements, ....
D. Powell, "Distributed Fault Tolerance: Lessons from Delta-4", IEEE Micro, February 1994, pp. 36-47.
....and co-located with replicas of the distributed registry DR. This excludes the possibility that partitions separate RM and DR replicas, which would prevent the system from making progress since RM depends on DR. To ensure consistent behavior of the replicated RM, a semi-active replication [25] scheme is used, together with a state merging protocol to reconcile divergent states. As mentioned in Section 2, RM provides an external interface through which object groups can be installed. RM exploits this mechanism to bootstrap itself, making use of the replica distribution scheme to obtain ....
....will become clear in Sections 4.6 and 4.8. Delivery of these events does not require any total ordering guarantees among the RM replicas, making their implementation very simple. As mentioned in Section 4.2, RM uses a semi-active-like replication scheme for rendering it fault-tolerant [25]. In this scheme, all RM replicas receive method invocations, but only the leader replica actually performs actions that produce output. That is, only a single RM replica will actually create group object replicas on behalf of applications, while the state of follower RM replicas are kept ....
D. Powell. Distributed Fault Tolerance: Lessons from Delta-4. IEEE Micro, pages 36--47, Feb. 1994.
....tolerance of node failures. Network-level fault tolerance is based on dependable distributed computing technologies. The aim of the section is to view these as competing options for achieving dependability in smart networks. Hence, (too) little attention is paid to the in-between systems, e.g. [34, 30], which use a specially designed transport network to ensure consistency between replicas. 4.1 Fault-tolerant network nodes This strategy is illustrated in Figure 10, where each node in the network, and the services it delivers, are made fault tolerant by having redundant computing and/or ....
David Powell, "Distributed Fault Tolerance: Lessons from Delta-4", IEEE Micro, February 1994, pp. 36-47.
....use objects. Some US approaches have not taken the object approach, but have focused on structured development for fault tolerance [160]. A middleware approach was adopted by the European Delta-4 project for tolerating hardware faults, using passive, semi-active and active replication of objects [157]. The MIT Actor model similarly pursued a software implementation for tolerance of hardware faults by extending an active object model [5]. Work from the PDCS2 project [164, 219] is SOTA as it improves on other approaches by tolerating design faults too. The SOTA work on object-oriented ....
D. Powell. Distributed Fault-Tolerance - Lessons from Delta-4. IEEE Micro, 14(1), 1994.
....units do not know whether the requests are legitimate or not, they have to store the requests till they have accumulated a majority of identical requests. This can easily flood the memory and create storage problems. Delta-4 was designed to support concurrent task execution by a different approach [1, 12]. It uses a special hardware network interface unit [12] to prevent an inconsistent message from being sent to different processors, thus limiting the scope of malicious behaviors. However, the scheme requires the support of an atomic broadcast that is relatively expensive to implement. An ....
....they have to store the requests till they have accumulated a majority of identical requests. This can easily flood the memory and create storage problems. Delta-4 was designed to support concurrent task execution by a different approach [1, 12]. It uses a special hardware network interface unit [12] to prevent an inconsistent message from being sent to different processors, thus limiting the scope of malicious behaviors. However, the scheme requires the support of an atomic broadcast that is relatively expensive to implement. An alternative approach is the leader-follower approach [1]. It ....
D. Powell, "Distributed fault tolerance: Lessons from Delta-4," IEEE Micro, Feb. 1994, pp. 36-47.
....to contain faults and to crash silently if a fault should occur on the computer. 3 Related Work The most common way to make important programs like system management objects dependable is to use replicas that operate on more than one computer in the system. This method is used in the Delta-4 [2, 3, 4, 5] system developed under the European Strategic Programme for Research in Information Technology (ESPRIT). OSF DCE [6] uses replicas of the components, although their use is not only for dependability. Primarily, they are used as caches. Other systems deploying replicas have been reviewed in [7] ....
D. Powell. Distributed Fault Tolerance: Lessons from Delta-4. IEEE Micro, pp.36--47, February 1994.
....problems caused by replicated component interactions are thus avoided. Server replication client replication Fortunately, some systems like MANETHO [Elnozahy 92], extended AMOEBA [Wood 93], Many-to-Many Remote Procedure Call [Welling 92], CopyCat [Ladin 92], CIRCUS [Cooper 85] and DELTA-4 [Powell 94] consider both server replication and client replication. However, most of these systems focus on a single replication policy. MANETHO and recent work in AMOEBA implement support for the coordinator-cohort [Birman 85] replication policy. Many-to-Many Remote Procedure Call focuses on passive ....
D. Powell. Distributed Fault Tolerance: Lessons from Delta-4. IEEE Micro, pages 37--47, February 1994.
.... the close interrelationships between solutions to agreement problems and broadcast capabilities are the Advanced Automation System [Benel et al. 1989, Cristian et al. 1990], Amoeba [Tanenbaum et al. 1990, Kaashoek & Tanenbaum 1991], Consul [Mishra et al. 1992, Mishra et al. 1993], Delta-4 [Powell et al. 1988, Powell 1994], Horus [van Renesse et al. 1995, van Renesse et al. 1996], Newtop [Ezhilchelvan et al. 1994]. In fact, they represent systems where the service layers of broadcast communication and membership maintenance are intertwined together. 4. Synchronous systems In a synchronous system there exist ....
D. Powell, "Distributed Fault Tolerance: Lessons from Delta-4", IEEE Micro, 14 (1), pp.36-47, February 1994.
....re-use of already validated components in different instances; and the support of system and application components of different criticalities. Drawing on experience from systems such as SIFT [Melliar-Smith & Schwartz 1982], MAFT [Kieckhafer et al. 1988], FTPP [Harper & Lala 1990] and Delta-4 [Powell 1994], the generic architecture is currently defined along three axes (Figure 1) [Powell 1996]: the channel axis: channels provide the primary hardware fault containment regions; it should be possible to configure instances of the architecture with 1 to 4 channels; the intra-channel or ....
D. Powell, "Distributed Fault-Tolerance --- Lessons from Delta-4", IEEE Micro, 14 (1), pp.36-47, February 1994.
....focus validation obligations on a minimum set of critical components; re-use of already validated components in different instances; and the support of system and application components of different criticalities. Drawing on experience from systems such as SIFT [6], MAFT [5], FTPP [4] and Delta-4 [7], the generic architecture is currently defined along three axes (Figure 1) [8]: • the channel axis: channels provide the primary hardware fault containment regions; it should be possible to configure instances of the architecture with 1 to 4 channels; • the intra-channel or multiplicity ....
D. Powell, "Distributed Fault-Tolerance - Lessons from Delta-4", IEEE Micro, 14 (1), pp.36-47, February 1994.
Powell, D., "Distributed Fault Tolerance: Lessons from Delta-4", IEEE Micro, Feb. 1994, pp. 36-47.
D. Powell. Distributed fault tolerance: Lessons from delta-4. IEEE Micro, 14(1):36--47, February 1994.
D. Powell, "Distributed Fault-Tolerance---Lessons from Delta-4," IEEE Micro, vol. 14, no. 1, pp. 36-47, Feb. 1994.