P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....are bounded. pendability (e.g. broadcast multicast, bounded number of omission errors, bounded transmission delay). Although designed for LANs, xAMp does not depend on a given local area network in particular. This was achieved by defining an abstract network interface, discussed in detail in [18]. We summarize its properties in table 2. Having our protocols tuned for LANs does not mean we have overlooked the problem of interconnected networks. We argue that in an interconnected networking scenario protocols can be more efficient if they rely on low-level local protocols that recognize ....
....the problem of interconnected networks. We argue that in an interconnected networking scenario protocols can be more efficient if they rely on low-level local protocols that recognize important properties of the local networks. Our work has provided efficient solutions for the local scope [18], that we are now extending to interconnected networks [22]. Protocol design assumes that communication components have a fail-silent behavior. When high coverage is required, the use of self-checking components must substantiate this assumption. Tests performed in the Delta 4 project have shown ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....Sect. 5 summarizes and concludes the paper. 2 Overview of the Electra Toolkit Currently, several distributed control technologies (DCT) exist, mostly in the form of C programming libraries, which aid programmers in building reliable, directly distributed systems. Examples are Amoeba [23], Delta 4 [26], Horus [25], Isis [5], Psync [18], and Transis [1]. However, the methods and tools offered by such DCT are often difficult to use by people lacking many years of experience in building distributed systems, since the provided functions are mostly low-level and not expressive enough for modelling ....
Veríssimo, P., and Marques, J. A. Reliable Broadcast for Fault-Tolerance on Local Computer Networks. In 9th Symposium on Reliable Distributed Systems (1990), IEEE.
....are delivered. For example, processes may have to deliver all messages in the same order. Systems and applications based on fault-tolerant broadcasts include SIFT [WLG 78], State Machines [Lam78a,Sch90], Atomic Commitment [BT93], Isis [BJ87,BCJ 90], Psync [PBS89], Amoeba [Kaa92], Delta 4 [VM90], Transis [ADKM92], Highly Available System [Cri87], and Advanced Automation System [CDD90]. Another paradigm that simplifies the task of designing fault-tolerant distributed applications is Consensus. Roughly speaking, Consensus allows processes to reach a common decision that depends on their ....
....takes place over a single shared channel that connects all processes. In such a network a process can broadcast a message to all other processes. Examples are Ethernet, Token Bus, Token Ring, and FDDI networks. Other types of networks include redundant broadcast channel networks (e.g. Delta 4 [VM90] and [Cri90b]), packet radio networks (e.g. ALOHA [Abr85]), switch-based networks (e.g. AN2 [Owi93]), etc. Many of the results in this paper are independent of the type of communication network. When we need to focus on a particular type of network we concentrate on point-to-point ones. This ....
Paulo Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE.
....example, we call the send time variance, $\Delta\Gamma_{send}$, the send error. In order not to depend on a particular network, the best approach is to define an abstract broadcast network, such that standard local area networks or their variants are represented [15, 16, 8, 21]. The network model of [36] is followed, though modified to be more generic. Properties: BNP 1 (Broadcast): Nodes receiving an uncorrupted message transmission receive the same message. BNP 2 (Error detection): Nodes detect any corruption done by the network in a locally received message and discard it. The network is ....
....select only one broadcast. No particular protocol is required, as long as election is reached in a known bounded time. Fault-tolerant agreement protocols are well known and can be easily found in the literature, although existing reliable broadcast protocols for broadcast networks are recommended [36, 7]. The election will be detailed ahead in section 4.3. 4.2 Achieving precision The first phase of the algorithm (Figure 5) is very similar to the algorithm of [35]. Let us further define: $T$, re-synchronization period; $r_q$, next synchronization round, from $q$'s perspective a round $r_q$ starts ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the 9th Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....(iii) the results presented here can be extended to arbitrary failures, by using signatures [28] and redundant broadcast channels. The network is a single-channel broadcast local network, as detailed ahead, with the following failure semantics: • the network components: i) are weak fail-silent [29], confined to crash if they exceed a given number of omission failures (otherwise behaving correctly); ii) have a bound $f_o$ on the number of omission failures they can produce during a protocol execution. It is possible to put a bound on the time to send a message, to process a received message ....
....[Figure 2. Network timing properties] In order not to depend on a particular network, the best approach is to define an abstract broadcast network, such that standard local area networks or their variants are represented [9, 10, 6, 15]. The network model of [29] is followed, though modified to be more generic. The abstract network components are: the channel, which comprises the passive medium and the interfacing electronics; and the adapter, comprising the low-level network protocols, implemented partly in VLSI partly in firmware. The abstract ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....generate a simultaneous event at all correct processes in the system. In order not to depend on a particular network, the best approach is to define an abstract broadcast network, such that standard local area networks or their variants are represented [9,10,6,15]. The network model of [23] is followed, though modified to be more generic. The abstract network components are: the channel, which comprises the passive medium and the interfacing electronics; and the adapter, comprising the low-level network protocols, implemented partly in VLSI partly in firmware. The abstract broadcast ....
....[Figure 1: Detecting a simultaneous broadcast] ....and can be easily found in the literature, although existing reliable broadcast protocols for broadcast networks are recommended [23,5]. Given that any simultaneous broadcast will do, agreement may be started immediately after detection of the first simultaneous broadcast. 5.2 Achieving precision The first phase of the algorithm (figure 2) is very similar to the algorithm of [22]. When $vc^{i-1}_m(t)$ $iT$, processor $m$ ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....be enhanced to recover from the others, if reliable real time operation is desired. However, the recovery process takes time, so in the meantime the LAN is partitioned. Let us call them periods of inaccessibility, to differentiate from classical partitions. The definition of inaccessibility in [9] is summarised here: Certain kinds of components may temporarily refrain from providing service, without that having to be necessarily considered a failure. That state is called inaccessibility. It can be made known to the users of the component; limits are specified (duration, rate) ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....based on a very simple idea: if one knows for how long a network is partitioned, and if those periods are acceptably short, real time operation of the system is possible. Let us call them periods of inaccessibility, to differentiate from classical partitions. The definition of inaccessibility in [13] is summarised here: Certain kinds of components may temporarily refrain from providing service, without that having to be necessarily considered a failure. That state is called inaccessibility. It can be made known to the users of the component; limits are specified (duration, rate) ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....at the tail of the FIFO queue until consumed by the state machine, however urgent they may be. A classical example of an agreement protocol is the well-known Byzantine Agreement [9], although many other suitable protocols exist under the more general classification of reliable broadcast protocols [5,1,17]. The requirement for state machine command preemption can be avoided by appropriate application redesign, as discussed in the next section. Based on the experience of the authors with the DELTA 4 system [12], this is undesirable, since it is a case-by-case approach, likely to introduce ....
P. Veríssimo and José A. Marques. Reliable Broadcast for Fault-Tolerance on Local Computer Networks. Technical Report RT/67-89, revised with nr RT/14-90, INESC, Lisboa, Portugal, December 1989.
....based on a very simple idea: if one knows for how long a network is partitioned, and if those periods are acceptably short, real time operation of the system is possible. Let us call them periods of inaccessibility, to differentiate from classical partitions. The definition of inaccessibility in [14] is summarised here: Certain kinds of components may temporarily refrain from providing service, without that having to be necessarily considered a failure. That state is called inaccessibility. It can be made known to the users of the component; limits are specified (duration, rate) violation ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....generate a simultaneous event at all correct processes in the system. In order not to depend on a particular network, the best approach is to define an abstract broadcast network, such that standard local area networks or their variants are represented [9,10,6,15]. The network model of [24] is followed, though modified to be more generic. The abstract network components are: the channel, which comprises the passive medium and the interfacing electronics; and the adapter, comprising .... (Footnote 8: Obviously taken in the sense of accuracy preservation, which for internal synchronization means ....)
....The algorithm does not depend on any particular protocol, as long as agreement is reached in a known bounded time. Fault-tolerant agreement protocols are well known and can be easily found in the literature, although existing reliable broadcast protocols for broadcast networks are recommended [24,5]. Given that any simultaneous broadcast will do, agreement may be started immediately after detection of the first simultaneous broadcast. 5.2 Achieving precision The first phase of the algorithm (figure 2) is very similar to the algorithm of [23]. When $vc^{i-1}_m(t)$ $iT$, processor $m$ ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....based on a very simple idea: if one knows for how long a network is partitioned, and if those periods are acceptably short, real time operation of the system is possible. Let us call them periods of inaccessibility, to differentiate from classical partitions. The definition of inaccessibility in [14] is summarised here: Certain kinds of components may temporarily refrain from providing service, without that having to be necessarily considered a failure. That state is called inaccessibility. It can be made known to the users of the component; limits are specified (duration, rate) violation ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....they do not tolerate partitions in general. It is proposed to admit a class of controlled partitions, which can be tolerated with the necessary measures. 3 The Abstract Network Model A model for a network displaying reliable real time operation has been advanced in [8] and laid down in [9]. It was called the abstract network. The idea is to consider a set of networks of a given type (LANs in the case) and be able to define a set of common properties abstracting from their physical particularities. The abstract network thus forms a low level service, useful to build complex ....
....by the recipient are consecutive; in the case of consecutive transmitter failures, these may be interleaved with good transmissions from other points, from the recipient's viewpoint. (Footnote 7: We call omission degree (Od) the number of consecutive omissions produced by a component.) single component [9]. A number of ways of handling omission failures are possible. Two alternative ways, based respectively on detection recovery and masking of omission errors, are presented in figure 1. If $k$ is the maximum omission degree as per An2, then $NrTries = k + 1$. The detection recovery algorithm is ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Ninth Symposium on Reliable Distributed Systems, IEEE, Huntsville, Alabama-USA, October 1990. Also as INESC AR/24-90.
....and technology attributes of LANs that can be used to achieve improved performance and dependability. The MGS benefits from this low-level approach without compromising openness by defining an abstract network interface with the properties presented in table 1. The interface, discussed in detail in [13], abstracts the useful communication properties that are common to most existing LANs, ensuring MGS portability. Protocol design assumes that communication components have a fail-silent behavior. That is, a processor fails by stopping producing outputs but never produces an erroneous output. Tests ....
....order to meet high expectations with regard to fault tolerance and real time. The highly reliable and timely environment yielded by a single LAN used in a closed fashion had also to do with the LAN-based approach taken. We carefully devised a dependability model and established its correctness in [13], for such an environment. The MGS protocol, although clock-less (it is not based on synchronized clocks), is synchronous, in the sense that known and bounded execution times are enforced, using the techniques described in [14]. Here we briefly enumerate the major requirements to achieve ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....and dependability (e.g. broadcast multicast, bounded number of omission errors, bounded transmission delay). Although designed for LANs, xAMp does not depend on a given local area network in particular. This was achieved by defining an abstract network interface, discussed in detail in [29]. We recapitulate its properties here, in table 2. Having our protocols tuned for LANs does not mean we have overlooked the problem of interconnected networks. We argue that in an interconnected networking scenario protocols can be more efficient if they rely on low-level local protocols that ....
....the problem of interconnected networks. We argue that in an interconnected networking scenario protocols can be more efficient if they rely on low-level local protocols that recognize important properties of the local networks. Our work has provided efficient solutions for the local scope [29], that we are now extending to interconnected networks [34]. Protocol design assumes that communication components have a fail-silent behavior. When high coverage is required, the use of self-checking components must substantiate this assumption. Tests performed in the Delta 4 project have shown ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....can be measured by its steadiness, the greatest difference between delivery times observed at one site, and its tightness, the greatest difference between delivery times observed in one execution. According to this, there is a spectrum from tightly synchronous [14] through loosely synchronous [37], to asynchronous protocols [29,5], depending on whether those differences are large or small, compared to the execution time, or even not bounded at all. From the degree of synchronism depend not only real-time but also ordering capabilities [40]. In conclusion, a group communication subsystem ....
....and participant failure detection, one narrows the domain under the reach of the FLP result. Site failure detection remains unreliable, whereas participant failure detection, performed locally, can be made reliable. Incidentally, in the field of global clock-less, acknowledgement-based protocols [9,4,29,37], the site and participant membership functions have often been aggregated. This separation makes it clear that the requirements of these protocols, to do correct inter-site communication, pertain to site membership management (e.g. if the protocol uses acknowledges, it is necessary to have a ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.
....and steady in the measure of the clock precision, which is normally a very low figure compared with the delivery time. The other end of the spectrum of synchronous protocols is represented by clock-less protocols which, though not using clocks, display a bounded and known message delivery time [24]. Practically all known clock-less protocols are asynchronous (e.g. [21,5]). A group communications subsystem should have a number of services, each formed by a combination of some of the properties enumerated. For example, the combination of total order with unanimity yields what is called an ....
P. Veríssimo and José A. Marques. Reliable broadcast for fault-tolerance on local computer networks. In Proceedings of the Ninth Symposium on Reliable Distributed Systems, Huntsville, Alabama-USA, October 1990. IEEE. Also as INESC AR/24-90.