by Robbert Van Renesse, Yaron Minsky, Mark Hayden
Service,” Proc. Conf. Middleware
http://www.cs.cornell.edu/Info/People/rvr/papers/pfd/pfd.ps
Abstract:
Failure detection is valuable for system management, replication, load balancing, and other distributed services. To date, failure detection services have scaled poorly with the number of members being monitored. This paper describes a new protocol based on gossiping that does scale well and provides timely detection. We analyze the protocol, and then extend it to discover and leverage the underlying network topology for much improved resource utilization. We then combine it with another protocol, based on broadcast, that is used to handle partition failures.
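The abstract names the technique but does not spell out the mechanism. As a rough illustration only, the sketch below assumes the usual heartbeat-counter form of gossip-based failure detection: each member keeps a table of the highest heartbeat it has seen per member, periodically sends that table to one randomly chosen peer, and suspects any member whose heartbeat has not increased for longer than a timeout. The names `Member`, `gossip_round`, and `t_fail` are hypothetical, time is counted in abstract ticks, and the topology-aware and broadcast-based extensions mentioned in the abstract are not modelled.

```python
import random


class Member:
    """One participant in a hypothetical gossip-style failure detector."""

    def __init__(self, name, t_fail):
        self.name = name
        self.t_fail = t_fail  # assumed timeout, in ticks
        # name -> (highest heartbeat seen, local time it last increased)
        self.table = {name: (0, 0)}

    def tick(self, now):
        """Increment this member's own heartbeat counter."""
        beat, _ = self.table[self.name]
        self.table[self.name] = (beat + 1, now)

    def merge(self, other_table, now):
        """Adopt every heartbeat that is newer than the one held locally."""
        for name, (beat, _) in other_table.items():
            if name not in self.table or beat > self.table[name][0]:
                self.table[name] = (beat, now)

    def suspects(self, now):
        """Members whose heartbeat has not increased for more than t_fail ticks."""
        return sorted(
            name
            for name, (_, seen) in self.table.items()
            if name != self.name and now - seen > self.t_fail
        )


def gossip_round(members, now, rng):
    """Each live member bumps its heartbeat and gossips to one random peer.

    Crashed members are modelled by leaving them out of `members`;
    the list must contain at least two members.
    """
    for m in members:
        m.tick(now)
        peer = rng.choice([p for p in members if p is not m])
        peer.merge(m.table, now)


if __name__ == "__main__":
    rng = random.Random(0)
    group = [Member(f"m{i}", t_fail=12) for i in range(8)]
    for now in range(1, 11):          # everyone alive for 10 rounds
        gossip_round(group, now, rng)
    for now in range(11, 41):         # the last member is silent from round 11
        gossip_round(group[:-1], now, rng)
    print(group[0].suspects(40))
```

Because each member contacts only one peer per round, the per-member message load in this sketch does not grow with group size. The price is that news of a heartbeat spreads probabilistically, so the timeout has to be long enough for it to reach everyone with high probability.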
Citations
1072 | Impossibility of distributed consensus with one faulty process | Fischer, Lynch, et al. | 1985
416 | Epidemic algorithms for replicated database maintenance | Demers, Greene, et al. | 1987
359 | Horus: A flexible group communication system | Renesse, Birman, et al. | 1996
344 | Transis: a communication sub-system for high availability | Amir, Dolev, et al. | 1992
309 | The weakest failure detector for solving consensus | Chandra, Hadzilacos, et al. | 1996
159 | A Gossip-Style Failure Detection Service | Renesse, Minsky, et al. | 1998
145 | Bimodal multicast | Birman, Hayden, et al. | 1999
122 | The Mathematical Theory of Infectious Diseases and its Applications | Bailey | 1975
105 | The Design and Analysis of Algorithms | Kozen | 1992
48 | Group membership in the epidemic style | Golding, Taylor | 1992
32 | Building adaptive systems using Ensemble. Software: Practice and Experience | Renesse, Birman, et al. | 1998
24 | Gossips and telephones | Baker, Shostak | 1972
17 | Impossibility of group membership in asynchronous systems | Chandra, Hadzilacos, et al. | 1995
12 | A reliable ordered delivery protocol for interconnected local-area networks | Agarwal, Moser, et al. | 1995