This paper proposes a novel way to use virtual memory-mapped communication (VMMC) to reduce failover time on clusters. With the VMMC model, an application's virtual address space can be efficiently mirrored in remote memory, either automatically or via explicit messages. When a machine fails, its applications can restart from their most recent checkpoints on the failover node with minimal memory copying and disk I/O overhead. This method requires little change to the applications' source code. We developed two fast failover protocols: the deliberate update failover protocol (DU) and the automatic update failover protocol (AU). The first can run on any system that supports VMMC, whereas the second requires special network interface support. We implemented the two protocols on two different clusters that support VMMC. Our results with three transaction-based applications show that both protocols work quite well. The deliberate update protocol imposes 4-21% overhead when taking checkpoints every 2 seconds. If an application can tolerate 20% overhead, this protocol can fail over to another machine within 4 milliseconds in the best case and in 0.1 to 3 seconds in the worst case. Failover performance can be further improved by using special network interface hardware. The automatic update protocol can take checkpoints every 0.1 seconds with only 3-12% overhead. If 10% overhead is allowed, it can fail over applications in 0.01 to 0.4 seconds in the worst case.
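The checkpoint-and-restart idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's DU or AU protocol: ordinary Python lists stand in for VMMC-mapped remote memory, and the class and method names (`MirroredMemory`, `write`, `checkpoint`, `failover`) are hypothetical. It shows explicit, deliberate-update-style mirroring in which only pages modified since the last checkpoint are copied to the backup.

```python
class MirroredMemory:
    """Toy model: a primary's pages mirrored into a backup's memory.

    Hypothetical illustration only; plain lists stand in for
    VMMC-mapped remote memory.
    """

    def __init__(self, num_pages, page_size=8):
        self.page_size = page_size
        # Primary's working copy of each page.
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        # Backup's copy, as of the last checkpoint.
        self.mirror = [bytes(page_size) for _ in range(num_pages)]
        # Pages written since the last checkpoint.
        self.dirty = set()

    def write(self, page, data):
        """Overwrite one whole page on the primary and mark it dirty."""
        if len(data) != self.page_size:
            raise ValueError("data must fill exactly one page")
        self.pages[page] = bytes(data)
        self.dirty.add(page)

    def checkpoint(self):
        """Copy only dirty pages to the mirror; return how many were sent."""
        sent = len(self.dirty)
        for page in self.dirty:
            self.mirror[page] = self.pages[page]
        self.dirty.clear()
        return sent

    def failover(self):
        """Return the state the backup restarts from: the last checkpoint."""
        return list(self.mirror)


mem = MirroredMemory(num_pages=4)
mem.write(1, b"AAAAAAAA")
mem.checkpoint()              # page 1 is now mirrored
mem.write(2, b"BBBBBBBB")     # written after the checkpoint, then "crash"
restart_state = mem.failover()
```

In this model, work done after the last checkpoint (the write to page 2) is lost at failover and would have to be redone, which is one reason a shorter checkpoint interval can shorten worst-case recovery at the cost of more checkpointing overhead.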