Truly-Transparent Checkpointing of Parallel Applications (1998) [9 citations — 0 self]
Abstract:
Checkpointing is a technique used for many purposes, including, but not limited to, supporting fault-tolerant applications, transaction rollback, and process migration. Many tools have been proposed to address these problems, but process migration is still poorly served, because either: (a) the tool is limited to a particular parallel programming library (PVM, MPI), (b) the checkpoint image is too large to be worth migrating, (c) checkpointing is limited to certain well-behaved applications, or (d) the checkpoint image must be saved to a file or sent to a centralized server instead of going directly to the target machine's memory. We developed a new tool, Epckpt, that fills this gap in process migration. Epckpt can: (a) checkpoint almost any kind of application, regardless of its behavior, (b) reduce the application's checkpoint image to its minimum size, (c) checkpoint fork-parallel applications, (d) checkpoint an application that was not designed to be checkpointed (i.e., it was neither recompiled nor relinked with any special library), and (e) send the checkpoint image directly to the target machine instead of to a server or a file. Our checkpoint tool is built into the Linux kernel. Results show that checkpointing adds at most 9% to an application's running time, including the cost of restarting it. Migration results show that total running time can be reduced by up to 77% under heavy load.
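Claim (b) concerns keeping the checkpoint image small. One common way to do this, assumed here purely for illustration and not necessarily Epckpt's actual method, is to save only a process's writable memory regions and skip read-only, file-backed regions such as program text and shared-library code, which the target machine can map again from the same files. The Python sketch that follows applies that hypothetical policy to text in the Linux `/proc/[pid]/maps` format. The names `Region`, `parse_maps_line`, `regions_to_save`, and the sample map are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One line of /proc/[pid]/maps: an address range and its permissions."""
    start: int
    end: int
    perms: str  # e.g. "rw-p" or "r-xp"
    path: str   # backing file, "[heap]"/"[stack]", or "" for anonymous memory

    @property
    def size(self) -> int:
        return self.end - self.start


def parse_maps_line(line: str) -> Region:
    # Layout: "start-end perms offset dev inode [pathname]"
    fields = line.split(maxsplit=5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) == 6 else ""
    return Region(int(start_hex, 16), int(end_hex, 16), fields[1], path)


def regions_to_save(maps_text: str) -> list[Region]:
    """Hypothetical minimization policy: keep only writable regions.

    Read-only regions (program text, shared-library code) are assumed to be
    re-mappable from identical files on the target machine, so they are skipped.
    """
    regions = [parse_maps_line(line) for line in maps_text.splitlines() if line.strip()]
    return [r for r in regions if "w" in r.perms]


# Usage on a made-up, abbreviated memory map.
SAMPLE_MAPS = """\
00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat
0060a000-0060b000 rw-p 0000a000 08:01 1234 /bin/cat
01e2c000-01e4d000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f00001c0000 r-xp 00000000 08:01 5678 /lib/libc.so.6
7ffd00000000-7ffd00021000 rw-p 00000000 00:00 0 [stack]
"""

saved = regions_to_save(SAMPLE_MAPS)
print([r.path for r in saved])             # ['/bin/cat', '[heap]', '[stack]']
print(sum(r.size for r in saved) // 1024)  # 268 (KiB kept in the image)
```

On this sample, only the data segment of `/bin/cat`, the heap, and the stack are kept: 268 KiB out of 2104 KiB mapped. A real tool would also have to handle read-only anonymous regions and files that differ between the source and target machines.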

