Clusters of workstations are increasingly viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters are complicated by the fact that the number of idle workstations available for executing parallel applications fluctuates constantly. In this paper, we present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space-sharing, a policy under which the number of processors allocated to an application can change during its execution. We describe an approach that uses application-level checkpointing and data repartitioning to support dynamic space-sharing and to handle the dynamic reconfiguration triggered when a failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space-sharing are quantified through a simulation study, and experimental results are presented for the overhead of dynamic reconfiguration of a grid-oriented data-parallel application using our approach.
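To make the data-repartitioning step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a grid that is distributed by contiguous blocks of rows (a 1-D block decomposition) and shows how rows would be redistributed when the number of processors changes from `old_procs` to `new_procs`, for example when a workstation is reclaimed by its owner. The function names `block_ranges`, `repartition_plan`, and `apply_plan` are hypothetical, and in-memory Python lists stand in for the messages or checkpoint files a real system would use.

```python
from typing import List, Tuple

def block_ranges(n_rows: int, n_procs: int) -> List[Tuple[int, int]]:
    """Half-open row ranges [start, end) for a 1-D block decomposition.
    The first n_rows % n_procs processes each receive one extra row."""
    base, extra = divmod(n_rows, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def repartition_plan(n_rows: int, old_procs: int,
                     new_procs: int) -> List[Tuple[int, int, int, int]]:
    """Transfers (src, dst, start, end): rows [start, end) move from old
    owner `src` to new owner `dst`. Ordered by dst, then by row index."""
    old = block_ranges(n_rows, old_procs)
    new = block_ranges(n_rows, new_procs)
    plan = []
    for dst, (ns, ne) in enumerate(new):
        for src, (os_, oe) in enumerate(old):
            lo, hi = max(ns, os_), min(ne, oe)  # overlap of the two ranges
            if lo < hi:
                plan.append((src, dst, lo, hi))
    return plan

def apply_plan(old_blocks: List[list], n_rows: int, old_procs: int,
               new_procs: int) -> List[list]:
    """Rebuild per-process blocks for the new configuration (a stand-in
    for exchanging rows over the network or reading them from a checkpoint)."""
    old = block_ranges(n_rows, old_procs)
    new_blocks: List[list] = [[] for _ in range(new_procs)]
    for src, dst, lo, hi in repartition_plan(n_rows, old_procs, new_procs):
        offset = old[src][0]  # convert global row index to local index
        new_blocks[dst].extend(old_blocks[src][lo - offset:hi - offset])
    return new_blocks
```

For example, shrinking a 12-row grid from 4 to 3 processors yields `repartition_plan(12, 4, 3) == [(0, 0, 0, 3), (1, 0, 3, 4), (1, 1, 4, 6), (2, 1, 6, 8), (2, 2, 8, 9), (3, 2, 9, 12)]`, so each new owner receives rows from at most two old owners. A block decomposition keeps every transfer contiguous, which suits grid-oriented codes; other distributions (e.g. block-cyclic) would need a different overlap computation.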