Fault-aware scheduling for bag-of-tasks applications on desktop grids, in GRID ’06
- IEEE Computer Society
Cited by 25 (2 self)
Abstract: Desktop Grids have proved to be a suitable platform for the execution of Bag-of-Tasks applications but, being characterized by a high resource volatility, require the availability of scheduling techniques able to effectively deal with resource failures and/or unplanned periods of unavailability. In this paper we present a set of fault-aware scheduling policies that, rather than just tolerating faults as done by traditional fault-tolerant schedulers, exploit the information concerning resource availability to improve application performance. The performance of these strategies has been compared via simulation with that attained by traditional fault-tolerant schedulers. Our results, obtained by considering a set of realistic scenarios modeled after real Desktop Grids, show that our approach results in better application performance and resource utilization.
Fault-Tolerant Scheduling for Bag-of-Tasks Grid Applications
- Proc. of the 2005 European Grid Conference (EuroGrid 2005). Lecture Notes in Computer Science
, 2005
Cited by 17 (4 self)
Abstract. In this paper we propose a fault-tolerant scheduler for Bag-of-Tasks
RIDGE: Combining Reliability and Performance in Open Grid Platforms
, 2007
Cited by 9 (0 self)
Large-scale donation-based distributed infrastructures need to cope with the inherent unreliability of participant nodes. A widely-used work scheduling technique in such environments is to redundantly schedule the outsourced computations to a number of nodes. We present the design and implementation of RIDGE, a reliability-aware system which uses a node’s prior performance and behavior to make more effective scheduling decisions. We have implemented RIDGE on top of the BOINC distributed computing infrastructure and have evaluated its performance on a live testbed consisting of 120 PlanetLab nodes. Our experimental results show that RIDGE is able to match or surpass the throughput of the best vanilla BOINC configuration under different reliability environments, by automatically adapting to the characteristics of the underlying environment. In addition, RIDGE is able to provide much lower workunit makespans compared to BOINC, which indicates its desirability in service-oriented environments with time constraints.
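The redundant-scheduling idea in this abstract can be illustrated with a minimal sketch. It is not RIDGE's actual algorithm, which the abstract does not specify. It assumes that each node returns a correct result independently with a known probability, and it picks the smallest replica count for which at least one correct result is likely enough. The function name `replicas_needed` and its parameters are hypothetical.

```python
def replicas_needed(node_reliability: float, target: float, max_replicas: int = 64) -> int:
    """Smallest r such that 1 - (1 - p)**r >= target.

    Assumption (not from the abstract): nodes fail independently and each
    returns a correct result with probability `node_reliability`.
    """
    if not 0.0 < node_reliability <= 1.0:
        raise ValueError("node_reliability must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")

    all_fail = 1.0  # probability that every replica so far has failed
    for r in range(1, max_replicas + 1):
        all_fail *= 1.0 - node_reliability
        if 1.0 - all_fail >= target:
            return r
    raise ValueError("target not reachable within max_replicas")


if __name__ == "__main__":
    # Less reliable nodes need more replicas for the same target.
    for p in (0.5, 0.75, 0.95):
        print(p, replicas_needed(p, target=0.9))
```

Under this assumption, a reliability-aware scheduler can use fewer replicas on nodes with a good track record, which is one plausible way that adapting to node behavior could improve throughput.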
The effectiveness of threshold-based scheduling policies in BOINC projects
- In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Technologies (eScience
, 2006
Cited by 8 (7 self)
Several scientific projects use BOINC (Berkeley Open Infrastructure for Network Computing) to perform large-scale simulations using volunteers’ computers (workers) across the Internet. In general, the scheduling of tasks in BOINC uses a First-Come-First-Serve policy and no attention is paid to workers’ past performance, such as whether or not they have tended to perform tasks promptly and correctly. In this paper we use SimBA, a discrete-event Simulator of BOINC Applications, to study new threshold-based scheduling strategies for BOINC projects that use availability and reliability metrics to classify workers and distribute tasks according to this classification. We show that if availability and reliability thresholds are selected properly, then the workers’ throughput of valid results increases significantly in BOINC projects.
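A threshold-based policy of the kind described here might look like the sketch below. The abstract does not define the two metrics, so the definitions in the comments (fraction of assigned tasks returned on time, fraction of returned results that were valid) are assumptions. The names `Worker` and `eligible_workers` are hypothetical and are not part of BOINC or SimBA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    name: str
    availability: float  # assumed: fraction of assigned tasks returned on time, in [0, 1]
    reliability: float   # assumed: fraction of returned results that were valid, in [0, 1]


def eligible_workers(workers: List[Worker],
                     min_availability: float,
                     min_reliability: float) -> List[Worker]:
    """Keep only workers that meet both thresholds; the rest receive no new tasks."""
    return [
        w for w in workers
        if w.availability >= min_availability and w.reliability >= min_reliability
    ]


if __name__ == "__main__":
    pool = [
        Worker("a", availability=0.95, reliability=0.99),
        Worker("b", availability=0.40, reliability=0.99),
        Worker("c", availability=0.90, reliability=0.50),
    ]
    print([w.name for w in eligible_workers(pool, 0.8, 0.9)])
```

Setting both thresholds to 0 accepts every worker, so the filter itself never excludes anyone. Raising them trades a smaller worker pool for fewer late or invalid results, which is why the abstract stresses choosing the thresholds properly.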
Metrics for Effective Resource Management in Global Computing Environments
- IEEE International Conference on e-Science and Grid Technologies (eScience 2005)
, 2005
Cited by 7 (1 self)
Global computing uses Internet-connected PCs volunteered by their owners. These PCs are diverse, volatile, and error-prone. Sophisticated scheduling methods commonly applied in Grid computing may not be sufficiently scalable and flexible for global computing environments. This paper shows that it is possible to classify global computing hosts based on simple metrics such as availability and reliability, and that it is efficient to assign tasks to such hosts accordingly. The proposed classification of workers is applied to
Resource failure prediction for fine-grained cycle sharing
, 2005
Cited by 5 (1 self)
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users of a host. A characteristic of such resources is that they are generally provided voluntarily and their availability fluctuates highly. Guest jobs may incur resource failures because of unexpected resource unavailability. To provide fault tolerance to guest jobs without adding significant computational overhead, failure prediction is required. This paper presents a method to predict resource failures in FGCS systems. It applies a semi-Markov Process and is based on a novel failure model, combining generic hardware-software failures with domain-specific failures in FGCS. We describe the failure prediction framework and its implementation in a production FGCS system named iShare. Through the experiments on an iShare testbed, we demonstrate that the prediction achieves accuracy above 86% on average and outperforms linear time series models, while the computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource failures.
Resource availability prediction in fine-grained cycle sharing systems
- In Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing
, 2006
Cited by 5 (1 self)
Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of computational resources available on the Internet. In FGCS, host computers allow guest jobs to utilize the CPU cycles if the jobs do not significantly impact the local users of a host. A characteristic of such resources is that they are generally provided voluntarily and their availability fluctuates highly. Guest jobs may fail because of unexpected resource unavailability. To provide fault tolerance to guest jobs without adding significant computational overhead, future resource availability must be predicted. This paper presents a method for resource availability prediction in FGCS systems. It applies a semi-Markov Process and is based on a novel resource availability model, combining generic hardware-software failures with domain-specific resource behavior in FGCS. We describe the prediction framework and its implementation in a production FGCS system named iShare. Through the experiments on an iShare testbed, we demonstrate that the prediction achieves accuracy above 86% on average and outperforms linear time series models, while the computational cost is negligible. Our experimental results also show that the prediction is robust in the presence of irregular resource unavailability.
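In a semi-Markov model, the chance of leaving a state depends on how long the process has already been in it. The sketch below illustrates only that single ingredient, as a much simpler stand-in for the paper's multi-state model, which the abstract does not detail. It estimates from past availability periods the probability that a host which has already been available for `elapsed` hours stays available for `window` more. The function name and the use of hours are assumptions.

```python
from typing import Optional, Sequence


def prob_stays_available(past_durations: Sequence[float],
                         elapsed: float,
                         window: float) -> Optional[float]:
    """Empirical P(duration >= elapsed + window | duration >= elapsed).

    `past_durations` holds the lengths (in hours, by assumption) of earlier
    uninterrupted availability periods of the same host. Returns None when
    no past period lasted at least `elapsed`, so there is nothing to condition on.
    """
    survivors = [d for d in past_durations if d >= elapsed]
    if not survivors:
        return None
    still_up = sum(1 for d in survivors if d >= elapsed + window)
    return still_up / len(survivors)


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5, 6, 7, 8]
    print(prob_stays_available(history, elapsed=2, window=2))
```

A scheduler could compare this estimate against a guest job's expected running time before placing it on the host.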
Scheduling on the grid via multi-state resource availability prediction
- In: 9th IEEE/ACM International Conference on Grid Computing, 2008
, 2008
Cited by 4 (2 self)
To make the most effective application placement decisions on volatile large-scale heterogeneous Grids, schedulers must consider factors such as resource speed, load, and reliability. Including reliability requires availability predictors, which consider different periods of resource history, and use various strategies to make predictions about resource behavior. Prediction accuracy significantly affects the quality of the schedule, as does the method by which schedulers combine various factors, including the weight given to predicted availability, speed, load, and more. This paper explores the question of how to consider predicted availability to improve scheduling, concentrating on multi-state availability predictors. We propose and study several classes of schedulers, and a method for combining factors. We characterize the inherent tradeoff between application makespan and the number of evictions due to failure, and demonstrate how our schedulers can navigate this tradeoff under various scenarios. We vary application load and length, and the percentage of jobs that are checkpointable. Our results show that the only other multi-state prediction-based scheduler causes up to 51% more evicted jobs while simultaneously increasing average job makespan by 18% when compared with our scheduler.
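One simple way to combine predicted availability, speed, and load is a weighted linear score, sketched below. The abstract does not give the paper's combination method, so the linear form, the default weights, the normalization of every factor to [0, 1], and the names `Resource`, `score`, and `pick` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Resource:
    name: str
    predicted_availability: float  # assumed normalized to [0, 1]
    relative_speed: float          # assumed normalized to [0, 1], 1 = fastest
    load: float                    # assumed normalized to [0, 1], 1 = fully loaded


def score(r: Resource, w_avail: float = 0.5, w_speed: float = 0.3,
          w_load: float = 0.2) -> float:
    """Higher is better: reward availability and speed, penalize load."""
    return (w_avail * r.predicted_availability
            + w_speed * r.relative_speed
            - w_load * r.load)


def pick(resources: Iterable[Resource], **weights: float) -> Resource:
    """Choose the resource with the highest weighted score."""
    return max(resources, key=lambda r: score(r, **weights))


if __name__ == "__main__":
    pool = [
        Resource("fast-flaky", predicted_availability=0.4, relative_speed=1.0, load=0.1),
        Resource("slow-steady", predicted_availability=0.95, relative_speed=0.5, load=0.1),
    ]
    print(pick(pool).name)
    print(pick(pool, w_avail=0.0, w_speed=1.0, w_load=0.0).name)
```

Shifting weight from speed to availability moves placements toward hosts that are less likely to evict a job, at the cost of longer runs. This is the makespan-versus-evictions tradeoff the abstract describes.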
Managing Opportunistic and Dedicated Resources in a Bi-modal Service Deployment Architecture
, 2007
Cited by 4 (2 self)
This document is dedicated to my father who inspired the never-ending quest for knowledge in my life.
Strategies to Create Platforms for Differentiated Services from Dedicated and Opportunistic Resources
- J. PARALLEL AND DISTRIBUTED COMPUTING
, 2007
Cited by 3 (3 self)
This paper proposes a new platform for implementing services in future service-oriented architectures. The basic premise of our proposal is that by combining a large volume of uncontracted resources with small clusters of dedicated resources, we can dramatically reduce the amount of dedicated resources while the goodput provided by the overall system remains at a high level. This paper presents particular strategies for implementing this idea for a particular class of applications. We performed very detailed simulations on synthetic and real traces to evaluate the performance of the proposed strategies. Our findings on compute-intensive applications show that preemptive reallocation of resources is necessary for assured services. The proposed preemption-based scheduling heuristic can significantly improve utilization of the dedicated resources by opportunistically offloading the peak loads onto uncontracted resources, while keeping the service quality virtually unaffected.