Results 1 - 10 of 12
Cost-wait trade-offs in client-side resource provisioning with elastic clouds
In IEEE International Conference on Cloud Computing (CLOUD), 2011
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
(Show Context)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et a ̀ la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Feitelson, Reducing Performance Evaluation Sensitivity and Variability by Input Shaking
, 2007
"... Abstract—Simulations sometimes lead to observed sensitivity to configuration parameters as well as inconsistent performance results. The question is then what is the true effect and what is a coincidental artifact of the evaluation. The shaking methodology answers this by executing multiple simulati ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
(Show Context)
Simulations sometimes lead to observed sensitivity to configuration parameters as well as inconsistent performance results. The question is then what is the true effect and what is a coincidental artifact of the evaluation. The shaking methodology answers this by executing multiple simulations under small perturbations to the input workload, and calculating the average performance result; if the effect persists we can be more confident that it is real, whereas if it disappears it was an artifact. We present several examples where the sensitivity that appears in results based on a single evaluation is eliminated or considerably reduced by the shaking methodology. While our examples come from evaluations of scheduling algorithms for supercomputers, we believe the method has wider applicability.
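The shaking procedure lends itself to a compact implementation. Below is a minimal Python sketch of the idea; the Job layout, the perturbation scale, and the simulate callback are illustrative assumptions, not the paper's actual setup:

```python
import random
from collections import namedtuple

# Hypothetical job record; real workloads carry more attributes.
Job = namedtuple("Job", ["arrival", "runtime", "cpus"])

def perturb(workload, scale=60.0, rng=random):
    """Return a copy of the workload with each arrival time jittered
    by a small random offset (here: uniform within +/- scale seconds)."""
    return [job._replace(arrival=job.arrival + rng.uniform(-scale, scale))
            for job in workload]

def shaken_result(simulate, workload, repetitions=30):
    """Average the simulation metric over many perturbed inputs.
    Effects that persist across perturbations are likely real;
    effects that vanish were artifacts of the single evaluation."""
    results = [simulate(perturb(workload)) for _ in range(repetitions)]
    return sum(results) / len(results)
```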
On simulation and design of parallel-systems schedulers: are we doing the right thing?
IEEE Trans. Parallel & Distributed Syst.
"... Abstract — It is customary to use open-system, trace-driven simulations to evaluate the performance of parallel-system schedulers. As a consequence, all schedulers have evolved to optimize the packing of jobs in the schedule, as a mean to improve a number of performance metrics that are conjectured ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
(Show Context)
It is customary to use open-system, trace-driven simulations to evaluate the performance of parallel-system schedulers. As a consequence, all schedulers have evolved to optimize the packing of jobs in the schedule, as a means to improve a number of performance metrics that are conjectured to be correlated with user satisfaction, with the premise that this will result in higher productivity in reality. We argue that these simulations suffer from severe limitations that lead to suboptimal scheduler designs, and even to dismissing potentially good design alternatives. We propose an alternative simulation methodology called site-level simulation, in which the workload for the evaluation is generated dynamically by user models that interact with the system. We present a novel scheduler called CREASY that exploits knowledge of user behavior to directly improve user satisfaction, and compare its performance to the original, packing-based EASY scheduler. We show that user productivity improves by up to 50% under the user-aware design, while according to the conventional metrics, performance may actually degrade. Index Terms: parallel job scheduling, trace-driven simulations, open-system model, user behavior, feedback.
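To make the contrast with open-system simulation concrete, here is a toy Python sketch of a site-level loop, in which each submission depends on the completion of the user's previous job. The user model (submit, wait, think, repeat) and all parameters are simplifications for illustration, not the paper's actual CREASY or user models:

```python
import heapq

class UserModel:
    """Toy user: submits a job, waits for it to finish, thinks, repeats."""
    def __init__(self, think_time=600.0, runtime=300.0):
        self.think_time, self.runtime = think_time, runtime

def site_level_simulation(users, horizon=86400.0):
    """The workload emerges from a feedback loop rather than a fixed trace:
    each job completion triggers the user's next submission."""
    events = [(0.0, uid) for uid in range(len(users))]  # (submit time, user)
    heapq.heapify(events)
    completed = 0
    while events:
        t, uid = heapq.heappop(events)
        if t >= horizon:
            break
        finish = t + users[uid].runtime  # stand-in for a real scheduler
        completed += 1
        heapq.heappush(events, (finish + users[uid].think_time, uid))
    return completed

print(site_level_simulation([UserModel() for _ in range(4)]))
```

Because the next submission waits for the previous completion, a slower scheduler directly reduces how much work users submit, which is exactly the feedback that open-system, trace-driven simulation cannot capture.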
On Extracting Session Data from Activity Logs
"... Activity logs from large-scale systems facilitate the study of user behavior, which can be used to improve and tune the user experience. However, the available data often lacks important elements such as the identification of user sessions. Previous work typically compensated for this by setting a t ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Activity logs from large-scale systems facilitate the study of user behavior, which can be used to improve and tune the user experience. However, the available data often lacks important elements such as the identification of user sessions. Previous work typically compensated for this by setting a threshold of around 30 minutes, and assuming that breaks in activity longer than the threshold reflect breaks between sessions. We show that using such a global threshold introduces artifacts that may affect the analysis, because there is a high probability that long sessions are not identified correctly. As an alternative, we suggest that a suitable individual threshold be found for each user, based on that user’s activity pattern. Applying this approach to a large dataset from the AOL search engine leads to a distribution of session durations that is free of artifacts like those that appear when using a global threshold.
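A per-user threshold scheme along these lines is straightforward to express. The sketch below picks each user's threshold from that user's own gap distribution; the specific rule (a multiple of the median inter-activity gap, with a floor) is an illustrative assumption, not necessarily the criterion used in the paper:

```python
from statistics import median

def split_sessions(timestamps, factor=10.0, floor=300.0):
    """Split one user's sorted activity timestamps (seconds) into sessions,
    using an individual threshold derived from the user's own gaps instead
    of a global ~30-minute cutoff."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    threshold = max(floor, factor * median(gaps)) if gaps else floor
    sessions, current = [], [timestamps[0]]
    for prev, t in zip(timestamps, timestamps[1:]):
        if t - prev > threshold:
            sessions.append(current)
            current = []
        current.append(t)
    sessions.append(current)
    return sessions
```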
Analysis of Scheduling Policies under Correlated Job Sizes
Performance Evaluation 00 (2010) 1–24
"... Correlations in traffic patterns are an important facet of the workloads faced by real systems, and one that has far-reaching consequences on the performance and optimization of the systems involved. However, all the existing analytical work on understanding the effect of correlations between succes ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Correlations in traffic patterns are an important facet of the workloads faced by real systems, and one that has far-reaching consequences on the performance and optimization of the systems involved. However, all the existing analytical work on understanding the effect of correlations between successive service requirements (job sizes) is limited to First-Come-First-Served scheduling. This leaves open fundamental questions: How do various scheduling policies interact with correlated job sizes? Can scheduling be used to mitigate the harmful effects of correlations? In this paper we take the first step towards answering these questions. Under a simple model for job size correlations, we present the first asymptotic analysis of various common size-independent scheduling policies when the job size sequence exhibits high correlation. Our analysis reveals that the characteristics of various scheduling policies, as well as their performance relative to each other, are markedly different under the assumption of i.i.d. job sizes versus correlated job sizes. Further, among the class of size-independent scheduling policies, there is no single scheduling policy that is optimal for all degrees of correlations and thus any optimal policy must learn the correlations. We support the asymptotic analysis with numerical algorithms for exact performance analysis under an arbitrary degree of correlation, and with simulations. Finally, we verify the lessons from our correlation model on real world traces.
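As a concrete illustration of what a simple job-size correlation model can look like, the sketch below draws sizes from a two-state Markov-modulated process; the construction and its parameters are assumptions for illustration, not the exact model analyzed in the paper:

```python
import random

def correlated_sizes(n, p_stay=0.95, means=(1.0, 10.0), rng=random):
    """Markov-modulated job sizes: the process lingers in a 'small' or
    'large' regime, so successive sizes are positively correlated.
    Larger p_stay means stronger correlation; feeding such a sequence to
    simulated FCFS, PS, or LCFS queues exposes the policy differences."""
    state, sizes = 0, []
    for _ in range(n):
        sizes.append(rng.expovariate(1.0 / means[state]))
        if rng.random() > p_stay:
            state = 1 - state
    return sizes
```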
Free elasticity and free CPU power for scientific workloads on IaaS Clouds
In 18th IEEE International Conference on Parallel and Distributed Systems, Singapore: IEEE, 2012
"... Abstract—Recent Infrastructure as a Service (IaaS) solutions, such as Amazon’s EC2 cloud, provide virtualized ondemand computing resources on a pay-per-use model. From the user point of view, the cloud provides an inexhaustible supply of resources, which can be dynamically claimed and released. In t ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
(Show Context)
Recent Infrastructure as a Service (IaaS) solutions, such as Amazon’s EC2 cloud, provide virtualized on-demand computing resources on a pay-per-use model. From the user’s point of view, the cloud provides an inexhaustible supply of resources, which can be dynamically claimed and released. In the context of independent tasks, the main pricing model of EC2 promises two exciting features that drastically change the problem of resource provisioning and job scheduling. We call them free elasticity and free CPU power. Indeed, the price of CPU cycles is constant whatever the type of CPU and the amount of resources leased. Consequently, as soon as users are able to keep their resources busy, the cost of one computation is the same whether using many powerful resources or a few slow ones. In this article, we study whether these features can be exploited to execute bags of tasks, and what efforts are required to reach this goal. Effort may be required in implementation, with complex provisioning and scheduling strategies, and in performance, with the acceptance of execution delays. Using real workloads, we show that: (1) most users can benefit from free elasticity with little effort; (2) free CPU power is difficult to achieve; (3) adapted provisioning and scheduling strategies can improve the results for a significant number of users; and (4) the outcomes of these efforts are difficult to predict. Keywords: cloud; provisioning; resource management.
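The free-elasticity claim reduces to simple arithmetic: if the hourly price scales with CPU speed (constant price per cycle) and every leased instance is kept busy, cost depends only on the total work, not on how many or how fast the instances are. A small Python check with made-up prices and speeds (not actual EC2 figures, and ignoring EC2's per-started-hour rounding):

```python
def bag_cost(total_work, n_instances, speed, price_per_hour):
    """Cost of a bag of tasks totalling `total_work` CPU-hours (normalized
    to speed 1.0), assuming perfect packing keeps every instance busy."""
    hours = total_work / (n_instances * speed)
    return n_instances * hours * price_per_hour

# Price per hour proportional to speed, i.e. constant price per cycle:
print(bag_cost(100.0, n_instances=1,  speed=1.0, price_per_hour=0.10))  # 10.0
print(bag_cost(100.0, n_instances=50, speed=2.0, price_per_hour=0.20))  # 10.0
```

The second configuration finishes 100x sooner for the same total cost, which is the elasticity the paper shows most users can exploit with little effort.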
General
"... Evaluating the performance of a computer system is based on using representative workloads. Common practice is to either use real workload traces to drive simulations, or else to use statistical workload models that are based on such traces. Such models allow various workload attributes to be manipu ..."
Abstract
- Add to MetaCart
(Show Context)
Evaluating the performance of a computer system is based on using representative workloads. Common practice is either to use real workload traces to drive simulations, or else to use statistical workload models that are based on such traces. Such models allow various workload attributes to be manipulated, thus providing desirable flexibility, but may lose details of the workload’s internal structure. To overcome this, we suggest combining the benefits of real traces and flexible modeling. Focusing on the problem of evaluating the performance of parallel job schedulers, we partition each trace into independent subtraces representing different users, and then re-combine them in various ways, while maintaining features like the daily and weekly cycles of activity. This facilitates the creation of longer workload traces that enable longer simulations, the creation of multiple statistically similar workloads that can be used to gauge confidence intervals, and the creation of workloads with different load levels.
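A hedged Python sketch of the resampling idea: partition the trace by user, sample users (with replacement, to vary the load), and shift each subtrace by whole weeks so arrivals keep their position within the daily and weekly cycles. The trace layout and the one-year shift range are illustrative assumptions:

```python
import random
from collections import defaultdict

WEEK = 7 * 24 * 3600  # seconds

def resample_by_users(trace, n_users, rng=random):
    """trace: list of (user, arrival, job) tuples. Builds a new workload
    from randomly sampled users' subtraces, shifted by whole weeks so the
    weekly activity cycle is preserved."""
    by_user = defaultdict(list)
    for user, arrival, job in trace:
        by_user[user].append((arrival, job))
    chosen = rng.choices(list(by_user), k=n_users)  # more users -> higher load
    combined = []
    for user in chosen:
        shift = rng.randrange(0, 52) * WEEK         # whole-week offset only
        combined.extend((arrival + shift, job) for arrival, job in by_user[user])
    return sorted(combined, key=lambda rec: rec[0])
```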
unknown title
, 2007
"... This thesis describes a methodology for performing evaluations of scheduling algorithms for super-computers based on simulation results. Simulations sometimes lead to observed variability and sensitivity to configuration parameters. The question is then what is the true effect and what is a coincide ..."
Abstract
- Add to MetaCart
(Show Context)
This thesis describes a methodology for evaluating scheduling algorithms for supercomputers based on simulation results. Simulations sometimes lead to observed variability and sensitivity to configuration parameters. The question is then what is the true effect and what is a coincidental artifact of the evaluation. The shaking methodology answers this by executing multiple simulations under small perturbations to the input workload, and calculating the average performance result; if the effect persists we can be more confident that it is real, whereas if it disappears it was an artifact. We present several examples where the sensitivity that appears in results based on a single evaluation is eliminated or considerably reduced by the shaking methodology. While the examples come from evaluations of scheduling algorithms for supercomputers, we believe the method has wider applicability.
unknown title
"... Collecting and analyzing data lies at the basis of the scientific method: findings about nature usher new ideas, and experimental results support or refute theories. All this is not very prevalent in computer science, possibly due to the fact that computer systems are man made, and not perceived as ..."
Abstract
- Add to MetaCart
(Show Context)
Collecting and analyzing data lies at the basis of the scientific method: findings about nature usher in new ideas, and experimental results support or refute theories. All this is not very prevalent in computer science, possibly because computer systems are man-made, and not perceived as a natural phenomenon. But computer systems and their interactions with their users are actually complex enough to require objective observations and measurements. We survey several examples related to parallel and other systems, in which we attempt to further our understanding of architectural choices, system evaluation, and user behavior. In all cases, the emphasis is not on heroic data collection efforts, but rather on a fresh look at existing data, uncovering surprising, interesting, and useful information. Using such empirical information is necessary to ensure that systems and evaluations are relevant to the real world.