Results 1 - 10 of 16
Adaptive Computing on the Grid Using AppLeS
, 2003
Cited by 90 (7 self)
Ensembles of distributed, heterogeneous resources, also known as Computational Grids, are emerging as critical platforms for high-performance and resource-intensive applications. Such platforms provide the potential for applications to aggregate enormous bandwidth, computational power, memory, secondary storage, and other resources during a single execution. However, achieving this performance potential in dynamic, heterogeneous environments is challenging. Recent experience with distributed applications indicates that adaptivity is fundamental to achieving application performance in dynamic Grid environments. The AppLeS (Application Level Scheduling) project provides a methodology, application software, and software environments for adaptively scheduling and deploying applications in dynamic, heterogeneous, multi-user Grid environments. In this paper, we discuss the AppLeS project and outline our results.
A Comprehensive Model of the Supercomputer Workload
Cited by 47 (5 self)
... This paper attacks this problem by considering requested time (and its relation with execution time) and the possibility of job cancellation, two aspects of the supercomputer workload that have not been modeled yet. Moreover, we also improve upon existing models for the arrival instant and partition size.
The Cost of Doing Science on the Cloud: The Montage Example
, 2008
Cited by 45 (2 self)
Utility grids such as the Amazon EC2 cloud and Amazon S3 offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. Using the Amazon cloud fee structure and a real-life astronomy application, we study via simulation the cost-performance trade-offs of different execution and resource provisioning plans. We also study these trade-offs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archival. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
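To make the point that different execution plans of the same application can cost very different amounts concrete, here is a minimal pay-per-use cost sketch. It is not the paper's cost model: the `Prices` rates are hypothetical placeholders rather than actual Amazon fees, and it assumes the work divides evenly across processors and that each instance is billed in whole hours with partial hours rounded up.

```python
import math
from dataclasses import dataclass


@dataclass
class Prices:
    # Hypothetical placeholder rates, not actual Amazon fees.
    cpu_hour: float      # per instance-hour
    gb_transfer: float   # per GB moved in or out
    gb_month: float      # per GB stored for one month


def plan_cost(work_hours, n_procs, gb_transfer, gb_months, prices):
    """Cost of one execution plan under a simple pay-per-use model.

    Assumes the work divides evenly across n_procs and that each
    instance is billed for whole hours (partial hours rounded up).
    """
    hours_per_proc = math.ceil(work_hours / n_procs)
    compute = n_procs * hours_per_proc * prices.cpu_hour
    transfer = gb_transfer * prices.gb_transfer
    storage = gb_months * prices.gb_month
    return compute + transfer + storage
```

With 10 CPU-hours of work, 4 processors are billed for 12 instance-hours while 5 processors are billed for only 10, so the faster plan is also the cheaper one; over-provisioning to 8 processors raises the bill again because of the rounding.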
Backfilling using system-generated predictions rather than user runtime estimates
- In IEEE TPDS
, 2007
Cited by 30 (4 self)
The most commonly used scheduling algorithm for parallel supercomputers is FCFS with backfilling, as originally introduced in the EASY scheduler. Backfilling means that short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). To make such determinations possible, users are required to provide estimates of how long jobs will run, and jobs that violate these estimates are killed. Empirical studies have repeatedly shown that user estimates are inaccurate, and that system-generated predictions based on history may be significantly better. However, predictions have not been incorporated into production schedulers, partially due to a misconception (that we resolve) claiming that inaccuracy actually improves performance, but mainly because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. We solve this problem by divorcing kill-time from the runtime prediction, and correcting predictions adaptively as needed if they are proved wrong. The end result is a surprisingly simple scheduler, which requires minimal deviations from current practices (e.g. using FCFS as the basis), and behaves exactly like EASY as far as users are concerned; nev...
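The EASY backfilling rule described above (a later job may start early only if it does not delay the first queued job) can be sketched as follows. This is a minimal illustration of the classic rule, not the paper's scheduler: the `Job` type, the `easy_backfill` function, and the `(estimated_end_time, procs)` representation of running jobs are hypothetical, and estimates are treated as exact when computing the head job's reservation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int        # processors requested
    estimate: float   # runtime estimate (user-supplied or system-predicted)


def easy_backfill(now, total_procs, running, queue):
    """Return names of queued jobs that may start at time `now`.

    running: list of (estimated_end_time, procs) for executing jobs.
    queue:   FCFS-ordered list of Job; assumes every job fits the machine.
    """
    free = total_procs - sum(procs for _, procs in running)
    running = list(running)
    queue = list(queue)
    started = []

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].procs <= free:
        job = queue.pop(0)
        free -= job.procs
        running.append((now + job.estimate, job.procs))
        started.append(job.name)
    if not queue:
        return started

    # 2. The head job is blocked: find when enough processors will be free
    #    for it (the "shadow time") and how many will be left over then.
    head = queue[0]
    avail, shadow, extra = free, now, 0
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            shadow, extra = end, avail - head.procs
            break

    # 3. Backfill: a later job may start now only if it will not delay the
    #    head job, i.e. it ends before the shadow time or uses only the
    #    processors the head job will not need.
    for job in queue[1:]:
        if job.procs > free:
            continue
        ends_in_time = now + job.estimate <= shadow
        if ends_in_time or job.procs <= extra:
            free -= job.procs
            if not ends_in_time:
                extra -= job.procs
            started.append(job.name)
    return started
```

Under the approach described in the abstract, `estimate` could be a system-generated prediction while the kill time is kept separate from it; the sketch covers only the start decision, not kills or the adaptive correction of predictions.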
When the Herd is Smart: Aggregate Behavior in the Selection of Job Request
- IEEE Transactions on Parallel and Distributed Systems
, 2003
Cited by 12 (2 self)
In most parallel supercomputers, submitting a job for execution involves specifying how many processors are to be allocated to the job. When the job is moldable (i.e. there is a choice on how many processors the job uses), an application scheduler called SA can significantly improve job performance by automatically selecting how many processors to use [7]. Since most jobs are moldable, this result has great impact on the current state of practice in supercomputer scheduling. However, the widespread use of SA can change the nature of the workload processed by supercomputers. When many SAs are scheduling jobs on one supercomputer, the decision made by one SA affects the state of the system, therefore impacting other instances of SA. In this case, the global behavior of the system comes from the aggregate behavior caused by all SAs. In particular, it is reasonable to expect the competition for resources to become tougher with multiple SAs, and this tough competition to decrease the performance improvement attained by each SA individually. This paper investigates this very issue. We found that the increased competition indeed makes it harder for each individual instance of SA to improve job performance. Nevertheless, there are two other aggregate behaviors that override increased competition when the system load is moderate to heavy. First, as load goes up, SA chooses smaller requests, which increases efficiency, which effectively decreases the offered load, which mitigates long wait times. Second, better job packing and fewer jobs in the system make it easier for incoming jobs to fit in the supercomputer schedule, thus reducing wait times further. As a result, in moderate to heavy load conditions, a single instance of SA benefits from the fact that other jobs are also us...
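SA's actual selection procedure is given in the cited work [7], not in this abstract. The sketch below assumes only the simplest form of the idea: for each candidate processor count, add a predicted queue wait to a predicted runtime and pick the smallest total. The Amdahl's-law runtime model, `parallel_fraction`, and the `predicted_wait` callback are all illustrative assumptions, and `choose_request` is a hypothetical name.

```python
def choose_request(candidates, serial_time, parallel_fraction, predicted_wait):
    """Pick the processor count with the smallest predicted turnaround.

    predicted_wait: hypothetical callback mapping a processor count to a
    predicted queue wait; a real scheduler would derive it from the
    current queue state.
    """
    def runtime(p):
        # Amdahl's law, used here only as a stand-in speedup model.
        return serial_time * ((1 - parallel_fraction) + parallel_fraction / p)

    return min(candidates, key=lambda p: predicted_wait(p) + runtime(p))
```

On an idle machine (zero wait for every request) the widest request wins, but when the predicted wait grows with request size the choice shifts toward fewer processors, which mirrors the abstract's observation that SA chooses smaller requests as load goes up.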
Running bag-of-tasks applications on computational grids: The mygrid approach
- In ICPP
, 2003
Cited by 11 (2 self)
Here we discuss how to run Bag-of-Tasks applications (those parallel applications whose tasks are independent) on computational grids. Bag-of-Tasks applications are both relevant and amenable to execution on grids. However, few users currently execute their Bag-of-Tasks applications on grids. We investigate the reason for this state of affairs and introduce MyGrid, a system designed to overcome the identified difficulties. MyGrid provides a simple, complete and secure way for a user to run Bag-of-Tasks applications on all resources she has access to. Besides putting together a complete solution useful for real users, MyGrid embeds two important research contributions to grid computing. First, we introduce some simple working environment abstractions that hide the configuration heterogeneity of the machines that compose the grid from the user. Second, we introduce Work Queue with Replication (WQR), a scheduling heuristic that attains good performance without relying on information about the grid or the application, although it consumes a few more cycles. Note that not depending on information makes WQR much easier to deploy in practice.
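The abstract does not spell out how WQR works. The sketch below assumes its usual description: tasks are handed to idle machines in queue order; once no unstarted tasks remain, idle machines receive replicas of tasks that are still running; when any copy of a task finishes, its other copies are killed. The replica cap `max_replicas`, the rule of replicating the task with the fewest copies, and the function name are assumptions for illustration. Task sizes and machine speeds drive only the simulated clock; the scheduling decisions never look at them.

```python
import heapq


def wqr_makespan(task_sizes, machine_speeds, max_replicas=2):
    """Simulate Work Queue with Replication and return the makespan.

    The scheduling decisions use no task or machine information; sizes and
    speeds only determine when each simulated copy finishes.
    """
    pending = list(range(len(task_sizes)))   # tasks never started
    copies = {}                              # task -> copies launched so far
    done = set()
    events = []                              # heap of (finish_time, machine, task)
    idle = list(range(len(machine_speeds)))
    now = 0.0

    def dispatch():
        while idle:
            if pending:
                task = pending.pop(0)        # work-queue phase
            else:
                # Replication phase (assumed rule): replicate an unfinished
                # task with the fewest copies, up to max_replicas per task.
                candidates = [t for t, c in copies.items()
                              if t not in done and c < max_replicas]
                if not candidates:
                    return
                task = min(candidates, key=lambda t: copies[t])
            machine = idle.pop(0)
            copies[task] = copies.get(task, 0) + 1
            finish = now + task_sizes[task] / machine_speeds[machine]
            heapq.heappush(events, (finish, machine, task))

    dispatch()
    while len(done) < len(task_sizes):
        now, machine, task = heapq.heappop(events)
        done.add(task)
        # Kill the remaining copies of the finished task and free their machines.
        idle.append(machine)
        idle.extend(m for _, m, t in events if t == task)
        events[:] = [e for e in events if e[2] != task]
        heapq.heapify(events)
        dispatch()
    return now
```

With two equal tasks on two slow machines and one fast machine, replication lets the otherwise idle fast machine finish both tasks, cutting the makespan from 10 to 2 time units at the cost of cycles spent on copies that are later killed, which is the trade-off the abstract mentions.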
Open Grid: A User-Centric Approach for Grid Computing
- In 13th Symposium on Computer Architecture and High Performance Computing
, 2001
Cited by 8 (2 self)
The possibility of combining multiple geographically dispersed machines as the platform for the execution of parallel applications is enticing, and has sparked considerable research in an area that has been called Grid Computing. Despite intensive research effort, though, those at whom Grid Computing has been aimed have been slow to adopt this new technology. We claim that there are two main reasons for that. First, the user has not been properly supported in her everyday activities, in the sense that there is a lack of a cohesive gridwide working environment. Second, current efforts assume that the software they generate will be widely deployed across the whole grid, which is hard to achieve due to the specialized interest in high-performance computation. In this paper, we argue that the massive acceptance of Grid Computing technology depends on building solutions that are open (do not require a particular infrastructure), extensible (ease the addition of refinements) and complete (cover the whole production cycle). We then describe our system, called Open Grid, that implements our viewpoint and targets users of coarse-grain parallel applications. We hope that Open Grid will constitute a step towards a widely used computational grid. Keywords -- Grid Computing; Internet Computing; Massively Parallel Processing
Simbatch: an API for simulating and predicting the performance of parallel resources managed by batch systems
- In Workshop on Secure, Trusted, Manageable and Controllable Grid Services (SGS), held in conjunction with EuroPar’08
, 2008
"... research report ..."
Workflow task clustering for best effort systems with pegasus
- In MG ’08: Proceedings of the 15th ACM Mardi Gras conference
, 2008
Cited by 6 (2 self)
Many scientific workflows are composed of tasks of fine computational granularity, yet they contain thousands of such tasks and are data-intensive in nature, thus requiring resources such as the TeraGrid to execute efficiently. In order to improve the performance of such applications, we often employ task clustering techniques to increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of the clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can achieve a significant reduction in the workflow completion time (up to 97%).
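Task clustering as described above merges many short tasks into fewer, larger jobs, so each job pays the batch-queue wait once instead of once per task. The sketch below shows only one common variant, level-based ("horizontal") clustering, in which tasks at the same depth of the DAG are grouped into clusters of at most `k` tasks. The DAG encoding (`{task: [parents]}`) and the function names are hypothetical and are not the Pegasus API; the paper may evaluate other clustering schemes.

```python
from collections import defaultdict


def levels(deps):
    """Depth of each task in a DAG given as {task: [parent, ...]}.

    Every parent must also appear as a key; root tasks have depth 0.
    """
    depth = {}

    def visit(t):
        if t not in depth:
            depth[t] = 1 + max((visit(p) for p in deps[t]), default=-1)
        return depth[t]

    for t in deps:
        visit(t)
    return depth


def horizontal_clusters(deps, k):
    """Group tasks at the same DAG level into clusters of at most k tasks.

    Each cluster would be submitted as a single job, so it waits in the
    batch queue once rather than once per task.
    """
    by_level = defaultdict(list)
    for task, d in levels(deps).items():
        by_level[d].append(task)
    clusters = []
    for d in sorted(by_level):
        tasks = sorted(by_level[d])
        clusters.extend(tasks[i:i + k] for i in range(0, len(tasks), k))
    return clusters
```

For a fan-out/fan-in workflow with one root, three middle tasks, and one sink, `k=3` turns five submitted jobs into three, which removes two queue waits from the critical path.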
Performance Impact of Resource Provisioning on Workflows
, 2006
Cited by 5 (1 self)
Resource provisioning refers to the dedicated use of a certain set of resources for a certain timeframe. In this paper, we study the effect of using resource provisioning on the completion time of workflows, represented as directed acyclic graphs (DAGs). Provisioning can be done statically using advance reservations or using dynamic provisioning mechanisms such as Condor Glidein. A simulation is performed using the Maui simulator, a workload trace collected from the NCSA TeraGrid cluster and 13 workflows. The results show in general a reduction of about 50% in the workflow completion time using provisioning for the FIFO and fair share scheduling policies over no provisioning. We also address the issue of utilization of the provisioned resources using a simple metric which tries to maximize the ratio of utilization to the completion time. Finally, we present the results of a survey on the support of advance reservation at the Grid sites.

