A provisioning model and its comparison with best effort for performance-cost optimization in grids
- In Proceedings of the Sixteenth IEEE International Symposium on High-Performance Distributed Computing (HPDC07), 2007
Cited by 15 (2 self)
Resource availability in Grids is generally unpredictable due to the autonomous and shared nature of Grid resources and the stochastic nature of the workload, resulting in a best-effort quality of service. The resource providers optimize for throughput and utilization, whereas the users optimize for application performance. We present a cost-based model where the providers advertise resource availability to the user community. We also present a multi-objective genetic algorithm formulation for selecting the set of resources to be provisioned that optimizes the application performance while minimizing the resource costs. We use trace-based simulations to compare application performance and cost under the provisioned and best-effort approaches, using a number of artificially generated workflow-structured applications and a seismic hazard application from the earthquake science community. The provisioned approach shows promising results when the resources are under high utilization and/or the applications have significant resource requirements.
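As a rough illustration of the two-objective trade-off described above (application performance versus resource cost), the sketch below filters a set of candidate resource selections down to its Pareto front. It is not the paper's genetic algorithm; the candidate labels, runtimes, and costs are made-up values, and the names `Candidate`, `dominates`, and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# (label, estimated runtime, cost): hypothetical fields for illustration only
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[1] <= b[1] and a[2] <= b[2] and (a[1] < b[1] or a[2] < b[2])


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up candidate resource sets
candidates = [
    ("A", 100.0, 40.0),
    ("B", 80.0, 60.0),
    ("C", 120.0, 50.0),  # slower and costlier than A, so dominated
    ("D", 60.0, 90.0),
]
front = pareto_front(candidates)
front_labels = [c[0] for c in front]
```

A multi-objective search would typically return such a non-dominated set and leave the final choice between faster and cheaper options to the user.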
Scheduling data-intensive workflows onto storage-constrained distributed resources
- In Proceedings of the 7th IEEE Symposium on Cluster Computing and the Grid (CCGrid), 2007
Cited by 14 (3 self)
In this paper we examine the issue of optimizing disk usage and of scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and where the resources have limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer required, and we schedule the workflows in a way that ensures that the amount of data required and generated by the workflow fits onto the individual resources. For a workflow used by gravitational-wave physicists, we were able to reduce the amount of storage required by the workflow by up to 57%. We also designed an algorithm that can not only find feasible solutions for workflow task assignment to resources in disk-space-constrained environments, but can also improve the overall workflow performance.
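The runtime cleanup idea above (delete a file once no remaining task needs it) can be sketched as follows. This is a minimal model, assuming tasks run one at a time in list order; the function name `peak_storage`, the task tuples, and the file sizes are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# (task name, input files, output files): hypothetical structure
Task = Tuple[str, List[str], List[str]]


def peak_storage(tasks: List[Task], sizes: Dict[str, int], cleanup: bool) -> int:
    """Peak disk usage when tasks run sequentially in the given order."""
    # Index of the last task that reads each file
    last_use: Dict[str, int] = {}
    produced = set()
    for i, (_, inputs, outputs) in enumerate(tasks):
        for f in inputs:
            last_use[f] = i
        produced.update(outputs)

    # Files that are read but never produced are staged in before execution
    on_disk = {f for f in last_use if f not in produced}
    peak = sum(sizes[f] for f in on_disk)

    for i, (_, _, outputs) in enumerate(tasks):
        on_disk.update(outputs)
        peak = max(peak, sum(sizes[f] for f in on_disk))
        if cleanup:
            # Drop files whose last reader has just finished; files that are
            # never read (final outputs) stay on disk
            on_disk = {f for f in on_disk if last_use.get(f) != i}
    return peak


# Made-up three-task chain: a -> b -> c -> d
chain = [("t1", ["a"], ["b"]), ("t2", ["b"], ["c"]), ("t3", ["c"], ["d"])]
file_sizes = {"a": 10, "b": 20, "c": 30, "d": 5}
peak_without = peak_storage(chain, file_sizes, cleanup=False)
peak_with = peak_storage(chain, file_sizes, cleanup=True)
```

In this toy chain the peak drops from 65 to 50 units with cleanup enabled; the savings in a real workflow depend on its structure and on the execution order.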
Scheduling Workflows with Budget Constraints
- In Integrated Research in Grid Computing, S. Gorlatch and M. Danelutto, Eds., CoreGrid series, 2007
Cited by 9 (1 self)
Grids are emerging as a promising solution for resource- and computation-demanding applications. However, the heterogeneity of resources in Grid computing complicates resource management and the scheduling of applications. In addition, the commercialization of the Grid requires policies that can take into account user requirements, and budget considerations in particular. This paper considers a basic model for workflow applications modelled as Directed Acyclic Graphs (DAGs) and investigates heuristics that schedule the nodes of the DAG (or tasks of a workflow) onto resources in a way that satisfies a budget constraint and is still optimized for overall time. Two different approaches are implemented, evaluated and presented using four different types of basic DAGs.
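To make the budget-constrained setting concrete, here is a minimal greedy sketch. It is a generic heuristic, not either of the two approaches evaluated in the paper, and it assumes a chain-shaped workflow so that total time is simply the sum of task times. The function name `schedule_within_budget` and the (time, cost) options are hypothetical.

```python
from typing import Dict, List, Tuple

Option = Tuple[float, float]  # (execution time, cost) of a task on one resource


def schedule_within_budget(
    options: Dict[str, List[Option]], budget: float
) -> Tuple[Dict[str, Option], float]:
    """Start from the cheapest option per task, then repeatedly apply the
    upgrade with the best time saved per extra cost that still fits the budget."""
    # Cheapest option per task; ties broken in favour of the faster option
    choice = {t: min(opts, key=lambda o: (o[1], o[0])) for t, opts in options.items()}
    spent = sum(cost for _, cost in choice.values())
    if spent > budget:
        raise ValueError("even the cheapest assignment exceeds the budget")

    while True:
        best = None  # (ratio, task, option)
        for task, opts in options.items():
            cur_time, cur_cost = choice[task]
            for time, cost in opts:
                saved, extra = cur_time - time, cost - cur_cost
                if saved <= 0 or spent + extra > budget:
                    continue
                # A faster option at no extra cost is always worth taking
                ratio = saved / extra if extra > 0 else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, task, (time, cost))
        if best is None:
            return choice, spent
        _, task, opt = best
        spent += opt[1] - choice[task][1]
        choice[task] = opt


# Made-up options for two tasks
task_options = {
    "t1": [(10.0, 1.0), (6.0, 3.0), (4.0, 6.0)],
    "t2": [(8.0, 1.0), (5.0, 2.0)],
}
assignment, total_cost = schedule_within_budget(task_options, budget=7.0)
total_time = sum(time for time, _ in assignment.values())
```

For a general DAG the time objective would be the makespan along the critical path rather than a plain sum, which makes the choice of which task to upgrade considerably harder.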
Data Placement for Scientific Applications in Distributed Environments
Cited by 5 (0 self)
Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application Montage for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
Data Throttling for Data-Intensive Workflows
Cited by 5 (1 self)
Existing workflow systems attempt to achieve high performance by intelligently scheduling tasks on resources, sometimes even attempting to move the largest data files on the highest-capacity links. However, such approaches are inherently limited, in that there is only minimal control available regarding the arrival time and rate of data transfer between nodes, resulting in unbalanced workflows in which one task is idle while waiting for data to arrive. This paper describes a data throttling framework that workflow systems can exploit to uniquely regulate the rate of data transfers between workflow tasks via a specially created QoS-enabled GridFTP server. Our workflow planner constructs a schedule that specifies both when and where individual tasks are to be executed and when and at what rate data is to be transferred. Simulation results involving a simple workflow indicate that our system can achieve a 30% speedup when nodes show a computation/communication ratio of approximately 0.5. We confirm these results via an actual implementation of the Montage workflow in the wide area, obtaining a maximum speedup of 31% and an average speedup of 16%. Overall, we believe that our data throttling Grid workflow system both executes workflows more efficiently (by better establishing balanced workflow graphs) and operates more cooperatively with unrelated concurrent Grid activities by consuming less overall network bandwidth, allowing such unrelated activities to execute more efficiently as well.
An Opportunistic Algorithm for Scheduling Workflows on Grids
Cited by 4 (2 self)
The execution of scientific workflows in Grid environments imposes many challenges due to the dynamic nature of such environments and the characteristics of scientific applications. This work presents an algorithm that dynamically schedules tasks of workflows to Grid sites based on the performance of these sites when running previous jobs from the same workflow. The algorithm captures the dynamic characteristics of Grid environments without the need to probe the remote sites. We evaluated the algorithm by running a workflow in the Open Science Grid using twelve sites. The results showed an improvement of up to 120% relative to four other common scheduling strategies.
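The general idea of choosing a site from the observed performance of earlier jobs, without probing, might be sketched as below. This is an illustrative policy only, not the algorithm from the paper: it tries each site once and then prefers the site with the lowest mean turnaround time. The class name `SiteTracker`, the site names, and the timings are hypothetical.

```python
from typing import Dict, List


class SiteTracker:
    """Tracks turnaround times of finished jobs per site (hypothetical helper)."""

    def __init__(self, sites: List[str]) -> None:
        self.turnaround: Dict[str, List[float]] = {s: [] for s in sites}

    def record(self, site: str, seconds: float) -> None:
        """Store the observed turnaround time of a completed job."""
        self.turnaround[site].append(seconds)

    def pick(self) -> str:
        """Return an untried site if one exists, else the site with the best mean."""
        for site, times in self.turnaround.items():
            if not times:
                return site
        return min(
            self.turnaround,
            key=lambda s: sum(self.turnaround[s]) / len(self.turnaround[s]),
        )


# Made-up run: the preferred site shifts as observations accumulate
tracker = SiteTracker(["siteA", "siteB"])
first = tracker.pick()          # no history yet, so the first listed site
tracker.record("siteA", 120.0)
second = tracker.pick()         # siteB is still untried
tracker.record("siteB", 60.0)
third = tracker.pick()          # siteB has the lower mean (60 vs 120)
tracker.record("siteB", 300.0)
fourth = tracker.pick()         # siteB's mean is now 180, so siteA wins
```

Because the scores come only from jobs of the same workflow, the policy adapts to changing site load without any separate monitoring traffic.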
A Survey of Distributed Workflow Characteristics and Resource Requirements
Cited by 4 (1 self)
Workflows have been used to model repeatable tasks or operations in a number of different industries, including manufacturing and software. In recent years, workflows have increasingly been used in distributed-resource and web-service environments through resource models such as grid and cloud computing. These workflows often have disparate requirements and constraints that need to be accounted for during workflow orchestration. In this paper, we present workflow examples from different domains, including bioinformatics and biomedicine, weather and ocean modeling, and astronomy, detailing their data and computational requirements.
Optimizing workflow data footprint
- Sci. Program., 2007
Cited by 3 (1 self)
In this paper we examine the issue of optimizing disk usage and scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and the resources have limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer needed, and we demonstrate that workflows may have to be restructured to reduce the overall data footprint of the workflow. We show the results of our data management and workflow restructuring solutions using a Laser Interferometer Gravitational-Wave Observatory (LIGO) application and an astronomy application, Montage, running on a large-scale production grid, the Open Science Grid. We show that although reducing the data footprint of Montage by 48% can be achieved with dynamic data cleanup techniques, LIGO Scientific Collaboration workflows require additional restructuring to achieve a 56% reduction in data space usage. We also examine the cost of the workflow restructuring in terms of the application's runtime.
A Data Placement Service for Petascale Applications
- In Petascale Data Storage Workshop, Supercomputing 2007, 2007
Cited by 2 (0 self)
We examine the use of policy-driven data placement services to improve the performance of data-intensive, petascale applications in high performance distributed computing environments. In particular, we are interested in using an asynchronous data placement service to stage data in and out of application workflows efficiently as well as to distribute and replicate data according to Virtual Organization policies. We propose a data placement service architecture and describe our implementation of one layer of this architecture, which provides efficient, priority-based bulk data transfers.
Grids and clouds: Making workflow applications work in heterogeneous distributed environments
- Int. J. High Perform. Comput. Appl., 2010

