CiteSeerX
Computation Scheduling and Data Replication Algorithm for Data Grid," available at: http://www.mcs.anl.gov/papers/P1081.pdf, last visited (2003)

by K Ranganathan, I Foster
Results 1 - 10 of 15

Data Scheduling for Large Scale Distributed Applications

by Mehmet Balman, Tevfik Kosar - Proceedings of the 5th ICEIS Doctoral Consortium, in conjunction with the International Conference on Enterprise Information Systems (ICEIS'07), 2007
"... Current large scale distributed applications studied by large research communities result in new challenging problems in widely distributed environments. Especially, scientific experiments using geographically separated and heterogeneous resources necessitated transparently accessing distributed d ..."
Abstract - Cited by 20 (16 self)
Current large scale distributed applications studied by large research communities result in new challenging problems in widely distributed environments. In particular, scientific experiments using geographically separated and heterogeneous resources have necessitated transparently accessing distributed data and analyzing huge collections of information. We focus on data-intensive distributed computing and describe a data scheduling approach to manage large scale scientific and commercial applications. We identify parameters affecting data transfer and also analyze different scenarios for possible use cases of data placement tasks to discover key attributes for performance optimization. We plan to define the crucial factors in data placement in widely distributed systems and develop a strategy to schedule data transfers according to the characteristics of dynamically changing distributed environments.

Citation Context

...the scheduling decision (Venugopal et al., 2004). One recent study concludes that allocating resources closest to the required data gives the best scheduling strategy (Ranganathan and Foster, 2002; Ranganathan and Foster, 2004). Much work has been done on replica management, high performance data transfer, and data storage organization; however, there is still a gap in data-aware scheduling satisfying the requirements of...

Evolving Toward the Perfect Schedule: Co-scheduling Job Assignments and Data Replication in Wide-Area Systems Using a Genetic Algorithm

by Thomas Phan, Radu Sion, et al.
"... Traditional job schedulers for grid or cluster systems are responsible for assigning incoming jobs to compute nodes in such a way that some evaluative condition is met. Such systems generally take into consideration the availability of compute cycles, queue lengths, and expected job execution times, ..."
Abstract - Cited by 14 (0 self)
Traditional job schedulers for grid or cluster systems are responsible for assigning incoming jobs to compute nodes in such a way that some evaluative condition is met. Such systems generally take into consideration the availability of compute cycles, queue lengths, and expected job execution times, but they typically do not account directly for data staging and thus miss significant associated opportunities for optimisation. Intuitively, a tighter integration of job scheduling and automated data replication can yield significant advantages due to the potential for optimised, faster access to data and decreased overall execution time. In this paper we consider data placement as a first-class citizen in scheduling and use an optimisation heuristic for generating schedules. We make the following two contributions. First, we identify the necessity for co-scheduling job dispatching and data replication assignments and posit that simultaneously scheduling both is critical for achieving good makespans. Second, we show that deploying a genetic search algorithm to solve the optimal allocation problem has the potential to achieve significant speed-up results versus traditional allocation mechanisms. Through simulation, we show that our algorithm provides on average an approximately 20-45% faster makespan than greedy schedulers.
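The co-scheduling idea in the abstract above can be sketched as a small genetic search. Everything below (site count, cost model, GA parameters) is hypothetical and not taken from the paper; it only illustrates encoding job-to-site and replica-to-site assignments in one chromosome and evolving toward a shorter makespan.

```python
# Minimal GA sketch: one chromosome assigns every job AND every dataset
# replica to a site; fitness is the simulated makespan. All constants are
# illustrative assumptions, not values from the cited paper.
import random

random.seed(0)

N_JOBS, N_DATASETS, N_SITES = 8, 4, 3
JOB_DATA = [j % N_DATASETS for j in range(N_JOBS)]   # dataset each job reads
COMPUTE = [2.0, 3.0, 1.5]                            # per-job cost per site
TRANSFER = 5.0                                       # remote-data penalty

def makespan(chrom):
    """Chromosome = job->site assignments followed by dataset->site homes."""
    jobs, replicas = chrom[:N_JOBS], chrom[N_JOBS:]
    load = [0.0] * N_SITES
    for j, site in enumerate(jobs):
        cost = COMPUTE[site]
        if replicas[JOB_DATA[j]] != site:   # data not local: pay a transfer
            cost += TRANSFER
        load[site] += cost
    return max(load)

def evolve(pop_size=30, gens=60):
    length = N_JOBS + N_DATASETS
    pop = [[random.randrange(N_SITES) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=makespan)                        # elitist selection
        elite = pop[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]                 # one-point crossover
            if random.random() < 0.2:                 # point mutation
                child[random.randrange(length)] = random.randrange(N_SITES)
            children.append(child)
        pop = elite + children
    return min(pop, key=makespan)

best = evolve()
print("best makespan:", makespan(best))
```

Scheduling all jobs and replicas onto a single site would give a makespan of 16.0 under these assumed costs; the search exploits data locality and load balance to do better.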

Citation Context

...the precedence relationship between the applications and the data and do not consider optimisation, e.g. [13]. Other recent approaches for co-scheduling provide greedy, sub-optimal solutions, e.g. [4] [19] [16]. This work includes the following two contributions. First, we identify the necessity for co-scheduling job dispatching and data replication and posit that simultaneously scheduling both is crit...

I.: Planning Spatial Workflows to Optimize Grid Performance

by Luiz Meyer, James Annis, Marta Mattoso, Mike Wilde, Ian Foster - in Distributed Systems and Grid Computing (DSGC), ACM Press, 2005
"... Abstract. In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. Such workflows, when processed using workflow planning algorithms that are unaware of the application’s fi ..."
Abstract - Cited by 11 (1 self)
Abstract. In many scientific workflows, particularly those that operate on spatially oriented data, jobs that process adjacent regions of space often reference large numbers of files in common. Such workflows, when processed using workflow planning algorithms that are unaware of the application’s file reference pattern, result in a huge number of redundant file transfers between grid sites and consequently perform poorly. This work presents a generalized approach to planning spatial workflow schedules for Grid execution based on the spatial proximity of files and the spatial range of jobs. We evaluate our solution to this problem using the file access pattern of an astronomy application that performs co-addition of images from the Sloan Digital Sky Survey. We show that, in initial tests on Grids of 5 to 25 sites, our spatial clustering approach eliminates 50% to 90% of the file transfers between Grid sites relative to the next-best planning algorithms we tested that were not “spatially aware”. At moderate levels of concurrent file transfer, this reduction of redundant network I/O improves the application execution time by 30% to 70%, reduces Grid network and storage overhead, and is broadly applicable to a wide range of spatially-oriented problems.
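The spatial-clustering idea above can be sketched very simply: jobs whose regions fall in the same coarse spatial cell are sent to the same site, so files shared by adjacent regions cross the network once. The coordinates, cell size, and round-robin placement below are illustrative assumptions, not the paper's actual planner.

```python
# Toy spatial planner: bucket jobs into coarse (ra, dec) grid cells, then
# deal cells out to sites so neighbouring jobs land together. All inputs
# are hypothetical; the cited work uses real SDSS file access patterns.
def plan(jobs, n_sites, cell=10.0):
    """jobs: {job_id: (ra, dec) centre of the region it processes}.
    Returns {job_id: site_index}."""
    cells = {}
    for job, (ra, dec) in jobs.items():
        key = (int(ra // cell), int(dec // cell))
        cells.setdefault(key, []).append(job)
    assignment = {}
    for i, key in enumerate(sorted(cells)):   # one cell -> one site
        for job in cells[key]:
            assignment[job] = i % n_sites
    return assignment

jobs = {"j1": (1.0, 2.0), "j2": (3.0, 4.0),   # adjacent regions, same cell
        "j3": (55.0, 60.0)}                   # distant region
print(plan(jobs, n_sites=2))                  # {'j1': 0, 'j2': 0, 'j3': 1}
```

Because j1 and j2 share a cell, any files their overlapping regions reference are staged to site 0 only once.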

Citation Context

...manager controls the scheduling of the jobs to the sites. A replica location service (RLS) keeps track of the location of file replicas spread over the Grid. The model is based on that used in studies [13,14,15]. In the workflow, each file has a unique filename and is characterized by its size. Each job is characterized by a unique job identifier and the list of files it needs to process. In order to execute...

An Opportunistic Algorithm for Scheduling Workflows on Grids

by Luiz Meyer, Doug Scheftner, Jens Voeckler, Marta Mattoso, Mike Wilde, Ian Foster
"... Abstract. The execution of scientific workflows in Grid environments imposes many challenges due to the dynamic nature of such environments and the characteristics of scientific applications. This work presents an algorithm that dynamically schedules tasks of workflows to Grid sites based on the per ..."
Abstract - Cited by 8 (4 self)
Abstract. The execution of scientific workflows in Grid environments imposes many challenges due to the dynamic nature of such environments and the characteristics of scientific applications. This work presents an algorithm that dynamically schedules tasks of workflows to Grid sites based on the performance of these sites when running previous jobs from the same workflow. The algorithm captures the dynamic characteristics of Grid environments without the need to probe the remote sites. We evaluated the algorithm by running a workflow in the Open Science Grid using twelve sites. The results showed improvements of up to 120% relative to four other common scheduling strategies.
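The opportunistic idea above can be sketched as a scheduler that keeps per-site runtime histories and, without probing, dispatches each new task to the site whose earlier tasks from the same workflow ran fastest. The class, site names, and runtimes below are hypothetical, not the paper's implementation.

```python
# Toy opportunistic scheduler: no remote probing, only feedback from
# completed tasks of the same workflow. Names and numbers are illustrative.
from collections import defaultdict

class OpportunisticScheduler:
    def __init__(self, sites):
        self.sites = sites
        self.runtimes = defaultdict(list)   # site -> observed runtimes

    def pick_site(self):
        # Untried sites first, so every site gets sampled at least once.
        for s in self.sites:
            if not self.runtimes[s]:
                return s
        # Otherwise prefer the site with the lowest mean observed runtime.
        return min(self.sites,
                   key=lambda s: sum(self.runtimes[s]) / len(self.runtimes[s]))

    def report(self, site, runtime):
        self.runtimes[site].append(runtime)

sched = OpportunisticScheduler(["siteA", "siteB", "siteC"])
sched.report("siteA", 40.0)   # feedback from earlier workflow tasks
sched.report("siteB", 12.0)
sched.report("siteC", 25.0)
print(sched.pick_site())      # prints "siteB": fastest history wins
```

As site conditions change, newly reported runtimes shift the means, so the dispatch decision adapts without any explicit monitoring of the remote sites.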

Citation Context

...of data grids. Casanova et al. [2] propose an adaptive scheduling algorithm for parameter sweep applications where shared files are pre-staged strategically to improve reuse. Ranganathan and Foster [17, 18] evaluate a set of scheduling and replication algorithms and the impact of network bandwidth, user access patterns and data placement on the performance of job executions. The evaluation was done in a...

Stork data scheduler: mitigating the data bottleneck in e-science

by Tevfik Kosar, Mehmet Balman, Esma Yildirim, Sivakumar Kulasekaran, On Ross - in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 369(1949), 2011
"... In this paper, we present the Stork Data Scheduler as a solution for mitigating the data bottleneck in e-science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networke ..."
Abstract - Cited by 3 (2 self)
In this paper, we present the Stork Data Scheduler as a solution for mitigating the data bottleneck in e-science and data-intensive scientific discovery. Stork focuses on planning, scheduling, monitoring and management of data placement tasks and application-level end-to-end optimization of networked I/O for petascale distributed e-Science applications. Unlike existing approaches, Stork treats data resources and the tasks related to data access and movement as first-class entities just like computational resources and compute tasks, and not simply the side effect of computation. Stork provides unique features such as aggregation of data transfer jobs considering their source and destination addresses, and an application-level throughput estimation and optimization service. We describe how these two features are implemented in Stork and their effects on end-to-end data transfer performance.
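The first Stork feature named above, aggregating transfer jobs by source and destination, can be sketched as simple grouping. The job tuples and URLs below are hypothetical examples, not Stork's actual job format or API.

```python
# Toy aggregation of queued transfer jobs: jobs sharing a (source,
# destination) pair are batched so they can be serviced together instead
# of as independent transfers. Inputs are illustrative placeholders.
from collections import defaultdict

def aggregate(transfer_jobs):
    """transfer_jobs: list of (src_url, dst_url, filename) tuples.
    Returns {(src, dst): [filenames]} batches."""
    batches = defaultdict(list)
    for src, dst, fname in transfer_jobs:
        batches[(src, dst)].append(fname)
    return dict(batches)

queued = [("gsiftp://a", "gsiftp://b", "f1"),
          ("gsiftp://a", "gsiftp://b", "f2"),
          ("gsiftp://c", "gsiftp://b", "f3")]
print(aggregate(queued))   # two batches instead of three separate transfers
```

Batching by endpoint pair lets a scheduler amortize connection setup and apply per-link throughput optimization to the whole batch.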

DECO: Data replication and Execution CO-scheduling for Utility Grids

by Vikas Agarwal, Gargi Dasgupta, Koustuv Dasgupta, Amit Purohit - Proceedings of International Conference on Service Oriented Computing , 2006
"... Abstract. Vendor strategies to standardize grid computing as the IT backbone for service-oriented architectures have created business opportunities to offer grid as a utility service for compute and data– intensive applications. With this shift in focus, there is an emerging need to incorporate agre ..."
Abstract - Cited by 2 (0 self)
Abstract. Vendor strategies to standardize grid computing as the IT backbone for service-oriented architectures have created business opportunities to offer the grid as a utility service for compute- and data-intensive applications. With this shift in focus, there is an emerging need to incorporate agreements that represent the QoS expectations (e.g. response time) of customer applications and the prices they are willing to pay. We consider a utility model where each grid application (job) is associated with a function that captures the revenue accrued by the provider on servicing it within a specified deadline. The function also specifies the penalty incurred on failing to meet the deadline. Scheduled execution of jobs on appropriate sites, along with timely transfer of data closer to compute sites, collectively works towards meeting these deadlines. To this end, we present DECO, a grid meta-scheduler that tightly integrates the compute and data transfer times of each job. A unique feature of DECO is that it enables differentiated QoS by assigning profitable jobs to more powerful sites and transferring the datasets associated with them at a higher priority. Further, it employs replication of popular datasets to save on transfer times. Experimental studies demonstrate that DECO earns significantly better revenue for the grid provider when compared to alternative scheduling methodologies.
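The utility model described above pairs each job with a revenue for on-time completion and a penalty for missing the deadline. The numbers and the revenue-ordered priority rule below are hypothetical illustrations of that model, not DECO's actual scheduling policy.

```python
# Toy utility model: revenue if a job finishes by its deadline, penalty
# otherwise; more profitable jobs are considered first. All values are
# illustrative assumptions.
def job_value(revenue, penalty, deadline, finish_time):
    """Provider's payoff for one job under the deadline-based utility model."""
    return revenue if finish_time <= deadline else -penalty

# Order pending jobs so profitable ones reach the powerful sites first.
pending = [{"id": "low", "revenue": 10.0, "penalty": 2.0},
           {"id": "high", "revenue": 50.0, "penalty": 20.0}]
pending.sort(key=lambda j: j["revenue"], reverse=True)
print([j["id"] for j in pending])                                # ['high', 'low']
print(job_value(50.0, 20.0, deadline=100.0, finish_time=90.0))   # 50.0
print(job_value(50.0, 20.0, deadline=100.0, finish_time=120.0))  # -20.0
```

The key point of the model is that finish_time includes data transfer, which is why the scheduler must integrate compute and transfer times when estimating whether a deadline (and hence the revenue) is reachable.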

Citation Context

...object is avoided by creating replicas of the object at selected sites. The Data Replication Service of the grid provides this functionality. A number of algorithms have been proposed in the literature [6, 7] for data replication in grids. In each case, changes in data placement are prescribed by a long-term replication process that studies the history of accesses to data objects. Data objects are transf...

Failure-Awareness and Dynamic Adaption in Data Scheduling

by Mehmet Balman, 2008
"... ..."
Abstract - Cited by 2 (2 self)
Abstract not found

Load Distribution of Analytical Query Workloads for Database Cluster Architectures

by Thomas Phan, Wen-Syan Li - in EDBT, 2008
"... Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxil ..."
Abstract - Cited by 1 (1 self)
Enterprises may have multiple database systems spread across the organization for redundancy or for serving different applications. In such systems, query workloads can be distributed across different servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table with pre-computed data that can be used to significantly improve the performance of a database query. In this paper, we propose a framework for coordinating execution of OLAP query workloads across a database cluster with a shared-nothing architecture. Such coordination is complex since we need to consider (1) the time to build the MQTs, (2) the query execution impact of the MQTs, (3) whether the MQTs can fit in the disk space limitation, (4) server computation power, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations so that the workload can be completed in the shortest time period. We frame the problem as a combinatorial problem with a solution space that is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries-to-servers and MQTs-to-servers within an arbitrarily bounded time and compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimal found through an exhaustive search and produced better solutions than typical greedy algorithms for both TPC-H and synthetic benchmarks under a variety of experiments. For a key trial where disk space is limited, it produced 15% better results than the next best competitor, corresponding to an absolute wall clock advantage of over 10 hours.
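The stochastic search idea above can be sketched as hill climbing over a joint queries-to-servers and MQTs-to-servers mapping, rejecting moves that overflow a server's disk. The costs, sizes, and acceptance rule below are hypothetical, far simpler than the paper's heuristic.

```python
# Toy stochastic search: randomly perturb the joint (query placement,
# MQT placement) mapping and keep changes that do not lengthen the
# workload completion time, subject to a per-server disk limit.
# All constants are illustrative assumptions.
import random

random.seed(1)

QUERIES, MQTS, SERVERS = 6, 2, 2
BASE_COST = 10.0           # query cost without a co-located MQT
MQT_COST = 3.0             # query cost when its MQT is on the same server
MQT_SIZE, DISK = 4.0, 5.0  # at most one MQT fits per server
USES = [q % MQTS for q in range(QUERIES)]   # MQT each query benefits from

def feasible(mqt_map):
    return all(sum(MQT_SIZE for m in range(MQTS) if mqt_map[m] == s) <= DISK
               for s in range(SERVERS))

def completion_time(q_map, mqt_map):
    load = [0.0] * SERVERS
    for q, s in enumerate(q_map):
        load[s] += MQT_COST if mqt_map[USES[q]] == s else BASE_COST
    return max(load)

def search(iters=500):
    q_map = [random.randrange(SERVERS) for _ in range(QUERIES)]
    mqt_map = [m % SERVERS for m in range(MQTS)]       # feasible start
    best = completion_time(q_map, mqt_map)
    for _ in range(iters):
        nq, nm = q_map[:], mqt_map[:]
        if random.random() < 0.5:                       # move a query...
            nq[random.randrange(QUERIES)] = random.randrange(SERVERS)
        else:                                           # ...or move an MQT
            nm[random.randrange(MQTS)] = random.randrange(SERVERS)
        if feasible(nm) and completion_time(nq, nm) <= best:
            q_map, mqt_map, best = nq, nm, completion_time(nq, nm)
    return best

print("workload completion time:", search())
```

Under these assumed costs the optimum is 9.0 (each query co-located with its MQT, load split evenly); the worst feasible schedule is 60.0, so any run of the search lands between those bounds.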

Citation Context

...data placement is decoupled from the job placement by scheduling the job close to or at the source of the data or by accessing a replica, where closeness refers to a site with minimum transfer time [26]. Other approaches follow similar strategies for reducing the response time of jobs by minimizing the input and output data transfer time. [22] assumes that single-file input data has already been...

A Hierarchical Approach to Improve Job Scheduling and Data Replication in Data Grid

by Somayeh Abdi, Sayyed Hashemi
"... ..."
Abstract
Abstract not found


by Mehmet Balman, 2010
"... ii Acknowledgments As a doctoral student, I had been privileged to work with my committee chair, Tevfik Kosar, during my study at Louisiana State University. As his first student, I benefited from his instructive comments and his mentorship in academic and scientific approach, that helped me expand ..."
Abstract
Acknowledgments. As a doctoral student, I was privileged to work with my committee chair, Tevfik Kosar, during my study at Louisiana State University. As his first student, I benefited from his instructive comments and his mentorship in the academic and scientific approach, which helped me expand my research horizon. I have acquired great knowledge and research skills by working with him. I would also like to express my great appreciation for my committee members, Gabrielle Allen, Konstantin Busch, Jianhua Chen and Ramachandran Vaidyanathan. They have supported and encouraged me in many ways. I am heartily thankful to Evangelos Triantaphyllou for his extremely valuable guidance. I have always admired his high quality of scholarship, and I appreciate his effort as a friendly student advisor; his advice helped me a lot. I would like to thank Thomas Sterling for his thoughtful comments during our discussions in the HPC area in general, which helped me better focus specifically on my topic. I would also like to mention Daniel S. Katz for his scholarly comments and help in the early phases of my research. I would like to thank Hugh Greenberg and Lachesar Ionkov from Los Alamos National Laboratory. I had the great opportunity to meet with David Daniel and Adolfy Hoisie at LANL. I would also...

Citation Context

...this is still an emerging field and there are many open problems. The data intensive characteristics of current applications have led us to investigate data-aware resource allocation and scheduling models [109, 85, 86]. Data transfer scheduling is also an important component in workflow management. We can simply classify the steps in a large scale application as follows: (1) obtain data from experiments or simulate...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University