Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
Cited by 142 (4 self)
Devices]: Modes of Computation---Parallelism and concurrency. General Terms: Algorithms, Design, Performance, Theory. Additional Key Words and Phrases: automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs. This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.-K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. 2000 ACM 0360-0300/99/1200--0406 $5.00. ACM Computing Surveys, Vol. 31, No. 4, December 1999.
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
, 1999
Cited by 67 (2 self)
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather purposeless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories a...
Benchmarking the task graph scheduling algorithms
- in IPPS/SPDP
, 1998
Cited by 24 (2 self)
The problem of scheduling a weighted directed acyclic graph (DAG) to a set of homogeneous processors to minimize the completion time has been extensively studied. The NP-completeness of the problem has instigated researchers to propose a myriad of heuristic algorithms. While these algorithms are individually reported to be efficient, it is not clear how effective they are and how well they compare against each other. A comprehensive performance evaluation and comparison of these algorithms entails addressing a number of difficult issues. One issue is that a large number of scheduling algorithms are based upon radically different assumptions, making their comparison on a unified basis a rather intricate task. Another issue is that there is no standard set of benchmarks that can be used to evaluate and compare these algorithms. Furthermore, most algorithms are evaluated using small problem sizes, and it is not clear how their performance scales with the problem size. In this paper, we first provide a taxonomy for classifying various algorithms into different categories according to their assumptions and functionalities. We then propose a set of benchmarks that are of diverse structures, are not biased towards a particular scheduling technique, and still allow variations in important parameters. We have evaluated 15 scheduling algorithms and compared them using the proposed benchmarks. Based upon the design philosophies and principles behind these algorithms, we interpret the results and discuss why some algorithms perform better than others.
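The problem described in this abstract (placing a weighted DAG on homogeneous processors so that the completion time is small) can be illustrated with a minimal sketch of a generic list-scheduling heuristic. This is not one of the 15 algorithms evaluated in the paper. The function name `list_schedule`, the priority rule (longest computation path to an exit node, ignoring communication), the rule that communication cost is paid only across processors, and the example graph are all assumptions made for illustration.

```python
from collections import defaultdict


def list_schedule(weights, edges, num_procs):
    """Greedy list scheduling of a weighted DAG on identical processors.

    weights: dict mapping task -> computation cost
    edges:   dict mapping (u, v) -> communication cost, charged only when
             u and v are placed on different processors (an assumption)
    Returns (makespan, placement), placement maps task -> (proc, start, finish).
    """
    succs, preds = defaultdict(list), defaultdict(list)
    for (u, v) in edges:
        succs[u].append(v)
        preds[v].append(u)

    # Priority: longest chain of computation costs from a task to an exit node.
    blevel = {}

    def compute_blevel(task):
        if task not in blevel:
            tail = max((compute_blevel(s) for s in succs[task]), default=0)
            blevel[task] = weights[task] + tail
        return blevel[task]

    for task in weights:
        compute_blevel(task)

    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placement = {}
    unscheduled = set(weights)
    while unscheduled:
        # A task is ready once all of its predecessors have been placed.
        ready = [t for t in unscheduled if all(p in placement for p in preds[t])]
        # Highest priority first; ties broken by task name for determinism.
        task = min(ready, key=lambda t: (-blevel[t], t))
        best_proc, best_start = None, None
        for proc in range(num_procs):
            data_ready = 0
            for pred in preds[task]:
                pred_proc, _, pred_finish = placement[pred]
                comm = 0 if pred_proc == proc else edges[(pred, task)]
                data_ready = max(data_ready, pred_finish + comm)
            start = max(proc_free[proc], data_ready)
            if best_start is None or start < best_start:
                best_proc, best_start = proc, start
        finish = best_start + weights[task]
        placement[task] = (best_proc, best_start, finish)
        proc_free[best_proc] = finish
        unscheduled.remove(task)

    makespan = max(finish for _, _, finish in placement.values())
    return makespan, placement


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
example_weights = {"a": 2, "b": 3, "c": 3, "d": 2}
example_edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
example_makespan, example_placement = list_schedule(example_weights, example_edges, 2)
```

Comparing the makespan for one processor against two on the same graph gives a rough feel for how communication costs limit the achievable speedup, which is the kind of effect a benchmark suite needs to expose.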
A Comparison of Heuristics for Scheduling DAGs on Multiprocessors
- in Proceedings of the Eighth International Parallel Processing Symposium
, 1994
Cited by 22 (1 self)
Many algorithms to schedule DAGs on multiprocessors have been proposed, but there has been little work done to determine their effectiveness. Since multiprocessor scheduling is an NP-hard problem, no exact tractable algorithm exists, and no baseline is available against which to compare the resulting schedules. This paper is an attempt to quantify the differences in a few of the heuristics. The empirical performance of five heuristics is compared when they are applied to ten specific DAGs which represent program dependence graphs of important applications. The comparison is made between a graph-based method, a list scheduling technique, and three critical path methods. 1. Introduction One of the primary problems in creating efficient programs for multiprocessor systems with distributed memory is to partition the program into tasks that can be assigned to different processors for parallel execution. If a high degree of parallelism is the objective, a greater amount of communication will b...
Analysis, Evaluation, and Comparison of Algorithms for Scheduling Task Graphs on Parallel Processors
- In Proceedings of the Second International Symposium on Parallel Architectures, Algorithms, and Networks
, 1996
Cited by 21 (5 self)
In this paper, we survey algorithms that allocate a parallel program represented by an edge-weighted directed acyclic graph (DAG), also called a task graph or macrodataflow graph, to a set of homogeneous processors, with the objective of minimizing the completion time. We analyze 21 such algorithms and classify them into four groups. The first group includes algorithms that schedule the DAG to a bounded number of processors directly. These algorithms are called the bounded number of processors (BNP) scheduling algorithms. The algorithms in the second group schedule the DAG to an unbounded number of clusters and are called the unbounded number of clusters (UNC) scheduling algorithms. The algorithms in the third group schedule the DAG using task duplication and are called the task duplication based (TDB) scheduling algorithms. The algorithms in the fourth group perform allocation and mapping on arbitrary processor network topologies. These algorithms are called the arbitrary processor network (APN) scheduling algorithms. The design philosophies and principles behind these algorithms are discussed, and the performance of all of the algorithms is evaluated and compared against each other on a unified basis by using various scheduling parameters.
On Parallelizing the Multiprocessor Scheduling Problem
- IEEE Trans. on Parallel and Distributed Systems
, 1999
Cited by 10 (1 self)
Existing heuristics for scheduling a node- and edge-weighted directed task graph to multiple processors can produce satisfactory solutions but incur high time complexities, which tend to be exacerbated in more realistic environments with relaxed assumptions. Consequently, these heuristics do not scale well and cannot handle problems of moderate sizes. A natural approach to reducing complexity while aiming for a similar or potentially better solution is to parallelize the scheduling algorithm. This can be done by partitioning the task graphs and concurrently generating partial schedules for the partitioned parts, which are then concatenated to obtain the final schedule. The problem, however, is nontrivial, as there exist dependencies among the nodes of a task graph which must be preserved for generating a valid schedule. Moreover, the time clock for scheduling is global for all the processors (that are executing the parallel scheduling algorithm), making the inherent parallelism invisible. In...
Clustering and Intra-Processor Scheduling for Explicitly-Parallel Programs on Distributed-Memory Systems
- In International Parallel Processing Symposium
, 1994
Cited by 5 (2 self)
Programs for distributed-memory systems are explicitly parallel and consist of a set of sequential tasks or processes that communicate via message passing. The sequence of computation in each task, together with the intermediate send and receive communication steps, exhibits the temporal behavior of the program. In this paper, we show that the two common models of program representation, the precedence graph and the interaction graph models, are insufficient to capture this temporal behavior and hence are not ideal for solving the clustering and the scheduling problems. We use a new Temporal Communication Graph (TCG) model to represent such explicitly parallel programs. This model captures communication dependency and overlap of communication with computation. This provides flexibility to get a better estimate of the program completion time. New measures are developed for quantifying critical communication and inter-task parallelism on this model. We analyze the importance of intra-processor...
Efficient Circuit Partitioning Algorithms For Parallel Logic Simulation
- In Proceedings of Supercomputing ’89
, 1989
Cited by 5 (0 self)
General purpose parallel processing machines are increasingly being used to speed up a variety of VLSI CAD applications. This paper addresses logic simulation on parallel machines by exploiting the concurrency in the circuit being simulated (called data parallelism) as opposed to exploiting parallelism inherent in the simulation algorithm itself (called functional parallelism). The most crucial step in obtaining the maximum parallelism using data parallelism is the partitioning of circuit elements. We introduce a cost function which tries to model the simulation of a logic circuit in a parallel environment. The cost function tries to estimate the parallel run time for logic simulation given the processor assignment and the underlying multiprocessor architecture. We then present different heuristic algorithms to partition the circuit and evaluate the efficiency of these algorithms using the proposed cost function. Partitioning algorithms for both event-driven and compiled code simulati...
Mapping tasks to processors at run-time
- Proc. ISCIS VII (Int. Symp. on Comp. & Inf. Sciences), Antalya, Turkey (Nov. 1992)
, 1992
Cited by 4 (2 self)
We consider the dynamic task allocation problem in a multicomputer system with multiprogramming. Programs are given as task interaction graphs that have to be mapped onto the processors at run-time. We propose a fast two-phase heuristic algorithm in which phase 1 performs a hierarchical clustering of the tasks, which phase 2 then uses to map clusters of suitable size onto free partitions of the processor graph.
On task scheduling accuracy: Evaluation methodology and results
- Journal of Supercomputing
, 2004
Cited by 4 (2 self)
Many heuristics based on the directed acyclic graph (DAG) have been proposed for the static scheduling problem. Most of these algorithms apply a simple model of the target system that assumes fully connected processors, a dedicated communication sub-system and no contention for the communication resources. Only a few algorithms consider the network topology and the contention for the communication resources. This article evaluates the accuracy of task scheduling algorithms and thus the appropriateness of the applied models. An evaluation methodology is proposed and applied to a representative set of scheduling algorithms. The obtained results show a significant inaccuracy of the produced schedules. Analyzing these results is important for the development of more appropriate models and more accurate scheduling algorithms.

