Results 1 - 10
of
25
Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors
, 1999
"... Devices]: Modes of Computation---Parallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported ..."
Abstract
-
Cited by 142 (4 self)
- Add to MetaCart
Devices]: Modes of Computation---Parallelism and concurrency General Terms: Algorithms, Design, Performance, Theory Additional Key Words and Phrases: Automatic parallelization, DAG, multiprocessors, parallel processing, software tools, static scheduling, task graphs This research was supported by the Hong Kong Research Grants Council under contract numbers HKUST 734/96E, HKUST 6076/97E, and HKU 7124/99E. Authors' addresses: Y.-K. Kwok, Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong; email: ykwok@eee.hku.hk; I. Ahmad, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. Permission to make digital / hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and / or a fee. 2000 ACM 0360-0300/99/1200--0406 $5.00 ACM Computing Surveys, Vol. 31, No. 4, December 1999 1.
PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors
- The 6th ACM Int'l Conf. on Supercomputing
, 1992
"... We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparab ..."
Abstract
-
Cited by 81 (21 self)
- Add to MetaCart
We describe a parallel programming tool for scheduling static task graphs and generating the appropriate target code for message passing MIMD architectures. The computational complexity of the system is almost linear to the size of the task graph and preliminary experiments show performance comparable to the "best" hand-written programs. 1 Introduction In this paper, we consider static scheduling and code generation for message passing architectures. There are generally three distinct ways in addressing the programming difficulties for distributed memory architectures. The first approach considers the problem of automatic parallelization and scheduling from sequential programs. The emphasis has been in the development of compilers or software tools that will assist in programming parallel architectures [2, 16, 18, 19]. Since message passing architectures require coarse grain parallelism to be efficient, one difficulty is the identification of parallelism especially at the procedural ...
Benchmarking and Comparison of the Task Graph Scheduling Algorithms
, 1999
"... The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of ..."
Abstract
-
Cited by 67 (2 self)
- Add to MetaCart
The problem of scheduling a parallel program represented by a weighted directed acyclic graph (DAG) to a set of homogeneous processors for minimizing the completion time of the program has been extensively studied. The NP-completeness of the problem has stimulated researchers to propose a myriad of heuristic algorithms. While most of these algorithms are reported to be efficient, it is not clear how they compare against each other. A meaningful performance evaluation and comparison of these algorithms is a complex task and it must take into account a number of issues. First, most scheduling algorithms are based upon diverse assumptions, making the performance comparison rather purposeless. Second, there does not exist a standard set of benchmarks to examine these algorithms. Third, most algorithms are evaluated using small problem sizes, and, therefore, their scalability is unknown. In this paper, we first provide a taxonomy for classifying various algorithms into distinct categories a...
Optimized Rapid Prototyping for Real-Time Embedded Heterogeneous Multiprocessors
, 1999
"... This paper presents an enhancement of our "Algorithm Architecture Adequation" (AAA) prototyping methodology which allows to rapidly develop and optimize the implementation of a reactive real-time dataflow algorithm on a embedded heterogeneous multiprocessor architecture, predict its real-time behavi ..."
Abstract
-
Cited by 32 (9 self)
- Add to MetaCart
This paper presents an enhancement of our "Algorithm Architecture Adequation" (AAA) prototyping methodology which allows to rapidly develop and optimize the implementation of a reactive real-time dataflow algorithm on a embedded heterogeneous multiprocessor architecture, predict its real-time behavior and automatically generate the corresponding distributed and optimized static executive. It describes a new optimization heuristic able to support heterogeneous architectures and takes into account accurately inter-processor communications, which are usually neglected but may reduce dramatically multiprocessor performances. 1 Introduction The increasing complexity of signal, image and control processing algorithms in embedded applications, requires high computational power to satisfy real-time constraints. This power can be achieved by parallel multiprocessors which are often heterogeneous in embedded system: they are made of different types of processors interconnected by different typ...
Benchmarking the task graph scheduling algorithms
- in IPPS/SPDP
, 1998
"... Abstract † The problem of scheduling a weighted directed acyclic graph (DAG) to a set of homogeneous processors to minimize the completion time has been extensively studied. The NPcompleteness of the problem has instigated researchers to propose a myriad of heuristic algorithms. While these algorith ..."
Abstract
-
Cited by 24 (2 self)
- Add to MetaCart
Abstract † The problem of scheduling a weighted directed acyclic graph (DAG) to a set of homogeneous processors to minimize the completion time has been extensively studied. The NPcompleteness of the problem has instigated researchers to propose a myriad of heuristic algorithms. While these algorithms are individually reported to be efficient, it is not clear how effective they are and how well they compare against each other. A comprehensive performance evaluation and comparison of these algorithms entails addressing a number of difficult issues. One of the issues is that a large number of scheduling algorithms are based upon radically different assumptions, making their comparison on a unified basis a rather intricate task. Another issue is that there is no standard set of benchmarks that can be used to evaluate and compare these algorithms. Furthermore, most algorithms are evaluated using small problem sizes, and it is not clear how their performance scales with the problem size. In this paper, we first provide a taxonomy for classifying various algorithms into different categories according to their assumptions and functionalities. We then propose a set of benchmarks which are of diverse structures without being biased towards a particular scheduling technique and still allow variations in important parameters. We have evaluated 15 scheduling algorithms, and compared them using the proposed benchmarks. Based upon the design philosophies and principles behind these algorithms, we interpret the results and discuss why some algorithms perform better than the others.
An Algorithm for Automatically Obtaining Distributed and Fault-Tolerant Static Schedules
- In International Conference on Dependable Systems and Networks, DSN’03
, 2003
"... Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical. Our starting point is a source algorithm, a target distributed architecture, some distribution c ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because the system is critical. Our starting point is a source algorithm, a target distributed architecture, some distribution constraints, some indications on the execution times of the algorithm operations on the processors of the target architecture, some indications on the communication times of the data-dependencies on the communication links of the target architecture, a number of fail-silent processor failures that the obtained system must tolerate, and finally some real-time constraints that the obtained system must satisfy. In this article, we present a scheduling heuristic which, given all these inputs, produces a fault-tolerant, distributed, and static scheduling of the algorithm on the architecture, with an indication whether or not the real-time constraints are satisfied. The algorithm we propose consist of a list scheduling heuristic based active replication strategy, that allows at least +1 replicas of an operation to be scheduled on different processors, which are run in parallel to tolerate at most failures. Due to the strategy used to schedule operations, simulation results show that the proposed heuristic improve the performance of our method, both in the absence and in the presence of failures. Keywords: Fault Tolerance in Distributed and Real-Time Systems, Safety-Critical Systems, software implemented fault-tolerance, multi-component architectures, distribution heuristics.
An evaluation framework and instruction set architecture for ion-trap based quantum micro-architectures
- In Proc. 32nd Annual International Symposium on Computer Architecture
, 2005
"... The theoretical study of quantum computation has yielded efficient algorithms for some traditionally hard problems. Correspondingly, experimental work on the underlying physical implementation technology has progressed steadily. However, almost no work has yet been done which explores the architectu ..."
Abstract
-
Cited by 15 (1 self)
- Add to MetaCart
The theoretical study of quantum computation has yielded efficient algorithms for some traditionally hard problems. Correspondingly, experimental work on the underlying physical implementation technology has progressed steadily. However, almost no work has yet been done which explores the architecture design space of large scale quantum computing systems. In this paper, we present a set of tools that enable the quantitative evaluation of architectures for quantum computers. The infrastructure we created comprises a complete compilation and simulation system for computers containing thousands of quantum bits. We begin by compiling complete algorithms into a quantum instruction set. This ISA enables the simple manipulation of quantum state. Another tool we developed automatically transforms quantum software into an equivalent, fault-tolerant version required to operate on real quantum devices. Next, our infrastructure transforms the ISA into a set of low-level micro architecture specific control operations. In the future, these operations can be used to directly control a quantum computer. For now, our simulation framework quickly uses them to determine the reliability of the application for the target micro architecture. Finally, we propose a simple, regular architecture for iontrap based quantum computers. Using our software infrastructure, we evaluate the design trade offs of this micro architecture. 1
Fault-Tolerant Deployment of Embedded Software for Cost-Sensitive Real-Time Feedback-Control Applications
- In Procs. of Design Automation and Test in Europe
, 2004
"... Designing cost-sensitive real-time control systems for safety critical applications requires a careful analysis of the cost/coverage trade-offs of fault-tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on the ex ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
Designing cost-sensitive real-time control systems for safety critical applications requires a careful analysis of the cost/coverage trade-offs of fault-tolerant solutions. This further complicates the difficult task of deploying the embedded software that implements the control algorithms on the execution platform that is often distributed around the plant (as it is typical, for instance, in automotive applications). We propose a synthesis-based design methodology that relieves the designers from the burden of specifying detailed mechanisms for addressing platform faults, while involving them in the definition of the overall fault-tolerance strategy. Thus, they can focus on addressing plant faults within their control algorithms, selecting the best components for the execution platform, and defining an accurate fault model. Our approach is centered on a new model of computation, Fault Tolerant Data Flows (FTDF), that enables the integration of formal validation techniques.
An Approximation Algorithm for Scheduling Dependent Tasks on M Processors With Small Communication Delays
, 1995
"... This paper defines and studies an approximation algorithm for scheduling tasks with small communication delays on m processors. In a first step, a schedule oe 1 for the problem instance with an unlimited number of processors is generated with a polynomial algorithm with relative performance bo ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
This paper defines and studies an approximation algorithm for scheduling tasks with small communication delays on m processors. In a first step, a schedule oe 1 for the problem instance with an unlimited number of processors is generated with a polynomial algorithm with relative performance bounded by 2(1 + ae) 2 + ae , where ae denotes the ratio between the greatest communication delay and the smallest processing time. Then this solution is used to solve the resource conflicts during the scheduling phase on m processors, with a rather unusual feature: a processor may remain idle even if some tasks are feasible in order to wait for a more important task. The schedule generated by this algorithm is proved to have an overall worst case performance 4 + 3ae 2 + ae \Gamma 2 + 2ae (2 + ae)m , improving the best known ratio 2 + ae \Gamma 2 m . Keywords : scheduling, makespan, communication delays. AMS subject classification : 68M20. 1 1 Introduction With the recen...
Thread Partitioning and Scheduling Based on Cost Model
, 1997
"... There has been considerable interest in implementing a multithreaded program execution and architecture model on a multiprocessor whose primary processors consist of today's off-the-shelf microprocessors. Unlike some custom-designed multithreaded processor architectures, which can interleave mult ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
There has been considerable interest in implementing a multithreaded program execution and architecture model on a multiprocessor whose primary processors consist of today's off-the-shelf microprocessors. Unlike some custom-designed multithreaded processor architectures, which can interleave multiple threads concurrently, conventional processors can only execute one thread at a time. This presents a unique and challenging problem to the compiler: partition a program into threads so that it executes both correctly and in minimal time. We present a new heuristic algorithm based on an interesting extension of the classical list scheduling algorithm. Based on a cost model, our algorithm groups instructions into threads by considering the trade-offs among parallelism, latency tolerance, thread switching costs and sequential execution efficiency. The proposed algorithm has been implemented, and its performance measured through experiments on a variety of architecture parameters a...

