Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
, 2003
Cited by 349 (22 self)
This paper proposes and evaluates single-ISA heterogeneous multi-core architectures as a mechanism to reduce processor power dissipation. Our design incorporates heterogeneous cores representing different points in the power/performance design space; during an application's execution, system software dynamically chooses the most appropriate core to meet specific performance and power requirements.
Scheduling Strategies for Master-Slave Tasking on Heterogeneous Processor Grids
, 2002
Cited by 99 (30 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. We use a non-oriented graph to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We show how to determine the optimal steady-state scheduling strategy for each processor (the fraction of time spent computing and the fraction of time spent communicating with each neighbor). This result holds for a quite general framework, allowing for cycles and multiple paths in the interconnection graph, and allowing for several masters. Because
Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms
 In International Parallel and Distributed Processing Symposium (IPDPS’2002). IEEE Computer
, 2001
Cited by 84 (28 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is bandwidth-centric: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power. We then show how nodes with other capabilities (ones that allow more or less overlapping of computation and communication than the base model) can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demand-driven task allocation policies that show that our bandwidth-centric method obtains better results than allocating tasks to all processors on a first-come, first-served basis.
Key words: heterogeneous computer, allocation, scheduling, grid, metacomputing.
Corresponding author: Jeanne Ferrante. The work of Larry Carter and Jeanne Ferrante was performed while visiting LIP.
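The bandwidth-centric rule in the abstract above can be illustrated on a single master with leaf children. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the master computes one task every `w0` time units, sends to one child at a time, and child `i` needs `c` time units of the master's link per task and `w` time units to compute it. Under those assumptions, maximizing the task rate is a fractional knapsack over the master's link time, which is solved greedily by serving children in order of increasing communication time. The names `fork_throughput`, `w0`, `c`, and `w` are hypothetical.

```python
from fractions import Fraction


def fork_throughput(w0, children):
    """Steady-state tasks per time unit for one master and its leaf children.

    w0       -- time the master needs to compute one task (assumed integer)
    children -- list of (c, w) pairs: link time per task, compute time per task

    Assumption: the master's outgoing link is busy at most 1 time unit per
    time unit, and it overlaps its own computation with communication.
    """
    link_budget = Fraction(1)          # fraction of link time still unused
    rate = Fraction(1) / Fraction(w0)  # the master's own contribution

    # Bandwidth-centric order: cheapest communication first, whatever the
    # child's compute speed is.
    for c, w in sorted(children, key=lambda child: child[0]):
        full_rate = Fraction(1) / Fraction(w)  # child kept fully busy
        link_cost = Fraction(c) * full_rate    # link time that would consume
        if link_cost <= link_budget:
            rate += full_rate
            link_budget -= link_cost
        else:
            # Bandwidth is the bottleneck: feed this child with what is left
            # and give nothing to children with slower links.
            rate += link_budget / Fraction(c)
            break
    return rate


# The third child computes fastest (w=1) but has the slowest link (c=5),
# so it receives no tasks once the link is saturated.
print(fork_throughput(4, [(1, 2), (2, 3), (5, 1)]))
```

In this example the first child is kept fully busy, the second is fed only with the remaining link time, and the fastest processor is left idle, which is the "regardless of their computation power" behaviour the abstract describes.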
UMR: A Multi-Round Algorithm for Scheduling Divisible Workloads
 In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS’03)
, 2003
Cited by 82 (7 self)
Divisible load applications occur in many fields of science and engineering, can be easily parallelized in a master-worker fashion, but pose several scheduling challenges. While a number of approaches have been proposed that allocate work to workers in a single round, using multiple rounds improves overlap of computation with communication. Unfortunately, multi-round algorithms are difficult to analyze and have thus received only limited attention. In this paper we answer three open questions in the multi-round divisible load scheduling area: (i) How to account for latencies? (ii) How to account for heterogeneous platforms? and (iii) How many rounds should be used? To answer (i), we derive the first closed-form optimal schedule for a homogeneous platform with both computation and communication latencies, for a given number of rounds. To answer (ii) and (iii), we present a novel algorithm, UMR. We use simulation to evaluate UMR in a variety of realistic scenarios.
Autonomous protocols for bandwidth-centric scheduling of independent-task applications
 In International Parallel and Distributed Processing Symposium IPDPS’2003. IEEE Computer
, 2003
RUMR: Robust Scheduling for Divisible Workloads
 Proceedings of the 12th IEEE Symposium on High Performance and Distributed Computing (HPDC-12)
, 2003
Cited by 22 (4 self)
Divisible workload applications arise in many fields of science and engineering. They can be parallelized in master-worker fashion and relevant scheduling strategies have been proposed to reduce application makespan. Our goal is to develop a practical divisible workload scheduling strategy. This requires that previous work be revisited as several usual assumptions about the computing platform do not hold in practice. We have partially addressed this concern in a previous paper via an algorithm that achieves high performance with realistic resource latency models. In this paper we extend our approach to account for performance prediction errors, which are expected for most real-world performance and applications. In essence, we combine ideas from multi-round divisible workload scheduling, for performance, and from factoring-based scheduling, for robustness. We present simulation results to quantify the benefits of our approach compared to our original algorithm and to other previously proposed algorithms.
A Polynomial-Time Algorithm for Allocating Independent Tasks on Heterogeneous Fork-Graphs
, 2002
Cited by 19 (11 self)
In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogeneous processor farm. The master processor $P_0$ can process a task within $w_0$ time-units; it communicates a task in $d_i$ time-units to the $i$-th slave $P_i$, $1 \le i \le p$, which requires $w_i$ time-units to process it. We assume communication-computation overlap capabilities for each slave (and for the master), but the communication medium is exclusive: the master can only communicate with a single slave at each time-step. We give a
Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems
 ACM Transactions on Design Automation of Electronic Systems (TODAES)
, 2009
Cited by 15 (6 self)
In high-level synthesis for real-time embedded systems using heterogeneous functional units (FUs), it is critical to select the best FU type for each task. However, some tasks may not have fixed execution times. This article models each varied execution time as a probabilistic random variable and solves the heterogeneous assignment with probability (HAP) problem. The solution of the HAP problem assigns a proper FU type to each task such that the total cost is minimized while the timing constraint is satisfied with a guaranteed confidence probability. The solutions to the HAP problem are useful for both hard real-time and soft real-time systems. Optimal algorithms are proposed to find the optimal solutions for the HAP problem when the input is a tree or a simple path. Two other algorithms, one optimal and the other a near-optimal heuristic, are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost while satisfying timing constraints with guaranteed confidence probabilities. For example, our algorithms achieve an average reduction of 33.0% in total cost with a 0.90 confidence probability of satisfying timing constraints, compared with previous work using the worst-case scenario.
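The "guaranteed confidence probability" in the abstract above can be made concrete with a small calculation. The sketch below is an illustration only, not the article's algorithm: it assumes the tasks run one after another (a simple path), that their execution times are independent, and that each task's time under its chosen FU type is a discrete distribution given as `(time, probability)` pairs. The function name `deadline_confidence` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def deadline_confidence(task_dists, deadline):
    """Probability that sequential tasks finish within `deadline`.

    task_dists -- one list per task of (time, probability) pairs describing
                  its execution time under the chosen FU type
    Assumption: execution times are independent, so the distribution of the
    total time is the convolution of the per-task distributions.
    """
    total = {0: 1.0}  # distribution of the elapsed time so far
    for dist in task_dists:
        nxt = defaultdict(float)
        for elapsed, p_elapsed in total.items():
            for time, p_time in dist:
                nxt[elapsed + time] += p_elapsed * p_time
        total = nxt
    return sum(p for elapsed, p in total.items() if elapsed <= deadline)


# Two tasks: the first takes 1 (90%) or 3 (10%) time units,
# the second takes 2 (80%) or 4 (20%).
example = [[(1, 0.9), (3, 0.1)], [(2, 0.8), (4, 0.2)]]
print(deadline_confidence(example, 5))
```

An assignment is acceptable under a confidence requirement such as 0.90 when this probability is at least 0.90; a worst-case analysis would instead require the deadline to cover the longest possible total time.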
Scheduling strategies for mixed data and task parallelism on heterogeneous clusters and grids
, 2003
Efficient assignment and scheduling for heterogeneous DSP systems
 IEEE Trans. on Parallel and Distributed Systems
, 2005
Cited by 14 (6 self)
This paper addresses high-level synthesis for real-time digital signal processing (DSP) architectures using heterogeneous functional units (FUs). For such special-purpose architecture synthesis, an important problem is how to assign a proper FU type to each operation of a DSP application and generate a schedule in such a way that all requirements can be met and the total cost can be minimized. We propose a two-phase approach to solve this problem. In the first phase, we solve the heterogeneous assignment problem, i.e., given the types of heterogeneous FUs, a Data-Flow Graph (DFG) in which each node has different execution times and costs (which may relate to power, reliability, etc.) for different FU types, and a timing constraint, how to assign a proper FU type to each node such that the total cost can be minimized while the timing constraint is satisfied. In the second phase, based on the assignments obtained in the first phase, we propose a minimum resource scheduling algorithm to generate a schedule and a feasible configuration that uses as few resources as possible. We prove that the heterogeneous assignment problem is NP-complete. Efficient algorithms are proposed to find an optimal solution when the given DFG is a simple path or a tree. Three other algorithms are proposed to solve the general problem. The experiments show that our algorithms can effectively reduce the total cost compared with the previous work.
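For the simple-path case mentioned in the abstract above, the heterogeneous assignment problem can be solved with a dynamic program over the time budget. The sketch below is a generic knapsack-style formulation under stated assumptions, and is not claimed to be the paper's algorithm: execution times are non-negative integers, the nodes of the path execute one after another, and each node lists one `(time, cost)` option per FU type. The name `assign_path` and the sample data are hypothetical.

```python
import math


def assign_path(nodes, deadline):
    """Cheapest FU-type assignment for a path of nodes within `deadline`.

    nodes    -- one list per node of (time, cost) options, one per FU type
    deadline -- integer timing constraint on the sum of chosen times
    Returns (min_cost, choices), where choices[i] is the option index picked
    for node i, or (math.inf, None) if no assignment meets the deadline.
    """
    # best[t] = (cost, choices) for the nodes seen so far, using total time <= t
    best = [(0, [])] * (deadline + 1)
    for options in nodes:
        nxt = [(math.inf, None)] * (deadline + 1)
        for t in range(deadline + 1):
            for k, (time, cost) in enumerate(options):
                # Option k fits only if the earlier nodes fit in t - time.
                if time <= t and best[t - time][1] is not None:
                    candidate = best[t - time][0] + cost
                    if candidate < nxt[t][0]:
                        nxt[t] = (candidate, best[t - time][1] + [k])
        best = nxt
    return best[deadline]


# Three nodes, two FU types each: option 0 is fast and costly,
# option 1 is slow and cheap.
path = [[(1, 10), (3, 2)], [(2, 8), (4, 3)], [(1, 6), (2, 1)]]
print(assign_path(path, 7))
```

The table has one row per node and one column per time value, so the running time grows with the deadline; this pseudo-polynomial behaviour is consistent with the general problem being NP-complete, as the abstract states.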