Results 1 
5 of
5
Task Graph Performance Bounds Through Comparison Methods
, 2001
"... When a parallel computation is represented in a formalism that imposes seriesparallel structure on its task graph, it becomes amenable to automated analysis and scheduling. Unfortunately, its execution time will usually also increase as precedence constraints are added to ensure seriesparallel str ..."
Abstract

Cited by 6 (1 self)
 Add to MetaCart
(Show Context)
When a parallel computation is represented in a formalism that imposes seriesparallel structure on its task graph, it becomes amenable to automated analysis and scheduling. Unfortunately, its execution time will usually also increase as precedence constraints are added to ensure seriesparallel structure. Bounding the slowdown ratio would allow an informed tradeoff between the benefits of a restrictive formalism and its cost in loss of performance. This dissertation deals with seriesparallelising task graphs by adding precedence constraints to a task graph, to make the resulting task graph seriesparallel. The weak bounded slowdown conjecture for seriesparallelising task graphs is introduced. This states that the slowdown is bounded if information about the workload can be used to guide the selection of which precedence constraints to add. A theory of best seriesparallelisations is developed to investigate this conjecture. Partial evidence is presented that the weak slowdown bound is likely to be 4/3, and this bound is shown to be tight.
A performance analysis of local synchronization
 In Proceedings of the 18th ACM Symposium on Parallelism in Algorithms and Architectures
, 2006
"... Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especially true for barrier, or global, synchronization, in which every processor must synchronize with every other processor. N ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especially true for barrier, or global, synchronization, in which every processor must synchronize with every other processor. Nonetheless, barriers are the only form of synchronization explicitly supplied in MPI and OpenMP. Many applications do not actually require global synchronization; local synchronization, in which a processor synchronizes only with those processors from which it has an incoming edge in some directed graph, is often adequate. However, the behavior of a system under local synchronization is more difficult to analyze, since processors do not all start tasks at the same time. In this paper, we show that if the synchronization graph is a directed cycle and the task times are geometrically distributed with p = 0.5, the time it takes for a processor to complete a task, including synchronization time, approaches an exact limit of 2 + √ 2 as the number of processors in the cycle approaches infinity. Under global synchronization, however, the time is unbounded, increasing logarithmically with the number of processors. Similar results also apply for p � = 0.5. We give a new proof of the constant upper bounds that apply when tasks are normally distributed and the synchronization graph is any graph of bounded degree. We also prove that for some powerlaw distributions on the tasks, there is no constant upper bound as the number of processors increases, even for the directed cycle. Finally, we show that constant upper bounds apply for some cases of a different synchronization model in which a processor waits for only a subset of its neighbors.
Analysis of Delays . . .
, 2010
"... Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especially true for barrier, or global, synchronization, in which every processor must synchronize with every other processor. N ..."
Abstract
 Add to MetaCart
Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for the information to arrive. This is especially true for barrier, or global, synchronization, in which every processor must synchronize with every other processor. Nonetheless, barriers are the only form of synchronization explicitly supplied in OpenMP, and they occur whenever collective communication operations are used in MPI. Many applications do not actually require global synchronization; local synchronization, in which a processor synchronizes only with those processors from or to which information or resources are needed, is often adequate. However, when tasks take varying amounts of time the behavior of a system under local synchronization is more difficult to analyze since processors do not start tasks at the same time. We show that when the synchronization dependencies form a directed cycle and the task times are geometrically distributed with p =0.5, then as the number of processors tends to infinity the processors are working 2 − √ 2 ≈ 0.59 % of the time. Under global synchronization, however, the time to complete each task is unbounded, increasing logarithmically with the number of processors. Similar results apply for p ̸ = 0.5. We also present some of the combinatorial properties of the synchronization problem with geometrically distributed tasks on an undirected cycle. Nondeterministic synchronization is also examined, where processors decide randomly at the beginning of each task which neighbors(s) to synchronize with. We show that the expected number of task dependencies for random synchronization on an undirected cycle is the same as for deterministic synchronization on a directed cycle. Simulations are included to extend the analytic results. They show that more heavytailed distributions can actually create fewer delays than less heavytailed ones if the number of processors is small for some randomneighbor synchronization models. The results also show the rate of convergence to the steady state for various task distributions and synchronization graphs.