V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
....very similar to the Jacobson predictor [15] but with the addition of associativity to reduce aliasing effects in the second-level tables. 3.4. Dynamic partitioning by MEM slicing. Partitioning refers to the problem of dividing the sequential program into threads. Partitioning is known to be NP-hard [30], and solutions are generally heuristic-based. An algorithm must balance the often conflicting needs of control flow, data flow, and load balance. Ideally, threads are control predictable, contain no short inter-thread data dependencies, and are all approximately the same size. While some work has ....
V. Sarkar and J. Hennessy, "Partitioning parallel programs for macro-dataflow", In Conference Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pp. 192-201, 1986.
....and an extensible portion that administrators can use to implement new scheduling policies or to extend the basic descriptions of requirements or abilities. To determine the basis for the fixed portion of the description vector, we reviewed 18 algorithms from the existing scheduling literature [2-5, 7, 13, 17-20, 22-24, 26-28, 30, 31]. Table 3 depicts the resulting data set. We found that only two characteristics, processor speed and inter-processor communication time estimates, were used by more than four algorithms. Therefore, we included processor speed estimates in the description vector and provide a mechanism to ....
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
.... on new tasks; loadave: an estimate of the load average for the entire system; Procclass: information on the different classes of processors in the system. To determine the basis for the fixed portion of the description vector, we reviewed 18 algorithms from the existing scheduling literature [2-5, 7, 13, 17-20, 22-24, 26-28, 30, 31]. Table 1 depicts the resulting data set. We found that only two characteristics, processor speed and inter-processor communication time estimates, were used by more than four algorithms. Therefore, we included processor speed estimates in the description vector and provide a mechanism to ....
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
....typically executed as processes on different processors, and scheduled by the operating system. Scheduling overhead often requires that the granularity of these processes be at the procedure level. Usually, the programmer must specify this kind of parallelism with special language features [Ari82, SH86] or library calls [SFG91, TRG 87, GBD 91]. There has been some related work in scheduling pre-partitioned task graphs [ERA91, GY92, GP94, YFGS95] and in partitioning graphs for parallel execution [GP94, RNSB94]. The third approach exploits data parallelism by performing the same ....
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In LISP and Functional Programming, pages 202--211, 1986.
....method of dependence sets is described in [Ian88]. The basic idea here is to conglomerate all operations that depend on the same set of inputs into one cluster. Clusters are broken when any memory fetch operation is done. Any cycle that cannot be resolved is not introduced. Sarkar and Hennessy [SH86] give a method to avoid the unresolvability of static cycles in dataflow graphs by imposing constraints on the partitioning strategy. Dynamic cycles arise due to implicit arcs between STORE and FETCH memory operations. Conglomeration of instructions into threads is decided based on their ....
....dataflow graphs where a node may contain one or more graphs. The intermediate form IF1 [SG85] is a macro-dataflow form for the functional language Sisal. The macro-dataflow graphs are partitioned based on a homogeneous two-level multicomputer, which does not consider the memory hierarchy [SH86, Sar87]. On most multithreaded architectures just partitioning the graph into code fragments will not suffice. The code fragments are scheduled by the compiler onto the various processors that are available for use. Sarkar in [Sar87] assumes an infinite number of processors while partitioning ....
V. Sarkar and J. Hennessy. Partitioning parallel programs for macro-dataflow. In Proc. ACM Conference on LISP and Functional Programming, pages 202--211, August 1986.
.... to take on new tasks; loadave: an estimate of the load average for the entire system; Procclass: information on the different classes of processors in the system. To determine the basis for the fixed portion of the description vector, we reviewed 18 algorithms from the existing scheduling literature [2-5, 7, 13, 17-20, 22-24, 26-28, 30, 31]. Table 1 depicts the resulting data set. We found that only two characteristics, processor speed and inter-processor communication time estimates, were used by more than four algorithms. Therefore, we included processor speed estimates in the description vector and provide a mechanism to ....
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
....in a given application due to ambiguous memory dependences (as explained in Section 2.3.2) and (2) even if the dependence information is accurate, data dependences, control dependences, load imbalance, and task overheads (as explained in Section 2.6) often impose conflicting requirements. Sarkar [89] showed that partitioning simple functional programs into tasks to execute on a multiprocessor is NP-complete. He modeled program execution time as a function of the amount of work done by each task and data communication delay due to inter-task dependences. Even without including considerations ....
....in many ways. Since the programs considered were functional, considerations of data dependencies were much simpler. The algorithms considered tasks at the granularity of entire function definitions and not at the fine granularity of a few basic blocks, as is the case for Multiscalar tasks. Sarkar [89] showed that the problem of optimally partitioning simple functional programs to execute on a multiprocessor considering only the amount of work and data communication is NP-complete. In the past, several compilation structures, for example, trace in Trace Scheduling [33, 34], superblock ....
V. Sarkar and J. Hennessy. Partitioning parallel programs for macro-dataflow. In Conference Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 192--201. Association for Computing Machinery, 1986.
....task overhead often impose conflicting requirements. Sarkar showed that given the communication costs and amount of work associated with each function invocation, partitioning simple functional programs into non-speculative tasks to optimize the execution time on a multiprocessor is NP-complete [13]. Due to the intractable nature of obtaining optimal tasks, we rely on heuristics to approach the problem. This section describes how our task selection heuristics produce tasks with favorable characteristics. However, before going into the description, it may be helpful to summarize the ....
V. Sarkar and J. Hennessy. Partitioning parallel programs for macro-dataflow. In Conference Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 192--201. Association for Computing Machinery, 1986.
....tasks to estimate the near-future task requirements as expressions of task parameters. The estimation heuristics abstract away variations in resource requests that are caused by local variables of the task, but preserve differences captured in the task parameters. They build on the work of Sarkar [6, 7], who uses profiling-based compile-time estimates of average task behavior to partition and schedule applicative programs. Our approach of capturing the internal behavior of applications through these estimates of near-term behavior contrasts with the annotation approach adopted by Hornig [4], who ....
....with base conditions. Exact solutions may not always be possible, in which case heuristics can be used to obtain the best-case, average-case, or worst-case behaviors of the recursion. HEURISTIC 6 (Estimating behavior of recursive functions via profiling): This uses the approach adopted by Sarkar [6], which distinguishes between the cost of external and internal calls (from mutually recursive functions) to a recursive function. In our model, since there can be no recursion at the task level, we are interested in predicting the cost of only the external calls to any member of a mutually ....
V. Sarkar and J. Hennessy, "Partitioning parallel programs for macro-dataflow," Proceedings Conf. on Lisp and Functional Programming, pp. 202--211, ACM, Aug. 1986.
....power four, the expression map power4 [1,2,3,4] could be used. Function composition is denoted by the infix . combinator. For example, a function to calculate sine to the power four is: power4 . sin. Two useful list operators are # and !. The # operator determines the length of a list; for example, #[99,100,101] is 3. The ! operator is an infix operator for indexing lists; for example, [33,34,35,36] ! 1 is 34 (list indexing starts from 0). Equations may be guarded. For example, a function, filter. An application such as filter p l returns a list of all the elements from l which satisfy the predicate p: ....
....other than parallel calls to other serial combinators. The effect of this was to make the implementation of tasks simple since tasks were exactly serial combinator applications. However, this does not seem to have significantly affected the sizes or number of tasks produced. Sarkar and Hennessy [101] describe a compile-time method for automatically partitioning (IF1) data flow graphs. The goal once again was to increase task sizes. Their system had three phases: 1. assign execution times to nodes and communication times to edges 2. partition the graph 3. ....
V Sarkar and J Hennessy. Partitioning parallel programs for macro-dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, Cambridge, Massachusetts, August 1986.
....from incompatible administrative domains while respecting the autonomy of the individual systems. Second, many researchers have concentrated on scheduling and load balancing algorithms while assuming the existence of the mechanisms necessary to support them (see, for example, Sarkar and Hennessy [13], Lo [12] or Blake [1]). They have either designed ad hoc mechanisms to support particular algorithms, or limited their research to theoretical analysis of the scheduling algorithms. Third, users of computer systems may require resources that are not available locally, such as specialized ....
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
....optimized counter-based execution profile of the program. 1 Introduction. It is important for a compiler to obtain estimates of execution times for subcomputations of an input program, if it is to attempt optimizations related to overhead values in the target architecture. In earlier work [SH86a, SH86b, Sar87, Sar89], we used estimates of execution times to facilitate the automatic partitioning and scheduling of programs written in the single-assignment language, Sisal, for parallel execution on multiprocessors. In this paper, we present a general framework for estimating average execution times ....
....in a control flow graph. It is only recently that automatic program optimizations have been proposed that use frequency information, e.g. trace scheduling [FERN84], register allocation [Wal86], optimization of delayed branches [MH86], partitioning and scheduling of parallel programs [SH86a, SH86b]. Given its growing importance, execution profile information ought to become an indispensable component of future programming systems, and the availability of the frequency information will no doubt motivate its use in new optimizations. Previous efforts in obtaining execution frequencies were ....
Vivek Sarkar and John Hennessy. Partitioning Parallel Programs for Macro-Dataflow. ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
No context found.
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.
No context found.
V Sarkar and J Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In Proceedings of the 1986.
No context found.
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In Proceedings of the Conference on LISP and Functional Programming, pages 202--211, 1986.
No context found.
V. Sarkar and J. Hennessy. Partitioning Parallel Programs for Macro-Dataflow. In ACM Conference on Lisp and Functional Programming, pages 202--211, August 1986.