C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc.
....and So, Bolmarcich, Darema, and Norton [41], the effect of dynamically scheduling the task graph given a limited number of processors is approximated in terms of bounds, essentially in the flavor of Eq. 3.4, by applying Graham's list scheduling results [25]. Also, Polychronopoulos and Banerjee [36] describe bounds for doacross loops taking into account a limited number of processors. An improvement of the above bounds has been described by Jain and Rajaraman [28]. Comparable to our recursive approach in Eq. 3.5, they improve the sharpness of the basic bound given by Eq. 3.4 by applying the ....
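As a rough illustration of how list scheduling theory yields bounds for a task graph on a limited number of processors, the Python sketch below computes the total work W and the critical-path length L, simulates one greedy list schedule on p processors, and checks Graham's classical bound max(W/p, L) <= T_p <= W/p + (1 - 1/p) * L. This is not the Eq. 3.4 of the citing paper (its form is not given here); the task graph, task names, and function names are hypothetical, and tasks are assumed non-preemptive with no communication cost.

```python
import heapq
from typing import Dict, List, Tuple

Work = Dict[str, float]      # task -> execution time (hypothetical units)
Succ = Dict[str, List[str]]  # task -> tasks that depend on it


def _indegrees(work: Work, succ: Succ) -> Dict[str, int]:
    indeg = {t: 0 for t in work}
    for t in work:
        for s in succ.get(t, []):
            indeg[s] += 1
    return indeg


def critical_path(work: Work, succ: Succ) -> float:
    """Longest path through the DAG, i.e. the time with unlimited processors."""
    indeg = _indegrees(work, succ)
    ready = [t for t in work if indeg[t] == 0]
    earliest = {t: 0.0 for t in work}  # earliest start time of each task
    length = 0.0
    while ready:
        t = ready.pop()
        finish = earliest[t] + work[t]
        length = max(length, finish)
        for s in succ.get(t, []):
            earliest[s] = max(earliest[s], finish)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return length


def list_schedule(work: Work, succ: Succ, p: int) -> float:
    """Makespan of a greedy list schedule: no processor idles while a task is ready."""
    indeg = _indegrees(work, succ)
    ready = sorted(t for t in work if indeg[t] == 0)  # FIFO priority list
    running: List[Tuple[float, str]] = []             # heap of (finish time, task)
    now, free, done = 0.0, p, 0
    while done < len(work):
        while free > 0 and ready:                      # start every ready task we can
            t = ready.pop(0)
            heapq.heappush(running, (now + work[t], t))
            free -= 1
        now, t = heapq.heappop(running)                # advance to the next completion
        free += 1
        done += 1
        for s in succ.get(t, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return now


def graham_bounds(work: Work, succ: Succ, p: int) -> Tuple[float, float]:
    """max(W/p, L) <= T_p for any schedule; T_p <= W/p + (1 - 1/p) * L for greedy ones."""
    w = sum(work.values())
    length = critical_path(work, succ)
    return max(w / p, length), w / p + (1 - 1 / p) * length


if __name__ == "__main__":
    work = {"a": 2, "b": 3, "c": 1, "d": 4, "e": 2}
    succ = {"a": ["b", "c"], "b": ["e"], "c": ["d"], "d": ["e"]}
    for p in (1, 2, 4):
        lo, hi = graham_bounds(work, succ, p)
        t = list_schedule(work, succ, p)
        print(p, lo, t, hi, "speedup", sum(work.values()) / t)
```

The speedup bounds follow directly by dividing W by the two time bounds; sharper, recursive refinements of the kind the snippet mentions would replace the single critical-path term.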
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc.
....message-passing task graph can always be directly evaluated by simulation [12] justifies an elaborate survey. As mentioned in the introduction, all these techniques are essentially based on a recursive reduction procedure. Approaches to this effect have been described by Polychronopoulos [28], Sarkar [30], and So et al. [33]. The resource contention that is explicitly accounted for is related to the limited number of processors. In this respect, the speedup bounds that result from the application of list scheduling theory [16] are essentially the same as in our multiple resource ....
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986.
....that the extensions necessary to accommodate multiple processors, concurrently executing separate Parallel ISETL expressions, at the point of garbage collection invocation are trivial. For more information on the ramifications of the scheduling disciplines used or not used in the code above, see [8, 10, 9, 11, 12, 13]. To show that these parallelization efforts in the garbage collector were indeed fruitful, the chapter on performance evaluation includes myriad timings. Comparisons are made between the execution times of various ISETL codes under sequential and concurrent runtime garbage collection systems. ....
C. D. Polychronopoulos and Utpal Banerjee. Speedup bounds and processor allocation of parallel programs on multiprocessor systems. In Proceedings of the 1986 International Conference on Parallel Processing, pages 961--968, August 1986.
....heuristics. Prior work in lower bound analysis has focused mainly on non-real-time applications. A typical concern in non-real-time applications is to determine a lower bound on the time required to complete a given application on a specified number of processors and resources. For instance, in [6, 9-11], lower bound analysis is used to characterize the speedups that can be achieved in non-real-time parallel programs. This concern is usually not relevant to real-time applications, where the objective is to complete the tasks in the application within their respective deadlines; it is not necessary to ....
C. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proceedings of International Conference on Parallel Processing, pp. 961--968, August 1986.
....Introduction. Prior work in lower bound analysis has focused mainly on non-real-time applications. A typical concern in non-real-time applications is to determine a lower bound on the time required to complete a given application on a specified number of processors and resources. For instance, in [37, 49, 66, 70], lower bound analysis is used to characterize the speedups that can be achieved in non-real-time parallel programs. This concern is usually not relevant to real-time applications, where the objective is to complete the tasks in the application within their respective deadlines; it is not necessary to ....
C. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proceedings of International Conference on Parallel Processing, pp. 961--968, August 1986.
....message-passing task graph can always be directly evaluated by simulation [12] justifies an elaborate survey. As mentioned in the introduction, all these techniques are essentially based on a recursive reduction procedure. Approaches to this effect have been described by Polychronopoulos [28], Sarkar [30], and So et al. [33]. The resource contention that is explicitly accounted for is related to the limited number of processors. In this respect, the speedup bounds that result from the application of list scheduling theory [16] are essentially the same as in our multiple resource ....
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986 Int. Conf. Parallel Proc., IEEE, Aug. 1986, pp. 961--968.
....a given efficiency, in the presence of communication overhead. A more general discussion of processor allocation to parallel programs with doacross loops for shared-memory systems, and of the resulting execution speedup and associated overheads with respect to a critical task size, can be found in [Polychronopoulos and Banerjee, 1986]. Nicol and Willard [1987] present a theoretical analysis of speedup with communication and synchronization overhead for numerical applications and for various interconnection schemes. An analysis of run-time overhead for the scheduling of doall loops, using two separate models for overhead, one ....
....scheduling overhead for exploitation of fine-grained parallelism. A two-level scheduling mechanism is used by Fang et al. [1990], which achieves a balance between granularity and scheduling overhead in the scheduling of general parallel nested loops. Further discussion of loop scheduling is in [Polychronopoulos et al. 1986], where the problem of processor assignment for compile-time loop scheduling is addressed and a dynamic programming algorithm for finding the optimal assignment is presented. Polychronopoulos and Kuck [1987] present a heuristic algorithm for dynamic loop scheduling, called Guided Self-Scheduling, ....
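The snippet breaks off before describing Guided Self-Scheduling. Its basic rule, as usually stated, is that an idle processor takes ceil(R/p) of the R iterations still remaining, so chunks start large (low scheduling overhead) and shrink toward single iterations (good load balance at the end). The Python sketch below lists only the chunk sizes handed out for a loop of n iterations on p processors; it ignores timing and which processor receives each chunk, and the function name is hypothetical.

```python
def guided_self_scheduling(n_iterations: int, p: int) -> list:
    """Chunk sizes dispatched under GSS: each request takes ceil(R / p) of R remaining."""
    if p < 1:
        raise ValueError("p must be at least 1")
    chunks = []
    remaining = n_iterations
    while remaining > 0:
        chunk = (remaining + p - 1) // p  # integer ceiling of remaining / p
        chunks.append(chunk)
        remaining -= chunk
    return chunks


if __name__ == "__main__":
    print(guided_self_scheduling(100, 4))
```

For 100 iterations on 4 processors this hands out 14 chunks, starting with 25 iterations and ending with several single-iteration chunks.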
Polychronopoulos, C. D. and Banerjee, U. (1986). Speedup bounds and processor allocation for parallel programs on multiprocessors. In Int'l. Conf. on Parallel Processing, pages 961--968.
....will allow them to realize the high performance possible with good algorithms on parallel machines. However, general-purpose parallel programming tools have not yet succeeded in providing both ease of use and high speed for applications researchers attempting to use parallel machines (e.g., see [10, 14, 18]). Parallel processing for CVIP will not be successful if it requires CVIP researchers to become parallel processing experts in order to reap the benefits of parallel systems. Our goal is to exploit the special characteristics of CVIP algorithms in order to ease algorithm development and achieve ....
C. D. Polychronopoulos and U. Banerjee. Speedup bounds and processor allocation for parallel programs on multiprocessors. In 1986 International Conference on Parallel Processing, pages 961--968, August 1986.
....Based on conventional reduction techniques, the above work is clearly included in the Pamela approach. Predictions that involve explicit contention analysis are limited either to simple iterative program structures or to global approaches. Iterative structures include loop partitioning schedules [35] (CPU contention, of which the Pamela generalization is described earlier) and simple SPMD algorithm models [42] (contention involved in accessing shared memory, which simply follow from Rules 1 and 2). Global approaches to the analysis of contention are either based on approximate techniques ....
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. Int. Conf. Parallel Proc., Aug. 1986, pp. 961--968.
....algorithm's structure will already be represented in parallel algorithms that have been developed previously. General-purpose parallel programming tools have not yet succeeded in providing both ease of use and high speed for applications researchers attempting to use parallel machines (e.g., see [13, 18, 28]). Many factors contribute to the difficulty of this task. For example, current parallel machines often exhibit anomalous behavior that can have a major impact on performance. For instance, communications protocols on the Thinking Machines CM-2 running Version 5.0 software are such that it is ....
....has been targeted for pipelined supercomputers (e.g., [14]). Although good results are often achieved, it has also been demonstrated that, in some cases, the potential parallelism will not be realized, especially when the target architecture is a multiprocessor system as opposed to a vector machine [18]. The process of developing a new program is the same as, but no easier than, what is involved in developing the program for a serial machine. At the opposite end of the spectrum is the representation of parallel programs using explicitly parallel code, where the user specifies what each ....
C. D. Polychronopoulos and U. Banerjee, "Speedup Bounds and Processor Allocation for Parallel Programs on Multiprocessors," 1986 Int'l Conf. Parallel Processing, pp. 961-968, Aug. 1986.
....entail additional synchronization delay at a mutual barrier due to their variance. This phenomenon has been investigated for a wide class of distributions by Kruskal and Weiss [29]. Balasundaram et al. [7], Lester [31] (as an alternative to the above-cited analysis), Polychronopoulos and Banerjee [40], Sarkar [43], So et al. [47], and Wang [52]. Atapattu and Gannon take a partially parametric approach. The symbolic approach taken by Clement and Quinn [10] is also based on reduction of a static graph model. Unlike the above approaches, however, the workloads are determined using multiple ....
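To make the barrier-delay effect concrete, the Python sketch below takes the simplest special case: p independent tasks with exponentially distributed execution times of a given mean, all joining at one barrier. The barrier is passed when the slowest task finishes, and for this distribution E[max] = mean * H_p, where H_p is the p-th harmonic number. Deterministic tasks with the same mean would reach the barrier at exactly the mean, so the excess is the synchronization delay caused by variance. This is only an exponential-case illustration, not Kruskal and Weiss's general analysis; function names and parameters are hypothetical.

```python
import random


def barrier_time_exact(p: int, mean: float = 1.0) -> float:
    """E[max of p i.i.d. exponential task times] = mean * H_p (p-th harmonic number)."""
    return mean * sum(1.0 / k for k in range(1, p + 1))


def barrier_time_simulated(p: int, mean: float = 1.0,
                           trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity: average time of the slowest task."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. 1 / mean
        total += max(rng.expovariate(1.0 / mean) for _ in range(p))
    return total / trials


if __name__ == "__main__":
    for p in (1, 2, 4, 16):
        exact = barrier_time_exact(p)
        sim = barrier_time_simulated(p, trials=20_000)
        # delay relative to deterministic tasks of the same mean (1.0 here)
        print(p, round(exact, 3), round(sim, 3), "delay", round(exact - 1.0, 3))
```

The delay grows roughly like the logarithm of p, which is why barrier synchronization becomes more costly as more processors share one barrier.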
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986 Int. Conf. Parallel Proc., Aug. 1986, pp. 961--968.
....Computation of Multiple Matrix Products. In this subsection, we discuss a processor allocation method for independently computing multiple matrix products. When executing multiple parallel tasks concurrently, one good heuristic for allocating processors to each task is proportional allocation [28, 29]. The proportional allocation algorithm allocates a number of processors proportional to the computation amount of each task. This algorithm tries to minimize the completion time of all tasks by balancing the execution times of the tasks. However, it assumes that the execution time of a task ....
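A minimal Python sketch of proportional allocation as described above: each task receives a share of p processors proportional to its computation amount. Because processors come in whole units, this sketch assumes at least one processor per task and rounds with a largest-remainder rule; that rounding rule, the function names, and the assumption of ideal linear speedup used to estimate execution times are all illustrative choices, not taken from the cited work.

```python
from typing import List


def proportional_allocation(work: List[float], p: int) -> List[int]:
    """Split p processors among tasks in proportion to their work (at least 1 each)."""
    n = len(work)
    if p < n:
        raise ValueError("need at least one processor per task")
    total = sum(work)
    ideal = [p * w / total for w in work]        # fractional proportional shares
    alloc = [max(1, int(x)) for x in ideal]      # round down, but never below 1
    while sum(alloc) < p:                         # hand out leftovers by largest remainder
        i = max(range(n), key=lambda k: ideal[k] - alloc[k])
        alloc[i] += 1
    while sum(alloc) > p:                         # undo over-allocation caused by the floor of 1
        i = min((k for k in range(n) if alloc[k] > 1), key=lambda k: ideal[k] - alloc[k])
        alloc[i] -= 1
    return alloc


def estimated_times(work: List[float], alloc: List[int]) -> List[float]:
    """Execution time per task, assuming (hypothetically) perfectly linear speedup."""
    return [w / a for w, a in zip(work, alloc)]


if __name__ == "__main__":
    work = [60.0, 30.0, 10.0]
    alloc = proportional_allocation(work, 10)
    print(alloc, estimated_times(work, alloc))
```

With work amounts 60, 30, and 10 on 10 processors the allocation is 6, 3, and 1, and every task finishes at the same estimated time. When a task's execution time stops decreasing with more processors, as the text cautions, these estimates no longer hold and the balance is lost.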
C. D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. of Int. Conf. on Parallel Processing, pp. 961--968, 1986.
....effects (cf. vector performance analysis). In the past, (compile-time) partitioning of parallel loop programs on multiprocessors has received a lot of attention. In the scheduling analysis of doacross loops (a generalization over do and dopar due to Cytron [16]), Polychronopoulos and Banerjee [43] developed a performance model based on cyclic scheduling that accounts for the effect of a limited number of (P) processors. The model has been extended by Gupta [23] to account for the effects of interprocess communication and (barrier) synchronization. Apart from the limited number of ....
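The Python sketch below illustrates, under simplifying assumptions, what cyclic scheduling of a doacross loop on a limited number of processors looks like. It is a generic model, not necessarily the one Polychronopoulos and Banerjee derived: n iterations each take `body` time units, iteration i may start no earlier than `delay` after iteration i-1 starts (the cross-iteration dependence), and iteration i runs on processor i mod P, so it must also wait for iteration i-P to finish. Communication and synchronization overheads, which the text notes were added later by Gupta, are ignored; all names are hypothetical.

```python
def doacross_time(n: int, body: float, delay: float, p: int) -> float:
    """Completion time of n doacross iterations cyclically scheduled on p processors."""
    if n < 1 or p < 1:
        raise ValueError("need at least one iteration and one processor")
    starts = []
    for i in range(n):
        # dependence: wait `delay` after the previous iteration starts
        s = 0.0 if i == 0 else starts[i - 1] + delay
        if i >= p:
            # resource limit: processor i mod p is busy until iteration i - p finishes
            s = max(s, starts[i - p] + body)
        starts.append(s)
    return starts[-1] + body


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, doacross_time(8, body=4.0, delay=1.0, p=p))
```

With 8 iterations, body time 4, and delay 1, the loop takes 17 time units on 2 processors but only 11 on 4 or more, since once P * delay >= body the dependence delay, not the processor count, dominates: (n - 1) * delay + body = 11.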
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986 Int. Conf. Parallel Proc., IEEE, Aug. 1986, pp. 961--968.
....based on SP reduction, which implies a solution cost only linear in the size of the program source. Approaches to this effect have been described by Atapattu and Gannon [6], Balasundaram et al. [7], Lester [33] (as an alternative to the above-cited analysis), Polychronopoulos and Banerjee [43], Sarkar [46], So and Norton [50], and Wang [55]. Atapattu and Gannon as well as Wang [56] take a partially parametric approach. While the above approaches account for task synchronization and conditional control flow (especially in the case of stochastic graphs), they disregard an explicit ....
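The linear cost of SP (series-parallel) reduction comes from collapsing the program graph bottom-up, visiting each node once: series compositions add their execution times, parallel compositions (fork/join) take the slowest branch. The Python sketch below shows this for deterministic task times with unlimited processors and, like the approaches described above, no explicit resource contention. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Task:
    time: float            # execution time of a leaf task (hypothetical units)


@dataclass
class Seq:
    parts: List["Node"]    # executed one after another


@dataclass
class Par:
    parts: List["Node"]    # forked together, joined at a barrier


Node = Union[Task, Seq, Par]


def reduce_sp(node: Node) -> float:
    """Collapse a series-parallel graph bottom-up; each node is visited exactly once."""
    if isinstance(node, Task):
        return node.time
    if isinstance(node, Seq):
        return sum(reduce_sp(child) for child in node.parts)  # series: times add
    if isinstance(node, Par):
        return max(reduce_sp(child) for child in node.parts)  # parallel: slowest branch
    raise TypeError(f"not an SP node: {node!r}")


if __name__ == "__main__":
    program = Seq([Task(1), Par([Task(3), Seq([Task(1), Task(1)])]), Task(2)])
    print(reduce_sp(program))
```

Here the program reduces to 1 + max(3, 1 + 1) + 2 = 6. Replacing the fixed times with random variables gives the stochastic-graph variant, where the max at a parallel node is exactly where the barrier delay from variance enters.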
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986 Int. Conf. Parallel Proc., IEEE, Aug. 1986, pp. 961--968.
....normally results from parallelizing a task. In this subsection, we discuss a processor allocation method for independently computing multiple matrix products. When executing multiple parallel tasks concurrently, one good heuristic for allocating processors to each task is proportional allocation [26, 27]. The proportional allocation algorithm allocates processors proportional to the computation amount of each task. This algorithm tries to minimize the completion time of all tasks by balancing the execution time of each task. However, it assumes that the execution time of a task decreases if more ....
C. D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. of Int. Conf. on Parallel Processing, pp. 961--968, 1986.
....by the need to do symbolic operations, particularly in database problems. 3. Application-specific software. General-purpose parallel programming tools have not yet succeeded in providing both ease of use and high speed for applications researchers attempting to use parallel machines [17, 18]. One factor that contributes to the difficulty of this task is that current parallel machines often exhibit anomalous behavior that can have a major impact on performance. Coping with such behavior requires a detailed knowledge of the system that cannot be expected from a typical applications ....
C. D. Polychronopoulos and U. Banerjee, "Speedup Bounds and Processor Allocation for Parallel Programs on Multiprocessors," 1986 Int'l Conf. Parallel Processing, pp. 961-968, August 1986.
....errors, up to orders of magnitude. Typical for approaches at the top level of the prediction hierarchy is that one either chooses not to address resource contention for efficiency reasons (e.g., [18, 22]), or to account for specific resource types only (e.g., limited number of processors [2, 21], network contention [9, 23], memory contention [5, 25]), or to implicitly account for contention as a result of the phenomenological modeling approach [6, 8]. Some of the methods account for certain resource limitations in terms of bounds [2, 15, 21, 24], mostly in the context of dynamic task ....
....types only (e.g., limited number of processors [2, 21], network contention [9, 23], memory contention [5, 25]), or to implicitly account for contention as a result of the phenomenological modeling approach [6, 8]. Some of the methods account for certain resource limitations in terms of bounds [2, 15, 21, 24], mostly in the context of dynamic task scheduling. However, in neither case does contention modeling form an integral part of a generic performance modeling approach. Recently, a static technique has been presented which, at the same low cost, generalizes over previous approaches by considering any ....
C.D. Polychronopoulos and U. Banerjee, "Speedup bounds and processor allocation for parallel programs on multiprocessors," in Proc. 1986 Int. Conf. Parallel Proc., Aug. 1986, pp. 961--968.
No context found.
C. D. Polychronopoulos and U. Banerjee, "Speedup Bounds and Processor Allocation for Parallel Programs on Multiprocessors," 1986 Int'l Conf. Parallel Processing, pp. 961-968, Aug. 1986.
CiteSeer.IST - Copyright Penn State and NEC