T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
....They only handle the problem of choosing which dimensions of the arrays should be aligned and which dimension should be distributed. They do not try to handle the complete set of data distributions provided by HPF. Their tool is parameterized by the compiler and the machine. Fahringer [8] presents a compile-time method to estimate the communication costs of a data parallel program. His method only handles block distributions and is targeted to the Vienna Fortran Compilation System [2]. His method is precise but needs fully specified problem sizes and machine parameters while we ....
Thomas Fahringer. Compile-time estimation of communication costs for data parallel programs. Journal of Parallel and Distributed Computing, 39(1):46--65, November 1996.
....library. This means that any communication pattern can be accommodated. Other approaches to the problem of estimating the performance of parallel programs have been for specific problems like data partitioning [3], or have taken a statistical approach [10, 4], or have used static information only [5, 6, 8]. Other simulation-based approaches that have some commonality with ours are by Kubota et al. [14], Aversa et al. [2], and Kempf et al. [12]. 2 Description. The data flow diagram for the Parallel Performance Estimator is shown in Figure 1. Each of the contributing components is explained below. Plots ....
T. Fahringer. Compile-time estimation of communication costs for data parallel programs. Journal of Parallel and Distributed Computing, 39(1):46--65, 1996.
....to that attained by an implementation hand coded by a programmer. To evaluate the success in achieving this aim, testing of the performance of SITSS produced code was undertaken. A significant body of work has been undertaken into predicting the performance of parallel programming, including [13,12,44], and performance measurement and comparison, including [9,39,43]. Irrespective of the range of performance measures available, it may be argued that the user is mainly interested in the length of time it takes their program to run. This measure is therefore considered the most appropriate for the ....
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, 39(1):46--65, November 1996.
....The execution of component 20 starts after the execution of component 19, 18 or 19, 11 has finished. Finding appropriate estimates to characterize the timing behavior is crucial for any further analysis. Parameters can be obtained for example from a static analysis of the program code. In [11, 12] a performance estimator is introduced which computes a set of parallel program parameters such as network contentions, transfer and computation time. These parameters can be selectively determined for statements, loops, procedures or the entire program and could be used to obtain appropriate ....
T. Fahringer, "Compile-time estimation of communication costs for data parallel programs", Journal of Parallel and Distributed Computing, vol. 39, no. 1, pp. 46--65, November 1996.
....applications can be determined at compile time. A parallel program, therefore, can be represented by a directed acyclic task graph [2] in which the node weights represent task processing times and the edge weights represent data dependencies as well as the communication times between tasks [2] [5]. The static scheduling problem is, in general, NP complete [4] and there have been many heuristics suggested in the literature for scheduling a parallel machine. However, the problem of scheduling tasks to a cluster is a relatively less explored topic. Specifically, there are two difficult ....
....sequential program, uses a scheduling algorithm to perform scheduling, and then generates the parallel code in a scheduled form for a cluster of workstations. The timings for the tasks and messages are assigned through a timing database which was obtained through profiling of the basic operations [5, 11]. As soon as the task graph is generated, the duplication based scheduler is invoked. To minimize the overall execution time of the application on the cluster, the scheduler first determines which tasks are more critical so that they need to be scheduled to start at earlier time slots, ....
T. Fahringer, "Compile-Time Estimation of Communication Costs for Data Parallel Programs," J. Parallel and Distributed Computing, vol. 39, pp. 46-65, 1996.
....The execution of component 20 starts after the execution of component 19, 18 or 19, 11 has finished. Finding appropriate estimates to characterize the timing behavior is crucial for any further analysis. Parameters can be obtained for example from a static analysis of the program code. In [11,12] a performance estimator is introduced which computes a set of parallel program parameters such as network contentions, transfer and computation time. These parameters can be selectively determined for statements, loops, procedures or the entire program and could be used to obtain appropriate ....
T. Fahringer, "Compile-time estimation of communication costs for data parallel programs", Journal of Parallel and Distributed Computing, vol. 39, no. 1, pp. 46--65, November 1996.
....by applying various static data dependency analysis [4, 8, 18] and program partitioning [3, 19, 22] techniques. Furthermore, the nodes and edges of the DAG are associated with weights, which are generated by using techniques such as execution profiling and analytical benchmarking [6, 9, 12] for representing amounts of execution time and communication time, respectively. The tasks can then be scheduled to the processors for execution by using a suitable scheduling algorithm. The objective of scheduling is to minimize the overall completion time or schedule length of the ....
T. Fahringer, "Compile-Time Estimation of Communication Costs for Data Parallel Programs," Journal of Parallel and Distributed Computing, vol. 39, 1996, pp. 46-65.
....They only handle the problem of choosing which dimensions of the arrays should be aligned and which dimension should be distributed. They do not try to handle the complete set of data distributions provided by HPF. Their tool is parameterized by the compiler and the machine. Fahringer [8] presents a compile-time method to estimate the communication costs of a data parallel program. His method only handles block distributions and is targeted to the Vienna Fortran Compilation System [2]. His method is precise but needs fully specified problem sizes and machine parameters while we use ....
Thomas Fahringer. Compile-time estimation of communication costs for data parallel programs. Journal of Parallel and Distributed Computing, 39(1):46--65, November 1996.
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
....communication time (TT) on a von Neumann architecture. In this paper we describe how P³T models communication caused by Fortran 90 array assignments in the context of regular data distributions. Predicting communication based on Fortran 77 array references has been described in detail in [17]. In what follows, we briefly sketch how VFC generates parallel code for Fortran 90 array assignments. Then, we outline the computation of the communication parameters for Fortran 90 array assignments based on a modified VFC runtime system and associated communication libraries. 4.2.1 Modeling ....
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
....machines where communication distances are relevant. In this chapter we describe how P³T models communication introduced by Fortran 90 array assignments in the context of regular data distributions. Predicting communication based on Fortran 77 array references has been described in detail in [26]. In what follows, we present the computation of communication parameters for Fortran 90 array assignments based on a modified VFC runtime system and associated communication libraries. 3.4.1 Method. The communication framework of a parallel program compiled under VFC is implemented in the VFC ....
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
....communication time (TT) on a von Neumann architecture. In this paper we describe how P³T models communication caused by Fortran 90 array assignments in the context of regular data distributions. Predicting communication based on Fortran 77 array references has been described in detail in [15]. In what follows, we briefly sketch how VFC generates parallel code for Fortran 90 array assignments. Then, we outline the computation of the communication parameters for Fortran 90 array assignments based on a modified VFC runtime system and associated communication libraries. 4.2.1 Modeling ....
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
....are buffer safe and balanced. In order to make a program buffer safe, we selectively block communication with small communication time whereas communication with larger communication time is hoisted to the earliest possible program point until all buffer constraints are honored. P³T [7, 9, 8], a state of the art performance estimator for distributed memory parallel programs, is used to find the best communication placement of all created ones. Employing an accurate performance estimator opens ground for more aggressive optimization opportunities that carefully examine various ....
....the set of SENDs that cover u ∈ U. Uses(s) defines the set of non local uses that are associated with a specific SEND s ∈ S. 2.2 Performance prediction. In order to support eliminating communication buffer conflicts and finding the best out of a variety of communication placements we use P³T [7, 8, 6, 9], an accurate and effective performance estimation tool for distributed memory parallel programs. P³T is a static performance estimator that analytically estimates the performance of data parallel programs (subset of Vienna Fortran [31], High Performance Fortran [20], Fortran90 and Fortran77) ....
T. Fahringer. Compile-Time Estimation of Communication Costs for Data Parallel Programs. Journal of Parallel and Distributed Computing, Academic Press, 39(1):46--65, Nov. 1996.
T. Fahringer, Compile-time estimation of communication costs for data parallel programs, J. Parallel and Distributed Computing 39 (1) (1996) 46--65.