| T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995. |
.... max operators in the deterministic case. The low cost, symbolic property of SP task graph analysis is typically employed in static compile-time cost prediction techniques in which either numeric or symbolic cost expressions are directly derived from the intermediate program representation [7, 9, 17, 18, 34, 44]. Despite the attractive low cost and symbolic properties of SP task graph analysis, however, its inherent inability to model mutual exclusion makes it generally unsuitable as the basis for a general-purpose performance modeling technique. 2.3 Approach Recently, a symbolic performance ....
....generate symbolic expressions or not, are essentially based on critical path analysis of SP graphs. Approaches based on deterministic SP graph analysis in the flavor of Eq. 3.1 include the work of Atapattu and Gannon [7], Balasundaram, Fox, Kennedy, and Kremer [9], Clement and Quinn [10], Fahringer [17], Mendes, Yang and Reed [34], and Wang [43, 44]. Approaches based on stochastic SP graphs include the work of Sahner and Trivedi [37] and Lester [31], which, similar to ours, uses a modeling language called PEL (Performance Evaluation Language). Although similar from an SP graph analysis point of ....
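As a rough illustration of the deterministic SP (series-parallel) graph analysis mentioned in the excerpt above, the following minimal sketch evaluates the critical path of an SP graph by summing the costs of sequential parts and taking the maximum over parallel parts. The class names Task, Seq and Par and the function critical_path are hypothetical, chosen only for this example; they are not taken from any of the cited tools.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Task:
    """A leaf task with a fixed (deterministic) execution time."""
    cost: float


@dataclass
class Seq:
    """Sequential composition: parts run one after another."""
    parts: List["Node"]


@dataclass
class Par:
    """Parallel composition: parts run concurrently, then join."""
    parts: List["Node"]


Node = Union[Task, Seq, Par]


def critical_path(node: Node) -> float:
    """Return the critical path length of a series-parallel graph."""
    if isinstance(node, Task):
        return node.cost
    times = [critical_path(part) for part in node.parts]
    if isinstance(node, Seq):
        return sum(times)             # sequential parts add up
    return max(times, default=0.0)    # parallel parts: slowest branch wins


# Hypothetical example: a task, then two parallel branches, then a final task.
example = Seq([
    Task(2.0),
    Par([Task(3.0), Seq([Task(1.0), Task(4.0)])]),
    Task(1.0),
])
print(critical_path(example))  # 2 + max(3, 1 + 4) + 1
```

Because the sketch only composes sums and maxima, it cannot express contention for a shared resource, which is the mutual exclusion limitation the excerpts point out.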
T. Fahringer, "Estimating and optimizing performance for parallel programs," IEEE Computer, Nov. 1995, pp. 47--56.
....reasons of convenience as explained later on. In the following we briefly describe the translation process. A more detailed background can be found in [8]. The analytic approach underlying the translation process is based on critical path analysis of the delays due to condition synchronization [5, 12, 16] (task synchronization), combined with a lower bound approximation of the delays due to mutual exclusion synchronization (queuing delay) as a result of resource contention [8]. In the following we assume a PAMELA model in which all expressions have already been substituted as the result of ....
T. Fahringer, "Estimating and optimizing performance for parallel programs," IEEE Computer, Nov. 1995, pp. 47--56.
....by its argument (base 0). For instance, the expression 10 unitvec(3) will be compiled to [0,0,0,10]. 3.2 Symbolic Compilation The analytic approach underlying the translation process is based on a combination of compile-time critical path analysis of the delays due to task synchronization [3, 5, 7, 15, 20] and a lower bound approximation of the queuing delay. A more detailed background can be found in [9]. A PAMELA model is translated to a time domain performance model by substituting every process equation by a numeric equation that models the execution time associated with the ....
T. Fahringer, "Estimating and optimizing performance for parallel programs," IEEE Computer, Nov. 1995, pp. 47--56.
....approaches and not consider modeling during this research. Pancake et al. [Panca95b] discuss the problems of monitoring and modeling performance and the limitations of each. Tools are becoming available which model or estimate the performance of sections of parallel code, such as P3T [Fahri95] and SimOS [Rosen95]. For certain programs these tools are beneficial in that they can identify when and where bottlenecks are occurring and can aid in determining data distribution strategies. Unfortunately, these tools aid only in performance tuning. They cannot aid in application comprehension ....
Thomas Fahringer, Estimating and Optimizing Performance for Parallel Programs, IEEE Computer, Vol. 28, No. 11, November 1995, pp. 47-56.
....loop adaptive control systems. 4 Related Work A large number of a priori performance prediction and a posteriori performance measurement and analysis tools have been developed, targeting both sequential and parallel systems, far more than can be summarized here. Notable examples include P3T [10] for performance prediction, together with Paradyn [25] and AIMS [36] for performance measurement. Each has exposed key research issues in performance measurement and analysis. Similarly, several systems have been built that support application behavior steering (i.e., guiding a computation toward ....
Fahringer, T. Estimating and Optimizing Performance for Parallel programs. IEEE Computer 28, 11 (November 1995), 47--56.
.... [Figure 3: Model creation process with ACT] During the implementation stage, when the parallel source code is available, ACT can be employed as a static performance prediction tool [1] (Figure 4). The performance of the application can then be analysed for several parallel platforms, provided they are available as hardware objects. PACE allows the development of models even when parts of the source code are not available. Performance prototyping is the terminology that is used ....
T. Fahringer, Estimating and Optimizing Performance for Parallel Programs, IEEE Computer, Vol. 28(11), November 1995.
.... Paragraph is an animation tool used to trace the dynamic behavior of the program (Heath and Etheridge 1991), and Paradyn is a tool for measuring performance of a large scale parallel system (Miller et al. 1995). P3T is a performance estimator tool that achieves high estimation accuracy (Fahringer 1995). Avatar is a virtual data environment (Reed et al. 1995) that allows users to explore parallel performance data and modify application and system parameters to see how performance is affected. Lilith Lights (Evensky, Gentile, and Wyckoff 1998) is a visualization tool for monitoring and debugging ....
Fahringer, T. 1995. Estimating and Optimizing Performance for Parallel Programs. Computer 28(11):47-56.
....research effort to build modeling tools because such tools are imperative to derive efficient implementations. The significant work includes the work related to the Fortran D compiler [4], the Paradigm compiler [5], the SUIF compiler [2], the Fx compiler [34, 33], and the Vienna Fortran Compiler [11, 12]. Other approaches include the use of Petri nets [13], queuing networks, and Markov chains [35]. The Fortran D compiler contains an interactive tool that allows the programmer to select regions of the sequential input program. The tool responds with a data decomposition scheme and diagnostic ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47--56, 1995.
....prediction methods exist, generally associated with different modelling formalisms. These methods mainly differ in the position they occupy on the trade-off between prediction accuracy and (computational) efficiency. Static techniques are used to quickly obtain first order performance estimates (Fahringer 1995; Gemund 1996). Techniques based on (timed, stochastic) extensions of Petri nets (Ajmone Marsan et al. 1986) yield accurate results, but are time-consuming due to a state explosion (which results in a complexity that is exponential in the model size). Between these two extremes, several techniques ....
Fahringer, T. 1995. "Estimating and optimizing performance for parallel programs." IEEE Computer, Nov., 47-56.
....languages. These methods mainly differ in the position they occupy on the trade-off between prediction accuracy and computational efficiency. Static techniques, e.g. based on symbolic expressions or simple critical path algorithms, are used to efficiently obtain first order performance estimates (Fahringer 1995; Gemund 1996). Techniques based on timed or stochastic extensions of Petri nets (Ajmone Marsan et al. 1986) yield accurate results, but are time-consuming due to a state space explosion (resulting in a complexity that is exponential in the model size). In the remainder of this paper, we will ....
Fahringer, T. 1995. "Estimating and optimizing performance for parallel programs", IEEE Computer, Nov., 47-56.
No context found.
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995. Postscript file available via http://www.par.univie.ac.at/~tf/papers/p3t/ieee-mag.ps.
No context found.
Thomas Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995. Postscript file available via http://www.par.univie.ac.at/~tf/papers/p3t/ieee-mag.ps.
....J1=1,N DO J2=N/(2*J1),N S1: IF (J1*J2 <= N) THEN A(J1,J2) ... ENDIF ... ENDDO ENDDO Detecting zero-trip loops [14] is a similar problem which tries to determine whether the loop body of a given loop nest is ever executed. Counting the number of loop iterations has been shown [11, 12, 14, 21] to be crucial for many performance analyses such as modeling work distribution [10], data locality [11], and communication overhead [9]. All of these problems can be formulated as queries based on a set of linear and non-linear constraints I defined over loop variables and parameters (loop ....
....remarks are given in Section 8. 2 Preliminaries The following notations and definitions are used in the remainder of this paper: Our symbolic analysis has been implemented and is currently being integrated with VFCS [2], a High Performance Fortran-style parallelizing compiler, and with P3T [8, 12], a performance estimator for data parallel programs on distributed memory parallel architectures. The VFCS parallelization strategy is based on data decomposition in conjunction with the single program, multiple data programming model. With this method, each array is partitioned and ....
Thomas Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....models are commonly used to assume a more or less virtual and often unrealistic application behavior. Moreover, very few performance estimators actually consider code transformations and optimizations applied by a compiler. In this paper we introduce P3T+, the successor tool of P3T [22, 15, 16], which models programs, code transformations, and parallel and distributed architectures. The input programs of P3T+ are written in High Performance Fortran [27, 1], which represents the de facto standard of high level data parallel programming. Moreover, P3T+ analyzes Fortran90 message ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....on this architecture. Statistical models are often used to assume a more or less virtual and often unrealistic application behavior. Moreover, very few performance estimators actually consider code transformations and optimizations applied by a compiler. P3T+, the successor tool of P3T [5, 6], is a performance estimator for distributed and parallel programs which models programs, code transformations, and parallel and distributed architectures. The input programs of P3T+ are written in High Performance Fortran [10], which represents the de facto standard of high level data parallel ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....models are commonly used to assume a more or less virtual and often unrealistic application behavior. Moreover, very few performance estimators actually consider code transformations and optimizations applied by a compiler. In this paper we introduce P3T+, the successor tool of P3T [20, 13, 14], which models programs, code transformations, and parallel and distributed architectures. The input programs of P3T+ are written in High Performance Fortran [25, 1], which represents the de facto standard of high level data parallel programming. Moreover, P3T+ analyzes Fortran90 message ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....communication vectorization and elimination of redundant communication. We have implemented a prototype of our symbolic evaluation framework which is used as part of the Vienna Fortran Compilation System (VFCS) [5], a parallelizing compiler for distributed memory architectures, and P3T [21, 22], a performance estimator to parallelize and optimize High Performance Fortran programs [34, 5] for distributed memory architectures. The organization of this paper is as follows. Preliminaries are presented in Section 2. In Section 3, we describe our symbolic evaluation framework. This ....
....our method to support symbolic dependence testing and various optimizations (including communication vectorization and elimination of redundant communication) which can result in significant performance improvements of parallel programs. Symbolic evaluation is also being used as part of P3T [21, 22], a state-of-the-art performance estimator, in order to estimate the work distribution [23] of parallel programs as a parameterized function defined over unknown problem sizes. Currently, we are extending several compiler optimizations for distributed memory architectures to exploit the prototype ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....on this architecture. Statistical models are commonly used to assume a more or less virtual and often unrealistic application behavior. Moreover, very few performance estimators actually consider code transformations and optimizations applied by a compiler. P3T+, the successor tool of P3T [15, 11, 12], is a performance estimator for distributed and parallel programs which models programs, code transformations, and parallel and distributed architectures. The input programs of P3T+ are written in High Performance Fortran [17, 1], which represents the de facto standard of high level data ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....the set of SENDs that cover u ∈ U. Uses(s) defines the set of non-local uses that are associated with a specific SEND s ∈ S. 2.2 Performance prediction In order to support eliminating communication buffer conflicts and finding the best out of a variety of communication placements we use P3T [7, 8, 6, 9], an accurate and effective performance estimation tool for distributed memory parallel programs. P3T is a static performance estimator that analytically estimates the performance of data parallel programs (subset of Vienna Fortran [31], High Performance Fortran [20], Fortran90 and Fortran77) ....
....arrays. We have extended P3T to cover also general buffer communication where data is received into a buffer that is allocated dynamically, and the array reference that led to communication is replaced by a reference to the buffer. For a detailed description of P3T, the reader may refer to [7, 8, 6, 9]. 3 Buffer Safe Communication Latency Hiding and Message Coalescing In this section we describe our communication optimization strategy. First, we hoist SENDs to the earliest possible program points without considering communication buffer constraints. Second, we aggressively coalesce SENDs ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
....(dead code elimination). Detecting zero-trip loops [8] is a similar problem which tries to determine whether the loop body of a given loop nest is ever executed. Other related problems require loop iteration or statement execution counts, which are key figures to estimate a program's performance [7, 4]. All of these problems can be formulated as a set of linear and non-linear constraints I defined over loop variables and parameters (loop invariants) which are commonly derived from loop bounds and conditional statements. For instance, I is given by $\{1 \le I_1 \le N,\ N/(2 I_1) \le I_2 \le N,\ I_1 I_2 \le N\}$ ....
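To make the counting problem in the excerpt above concrete, here is a minimal brute-force sketch that counts the integer solutions of a constraint set for a fixed problem size. It assumes the constraints read 1 <= I1 <= N, N/(2*I1) <= I2 <= N and I1*I2 <= N, with the division taken as exact (rational) rather than Fortran integer division; that reading and the function names are assumptions of this example. The cited work derives such counts symbolically as a function of N, which this enumeration does not attempt.

```python
def count_iterations(n: int) -> int:
    """Count integer pairs (i1, i2) that satisfy the assumed constraint set.

    Assumed constraints (division treated as exact, not truncated):
        1 <= i1 <= n
        n / (2 * i1) <= i2 <= n
        i1 * i2 <= n
    """
    count = 0
    for i1 in range(1, n + 1):
        for i2 in range(1, n + 1):
            lower_ok = n <= 2 * i1 * i2   # same as n / (2 * i1) <= i2
            guard_ok = i1 * i2 <= n       # condition guarding the statement
            if lower_ok and guard_ok:
                count += 1
    return count


def is_zero_trip(n: int) -> bool:
    """True if the guarded statement would never execute for this n."""
    return count_iterations(n) == 0


for size in (1, 4, 10):
    print(size, count_iterations(size))
```

Enumeration costs O(N^2) per query and needs a concrete N, which is why a compile-time estimator would prefer a closed-form symbolic count.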
....this algorithm can be used to compare symbolic expressions for equality and inequality relationships, examine non-linear array index functions for data dependences, and detect redundant inequalities in a set of constraints. We have implemented the algorithm and use it as part of P3T [5, 4], a performance estimator for parallel programs, and VFCS [1], a parallelizing compiler for data parallel programs on distributed memory parallel architectures. Experiments will be shown that demonstrate the usefulness of our approach. This paper is organized as follows: In Section 2 we present ....
T. Fahringer. Estimating and Optimizing Performance for Parallel Programs. IEEE Computer, 28(11):47 -- 56, November 1995.
No context found.
FAHRINGER, T. Estimating and Optimizing Performance for Parallel programs. IEEE
No context found.
T. Fahringer, Estimating and Optimizing Performance for Parallel Programs, IEEE Computer, Vol. 28(11), pp. 47-56 (1995).