B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....progress. Hence, the curve distortion caused by system perturbations can be better dealt with, resulting in a more accurate similarity comparison. Related Work 6.1 Curve Fitting Our curve-fitting-based signature design is inspired by the phase behavioral analysis of parallel codes [5] [6] [7] [8]. A phase is a time period when the program is performing the same activity. A typical scientific application's runtime can be divided into three main phases: startup, intermediate, and final. During the startup, the application initializes problem parameters, usually accompanied by reading ....
Carlson, B. M., Wagner, T. D., Dowdy, L. W., and Worley, P. H. Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs. In Computer Performance Evaluation - Modeling Techniques and Tools, Antony Rowe, Ltd., 1992.
....per second. Timing variability causes inconsistency in resultant curves. Therefore, the area under curves might be similar, but the shapes of curves will not necessarily be so. 6 Related Work Our curve fitting signature design was inspired by phase behavioral analysis of parallel codes [3, 4, 5, 6]. Phase behavioral analysis uses geometric curves to characterize performance as a function of time. Unlike our online signature approach, these automatic phase characterization algorithms must scan the entire trace several times. Prophesy [15] is the most similar work. It also uses curve fitting ....
Carlson, B. M., Wagner, T. D., Dowdy, L. W., and Worley, P. H. Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs. Computer Performance Evaluation 92: Modelling Techniques and Tools, R. Pooley and J. Hillston, Eds. Edinburgh Press, 1992, pp. 83--95.
....information transparent, performance tuning is extremely difficult. P3T at compile time computes a set of performance parameters each of which reflects a different performance aspect. In the following all P3T performance parameters are described. 4.1 Work Distribution It is well known [8, 6, 44, 42, 30, 40, 13, 41, 34, 25] that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and parallelizing compiler with a work distribution parameter for ....
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....defined: stationary phase and transitional phase. A stationary phase is a contiguous subsequence of an execution profile which has roughly uniform processor activity. A transitional phase is a contiguous subsequence of an execution profile which constitutes an abrupt change in processor activity [Carlson92; Carlson94]. Three special types of stationary phases are of interest. They are: 1) a single processor is utilized; 2) all processors are utilized; 3) less than all, but more than one processor is utilized. It is quite clear that identification and analysis of phases are important both in ....
....processors are utilized; 3) less than all, but more than one processor is utilized. It is quite clear that identification and analysis of phases are important both in workload characterization and in scalability analysis. [Carlson94] proposes an algorithm for off-line detection of phases, and [Carlson92] describes three families of methods for smoothing execution profile data. They are useful in automatically detecting phases in execution profiles. The execution signature [Dowdy88; Leuze89; Dowdy94] is defined as the execution time of the parallel algorithm, T(p), as a function of the number of ....
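The stationary/transitional distinction quoted above can be illustrated with a small sketch. This is not the detection algorithm of [Carlson94] or the smoothing methods of [Carlson92], neither of which is reproduced in the excerpt; it is a simple threshold heuristic under stated assumptions. The profile is assumed to be a list of busy-processor counts sampled at equal intervals, and the names `segment_profile`, `classify_stationary`, `tolerance`, and `min_len` are hypothetical.

```python
def segment_profile(profile, tolerance=1, min_len=3):
    """Split a profile into (kind, start, end) runs, end exclusive.

    Assumption (not from the cited papers): a run is 'stationary' if every
    sample stays within `tolerance` of the run's first sample and the run
    has at least `min_len` samples; shorter runs are 'transitional'.
    """
    segments = []
    start = 0
    for i in range(1, len(profile) + 1):
        # Close the current run at the end of the data or on a large jump.
        if i == len(profile) or abs(profile[i] - profile[start]) > tolerance:
            kind = "stationary" if i - start >= min_len else "transitional"
            segments.append((kind, start, i))
            start = i
    # Merge neighbouring transitional runs into one transitional phase.
    merged = []
    for kind, s, e in segments:
        if merged and kind == "transitional" and merged[-1][0] == "transitional":
            merged[-1] = ("transitional", merged[-1][1], e)
        else:
            merged.append((kind, s, e))
    return merged


def classify_stationary(profile, start, end, num_procs):
    """Map a stationary run onto the three special types named in the excerpt."""
    level = round(sum(profile[start:end]) / (end - start))
    if level <= 1:
        return "single processor"
    if level >= num_procs:
        return "all processors"
    return "partial"


if __name__ == "__main__":
    profile = [1, 1, 1, 1, 3, 6, 8, 8, 8, 8, 4, 4, 4, 4]
    for kind, s, e in segment_profile(profile):
        label = classify_stationary(profile, s, e, 8) if kind == "stationary" else ""
        print(kind, s, e, label)
```

On real traces the raw profile would normally be smoothed first, as the excerpt notes; the fixed `tolerance` here stands in for that step.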
[Article contains additional citation context not shown here]
Carlson, B. M., Wagner, T. D., Dowdy, L. W. and Worley, P. H., "Speedup properties of phases in the execution profile of distributed parallel programs," in: Pooley, R. and Hillston, J. (eds), Computer Performance Evaluation - Modeling Techniques and Tools, Antony Rowe, Ltd., 1992.
....The key functionality of P3T is devoted to computing a set of performance parameters at compile time: - Work Distribution: The work distribution parameter describes how well the computations of a program are distributed over the set of available processors. As shown by numerous researchers [15, 13, 80, 68, 49, 62, 22, 67, 53, 34], work distribution has a strong influence on the cost performance ratio of a multiprocessor system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and compiler with a work distribution ....
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....information transparent, performance tuning is extremely difficult. P3T at compile time computes a set of performance parameters each of which reflects a different performance aspect. In the following all P3T performance parameters are described. 4.1 Work Distribution It is well known [7, 5, 37, 35, 27, 33, 11, 34, 30, 23] that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore, providing both programmer and parallelizing compiler with a work distribution parameter for ....
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....instantiations for which they own the corresponding subdomain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 18, 22, 6, 25, 19, 13]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter ....
....assigns all 2016 iterations to the first processor. Therefore the work distribution of this example is far from being perfect. Optimal work distribution is attained if each processor would assign a new value to W(I) for 252 (= 2016/8) distinct instantiations of S. Much of previous research ([3, 22, 6, 25, 19]) concentrates on monitoring or estimating work distribution and close derivatives such as processor utilization, execution profile and average parallelism at the machine level. Traditional work distribution analysis distorts results in the following cases: - Array replication ([32]) is a ....
[Article contains additional citation context not shown here]
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....transposition sort of 1024 elements [47]. [Figure 15: Parallelism and computation profiles of the block decomposition algorithm with 16 processors (problem size n = 168) [43]] .... the phases of a parallel algorithm is discussed in [49] [50]. The information in a profile can be more succinctly captured in another form of representation which is referred to as the shape [51]. This representation is a cumulative plot of the fraction of the execution time where a certain number of processors is busy. An example of shapes derived from a ....
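As a rough illustration of the "shape" described in the excerpt above, the sketch below turns a sampled profile into cumulative fractions of execution time. The excerpt does not say whether the cumulative plot counts "at most k" or "at least k" busy processors; this sketch assumes "at most k". The function name `shape` and the equal-interval sampling are also assumptions, not details taken from [51].

```python
from collections import Counter


def shape(profile, num_procs):
    """Return a list c where c[k] is the fraction of samples with at most
    k busy processors, for k = 0 .. num_procs.

    Assumption: `profile` holds one busy-processor count per equal-length
    time sample, so sample fractions stand in for time fractions.
    """
    if not profile:
        raise ValueError("profile must contain at least one sample")
    counts = Counter(profile)
    total = len(profile)
    cumulative = []
    running = 0
    for k in range(num_procs + 1):
        running += counts.get(k, 0)
        cumulative.append(running / total)
    return cumulative


if __name__ == "__main__":
    # Hypothetical 4-processor run: half of the time all processors are busy.
    print(shape([1, 1, 2, 4, 4, 4, 4, 2], num_procs=4))
```

Unlike the profile itself, this representation discards the order in which activity levels occur, which is why it is more compact.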
B. Carlson, T.D. Wagner, L.W. Dowdy, and P.H. Worley. Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs. In R. Pooley and J. Hillston, editors, Computer Performance Evaluation'92 - Modelling Techniques and Tools, pages 83--95, Edinburgh, Scotland, 1992.
....instantiations for which they own the corresponding subdomain. This naturally specifies the amount of work to be done by each processor and consequently the overall work distribution of a parallel program. Therefore domain decomposition inherently implies a work distribution. It is well known ([8, 5, 3, 28, 26, 17, 21, 6, 25, 18, 12]) that the work distribution has a strong influence on the cost performance ratio of a parallel system. An uneven work distribution may lead to a significant reduction in a program's performance. Therefore providing both programmer and parallelizing compiler with a work distribution parameter for ....
....assigns all 2016 iterations to the first processor. Therefore the work distribution of this example is far from being perfect. Optimal work distribution is attained if each processor would assign a new value to W(I) for 252 (= 2016/8) distinct instantiations of S. Much of previous research ([3, 21, 6, 25, 18]) concentrates on monitoring or estimating work distribution and close derivatives such as processor utilization, execution profile and average parallelism at the machine level. Traditional work distribution analysis distorts results in the following cases: - Array replication ([32]) is a ....
[Article contains additional citation context not shown here]
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
....elements to be assigned to a problem (e.g. mapping or scheduling [Klei 92]) but are not suitable for performance prediction during program design, where the execution signature should be the output of the study, and not the input. But they can be used to estimate the speedup of parallel programs [Carl 92]. Signature: A different viewpoint of the degree of activity is provided by a signature. The following definitions are based on the definition of the computation profile, denoted as P_comp(t), i.e. the units are the number of computing processing elements ....
B. Carlson, T. Wagner, L. Dowdy, and P. H. Worley. "Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs". In: R. Pooley and J. Hillston, Eds., Computer Performance Evaluation'92 - Modelling Techniques and Tools, pp. 83--95, Edinburgh, Scotland, 1992.
....seen in an execution profile, that may be most meaningful when making scheduling decisions in a multiprogrammed environment, especially if dynamic scheduling is to be exploited. Evidence in support of this claim can be found by noting the phases observed in the execution profile of parallel programs [Carlson1992] and the performance improvements obtained as a result of dynamically adjusting processor allocations with changes in the parallelism of the applications [McCann1993]. Some of the problems with using a single parameter characterization of parallelism are detailed by Marinescu and Rice ....
....effectively, processor allocation decisions must utilize knowledge of these characteristics. A number of studies have described various techniques for determining processor allocations based on different approximations of an application's efficiency [Majumdar1988a] [Eager1989] [Ghosal1991] [Carlson1992] [Rosti1994] [Sevcik1994]. In this chapter we are interested in determining how much benefit can be obtained by knowing and applying knowledge of how much work each application executes. Therefore, we also assume that the following information is available for each application: A3.6: The amount ....
[Article contains additional citation context not shown here]
B. Carlson, T. Wagner, L. Dowdy, and P. Worley, "Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs", Proceedings of the 6th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation, pp. 83-95, Edinburgh, Scotland, September, 1992.
....the Oak Ridge National Laboratory and by the Scientific Computing Program of the Office of Energy Research, U. S. Department of Energy via subcontract 19X SL131V from the Oak Ridge National Laboratory. .... improves the prediction of the speedup of a parallel program if more processors are allocated [3]. In that work, the phases were identified manually. This paper presents a method of automatic detection of phases off-line. The algorithm is presented in terms of the C language structures that were used to implement it along with a narrative description of the steps involved. The analysis ....
....stationary and transitional phases, is a complete description of the execution profile. Figure 1 illustrates these definitions. In that figure, subsequences A, C, and E represent stationary phases, and subsequences B and D represent transitional phases. Figure 1 is reprinted from Carlson et al. [3]. In this simplified example, Phase A can be viewed as the fraction sequential. Phases C and E represent periods of homogeneous processor utilization at different levels of utilization. The speedup characteristics of these two phases will likely be different. For example, if Phase E represents a ....
[Article contains additional citation context not shown here]
B. Carlson, T. Wagner, L. Dowdy, and P. Worley, "Speedup properties of phases in the execution profile of distributed memory parallel program," in Computer Performance Evaluation '92 (R. Pooley and J. Hillston, eds.), pp. 83--95, Antony Rowe Ltd., 1992.
No context found.
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
No context found.
B. Carlson, T. Wagner, L. Dowdy, and P. Worley. Speedup properties of phases in the execution profile of distributed parallel programs. In Computer Performance Evaluation '92: Modeling Techniques and Tools, pages 83--95, 1992. (Ed.) R. Pooley and J. Hillston.
No context found.
Carlson, B. M., Wagner, T. D., Dowdy, L. W., and Worley, P. H. Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs. In Computer Performance Evaluation 92: Modelling Techniques and Tools, R. Pooley and J. Hillston, Eds. Edinburgh
No context found.
Carlson, B. M., Wagner, T. D., Dowdy, L. W., and Worley, P. H. Speedup Properties of Phases in the Execution Profile of Distributed Parallel Programs. In Computer Performance Evaluation 92: Modelling Techniques and Tools, R. Pooley and J. Hillston, Eds. Edinburgh Press, 1992, pp. 83--95.