M. Crovella and T. LeBlanc. Parallel Performance Prediction Using Lost Cycles. In Proceedings Supercomputing '94, pages 600--610, November 1994.
....PERFORMANCE METRICS We quantify the benefit of parallel processing by the speedup $S = T_{seq}/T_{par}$: how much faster the parallel program runs with respect to the runtime of the sequential version. The portion of $T_{par}$ that is not used for useful computation is therefore considered as lost processor cycles [4], or overhead: Overhead $= T_{par} \cdot p - T_{seq}$. Hence, the choice of speedup as the performance measure implies that each processor has $T_{par}$ time allocated to perform its part of the job: $T_{par} = T^{i}_{comp} + \sum_j T^{i,j}_{overhead}$ (2), with $T^{i}_{comp}$ the time that processor $i$ performs its part $w_i$ of the total useful work $W$ and $T^{i,j}_{overhead}$ not ....
.....Toereaa Ovh = lS) Thus, we can conclude that all time measurements on a processor should be scaled with pp. Slower processors will carry out less useful work in a given time and the parallel overheads will have less impact on the speedup. This result corresponds with the approach proposed in [4], which essentially uses processor cycles as the time unit. Granularity: In load balancing problems, the overheads are mainly communication and idle time, which we call blocking overhead. In our setting, the learning algorithm can only improve the performance by minimizing blocking, since the ....
Crovella, M. E. and Leblanc, T.J., "Parallel Performance Prediction using Lost Cycles Analysis", in Proc. of Supercomputing '94, IEEE Computer Society, 1994.
....overhead. We argue that for an accurate quantification of the cost of each part of a parallel program, the generated blocking overhead should also be taken into account. 1 Introduction In the literature, processors becoming idle, or blocking overhead, is mainly attributed to load imbalances [1][3][4], but we argue that this is not the only reason for the processors' idle time. Blocking overhead can be caused by any phase of the execution profile, as mentioned in the literature [2][6]. Partitioning, for example, is in most cases performed sequentially on one processor. This causes at the same ....
....Metrics We quantify the benefit of parallel processing by the speedup $S = T_{seq}/T_{par}$: how much faster the parallel program runs with respect to the runtime of the sequential version. The portion of $T_{par}$ that is not used for useful computation is therefore considered as lost processor cycles [4], or overhead [1]: Overhead $= T_{par} \cdot p - T_{seq}$ (1). Hence, the choice of speedup as the performance measure implies that each processor has $T_{par}$ time allocated to perform its part of the job: $T_{par} = T^{i}_{comp} + \sum_j T^{i,j}_{overhead}$ (3), with $T^{i}_{comp}$ the time that processor $i$ performs its part of the useful work and ....
[Article contains additional citation context not shown here]
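The speedup and overhead definitions quoted in the contexts above can be illustrated with a small calculation. This is only a sketch: it assumes the overhead formula reads Overhead = p * T_par - T_seq (the extracted text is damaged at that point), and the function names and timing values are hypothetical, not taken from any of the cited papers.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Speedup S = T_seq / T_par."""
    if t_par <= 0:
        raise ValueError("t_par must be positive")
    return t_seq / t_par


def total_overhead(t_seq: float, t_par: float, p: int) -> float:
    """Lost processor time, assuming Overhead = p * T_par - T_seq.

    Each of the p processors is allocated T_par; whatever part of that
    total is not useful work (T_seq) counts as overhead.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return p * t_par - t_seq


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    t_seq, t_par, p = 100.0, 30.0, 4
    print("speedup:", speedup(t_seq, t_par))
    print("overhead:", total_overhead(t_seq, t_par, p))
```

With these made-up numbers the four processors are allocated 120 processor-seconds in total, of which 100 are useful work, so 20 are lost. Ideal speedup (S = p) corresponds to zero overhead.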
Crovella, M. E. and Leblanc, T.J.: Parallel Performance Prediction using Lost Cycles Analysis. In: Proc. of Supercomputing '94, IEEE Computer Society (1994).
....such as the implementation of the parallel code and the effect of different data access patterns. We therefore consider (1) to be a specific form of $\sum_i O_i$ (2) where each $O_i$ is an overhead. Much work has been done on the classification and practical use of overheads of parallel programs, e.g. [2], [3], [4], [5]. A hierarchical breakdown of temporal overheads is given in [3]. The top level overheads are information movement, critical path, parallelism management, and additional computation. The critical path overheads are due to imperfect parallelization. Typical components will be load ....
M.E. Crovella and T.J. LeBlanc, Parallel Performance Prediction Using Lost Cycles Analysis, Proceedings of Supercomputing '94, IEEE Computer Society, pp. 600-609, November 1994.
....financial 50
Authors and Reference | Progs | Nature and Author of Programs
Atapattu and Gannon [5] | 6 | Lawrence Livermore Loop Kernels.
Balasundaram et al. [9] | 1 | 2D red black relaxation, by the authors.
Brewer [22, 23] | 3 | 2D stencil, 2 sorting algorithms, all by the author.
Crovella and LeBlanc [34] | 2 | Subgraph isomorphism, 2D FFT, latter by Jaspal Subhlok.
Fahringer and Zima [43] | many | Includes Jacobi relaxation, matrix multiplication, Gauss Jordan, LU, authors not specified.
Formella et al. [47] | 3 | Dense conjugate gradient and 1D and 2D stencils, by the authors.
MacDonald [79] | 6 | ....
....their tool is based on a static analysis of the assembly language code of the compiled program. In comparison, performance prediction in PERFSIM is based mostly on a model of the runtime system, and static analysis is avoided by executing the control structure of the program. Crovella and LeBlanc [34] describe an interactive performance tuning tool that tries to fit the behavior of the program to a model taken from a library of performance models. To use their tool, a user must run the program several times on the target machine. To use PERFSIM the user does not have to run the program on the ....
Mark E. Crovella and Thomas J. LeBlanc. Parallel performance prediction using lost cycles analysis. In Proceedings of Supercomputing '94, pages 600--609, Washington, D.C., November 1994.
.... identifying the location of problems, diagnosis was still difficult because the causes of misses and identification of the particular data objects involved was often difficult to determine from MTOOL's output [5]. Going beyond attributing cycles lost to the memory hierarchy, lost cycles analysis [11] classified all of the sources of overhead (waiting time) that might be encountered by a parallel program. The Carnival tool set [2] extended this into waiting time analysis. It provided a visualization tool with each unit of source code having an execution time attributed to it. Colored bars ....
M. Crovella and T. LeBlanc. Parallel Performance Prediction Using Lost Cycles. In Proceedings Supercomputing '94, pages 600--610, November 1994.
.... useful for identifying the location of problems, diagnosis was still difficult because the causes of misses and identification of the data objects involved was often difficult to determine from MTOOL's output [6]. Going beyond attributing cycles lost to the memory hierarchy, lost cycles analysis [15] classified all of the sources of overhead (waiting time) that might be encountered by a parallel program. The Carnival tool ....
M. Crovella and T. LeBlanc. Parallel Performance Prediction Using Lost Cycles. In Proceedings Supercomputing '94, pages 600--610, November 1994.
.... that use kernel benchmarks for determining computation and communication speeds [2] or that are based on counting the number of program statements [14]. It also contrasts with approaches that concentrate on the measurement of parallel overhead factors in order to predict extrapolated performance [6]. Fourth, an important goal of our prediction methodology is to capture performance trends for future computer architectures. This is different from and complementary to approaches that model performance with the goal of improving application speeds [1], capturing communication behavior [15] ....
M. E. Crovella and T. J. LeBlanc. Parallel performance prediction using lost cycles analysis. In IEEE, editor, Proc., Supercomputing '94: Washington, DC, November 14--18,
....a code will run. Because of the difficulties in foreseeing the effect on performance of factors such as input data, the number of available processors and the characteristics of the communication network that connects them, most parallel performance tuning still relies on a measure-modify approach [5]. This approach is dependent upon making detailed performance measurements of programs during execution and can be extremely time consuming, especially if it is necessary to repeat this process for many target machines. Performance modeling provides hope for an escape from the measure-modify ....
Mark E. Crovella and Thomas J. LeBlanc. Parallel performance prediction using lost cycles analysis. In Proceedings of Supercomputing, pages 600--609, 1994.
....running executable image. Complex algorithmic changes require rewriting part of the program. Performance predictions can be based either on extrapolations of executions of the program in a controlled environment, or on stochastic models derived from static program analysis. Lost Cycles Analysis [3] predicts performance at different operating points by running a controlled set of experiments that vary an orthogonal set of parameters and record the resulting execution time. However, this technique requires implementations of the different tuning options to be available for execution. Static ....
M. E. Crovella and T. J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles," Proceedings of Supercomputing '94. Nov. 14-18, 1994, Washington, DC, pp. 600-609.
....and $t_s$ is the time required to execute the sequential version of the program; when parallelising a sequential program the ultimate goal is to minimise $t_o$. Thus, it is helpful to classify the overheads into a number of classes according to their source. Following an approach proposed in [5], we consider five classes of overheads: unparallelised code, load imbalance, communication, synchronisation, and parallelism start up. Assuming that the cost functions $F_{UC}$, $F_{LI}$, $F_{C}$, $F_{S}$ and $F_{PS}$ represent an estimate of the time spent as a result of the occurrence of each source of ....
M. E. Crovella, T. J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles Analysis", in Proceedings of Supercomputing '94, IEEE Computer Society Press, pp. 600--609.
....is important to isolate and quantify the impact of each of these components on the overall execution as shown in Figure 1. We have proposed a concept of overhead functions [8], [9] to capture the growth of particular system overheads with respect to specific system parameters. Crovella and LeBlanc [10] propose a similar set of metrics called Lost Cycles. Both these metrics quantify the contribution of each overhead towards the overall execution time. The studies differ in the techniques used to quantify these metrics. Crovella and LeBlanc [10] use experimentation, while simulation is used in ....
....to specific system parameters. Crovella and LeBlanc [10] propose a similar set of metrics called Lost Cycles. Both these metrics quantify the contribution of each overhead towards the overall execution time. The studies differ in the techniques used to quantify these metrics. Crovella and LeBlanc [10] use experimentation, while simulation is used in our approach. III. Related Work There have been a number of studies addressing architectural issues such as network latency and contention [11], [12], [13], [14] and synchronization [15], [16] in isolation. While such issues are extremely ....
[Article contains additional citation context not shown here]
M. E. Crovella and T. J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles Analysis," in Proceedings of Supercomputing '94, November 1994.
....a particular input language, this method needs as input the estimates of execution time for every program event and probabilities of all branch decisions. Also, the derivation of the sequencing trees might become intractable for a program with a nontrivial number of events. Crovella and LeBlanc [17] predicted performance of parallel programs based on lost cycles analysis, which involves measurements and modeling of the sources of overhead in a program. One of their tools measures various categories of overhead, while another tool fits the measurements to analytic functions representing the ....
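The context above describes fitting measured overhead to analytic functions. A minimal sketch of that general idea follows, assuming a single overhead category modelled as a linear function of the processor count, `overhead(p) = a + b * p`, fitted by ordinary least squares. The model form, the function names, and the data points are all hypothetical illustrations; they do not reproduce the Lost Cycles tools or any measurements from the cited paper.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, p):
    """Evaluate the fitted model at processor count p."""
    return a + b * p


if __name__ == "__main__":
    # Synthetic data generated from 3 + 2 * p, for illustration only.
    processors = [1, 2, 4, 8]
    overhead = [5.0, 7.0, 11.0, 19.0]
    a, b = fit_linear(processors, overhead)
    print("fitted model: overhead(p) = %.2f + %.2f * p" % (a, b))
    print("extrapolated overhead at p = 16:", predict(a, b, 16))
```

A real tool would presumably try several candidate model shapes per overhead category and keep the best fit; the linear form here is just the simplest case.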
Mark E. Crovella and Thomas J. LeBlanc. Parallel performance prediction using lost cycles analysis. In Proceedings of Supercomputing '94, pages 600--609, Washington, November 1994.
....toolkit. These tools integrate monitoring and statistical modeling techniques. Measurements are used to parameterize the model, which is subsequently used for predicting performance. The IS performs the basic data collection tasks. Reference: http://www.nas.nasa.gov/NAS/Tools/Projects/AIMS and [4,33]. Performance and Program Visualization: ParaGraph and POLKA. The IS collects runtime data in the form of time ordered trace records. These trace records are used to drive hard coded (ParaGraph) or user defined (POLKA) visualizations of system and program behavior. References: ....
Crovella, Mark E. and Thomas J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles Analysis," Proceedings of Supercomputing `94, Washington, DC, Nov. 14--18, 1994.
....transmission time, message startup overhead and instruction cycle time. Other approaches that use decomposition of parallel systems address both the communication and the memory hierarchy models via the LogP HMM [11] or aim at decomposing the execution overheads due to parallelization [12] [4]. The work we present in this paper falls in this wide class of modeling by decomposition. The approach we propose is to simulate the program execution by following the control flow of the original program, to estimate the computation running time, and to determine the sequence of send and ....
M. E. Crovella and T. J. LeBlanc. Parallel performance prediction using lost cycles analysis. In Proceedings of Supercomputing, Washington, D.C., November 1994.
....is made more difficult by the fact that execution time varies with the number of parallel processors that are used, requiring non-trivial comparisons between different implementations. The resulting approximate optimisation problem has been called the best (parallel) implementation problem [6], although a more apposite name is the acceptable implementation problem. To date, two main approaches have been employed: 1) the manual (explicit) approach: the programming language is extended with a set of parallel commands which the programmer places, where necessary, in the source code ....
....have been articulated by Riley [20], whose manual parallelisation scheme requires measurement of certain execution time events, together with post-execution analysis of the various overheads associated with parallel execution. A similar approach, termed lost cycles analysis, is described in [6]. Although this kind of approach usually achieves an acceptable result, it is labour intensive, time consuming, expensive, and highly dependent on expert skills. The rewards for automating the process would be considerable. An automatic parallelising compiler accepts serial source code as input, ....
[Article contains additional citation context not shown here]
M. Crovella and T. LeBlanc. Parallel performance prediction using lost cycles analysis. Proc. SuperComputing '94, 600--609, 1994.
....commercial sequential debuggers. Its IS synchronizes and controls the activities of individual debuggers that run the concurrent processes. The IS also collects data from these processes to run multiple visualizations. Performance modeling and prediction: AIMS, Lost cycles analysis toolkit [4], [33]. These tools integrate monitoring and statistical modeling techniques. Measurements are used to parameterize the model, which is subsequently used for predicting performance. The IS performs the basic data collection tasks. Correctness checking: SPI [3]. Scalable Parallel Instrumentation ....
M.E. Crovella and T.J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles Analysis," Proc. Supercomputing '94, pp. 600--609, Washington, D.C., Nov. 1994.
....restructuring and architectural enhancements. It can also help in choosing between alternate application implementations, selecting the best hardware platform, and the different other uses of a performance analysis study outlined earlier in section 1. Recognizing this importance, studies [57, 56, 16, 13] have attempted to separate and quantify parallel system overheads. [Figure 1: Overheads in a Parallel System. Axes: Processors, Speedup. Labels: Linear, Actual, Algorithmic Overheads, Software Interaction Overheads, Hardware Interaction Overheads.] Ignoring effects of superlinearity, one would expect a speedup that ....
....actions, contribute to the hardware interaction overhead. To fully understand the scalability of the parallel system, it is important to isolate and quantify the impact of different parallel system overheads on the overall execution as shown in Figure 1. Overhead functions [56, 57] and lost cycles [16] are metrics that have been proposed to capture the growth of overheads in a parallel system. Both these metrics quantify the contribution of each overhead towards the overall execution time. The studies differ in the techniques used to quantify these metrics. Experimentation is used in [16] to ....
[Article contains additional citation context not shown here]
M. E. Crovella and T. J. LeBlanc. Parallel Performance Prediction Using Lost Cycles Analysis. In Proceedings of Supercomputing '94, November 1994.
....does not adversely impact the performance of the applications it is designed to serve. The need to limit the intrusiveness of the NWS influences both the implementation of the overall system and the forecasting techniques we have chosen. Since the problems of non-intrusive resource monitoring [28, 21, 9] and load forecasting [26, 3, 18, 23, 8] both pose open research questions, we have separated the sensory and forecasting functions of the NWS. The resulting modular design is intended to provide a general [Figure: Sensory Subsystem, with CPU sensor, network link sensor, memory sensor, and machines] ....
Crovella, M., and LeBlanc, T. Parallel performance prediction using lost-cycles analysis. In Proceedings of Supercomputing 1994 (1994).
....mechanism is used to safeguard the order of execution in the example of the previous section, the time that the processors spent waiting to acquire a lock should be less than $t_s(1 - 1/p)$ in order to have faster execution time. Further issues on modelling these overheads are discussed in [11]. 4 Mapping In the mapping phase, the parallelism available is mapped onto processors in such a way that overheads are minimised. The approaches traditionally used for DO. ENDDO type of loops in scientific applications are either to assign a specific number of iterations to each processor ....
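The bound on lock waiting time quoted in the context above can be reproduced with a short derivation. This is a sketch under assumptions that are not stated in the excerpt: the sequential work $t_s$ divides evenly over $p$ processors, and the lock waiting time, written here as $t_w$ (a symbol introduced only for this illustration), is the sole overhead.

```latex
% Assumed parallel execution time: even share of the work plus lock waiting time
t_{par} = \frac{t_s}{p} + t_w
% Parallel execution is faster than sequential execution when t_{par} < t_s:
\frac{t_s}{p} + t_w < t_s
\quad\Longleftrightarrow\quad
t_w < t_s\left(1 - \frac{1}{p}\right)
```

For example, with $p = 4$ the waiting time would have to stay below three quarters of the sequential runtime for the parallel version to win.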
M. E. Crovella, T. J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles Analysis", in Proceedings of Supercomputing '94 (IEEE Computer Society Press, 1994), 600--609.
....by a NSF National Young Investigator (NYI) award. 2 can represent different levels of execution detail and can include parameters and components that allow alternative problem and system test cases to be studied. Some parameters may be derived from performance measurements. The works in [5,7,15] are representative of such static prediction approaches. Although it is possible to achieve surprisingly accurate estimates of global performance statistics (e.g. total execution time), the problems that arise in static performance prediction concern the inability to account for the performance ....
M. E. Crovella and T. J. LeBlanc, Parallel Performance Prediction Using Lost Cycles Analysis, Proc. Supercomputing 94, IEEE Computer Society and ACM, pages 600-609, November 1994.
.... data points are obtained from a small number of runs of the program and some technique is used to extrapolate the performance for other conditions (hardware speed, problem size, processors). One project that is particularly interesting is the Lost Cycles Analysis taking place at Rochester [CL94]. An instrumentation tool has been designed to instrument an application so that, when executed on the target hardware, the run time system captures the number of cycles spent (lost) in various categories of overhead. The categories chosen for the KSR1 are load imbalance, insufficient ....
Mark E. Crovella and Thomas J. LeBlanc. Parallel performance prediction using lost cycles analysis. In Supercomputing '94, 1994.
....a wide range of programs. There are several compiler driven tools for parallel program performance evaluation, in which the performance analysis is based either on simulation [Dikaiakos et al. 1994; Dikaiakos 1994; Parashar et al. 1994] or measurement and extrapolation [Balasundaram et al. 1991; Crovella et al. 1995]. In general, these are all focused on message passing systems and static scheduling disciplines. Finally, there are alternative approaches based on computing bounds for task graphs with known task time distributions [Hartleb and Mertsiotakis 1992; Yazici Pekergin and Vincent 1991]. These ....
Crovella, M. E., LeBlanc, T. J., and Meira, W. 1995. Parallel Performance Prediction Using the Lost Cycles Toolkit. Tech. rep., Department of Computer Science, University of Rochester.
No context found.
M. Crovella and T. LeBlanc. Parallel Performance Prediction Using Lost Cycles. In Proceedings Supercomputing '94, pages 600--610, November 1994.
No context found.
M. E. Crovella and T. J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles," Proceedings of Supercomputing '94. Nov. 14-18, 1994, Washington, DC, pp. 600-609.