| W. Williams, T. Hoel, and D. Pase. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, 1994. |
....and sequential time are used to select appropriate suggestions on how the program's execution time might be improved. The performance data can be granulated at the levels of subroutines, loops, parallel blocks, or case statements. Recently, Cray Research released a new tool, the MPP Apprentice [6], that recognises the performance problems in message passing programs on the Cray T3D. The MPP Apprentice's approach is similar to ATExpert, but it supports a more general programming model. The W3 Search Model [7] tries to answer three questions: why is the application performance poor ....
W. Williams, T. Hoel and D. Pase, "The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D", In Programming Environments for Massively Parallel Distributed Systems, North-Holland, 1994, pp. 333--345.
....steps (1)-(4) the tool is designed to accomplish or guide. For the most part, performance tools are rather narrowly defined as a mechanism for gathering data (step 2) or as an aid to a human formulating hypotheses about an application's behavior (step 1) during a single observed program run [27,54,55,61,69,77,80]. In this common approach, much of the work is done manually by a knowledgeable expert conducting the tuning study, frequently with the use of visualization tools [27,41,51,73]. Recently, research has focused on automating the diagnostic process (steps 1, 2, and 3) [20,28,45,50,70]. Several ....
....to analyze performance data gathered from their Autotasking Fortran compiling environment. Performance characteristics were matched to the source and to a set of performance rules, and the results were used to generate specific advice for the programmer tuning the code. Cray's MPP Apprentice [77] also generates diagnostic feedback and advice for the programmer, necessarily more general (and therefore providing less direct guidance to the programmer) than that of ATExpert because Apprentice is not closely tied to a parallelizing compiler. Both of these are post-mortem tools, that is, they ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice performance tool: Delivering the performance of the Cray T3D. In K.M. Decker and R.M. Rehmann, editors, Programming Environments for Massively Parallel Distributed Systems. Birkhauser, 1994.
....of these tools, but, restricting attention to tools which support FORTRAN or C, significant research efforts in this area include ParaGraph [5], Pablo [16], AIMS [26], Paradyn [9], PMA [23], and XPVM [4]. Commercial systems include Vampir [11], MPP Apprentice [24], Nupshot [25], and VT [2]. Tools for shared memory systems have received less attention, possibly due to the lack of a widely accepted standard for the associated programming paradigm, and because of the need for hardware support to monitor the memory system (the advent of OpenMP [14] seems likely ....
W. Williams, T. Hoel and D. Pase, The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, in: K.M. Decker et al. (eds.), Programming Environments for Massively Parallel Distributed Systems, Birkhauser Verlag, 333--345, 1994.
....the interplay between data mapping and communication for HPF programs are offered by this system. In [1] the performance of Fortran D programs is analyzed at the source level which is based on an integration with the Fortran D compiler [16] and the Pablo performance system [34]. MPP Apprentice [40] supports post-execution performance analysis for C, C++, and Fortran90 programs on the Cray T3D machine. The previous two approaches are most similar to our approach. The Fortran D Pablo integrated performance system has sophisticated capabilities to link performance data with distribution, ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, 1994.
....data in terms of low level node computing and message passing, for example, it would be difficult for the programmer to relate this information to their program. There are several existing tools that do source level profiling. Some commercial examples of these tools include the MPP Apprentice [29] for the Cray T3D, Prism [27] for the TMC CM-5 and MPPE [18] from MasPar. In the research world, examples include the Pablo system from the University of Illinois [24,1] that can trace Fortran D programs and present this information in terms of the source program, and TAU from University of ....
Winifred Williams, Timothy Hoel, and Douglas Pase. The MPP Apprentice performance tool: Delivering the performance of the Cray T3D. In Karsten M. Decker and Rene M. Rehmann, editors, Programming Environments for Massively Parallel Distributed Systems. Birkhauser Verlag, 1994.
....processors, flexible trace formats, dynamic instrumentation and perturbation reduction, and high level language support by relating performance data to sections of source code. A few of the above performance tools also go further than merely reporting performance data. These include MPP Apprentice [43] and P3T [39] for data parallel programming models; Poirot [31] and Paradyn for message passing data parallel programs [34]; and Projections for object based message driven programs [29]. These tools analyze the performance data to give the user insights into performance problems and ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D. In K. M. Decker and R. M. Rehmann, editors, Programming Environments for Massively Parallel Distributed Systems. Birkhäuser Verlag, Basel, Switzerland, 1994.
....as computation time and message statistics. The D system also performs sophisticated analysis of static and dynamic trace data to provide detailed, language dependent information such as identification of loop carried cross processor dependencies. The Cray Research product MPP Apprentice [68] uses compiler generated instrumentation and cost estimates to time and count language level events for the myriad programming models of the Cray T3D. By timing and counting events, MPP Apprentice achieves scalability. Through integration with the Cray Research compilers, MPP Apprentice provides ....
....savings of making mapping decisions dynamically. Language specific performance tools have demonstrated several individual measurement techniques that we can use in performance tools based on the NV model. Data parallel language tools have shown how to provide control (code) views of performance [1,41,60,68], and we adopt their techniques for attaching performance data to code displays. Object oriented language tools offer natural hierarchical structures for organizing noun data [48] and demonstrate techniques for interpreting compiler output for mapping purposes. The standardization of ....
Winifred Williams, Timothy Hoel, and Douglas Pase. The MPP Apprentice performance tool: Delivering the performance of the Cray T3D. In Karsten M. Decker and Rene M. Rehmann, editors, Programming Environments for Massively Parallel Distributed Systems. Birkhauser Verlag, 1994.
....and sequential time are used to select appropriate suggestions about how the program's execution time might be improved. Data can be presented at the granularity of subroutines, loops, parallel blocks, or case statements. Recently, Cray Research has released a new tool, the MPP Apprentice [98], that recognizes performance problems for message passing programs on the Cray T3D [19]. The MPP Apprentice's approach is similar to ATExpert, but it works for a more general programming model. Crovella and LeBlanc's predicate profiling [20] provides a search system to compare different algorithms ....
....about individual program constructs, such as loops or statements, instrumenting runtime libraries or primitives is not sufficient. Collecting data at this granularity requires instrumentation be interspersed with the statements of the user's program. CXpa [36], AE [52], Prism [84], and MPP Apprentice [98] use a modified compiler to insert instrumentation at the desired location. Compiler based instrumentation affords access to the wealth of information that is available during compilation. For example, information about data and loop dependencies is difficult to gather without compiler ....
W. Williams, T. Hoel and D. Pase, "The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D", in Programming Environments for Massively Parallel Distributed Systems, (ed), North-Holland, (To Appear) 1994.
....the interplay between data mapping and communication for HPF programs are offered by this system. In [1] the performance of Fortran D programs is analyzed at the source level which is based on an integration with the Fortran D compiler [21] and the Pablo performance system [39]. MPP Apprentice [45] supports post-execution performance analysis for C, C++, and Fortran90 programs on the Cray T3D machine. The previous two approaches are most similar to our approach. The Fortran D Pablo integrated performance system has sophisticated capabilities to link performance data with distribution, ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, 1994.
....visualizations, and search tools. Profile metrics[1, 6, 15, 22] associate a value with each component of a distributed or parallel application (frequently procedures) and are presented as sorted tables. Visualizations[8, 13, 14, 18, 23] explain application performance using pictures. Search tools[10, 17, 21] help users to manage performance data information overload by treating the problem of finding a performance bottleneck as a search problem. However, all of these tools focus on the measurement and analysis of a specific program for a single execution. One type of tool that permits programmers to ....
W. Williams, T. Hoel, and D. Pase, The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, in Programming Environments for Massively Parallel Distributed Systems. 1994, North-Holland.
....application to be recompiled, then run while trace data is gathered for the entire application, then analyzed post-mortem using the trace data. Such trace-based approaches may have problems with the amount of trace data needed to analyze a single large, long running application. MPP Apprentice [10], a performance tool designed for the Cray T3D, avoids the scaling problem of trace-based tools by using static counters and timers, but does require recompilation of the application to be tuned. Paradyn avoids both recompilation and the scaling difficulties inherent in a tracing approach. ....
W. Williams, T. Hoel and D. Pase, The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, in Programming Environments for Massively Parallel Distributed Systems, K. M. Decker and R. M. Rehmann (editors), Birkhäuser Verlag, Basel, 1994.
....to identify bottlenecks, decreases the amount of unhelpful instrumentation, and improves the usefulness of the information obtained from a diagnostic session. 1 INTRODUCTION Accurate performance diagnosis of parallel and distributed programs is a difficult and time consuming task. Recent research [1, 2, 14, 3, 4] examines possible approaches for automating, and thereby simplifying, the process of diagnosing a single program run. This paper describes how historical performance data, i.e. data gathered in one or more previous executions of an application, can be used to increase the effectiveness of ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice performance tool: Delivering the performance of the Cray T3D. In K.M. Decker and R.M. Rehmann, editors, Programming Environments for Massively Parallel Distributed Systems, Birkhäuser, 1994.
....great flexibility, it requires many decisions to be made on what to collect, and when to collect them. The quality of the collected performance data depends on the sampling rate, the time required to make the decisions, and the correctness of the decisions. The MPP Apprentice performance tool [17] is designed to help users to tune the performance of their Cray T3D applications. It collects statistics for each section of code, summarises the data on each processor, and displays the summary across all processors. During program execution, the time and pass count for each code block are ....
W. Williams, T. Hoel and D. Pase, The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D, Programming Environments for Massively Parallel Distributed Systems, North-Holland, 1994, pp. 333-345.
....with the compilation systems is required to relate the achieved performance to the source code. Examples of such an integration are presented in [1], [6], [7], and [8]. (This work was supported in part by the European Commission under the Esprit IV Long Term Research Project n. 21033 HPF.) In [1] the performance of Fortran D programs is analyzed at the source level thanks to the integration of Fortran D compiler and the Pablo performance tool. The integration of TAU with the pC++ compiler and run time system allows instrumentation, profiling, and tracing support [6]. The ....
....the integration of Fortran D compiler and the Pablo performance tool. The integration of TAU with the pC++ compiler and run time system allows instrumentation, profiling, and tracing support [6]. The source level evaluation of C, C++, Fortran and HPF codes is addressed by the MPP Apprentice tool [8] and by the Visualization Tool [7]. In this paper we present the integration of the VFC compilation system [3] and the Medea performance analysis tool [4]. This work is part of the Esprit LTR project HPF 1 , whose goal is to extend the current version of the HPF language and the related ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D. In K.M. Decker, editor, Programming Environments for Massively Parallel Distributed Systems, pages 333--345. Birkhauser Verlag, 1994.
....8 nodes of a SUN SPARCstation cluster. The data used here was collected using Paradyn, and is stored in the form of Paradyn histograms. The result of the query is 9 Paradyn histograms. Here we display the result using xmgr. 4 RELATED WORK There has been extensive work on parallel profiling tools [11,12,3,2,13], and a more limited amount of work on tools that allow displays of performance data from multiple runs [14,21]. In the debugging area, work has been done to find bugs by comparing the execution of a new program version to an old one by comparing program output and user selected variables [16] ....
W. Williams, T. Hoel, and D. Pase. The MPP Apprentice performance tool: Delivering the performance of the Cray T3D. Programming Environments for Massively Parallel Distributed Systems, Monte Verita, 1994.
....are explicit (e.g. in message passing codes). For high level parallel languages, such tools can only capture and present dynamic performance data in terms of primitive operations (e.g. communication library calls) in the compiler-generated code. A few tools (Prism [13], NV [4], and MPP Apprentice [14]) provide source level support for performance analysis of high level parallel languages. However, none of these tools provide source level performance support for the combination of data parallel languages and optimizing compilers necessary for Fortran D or HPF. Source level performance analysis ....
....message passing model of Figure 3. Because the goal of data parallel languages is to insulate software developers from the idiosyncrasies of message passing, performance tuning should not require them to understand the details of the compiler generated code. With the exception of MPP Apprentice [14], existing performance tools (e.g. [10, 9, 7, 12]) lack the ability to relate performance information from the compiler generated message passing code back to the source in the presence of substantial code restructuring by the compiler. Without access to compiler knowledge about the compilation ....
Williams, W., Hoel, T., and Pase, D. The MPP Apprentice Performance Tool: Delivering the Performance of the Cray T3D. In Programming Environments for Massively Parallel Distributed Systems (Basel, Switzerland, 1994), Birkhauser Verlag.
W. Williams, T. Hoel, and D. Pase, "The MPP Apprentice performance tool: Delivering the performance of the Cray T3D", in Programming Environments for Massively Parallel Distributed Systems, K.M. Decker and R.M. Rehmann, editors, Birkhäuser, 1994.