V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, D. A. Reed, "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs," Proceedings of Supercomputing '95, San Diego, CA, November 1995.
....dynamically parallelize programs. Ko et al. [26] identified optimal decompositions through brute-force incremental execution of all possible decompositions in multilevel parallel programs. Numerous systems have also been designed to manually tune parallel performance on traditional multiprocessors [1][19][27][30], but they have relied on off-line, not real-time, dependence analysis of memory traces. Simulation results demonstrate Jrpm's ability to automatically select and optimize appropriate thread decompositions with minimal effort from the programmer. On our CMP with ....
Adve, V. S. et al. An integrated compilation and performance analysis environment for data parallel programs. In SC'95, San Diego, CA, November 1995.
....suboptimal single-application performance and frequent processor idle cycles from having too few applications available to execute. The primary problem is that creating parallelized versions of legacy code is difficult. Even with a good tool chain including profilers and parallelizing compilers [1][2][9][11], automated parallelization has proven to be a very difficult problem [19]. While successful for certain scientific applications, automated parallelization has typically provided poor parallel performance on general-purpose applications, especially integer ones. Manual parallelization ....
V.S. Adve, et al., "An integrated compilation and performance analysis environment for data parallel programs," Supercomputing 1995.
....Among the supporting tools are those that perform automatic parallelization, performance visualization, instrumentation, and debugging. Many of the current tools are summarized in [5, 6]. Several tools have attempted to integrate different parallel programming tasks. Pablo and the Fortran D editor [1] combine program optimization and performance visualization. The SUIF Explorer [17] and FORGExplorer [2] have a similar goal. The KAP Pro Toolset [16] consists of tools for automatic parallelization, performance visualization, and debugging. The focus of the Annai Tool Project [23] is on the ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proc. of Supercomputing Conference, pages 1370--1404, 1995.
....of the system architecture and its implications for the design of parallel programs. In addition, there are other sources of performance data which aren't used at all. Similar projects exist for other processor architectures [2]. On the other hand, currently active research projects in academia [1, 5, 9] strive for portability and support for heterogeneous systems. This approach tends to discard the wealth of performance data sources available on modern systems, or uses them in a primitive manner. This proposal presents a case for a new performance visualization system designed specifically for ....
....The Pablo toolkit [13] decoupled performance data semantics from its structural representation via a performance data metaformat. This enabled the construction of a coarse-grained dataflow model for graphical programming of performance analyses and visualizations. Its follow-on work, SvPablo [1], closed the circle by integrating a source code browser with manual instrumentation and the display of performance metrics. Requested instrumentation points are passed to a preprocessor for C and C++ programs, which instruments the source code with calls to a data capture library; instrumented sources ....
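A minimal Python sketch can illustrate the idea of a performance data metaformat. It assumes a simplified scheme in which a descriptor lists each field name and its type, and data records are bare value tuples that can only be interpreted through their descriptor. The `Descriptor`, `decode`, and `total` names are hypothetical, and this is not Pablo's actual SDDF syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical record descriptor: a type name plus ordered (field, converter) pairs."""
    name: str
    fields: tuple


def decode(descriptor, raw_values):
    """Interpret a bare tuple of raw values using only the descriptor's structure."""
    if len(raw_values) != len(descriptor.fields):
        raise ValueError(f"{descriptor.name}: expected {len(descriptor.fields)} values")
    return {fname: convert(v) for (fname, convert), v in zip(descriptor.fields, raw_values)}


def total(records, descriptor, field):
    """A generic analysis: it knows nothing about record types beyond the descriptor."""
    return sum(decode(descriptor, r)[field] for r in records)


# Hypothetical record type and trace; values arrive as strings, as from a text trace file.
loop_time = Descriptor("LoopTime", (("line", int), ("processor", int), ("seconds", float)))
trace = [("12", "0", "0.5"), ("12", "1", "0.75")]
print(total(trace, loop_time, "seconds"))  # 1.25
```

Because `total` receives the field name as data, a new record type can be added without changing the analysis code. This is the decoupling of semantics from structure that the passage describes.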
Adve, V. et al. An Integrated Compilation and Performance Analysis Environment for Data-Parallel Programs. In Proceedings of Supercomputing '95, November 1995.
....that are integrated in that they support a common underlying programming methodology and they can exchange intermediate data. In this regard, our work is related to work done to support different phases of the parallel program development cycle. Examples of such tools are Fortran D editor Pablo [8] and the KAP Pro Toolset [11]. The Parallel Programming Hub is being built in a project called NETCARE (NETwork computing for Computer Architecture Research and Education), which ....
[Figure 1: Entering the Parallel Programming Hub: (a) the main interface and (b) the user account request interface.]
....tools are not well integrated. Related projects have attempted to resolve this issue by broadening the support for different stages of the parallel program development process and by creating more complete parallel programming environments. These efforts include Faust [36], Fortran D editor Pablo [8], and the KAP Pro Toolset [11]. Faust encompasses code optimization and performance evaluation with the emphasis on project management. Fortran D editor Pablo and SUIF Explorer aim to integrate parallelization and performance evaluation. The Annai Tool Project focuses on the aspects of ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proc. of Supercomputing Conference, pages 1370--1404, 1995.
....in such a scheme is complex because communication is not manifest in the source code, and the analysis required to locate and evaluate communication requires a global analysis of the source code. This situation has given rise to a number of analysis [14] and interactive parallelization tools [3]. In contrast, ZPL's communication costs are dependent only on the operations within a statement and can therefore be trivially identified. 4.2 Programming Benefits of Regions. Regions in ZPL replace the array subscripting notation found in most other languages [16]. Scalar languages use ....
....to the notion of portable performance. Ngo et al. demonstrate that HPF's failure to specify a data distribution model results in erratic execution times when compiling HPF programs with different compilers on the IBM SP-2 [39]. To alleviate this problem, tools such as the dPablo toolkit [3] have been designed to give source-level feedback about compilation decisions and program execution. However, these tools are tightly coupled to a compiler's individual compilation model and therefore do not directly aid in the development of portable programs. NESL [10] is a parallel ....
Vikram S. Adve, Jhy-Chun Wang, John Mellor-Crummey, Daniel A. Reed, Mark Anderson, and Ken Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Supercomputing '95, December 1995.
....In addition to this static analysis, systems such as SUIF Explorer and the D System [2] use a set of dynamic execution analyzers to find sections of code that are potentially parallelizable and to determine which regions of the program would most benefit from parallelization. This information is used to guide the analysis, enabling the programmer to focus on the sections of code ....
Vikram S. Adve, John Mellor-Crummey, Mark Anderson, Ken Kennedy, Jhy-Chun Wang, and Daniel A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, pages 1370--1404, December 1995.
....nodes are labeled with source line number, the variable name and subscripts, and a notation of whether the access was a read or a write. A concurrency history graph represents the reachable states in the execution of the parallel program and can be used to detect data usage and deadlock errors. In [1] an integration of the Pablo Performance Environment ....
[Figure residue: legend "control flow", "data dependence", "control dependence"; panel "Input Program":]
      PROGRAM lus2
      INTEGER isize
      PARAMETER(isize = 256)
      COMMON a,b,d
      REAL a(256,256)
      REAL b(256,256)
      REAL d(256,256)
      INTEGER i,j
      call initialize(a,isize)
      call lu(a,isize)
      call ....
Adve V.S., Wang J.C., Mellor-Crummey J., Reed D.A., Anderson M., and Kennedy K. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. CRPC-TR94513-S (1994).
....that are integrated in that they support a common underlying programming methodology and they can exchange intermediate data. In this regard, our work is related to work done to support different phases of the parallel program development cycle. Examples of such tools are Fortran D editor Pablo [8], the Annai Tool Project [9], SUIF Explorer [10], and KAP Pro Toolset [11]. The Parallel Programming Hub is being built in a project called NETCARE (NETwork computing ....
[Figure 1: Entering the Parallel Programming Hub: (a) the main interface and (b) the user account request interface.]
....tools are not well integrated. Related projects have attempted to resolve this issue by broadening the support for different stages of the parallel program development process and by creating more complete parallel programming environments. These efforts include Faust [36], Fortran D editor Pablo [8], the Annai Tool Project [9], SUIF Explorer [10], and the KAP Pro Toolset [11]. Faust encompasses code optimization and performance evaluation with the emphasis on project management. Fortran D editor Pablo and SUIF Explorer aim to integrate parallelization and performance evaluation. The Annai ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proc. of Supercomputing Conference, pages 1370--1404, 1995.
....Choudhary [15] aims at optimizing the accesses to out-of-core arrays in data parallel programs. With PASSION, the compiler can generate explicit parallel input/output calls with optimized data distribution from the specifications found in high-level languages like High Performance Fortran (HPF) [16-18]. This is a compile-time data distribution optimization, and it will not handle the system state changes that occur during execution. It also requires the programmer to annotate the source code to allow the compiler to generate the necessary input/output calls. We believe for dynamic ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. Wang, and D. A. Reed, "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs," in Proceedings of Supercomputing '95, Dec. 1995.
....data parallel compiler translates the high-level source code into a lower-level representation. We will analyze this translation phase in much more detail in the next chapter.
[Figure 5.3: Distribution of data array elements across four processors. Surviving labels: elements 1 to 16; arrays a, b, and c with (block) and (cyclic) distributions; processors [1] to [4].]
[Figure 5.4: Typical phases in the compilation of a data parallel .... Surviving labels: Data Parallel Source Code, Data Parallel Compiler, Conventional SPMD Code, Compiler and Linker, Communication Library, Executable Code.]
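The block and cyclic distributions in Figure 5.3 can be expressed as owner-computation functions. This Python sketch assumes 16 elements and four processors, as the figure residue suggests. It also assumes 1-based indices as in Fortran and an HPF-style BLOCK size of ceil(n/p). The function names are hypothetical.

```python
def block_owner(i, n, p):
    """Processor (1-based) owning element i (1-based) of an n-element array under BLOCK."""
    block = -(-n // p)  # ceil(n / p) elements per processor
    return (i - 1) // block + 1


def cyclic_owner(i, p):
    """Processor (1-based) owning element i (1-based) under CYCLIC."""
    return (i - 1) % p + 1


n, p = 16, 4
print([block_owner(i, n, p) for i in range(1, n + 1)])
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
print([cyclic_owner(i, p) for i in range(1, n + 1)])
# [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
```

BLOCK keeps neighbouring elements on the same processor, which favours nearest-neighbour access. CYCLIC spreads them round-robin, which tends to balance load when the work per element varies.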
....the bidirectional mapping between the high-level Fortran D source code and the low-level SPMD code. Given a construct in one of the codes, the mapping indicates the associated construct(s) in the other. A detailed description of the SDDF record contents is beyond the scope of this study; see [2] for details. For performance scalability prediction, the key record types and their fields include the following.
Loops: characterization of the loop iteration space, given by the loop limits, stride, and index variable; when the loop body contains some kind of interprocessor ....
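For loop records, the iteration count follows from the loop limits and stride. The sketch below assumes Fortran 77 DO-loop semantics, where the trip count is max(INT((upper - lower + stride) / stride), 0). The `LoopRecord` class and its field names are hypothetical and are not the actual SDDF record layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopRecord:
    """Hypothetical loop record holding the fields named in the text."""
    index_variable: str
    lower: int
    upper: int
    stride: int = 1

    def trip_count(self):
        """Fortran 77 DO-loop trip count: max(INT((upper - lower + stride) / stride), 0)."""
        if self.stride == 0:
            raise ValueError("stride must be nonzero")
        num = self.upper - self.lower + self.stride
        # INT truncates toward zero; a quotient <= 0 means the loop body never executes.
        if num == 0 or (num > 0) != (self.stride > 0):
            return 0
        return abs(num) // abs(self.stride)


print(LoopRecord("i", 1, 10, 3).trip_count())   # 4  (i = 1, 4, 7, 10)
print(LoopRecord("j", 10, 1, -1).trip_count())  # 10
print(LoopRecord("k", 5, 1).trip_count())       # 0
```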
Vikram S. Adve, John Mellor-Crummey, Mark Anderson, Ken Kennedy, Jhy-Chun Wang, and Daniel A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, December 1995.
....A solution that combines the advantages of both automatic and manual parallelization is to incorporate user input into the automatic parallelization process. Several previous systems combine the knowledge of the compiler, the run-time system, and the user [1][11][28][60][61][71]. A common model is for the user to provide information a priori, in the form of explicit directives to the compiler to perform or ignore some compiler analysis or optimization [60][61]. This requires that the application programmer have relatively deep knowledge ....
....to the compiler to perform or ignore some compiler analysis or optimization [60][61]. This requires that the application programmer have relatively deep knowledge of the compiler. Another model is for the compiler to make the analysis results available to the programmer in an interactive manner [1][11][28][71]. The user can then modify the source code or add directives as he or she examines the compilation result. Experience with earlier systems such as the ParaScope Editor [28] suggests the importance of coupling compiler analyses with dynamic program profilers [54]. This approach has been ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed, "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs." In Proceedings of Supercomputing 1995, San Diego, CA, November 1995.
....(a machine-code-to-source-code mapping). There are also tools that can provide more complicated hierarchical mappings between data-parallel code and low-level runtime activities. For example, some tools for parallel Fortran codes are integrated with the compiler (MPP Apprentice [70] and Fortran D [1]). The compiler generates mapping information that the tool uses to correlate performance measures, such as synchronization times, with source code fragments or data-parallel arrays; the execution of application binary code can thus be mapped to the application developer's view of the program. Another ....
Vikram S. Adve, Jhy-Chun Wang, John Mellor-Crummey, Daniel A. Reed, Mark Anderson, and Ken Kennedy. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of the Supercomputing '95 Conference, San Diego, California, December 1995.
....requires fewer language extensions, and relies on less run-time support. The Fortran D compiler computes information to support the D Editor, an interactive parallel programming tool for presenting both static and dynamic information on interprocessor communication and parallel performance [27, 28, 29]. The compiler has also been used for evaluating an integer programming system for performing automatic data decomposition [30]. Few researchers have published experimental results for large programs. Pingali and Rogers apply message vectorization, message pipelining, and reduction recognition in ....
V. Adve, J-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, November 1995.
....a performance model to which all HPF compilers must adhere, this problem could have been alleviated. Most compilers compensate for HPF's lack of a performance model by providing tools that give source-level feedback about the compilation process and/or program execution. The dPablo toolkit [1] is one such example. The problem with this approach is that such tools are tightly coupled to a particular compiler's compilation model, and therefore do not aid in the creation of portably performing programs. In contrast with HPF's hidden and unspecified communication model, F-- ....
....to allow translated references to arrays. The @ operator takes an array and an offset vector called a direction as operands, and shifts references to the array by the offset. For example, to replace each element of B with the sum of its left and right neighbors, one would write:
    [R] B := B@[0,-1] + B@[0,1];        (4)
Directions are generally named in order to improve a program's readability. For example, line (4) could have been written:
    direction left  = [0,-1];
              right = [0,1];            (5)
    [R] B := B@left + B@right;
Figure 4 shows a picture of this operation. Directions are typically reused ....
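The semantics of the @ operator in statement (4) can be mimicked in Python. This sketch assumes a 2 x 4 array B and a region R that covers only the interior columns, so every shifted reference stays in bounds. Following ZPL's array semantics, every right-hand-side value is read from the old B before any element is assigned. The helpers `at` and `apply_over_region` are hypothetical.

```python
def at(B, direction, i, j):
    """B@direction evaluated at index (i, j): the element offset by the direction vector."""
    di, dj = direction
    return B[i + di][j + dj]


def apply_over_region(B, region, f):
    """[region] B := f(...): evaluate f at every index against the old B, then assign."""
    new = [row[:] for row in B]
    for i, j in region:
        new[i][j] = f(i, j)
    return new


left, right = (0, -1), (0, 1)
B = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
R = [(i, j) for i in range(2) for j in range(1, 3)]  # interior columns only
old = B
B = apply_over_region(old, R, lambda i, j: at(old, left, i, j) + at(old, right, i, j))
print(B)  # [[1, 4, 6, 4], [5, 12, 14, 8]]
```

Elements outside R (the boundary columns) keep their old values, matching the way a region restricts where a ZPL statement applies.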
V. S. Adve, J.-C. Wang, J. Mellor-Crummey, D. A. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Supercomputing '95, December 1995.
....all the communications associated with the selected loop; the data layout pane displays the data decomposition information for each array of the loop; and finally the source pane shows the actual program code. The performance analysis environment Pablo has also been integrated within the D Editor [2]. The Vienna Fortran Compilation System (VFCS) [60] is a source-to-source compilation system based on Vienna Fortran, another extension set for Fortran, similar to Fortran D and HPF. It contains a compiler, an interactive performance estimator P3T [20], a performance measuring system (VFPMS), and ....
....maximum number of successors of any node of the graph, and d successor functions s_i: these functions give, for any node, its successors in all directions. The i-th successor of node N can be noted as [0; 0; 1; 0; 0], where the only 1 is at position i. For instance, a dense matrix matrix[2][3] is described very easily, since its successor functions simply match index increments (see Figure 2.1). Assuming this graph is provided, either from a library or by the user, there is no need to know how the data structure is actually implemented. This dense matrix could be ....
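The successor-function view of a dense matrix can be sketched as follows. The Python below assumes the 2 x 3 matrix from the text and d = 2 successor functions, one per dimension, each returning None at the boundary. A hypothetical breadth-first `traverse` reaches every element without knowing how the matrix is stored.

```python
from collections import deque


def dense_successors(rows, cols):
    """Successor functions [s_1, s_2] for a rows x cols dense matrix.

    s_i maps a node (r, c) to its neighbour one step along dimension i,
    i.e. it adds the unit vector with a 1 in position i, or returns None at the edge.
    """
    def s1(node):
        r, c = node
        return (r + 1, c) if r + 1 < rows else None

    def s2(node):
        r, c = node
        return (r, c + 1) if c + 1 < cols else None

    return [s1, s2]


def traverse(start, successors):
    """Breadth-first visit of every node reachable from start via the successor functions."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for s in successors:
            nxt = s(node)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(traverse((0, 0), dense_successors(2, 3)))
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```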
V.S. Adve, J.C. Wang, J. Mellor-Crummey, D.A. Reed, M. Anderson, and K. Kennedy. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. Technical Report CRPC-TR94513-S, Center for Research on Parallel Computation, Rice University, December 1994.
....overview of the Unify distributed shared memory system, which Xunify is designed to evaluate. Section 4 describes the design of the Xunify performance tool, its novel features, and the views it supports. 2. Related Work. Several tools have been developed to monitor parallel and distributed systems [21, 1, 8, 19, 20, 9], with the majority of distributed monitoring systems targeted towards message-passing systems. Each system relies on a monitoring component to gather run-time information, a facility to route this trace information to a central location, and an analysis and visualization system to display the ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. C. Wang, and D. A. Reed. Integrated compilation and performance analysis environment for parallel programs. In Proceedings of Supercomputing '95, December 1995.
....all the communications associated with the selected loop; the data layout pane displays the data decomposition information for each array of the loop; and finally the source pane shows the actual program code. The performance analysis environment Pablo has also been integrated within the D Editor [2]. The Vienna Fortran Compilation System (VFCS) [37] is a source-to-source compilation system based on Vienna Fortran, another extension set for Fortran, similar to Fortran D and HPF. It contains a compiler, an interactive performance estimator P3T [13], a performance measuring system (VFPMS), and a ....
V.S. Adve, J.C. Wang, J. Mellor-Crummey, D.A. Reed, M. Anderson, and K. Kennedy. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. Technical Report CRPC-TR94513-S, Center for Research on Parallel Computation, Rice University, December 1994.
....as changing global arrays into private copies on each processor. Manual transformation errors can cause race conditions that are hard to track down. A solution that combines the advantages of both automatic and manual parallelization is to incorporate user input into the parallelization process [1][2][11][12]. A common model is for the user to provide information a priori, in the form of explicit directives to the compiler to perform or ignore some compiler analysis or optimization [12]. This requires that the application programmer have relatively deep knowledge of the ....
....manner [11]. The user can then modify the source code and/or add directives as he or she examines the compilation result. Experience with earlier systems suggests the importance of coupling compiler analyses with dynamic program profilers. This approach has been adopted in more recent systems [1][2]. In addition, more powerful compiler analyses were found to be necessary to help the programmer find coarse-grain parallelism, and the programmer needs guidance in choosing the proper program transformations. The SUIF Explorer is a new interactive parallelizer designed with the goal ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, D. A. Reed, "An integrated compilation and performance analysis environment for data parallel programs," Proceedings of Supercomputing '95, San Diego, CA, November 1995.
....root causes (that is, implementation decisions) is particularly complex in cases where the phenomena of interest are created by events that are distant in both time and space. There are many tools that help the programmer in assessing performance phenomena. The integration of Fortran D and Pablo [1] correlates static information (e.g., data dependences) and ....
[Footnote: This research was supported by NSF grant CCR-9510173, an NSF CISE Institutional Infrastructure Grant No. CDA-9401142, and an equipment grant from Digital Equipment Corporation's External Research Program. Wagner Meira Jr. is supported ....]
V. S. Adve, J. Wang, J. Mellor-Crummey, D. A. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. Technical Report CRPC-TR94513-S, Rice University, December 1994.
....There are several existing tools that do source-level profiling. Some commercial examples of these tools include the MPP Apprentice [29] for the Cray T3D, Prism [27] for the TMC CM-5, and MPPE [18] from MasPar. In the research world, examples include the Pablo system from the University of Illinois [24,1], which can trace Fortran D programs and present this information in terms of the source program, and TAU from the University of Oregon [20], which can do similar operations for pC++ programs [2]. While presenting performance data at the source code level of a high-level parallel language is crucial, it is ....
Vikram S. Adve, Jhy-Chun Wang, John Mellor-Crummey, Daniel A. Reed, Mark Anderson, and Ken Kennedy. An integrated compilation and performance analysis environment for data parallel programming. Technical Report 94513-S, CRPC, 1994.
....[4] to measure the time efficiency for different application domains, and also generally applicable test environments, such as Testpilot [2], have been developed. In the area of parallel systems and programming languages, large-scale computational performance measurement systems are even more common [1, 20, 17], and much effort is spent on the development of these tools. Another system for carrying out experiments in all computer science research areas is the Desktop Experiment Management system [12]. The context of the presented system is the TopSpin project, which involves the building of a system for ....
Vikram S. Adve, John Mellor-Crummey, Mark Anderson, Jhy-Chun Wang, and Daniel A. Reed. An integrated compilation and performance analysis environment for data parallel programs.
....p = 1 (sequential version), 726.9 ms for p = 4, and 1059.3 ms for p = 16. 7. Related Work in Automating Layout. Many researchers have developed compilation algorithms or systems to support automatic alignment and data distribution for data parallel or automatically parallelized sequential programs [11, 7, 3, 14, 1, 15]. The global optimization problem for both locality and parallelism is NP-complete, and has been addressed using heuristics [3, 11, 15] and 0-1 integer programming techniques [5]. Some systems are interactive [11, 1]. All of this work covers regular parallel programs and is applicable only to the ....
....for data parallel or automatically parallelized sequential programs [11, 7, 3, 14, 1, 15]. The global optimization problem for both locality and parallelism is NP-complete, and has been addressed using heuristics [3, 11, 15] and 0-1 integer programming techniques [5]. Some systems are interactive [11, 1]. All of this work covers regular parallel programs and is applicable only to the Navier-Stokes solver phase of our application. The tridiagonal solver is used as a benchmark for some systems [14, 3, 15], although none consider the skewed layout. In addition, transformations to globally rearrange ....
V. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Supercomputing, San Diego, CA, Dec. 1995.
....Grant No. CDA-9401142, and an equipment grant from Digital Equipment Corporation's External Research Program. Wagner Meira Jr. is supported by CNPq Brazil, Grant 200.862 93 6.]
There are many tools that help the programmer in assessing performance phenomena. The integration of Fortran D and Pablo [1] correlates static information (e.g., data dependences) and dynamic measurements. Paradyn [13] identifies both synchronization and shared-memory bottlenecks, and provides profiles and information about sharing patterns. MTool [4] and MemSpy [6] focus on memory operations, quantifying their ....
V. S. Adve, J. Wang, J. Mellor-Crummey, D. A. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. Technical Report CRPC-TR94513-S, Rice University, December 1994.
....solve the performance problems, and generates concise hints, which are communicated to runtime libraries by a "hints file". 2.3 Previous work. Performance analysis research has concentrated on visually displaying performance data [1, 2], relating performance data to high-level language constructs [3, 4], or giving the user insights into performance problems using expert analysis [5, 6]. Our framework aims to go a step further in the direction of automation: Paradise not only finds performance problems, but also solutions in terms of optimizations for the problem areas, and in cooperation with ....
V. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data-parallel programs. In Proceedings of Supercomputing 1995, December 1995.
....monitoring tools provide performance information about the hardware and system software at the base of a parallel or distributed system. For example, hardware monitors measure bus activity, cache activity, memory activity, instruction types, vector performance, I/O subsystems, or networks [1,2,3]. Operating system monitors usually monitor processor utilization, scheduling, virtual memory activity, I/O activity, or system calls [4]. Although these tools can be used while programs of any language execute on a system, their measurements cannot easily be related to individual constructs of a ....
....measure implicit costs for other types of programming constructs. Two recent performance measurement systems use extensive compiler information to provide detailed performance information for data parallel Fortran dialects. The D system provides performance feedback for Fortran D applications [1]. It integrates the Fortran 77D compiler with the Pablo performance measurement environment. The result is an editor that annotates source code lines with performance information such as computation time and message statistics. The D system also performs sophisticated analysis of static and ....
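A source annotation of this kind can be sketched in a few lines. The Python below assumes the per-line measurements have already been attributed to source lines and are supplied as (computation seconds, message count, bytes) tuples. The `annotate` function, the Fortran-style `!` comment format, and the sample numbers are hypothetical.

```python
def annotate(source_lines, stats):
    """Append per-line measurements as trailing comments.

    stats maps a 1-based source line number to (compute_seconds, messages, nbytes);
    lines without measurements are returned unchanged.
    """
    out = []
    for n, text in enumerate(source_lines, start=1):
        if n in stats:
            secs, msgs, nbytes = stats[n]
            text = f"{text}   ! {secs:.3f} s, {msgs} msgs, {nbytes} bytes"
        out.append(text)
    return out


src = ["      DO i = 2, n", "        a(i) = a(i-1) + b(i)", "      END DO"]
for line in annotate(src, {2: (0.125, 3, 2048)}):
    print(line)
```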
Vikram S. Adve, Jhy-Chun Wang, John Mellor-Crummey, Daniel A. Reed, Mark Anderson, and Ken Kennedy. An integrated compilation and performance analysis environment for data parallel programming. Technical Report 94513-S, CRPC, 1994.
V. S. Adve et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing, 1995.
V. Adve, J. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, December 1995. http://www.supercomp.org/sc95/proceedings/528 VADV/PAPER.PS.
....performance histogram data in a listing of disassembled object code annotated with source code. While this makes it possible to analyze highly optimized code, where a single source line may map into multiple instances in the executable, it is a difficult and time-consuming exercise. Adve et al. [1] demonstrated a performance browser that uses compiler-derived mapping information to interactively correlate HPF source code with the compiler's parallelized, optimized F77/MPI output, which is instrumented with SvPablo (see below). This combination can present MPI performance data in terms of ....
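A compiler-derived mapping of this kind can be modelled as a many-to-many relation between source lines and generated lines. This Python sketch assumes such a mapping is available as a dictionary. It inverts the mapping and charges each generated line's measured time to its source lines. When several source lines share one generated line, the time is split evenly; this is a simplifying assumption, not necessarily the policy of the original system. All names and numbers are hypothetical.

```python
from collections import defaultdict


def invert(mapping):
    """Turn a source -> generated-lines mapping into generated -> source-lines."""
    inverse = defaultdict(set)
    for src, generated in mapping.items():
        for g in generated:
            inverse[g].add(src)
    return inverse


def attribute(gen_times, mapping):
    """Charge each generated line's time to the source lines it was derived from."""
    inverse = invert(mapping)
    per_source = defaultdict(float)
    for g, t in gen_times.items():
        owners = inverse.get(g)
        if not owners:
            continue  # e.g. runtime-library code with no source counterpart
        for s in owners:
            per_source[s] += t / len(owners)  # even split: an assumption
    return dict(per_source)


# Source lines 10 and 11 both contribute to generated line 102 (e.g. after loop fusion).
mapping = {10: {101, 102}, 11: {102, 103}}
gen_times = {101: 1.0, 102: 2.0, 103: 0.5, 999: 4.0}
print(attribute(gen_times, mapping))  # {10: 2.0, 11: 1.5}
```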
V. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing '95, San Diego, CA, December 1995.
....but also eliminate the barrier that separates program creation from execution and post-mortem optimization. This view is based on our experience building performance instrumentation and analysis tools for parallel systems [5, 6, 8] and integrating runtime measurement with deep compile-time analysis [1], and on an ongoing series of discussions with application, compiler, library, and runtime system developers. In this integrated model, high-level problem-solving environments (PSEs) allow users to compose programs from configurable modules, each with a performance specification interface that ....
....and performance analysts to capture, analyze, visualize, and steer distributed computations that execute on thousands of distributed processors. Autopilot and Virtue are built atop portions of the Pablo toolkit [5,6] and the Globus runtime system. Both Autopilot and Virtue use the Globus toolkit [1] for wide-area communication and task management. Because Globus supports multiple communication protocols, including shared memory, MPI, and TCP/IP, it unites parallel systems, PC and workstation clusters, and wide-area networks, allowing Autopilot and Virtue to interoperate in a single ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed, "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs," Proceedings of Supercomputing '95, December 1995.
....and the extent to which memory operations are overlapped in our systems. 3.4 Applications. We use six applications in this study: Radix, FFT, and LU from the SPLASH-2 suite [42], Water and MP3D from the SPLASH suite [37], and Erlebacher, obtained from the Rice parallel Fortran compiler group [4]. We describe the applications briefly below. LU is a non-contiguous version of the kernel from the SPLASH-2 suite, modified to use flags instead of barriers to improve performance. The version of LU that we use also includes loop transformations and procedure inlining optimizations to better ....
....the SPLASH suite, performing a Monte Carlo simulation of rarefied flow. Water is an N-body molecular dynamics simulation from SPLASH. Erlebacher solves partial differential equations by performing 3-D vectorized tridiagonal solves using the Alternating Direction Implicit (ADI) method [4]. The key data structures are 3-dimensional arrays, which are distributed by assigning a consecutive block of X-Y planes to each processor. One phase dominates the execution time and contains all the communication and synchronization of this application. The computation in this phase consists of a ....
V. S. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing '95, San Diego, CA, December 1995.
....program with larger data sets. We construct the prediction model for the complete application by concatenating the models from its constituent fragments. As an example, we analyze the performance of the Erlebacher program, an 800-line, ten-procedure benchmark written by Thomas Eidson at ICASE [2, 4]. Erlebacher solves 3-D partial differential equations via tridiagonal solves using Alternating Direction Implicit (ADI) integration. The program operates in succession on each of the three dimensions (X, Y, and Z). In each dimension, derivatives are computed, followed by forward and backward ....
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing'95, San Diego, December 1995.
....Figure 2: Application characteristics. ... -O2 -funroll-loops. 2.4 Applications We use six applications for this study: LU, FFT, and Radix from the SPLASH-2 suite [22]; MP3D and Water from the SPLASH suite [21]; and Erlebacher from the Rice parallel compiler group [1]. We modified LU slightly to use flags instead of barriers for better load balance. Figure 2 gives input sizes for the applications and their execution times on a Simple uniprocessor. We also study versions of LU and FFT that include ILP-specific optimizations that can be implemented in a ....
V. S. Adve et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Supercomputing '95, 1995.
....costs. Thus, we predict the performance of non-instrumented codes using performance data from the instrumented versions. Our current model is based on an integrated compilation and performance analysis system jointly developed by the Rice Fortran D group and the University of Illinois Pablo group [1]. Compile-time data on program transformations and on the relation of the generated SPMD code to the high-level data parallel source allows us to link dynamic performance data with the corresponding high-level program. Using this linkage, we then build parameterized execution models for code fragments in the ....
....As we noted in §2, the mechanism we use to predict performance is dependent on support from the data parallel compiler. Our current model is based on an integrated compilation and performance analysis system jointly developed by the Rice Fortran D group and the University of Illinois Pablo group [1]. Below, we describe this infrastructure and illustrate its use with a simple example. 3.1 Compiler Scalability Support In addition to generating SPMD code, the Rice Fortran D compiler also creates a file that describes each program loop, procedure call and communication primitive in the ....
[Article contains additional citation context not shown here]
Adve, V. S., Mellor-Crummey, J., Anderson, M., Kennedy, K., Wang, J.-C., and Reed, D. A. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs, March 1995. Submitted for publication.
....the results of the analytic model validations. 5.1 Applications Used in Model Validations The validation experiments include the following applications: FFT, LU, and Radix from the SPLASH-2 suite [29]; Water from the SPLASH suite [24]; and Erlebacher from the Rice parallel compiler group [2]. We also use versions of LU and of FFT (denoted by opt) that are optimized for ILP systems by applying loop interchange to schedule read misses closer together, thus better overlapping their latencies [21]. The optimization in FFTopt has the side effect that all read requests overlapped at ....
V. Adve et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing '95, San Diego, CA, Dec. 1995.
....connect and configure a data analysis graph. This Pablo software infrastructure has been used as a basis for a portion of Intel's performance analysis tools on the Intel Paragon XP/S [19] and has been integrated with data parallel compilers to study the performance of Fortran D and HPF codes [1]. For additional details on the Pablo software design philosophy and our experiences, see [14, 15, 16]. 2.2 Pablo Software Extensibility Given the rapid changes in hardware platforms and programming models and the performance optimization problems inherent in an evolving market, software ....
....inherent in an evolving market, software performance tools must be capable of evolving to accommodate unanticipated needs. Since its development, we have retargeted the Pablo software to study application input/output patterns [2], World Wide Web access patterns [6], data parallel languages [1], and parallel file system policies [5]. One of the keys to the Pablo environment's adaptability has been the inclusion of a self-defining data format (SDDF). SDDF data streams consist of a group of record descriptors and record instances. Much as structure declarations in the C programming ....
[Article contains additional citation context not shown here]
Adve, V. S., Mellor-Crummey, J., Anderson, M., Kennedy, K., Wang, J., and Reed, D. A. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing '95 (Dec. 1995).
....key system parameters for our base system. Results for five variations of the base system are also reported, as described in Section 5. 4.2 Applications We study five applications: FFT, LU, and Radix from SPLASH-2 [13]; MP3D from SPLASH [12]; and Erlebacher from the Rice parallel compiler group [1]. A few changes have been made to the original SPLASH-2 codes .... [Table: ILP processor and cache parameters. Processor speed: 500 MHz; maximum fetch/decode/retire rate: 4; instruction issue window: 64 entries; functional units: 4 integer arithmetic, 4 floating point, 4 address generation; branch speculation depth: 8] ....
V. S. Adve et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proc. Supercomputing '95, December 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, D. A. Reed, "An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs," Proceedings of Supercomputing '95, San Diego, CA, November 1995.
No context found.
V. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, Dec. 1995.
No context found.
V. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, Dec. 1995.
No context found.
V. Adve, J-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, December 1995.
No context found.
V. Adve, J.-C. Wang, J. Mellor-Crummey, D. Reed, M. Anderson, and K. Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, Dec. 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed, "An integrated compilation and performance analysis environment for data parallel programs," in Proceedings of Supercomputing '95, San Diego, CA, Dec. 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. Wang, and D. A. Reed. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs. In Proceedings of Supercomputing '95, San Diego, CA, USA, December 1995. Electronic publication.
No context found.
Vikram S. Adve, Jhy-Chun Wang, John Mellor-Crummey, Daniel A. Reed, Mark Anderson, and Ken Kennedy. An integrated compilation and performance analysis environment for data parallel programs. In Supercomputing '95, December 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed, An integrated compilation and performance analysis environment for data parallel programs, in Proceedings of Supercomputing '95, December 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proc. of Supercomputing Conference, pages 1370--1404, 1995.
No context found.
V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. Wang, and D. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings Supercomputing '95, December 1995.
No context found.
V. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data-parallel programs. In Proceedings of Supercomputing 1995, December 1995.
No context found.
V. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J-C. Wang, and D. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing '95, San Diego, CA, December 1995.
CiteSeer.IST - Copyright Penn State and NEC