S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.
....We then evaluate the performance of our optimized implementations relative to the fundamental limits on performance. Specifically, we first derive simple models of the upper bounds on the execution rate (Mflop/s) of our implementations. Using hardware counter data collected with the PAPI library [10], we then verify our models on three hardware platforms (Table 1) and a set of triangular factors from applications (Table 2). We observe that our optimized implementations can achieve 80% or more of these bounds; furthermore, we observe speedups of up to 1.8x when both register blocking and ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.
....performance counter data. We also present related work that uses hardware counter data to understand Java middleware and server applications. 5.1 Accessing Hardware Counters Several library packages provide access to hardware performance counter data, including the HPM toolkit [11], PAPI [8], and PCL [7]. These libraries provide facilities to instrument programs, record hardware counter data, and analyze the results. We extend the functionality of existing libraries to obtain hardware performance data in a virtual machine. Specifically, we extend Jikes RVM to collect ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.
....of loops) JavaPSL also formalizes trace files by using an event class. This paper does not contain any property which incorporates events, which may change in future work. 5 Note that due to advanced monitoring and profiling technologies (e.g. dynamic profiling [14], hardware profiling [5], and source code profiling [19, 15]) there is basically no barrier to obtain experiment related data for arbitrary parallel and distributed programs. 4.2 Filters and Statistics for Experiment related Data In order to provide flexible mechanisms to describe performance problems we commonly ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.
....specific platform and operating system. Platform independent counter libraries are currently under development in various research Translation Lookaside Buffer projects. Two such libraries are the Performance Counter Library (PCL) [4] and the Performance Application Programming Interface (PAPI) [7]. Their aim is to provide a set of platform independent counters, that allow easy portability of programs instrumented with these libraries, and to allow inter platform performance comparisons. As we currently restrict our research to a Solaris UltraSPARC platform, we use the libcpc [12]. Although ....
S. Browne, J. Dongarra, N. Garner, K. London and P. Mucci, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, Proceedings SC'2000.
....are required in parallel algorithm design and in the interaction of the algorithms with the runtime system. Current research has focused either on middleware between the application and the system [6, 7, 1, 22] or on performance monitoring and prediction libraries such as NWS [36] and PAPI [4, 5]. However, besides some preliminary work in AppLeS [3], no attention has been given to how to use this system information to dynamically change the algorithm for better resource utilization. In [34] we described a unique parallelization approach that is designed to tolerate high network ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing 2000.
....1: Components of a comprehensive performance analysis environment. 2 Microprocessor Hardware Performance Counters Modern microprocessors include integrated hardware support for non-intrusive monitoring of a variety of processor and memory system events. Commonly referred to as hardware counters [3, 14], this capability is very useful to both computer architects [2] and applications developers [23]. Detailed software instrumentation can introduce perturbation into an application and the measurement process itself. On the other hand, simulation can become impractical for large, complex ....
....ithread) call deltat( Finished Y sweep ,2) call f end section(rank, 2,0,ierr) BARRIER Table 1: Sample code segment from function runhyd3 of sPPM. Table 1 shows a code segment from sPPM [17] that has been instrumented with high-level library routines written on top of MPX [15] and PAPI [3] in order to capture eight hardware counter values: total processor cycles, total instructions, cycles stalled waiting for memory accesses, floating point divide instructions, L1 cache misses, floating point instructions, load instructions, and store instructions. As Table 2 ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, Proc. SC2000.
....the crisp output might be somewhere between 0.3 and 0.4. 5.2 Performance Data Sources To collect data about application and system performance, we need to gather information about what the application is doing and how the system responds to the application stimuli. We use the PAPI toolkit [3] to read the processor hardware performance counters. [Figure 5.6: Scaled Truth Function for furnace Fuzzy Variables] The information gathered by this toolkit is used both to determine the application instruction mix for the application intrinsic signature and the ....
BROWNE, S., DONGARRA, J., GARNER, N., LONDON, K., AND MUCCI, P. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters. In Proceedings of Supercomputing 2000.
....is a critical issue for most scientific programs. To help programmers tune their programs, a variety of tools have been created ranging from source code and binary analysis tools [1, 2, 3, 4, 5] to libraries and utilities to access hardware performance counters built into microprocessors [6, 7, 8, 9]. Depending on the type of problem being studied and stage of the tuning process (initial tuning of a new algorithm vs. fine tuning for a specific platform) different tools are useful. One area that has been lacking is a set of tools that allow programmers to understand the precise memory ....
....none of the above simulators linked the memory references to symbolic data structures and subroutines in the source program. This was another crucial aspect of the SIGMA approach. There have been some tools that access hardware performance counters. For Intel platforms, VTune [15] is available. PAPI [7] provides a multi platform interface to access hardware counters. However, these approaches only provide counters of data or sampling among code regions. In contrast, SIGMA provides detailed information about individual memory references, and the actual memory addresses being accessed. Other ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. "A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters". In Proceedings of Supercomputing'00, November 2000.
....data that the rest of the SCALEA system could analyze. The TAU [22, 17] performance framework is an integrated toolkit for performance instrumentation, measurement, and analysis for parallel, multithreaded programs. SCALEA uses the TAU instrumentation library as one of its tracing libraries. PAPI [5] specifies a standard API for accessing hardware performance counters available on most modern microprocessors. SCALEA uses the PAPI library for measuring hardware counters. gprof [11, 10] is a compiler based profiling framework that mostly analyses the execution behavior and counts of functions ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'
....are required in parallel algorithm design and in the interaction of the algorithms with the runtime system. Current research has focused either on middleware between the application and the system [6, 7, 1, 22] or on performance monitoring and prediction libraries such as NWS [37] and PAPI [4, 5]. However, besides some preliminary work in AppLeS [3], no attention has been given to how to use this system information to dynamically change the algorithm for better resource utilization. In [34] we described a unique parallelization approach that is designed to tolerate high network ....
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing
.... system information has been used at the scheduler level, or at best at the outset of a program for original problem partitioning and allocation of resources [1, 31]. With recent performance monitoring and prediction libraries such as Network Weather Service [30], AppLeS [2], and PAPI [3, 4], this information can be used dynamically to set appropriately the variable granularity of our algorithms. In this paper, we demonstrate the effectiveness and robustness of our approach even with simpler tools, on both a SUN and a Pentium based COW. Our experiments indicate that the new code ....
....there are two problems; first, repartitioning is often too expensive to be performed frequently, and second, system information must be obtained accurately and inexpensively. Recently, a variety of performance monitoring and prediction libraries such as Network Weather Service (NWS) [30] and PAPI [3, 4] have been developed. NWS provides measurements and forecasts of CPU and bandwidth on a heterogeneous network, while PAPI provides local processor information (memory, CPU, paging, etc.). There is a small number of related, new projects focusing on middleware that allows the application to specify ....