22 citations found.
S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.

This paper is cited in the following contexts:
Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)

....We then evaluate the performance of our optimized implementations relative to the fundamental limits on performance. Specifically, we first derive simple models of the upper bounds on the execution rate (Mflop/s) of our implementations. Using hardware counter data collected with the PAPI library [10], we then verify our models on three hardware platforms (Table 1) and a set of triangular factors from applications (Table 2). We observe that our optimized implementations can achieve 80% or more of these bounds; furthermore, we observe speedups of up to 1.8x when both register blocking and ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.


A VM Infrastructure for Understanding the Hardware.. - Sweeney, Cahoon.. (2003)

....performance counter data. We also present related work that uses hardware counter data to understand Java middleware and server applications. 5.1 Accessing Hardware Counters Several library packages provide access to hardware performance counter data, including the HPM toolkit [11], PAPI [8], and PCL [7]. These libraries provide facilities to instrument programs, record hardware counter data, and analyze the results. We extend the functionality of existing libraries to obtain hardware performance data in a virtual machine. Specifically, we extend Jikes RVM to collect ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.


Modeling and Detecting Performance Problems for.. - Fahringer, Júnior (2002)

....of loops) JavaPSL also formalizes trace files by using an event class. This paper does not contain any property which incorporates events, which may change in future work. 5 Note that due to advanced monitoring and profiling technologies (e.g. dynamic profiling [14], hardware profiling [5], and source code profiling [19, 15]) there is basically no barrier to obtain experiment related data for arbitrary parallel and distributed programs. 4.2 Filters and Statistics for Experiment related Data In order to provide flexible mechanisms to describe performance problems we commonly ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.


How Fast is `-fast'? Performance Analysis of KDD.. - Czezowski, Christen (2002)

....specific platform and operating system. Platform independent counter libraries are currently under development in various research Translation Lookaside Buffer projects. Two such libraries are the Performance Counter Library (PCL) [4] and the Performance Application Programming Interface (PAPI) [7]. Their aim is to provide a set of platform independent counters, that allow easy portability of programs instrumented with these libraries, and to allow inter platform performance comparisons. As we currently restrict our research to a Solaris UltraSPARC platform, we use the libcpc [12]. Although ....

S. Browne, J. Dongarra, N. Garner, K. London and P. Mucci, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, Proceedings SC'2000.


Dynamic Load Balancing of an Iterative Eigensolver on Grids .. - McCombs, Mills, et al. (2003)

....are required in parallel algorithm design and in the interaction of the algorithms with the runtime system. Current research has focused either on middleware between the application and the system [6, 7, 1, 22] or on performance monitoring and prediction libraries such as NWS [36] and PAPI [4, 5]. However, besides some preliminary work in AppLeS [3], no attention has been given on how to use this system information to dynamically change the algorithm for better resource utilization. In [34] we described a unique parallelization approach that is designed to tolerate high network ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing 2000.


Scalable Analysis Techniques for Microprocessor Performance.. - Ahn, Vetter (2002)   (2 citations)

....1: Components of a comprehensive performance analysis environment. 2 Microprocessor Hardware Performance Counters Modern microprocessors include integrated hardware support for non-intrusive monitoring of a variety of processor and memory system events. Commonly referred to as hardware counters [3, 14], this capability is very useful to both computer architects [2] and applications developers [23]. Detailed software instrumentation can introduce perturbation into an application and the measurement process itself. On the other hand, simulation can become impractical for large, complex ....

....ithread) call deltat( Finished Y sweep ,2) call f end section(rank, 2,0,ierr) BARRIER Table 1: Sample code segment from function runhyd3 of sPPM. Table 1 shows a code segment from sPPM [17] that has been instrumented with high level library routines written on top of MPX [15] and PAPI [3] in order to capture eight hardware counter values: total processor cycles, total instructions, cycles stalled waiting for memory accesses, floating point divide instructions, L1 cache misses, floating point instructions, load instructions, and store instructions. As Table 2 ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci, A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters, Proc. SC2000.


Performance Contracts: Predicting and Monitoring Grid.. - Vraalsen, Aydt.. (2001)   (12 citations)

....the crisp output might be somewhere between 0.3 and 0.4. 5.2 Performance Data Sources To collect data about application and system performance, we need to gather information about what the application is doing and how the system responds to the application stimuli. We use the PAPI toolkit [3] to read the processor hardware performance counters. [Figure 5.6: Scaled Truth Function for furnace Fuzzy Variables] The information gathered by this toolkit is used both to determine the application instruction mix for the application intrinsic signature and the ....

BROWNE, S., DONGARRA, J., GARNER, N., LONDON, K., AND MUCCI, P. A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters. In Proceedings of Supercomputing 2000.


SIGMA: A Simulator Infrastructure to Guide Memory Analysis - DeRose, Ekanadham.. (2002)   (3 citations)

....is a critical issue for most scientific programs. To help programmers tune their programs, a variety of tools have been created ranging from source code and binary analysis tools [1, 2, 3, 4, 5] to libraries and utilities to access hardware performance counters built into microprocessors [6, 7, 8, 9]. Depending on the type of problem being studied and stage of the tuning process (initial tuning of a new algorithm vs. fine tuning for a specific platform), different tools are useful. One area that has been lacking is a set of tools that allow programmers to understand the precise memory ....

....none of the above simulators linked the memory references to symbolic data structures and subroutines in the source program. This was another crucial aspect of the SIGMA approach. There have been some tools that access hardware performance counters. For Intel platforms, VTune [15] is available. PAPI [7] provides a multi platform interface to access hardware counters. However, these approaches only provide counters of data or sampling among code regions. In contrast, SIGMA provides detailed information about individual memory references, and the actual memory addresses being accessed. Other ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. "A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters". In Proceedings of Supercomputing'00, November 2000.


On Using SCALEA for Performance Analysis of.. - Truong.. (2001)   (1 citation)

....data that the rest of the SCALEA system could analyze. The TAU [22, 17] performance framework is an integrated toolkit for performance instrumentation, measurement, and analysis for parallel, multithreaded programs. SCALEA uses TAU instrumentation library as one of its tracing libraries. PAPI [5] specifies a standard API for accessing hardware performance counters available on most modern microprocessors. SCALEA uses the PAPI library for measuring hardware counters. gprof [11, 10] is a compiler based profiling framework that mostly analyses the execution behavior and counts of functions ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'


Dynamic Load Balancing of an Iterative Eigensolver on.. - McCombs, Mills.. (2001)

....are required in parallel algorithm design and in the interaction of the algorithms with the runtime system. Current research has focused either on middleware between the application and the system [6, 7, 1, 22] or on performance monitoring and prediction libraries such as NWS [37] and PAPI [4, 5]. However, besides some preliminary work in AppLeS [3], no attention has been given on how to use this system information to dynamically change the algorithm for better resource utilization. In [34] we described a unique parallelization approach that is designed to tolerate high network ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing


Algorithmic Modifications to the Jacobi-Davidson Parallel.. - Mills, Stathopoulos

.... system information has been used at the scheduler level, or at best at the outset of a program for original problem partitioning and allocation of resources [1, 31]. With recent performance monitoring and prediction libraries such as Network Weather Service [30], AppLeS [2], and PAPI [3, 4], this information can be used dynamically to set appropriately the variable granularity of our algorithms. In this paper, we demonstrate the effectiveness and robustness of our approach even with simpler tools, on both a SUN and a Pentium based COW. Our experiments indicate that the new code ....

....there are two problems; first, repartitioning is often too expensive to be performed frequently, and second, system information must be obtained accurately and inexpensively. Recently, a variety of performance monitoring and prediction libraries such as Network Weather Service (NWS) [30] and PAPI [3, 4] have been developed. NWS provides measurements and forecasts of CPU and bandwidth on a heterogeneous network, while PAPI provides local processor information (memory, CPU, paging, etc.). There is a small number of related, new projects focusing on middleware that allows the application to specify ....

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Supercomputing


Vertical Profiling: Understanding the Behavior of Object-Oriented..

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.


Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.


Using Hardware Performance Monitors to Understand.. - Sweeney..

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.


On Using SCALEA for Performance Analysis of Distributed and Parallel Programs - Hong-Linh

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.


Automatic Tuning of Collective Communication.. - Nishtala, Patel.. (2003)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.


SCALEA User's Guide - Performance Analysis for Distributed.. - Truong, Fahringer (2002)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.


Automatic Performance Tuning and Analysis of Sparse .. - Vuduc, Kamil, Hsu, .. (2002)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.


Using Hardware Performance Monitors to Understand.. - Sweeney..

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of the 2000.


SISPROFILING Library: User's Guide - Version 1.0 - Truong (2002)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.


Performance Modeling and Analysis of Cache Blocking.. - Nishtala, Vuduc.. (2004)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceedings of Supercomputing, November 2000.


SCALEA: A performance analysis system for distributed and.. - Truong, Fahringer (2001)

No context found.

S. Browne, J. Dongarra, N. Garner, K. London, and P. Mucci. A scalable cross-platform infrastructure for application performance tuning using hardware counters. In Proceeding SC'2000.
