http://www.cs.wisc.edu/~zilles/papers/profiler.hpca.ps

Abstract:

Aggressive program optimization requires accurate profile information, but such accuracy requires collecting many samples. We explore a novel profiling architecture that reduces the overhead of collecting each sample by including a programmable co-processor that analyzes a stream of profile samples generated by a microprocessor. From this stream of samples, the co-processor can detect correlations between instructions (e.g., memory dependence profiling) as well as correlations between different dynamic instances of the same instruction (e.g., value profiling). The profiler's programmable nature allows a broad range of data to be extracted, post-processed, and formatted, and provides the flexibility to tailor the profiling application to the program under test. Because the co-processor is specialized for profiling, it can execute profiling applications more efficiently than a general-purpose processor. The co-processor should not significantly impact the cost or performance of the main processor because it can be implemented using a small number of transistors at the chip's periphery. We demonstrate the proposed design through a detailed evaluation of load value profiling. Our implementation quickly and accurately estimates the value invariance of loads, with time overhead roughly proportional to the size of the instruction working set of the program. This algorithm demonstrates a number of general techniques for profiling, including estimating the completeness of a profile, focusing profiling on particular instructions, and managing profiling resources.
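
To make "value invariance of loads" concrete, the following is a minimal software sketch, not the co-processor algorithm evaluated in the paper. It assumes each profile sample is a hypothetical `(pc, value)` pair for a load instruction, and it defines invariance as the fraction of a load's samples that match its most frequent value. The function name `estimate_invariance` and the `min_samples` threshold are illustrative assumptions.

```python
from collections import Counter, defaultdict


def estimate_invariance(samples, min_samples=4):
    """Estimate per-load value invariance from (pc, value) samples.

    Returns {pc: (most_frequent_value, invariance)}, where invariance is
    the fraction of that load's samples equal to its most frequent value.
    Loads with fewer than min_samples samples are omitted because their
    estimate is considered incomplete (an assumed, simplistic cutoff).
    """
    per_load = defaultdict(Counter)
    for pc, value in samples:
        per_load[pc][value] += 1

    profile = {}
    for pc, counts in per_load.items():
        total = sum(counts.values())
        if total < min_samples:
            continue  # too few samples to trust the estimate
        top_value, top_count = counts.most_common(1)[0]
        profile[pc] = (top_value, top_count / total)
    return profile


# Hypothetical sample stream: one nearly invariant load, one varying
# load, and one load seen only once.
samples = (
    [(0x400, 7)] * 9 + [(0x400, 3)]
    + [(0x404, v) for v in range(10)]
    + [(0x408, 1)]
)
profile = estimate_invariance(samples)
for pc, (value, invariance) in sorted(profile.items()):
    print(f"load at {pc:#x}: top value {value}, invariance {invariance:.2f}")
```

In this sketch the sample-count threshold stands in, very roughly, for the idea of estimating profile completeness; a hardware profiler would also need to bound table size, which an unbounded dictionary ignores.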
