
Basic block distribution analysis to find periodic behavior and simulation points in applications. In International Conference on Parallel Architectures and Compilation Techniques, 2001

by T Sherwood, E Perelman, B Calder

Results 1 - 10 of 314

Automatically characterizing large scale program behavior

by Timothy Sherwood, Erez Perelman, Greg Hamerly, 2002
"... Understanding program behavior is at the foundation of computer architecture and program optimization. Many pro-grams have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and com-pile ..."
Abstract - Cited by 778 (41 self) - Add to MetaCart
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.

Citation Context

...ons of execution. In order to perform such an analysis we need to develop a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. In [19], we presented the use of Basic Block Vectors (BBV), which uses the structure of the program that is exercised during execution to determine where to simulate. A BBV represents the code blocks execute...
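
As a rough illustration of the Basic Block Vector idea summarized above, the Python sketch below builds one per-interval vector of basic-block execution frequencies and compares two intervals with a Manhattan distance. The function and variable names are my own, and the fixed interval length and toy trace are assumptions for the example, not details taken from the cited papers.

```python
from collections import defaultdict

def build_bbvs(block_trace, interval_len=100_000):
    """Split a stream of (block_id, block_size) pairs into fixed-length
    instruction intervals and return one normalized Basic Block Vector
    (BBV) per interval.  Counts are weighted by block size so each vector
    reflects where the interval's instructions were spent."""
    bbvs, current, executed = [], defaultdict(int), 0
    for block_id, block_size in block_trace:
        current[block_id] += block_size
        executed += block_size
        if executed >= interval_len:
            total = sum(current.values())
            bbvs.append({b: c / total for b, c in current.items()})
            current, executed = defaultdict(int), 0
    if current:  # trailing partial interval
        total = sum(current.values())
        bbvs.append({b: c / total for b, c in current.items()})
    return bbvs

def manhattan_distance(bbv_a, bbv_b):
    """Distance between two normalized BBVs: 0 means identical code usage,
    2 means completely disjoint sets of basic blocks."""
    keys = set(bbv_a) | set(bbv_b)
    return sum(abs(bbv_a.get(k, 0.0) - bbv_b.get(k, 0.0)) for k in keys)

# Toy usage: a trace whose early and late intervals exercise different blocks.
trace = [(1, 10), (2, 5)] * 20_000 + [(3, 8), (1, 10)] * 20_000
vectors = build_bbvs(trace)
print(len(vectors), round(manhattan_distance(vectors[0], vectors[-1]), 3))
```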

Temperature-aware microarchitecture

by Kevin Skadron, Mircea R. Stan, Wei Huang, Sivakumar Velusamy, Karthik Sankaranarayanan, David Tarjan - In Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003
"... With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package’s capacity is exceeded. Evaluating such techn ..."
Abstract - Cited by 478 (52 self) - Add to MetaCart
With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package’s capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical for architectural studies. This paper describes HotSpot, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for dynamic thermal management (DTM): “temperaturetracking” frequency scaling, localized toggling, and migrating computation to spare hardware units. Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.
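
The equivalent-circuit idea behind HotSpot, as the abstract describes it, treats each block's temperature like the voltage on an RC node driven by its power. The single-node sketch below is a deliberately simplified illustration of that style of model, with made-up resistance, capacitance, and power values; it is not the HotSpot model itself.

```python
def step_temperature(temp, power, r_th, c_th, t_amb, dt):
    """One explicit Euler step of a single lumped RC thermal node:
        C * dT/dt = P - (T - T_ambient) / R
    temp and t_amb in degrees C, power in W, r_th in K/W, c_th in J/K, dt in s."""
    dT = (power - (temp - t_amb) / r_th) / c_th
    return temp + dT * dt

# Toy usage: a block dissipating 30 W warms from ambient toward its
# steady-state temperature of t_amb + P * R = 45 + 30 * 0.8 = 69 degrees C.
temp, t_amb = 45.0, 45.0
for _ in range(10_000):
    temp = step_temperature(temp, power=30.0, r_th=0.8, c_th=0.03, t_amb=t_amb, dt=1e-4)
print(round(temp, 2))
```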

Phase Tracking and Prediction

by Timothy Sherwood, Suleyman Sair, Brad Calder, 2003
"... In a single second a modern processor can execute billions of instructions. Obtaining a bird's eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle by cycle examination. In many programs, behavior is anything but steady state, and und ..."
Abstract - Cited by 233 (19 self) - Add to MetaCart
In a single second a modern processor can execute billions of instructions. Obtaining a bird's eye view of the behavior of a program at these speeds can be a difficult task when all that is available is cycle by cycle examination. In many programs, behavior is anything but steady state, and understanding the patterns of behavior, at run-time, can unlock a multitude of optimization opportunities.

Citation Context

...hey are limited in their ability to see the program behavior in a larger context. Recently there has been a renewed interest in examining the run-time behavior of programs over longer periods of time [10, 11, 19, 20, 3]. It has been shown that programs can have considerably different behavior depending on which portion of execution is examined. More specifically, it has been shown that many programs execute as a ser...
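
To make the phase-tracking-and-prediction idea concrete, the sketch below classifies each interval against previously seen phase signatures and predicts the next phase with a simple last-successor table. This is my own software simplification with an invented distance threshold, not the hardware phase tracker proposed in the cited paper.

```python
def classify_phase(signature, known_phases, threshold=0.5):
    """Return the ID of the stored phase whose signature is within `threshold`
    (Manhattan distance) of the current interval, or register a new phase."""
    for phase_id, ref in enumerate(known_phases):
        keys = set(ref) | set(signature)
        dist = sum(abs(ref.get(k, 0.0) - signature.get(k, 0.0)) for k in keys)
        if dist < threshold:
            return phase_id
    known_phases.append(signature)
    return len(known_phases) - 1

def predict_next(current_phase, successor_table):
    """Last-successor predictor: predict that the phase which most recently
    followed the current phase will follow it again."""
    return successor_table.get(current_phase, current_phase)

# Toy usage over a repeating A-A-B pattern of interval signatures.
a, b = {"x": 1.0}, {"y": 1.0}
phases, history, successors = [], [], {}
for sig in [a, a, b, a, a, b, a]:
    pid = classify_phase(sig, phases)
    if history:
        successors[history[-1]] = pid  # learn the observed successor
    history.append(pid)
print(history, predict_next(history[-1], successors))
```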

Runtime power monitoring in high-end processors: Methodology and empirical data

by Canturk Isci, Margaret Martonosi, 2003
"... With power dissipation becoming an increasingly vexing problem across many classes of computer systems, measuring power dissipation of real, running systems has become crucial for hardware and software system research and design. Live power measurements are imperative for studies requiring execution ..."
Abstract - Cited by 199 (4 self) - Add to MetaCart
With power dissipation becoming an increasingly vexing problem across many classes of computer systems, measuring power dissipation of real, running systems has become crucial for hardware and software system research and design. Live power measurements are imperative for studies requiring execution times too long for simulation, such as thermal analysis. Furthermore, as processors become more complex and include a host of aggressive dynamic power management techniques, per-component estimates of power dissipation have become both more challenging and more important. In this paper we describe our technique for a coordinated measurement approach that combines real total power measurement with performance-counter-based, per-unit power estimation. The resulting tool offers live total power measurements for Intel Pentium 4 processors, and also provides power breakdowns for 22 of the major CPU subunits over minutes of SPEC2000 and desktop workload execution. As an example application, we use the generated component power breakdowns to identify program power phase behavior. Overall, this paper demonstrates a processor power measurement and estimation methodology and also gives experiences and empirical application results that can provide a basis for future power-aware research.

Citation Context

...chnique, we demonstrate here how we can use component-based power breakdowns to identify power phases of programs. Several prior papers have proposed methods for detecting or exploiting program phases [1, 5, 9, 22, 23, 29]. Our example here is distinct because we focus on power phases rather than performance phases. A more detailed description of our power phase research can be found in [14]. We use the similarity matr...
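
Counter-based per-unit power estimates of the kind described above generally scale some notion of a unit's maximum power by its observed activity and add an idle term. The sketch below illustrates only that pattern; the unit names, power figures, and the simple access-rate scaling are assumptions for the example, not the calibrated Pentium 4 model from the paper.

```python
# Hypothetical per-unit parameters: (max dynamic power in W, idle power in W).
UNIT_PARAMS = {
    "l1_dcache": (6.0, 0.5),
    "fp_units": (8.0, 0.3),
    "issue_logic": (10.0, 1.0),
}

def estimate_unit_power(access_counts, cycles, unit_params=UNIT_PARAMS):
    """Scale each unit's maximum dynamic power by its access rate
    (events per cycle, clamped to 1) and add a fixed idle component."""
    breakdown = {}
    for unit, (max_dyn, idle) in unit_params.items():
        rate = min(access_counts.get(unit, 0) / cycles, 1.0)
        breakdown[unit] = idle + rate * max_dyn
    return breakdown

# Toy usage: counter readings accumulated over a 1e9-cycle window.
counts = {"l1_dcache": 4e8, "fp_units": 1e8, "issue_logic": 9e8}
per_unit = estimate_unit_power(counts, cycles=1e9)
print(per_unit, "total:", round(sum(per_unit.values()), 2))
```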

Managing Multi-Configurable Hardware via Dynamic Working Set Analysis

by Ashutosh S. Dhodapkar, James E. Smith - In 29th Annual International Symposium on Computer Architecture, 2002
"... Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, pr ..."
Abstract - Cited by 192 (3 self) - Add to MetaCart
Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, predictors, instruction windows) are able to adapt dynamically to program behavior and enable/disable resources as needed. A key element of existing configuration algorithms is adjusting to program phase changes. This is typically done by "tuning" when a phase change is detected -- i.e. sequencing through a series of trial configurations and selecting the best. We study algorithms that dynamically collect and analyze program working set information. To make this practical, we propose working set signatures -- highly compressed working set representations (e.g. 32-128 bytes total). We describe algorithms that use working set signatures to 1) detect working set changes and trigger re-tuning; 2) identify recurring working sets and re-install saved optimal reconfigurations, thus avoiding the time-consuming tuning process; 3) estimate working set sizes to configure caches directly to the proper size, also avoiding the tuning process. We use reconfigurable instruction caches to demonstrate the performance of the proposed algorithms. When applied to reconfigurable instruction caches, an algorithm that identifies recurring phases achieves power savings and performance similar to the best algorithm reported to date, but with orders-of-magnitude savings in retunings.

Citation Context

...Previous work related to hardware reconfiguration was discussed in Sec. 1.1. In this section, we briefly discuss some more work related to working set analysis. Sherwood et al. [22] have proposed the use of program phase information to speed up simulation. They use basic block execution frequency information as a fingerprint for an interval of execution. The goal then, is to fin...
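
A working set signature of the kind proposed in this entry is a small, lossy bit vector of hashed addresses, and a phase change can be flagged when the relative distance between consecutive signatures crosses a threshold. The sketch below is an illustration under assumed parameters; the signature size, block granularity, hash function, and 0.5 threshold are my own choices, not the paper's.

```python
import hashlib

SIG_BITS = 1024  # 128-byte signature, within the 32-128 byte range cited above

def working_set_signature(addresses, granularity=32, bits=SIG_BITS):
    """Hash each touched cache-block address into a fixed-size bit vector;
    the set bits form a lossy but very compact working set summary."""
    sig = 0
    for addr in addresses:
        block = addr // granularity
        digest = hashlib.blake2b(block.to_bytes(8, "little"), digest_size=4).digest()
        sig |= 1 << (int.from_bytes(digest, "little") % bits)
    return sig

def relative_distance(sig_a, sig_b):
    """Relative signature distance: differing bits over union bits.
    0.0 means identical working sets, 1.0 means fully disjoint ones."""
    union = bin(sig_a | sig_b).count("1")
    return (bin(sig_a ^ sig_b).count("1") / union) if union else 0.0

# Toy usage: two intervals touching mostly different code regions.
interval1 = working_set_signature(range(0x1000, 0x3000, 4))
interval2 = working_set_signature(range(0x8000, 0xa000, 4))
distance = relative_distance(interval1, interval2)
print(round(distance, 2), "phase change" if distance > 0.5 else "stable")
```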

Picking Statistically Valid and Early Simulation Points

by Erez Perelman, Greg Hamerly, Brad Calder, 2003
"... Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To address this issue we have recently proposed using Simulation Points (found by only examining basic block execution fr ..."
Abstract - Cited by 116 (15 self) - Add to MetaCart
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To address this issue we have recently proposed using Simulation Points (found by only examining basic block execution frequency profiles) to increase the efficiency and accuracy of simulation. Simulation points are a small set of execution samples that when combined represent the complete execution of the program.
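
Simulation points of the kind described above come from clustering per-interval basic block profiles and keeping one representative interval per cluster, weighted by cluster size. The sketch below shows that general recipe with a bare-bones k-means over synthetic vectors; it is not the SimPoint implementation, and the choice of k, distance, and random data are assumptions for the example.

```python
import numpy as np

def pick_simulation_points(bbvs, k=3, iters=20, seed=0):
    """Cluster interval vectors (rows of a 2-D array) with plain k-means,
    then return, for each cluster, the index of the interval closest to the
    centroid and the cluster's weight (its fraction of all intervals)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(bbvs, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    points = []
    for c in range(k):
        members = np.where(labels == c)[0]
        if len(members) == 0:
            continue
        rep = members[np.argmin(np.linalg.norm(X[members] - centers[c], axis=1))]
        points.append((int(rep), len(members) / len(X)))
    return points  # [(interval index to simulate in detail, weight), ...]

# Toy usage: 60 intervals drawn from three distinct behaviors.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=m, scale=0.05, size=(20, 8)) for m in (0.1, 0.5, 0.9)])
print(pick_simulation_points(data, k=3))
```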

Characterizing and Predicting Program Behavior and its Variability

by Evelyn Duesterwald, Sandhya Dwarkadas - In International Conference on Parallel Architectures and Compilation Techniques, 2003
"... To reach the next level of performance and energy efficiency, optimizations are increasingly applied in a dynamic and adaptive manner. Current adaptive systems are typically reactive and optimize hardware or software in response to detecting a shift in program behavior. We argue that program behavio ..."
Abstract - Cited by 115 (4 self) - Add to MetaCart
To reach the next level of performance and energy efficiency, optimizations are increasingly applied in a dynamic and adaptive manner. Current adaptive systems are typically reactive and optimize hardware or software in response to detecting a shift in program behavior. We argue that program behavior variability requires adaptive systems to be predictive rather than reactive. In order to be effective, systems need to adapt according to future rather than most recent past behavior. In this paper we explore the potential of incorporating prediction into adaptive systems. We study the time-varying behavior of programs using metrics derived from hardware counters on two different micro-architectures. Our evaluation shows that programs do indeed exhibit significant behavior variation even at a granularity of millions of instructions. In addition, while the actual behavior across metrics may be different, periodicity in the behavior is shared across metrics. We exploit these characteristics in the design of on-line statistical and table-based predictors. We introduce a new class of predictors, cross-metric predictors, that use one metric to predict another, thus making possible an efficient coupling of multiple predictors. We evaluate these predictors on the SPECcpu2000 benchmark suite and show that table-based predictors outperform statistical predictors by as much as 69% on benchmarks with high variability.

Citation Context

...thods that define and use metrics to dynamically identify phases for adaptive optimization [5, 6, 8, 13, 26], and into techniques that identify appropriate simulation points for the desired workloads [10, 16, 17, 24, 25]. Balasubramonian et al. [5, 6] use interval-based exploration mechanisms in order to dynamically reconfigure the underlying micro-architecture to meet the need of an application. They identify a chan...
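
A table-based cross-metric predictor of the kind introduced above can be pictured as a lookup table keyed on recent readings of one metric that replays the target metric last seen after that pattern. The sketch below illustrates the idea; the quantization step, history depth, and IPC/miss-rate pairing are assumptions for the example rather than the predictor design evaluated in the paper.

```python
def quantize(value, step=0.1):
    """Bucket a metric reading so that similar values share a table entry."""
    return round(value / step)

def cross_metric_predict(source_history, last_target, table, depth=2):
    """Table-based cross-metric prediction: the key is the last `depth`
    quantized readings of the source metric (e.g. IPC); the stored value is
    the target metric (e.g. miss rate) that followed that pattern most
    recently.  Returns the prediction and the key; the caller records the
    observed target under that key once it is known."""
    key = tuple(quantize(v) for v in source_history[-depth:])
    return table.get(key, last_target), key

# Toy usage: a periodic IPC pattern drives a periodic miss-rate pattern.
ipc    = [1.2, 1.2, 0.6, 1.2, 1.2, 0.6, 1.2, 1.2]
misses = [0.02, 0.02, 0.15, 0.02, 0.02, 0.15, 0.02, 0.02]
table, hits = {}, 0
for i in range(2, len(ipc) - 1):
    pred, key = cross_metric_predict(ipc[:i + 1], misses[i], table)
    hits += abs(pred - misses[i + 1]) < 0.01
    table[key] = misses[i + 1]  # learn the observed follower
print("correct predictions:", hits, "of", len(ipc) - 3)
```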

Comparing program phase detection techniques

by Ashutosh S. Dhodapkar, James E. Smith - In International Symposium on Microarchitecture, 2003
"... Detecting program phase changes accurately is an important aspect of dynamically adaptable systems. Three dynamic program phase detection techniques are compared – using instruction working sets, basic block vectors (BBV), and conditional branch counts. Because program phases are difficult to define ..."
Abstract - Cited by 99 (1 self) - Add to MetaCart
Detecting program phase changes accurately is an important aspect of dynamically adaptable systems. Three dynamic program phase detection techniques are compared – using instruction working sets, basic block vectors (BBV), and conditional branch counts. Because program phases are difficult to define, we compare the techniques using a variety of metrics. BBV techniques perform better than the other techniques, providing higher sensitivity and more stable phases. However, the instruction working set technique yields 30% longer phases than the BBV method, although there is less stability within phases. On average, the methods agree on phase changes 85% of the time. Of the 15% of the time they disagree, the BBV method is more efficient at detecting performance changes. The conditional branch counter technique provides good sensitivity, but is less effective at detecting major phase changes. Nevertheless, the branch counter technique correlates 83% of the time with the BBV based technique. As an auxiliary result, we show that techniques based on procedure granularities do not perform as well as those based on instruction or basic block granularities. This is mainly due to their inability to detect changes within procedures.

Citation Context

...the number of elements in the set i.e. the cardinality of the set. Since the relative working set distance is a normalized metric, the maximum possible working set difference is 100%. Sherwood et al. [20] define a BBV to be a set of counters, each of which counts the number of times a static basic block is entered in a given execution interval. In later work [19], they approximate the BBV with an arra...

Simpoint 3.0: Faster and more flexible program analysis

by Greg Hamerly, Erez Perelman, Jeremy Lau, Brad Calder - Journal of Instruction Level Parallelism, 2005
"... This paper describes the new features available in the Sim-Point 3.0 release. The release provides two techniques for drastically reducing the run-time of SimPoint: faster searching to find the best clustering, and efficiently clustering large numbers of intervals. SimPoint 3.0 also provides an opti ..."
Abstract - Cited by 77 (4 self) - Add to MetaCart
This paper describes the new features available in the SimPoint 3.0 release. The release provides two techniques for drastically reducing the run-time of SimPoint: faster searching to find the best clustering, and efficiently clustering large numbers of intervals. SimPoint 3.0 also provides an option to output only the simulation points that represent the majority of execution, which can reduce simulation time without much increase in error. Finally, this release provides support for correctly clustering variable length intervals, taking into consideration the weight of each interval during clustering. This paper describes SimPoint 3.0’s new features, how to use them, and points out some common pitfalls.

Citation Context

...lman, Lau, & Calder we can perform very fast and accurate sampling. All of these representative samples together represent the complete execution of the program. The underlying philosophy of SimPoint =-=[1, 2, 3, 4, 5, 6]-=- is to use a program’s behavior patterns to guide sample selection. SimPoint intelligently chooses a very small set of samples called Simulation Points that, when simulated and weighed appropriately, ...

Using SimPoint for Accurate and Efficient Simulation

by Erez Perelman, Greg Hamerly, Michael Van Biesbrouck, Timothy Sherwood, Brad Calder - ACM SIGMETRICS Performance Evaluation Review, 2003
"... Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry standard benchmark at this level of detail takes on the order of months to complete. This problem is exacerbated by the fact that to properly perform an architectural evalu ..."
Abstract - Cited by 72 (2 self) - Add to MetaCart
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of a single industry standard benchmark at this level of detail takes on the order of months to complete. This problem is exacerbated by the fact that a proper architectural evaluation requires multiple benchmarks to be evaluated across many separate runs. To address this issue we recently created a tool called SimPoint that automatically finds a small set of Simulation Points to represent the complete execution of a program for efficient and accurate simulation. In this paper we describe how to use the SimPoint tool, and introduce an improved SimPoint algorithm designed to significantly reduce the simulation time required when the simulation environment relies upon fast-forwarding.

Citation Context

...ame program binary with the input may be run hundreds or thousands of times to examine how, for example, the effectiveness of a given architecture changes with its size. Our goal in creating SimPoint [1, 2] is to (1) significantly reduce simulation time, (2) provide an accurate characterization of the full program, and (3) to perform the analysis to accomplish the first two goals in a matter of minutes....
