The SimpleScalar Tool Set, Version 2.0

by Doug Burger, Todd Austin

Results 1 - 10 of 1,844 citing documents, sorted by number of citations.

Wattch: A Framework for Architectural-Level Power Analysis and Optimizations

by David Brooks, Vivek Tiwari, Margaret Martonosi - In Proceedings of the 27th Annual International Symposium on Computer Architecture, 2000
"... Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high ..."
Abstract - Cited by 1320 (43 self)
Power dissipation and thermal issues are increasingly significant in modern processors. As a result, it is crucial that power/performance tradeoffs be made more visible to chip architects and even compiler writers, in addition to circuit designers. Most existing power analysis tools achieve high accuracy by calculating power estimates for designs only after layout or floorplanning are complete. In addition to being available only late in the design process, such tools are often quite slow, which compounds the difficulty of running them for a large space of design possibilities.

Citation Context

...se power models can be integrated into a range of architectural simulators to provide power estimates. In this work we have integrated these power models into the SimpleScalar architectural simulator [7]. [Figure 2: Overall Structure of the Power Simulator.] Figure 2 illustrates the overall structure of Wattch and the interface between the performance simulator and the ...
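
The snippet above describes the interface Wattch adds between a cycle-level performance simulator and its power models. A minimal sketch of the general idea (per-cycle unit energy scaled by recorded activity counts); the unit names and energy values below are invented for illustration, not taken from Wattch:

    #include <stdio.h>

    /* Hypothetical per-access energy for each modeled unit (illustrative values). */
    enum { ICACHE, DCACHE, ALU, REGFILE, NUM_UNITS };
    static const double unit_energy[NUM_UNITS] = { 0.30, 0.35, 0.10, 0.05 }; /* nJ/access */

    /* Activity counts a cycle-level simulator would record each cycle. */
    static unsigned access_count[NUM_UNITS];

    /* Accumulate energy for one simulated cycle from the recorded activity,
       then clear the counters for the next cycle. */
    static double account_cycle(void)
    {
        double energy = 0.0;
        for (int u = 0; u < NUM_UNITS; u++) {
            energy += access_count[u] * unit_energy[u];
            access_count[u] = 0;
        }
        return energy; /* nJ consumed this cycle */
    }

    int main(void)
    {
        /* Pretend the performance model touched these units this cycle. */
        access_count[ICACHE] = 1;
        access_count[ALU] = 2;
        access_count[REGFILE] = 4;
        printf("cycle energy: %.2f nJ\n", account_cycle());
        return 0;
    }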

Automatically characterizing large scale program behavior

by Timothy Sherwood, Erez Perelman, Greg Hamerly, 2002
"... Understanding program behavior is at the foundation of computer architecture and program optimization. Many pro-grams have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and com-pile ..."
Abstract - Cited by 778 (41 self)
Understanding program behavior is at the foundation of computer architecture and program optimization. Many programs have wildly different behavior on even the very largest of scales (over the complete execution of the program). This realization has ramifications for many architectural and compiler techniques, from thread scheduling, to feedback directed optimizations, to the way programs are simulated. However, in order to take advantage of time-varying behavior, we must first develop the analytical tools necessary to automatically and efficiently analyze program behavior over large sections of execution. Our goal is to develop automatic techniques that are capable of finding and exploiting the Large Scale Behavior of programs (behavior seen over billions of instructions). The first step towards this goal is the development of a hardware independent metric that can concisely summarize the behavior of an arbitrary section of execution in a program. To this end we examine the use of Basic Block Vectors. We quantify the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explore the large scale behavior of several programs, and develop a set of algorithms based on clustering capable of analyzing this behavior. We then demonstrate an application of this technology to automatically determine where to simulate for a program to help guide computer architecture research.
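
A rough illustration of the basic block vector idea described in the abstract (not code from the paper): count how often each static basic block executes in a fixed-length interval, normalize the counts, and compare intervals with a Manhattan distance. The block counts below are made up.

    #include <math.h>
    #include <stdio.h>

    #define NUM_BLOCKS 4   /* toy program with four static basic blocks */

    /* Normalize raw per-interval execution counts into a basic block vector. */
    static void normalize(const unsigned *counts, double *bbv)
    {
        double total = 0.0;
        for (int i = 0; i < NUM_BLOCKS; i++) total += counts[i];
        for (int i = 0; i < NUM_BLOCKS; i++) bbv[i] = counts[i] / total;
    }

    /* Manhattan distance between two basic block vectors; a small distance
       means the two intervals exercised similar code. */
    static double bbv_distance(const double *a, const double *b)
    {
        double d = 0.0;
        for (int i = 0; i < NUM_BLOCKS; i++) d += fabs(a[i] - b[i]);
        return d;
    }

    int main(void)
    {
        /* Hypothetical per-block counts from two execution intervals. */
        unsigned interval1[NUM_BLOCKS] = { 900, 50, 40, 10 };
        unsigned interval2[NUM_BLOCKS] = { 880, 60, 45, 15 };
        double a[NUM_BLOCKS], b[NUM_BLOCKS];
        normalize(interval1, a);
        normalize(interval2, b);
        printf("distance = %.3f\n", bbv_distance(a, b));
        return 0;
    }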

Citation Context

...e of full program behavior. Related work is discussed in Section 6, and the techniques presented are summarized in Section 7. 2. METHODOLOGY In this paper we used both ATOM [21] and SimpleScalar 3.0c [3] to perform our analysis and gather our results for the Alpha AXP ISA. ATOM is used to quickly gather profiling information about the code executed for a program. SimpleScalar is used to validate the ...

Temperature-aware microarchitecture

by Kevin Skadron, Mircea R. Stan, Wei Huang, Sivakumar Velusamy, Karthik Sankaranarayanan, David Tarjan - In Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003
"... With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package’s capacity is exceeded. Evaluating such techn ..."
Abstract - Cited by 478 (52 self)
With power density and hence cooling costs rising exponentially, processor packaging can no longer be designed for the worst case, and there is an urgent need for runtime processor-level techniques that can regulate operating temperature when the package’s capacity is exceeded. Evaluating such techniques, however, requires a thermal model that is practical for architectural studies. This paper describes HotSpot, an accurate yet fast model based on an equivalent circuit of thermal resistances and capacitances that correspond to microarchitecture blocks and essential aspects of the thermal package. Validation was performed using finite-element simulation. The paper also introduces several effective methods for dynamic thermal management (DTM): “temperature-tracking” frequency scaling, localized toggling, and migrating computation to spare hardware units. Modeling temperature at the microarchitecture level also shows that power metrics are poor predictors of temperature, and that sensor imprecision has a substantial impact on the performance of DTM.
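
A minimal sketch of the lumped-RC intuition behind a model like HotSpot: one thermal node with a resistance to ambient, integrated with forward Euler. The real model is a network of many such elements tied to floorplan blocks; the constants here are invented.

    #include <stdio.h>

    /* One lumped thermal node: dT/dt = (P - (T - T_amb)/R) / C.
       R in K/W, C in J/K, P in W. */
    int main(void)
    {
        const double R = 0.8, C = 0.03, T_amb = 45.0; /* illustrative constants */
        const double dt = 1e-3;                        /* 1 ms time step */
        double T = T_amb, P = 40.0;                    /* block power in watts */

        for (int step = 0; step < 5000; step++)
            T += dt * (P - (T - T_amb) / R) / C;

        /* Steady state approaches T_amb + P*R = 45 + 32 = 77 C. */
        printf("temperature after 5 s: %.1f C\n", T);
        return 0;
    }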

DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design

by Todd M. Austin - In Proc. 32nd Annual Intl. Symp. on Microarchitecture, 1999
"... Building a high-petformance microprocessor presents many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. To&rther complicate this task, deep submicro ..."
Abstract - Cited by 374 (15 self)
Building a high-performance microprocessor presents many reliability challenges. Designers must verify the correctness of large complex systems and construct implementations that work reliably in varied (and occasionally adverse) operating conditions. To further complicate this task, deep submicron fabrication technologies present new reliability challenges in the form of degraded signal quality and logic failures caused by natural radiation interference. In this paper, we introduce dynamic verification, a novel microarchitectural technique that can significantly reduce the burden of correctness in microprocessor designs. The approach works by augmenting the commit phase of the processor pipeline with a functional checker unit. The functional checker verifies the correctness of the core processor’s computation, only permitting correct results to commit. Overall design cost can be dramatically reduced because designers need only verify the correctness of the checker unit. We detail the DIVA checker architecture, a design optimized for simplicity and low cost. Using detailed timing simulation, we show that even resource-frugal DIVA checkers have little impact on core processor performance. To make the case for reduced verification costs, we argue that the DIVA checker should lend itself to functional and electrical verification better than a complex core processor. Finally, future applications that leverage dynamic verification to increase processor performance and availability are suggested.
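
A toy sketch of the dynamic verification idea described above: a simple checker recomputes each retiring instruction's result and only matching results commit. The structures, toy ISA, and recovery action here are illustrative, not DIVA's actual design.

    #include <stdio.h>

    /* A retiring instruction as seen by a commit-stage checker:
       the operands it read and the result the core computed for it. */
    struct retire_entry {
        char op;            /* '+' or '-' in this toy ISA */
        int  src1, src2;
        int  core_result;   /* value produced by the (possibly buggy) core */
    };

    /* The checker recomputes the result with simple, easily verified logic
       and only allows matching results to commit. */
    static int check_and_commit(const struct retire_entry *e)
    {
        int golden = (e->op == '+') ? e->src1 + e->src2 : e->src1 - e->src2;
        if (golden != e->core_result) {
            printf("mismatch: fixing %d -> %d and flushing pipeline\n",
                   e->core_result, golden);
            return 0;   /* commit the checker's value, squash speculative state */
        }
        return 1;       /* core was right; commit proceeds normally */
    }

    int main(void)
    {
        struct retire_entry ok  = { '+', 2, 3, 5 };
        struct retire_entry bad = { '-', 7, 4, 9 };  /* core computed 9, not 3 */
        check_and_commit(&ok);
        check_and_commit(&bad);
        return 0;
    }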

Citation Context

...ulation for DIVA checkers with varied resource configurations, checker latency, and fault rates. 3.1 Methodology The simulators used in this study are derived from the SimpleScalar/Alpha 3.0 tool set [9], a suite of functional and timing simulation tools for the Alpha AXP ISA. The timing simulator executes only user-level instructions, performing a detailed timing simulation of an aggressive 4-way dyn...

Dynamic Thermal Management for High-Performance Microprocessors

by David Brooks, Margaret Martonosi - In Proceedings of the 7th IEEE Symposium on High-Performance Computer Architecture, 2001
"... With the increasing clock rate and transistor count of today’s microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dyna ..."
Abstract - Cited by 333 (5 self)
With the increasing clock rate and transistor count of today’s microprocessors, power dissipation is becoming a critical component of system design complexity. Thermal and power-delivery issues are becoming especially critical for high-performance computing systems. In this work, we investigate dynamic thermal management as a technique to control CPU power dissipation. With the increasing usage of clock gating techniques, the average power dissipation typically seen by common applications is becoming much less than the chip’s rated maximum power dissipation. However, system designers still must design thermal heat sinks to withstand the worst-case scenario. We define and investigate the major components of any dynamic thermal management scheme. Specifically, we explore the tradeoffs between several mechanisms for responding to periods of thermal trauma and we consider the effects of hardware and software implementations. With appropriate dynamic thermal management, the CPU can be designed for a much lower maximum power rating, with minimal performance impact for typical applications.
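
As an illustration of the trigger/response structure such a DTM scheme has (the paper evaluates several response mechanisms, not just the fetch throttling shown here, and these thresholds are invented):

    #include <stdio.h>

    /* Simple DTM policy: when temperature crosses a trigger threshold,
       engage a response (here, halve fetch bandwidth) until temperature
       falls back below a lower release threshold (hysteresis). */
    static int fetch_width(double temp_c, int *throttled)
    {
        const double trigger = 80.0, release = 75.0;  /* illustrative, in C */
        if (temp_c >= trigger) *throttled = 1;
        else if (temp_c <= release) *throttled = 0;
        return *throttled ? 2 : 4;                    /* 4-wide machine */
    }

    int main(void)
    {
        int throttled = 0;
        double trace[] = { 70, 78, 81, 83, 79, 76, 74, 72 };
        for (int i = 0; i < 8; i++)
            printf("T=%.0fC fetch width=%d\n", trace[i],
                   fetch_width(trace[i], &throttled));
        return 0;
    }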

Citation Context

... Simulated Processor 3 Methodology We have developed an architectural-level power modeling tool called Wattch [5]. Wattch provides power modeling extensions to the SimpleScalar architecture simulator [7]. Wattch’s power modeling infrastructure is based on parameterized power models of common structures present in modern superscalar microprocessors. Per-cycle power estimates are generated by scaling t...

Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures

by Vikas Agarwal, M.S. Hrishikesh, Stephen W. Keckler, Doug Burger, 2000
"... The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as se ..."
Abstract - Cited by 324 (23 self)
The doubling of microprocessor performance every three years has been the result of two factors: more transistors per chip and superlinear scaling of the processor clock with technology generation. Our results show that, due to both diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink, the achievable performance growth of conventional microarchitectures will slow substantially. In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay. Using the results of these models, we measure the simulated performance---estimating both clock rate and IPC---of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm technology. We perform this analysis for three clock scaling targets and two microarchitecture scaling strategies: pipeline scaling and capacity scaling. We find that no scaling strategy permits annual performance improvements of better than 12.5%, which is far worse than the annual 50-60% to which we have grown accustomed.
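
The paper builds its own technology-driven wire and component delay models. Purely as background intuition (a standard first-order VLSI estimate, not the authors' model), the Elmore delay of an unrepeatered distributed RC wire grows quadratically with its length, which is why long on-chip wires fall behind as clocks speed up and devices shrink:

    t_{\text{wire}} \approx 0.38\, r_w c_w L^2

where r_w and c_w are the wire's resistance and capacitance per unit length and L is the wire length. Repeater insertion makes the growth roughly linear in L, but it does not close the gap with gate delay as feature sizes scale down.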

Selective Cache Ways: On-Demand Cache Resource Allocation

by David H. Albonesi, 2000
"... Increasing levels of microprocessor power dissipation call for new approaches at the architectural level that save energy by better matching of on-chip resources to application requirements. Selective cache ways provides the ability to disable a subset of the ways in a set associative cache durin ..."
Abstract - Cited by 321 (8 self)
Increasing levels of microprocessor power dissipation call for new approaches at the architectural level that save energy by better matching of on-chip resources to application requirements. Selective cache ways provides the ability to disable a subset of the ways in a set associative cache during periods of modest cache activity, while the full cache may remain operational for more cache-intensive periods. Because this approach leverages the subarray partitioning that is already present for performance reasons, only minor changes to a conventional cache are required, and therefore, full-speed cache operation can be maintained. Furthermore, the tradeoff between performance and energy is flexible, and can be dynamically tailored to meet changing application and machine environmental conditions. We show that trading off a small performance degradation for energy savings can produce a significant reduction in cache energy dissipation using this approach.
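
A minimal sketch of the lookup-side effect of disabling ways in a set-associative cache (the names and parameters are invented; the mechanism in the paper also covers how the enabled-way mask is set and how dirty data in disabled ways is handled):

    #include <stdio.h>

    #define WAYS 4

    struct cache_set {
        unsigned tags[WAYS];
        int      valid[WAYS];
    };

    /* Probe only the ways enabled in way_mask; disabled ways neither hit
       nor consume lookup energy in the real hardware. */
    static int lookup(const struct cache_set *set, unsigned tag, unsigned way_mask)
    {
        for (int w = 0; w < WAYS; w++)
            if (((way_mask >> w) & 1) && set->valid[w] && set->tags[w] == tag)
                return w;           /* hit in way w */
        return -1;                  /* miss */
    }

    int main(void)
    {
        struct cache_set set = { { 0x10, 0x20, 0x30, 0x40 }, { 1, 1, 1, 1 } };
        printf("all ways enabled:    %d\n", lookup(&set, 0x30, 0xF)); /* hit, way 2 */
        printf("2 of 4 ways enabled: %d\n", lookup(&set, 0x30, 0x3)); /* now a miss */
        return 0;
    }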

Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

by Tim Sherwood, Erez Perelman, Brad Calder, 2001
"... Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, ..."
Abstract - Cited by 315 (31 self)
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. To overcome this problem researchers choose a very small portion of a program's execution to evaluate their results, rather than simulating the entire program. In this paper we propose Basic Block Distribution Analysis as an automated approach for finding these small portions of the program to simulate that are representative of the entire program's execution. This approach is based upon using profiles of a program's code structure (basic blocks) to uniquely identify different phases of execution in the program. We show that the periodicity of the basic block frequency profile reflects the periodicity of detailed simulation across several different architectural metrics (e.g., IPC, branch miss rate, cache miss rate, value misprediction, address misprediction, and reorder buffer occupancy). Since basic block frequencies can be collected using very fast profiling tools, our approach provides a practical technique for finding the periodicity and simulation points in applications.
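
A toy sketch of picking a single simulation point from interval profiles, using the whole-program profile as the target. The data here is invented, and the paper's actual analysis also exploits the periodicity of the basic block frequency profile.

    #include <math.h>
    #include <stdio.h>

    #define INTERVALS 5
    #define BLOCKS    3

    int main(void)
    {
        /* Hypothetical per-interval basic block frequencies (already normalized). */
        double profile[INTERVALS][BLOCKS] = {
            { 0.70, 0.20, 0.10 },
            { 0.40, 0.40, 0.20 },
            { 0.55, 0.30, 0.15 },
            { 0.52, 0.33, 0.15 },
            { 0.50, 0.35, 0.15 },
        };

        /* Target profile = average over the full run. */
        double target[BLOCKS] = { 0 };
        for (int i = 0; i < INTERVALS; i++)
            for (int b = 0; b < BLOCKS; b++)
                target[b] += profile[i][b] / INTERVALS;

        /* The simulation point is the interval closest to the target profile. */
        int best = 0;
        double best_d = 1e9;
        for (int i = 0; i < INTERVALS; i++) {
            double d = 0.0;
            for (int b = 0; b < BLOCKS; b++)
                d += fabs(profile[i][b] - target[b]);
            if (d < best_d) { best_d = d; best = i; }
        }
        printf("simulate interval %d (distance %.3f)\n", best, best_d);
        return 0;
    }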

Pipeline gating: speculation control for energy reduction

by Srilatha Manne - In Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
"... Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is essential for increasing the instructions per cycle (IPC), it does come at a cost. A large amount o ..."
Abstract - Cited by 288 (3 self)
Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is essential for increasing the instructions per cycle (IPC), it does come at a cost. A large amount of unnecessary work results from wrong-path instructions entering the pipeline due to branch misprediction. Results generated with the SimpleScalar tool set using a 4-way issue pipeline and various branch predictors show an instruction overhead of 16% to 105% for every instruction committed. The instruction overhead will increase in the future as processors use more aggressive speculation and wider issue widths [9]. In this paper, we present an innovative method for power reduction which, unlike previous work that sacrificed flexibility or performance, reduces power in high-performance microprocessors without impacting performance. In particular, we introduce a hardware mechanism called pipeline gating to control rampant speculation in the pipeline. We present inexpensive mechanisms for determining when a branch is likely to mispredict, and for stopping wrong-path instructions from entering the pipeline. Results show up to a 38% reduction in wrong-path instructions with a negligible performance loss. Best of all, even in programs with a high branch prediction accuracy, performance does not noticeably degrade. Our analysis indicates that there is little risk in implementing this method in existing processors since it does not impact performance and can benefit energy reduction.
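
A minimal sketch of the gating decision described above: track how many low-confidence branches are currently in flight and stall fetch once the count exceeds a threshold. The threshold and confidence test here are placeholders, not the paper's estimators.

    #include <stdio.h>

    #define GATE_THRESHOLD 2   /* illustrative: max low-confidence branches in flight */

    static int low_conf_in_flight = 0;

    /* Called when fetch encounters a predicted branch.
       Returns 1 if fetch should be gated (stalled) this cycle. */
    static int on_branch_fetched(int low_confidence)
    {
        if (low_confidence)
            low_conf_in_flight++;
        return low_conf_in_flight > GATE_THRESHOLD;
    }

    /* Called when a branch resolves, regardless of outcome. */
    static void on_branch_resolved(int was_low_confidence)
    {
        if (was_low_confidence && low_conf_in_flight > 0)
            low_conf_in_flight--;
    }

    int main(void)
    {
        /* Three low-confidence branches in a row: the third one gates fetch. */
        for (int i = 0; i < 3; i++)
            printf("branch %d -> gate fetch: %d\n", i, on_branch_fetched(1));
        on_branch_resolved(1);
        printf("after one resolves -> gate fetch: %d\n", on_branch_fetched(0));
        return 0;
    }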

Citation Context

...es in the pipeline, this "boosting" improves our gating decision. 3 Empirical Evaluation of Pipeline Gating To properly understand the effects of stalling the pipeline, we used the SimpleScalar tools [2] to develop a pipeline model of an out-of-order, speculative, wide-issue processor. We modified the sim-outorder processor model to produce the machine configuration listed in Tables 1 and 2. Table 3 s...

Secure Program Execution via Dynamic Information Flow Tracking

by G. Edward Suh, Jaewook Lee, Srinivas Devadas, 2004
"... Dynamic information flow tracking is a hardware mechanism to protect programs against malicious attacks by identifying spurious information flows and restricting the usage of spurious information. Every security attack to take control of a program needs to transfer the program’s control to malevolen ..."
Abstract - Cited by 271 (3 self)
Dynamic information flow tracking is a hardware mechanism to protect programs against malicious attacks by identifying spurious information flows and restricting the usage of spurious information. Every security attack to take control of a program needs to transfer the program’s control to malevolent code. In our approach, the operating system identifies a set of input channels as spurious, and the processor tracks all information flows from those inputs. A broad range of attacks are effectively defeated by disallowing the spurious data to be used as instructions or jump target addresses. We describe two different security policies that track differing sets of dependencies. Implementing the first policy only incurs, on average, a memory overhead of 0.26% and a performance degradation of 0.02%. This policy does not require any modification of executables. The stronger policy incurs, on average, a memory overhead of 4.5% and a performance degradation of 0.8%, and requires binary annotation.
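
A toy sketch of the tag-propagation idea: one taint bit per value, set on data from untrusted channels, propagated through arithmetic, and checked when a value is used as a jump target. These rules are a simplification for illustration, not the paper's exact policies.

    #include <stdio.h>

    /* One taint bit per value: 1 = derived from a spurious (untrusted) input. */
    struct val { unsigned data; unsigned tainted; };

    /* Data from an untrusted channel (e.g. network input) starts out tainted. */
    static struct val from_untrusted(unsigned data)
    {
        struct val v = { data, 1 };
        return v;
    }

    /* Arithmetic propagates taint: the result is tainted if either source is. */
    static struct val add(struct val a, struct val b)
    {
        struct val r = { a.data + b.data, a.tainted | b.tainted };
        return r;
    }

    /* Security check: trap instead of jumping to a tainted target address. */
    static void indirect_jump(struct val target)
    {
        if (target.tainted)
            printf("trap: jump target 0x%x derived from untrusted input\n", target.data);
        else
            printf("jump to 0x%x allowed\n", target.data);
    }

    int main(void)
    {
        struct val base = { 0x400000, 0 };              /* trusted code address */
        struct val offset = from_untrusted(0x1000);     /* attacker-controlled */
        indirect_jump(base);                            /* allowed */
        indirect_jump(add(base, offset));               /* trapped */
        return 0;
    }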

Citation Context

...mory. The following example is constructed based on Newsham’s document on format string attacks [11]. int main(int argc, char **argv) { char buf[100]; if (argc != 2) exit(1); snprintf(buf, 100, argv[1]); buf[sizeof buf - 1] = 0; printf("buffer: %s\n", buf); return 0; } The general purpose of this example is quite simple: print out a value passed on the command line. Note that the code is written ca...
