Results 1 - 10 of 68
Control-Theoretic Techniques and Thermal-RC Modeling for Accurate and Localized Dynamic Thermal Management
2001. Cited by 169 (16 self).
This paper proposes the use of formal feedback control theory as a way to implement adaptive techniques in the processor architecture. Dynamic thermal management (DTM) is used as a test vehicle, and variations of a PID controller (Proportional-Integral-Differential) are developed and tested for adaptive control of fetch "toggling." To accurately test the DTM mechanism being proposed, this paper also develops a thermal model based on lumped thermal resistances and thermal capacitances. This model is computationally efficient and tracks temperature at the granularity of individual functional blocks within the processor. Because localized heating occurs much faster than chip-wide heating, some parts of the processor are more likely to be "hot spots" than others.
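A minimal sketch of the control loop this abstract describes, assuming a single lumped thermal RC node with fetch toggling as the actuator; the thermal R and C, PID gains, power, and setpoint below are illustrative placeholders, not the paper's values.

```python
# Minimal sketch: PID control of fetch "toggling" around a lumped thermal-RC node.
# All constants are illustrative placeholders (assumed, not from the paper).

R_TH = 2.0         # thermal resistance to ambient, K/W (assumed)
C_TH = 0.01        # thermal capacitance, J/K (assumed)
T_AMBIENT = 45.0   # deg C
T_SETPOINT = 80.0  # trigger temperature, deg C
DT = 1e-4          # control interval, s
KP, KI, KD = 0.05, 0.5, 1e-5   # PID gains (assumed)

def step_temperature(temp, power, dt=DT):
    """One explicit-Euler step of the lumped RC model: C*dT/dt = P - (T - T_amb)/R."""
    dtemp = (power - (temp - T_AMBIENT) / R_TH) / C_TH
    return temp + dtemp * dt

def simulate(block_power=25.0, steps=20000):
    temp = T_AMBIENT
    integral, prev_err = 0.0, 0.0
    duty = 1.0  # fraction of cycles fetch is enabled
    for _ in range(steps):
        err = temp - T_SETPOINT            # positive when the block is too hot
        integral += err * DT
        deriv = (err - prev_err) / DT
        prev_err = err
        # PID output throttles the fetch duty cycle, clamped to [0, 1].
        duty = min(1.0, max(0.0, 1.0 - (KP * err + KI * integral + KD * deriv)))
        temp = step_temperature(temp, block_power * duty)
    return temp, duty

if __name__ == "__main__":
    print(simulate())   # settles near the setpoint with a partial duty cycle
```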
Best of both latency and throughput
Proc. IEEE Conf. on Computer Design (ICCD), 2004.
Power-Aware Control Speculation through Selective Throttling
In Proceedings of the Ninth International Symposium on High-Performance Computer Architecture, 2003.
Understanding and Improving Operating System Effects in Control Flow Prediction
2002. Cited by 27 (5 self).
Many modern applications exercise the operating system kernel significantly, which, among other effects, alters control flow transfer in the execution environment. This paper focuses on understanding the operating system's effects on control flow transfer and prediction, and on designing architectural support to alleviate the resulting bottlenecks.
Applying Decay Strategies to Branch Predictors for Leakage Energy Savings
2002. Cited by 24 (8 self).
With technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large on-chip array structures such as caches and branch predictors. Recent work has suggested that even larger branch predictors can and should be used in order to improve microprocessor performance. A further consideration is that the branch predictor is a thermal hot spot, which further increases its leakage. For these reasons, it is natural to consider applying decay techniques, already shown to reduce leakage energy for caches, to branch-prediction structures.
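A minimal sketch of what a decay mechanism for a predictor table might look like, assuming row-granularity gating; the decay interval, table size, and counter reset policy are assumptions, not the paper's parameters.

```python
# Minimal sketch: decay applied to rows of a branch-predictor table.
# A row untouched for DECAY_INTERVAL ticks is powered off (it stops leaking,
# and its contents are assumed lost).

DECAY_INTERVAL = 4096   # idle ticks before a row decays (assumed)
NUM_ROWS = 1024         # illustrative table size

class DecayingTable:
    def __init__(self):
        self.counters = [1] * NUM_ROWS      # 2-bit counters, start weakly taken
        self.idle = [0] * NUM_ROWS          # per-row idle tick counters
        self.powered = [True] * NUM_ROWS

    def tick(self):
        """Called periodically: age every live row, power off stale ones."""
        for r in range(NUM_ROWS):
            if self.powered[r]:
                self.idle[r] += 1
                if self.idle[r] >= DECAY_INTERVAL:
                    self.powered[r] = False   # leakage saved; contents lost

    def access(self, row):
        """Lookup: a decayed row is re-enabled and re-initialized on use."""
        if not self.powered[row]:
            self.powered[row] = True
            self.counters[row] = 1            # reset to weakly taken
        self.idle[row] = 0
        return self.counters[row] >= 2        # predict taken?
```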
Decomposable and responsive power models for multicore processors using performance counters
In ICS’10, 2010. Cited by 21 (3 self).
Power modeling based on performance monitoring counters (PMCs) has attracted the interest of many researchers because it has become a quick approach to understanding and analysing power behavior on real systems. Moreover, several power-aware policies use power models to guide their decisions and to trigger low-level mechanisms such as managing processor frequency. Hence, the information, the accuracy, and the capacity for detecting power phases that a model provides are critical to broadening power-aware research and to improving the success of power-saving techniques based on such models. In addition, the design of current processors has changed considerably with the inclusion of multiple cores that share some resources on a single die. As a result, PMC-based power models warrant further investigation on current energy-efficient multicore processors. In this paper, we present a methodology to produce decomposable PMC-based power models on current multicore architectures. Besides estimating power consumption accurately, the models provide per-component power consumption, supplying extra information about power behavior. Moreover, we analyse and validate their responsiveness, i.e., their capacity to detect power phases. We produce a set of power models for an Intel Core 2 Duo, modeling one or two cores for a wide set of DVFS configurations. The models are empirically validated using SPECcpu2006 and compared to models built using existing approaches. Overall, we demonstrate that the proposed methodology produces more accurate and responsive power models, showing error ranges between 1.89% and 6% and almost 100% accuracy in detecting phase variations above 0.5 watts.
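A minimal sketch of one way to build a decomposable PMC-based model of this kind: fit power as a linear function of counter-derived activity ratios plus a static term, then read per-component contributions off the fitted coefficients. The counter choices, sample values, and least-squares fit below are illustrative assumptions, not the paper's exact methodology.

```python
# Minimal sketch: decomposable linear power model from performance counters.
# Each column of X is a PMC-derived activity ratio (events per cycle); the fit
# gives one coefficient per component plus a static/idle term.
import numpy as np

# Illustrative training data (a real model would use many samples):
# columns = (instructions retired, L2 misses, FP ops), all per cycle.
X = np.array([
    [1.2, 0.010, 0.30],
    [0.8, 0.002, 0.05],
    [1.6, 0.020, 0.50],
    [0.4, 0.001, 0.01],
])
measured_power = np.array([28.0, 21.0, 33.0, 16.0])   # watts (illustrative)

# Add a constant column to capture static + uncore power.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coeffs, *_ = np.linalg.lstsq(A, measured_power, rcond=None)

def estimate(sample):
    """Return total power and a per-component breakdown for one PMC sample."""
    parts = np.append(sample, 1.0) * coeffs
    return parts.sum(), dict(zip(["core", "L2", "FP", "static"], parts))

print(estimate(np.array([1.0, 0.005, 0.2])))
```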
Power-aware branch prediction: Characterization and design
IEEE Transactions on Computers, 2004.
MisSPECulation: Partial and misleading use of SPEC CPU2000 in computer architecture conferences
In Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003. Cited by 16 (0 self).
A majority of the papers published in leading computer architecture conferences use SPEC CPU2000, or its predecessor SPEC CPU95, which has become the de facto standard for measuring processor and/or memory-hierarchy performance. However, in most cases only a subset of the suite's benchmarks is simulated. For example, of the 27 papers published in ISCA 2002, 16 used SPEC CINT2000, 4 used the whole suite, and only 3 papers explained their omissions. This paper quantifies the extent of this phenomenon in the ISCA, Micro, and HPCA conferences: of 173 papers surveyed, 115 used benchmarks from SPEC CINT, but only 23 used the whole suite. If the current trend continues, by the year 2005 80% of the papers will use the full CINT2000 suite, a year after CPU2004 is announced. We claim that results based upon a subset of a benchmark suite are speculative and conflict with Amdahl's Law, which implies that we must present the speedup of the proposed technique on the whole suite. Projecting the law (by statistically supplying values for the missing benchmarks) onto several published papers reduces promising results to average ones: speedups drop from 1.42 to 1.16 in one case, from 1.43 to 1.13 in another, and from 1.76 to 1.15 in a third. Finally, we have found that the disregard for CFP2000 is unwarranted in papers that explore the data cache domain: the suite displays a higher data cache miss rate than CINT2000, which is used more frequently.
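A minimal sketch of the kind of whole-suite projection this abstract argues for, assuming omitted benchmarks are assigned a speedup of 1.0 and that the overall figure is a geometric mean over the full CINT2000 suite (the SPEC convention); the paper frames the same point via Amdahl's Law, and the per-benchmark speedups below are illustrative, not numbers from any surveyed paper.

```python
# Minimal sketch: project a speedup reported on a subset of SPEC CINT2000
# onto the full suite by assuming speedup 1.0 for every omitted benchmark.

FULL_SUITE = ["gzip", "vpr", "gcc", "mcf", "crafty", "parser",
              "eon", "perlbmk", "gap", "vortex", "bzip2", "twolf"]

def projected_speedup(reported):
    """reported: dict benchmark -> measured speedup (a subset of the suite).
    Omitted benchmarks contribute speedup 1.0; the overall figure is the
    geometric mean over the whole suite."""
    product = 1.0
    for bench in FULL_SUITE:
        product *= reported.get(bench, 1.0)
    return product ** (1.0 / len(FULL_SUITE))

# Illustrative: a healthy speedup on 4 of 12 benchmarks shrinks once the
# untouched benchmarks are counted.
subset = {"gzip": 1.5, "mcf": 1.6, "bzip2": 1.4, "twolf": 1.3}
print(projected_speedup(subset))   # ~1.13 over the full suite
```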
Microprocessor Pipeline Energy Analysis
2003. Cited by 15 (4 self).
The increase in high-performance microprocessor power consumption is due in part to the large power overhead of wide-issue, highly speculative cores. Microarchitectural speculation, such as branch prediction, increases instruction throughput but carries a power burden due to wasted power for mis-speculated instructions. Pipeline over-provisioning supplies excess resources which often go unused. In this paper, we use our detailed performance and power model for an Alpha 21264 to measure both the useful energy and the wasted effort due to mis-speculation and over-provisioning. Our experiments show that flushed instructions account for approximately 6% of total energy, while over-provisioning imposes a tax of 17% on average. These results suggest opportunities for power savings and energy efficiency throughout microprocessor pipelines.
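A minimal sketch of the bookkeeping such an analysis implies: split pipeline energy into energy spent on committed instructions, energy spent on instructions that are later flushed, and idle energy from over-provisioned resources. The per-instruction and per-cycle energies below are illustrative placeholders, not values from the Alpha 21264 model.

```python
# Minimal sketch: attribute pipeline energy to committed work, flushed
# (mis-speculated) work, and idle over-provisioned resources.
# All numbers are illustrative placeholders (assumed).

E_PER_INSTR = 2.0e-9       # joules per instruction traversing the pipeline (assumed)
E_IDLE_PER_CYCLE = 0.5e-9  # joules per cycle of unused issue slots, ports, etc. (assumed)

def energy_breakdown(committed, flushed, cycles):
    useful = committed * E_PER_INSTR
    wasted_flush = flushed * E_PER_INSTR
    idle = cycles * E_IDLE_PER_CYCLE
    total = useful + wasted_flush + idle
    return {
        "useful_share": useful / total,
        "flushed_share": wasted_flush / total,
        "idle_share": idle / total,
    }

# e.g. 100M committed instructions, 8M flushed, 120M cycles
print(energy_breakdown(100e6, 8e6, 120e6))
```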
Branch predictor prediction: A power-aware branch predictor for high-performance processors
In Proceedings of the 2002 International Conference on Computer Design. Cited by 14 (3 self).
We introduce Branch Predictor Prediction (BPP) as a power-aware branch prediction technique for high-performance processors. Our predictor reduces branch prediction power dissipation by selectively turning on and off two of the three tables used in the combined branch predictor. BPP relies on a small buffer that stores the addresses of, and the sub-predictors used by, the most recent branches executed. We later refer to this buffer to decide whether any of the sub-predictors or the selector can be gated without harming performance. In this work we study power and performance trade-offs for a subset of the SPEC 2K benchmarks. We show that, on average and for an 8-way processor, BPP can reduce branch prediction power dissipation by 28% and 14% compared to non-banked and banked 32K predictors, respectively. This comes with a negligible impact on performance (1% at most). We show that BPP always reduces power, even for smaller predictors, and that it offers better overall power and performance compared to simpler predictors.
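A minimal sketch of the gating decision this abstract describes, assuming a small buffer keyed by branch address that remembers which sub-predictor of a bimodal/gshare/selector combination supplied the last prediction; the buffer size, replacement policy, and naming are illustrative assumptions.

```python
# Minimal sketch: Branch Predictor Prediction (BPP) style gating.
# A small buffer maps recent branch PCs to the sub-predictor that was used;
# on a hit, the other sub-predictor and the selector can be power gated.
from collections import OrderedDict

BUFFER_ENTRIES = 64   # illustrative size of the recent-branch buffer

class BPPBuffer:
    def __init__(self):
        self.entries = OrderedDict()   # pc -> "bimodal" or "gshare"

    def lookup(self, pc):
        """Return which tables to enable for this branch.
        A miss enables everything, so gating is safe by default."""
        used = self.entries.get(pc)
        if used is None:
            return {"bimodal": True, "gshare": True, "selector": True}
        # Hit: only the sub-predictor used last time needs to be powered.
        return {"bimodal": used == "bimodal",
                "gshare": used == "gshare",
                "selector": False}

    def update(self, pc, used_subpredictor):
        """Record which sub-predictor the selector actually chose."""
        self.entries[pc] = used_subpredictor
        self.entries.move_to_end(pc)
        if len(self.entries) > BUFFER_ENTRIES:
            self.entries.popitem(last=False)   # evict the oldest entry
```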