Results 11 - 20 of 135
Energy behavior of Java applications from the memory perspective
- in USENIX Java Virtual Machine Research and Technology Symposium (JVM’01)
, 2001
Abstract - Cited by 20 (4 self)
Permission is granted for noncommercial reproduction of the work for educational or research purposes.
Performance and Power Effectiveness in Embedded Processors -- Customizable Partitioned Caches
, 2001
Abstract - Cited by 18 (6 self)
This paper explores an application-specific customization technique for the data cache, one of the foremost area/power consuming and performance determining microarchitectural features of modern embedded processors. The automated methodology for customizing the processor microarchitecture that we propose results in increased performance, reduced power consumption and improved determinism of critical system parts while the fixed design ensures processor standardization. The resulting improvements help to enlarge the significant role of embedded processors in modern hardware–software codesign techniques by leading to increased processor utilization and reduced hardware cost. A novel methodology for static analysis and a microarchitecturally field-reprogrammable implementation of a customizable cache controller that implements a partitioned cache structure is proposed. Partitioning the load/store instructions eliminates cache interference; hence, precise knowledge about the hit/miss behavior of the references within each partition becomes available, resulting in significant reduction in tag reads and comparisons. Moreover, eliminating cache interference naturally leads to a significant reduction in the miss rate. The paper presents an algorithm for defining cache partitions, hardware support for customizable cache partitions, and a set of experimental results. The experimental results indicate significant improvements in both power consumption and miss rate.
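The interference-elimination effect described in this abstract can be illustrated with a toy direct-mapped cache model. This is a minimal sketch, not the paper's methodology: the cache sizes, address streams, and class names below are all invented for illustration.

```python
# Toy illustration of cache partitioning: two reference streams that
# conflict in a shared direct-mapped cache stop interfering once each
# stream is given its own partition.

class DirectMappedCache:
    """A direct-mapped cache of `n_lines` block-address slots."""
    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.lines = [None] * n_lines
        self.misses = 0

    def access(self, block_addr):
        idx = block_addr % self.n_lines
        if self.lines[idx] != block_addr:
            self.misses += 1
            self.lines[idx] = block_addr

# Two streams whose blocks collide in a shared 4-line cache:
stream_a = [0, 1] * 100   # blocks 0, 1 -> indices 0, 1
stream_b = [4, 5] * 100   # blocks 4, 5 -> also indices 0, 1

shared = DirectMappedCache(4)
for a, b in zip(stream_a, stream_b):
    shared.access(a)
    shared.access(b)

# Partitioned: each stream gets its own 2-line partition, so the
# hit/miss behaviour of each partition is exactly predictable and
# cross-stream interference disappears.
part_a = DirectMappedCache(2)
part_b = DirectMappedCache(2)
for a, b in zip(stream_a, stream_b):
    part_a.access(a)
    part_b.access(b)

print("shared misses:", shared.misses)                    # every access conflicts
print("partitioned misses:", part_a.misses + part_b.misses)  # cold misses only
```

Because each partition's contents depend on only one reference stream, its hit/miss behavior becomes statically analyzable, which is what enables the tag-read reductions the abstract describes.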
Power-aware branch prediction: Characterization and design
- IEEE Transactions on Computers
, 2004
Profile-Based Energy Reduction for High-Performance Processors
- 4th ACM Workshop on Feedback Directed and Dynamic Optimization (FDDO-4)
Abstract - Cited by 17 (0 self)
To reduce the energy consumption of modern processors, designers have proposed many energy-saving techniques. In many cases, these techniques are dynamically activated and deactivated to adapt to changes in application behavior. In systems that employ these techniques, profiling can help determine how to manage their activation to improve a given metric. In this paper ...
Dynamic Allocation of Datapath Resources for Low Power
- in Proc. of Workshop on Complexity-Effective Design, held in conjunction with ISCA-28
, 2001
Abstract - Cited by 15 (3 self)
We show by profiling the execution of SPEC95 benchmarks that the usage of datapath resources in a modern superscalar processor is highly dynamic and correlated. The one-size-fits-all philosophy used for permanently allocating datapath resources in a modern superscalar CPU is thus complexity-ineffective due to the overcommitment of resources in general. We propose a strategy to dynamically and simultaneously adjust the sizes of two such correlated resources - the dispatch buffer (also known as an issue queue) and the reorder buffer - to reduce power dissipation in the datapath without significant impact on the performance. We also show how the resizing technique can be augmented with dynamic adaptation of the dispatch rate. Representative results show reductions in power dissipation of 69% for the dispatch buffer and 52% for the reorder buffer, with an average IPC loss below 8.5%.
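The occupancy-driven downsizing idea this abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: the buffer sizes, resize step, thresholds, and the sampled-occupancy trace are all invented.

```python
# Sketch of dynamic buffer resizing: periodically sample how full the
# dispatch buffer is, shrink it when it is mostly empty (overcommitted),
# and grow it back when occupancy approaches the active size.

def resize(current_size, avg_occupancy, min_size=16, max_size=64, step=16):
    """Shrink the buffer when mostly empty, grow it when nearly full."""
    if avg_occupancy < current_size // 2 and current_size > min_size:
        return current_size - step   # overcommitted: power down some entries
    if avg_occupancy > current_size - step // 2 and current_size < max_size:
        return current_size + step   # under pressure: re-enable entries
    return current_size

size = 64
trace = [10, 12, 8, 9, 60, 58, 20, 15]   # sampled occupancy per interval
history = []
for occ in trace:
    occ = min(occ, size)   # occupancy can never exceed the active size
    size = resize(size, occ)
    history.append(size)
print(history)
```

The sketch shrinks the buffer through the low-occupancy phase and grows it back when the sampled occupancy rises, which is the qualitative behavior the abstract's strategy relies on.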
Partitioned Instruction Cache Architecture for Energy Efficiency
- ACM Transactions on Embedded Computing Systems
, 2003
Abstract - Cited by 14 (0 self)
this paper studies energy-efficient cache architectures in the memory hierarchy that can have a significant impact on the overall system energy consumption
Low-Cost Embedded Program Loop Caching - Revisited
- University of Michigan
, 1999
Abstract - Cited by 14 (1 self)
Many portable and embedded applications are characterized by spending a large fraction of their execution time on small program loops. In these applications, instruction fetch energy can be reduced by using a small instruction cache when executing these tight loops. Recent work has shown that it is possible to use a small instruction cache without incurring any performance penalty [4, 6]. In this paper, we will extend the work done in [6]. In the modified loop caching scheme proposed in this paper, when a program loop is larger than the loop cache size, the loop cache is capable of capturing only part of the program loop without having any cache conflict problem. For a given loop cache size, our loop caching scheme can reduce instruction fetch energy more than other loop cache schemes previously proposed. We will present some quantitative results on how much power can be saved on an integrated embedded design using this scheme.
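The partial-capture idea in this abstract can be illustrated with a toy fetch counter. This is a sketch under invented assumptions (loop-cache size, loop address range, and iteration count are all hypothetical), not the paper's scheme.

```python
# Toy model of a partial loop cache: when a loop body exceeds the
# loop-cache size, only its first LOOP_CACHE_SIZE instructions are
# captured (conflict-free), and the remainder still come from the main
# instruction cache.

LOOP_CACHE_SIZE = 8
loop_start, loop_len = 100, 12            # a 12-instruction loop body
cached = range(loop_start, loop_start + LOOP_CACHE_SIZE)

loop_cache_fetches = main_fetches = 0
for _ in range(1000):                     # execute the loop 1000 times
    for pc in range(loop_start, loop_start + loop_len):
        if pc in cached:
            loop_cache_fetches += 1       # cheap fetch from the tiny cache
        else:
            main_fetches += 1             # fetch from the big I-cache

total = loop_cache_fetches + main_fetches
print(f"{loop_cache_fetches / total:.0%} of fetches served by the loop cache")
```

Even though the loop does not fit, two thirds of the fetches in this toy trace are served by the small structure, which is why partial capture still saves fetch energy.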
Power Reduction through Work Reuse
- In Int’l Symp. on Low Power Electronics and Design
, 2001
Abstract - Cited by 13 (4 self)
Power consumption has become one of the big challenges in designing high performance processors. The rapid increase in complexity and speed that comes with each new CPU generation causes greater problems with power consumption and heat dissipation. Traditionally, these concerns are addressed through semiconductor technology improvements such as voltage reduction and technology scaling. This work proposes an alternative solution to this problem, by dealing with the power consumption in the very early stage of the microarchitecture design. More precisely, we show that by modifying the well-established out-of-order, superscalar processor architecture, significant gains can be achieved in terms of power requirements without performance penalty. Our proposed approach relies on reusing as much as possible of the work done by the front-end of a typical pipelined, superscalar, out-of-order processor via the use of a cache nested deeply into the processor structure. Experimental results show up to 52% (20% on average) savings in average energy per committed instruction for two different pipeline structures.
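The front-end work-reuse idea can be sketched as a small decode cache keyed by instruction address. The energy costs and the hot-loop trace below are made up for illustration; this is not the paper's microarchitecture.

```python
# Sketch of front-end work reuse: cache the (expensive) decoded form of
# each static instruction keyed by its address, so repeated executions
# skip the front-end decode work and pay only a cheap reuse cost.

DECODE_COST, REUSE_COST = 5, 1            # arbitrary energy units

decoded = {}                              # the reuse cache: pc -> decoded form
energy = 0
trace = [0, 1, 2, 3] * 250                # a hot 4-instruction loop, 1000 fetches

for pc in trace:
    if pc in decoded:
        energy += REUSE_COST              # reuse previously decoded result
    else:
        decoded[pc] = ("uops", pc)        # pay full decode once per static pc
        energy += DECODE_COST

baseline = DECODE_COST * len(trace)       # decoding every dynamic instruction
print(f"energy saved: {1 - energy / baseline:.0%}")
```

Because the dynamic instruction stream is dominated by a few static instructions, almost all front-end work in this toy trace is reused after the first loop iteration.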
Microarchitectural Power Modeling Techniques for Deep Sub-Micron Microprocessors
, 2004
Abstract - Cited by 12 (1 self)
The need to perform early design studies that combine architectural simulation with power estimation has become critical as power has become a design constraint whose importance has moved to the fore. To satisfy this demand several microarchitectural power simulators have been developed around SimpleScalar, a widely used microarchitectural performance simulator. They have proven to be very useful at providing insights into power/performance trade-offs. However, they are neither parameterized nor technology scalable. In this paper, we propose more accurate parameterized power modeling techniques reflecting the actual technology parameters as well as input switching-events for memory and execution units. Compared to HSPICE, the proposed techniques show 93% and 91% accuracy for those blocks, but with a much faster simulation time. We also propose a more realistic power modeling technique for external I/O. In general, our approach includes more detailed microarchitectural and circuit modeling than has been the case in earlier simulators, without incurring a significant simulation time overhead - it can be as small as a few percent.
Fine-Grain CAM-Tag Cache Resizing Using Miss Tags
Abstract - Cited by 12 (1 self)
A new dynamic cache resizing scheme for low-power CAM-tag caches is introduced. A control algorithm that is activated only on cache misses uses a duplicate set of tags, the miss tags, to minimize active cache size while sustaining close to the same hit rate as a full-size cache. The cache partitioning mechanism saves both switching and leakage energy in unused partitions with little impact on cycle time. Simulation results show that the scheme saves 28-56% of data cache energy and 34-49% of instruction cache energy with minimal performance impact.
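The miss-tag consultation described in this abstract can be sketched with two LRU-ordered tag sets. The fully-associative LRU model, sizes, and grow threshold below are invented assumptions, not the paper's controller.

```python
# Sketch of miss-tag-driven resizing: a duplicate full-size tag array
# (the "miss tags") is consulted only when the downsized active cache
# misses. A miss that the full-size tags would have hit is evidence the
# active cache is too small; enough of them triggers an upsize.

from collections import OrderedDict

def should_grow(trace, active_lines, full_lines=8, grow_threshold=4):
    active = OrderedDict()     # LRU-ordered tags of the active cache
    miss_tags = OrderedDict()  # LRU-ordered tags of a full-size cache
    resize_misses = 0
    for tag in trace:
        if tag in active:
            active.move_to_end(tag)        # hit: miss tags not consulted
        else:
            if tag in miss_tags:
                resize_misses += 1         # full-size cache would have hit
            active[tag] = True
            if len(active) > active_lines:
                active.popitem(last=False)  # evict LRU tag
        # the miss tags always track the full-size cache's contents
        miss_tags[tag] = True
        miss_tags.move_to_end(tag)
        if len(miss_tags) > full_lines:
            miss_tags.popitem(last=False)
    return resize_misses > grow_threshold   # True -> enlarge active cache

small_ok = should_grow([1, 2, 1, 2] * 20, active_lines=2)    # fits in 2 lines
needs_more = should_grow([1, 2, 3, 4] * 20, active_lines=2)  # wants 4 lines
print(small_ok, needs_more)
```

Consulting the duplicate tags only on misses keeps the control logic off the cache's hit path, which is why the scheme has little impact on cycle time.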