Daniele Folegnani and Antonio González, "Energy Effective Issue Logic", Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001. Page(s): 230-239, Göteborg, Sweden.
....generations, leakage may constitute as much as 50% of total power dissipation. Recently, a great deal of research work in the architecture community has focused on reducing leakage power in the caches [11, 13, 14, 18, 22, 27, 28], branch predictor [15, 16], register file [2], issue queues [12, 21, 8, 9], and the ALUs [10]. Leakage control at the architecture level is attractive, because architectural techniques can control large groups of circuits (e.g. cache lines, banks, or the entire cache) at once. Yet most of these studies use only abstract models of leakage that do not fully account for ....
D. Folegnani and A. Gonzalez. Energy-effective issue logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 248--59, June 2001.
.... power may be well in excess of 50%, so higher power savings can be achieved by the use of new comparators). The extent of the total issue queue energy/power savings also depends on the optimizations that could already be implemented within the issue queue, for example, the ones suggested in [3] and [8]. 6. Concluding Remarks. Traditional comparators used in several datapath artifacts of a modern processor are notoriously energy inefficient as they dissipate energy on a mismatch in one or more bit positions. In scenarios where matches occur relatively rarely, alternative comparator ....
....[figure: match combinations] example of a datapath artifact where mismatches significantly outweigh the full matches. An additional challenge in the design of the issue queues for superscalar CPUs has to do with the delay of the tag matching and steering logic, which sits on the critical path [3, 4, 6]. We introduced the designs of two comparators that dissipate energy primarily on a full match, including a design (the PLSSC) that has a lower overall response time than the traditional design. Assuming a pipeline cycle time of 500 ps, the traditional comparators of Figure 1 leave about 380 ps ....
Folegnani, D. and Gonzalez, A., "Energy--Effective Issue Logic," in Proc. ISCA, 2001, pp. 230--239.
....power dissipation occurs during associative lookup of ROB entries, in the course of ROB writes for setting up new entries or generating result values, and in the course of reading out data from the ROB during operand reads or commits. For example, a recent study by Folegnani and Gonzalez [6] estimated that about 27% of the total power expended within a Pentium III-like microprocessor is dissipated in the ROB. In this paper, we propose a considerably simplified ROB architecture, which exploits the fact that in a typical superscalar processor the bulk of the source operand values is ....
....superscalar CPUs, the ROB area can be a significant fraction of the overall die area. For example, the reorder buffer ("instruction queue") on the PA-8000 occupies about 15% of the total chip area [7]. Such a complex ROB also represents a major source of total chip power dissipation, as high as 27% [6]. Finally, multiple cycles are often needed to read the data values from such a large multi-ported structure [4, 5]. In the baseline datapath of Figure 1, we assumed that two cycles are needed to access the ROB. Thus, three D stages are used: stage D1 is used for decoding and register renaming ....
Folegnani, D., Gonzalez, A., "Energy--Effective Issue Logic", in Proceedings of Int'l. Symp. on Computer Architecture, July 2001.
....energy. Some other work also proposes to dynamically adjust hardware resources to reduce energy consumption while still meeting application demand. Among other proposals, Albonesi et al. adjust the cache configuration [1, 3], Folegnani and Gonzalez disable empty instruction window entries [6], and Bahar and Manne shut down functional units [2]. The concept of these approaches is similar, but the issue of on-demand branch prediction is a bit more tricky. While any adaptation that results in performance degradation runs the risk of increasing energy consumption (due to fixed energy overhead ....
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In International Symposium on Computer Architecture, pages 230--239, Göteborg, Sweden, June--July 2001.
....consumption of functional units, exploiting the fact that the sizes of operands are often less than the size of the available functional units. In [5], Pipeline Balancing dynamically tunes the resources of a general-purpose processor to the needs of the application by monitoring performance. In [11], energy consumption of the issue logic is reduced by dynamically resizing the instruction queue and disabling the wake-up of ready operands. In [25], critical path prediction is used to separate high-speed functional units dedicated to critical instructions from low-power functional units ....
D. Folegnani and A. González. Energy-Effective Issue Logic. Proc. of the Int. Symp. on Comp. Architec., 2001.
....the paper. 1.1 Multi-configuration units. With power becoming a critical consideration for general-purpose microprocessors, several microarchitectural techniques have been proposed for reducing power consumption. There have been proposals for configurable caches and TLBs [2-4], issue windows [5, 6], and pipelines [7]. Almost all of these techniques focus on adapting the shape and size of the hardware to match the program requirements. For example, caches can be configured such that they are big enough to just fit the program's working set. If the working set is small, a large part of the ....
D. Folegnani and A. González, "Energy Effective Issue Logic," Proc. of the 28th Intl. Sym. on Computer Architecture, June 2001, pp. 230-239.
....the optimal envelope for energy-delay product; but their approach is not adaptive. [13] presents a framework for complexity-adaptive processors but does not specify the mechanism for adaptive behavior. In [14] a similar adaptive scheme is presented, but the approach is very coarse-grained, and [15] has a dynamic approach but they look at resizing only the issue queue. The work presented in [16] uses instructions per cycle (IPC) values obtained from profiling to characterize different portions of the code and uses a fixed window of instructions whose execution is monitored in order to reduce ....
D. Folegnani and A. Gonzalez, "Energy-effective issue logic," in Proc. Int. Symp. Comput. Architecture (ISCA), Goteborg, Sweden, May 2001, pp. 230--239.
....and so we do not consider the rename stage for clock gating. The issue stage consists of the issue queue, which uses an associative array and a wakeup-select combinational logic. There are many papers on reducing the issue queue power. [1] clock gates the issue queue using its predictive scheme. [6] proposes a scheme in which issue queue entries that are either deterministically determined to be empty, or deterministically known to be already woken up, are essentially clock gated. Because [6] already presents a deterministic method to clock gate the issue queue, we do not explore applying ....
....on reducing the issue queue power. [1] clock gates the issue queue using its predictive scheme. [6] proposes a scheme in which issue queue entries that are either deterministically determined to be empty, or deterministically known to be already woken up, are essentially clock gated. Because [6] already presents a deterministic method to clock gate the issue queue, we do not explore applying DCG to the issue queue. Register read stage consists of a register file implemented using an array. However, only at the very end of issue, we know how many instructions are selected Figure 4. ....
D. Folegnani and A. Gonzalez, "Energy-effective issue logic", In Proc. of 28th Int'l Symp. on Computer Architecture (ISCA), Jul. 2001, pp. 230-239.
....studied the most is the low-power domain; this is why we focus the analysis in this paper on this area. In this case, researchers have proposed various architectural Low Power Techniques (LPTs) that allow general-purpose processors to save energy, typically at the expense of performance (e.g. [1, 2, 4, 8, 17, 19]). Examples of such LPTs are cache reconfiguration and issue width changes. By activating these LPTs dynamically, processors can be more effective. Some of the more advanced proposals for adaptive processors combine several LPTs [7, 12, 13, 15, 21]. [Footnote: This work was supported in part by the ....]
....to adapt the hardware and what specific LPTs to activate. These decisions are usually based on testing a few different configurations of the LPTs and identifying which ones are best, and when. Nearly all existing proposals for adaptive systems follow what we call a Temporal approach to adaptation [1, 2, 4, 6, 7, 8, 9, 12, 15, 19, 21]. In this case, both the testing (or exploration) for the best configuration and the application of the chosen configuration are tied to successive intervals in time. Specifically, to identify the best configuration, the available configurations are typically tested back to back one after another. ....
D. Folegnani and A. González. Energy-Effective Issue Logic. In International Symposium on Computer Architecture, pages 230--239, May 2001.
....requires driving a register tag through long wires and comparing it to all the source register tags in the instruction window. Consequently, it can consume substantial energy. To reduce the energy consumed by instruction wakeup, Folegnani and Gonzalez have recently proposed several optimizations [5]. Specifically, window entries that are either empty or contain instructions that already have all their source operands available, are not compared against the broadcasted register number. Furthermore, the size of the instruction window is dynamically adjusted to reduce empty area. According to ....
....Specifically, window entries that are either empty or contain instructions that already have all their source operands available are not compared against the broadcasted register number. Furthermore, the size of the instruction window is dynamically adjusted to reduce empty area. According to [5], by disabling comparisons to empty and ready entries in a 128-entry window, we decrease the number of comparisons performed by the average completing instruction to only 14.2. While this is a very significant reduction, our data indicates that we can further reduce the number of comparisons per ....
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In International Symposium on Computer Architecture, pages 230--239, May 2001.
....using histogramming to record the occupancy. We feel the histogram is more robust relative to the average occupancy metric because the histogram reveals the tails of the occupancy distribution. This nuance is most significant when the partitioning is at a fine granularity. Folegnani and Gonzalez [12] study resizing the issue queue and similarly use the system IPC to detect if resizing is needed. Powell et al. [18] use selective direct mapping to reduce energy on accesses to set-associative caches. The method in [18] accesses the tags in full on the primary access, but only reads data from ....
Daniele Folegnani and Antonio Gonzalez. Energy effective issue logic. In 28th International Symposium on Computer Architecture, 2001.
....even straight "remaps"), the power (and especially the power density) limits could become a potential show stopper as transistors shrink and the frequency keeps increasing. Techniques like clock gating (e.g. [4, 5]) and dynamic size adaptation of on-chip resources like caches and queues (e.g. [6-10, 25, 27]) have been either used or proposed as methods for power management in future [Footnotes: 1, 2, 3: A. Buyuktosunoglu, K. Das and T. Karkhanis were summer interns at IBM; they are currently back at Univs. of Rochester, Michigan and Wisconsin respectively. 4: Prof. J. E. Smith, Univ. of Wisconsin, was on a ....]
....result snapshots, of the LPX prototype. 2. Background: Power-Performance Data. In an out-of-order, speculative superscalar design like each of the two cores in POWER4, a large percentage of the core power in the non-cache execution engine is spent in the instruction window or issue queue unit [3, 8-10]. Figure 1(a) shows the relative distribution of power across the major units within a single POWER4 core. Figure 1(b) zooms in on the instruction sequencing unit that contains the various out-of-order issue queues and rename buffers. Figure 2 shows the power density across some of the major ....
D. Folegnani and A. Gonzalez, "Energy-effective issue logic," Proc. ISCA-01, pp. 230-239, June 2001.
.... instance, the integer queue on the Alpha 21264 is the highest power consumer on the chip [11]. Similarly, the issue queue is one of the highest power density regions within a POWER4-class processor core [1]. For this reason, several techniques for reducing the issue queue power have been proposed [2, 3, 5]. However, these prior efforts have exclusively focused on approaches that require considerable redesign and verification effort as well as design risk. What has been thus far lacking is a quantitative comparison of a range of issue queue power optimization techniques that vary in their design ....
D. Folegnani, A. Gonzalez. Energy-Effective Issue Logic. ISCA-28, 2001.
....does not impair performance of a 4-cluster processor architecture. Section 6 reviews previous related works to optimize physical register files (e.g. virtual-physical registers [13], register caches [4, 1], read/write port arbitration [1]) or to reduce critical path [18, 2] or power consumption [5, 8] on wake-up and selection logic. On WSRS architectures, all these proposals can be applied at cluster level. Finally, Section 7 summarizes this study. 2 Register Write Specialization. 2.1 Register Write Specialization principle. Figure 2 illustrates register write specialization with a 4-cluster ....
....to optimistically select any instruction that is fireable, removing selection logic from the critical path. Critical path as well as power consumption in the wake-up logic is addressed by Ernst and Austin [5] through eliminating tag comparison for one of the operands. Folegnani and Gonzalez [8] proposed to selectively disable part of the comparators in the wake-up logic. We would like to point out that all these techniques [13, 4, 1, 18, 2, 5, 8] are orthogonal with WSRS and can be applied at cluster level to WSRS architectures. 7 Conclusion. Scaling the current superscalar designs ....
D. Folegnani and A. Gonzalez. Energy-effective issue logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, June 30--July 4, 2001.
....components in a microprocessor design. This is largely due to the fact that they are operating in every cycle, and by their nature cover a large set of data being queried to operate on a single element. Power-aware microarchitecture research will eventually address these issues ([1], [2], [3]) but not without introducing additional design complexity if the overall architectural approach remains unchanged. We concluded that while exploiting instruction-level parallelism was desirable, extracting it should not come at the expense of huge increases in design complexity. The solution we ....
D. Folegnani and A. Gonzalez. Energy-effective issue logic. In Proc. of the 28th International Symposium on Computer Architecture, pages 230--239, June 2001.
....the issue queue had fewer active modules; an asynchronous synchronous interface was thus made an integral part of the queue to exploit this feature; it was left unclear how the remaining components of the datapath could be clocked at a higher rate to take advantage of a faster issue queue. In [FG 01], Folegnani and Gonzalez introduced a FIFO issue queue that permitted out-of-order issue but avoided the compaction of vacated entries within the valid region of the queue to save power. The queue was divided into regions and the number of instructions committed from the most recently allocated ....
Folegnani, D., Gonzalez, A., "Energy Effective Issue Logic", to appear in Int'l Symp. on Computer Architecture, 2001.
....scheduling [22] and register renaming [27] methods are studied to reduce Hamming distance of adjacent instruction words, thus minimizing switching activities on the instruction cache data bus. There is a lot of ongoing research in micro-architectural level power analysis [9, 3, 37] and reduction [8, 41, 20]. Synergy between compilation techniques and micro-architectural level power reducing mechanisms is a must to achieve significant power saving. In particular, compiler researchers studied the interaction between program transformation and frequency/voltage scaling [15, 5, 23]. Exploiting schedule ....
Daniele Folegnani and Antonio Gonzalez. Energy-effective issue logic. In Proc. of the 28th Ann. Intl. Symp. on Computer Architecture, pages 230--239, Goteborg, Sweden, Jun. 30--Jul. 4,
....reaction to system adaptation. Also, the granularity of such sections has to be relatively coarse so that the transient state and/or any adaptation overhead becomes very small. Many dynamic systems [3, 4, 7] constantly monitor the program and predict that the behavior in the near future is similar to the current one. This usually involves an observation interval of a fixed duration. In [25] it is shown that programs generally demonstrate periodic behavior, but the period is application-specific. .... [Footnote 1: For simplicity, we assume the voltage of the whole system is scaled, not just the processor.]
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In International Symposium on Computer Architecture, pages 230--239, 2001.
....correlations among the usage of these resources. Our studies show why a distributed, dynamic allocation of these resources is needed for reducing the power/energy dissipation (Section 6). Dynamic resource allocations within a single datapath component (the IQ) for conserving power was studied in [9, 10, 12]. Specifically, in [9, 10] the authors explored the design of an adaptive issue queue, where the queue entries were grouped into independent modules. The number of modules allocated was varied dynamically to track the ILP; power savings was achieved by turning off unused modules. In [9, 10] the ....
....[Figure: (a) Behavior of fpppp benchmark; x-axis: simulation cycles (in millions)] dispatch blocks because of the non-availability of free entries within the IQ, the ROB or the LSQ. This information, directly indicating the increased resource demands, is then used to drive the upsizing decision. In [12], Folegnani and Gonzalez introduced a FIFO issue queue that permitted out-of-order issue but avoided the compaction of vacated entries within the valid region of the queue to save power. The queue was divided into regions and the number of instructions committed from the most recently allocated issue ....
Folegnani, D., Gonzalez, A., "Energy--Effective Issue Logic", in Proc. of the Int'l Symposium on Computer Architecture, 2001, pp.230--239.
....of recent research being directed towards the reduction of the energy dissipation within these components. Of this large body of work, we mention only the contributions that focus on dynamic resource allocation. The dynamic allocation of a single datapath resource was studied in [BA 00, BAS 01, FG 01]. In [BAS 01, FG 01] the authors explored the design of an adaptive issue queue, where the queue entries were grouped into independent modules. The number of modules allocated was varied dynamically to track the ILP; power savings with minimal impact on performance was achieved by turning off ....
....directed towards the reduction of the energy dissipation within these components. Of this large body of work, we mention only the contributions that focus on dynamic resource allocation. The dynamic allocation of a single datapath resource was studied in [BA 00, BAS 01, FG 01]. In [BAS 01, FG 01] the authors explored the design of an adaptive issue queue, where the queue entries were grouped into independent modules. The number of modules allocated was varied dynamically to track the ILP; power savings with minimal impact on performance was achieved by turning off unused modules. In ....
Folegnani, D., Gonzalez, A., "Energy-Effective Issue Logic", to appear in Int'l Symp. on Computer Architecture, 2001.
D. Folegnani and A. González. Energy-Effective Issue Logic. In ISCA 2001.
....consumption of functional units, exploiting the fact that the sizes of operands are often less than the size of the available functional units. In [5], Pipeline Balancing dynamically tunes the resources of a general-purpose processor to the needs of the application by monitoring performance. In [11], energy consumption of the issue logic is reduced by dynamically resizing the instruction queue and disabling the wake-up of ready operands. In [25], critical path prediction is used to separate high-speed functional units dedicated to critical instructions from low-power functional units ....
D. Folegnani and A. González. "Energy-Effective Issue Logic". Proc. of the Int. Symp. on Comp. Architec., 2001.
Daniele Folegnani and Antonio González, "Energy Effective Issue Logic", Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001. Page(s): 230-239, Göteborg, Sweden.
D. Folegnani and A. Gonzalez. Energy-effective issue logic. In 28th International Symposium on Computer Architecture, July 2001.
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 230--239, July 2001.
D. Folegnani and A. Gonzalez, "Energy-Effective Issue Logic", in Proc. 28th Ann. Int. Symp. Computer Architecture, 2001, pp. 230-239.
Daniele Folegnani and Antonio Gonzalez. Energy-effective issue logic. In Proceedings of the 28th International Symposium on Computer Architecture, pages 230--239, June 2001.
D. Folegnani and A. Gonzalez. Energy-effective issue logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 248--59, June 2001.
D. Folegnani and A. González, "Energy-effective issue logic," in Proc. Int. Symp. Computer Architecture, pp. 230--239, July 2001.
Folegnani, D. and Gonzalez, A., "Energy--Effective Issue Logic," in Proc. ISCA, 2001, pp. 230--239.
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 230--239, July 2001.
D. Folegnani and A. Gonzalez. Energy-Effective Issue Logic. In Proceedings of the 28th Annual International Symposium on Computer Architecture, pages 230--239, July 2001.