PANWAR, R. AND RENNELS, D. 1995. Reducing the frequency of tag compares for low power I-cache design. In SLPE, pp. 57--62.
....checks, thereby saving energy. Execution footprints are recorded into an extended BTB (Branch Target Buffer). The rest of this paper is organized as follows. Section 2 shows the effect of tag comparison on the total cache energy. In addition, another technique to omit tag comparison proposed in [10] is explained as a comparative approach. Section 3 presents the concept and Manuscript received Department of Electronics and Computer Science, Fukuoka University, 8-19-1 Nanakuma, Jonan-ku, Fukuoka 814-0133 Japan. Department of Informatics, Kyushu University, 6-1 Kasuga-koen, Kasuga, ....
....tag comparison on total cache energy bits and 64 bits, respectively. 3. Interline Tag Comparison Cache As explained in Section 2, it is important to consider the tag memory access energy to obtain a larger energy reduction. A technique to reduce the frequency of tag comparison has been proposed [10]. When two instructions i and j are executed successively, we can consider the following cases: Intraline sequential flow: i and j reside in the same cache line, and their addresses are sequential. Intraline non-sequential flow: i and j reside in the same cache line, and their addresses ....
R. Panwar and D. Rennels, "Reducing the frequency of tag compares for low power I-cache design," Proc. of the 1995.
....required by snoop-induced accesses. When all accesses are considered, this HJ organization results in a 41% energy reduction on average. 5 Related Work A number of previous studies have focused on architectural/microarchitectural techniques to reduce energy dissipation in the memory hierarchy [1,3,10,13,14,18,19,26,32]. Most of these techniques directly target reducing power dissipation induced by processor memory accesses rather than snoop-induced accesses, which are the focus of this paper. Many of the techniques propose using tiny energy-efficient devices to capture small program working sets and filter ....
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In Proc. Intl. Symposium on Low Power Electronics and Design, 1995.
....in terms of power but usually also highly redundant, as conflicting references are frequently close to each other in the address space, thus necessitating only a few tag bits for conflict identification. A straightforward optimization for an architecture with no fetch buffers, already proposed in [8], consists of avoiding tag operations when sequentially fetching within a cache line, and only .... [Figure 1: Set associative cache organizations, panels a) and b)]
R. Panwar and D. Rennels, "Reducing the frequency of tag compares for low power I-cache designs", in SLPE, pp. 57--62, October 1995.
....If we can concentrate memory accesses on the small static module, a lot of energy can be saved due to the small value of C. Many techniques for data allocation to the static module have been proposed. These techniques are based on profile data from the execution of programs. Panwar et al. [46] proposed the S-cache. Frequently executed basic blocks are placed in the S-cache (a small static module). Jump instructions which control the execution flow between the S-cache and the level 1 main cache are inserted in program codes. Bellas et al. [5][7] proposed the loop cache (L-cache) for ....
....(e.g. operating system) can determine the trade-off between the performance and the energy dissipation by modifying the CWSR. 8 (2) Omitting Tag Comparison In conventional caches, tag comparison is performed in every access to determine whether the current access hits the cache. Panwar et al. [46] proposed a conditional tag comparison scheme which attempts to reduce the total count of tag comparisons required. If two successive instructions i and j reside in the same cache line, the tag comparison for j can be omitted. Another approach to omitting the tag comparison is history-based ....
R. Panwar and D. Rennels, "Reducing the frequency of tag compares for low power I-cache design," Proc. of the 1995.
....that our approach can reduce the total count of tag checks by 90%, resulting in a 15% cache energy reduction, with less than 0.5% performance degradation. The rest of this paper is organized as follows. Section 2 shows related work, and explains the detail of another technique proposed in [11] to omit tag checks as a comparative approach. Section 3 presents the concept and mechanism of the HBTC cache. Section 4 reports evaluation results for performance/energy efficiency of our approach, and Section 5 concludes this paper. 2 Related Work A technique to reduce the frequency of tag ....
....a comparative approach. Section 3 presents the concept and mechanism of the HBTC cache. Section 4 reports evaluation results for performance/energy efficiency of our approach, and Section 5 concludes this paper. 2 Related Work A technique to reduce the frequency of tag checks has been proposed [11]. If successively executed instructions i and j reside in the same cache line, then we can omit the tag check for instruction j. Namely, the cache proposed in [11] performs tag checks only when i and j reside in different cache lines. We call the cache interline tag comparison cache (ITC cache) ....
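As a rough illustration of the rule quoted above (omit the tag check for j when i and j reside in the same cache line), the following Python sketch counts how many tag checks a fetch stream would need. The function name, the 32-byte line size, and the sample addresses are assumptions made for illustration; they do not come from the cited papers.

```python
def count_tag_checks(addresses, line_size=32):
    """Count tag checks when the check is skipped for same-line successors.

    addresses: byte addresses of successively fetched instructions.
    line_size: cache line size in bytes (hypothetical value).
    """
    checks = 0
    prev_line = None  # no previous fetch yet, so the first fetch is checked
    for addr in addresses:
        line = addr // line_size
        if line != prev_line:
            # i and j are in different lines: a tag check is performed
            checks += 1
        prev_line = line
    return checks


# 16 sequential 4-byte instructions span two 32-byte lines
straight_run = list(range(0, 64, 4))
print(count_tag_checks(straight_run))
```

In this sketch a straight run of 16 four-byte instructions touches only two lines, so only 2 of the 16 fetches need a tag check.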
R. Panwar and D. Rennels, "Reducing the frequency of tag compares for low power I-cache design," Proc. of the 1995.
.... to power consumption in computing systems [23]. Much of this attention has focused on architectural and circuit techniques for reducing on-chip processor power and energy consumption via techniques such as clock gating [2], memory subsystem storage structure optimizations [3][5][16][17][21][14][24][25][26][27][30], system bus optimizations [8][12], pipeline speculation gating [19], and main memory access [18]. Recently, a study by Moshovos et al. examined the potential for filtering remote snoop requests by checking them against a small Jetty table to avoid tag lookups and reduce on-chip ....
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In Proc. Intl. Symposium on Low Power Electronics and Design, 1995.
....We do not consider this option further in this paper. Direct addressing is only used for data caches. Instruction caches have very regular access patterns and are only accessed via the program counter, and hence are amenable to software-invisible micro-architectural techniques to remove tag checks [16, 18, 19].
Instruction: Explanation
(l s)wlda rt, off(rs) da: Load or store word, load direct address. Perform regular load or store, and also set the direct address register da to the location of the referenced line.
(l s)wda rt, off(rs) da: Load or store word, using direct address. Data from the ....
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In SLPE, pages 57--62, October 1995.
....complements of their true value. This decision is made in such a way that the new value causes at most half the bus signals to make a transition. Other organizational techniques for reducing power dissipation include the use of additional hardware for reducing the frequency of tag comparisons [PaRe 95] or determining a hit before accessing the data array in a multimodule cache implementation [KBN 95]. In the latter case, the cache cycle time gets prolonged, so the applicability of this technique to L1 caches in modern CPUs is rather limited. 3. Multiple Block Buffering and Other Enhancements ....
Panwar, R. and Rennels, D., "Reducing the frequency of tag compares for low power I-cache design", in Proc. of the Int'l. Sym. on Low Power Design, 1995, pp. 57--62.
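The excerpt above begins mid-sentence, but the decision it describes (send a value or its complement so that at most half the bus signals switch) resembles what is commonly called bus-invert coding. The Python sketch below shows one way such a decision could be made; the function name, the 8-bit bus width, and the separate invert flag are assumptions for illustration and are not taken from the citing paper.

```python
def bus_invert_encode(prev_bus, value, width=8):
    """Choose between sending value or its complement.

    prev_bus: bit pattern currently on the data lines.
    value:    new word to transmit.
    width:    number of data lines (hypothetical value).
    Returns (pattern_to_drive, invert_flag).
    """
    mask = (1 << width) - 1
    # Number of data lines that would toggle if value were sent as-is
    transitions = bin((prev_bus ^ value) & mask).count("1")
    if transitions > width // 2:
        # Sending the complement toggles the remaining (fewer) lines
        return (~value) & mask, 1
    return value & mask, 0


print(bus_invert_encode(0b10101010, 0b01010100))
```

Here the new word would toggle 7 of 8 data lines, so the complement is driven instead and only 1 data line toggles; the receiver would use the invert flag to recover the true value.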
....within the various cache components when simulating the execution of code produced for a MIPS machine, since the primary source of power dissipation in CMOS caches is gate output transitions. We used the widely accepted SPECint92 benchmarks for our studies. Past studies, such as [KBN 95, PaRe 95, ChMc 96], have used the average of the hit (or miss) ratios to estimate power dissipations within the caches. Our approach, in contrast, is to use a cycle-by-cycle simulator to measure the transitions actually caused on each cache access. Admittedly, our cache power estimation technique is ....
....is referred to [KaGh 96]. 6. Related Work and Conclusions We first describe how our work differs from earlier cited and related work. Cache hit/miss ratios of well-known benchmark programs have been used directly to estimate the cache power dissipation in a variety of studies such as [KBN 95, PaRe 95, ChMc 96]. We have found that power dissipations estimated from simple hit/miss ratios (and even models that also take into account the relative numbers of reads and writes) are grossly inaccurate. For example, using the hit/miss ratios and the fraction of reads/writes collected from our ....
Panwar, R. and Rennels, D., "Reducing the frequency of tag compares for low power I-cache design", in Proc. of the Int'l. Sym. on Low Power Design, 1995, pp. 57--62.
....entire tag array. However, this comparison adds energy overhead and adversely affects the latency of the cache access. For an instruction cache, accesses are sequential by default and in most cases it is very simple to determine when an instruction fetch is from the previously accessed cache line [35, 38]. In fact, the segmented program counter implementation described in Section 4.4 satisfies this requirement perfectly. Since I split the program counter between the low-order 3 bits and the high-order 27 bits, whenever the upper segment is not clocked, the access is guaranteed to be to the same 8 ....
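The segmented program counter idea in the excerpt above can be sketched in a few lines of Python. The sketch assumes the program counter is held as a 30-bit word index (3 low-order bits plus 27 high-order bits, as stated); the function names are hypothetical, and treating "upper segment unchanged" as "same line" is the interpretation suggested by the excerpt rather than a confirmed implementation detail.

```python
LOW_BITS = 3  # width of the low-order segment, from the excerpt


def split_pc(pc):
    """Split a word-granular PC into (upper_segment, lower_segment)."""
    return pc >> LOW_BITS, pc & ((1 << LOW_BITS) - 1)


def upper_segment_changes(pc, next_pc):
    """True when the upper segment would have to be clocked.

    If this returns False, both fetches share the same upper segment,
    so the second fetch falls in the same group of 2**LOW_BITS words.
    """
    return split_pc(pc)[0] != split_pc(next_pc)[0]


print(split_pc(13))                 # (1, 5)
print(upper_segment_changes(6, 7))  # False: only the low segment increments
print(upper_segment_changes(7, 8))  # True: the low segment wraps around
```

Under these assumptions, a tag check would only be needed on fetches for which `upper_segment_changes` is true.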
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In ISLPED, pages 57--62, October 1995.
.... is not to perform this search for sequential fetch within each cache line (intra-line sequential flow), since we know the same line is being accessed, but for non-sequential fetch such as branches (non-sequential flow) and sequential fetch across a cache line boundary (inter-line sequential flow) [14, 16], full instruction lookups are performed. This eliminates around 75% of all lookups. This reduces the number of tag readouts and comparisons as well as the number of instruction words read out. We include this optimization in the baseline cache to which we will compare our results. Another ....
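The three flow types named in the excerpt above can be made concrete with a small classifier. This is a minimal Python sketch, assuming 4-byte instructions and 32-byte cache lines (both hypothetical values) and counting the very first fetch as a lookup; the function names are illustrative only.

```python
def classify_fetch(prev_addr, addr, inst_size=4, line_size=32):
    """Classify a fetch relative to the previous one."""
    sequential = addr == prev_addr + inst_size
    same_line = addr // line_size == prev_addr // line_size
    if not sequential:
        return "non-sequential"
    return "intra-line sequential" if same_line else "inter-line sequential"


def count_lookups(stream, inst_size=4, line_size=32):
    """Count full lookups: every fetch except intra-line sequential ones."""
    if not stream:
        return 0
    lookups = 1  # the first fetch has no predecessor, so it is looked up
    for prev_addr, addr in zip(stream, stream[1:]):
        if classify_fetch(prev_addr, addr, inst_size, line_size) != "intra-line sequential":
            lookups += 1
    return lookups


stream = list(range(0, 128, 4))  # 32 straight-line fetches over 4 lines
print(count_lookups(stream), "of", len(stream))
```

For this straight-line stream only 4 of 32 fetches need a lookup. Note that under this policy a branch to a target in the same line still triggers a lookup, because only sequential flow is exempted.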
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In SLPE, pages 57--62, October 1995.
....RAM organization techniques [Itoh 96, EvFr 95], operational level techniques, such as instruction scheduling [SuDe 95], or cache organization techniques. Various cache organizational techniques to reduce the power dissipation in static RAM based caches have been studied in [KBN 95], [KaGh97a,b], [PaRe 95], [SuDe 95]. From the standpoint of the CPU designer, it is important to get an estimation of the energy dissipated in on-chip caches, including caches that are organized to reduce the energy dissipation. A. Goals of this Paper The goal of this paper is to develop analytical models for ....
....complements of their true value. This decision is made in such a way that the new value causes at most half the bus signals to make a transition. Other organizational techniques for reducing power dissipation include the use of additional hardware for reducing the frequency of tag comparisons [PaRe 95] or determining a hit before accessing the data array in a multimodule cache implementation [KBN 95]. In the latter case, the cache cycle time gets prolonged, so the applicability of this technique to L1 caches in modern CPUs is rather limited. Power consumptions in multi-level caches were ....
Panwar, R. and Rennels, D., "Reducing the frequency of tag compares for low power I-cache design", in Proc. of the Int'l. Sym. on Low Power Design, 1995, pp. 57--62.
....altogether. We focus on data accesses since instruction caches, while they dissipate considerable energy, have very regular access patterns and are only accessed via the program counter. Hence they are amenable to software-invisible micro-architectural techniques for power reduction, e.g. [16, 17, 15]. We first review the design of a low power cache. Then we explain caches tagged with content addressable memory. In section 3 we discuss direct addressing, the process which allows us to avoid tag checks, and we present a thorough evaluation of the design in the next section. ....
R. Panwar and D. Rennels. Reducing the frequency of tag compares for low power I-cache design. In SLPE, pages 57--62, October 1995.
....on a miss so that performance is unchanged, the processor energy efficiency can be improved by 15-25%. Further techniques have been proposed to reduce the accesses to the instruction cache's tag array by exploiting this same spatial locality, increasing processor energy efficiency by 5-10% [29]. The processor control typically knows which pipeline stages are being used each cycle. Those pipeline stages not used in a given cycle should have their clock disabled for that cycle. This is particularly important to do in superscalar architectures that typically have only a fraction of the ....
R. Panwar and D. Rennels, "Reducing the Frequency of Tag Compares for Low Power I-Cache Design," Proceedings of the International Symposium on Low Power Design, Apr. 1995, pp. 57-62.
....requirements to the absolute minimum [12]. In the work of [19], the power consumption of the internal cache of an Intel 486DX2 processor and the external system memory is taken into account by an empirical model based on actual current measurements. There are also some memory-related power studies [3, 8, 14, 16], but these are oriented to caches in microprocessors and not to custom network components. Our own previous work was situated at the architectural level [20, 4]. 3 The Set Data Structure Model A set of records which are accessed with one or more keys can be represented by many different data ....
R. Panwar, D. Rennels, "Reducing the Frequency of Tag Compares for Low-Power I-Cache Design", 1995 International Symposium on Low Power Design, Laguna CA, pp. 57-62, Apr. 1995.
No context found.
PANWAR, R. AND RENNELS, D. 1995. Reducing the frequency of tag compares for low power I-cache design. In SLPE, pp. 57--62.