S. McFarling. Combining branch predictors. Digital Equipment Corporation, WRL Technical Note TN-36, 1993.
.... and 1 Synchronization unit) 4 floating point; Instruction Queues: 32-entry integer and floating point queues; Renaming Registers: 100 integer and 100 floating point; Retirement bandwidth: 12 instructions/cycle; TLB: 128-entry ITLB and DTLB; Branch Predictor: McFarling-style hybrid predictor [16]; Icache: 128KB, 2-way set associative, single port; Dcache: 128KB, 2-way set associative, dual ported; L2 cache: 16MB, direct mapped, 20-cycle latency, fully pipelined (1 access per cycle); L1-L2 bus: 256 bits wide, 2-cycle latency; Memory bus: 128 bits wide, 4-cycle latency; Physical Memory: 128MB, ....
MCFARLING, S. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab (June 1993).
.... and 2 Synchronization units) 4 floating point; Instruction Queues: 32-entry integer and floating point queues; Renaming Registers: 100 integer and 100 floating point; Retirement bandwidth: 12 instructions/cycle; TLB: 128-entry ITLB and DTLB; Branch Predictor: McFarling-style hybrid predictor [49]; Local Predictor: 4K-entry prediction table indexed by 2K-entry history table; Global Predictor: 8K entries, 8K-entry selection table; Branch Target Buffer: 1K entries, 4-way set associative; MSHR: 32 entries for the L1 caches, 32 entries for the L2 cache; Store Buffer: 32 entries; Cache Line Size: 64 ....
MCFARLING, S. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab (June 1993).
....studied in this paper. While eliminating all branch prediction aliasing is not trivial, it is our belief that the destructive user/OS part can be alleviated with appropriate architectural support. There are numerous branch predictors that have been proposed to address different situations [30][14][7][25][10][4][15][6]. These prediction mechanisms have paid less attention to the OS requirements and no particular scheme was proposed on tuning control flow prediction hardware for the OS. Our intention in this paper, however, is not to propose a new predictor to add to this list. Rather, it is ....
....The binary decision tree based branch sequence of this handler is given in Figure 6b. It can be observed that the branches in the OS routine inttrap will be correlated with a NNT branching sequence while the branches in systrap will be correlated with a NNNT branching sequence. Hence Gshare [14] and GAg [29] predictors work extremely well with these branches. 0x80007dd4: exception ip12 andi k0, k0,0x7c li k1,124 beq k0, k1,0x80007d0c handle vced li k1,56 beq k0, k1,0x80007cec handle vcei li k1,32 beqz k0,0x800080f0 inttrap sw at, 24524( zero) beq ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993.
....in an effort to improve the rate of instruction delivery to the execution core. Techniques to reduce the impact of I-cache misses include multi-level instruction memory hierarchies [17] and instruction prefetch [40]. Techniques to reduce the impact of branch mispredictions include hybrid [21] and indirect [6] branch predictors, and recovery miss caches to reduce misprediction latencies [2]. A number of compiler-based techniques work to improve instruction delivery performance. They include branch alignment [5], trace scheduling [12], and block-structured ISAs [15]. We will now ....
....a pipelined memory bus, where a new request can occur every 4 cycles, so each bus can transfer 8 bytes/cycle. There is a 32-entry 8-way associative instruction TLB and a 32-entry 8-way associative data TLB, each with a 30-cycle miss penalty. For this paper, we used the McFarling hybrid predictor [21] for our conditional branch predictor. The predictor has a 2-bit meta chooser and a 2-bit bimodal predictor, both stored in the branch predictor entry with their corresponding branch. In addition, a tagless gshare predictor is also available, accessed in parallel with the branch predictor. The ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....The binary decision tree based branch sequence of this handler is given in Figure 6b. It can be observed that the branches in the OS routine inttrap will be correlated with a NNT branching sequence while the branches in systrap will be correlated with a NNNT branching sequence. Hence Gshare [McFar93] and GAg [Yeh91] predictors work extremely well with these branches. 0x80007dd4: exception ip12 andi k0, k0,0x7c li k1,124 beq k0, k1,0x80007d0c handle vced li k1,56 beq k0, k1,0x80007cec handle vcei li k1,32 beqz k0,0x800080f0 inttrap sw at, 24524( zero) beq ....
....to give an indication of the associated area cost. Table 5. A Comparison of Several Branch De aliasing Schemes Predictor Description of feature to exploit heterogeneous branches or De aliasing Additional Branch De aliasing Hardware Predictor Size Normalized to Gshare(8k256k) Gshare [McFar93] Consists of one correlation shift register (BHSR) and one BHT. BHSR is XORed with branch address bits of a branch address to index BHT entry. The XORing helps to reduce aliasing effects. 01 1, 2 [Ever97] Consists of multiple single scheme components: simple 2 bit (2bc) GAs, Gshare, ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993
....control flow and data flow analysis. Programs were executed for a total of five billion instructions or program termination. 5 Results. To evaluate the benefit of loop termination prediction we examine the branch prediction performance of two predictors based on McFarling's meta predictor [8]. McFarling's meta predictor has a meta chooser table of 2-bit counters to choose between bimodal and gshare branch prediction. We use a bimodal table of 2-bit counters indexed by the PC to produce the bimodal prediction. In addition, we use a global history register XORed with the branch PC as ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....fully pipelined allowing a new instruction to initiate execution each cycle. We use a 128-entry 4-way associative FTB with a 2K-entry 4-way associative second-level FTB. Each fetch block stored in the FTB can span up to five sequential cache blocks. We use the McFarling bi-modal gshare predictor [10], with an 8K-entry gshare table and a 64-entry return address stack in combination with the FTB. We use a 32-entry FTQ in conjunction with the FTB. 5.2 Memory Hierarchy. We rewrote the memory hierarchy in SimpleScalar to model bus occupancy, bandwidth, and pipelining of the second-level cache ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....7.8% misprediction rate) as a 4096-entry two-level predictor. Early indirect branch prediction studies have been reported in [Lee84, Jaco97, Emer97 and Chang97]. Branch classification and hybrid prediction were first proposed for conditional branches by Chang and Patt in [Chang94] and by McFarling in [McFar93], respectively. To our knowledge, no previous study has looked at branch prediction of Java programs by examining both user and kernel execution. 3. Benchmarks and Experimental Methodology. This work is based on simulation analysis of branch instruction traces generated on a complete system ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993
....management rules used in a cascaded predictor. Early indirect branch prediction studies have been reported in [Lee84, Jaco97, Emer97 and Chang97]. Branch classification and hybrid prediction were first proposed for conditional branches by Chang and Patt in [Chang94 and Chang95] and by McFarling in [McFar93], respectively. To our knowledge, no previous study has analyzed branch prediction of Java programs by examining both user and kernel execution. 7. Conclusion and Future Research. The popularity and wide adoption of Java has necessitated the development of an efficient Java runtime system. We ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993
....in which each branch is assigned its own local branch history register. In either version, a 2-bit saturating counter is selected by first computing the index into the PHT from the branch address and then using the branch history register to locate the counter. The gshare predictor proposed by McFarling [5] improves the GAs version by XORing the branch address with the global branch history to compute the index into the PHT. Nair [6] proposed the dynamic path-based predictor as opposed to pattern-based predictors. Instead of storing branch outcomes, the path-based predictor stores actual branch ....
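The gshare indexing described in the excerpt above can be illustrated with a small sketch. It is a toy model under stated assumptions, not code from any cited paper: the class name `GsharePredictor`, the 12-bit index width, the shift that drops byte-offset bits from the address, and the initial counter state are all chosen for the example.

```python
class GsharePredictor:
    """Toy gshare: a PHT of 2-bit saturating counters indexed by PC XOR global history."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # start weakly not-taken (assumed)
        self.history = 0                    # global branch history register

    def _index(self, pc):
        # XOR the branch address bits with the global history to pick a counter.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2  # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the actual outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Because the history bits take part in the index, a branch that follows a short repeating sequence (for example not-taken, not-taken, taken) maps each point in the sequence to its own counter, and each counter sees a consistent outcome once the history register has filled.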
Scott McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....MULT, and 2 FP MULT/DIV. The latencies are: ALU 1 cycle, MULT 3 cycles, FP Adder 2 cycles, FP Mult 4 cycles, and FP DIV 12 cycles. All functional units, except the divide units, are fully pipelined allowing a new instruction to initiate execution each cycle. We use a McFarling gshare predictor [13] to drive our fetch unit. Two predictions can be made per cycle with up to 8 instructions fetched. We rewrote the memory hierarchy in SimpleScalar to better model bus occupancy, bandwidth, and pipelining of the second-level cache and main memory. The L1 instruction cache is a 32K 2-way ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....The latencies are: ALU 1 cycle, MULT 3 cycles, Integer DIV 12 cycles, FP Adder 2 cycles, FP Mult 4 cycles, and FP DIV 12 cycles. All functional units, except the divide units, are fully pipelined allowing a new instruction to initiate execution each cycle. We use a McFarling gshare predictor [20] to drive our fetch unit. Two predictions can be made per cycle with up to 8 instructions fetched. We rewrote the memory hierarchy in SimpleScalar to better model bus occupancy, bandwidth, and pipelining of the second level cache and main memory. For the majority of our results, the L1 ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....the resulting two-level predictor achieves a prediction accuracy of 90% for a history table with 1K entries. Next we combined two-level predictors of different path length into a hybrid predictor to increase the prediction accuracy further. Hybrid predictors were first proposed by McFarling [98] for conditional branch prediction. To our knowledge, this study is the first to evaluate hybrid prediction for indirect branches. We studied three classes of hybrid predictors: 7 . Classifying predictors, first proposed for conditional branches by Chang, Hao and Patt [23] assign a class to ....
....Many of the problems we encountered in the course of this work resemble problems encountered in cache design. For example, many of the solutions in Section 8 come straight out of the cache design literature [65]. Others are inspired by good conditional branch designs, for example the GSHARE predictor [98]. To see which of these solutions carry over to indirect branch predictors, we have to measure their predictor performance on real programs. We use misprediction rate, defined as branch misprediction frequency, as the metric to evaluate designs. Minimization of misprediction rate ensures that ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993.
....a misprediction ratio of 31.5% for gcc. The cascaded predictor presented in this study, with a 4-entry filter, obtains a 23.7% misprediction rate with a 512-entry, four-way associative dual-path hybrid predictor as second stage. Hybrid prediction for conditional branches was first proposed in [McFar93]. Recent results can be found in [CHP95] and [ECP96]. Chen et al. [CCM96] propose Partial Prefix Matching prediction for conditional branch prediction and show that a PPM predictor performs better than a two-level predictor for a similar hardware budget. Since a PPM predictor chooses the ....
S. McFarling, Combining Branch Predictors, WRL Technical Note TN-36, Digital Equipment Corporation, June 1993
....in order to concentrate on fetch-directed prefetching. A 4K-entry FTB is large, but is implementable using the 2-level FTB approach described in [14]. We are currently evaluating using FDP with a 2-level FTB design. For conditional branch prediction, we use the McFarling bi-modal gshare predictor [11], with an 8K-entry gshare table and a 64-entry return address stack in combination with the FTB. Table 2 shows the effects of using the FTB with a dual-ported instruction cache. The results show that a 7% speedup is achieved on average when using 2 ports instead of 1 port for the instruction cache. ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....approximation to quantify area rather than performing synthesis on each state we wish to examine. 5.5 Branch Prediction Results. We compare our customized predictor against three other branch predictors. The first predictor we compare against is the gshare predictor of McFarling [17]. The second predictor is a meta chooser predictor that contains a two-level local history branch prediction table, a global history table, and a meta chooser table that determines whether to use the local or global prediction for predicting the current branch. We call this the Local Global ....
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
....a misprediction ratio of 31.5% for gcc. The cascaded predictor presented in this study, with a 4-entry filter, obtains a 23.7% misprediction rate with a 512-entry, four-way associative dual-path hybrid predictor as second stage. Hybrid prediction for conditional branches was first proposed in [McFar93]. Recent results can be found in [CHP95] and [ECP96]. Chen et al. [CCM96] propose Partial Prefix Matching prediction for conditional branch prediction and show that a PPM predictor performs better than a two-level predictor for a similar hardware budget. Since a PPM predictor chooses the prediction ....
S. McFarling. Combining Branch Predictors. WRL Technical Note TN-36, Digital Equipment Corporation, June 1993.
S. McFarling. Combining branch predictors. Digital Equipment Corporation, WRL Technical Note TN-36, 1993.
Scott McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Laboratory, June 1993.
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
S. McFarling. Combining branch predictors. WRL Technical Note TN-36, Digital Equipment Corporation, 1993.
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
S. McFarling. Combining branch predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.
McFarling, Scott. Combining Branch Predictors. Technical Report TN-36, Digital Equipment Corporation, Western Research Lab, June 1993.