T. Yeh and Y. Patt, "Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Computer Science and Engineering Div. Tech. Report CSE-TR-182-93, University of Michigan, Oct. 1993.
....predicted branch outcomes of the instruction stream. The prediction accuracy of the branch predictor is critical to the performance of the trace cache, as mispredictions result in large trace abort penalties. Thus, the branch predictor used in this research is the GAg correlated branch predictor [21], based on its high prediction accuracy and its ability to be generalized to do multiple branch prediction. In this scheme, a 16-bit global history register indexes into a pattern history table. The pattern history table actually consists of four sets of counters, which are then selected based on ....
T-Y Yeh, "Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Ph.D. Thesis, Department of Electrical Engineering and Computer Science, University of Michigan, 1993.
....uses this information to trigger data prefetches. Updating the ld/st cache only on a miss in the data cache is similar to BHGP updating the PT only on an I-cache miss. The Branch Address Cache (BAC) [17] predicts multiple branch target addresses every cycle by extending the 2-level branch predictor [18] so as to predict multiple branches per cycle; it uses the BAC to determine the starting addresses of the target basic blocks past K branches. The technique does not use the BAC to issue prefetches to the L1 I-cache, but focuses on simultaneously fetching several non-consecutive basic blocks from ....
T-Y Yeh, "Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Ph.D. Dissertation, University of Michigan, 1993.
....penalties and low cache miss penalties. Finally, we find that the application of profile-driven code layout and branch alignment techniques (without SCBP) can improve the performance of the dynamic correlated branch prediction techniques. 1 Introduction Recent work in branch prediction [3, 18, 25, 30, 31, 32, 33] has led to the development of both hardware and software schemes that achieve high prediction accuracy by exploiting branch correlation. The motivation for this work stems from the fact that the performance of superscalar and deeply pipelined processors can benefit significantly from a small ....
....increasing the overall prediction accuracy. Pan, So, and Rahmeh [25] appear to have been the first to use the term correlation. In their two-level adaptive scheme, they use a single hardware shift register of length k to record the previous directions of the last k branches (Yeh and Patt [31] refer to this scheme as GAs and to k as the history depth). The contents of the shift register are concatenated with some bits from the branch address to select one of the 2-bit counters in the BHT. Under the constraint of a fixed size BHT, McFarling [18] was able to achieve prediction accuracies ....
T. Yeh, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Computer Science and Engineering Div. Tech. Report CSE-TR-182-93, Univ. of Michigan, Ann Arbor, MI, Oct. 1993.
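The indexing described in the excerpt above (a k-bit global shift register concatenated with some branch address bits to select a 2-bit counter) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' design: the class name `GAsPredictor`, the choice of low-order word-aligned address bits, the bit ordering of the concatenation, and the initial counter state are all assumptions.

```python
class GAsPredictor:
    """Sketch of a GAs-style predictor: global history + address bits -> 2-bit counter."""

    def __init__(self, history_bits=4, addr_bits=2):
        self.k = history_bits          # history depth k (assumed value)
        self.a = addr_bits             # number of address bits used (assumed value)
        self.history = 0               # k-bit global shift register
        # One 2-bit counter per (address bits, history) pair; start weakly not-taken.
        self.counters = [1] * (1 << (self.k + self.a))

    def _index(self, pc):
        # Assumption: 4-byte instructions, so drop the two low PC bits first.
        addr_part = (pc >> 2) & ((1 << self.a) - 1)
        # Concatenate address bits (high) with history bits (low).
        return (addr_part << self.k) | self.history

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the global history, keeping only k bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.k) - 1)
```

With a strictly alternating taken/not-taken branch, the two history patterns that recur each map to their own counter, so the sketch predicts the pattern correctly once warmed up.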
....examining and issuing four to six instructions per cycle, with higher rates expected [4][5][6][7]. Successful use of these high issue rates requires careful tuning of the microarchitecture. There is a wealth of technological alternatives for this task. These include branch handling strategies [8], functional unit duplication [2] and instruction fetch, issue, completion and retirement policies [9]. The deciding factor between the various techniques is a function of the performance each adds, versus the cost each incurs. Unfortunately, this tradeoff analysis rarely takes power consumption ....
T. Yeh, Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors, PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 1993.
....line, and performs well on their execution model retiring less than 2.5 instructions per cycle on integer applications. Another major drawback is the requisite use of a 2-bit counter prediction scheme [22], which results in a higher misprediction rate than other schemes such as local schemes [17][28]. Moreover, as the I-cache line size keeps growing in current processors, the interleaving factor of the BTB grows as much and the collapsing logic becomes more complex. Although this approach is interesting, our purpose is to go one step beyond by fetching multiple basic blocks in a single cycle, ....
....single target, except for AaT when Aa is an indirect branch or a procedure return. Hence, the two-block-ahead BTB does not require more entries to achieve the same hit ratio as a conventional BTB. Associativity. Since most of the conditional branches are either mostly taken or mostly fall through [28], the BTB will often only record one of AaT or AaN. Thus the associativity required in the two-block-ahead BTB is not higher than in a conventional BTB. An entry AaX is mapped in the BTB as follows: low-order bits of A are used to address the set (BTB indexing) and both aX and high-order bits of ....
T. Yeh, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 1993.
....be taken. This is called the Branch Always method. This method usually increases performance since it has been found that the majority of branches are taken. The results, however, vary greatly between different programs based on the programmer awareness and the compiler. An average of 60% accuracy [16] is obtained using this technique. Programs with many loops may even obtain accuracies as high as 90% or more. An improved static scheme is to use the direction to the target address as a means of predicting the branch [9]. If the branch is a backward branch, the target address has a lower address ....
....Loops are a common occurrence in programs and have a strong tendency to be terminated by backward conditional branches that are disproportionately taken. Thus, this approach works well for loop-intensive programs. It achieves an average of 70% accuracy across a variety of benchmark applications [16]. A branch can also be predicted based on the opcode of the branch [1]. A study showed that branch if negative, branch if equal, and branch if greater than or equal are usually taken. This is due to these types of branches usually being used to terminate loops. This scheme tends to ....
[Article contains additional citation context not shown here]
Tse-Yu Yeh. Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors. PhD thesis, University of Michigan, 1993.
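The two static schemes mentioned in the excerpts above (Branch Always, and prediction by the direction of the target address) can be illustrated with a short sketch. The function names are hypothetical, and predicting forward branches as not taken is an assumption here, since the excerpt is truncated before it states that case.

```python
def predict_branch_always(branch_pc, target_pc):
    """Branch Always: every branch is predicted taken."""
    return True


def predict_by_direction(branch_pc, target_pc):
    """Predict taken for backward branches (target address below the branch).

    Assumption: forward branches are predicted not taken.
    """
    return target_pc < branch_pc


# A loop-closing branch at 0x120 jumping back to 0x100 is predicted taken,
# while a forward skip from 0x120 to 0x140 is predicted not taken.
loop_back = predict_by_direction(0x120, 0x100)
forward_skip = predict_by_direction(0x120, 0x140)
```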
....that can be observed, recorded, and exploited. The most successful application of branch correlation has been in dynamic branch prediction, where architects have built predictors that remember patterns in the stream of branch executions [McFarling 1993; Pan et al. 1992; Yeh and Patt 1991; 1993; Yeh 1993; Young et al. 1995]. Our work is related to the recent work in dynamic global predictors, predictors that identify and exploit repetitive behavior in the trace of all conditional branches executed by the program. If a branch always goes the same way when reached by a particular pattern of prior ....
....other compile-time optimizations with intents or effects similar to SCBP, while Section 5.4 discusses the links between SCBP and DFA minimization. 5.1 Dynamic Correlated Branch Prediction In 1991, Tse-Yu Yeh and Yale Patt introduced two-level adaptive prediction schemes [Yeh and Patt 1991; 1993; Yeh 1993]. These schemes capture branch history and use this history as an additional index into the traditional table of two-bit counters. Branch history is recorded in branch history shift registers (BHSRs) where a 1 represents a branch that took (jumped) and a 0 represents a branch that fell ....
Yeh, T. 1993. Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors. Tech. Rep. CSE-TR-182-93, Univ. of Michigan, Ann Arbor, MI. Oct.
....branch prediction schemes. Keywords: branch prediction, branch correlation, branch stream characteristics. 1 Introduction Recent work in branch prediction has led to the development of both hardware and software schemes that achieve good prediction accuracy by exploiting branch correlation [4, 9, 11, 14, 15, 16, 17]. However, little attention has been paid to why these schemes behave better than prior ones and to where further improvements can be made. In this paper, we describe an analytic framework that helps answer these questions based on the fundamental characteristics of the branch prediction problem. ....
....run. Surprisingly, there are very few designs for dynamic predictors. By far, the most popular dynamic predictor is the 2-bit saturating up-down counter [12]. This predictor forms the basis of all of the correlated branch predictors described by McFarling [9], Pan et al. [11], and Yeh and Patt [14, 15, 16]. Lee and Smith [7] observed that the execution streams of most program branches tend to occur in long runs and that an n-bit counter predictor can exploit this regularity. Smith [12] further observed that a 2-bit counter empirically provides an appropriate amount of damping (or hysteresis) ....
[Article contains additional citation context not shown here]
T. Yeh, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Computer Science and Engineering Div. Tech. Report CSE-TR-182-93, Univ. of Michigan, Ann Arbor, MI, Oct. 1993.
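The damping behavior of the 2-bit saturating up-down counter described in the excerpt above can be seen in a small sketch. The class name `TwoBitCounter`, the helper `count_correct`, the state encoding (0 to 3, with 2 and 3 meaning taken), and the loop trace are illustrative assumptions.

```python
class TwoBitCounter:
    """2-bit saturating up-down counter: states 0..3, predict taken when state >= 2."""

    def __init__(self, state=3):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Count up on taken, down on not taken, saturating at 3 and 0.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, state=3):
    """Return how many outcomes in the trace are predicted correctly."""
    counter = TwoBitCounter(state)
    correct = 0
    for taken in outcomes:
        correct += int(counter.predict() == taken)
        counter.update(taken)
    return correct


# Hypothetical trace: a loop branch taken 9 times, then falling through, run twice.
trace = ([True] * 9 + [False]) * 2
correct = count_correct(trace)
```

On this trace the counter mispredicts only the two loop exits: a single not-taken outcome moves it from state 3 to state 2, so it still predicts taken when the loop is re-entered.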
....and studied the issues involved in instruction prefetching and branch prediction [38] using both hardware and software schemes. There is an extraordinary amount of literature on branch prediction mechanisms, and a fair amount for instruction prefetching, and speculative execution. Consult [38, 39] for good starting points into the literature. The combined objective of instruction prefetching and branch prediction is to reduce the performance degradation experienced by processors in the presence of breaks in control flow. The five major causes for breaks in control flow are: intraprocedural ....
T.-Y. Yeh, Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors. PhD thesis, University of Michigan, Department of Electrical Engineering and Computer Science, 1993.
....may be represented several times in the BTB. However, our simulations show that the two-block-ahead BTB does not require many additional entries to achieve the same hit ratio as a conventional BTB. Associativity. Since most of the conditional branches are either mostly taken or mostly fall through [20], the BTB will often record only one of AaT or AaN. Thus the associativity required in the two-block-ahead BTB will not be much higher than in a conventional BTB. 4.3 Coping with Procedure Returns Procedure return jumps are a special case where using a BTB alone is inefficient: the target ....
T. Yeh, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, 1993.
....for return address prediction. In Multiscalar processors, as in scalar processors, a reasonably deep RAS is nearly perfect in predicting return addresses [2]. Microarchitectural structures have been proposed that combine exit and target address prediction to facilitate pipelined implementations [16][3]. 5 Adapting Branch Prediction to Tasks In applying scalar dynamic branch prediction to intertask prediction in Multiscalar processors, we equate Multiscalar tasks to scalar basic blocks and leave the overall structure of the predictor mechanism unchanged. However, there are a number of ....
T.-Y. Yeh. Two Level Adaptive Branch Prediction and Instruction Fetch Mechanism for High Performance Superscalar Processors. Ph.D. thesis, Dept. of Electrical Engineering & Computer Science, University of Michigan, 1993.
....storing counters with each branch achieves multiple branch prediction trivially, branch prediction accuracy is limited. Branch prediction is fundamental to ILP, and should have precedence over other factors. For high branch prediction accuracy, we use a 4kB GAg(14) correlated branch predictor [10]. The 14-bit global branch history register indexes into a single pattern history table. This predictor was chosen for its accuracy and because it is more easily extended to multiple branch predictions than other predictors which require address information [2][4]. It is relatively straightforward ....
T-Y Yeh, "Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," PhD Thesis, Department of Electrical Engineering and Computer Science, University of Michigan, 1993.
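The "4kB GAg(14)" figure in the excerpt above is consistent with a pattern history table of 2^14 entries holding 2 bits each. The sketch below works through that arithmetic; the 2-bit counter width is an assumption (the excerpt does not state it), and the function names are hypothetical.

```python
def pht_storage_bytes(history_bits, counter_bits=2):
    """Storage for a pattern history table indexed only by global history.

    Assumption: one counter of `counter_bits` bits per history pattern.
    """
    entries = 1 << history_bits          # one entry per possible history value
    return entries * counter_bits // 8   # convert bits to bytes


def gag_index(history, history_bits):
    """GAg-style index: the global history register alone selects the counter."""
    return history & ((1 << history_bits) - 1)


# 2**14 = 16384 counters * 2 bits = 32768 bits = 4096 bytes.
size_gag14 = pht_storage_bytes(14)
```

Because no branch address bits enter the index, every branch with the same recent global history shares one counter, which is what makes this organization simple to extend to several predictions per cycle.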
....to enhance instruction cache performance [4][7][8]. The combined effect of this work is to lessen the impact of instruction cache misses on fetch bandwidth. Branch prediction is the second factor that constrains fetching. Several recent studies address the accuracy of branch prediction [9][10][11]. But branch prediction alone is not sufficient to deliver high fetch bandwidth. Even when branches are predicted accurately, the fetch unit must extract multiple, non-sequential instructions from the instruction cache in one cycle. The layout of instructions in the cache often ....
....cache is accessed for A and B, then B and C, then C and D, etc. The interleaved sequential scheme must determine and eliminate any predicted non-sequential instructions before forwarding to the decoder. This is accomplished using a BTB interleaved by the number of instructions in a cache block [9]. A BTB query returns the successor block address and a bit pattern predicting which instructions in the fetched block are valid for decoding. The successor block address is used to invalidate the sequential prefetch block. The block address and bit pattern are found using a chain of comparators ....
[Article contains additional citation context not shown here]
T. Yeh, Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors. PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 1993.
....examining and issuing four to six instructions per cycle, with higher rates expected [3][4][5][6]. Successful use of these high issue rates requires careful tuning of the microarchitecture. There is a wealth of technological alternatives for this task. These include branch handling strategies [7], functional unit duplication [2] and instruction fetch, issue, completion and retirement policies [8]. The deciding factor between the various techniques is a function of the performance each adds, versus the cost each ....
T. Yeh, Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors, PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 1993.
....storing counters with each branch achieves multiple branch prediction trivially, branch prediction accuracy is limited. Branch prediction is fundamental to ILP, and should have precedence over other factors. For high branch prediction accuracy, we use a 4kB GAg(14) correlated branch predictor [15]. The 14-bit global branch history register indexes into a single pattern history table. This predictor was chosen for its accuracy and because it is more easily extended to multiple branch predictions than other predictors which require address information [16][2]. Multiporting the pattern ....
T.-Y. Yeh. Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors. PhD thesis, EECS Dept., University of Michigan - Ann Arbor, 1993.
....branches in the program. Branch correlation techniques split up the case of predicting a particular branch in the program into multiple separate cases, based on the history of other recent branches. In 1991, Tse-Yu Yeh and Yale Patt introduced two-level adaptive prediction schemes [60, 61, 62]. These schemes capture branch history and use this history as an additional index into the traditional table of 2-bit counters. Branch history is recorded in [pipeline figure: instr. fetch, buffer, decode, multi-issue slotting, register fetch, execute one, execute two, register write back] ....
T. Yeh. "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Computer Science and Engineering Div. Tech. Report CSE-TR-182-93. Univ. of Michigan, Ann Arbor, MI, Oct. 1993.
....stations. Highly accurate branch prediction and speculative execution are generally accepted as essential for high superscalar performance, so a hardware predictor with high prediction accuracy is incorporated in this processor model. Specifically, the two-level adaptive training branch predictor [13] is employed. This scheme consists of a 1024-entry table known as the History Register Table (HRT) which maintains a history of the last eight executions of a branch. The entries in this HRT point to locations in another 1024-entry table called the Pattern Table (PT). A branch prediction is made ....
T. Yeh. Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors. PhD thesis, Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 1993.
....MC88110 instruction level simulator (Archsim) reads in the object code and input data; it then simulates execution and produces an instruction trace. The instruction traces are then processed by either the restricted data flow (RDF) simulator [4] or the instruction fetch mechanism (IFM) simulator [37]. The RDF simulator simulates the execution of instructions on a given machine model, which is described in a configuration file read by the RDF simulator. The IFM simulator only simulates the actions involved in predicting instruction fetch address. Thus, the IFM simulator has a shorter ....
T.-Y. Yeh, Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors, PhD thesis, University of Michigan, 1993.
No context found.
T. Yeh and Y. Patt, "Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," Computer Science and Engineering Div. Tech. Report CSE-TR-182-93, University of Michigan, Oct. 1993.
No context found.
T. Yeh, "Two-Level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors," PhD Thesis, University of Michigan, 1993.
No context found.
T.-Y. Yeh. Two-level Adaptive Branch Prediction and Instruction Fetch Mechanisms for High Performance Superscalar Processors. PhD Thesis, University of Michigan - Ann Arbor, 1993.