24 citations found.
Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In The 19th Annual International Symposium on Computer Architecture (ISCA), pages 124-134, 1992.


This paper is cited in the following contexts:
Per-Branch History Length Fitting in Pattern-Based Branch.. - Park, Kim

....This is because the number of lost cycles due to branch misprediction is so large that a high misprediction rate can result in severe deterioration of the overall microprocessor performance. Various branch prediction schemes have been proposed to achieve high prediction accuracy. Yeh and Patt [8] proposed the two-level branch predictor, which employs two levels of history in branch prediction. The first level of history stores the outcomes of the most recent l branches in l-bit shift registers, or branch history registers. The second level of history is implemented with a Pattern History ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In The 19th Annual International Symposium on Computer Architecture (ISCA), pages 124-134, 1992.


Confidence Estimation for Speculation Control - Grunwald, Klauser, Manne.. (1998)   (41 citations)

....predictors. The benchmarks and important measurements from our simulations are listed in Table 1. We used three underlying branch predictors to compare the different confidence estimators: a speculative gshare predictor, a speculative McFarling combining predictor [12] and a non-speculative SAg [17] predictor. Figure 2 gives a schematic illustration of each branch predictor. The gshare branch predictor (Figure 2a) combines a global branch history with the program counter to select a two-bit counter. The SAg predictor (Figure 2b) uses the program counter to index into an untagged table of ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.
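
The gshare scheme mentioned in the excerpt above (a global branch history combined with the program counter to select a two-bit counter) can be sketched in a few lines of Python. This is a minimal illustration, not the simulator used by Grunwald et al.: the class name, the default 10-bit history, the weakly-not-taken initial counter state, and dropping the two low byte-offset address bits before the XOR are all assumptions.

```python
class GsharePredictor:
    """Minimal gshare sketch: a k-bit global history XORed with branch
    address bits selects one 2-bit saturating counter."""

    def __init__(self, k: int = 10):
        self.k = k
        self.mask = (1 << k) - 1
        self.ghr = 0                    # last k branch outcomes, newest in bit 0
        self.counters = [1] * (1 << k)  # 2-bit counters, start weakly not taken

    def index(self, pc: int) -> int:
        # Assumed: drop the 2 byte-offset bits, then XOR with the global history.
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the resolved outcome into the global history register.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


g = GsharePredictor(k=4)
g.ghr = 0b1010
# (0x44 >> 2) = 0b10001; 0b10001 ^ 0b1010 = 0b11011; low 4 bits = 0b1011
print(g.index(0x44))  # 11
```

Because the history is global, one static branch maps to different counters depending on the path that led to it, which is how gshare captures correlation between branches.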


Exploiting Load Latency Tolerance in Dynamically.. - Srikanth Srinivasan.. (1998)

....that are waiting for their operands and executing other independent instructions out of order, the processor is able to tolerate some long latency operations including cache misses. To find enough independent instructions, most processors employ sophisticated branch prediction mechanisms [17, 10] and allow speculative execution [8, 16, 3] committing results only when the true outcome of a branch is known. Unfortunately, because of finite resources and imperfect branch prediction, some operations must complete quickly to maximize processor performance. Consider a processor capable of ....

Tse-Yu Yeh and Yale N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In ISCA92, pages 124--134, 1992.


Load Latency Tolerance In Dynamically Scheduled Processors - Srinivasan, Lebeck (1999)   (32 citations)

....other independent instructions out of order, the processor is able to tolerate some long latency operations including cache misses with almost no overall performance degradation. To find enough independent instructions, most processors employ sophisticated branch prediction mechanisms [11, 29] and allow speculative execution [19, 12] committing results only when the true outcome of a branch is known. However, limitations due to finite resources, data dependencies, and imperfect branch prediction render the processor unable to tolerate the latencies of some long latency operations. ....

Tse-Yu Yeh and Yale N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124--134, May 1992.


Branch Prediction and Multithreading (V2) - Hily, Seznec (1996)

....to deal with the branch problem on single-threaded architectures. Smith [Smi81] studied several hardware schemes for predicting branch directions, Lee and Smith [LS84] illustrated the interest of combining a good history prediction with BTB, Yeh and Patt [YP92a] devised the two-level adaptive branch prediction while McFarling proposed combining branch predictors [McF93]. Branch prediction strategies for superscalar architectures now achieve more than 90% accuracy. However, the effectiveness or even simply the usefulness of these prediction mechanisms ....

....uses opcodes, profiling statistics or branch direction and is performed before execution. Dynamic prediction uses run-time history; therefore predictions are not known until run time. Among branch prediction policies, one can find, by order of efficiency (as well as of complexity) [LS84, YP92a]: not taken, taken, BTFN (backward branches taken, forward branches not taken), compiler-directed, 1-bit history, 2-bit history and two-level algorithm. The first four policies, which are static schemes, have a relatively low accuracy (less than 75% for the compiler-directed algorithm). The last ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. 19th International Symposium on Computer Architecture, pages 124--134, May 1992.
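
The first three static policies listed in the excerpt above depend only on the branch itself, so each fits in a one-line function. This is a sketch, assuming a branch is described by its own address and its target address; the function names are hypothetical. Compiler-directed prediction needs opcode or profile information and is not shown.

```python
def predict_taken(pc: int, target: int) -> bool:
    """Static policy: every branch is predicted taken."""
    return True


def predict_not_taken(pc: int, target: int) -> bool:
    """Static policy: every branch is predicted not taken."""
    return False


def predict_btfn(pc: int, target: int) -> bool:
    """BTFN: backward branches (target at or below the branch address,
    as in a loop back-edge) are predicted taken; forward branches are not."""
    return target <= pc


# A loop back-edge from 0x120 to 0x100 versus a forward skip to 0x140.
print(predict_btfn(0x120, 0x100), predict_btfn(0x120, 0x140))  # True False
```

BTFN works well on loop-heavy code because loop-closing branches are backward and usually taken.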


How Useful Are Non-blocking Loads, Stream Buffers, and.. - Farkas, Jouppi, Chow (1994)   (8 citations)

....latency. In addition to cache-miss-induced stalls, branch instructions may also introduce stalls if the branch delay slot(s) cannot be filled by the compiler or if the branch direction is mispredicted. In view of the high correct prediction rates reported by McFarling [14] and Yeh and Patt [15], we assume that all branches are correctly predicted dynamically with the exception of 5% of conditional branches. For conditional branches, the model associates a two-cycle stall with each mispredict. We implement this penalty during the simulation of a benchmark by adding a single-cycle stall ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. Proceedings of the 19th Intl. Symp. on Computer Architecture, pages 124--134, May 1992.
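
The branch model in the Farkas et al. excerpt reduces to simple arithmetic: expected stall cycles equal the number of conditional branches, times the 5% misprediction rate, times the two-cycle penalty. The sketch below encodes that formula; the function name is hypothetical and the branch count in the example is a made-up number, not a result from the paper.

```python
def mispredict_stall_cycles(cond_branches: int,
                            miss_rate: float = 0.05,
                            penalty_cycles: int = 2) -> float:
    """Expected stall cycles when a fixed fraction of conditional branches
    is mispredicted and each misprediction stalls for a fixed number of cycles."""
    return cond_branches * miss_rate * penalty_cycles


# Hypothetical benchmark with 1,000,000 conditional branches:
# 1,000,000 * 0.05 = 50,000 mispredictions, * 2 cycles = 100,000 stall cycles.
print(mispredict_stall_cycles(1_000_000))
```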


Speculative Execution and Instruction-Level Parallelism - Wall (1994)   (1 citation)

....that at least some of the instructions we speculatively execute will be useful. It is possible to use a combination of the two, fanning out part of the time and predicting the rest of the time. This paper presents results concerning three questions. First, recent work in branch prediction [8, 10, 18, 19] has shown how to use very large predictors to improve the performance of hardware predictors from around 92% success to around 98%. What effect does this have on instruction-level parallelism? Second, how useful is a fan-out capability, both by itself and in combination with a predictor? Third, ....

....but unfortunately increasing the size of the table does not help much; the success rate levels off at 92% or 93%, regardless of the table size. Recent studies have explored more sophisticated hardware prediction using branch histories [10, 18, 19]. These approaches maintain tables relating the recent history of the branch (or of branches in the program as a whole) to the likely next outcome of the branch. These approaches do quite poorly with small tables, but unlike the two-bit counter schemes they can benefit from much larger ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. Nineteenth Annual International Symposium on Computer Architecture, 124--134, May 1992. Published as Computer Architecture News 20(2).


Using Software-Extended Architectures for Software.. - Witchel, Kaashoek (1997)

....is a key parameter of the translator, and is analogous to the hardware branch speculation policy of modern processors. Because the translator is implemented in software, its policies can be aggressively tailored for a given application. While the literature on hardware branch prediction is large [YP92, MEP96, GYCS96], we believe that SEA could open up a new trade-off space for branch prediction algorithms based on information collected by SEA software. These new branch prediction schemes could either replace hardware prediction, or work with hardware prediction schemes. Like other SEA ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proc. 19th Annual Symposium on Computer Architecture, pages 124--134, May 1992.


Dynamic Feature Selection for Hardware Prediction - Fern, Givan, Falsafi.. (2000)   (6 citations)

....multiple specialized predictors to combine). 1 Introduction. Hardware prediction and speculation is emerging as a key technique to hide latency and improve performance in computer systems. Computer architects are exploiting predictive techniques in a variety of tasks such as branch prediction [28,29], value prediction [19], cache way prediction [7], memory address [13,1], dependence [21], and sharing prediction [17,16]. In all cases, these hardware predictors capitalize on repetitive application behavior resulting in predictability in system event outcomes. By predicting such outcomes ....

....that, given its internal state and the values of some input bits, or features, outputs a prediction of the unknown value of a particular bit. For instance, state-of-the-art branch predictors typically maintain some internal state such as a table of 2-bit saturating counters and use local history [29] (corresponding to prior outcomes of the same branch), global history [20,29] (corresponding to prior outcomes of any branch), branch program counters [22], or register values [25] as input features. Hybrid predictors may incorporate two or more input feature types [9]. The vast majority of ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, May 1992.
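
The table of 2-bit saturating counters named in the Fern et al. excerpt is the shared building block of most of the predictors cited on this page. A minimal sketch of one counter follows, assuming the usual encoding in which states 0 and 1 predict not taken and states 2 and 3 predict taken; the class name and the weakly-not-taken default are assumptions.

```python
class SaturatingCounter:
    """2-bit saturating counter: states 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Count up on taken, down on not taken, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


# A single not-taken outcome does not flip a strongly-taken counter.
c = SaturatingCounter(state=3)
c.update(False)
print(c.state, c.predict())  # 2 True
```

The second bit provides hysteresis: a loop branch that falls through once per loop execution costs one misprediction rather than two, because the counter only drops from 3 to 2 and still predicts taken when the loop is re-entered.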


Branch Prediction and Simultaneous Multithreading - Hily, Seznec (1996)   (6 citations)

....architectures, extensive research studies have been conducted on software and hardware mechanisms. Smith [Smi81] studied several hardware schemes for predicting branch directions, Lee and Smith [LS84] illustrated the interest of combining a good history prediction with BTB, Yeh and Patt [YP92a] imagined the two-level adaptive branch prediction while McFarling proposed combining branch predictors [McF93]. Branch prediction strategies for superscalar architectures now achieve more than 90% accuracy. However, the effectiveness or even simply the usefulness of these prediction mechanisms ....

....prediction uses opcode, profiling statistics or branch direction and is predetermined before execution. Dynamic prediction uses run-time history and predictions are not known until run time. Among branch prediction policies, one can find, by order of efficiency (as well as of complexity) [LS84, YP92a]: not taken, taken, BTFN (backward branches taken, forward branches not taken), compiler-directed, 1-bit history, 2-bit history and two-level algorithm. The first four policies, which are static schemes, have a relatively low efficiency (less than 75% for the compiler-directed algorithm). The next ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. 19th International Symposium on Computer Architecture, pages 124--134, May 1992.


Branch Prediction Using Large Self History - John Johnson December

....They also report that through the use of nonuniform history retention, two bits for storing the state of a state-machine branch predictor perform about as well as a five-bit taken/not-taken history sequence. Yeh and Patt report a few results for histories of 6 to 12 bits [YP91] and 6 to 18 bits [YP92] for the case of their Two-Level Adaptive Branch Prediction. Achieving high prediction rates for the looping behavior found in some programs requires a moderately large amount of self history. Assume a conditional branch is used to implement a loop that repeats 9 times then exits on the ....

Tse-Yu Yeh and Yale N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Conference Proceedings, The 19th Annual Symposium on Computer Architecture, pages 124--134, May 1992.
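
Johnson's loop example is cut off, but its point, that loop behavior needs enough self history, can be checked with a small simulation. This sketch assumes a loop branch that is taken trip_count - 1 times and then not taken once per loop execution, whose own n-bit history indexes a table of 2-bit counters. The function name, the warm-up of 5 loop executions, and the 50 executions are arbitrary choices, not values from the excerpt.

```python
def loop_mispredictions(history_bits: int, trip_count: int = 9,
                        executions: int = 50) -> int:
    """Count mispredictions of one loop branch (taken trip_count - 1 times,
    then not taken once) predicted from its own history, skipping the
    first 5 loop executions as warm-up."""
    mask = (1 << history_bits) - 1
    counters = [1] * (1 << history_bits)  # 2-bit counters, weakly not taken
    history = 0                           # self history, newest outcome in bit 0
    misses = 0
    for run in range(executions):
        for i in range(trip_count):
            taken = i < trip_count - 1
            predicted = counters[history] >= 2
            if run >= 5 and predicted != taken:
                misses += 1
            c = counters[history]
            counters[history] = min(c + 1, 3) if taken else max(c - 1, 0)
            history = ((history << 1) | int(taken)) & mask
    return misses


print(loop_mispredictions(4), loop_mispredictions(8))  # 45 0
```

With trip_count = 9, four bits of history cannot distinguish the exit from the middle iterations (all see 1111), so each of the 45 measured loop executions ends in a misprediction; eight bits (trip_count - 1) are enough to predict the exit perfectly after warm-up.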


Expansion Caches For Superscalar Machines - Johnson (1996)   (2 citations)

....for both branch prediction and cache implementation. Static branch prediction is used by the machines presented here. However, it is possible to improve branch prediction rates with more sophisticated hardware methods that record run-time behavior of the programs [Los82, LS84, YP91, PSR92, YP92, YP93, YS94, YGS95]. The effect of improved branch prediction accuracy on machine performance is the subject of the next section. 6.2 Cost of a Mispredicted Branch. This section investigates the performance costs of mispredicted branches. In a machine without speculative execution and a ....

Tse-Yu Yeh and Yale N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Conference Proceedings, The 19th Annual Symposium on Computer Architecture, pages 124--134, May 1992.


Improving Semi-static Branch Prediction by Code Replication - Krall (1994)   (20 citations)

....The second level of history information contains the different patterns of history register values combined with a two-bit counter, which gives the branch prediction for this pattern. Yeh and Patt presented a strategy with a history register for each branch and a pattern table for each branch [YN92]. Pan, So and Rahmeh presented a strategy with a single global history register and a pattern table for each branch [PSR92]. Since their strategy depends on the correlation of different branches, they called it branch correlation. Later Yeh and Patt studied all nine combinations of one global ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture. ACM, 1992.
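
The nine combinations mentioned in the Krall excerpt follow Yeh and Patt's naming: the first letter says how branch history registers are kept (G: one global register, P: one per branch address, S: one per set of branches), the A stands for adaptive, and the last letter says how pattern history tables are kept (g, p, s, with the same meanings). The sketch below shows how the two choices select a counter. Defining a set as the low bits of the word address, and using unbounded dictionaries instead of fixed-size hardware tables, are simplifying assumptions; the class name is hypothetical.

```python
from collections import defaultdict


class TwoLevelVariant:
    """Sketch of the xAy family (GAg, GAp, GAs, PAg, ..., SAs)."""

    def __init__(self, name: str, history_bits: int = 6, set_bits: int = 2):
        if (len(name) != 3 or name[0] not in "GPS" or name[1] != "A"
                or name[2] not in "gps"):
            raise ValueError(f"not a two-level variant name: {name!r}")
        self.first, self.second = name[0], name[2]
        self.mask = (1 << history_bits) - 1
        self.set_mask = (1 << set_bits) - 1
        self.bhr = defaultdict(int)        # history registers, keyed by level-1 selector
        self.pht = defaultdict(lambda: 1)  # 2-bit counters keyed by (table, pattern)

    def _select(self, kind: str, pc: int) -> int:
        kind = kind.lower()
        if kind == "g":                    # one shared register or table
            return 0
        if kind == "p":                    # one per branch address
            return pc
        return (pc >> 2) & self.set_mask   # one per set (assumed: low word-address bits)

    def _key(self, pc: int) -> tuple:
        pattern = self.bhr[self._select(self.first, pc)]
        return (self._select(self.second, pc), pattern)

    def predict(self, pc: int) -> bool:
        return self.pht[self._key(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        key = self._key(pc)
        c = self.pht[key]
        self.pht[key] = min(c + 1, 3) if taken else max(c - 1, 0)
        reg = self._select(self.first, pc)
        self.bhr[reg] = ((self.bhr[reg] << 1) | int(taken)) & self.mask


names = [f + "A" + s for f in "GPS" for s in "gps"]
print(names)  # ['GAg', 'GAp', 'GAs', 'PAg', 'PAp', 'PAs', 'SAg', 'SAp', 'SAs']
```

In GAg every branch updates the single history register, while in PAp a branch's outcome touches only its own history register and its own pattern table.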


Compiling For Multithreaded Architectures - Tang (1999)   (1 citation)

....taken at runtime, the predicted branch is executed before the branch condition is known. After the condition is determined, the speculative branch is either committed or squashed. Branch prediction is usually carried out with hardware support, such as branch predictors [141, 142] and branch target buffers [78, 115]. Similar techniques can also be used to support thread control speculation. However, speculative execution on the thread level is generally harder than on the instruction level. 1 Latencies of some later stages are increased to deal with large register ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 19--21, 1992. ACM SIGARCH and IEEE Computer Society. Computer Architecture News, 20(2), May 1992.


Design And Evaluation Of A Multiscalar Processor - Breach (1998)   (10 citations)

....behind this approach is that sections of code deal with related information, so control flow points dependent on a particular condition are likely to be placed near control flow points for related conditions. Exploiting the correlation between these conditions may lead to more accurate prediction [104, 106]. An illustration of this approach is given in Figure 5.6. The figure shows N target patterns assembled in a pattern register. Each target pattern provides B bits, for a total of I bits, which are XORed with I bits from the supplied address (a technique introduced by McFarling with the gshare ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 19--21, 1992. ACM SIGARCH and IEEE Computer Society TCCA. Computer Architecture News, 20(2), May 1992.
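
The index construction in the Breach excerpt (N target patterns of B bits concatenated into an I-bit register, I = N × B, then XORed with I address bits, gshare-style) can be sketched as below. The excerpt does not say how a target pattern is derived from a target or which address bits are used; this sketch assumes the caller supplies a B-bit pattern for each resolved prediction and that the low I bits of the address are used. The class name is hypothetical.

```python
class TargetPatternRegister:
    """Keeps the last n B-bit target patterns in one I-bit register
    (I = n * b) and XORs it with I address bits to form a table index."""

    def __init__(self, n: int, b: int):
        self.b = b
        self.i_bits = n * b
        self.mask = (1 << self.i_bits) - 1
        self.register = 0

    def push(self, target_pattern: int) -> None:
        # Shift out the oldest B-bit pattern and shift in the newest one.
        pattern = target_pattern & ((1 << self.b) - 1)
        self.register = ((self.register << self.b) | pattern) & self.mask

    def index(self, address: int) -> int:
        # Assumed: use the low I bits of the supplied address.
        return (self.register ^ address) & self.mask


reg = TargetPatternRegister(n=3, b=2)  # I = 6 bits
for p in (1, 3, 2):
    reg.push(p)
print(bin(reg.register), reg.index(0b101010))  # 0b11110 52
```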


Limits of Instruction-Level Parallelism - Wall (1993)   (230 citations)

....is some expense in undoing speculative execution when the branch goes the other way, we might impose a threshold so that we don't move instructions across a branch that is executed only 51% of the time. Recent studies have explored more sophisticated hardware prediction using branch histories [PSR92, YP92, YP93]. These approaches maintain tables relating the recent history of the branch (or of branches in the program as a whole) to the likely next outcome of the branch. These approaches do quite poorly with small tables, but unlike the two-bit counter schemes they can benefit from much larger predictors. ....

....the recent history of the branch (or of branches in the program as a whole) to the likely next outcome of the branch. These approaches do quite poorly with small tables, but unlike the two-bit counter schemes they can benefit from much larger predictors. An example is the local history predictor [YP92]. It maintains a table of n-bit shift registers, indexed by the branch address as above. When the branch is taken, a 1 is shifted into the table entry for that branch; otherwise a 0 is shifted in. To predict a branch, we take its n-bit history and use it as an index into a table of 2^n 2-bit ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. Nineteenth Annual International Symposium on Computer Architecture, 124--134, May 1992. Published as Computer Architecture News 20(2).
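
The local history predictor in Wall's description maps directly onto two tables, and corresponds to what Yeh and Patt's naming calls PAg (per-address history registers, one shared pattern table). A minimal Python sketch, assuming a direct-mapped history table indexed by the word address modulo its size and counters initialized to weakly not taken; the class name and table sizes are illustrative, not Wall's or Yeh and Patt's parameters.

```python
class LocalHistoryPredictor:
    """A table of n-bit shift registers indexed by branch address; each
    register's contents index a table of 2**n two-bit counters."""

    def __init__(self, n: int = 8, history_entries: int = 1024):
        self.mask = (1 << n) - 1
        self.entries = history_entries
        self.histories = [0] * history_entries  # one n-bit shift register per entry
        self.counters = [1] * (1 << n)          # 2**n 2-bit counters, weakly not taken

    def _entry(self, pc: int) -> int:
        # Assumed: index the history table with the word address.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self.histories[self._entry(pc)]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        e = self._entry(pc)
        h = self.histories[e]
        c = self.counters[h]
        self.counters[h] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift in a 1 for taken, a 0 for not taken.
        self.histories[e] = ((h << 1) | int(taken)) & self.mask


p = LocalHistoryPredictor(n=2)
outcomes = [i % 2 == 0 for i in range(100)]  # T, N, T, N, ...
misses = 0
for i, taken in enumerate(outcomes):
    if i >= 10 and p.predict(0x400) != taken:
        misses += 1
    p.update(0x400, taken)
print(misses)  # 0
```

Since every branch shares the single table of 2^n counters, two branches with the same recent history also share a counter; giving each branch its own pattern table (the PAp variant) removes that interference at a higher hardware cost.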


Next Cache Line and Set Prediction - Calder (1995)   (26 citations)

....604 has a 64-entry fully associative BTB that holds the target address of the most recently taken branches, and uses a separate 512-entry pattern history table (PHT) to predict the direction for conditional branches. There are several different PHT variations. Pan et al. [12] and Yeh and Patt [20, 22] investigated branch correlation or two-level branch prediction mechanisms. Although there are a number of variants, these mechanisms generally combine the history of several recent branches to predict the outcome of a branch. The simplest example is the degenerate method of Pan et al. [12]. When ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.


Fast Accurate Instruction Fetch and Branch Prediction - Calder, Grunwald (1994)   (8 citations)

....flow; Yeh and Patt's design includes additional information indicating whether the break is a conditional branch, unconditional jump, indirect jump or a return instruction. Each BTB entry also contains a per-basic-block pattern history register, used to index into a 2-level branch history table [20, 22]. Architectures using BTBs can issue a large number of instructions per cycle because of accurate branch and fetch prediction. However, BTBs lead to a complex architecture. In this paper, we show how to achieve the same or better performance using simpler techniques. We do this by: ....

....and 2-bit techniques that yield much better performance for programs with loops [10, 13, 17]. The advantage of the pattern history tables is that they keep track of very little information per conditional branch site and are very effective in practice. More recently Pan et al. [14] and Yeh and Patt [20, 22] have proposed branch correlation or two-level branch prediction mechanisms. Although there are a number of variants, these mechanisms generally combine the history of several recent branches to predict the outcome of an incipient branch. The simplest example is the so-called degenerate method of ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.


Branch Prediction Architectures for 64-bit Address Space - Brad Calder (1993)   (1 citation)

....flow; Yeh and Patt's design includes additional information indicating whether the break is a conditional branch, unconditional jump, indirect jump or a return instruction and each BTB entry contains a per-basic-block prediction history register, used to index into a 2-level branch history table [15, 17]. Architectures using BTBs can issue a large number of instructions per cycle because of accurate branch and fetch prediction. However, BTBs lead to a complex architecture. In this paper, we show how to achieve the same or better performance using simpler techniques. We do this using: A ....

....and 2-bit techniques that yield much better performance for programs with loops [13, 8, 10]. The advantage of these bit-table techniques is that they keep track of very little information per conditional branch site and are very effective in practice. More recently Pan et al. [11] and Yeh and Patt [15, 17] have proposed branch correlation or two-level branch prediction mechanisms. Although there are a number of variants, these mechanisms generally combine the history of several recent branches to predict the outcome of an incipient branch. In this paper, we are primarily concerned with 2-level ....

[Article contains additional citation context not shown here]

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.


Memory Sharing Predictor: The Key to a Speculative Coherent DSM - An-Chow Lai (1999)   (16 citations)

....capturing sharing patterns in protocol states often limits the protocol to learning one sharing pattern per memory block at a time. In a recent paper [17], Mukherjee and Hill advocate using a general pattern-based predictor derived from Yeh and Patt's two-level adaptive PAp branch predictor [23] to learn and predict the coherence activity for a memory block in a DSM. By accurately predicting and performing the necessary coherence operations speculatively in advance, a predictor-based DSM can potentially eliminate all of the coherence overhead, resulting in remote accesses that are as ....

....predicts the read access by P1, it can invalidate and forward the block to P1 well in advance to hide the entire latency of the remote read. 2.1 Pattern-Based Message Predictors. A pattern-based coherence message predictor is derived from the widely used two-level adaptive PAp branch predictor [23]. Figure 2 depicts the anatomy of a two-level message predictor capturing message sequences for memory blocks at the directory. A history table records the most recent sequence of incoming coherence messages for every memory block. A pattern table records all observed sequences of coherence ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, May 1992.
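
The message predictor in the Lai excerpt reuses the PAp structure with coherence messages in place of taken/not-taken bits: a per-block history table feeding a per-block pattern table. A minimal sketch, assuming messages are hashable values such as (type, processor) tuples, a fixed history depth, and a pattern table that simply remembers the message that last followed each history. The class name, message names, and the last-successor policy are assumptions, not details from the excerpt.

```python
from collections import defaultdict, deque


class MessagePredictor:
    """Per-block two-level coherence message predictor (PAp-style sketch)."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        # Level 1: the most recent `depth` messages for every memory block.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Level 2: per-block pattern table, history tuple -> message that followed it.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next message for `block`, or None if unseen."""
        return self.patterns[block].get(tuple(self.history[block]))

    def observe(self, block, message) -> None:
        h = tuple(self.history[block])
        if len(h) == self.depth:
            self.patterns[block][h] = message  # learn: history -> next message
        self.history[block].append(message)


WRITE_P0 = ("write", "P0")
READ_P1 = ("read", "P1")
mp = MessagePredictor(depth=2)
for msg in (WRITE_P0, READ_P1, WRITE_P0, READ_P1):
    mp.observe(0x80, msg)
print(mp.predict(0x80))  # ('write', 'P0')
```

For a producer/consumer block, a correct prediction of the next read by the consumer is what would let the directory forward the block early, as the excerpt describes.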


The Precomputed Branch Architecture - Calder, Grunwald (1999)

....we want to remove, branch misfetch and mispredict penalties. A branch target buffer can be used to reduce the misfetch penalty and can be used as a simple branch prediction mechanism. Other branch prediction methods can reduce mispredict penalties, but not misfetch penalties. Many architectures [33, 40, 42] combine branch target buffers and other branch prediction mechanisms to reduce both misfetch and mispredict penalties. 2.1 Branch Target Buffers. Misfetch penalties can be reduced in a number of ways, such as using branch delay slots [24], a table of cache indices for fetch prediction [8], or ....

....the branch direction, even if that information was recorded for one of the other branches. The advantage of the pattern history tables is that they keep track of very little information per conditional branch site and are very effective in practice. More recently Pan et al. [28] and Yeh and Patt [40, 42] have proposed branch correlation or two-level branch prediction mechanisms. Although there are a number of variants, these mechanisms generally combine the history of several recent branches to predict the outcome of an incipient branch. The simplest example is the so-called degenerate method ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.


Reducing Indirect Function Call Overhead In C++ Programs - Calder, Grunwald (1994)   (77 citations)

....an indirect function call can have a large number of potential targets. We have measured programs with 191 different subroutines called from a single indirect function call. This makes branches easier to predict. Some dynamic branch prediction mechanisms achieve 95% to 97% prediction accuracy [21, 27, 29]. This level of accuracy is needed for superscalar processors issuing several instructions per cycle [28]. The most relevant prior work was on predicting the destination of indirect function calls with hardware conducted by David Wall [26] while examining limits to instruction-level parallelism. ....

Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 1992. ACM.


Incorporating Guarded Execution into Existing Instruction Sets - Pnevmatikatos (1996)   (1 citation)

....the code that was incorrectly executed. Dynamic branch prediction utilizes information collected at run time to decide which is the most likely direction for a branch. Generally, dynamic branch prediction is based on maintaining a table of counters that record the past behavior of branches [Smi81, YP92, YP93] and a selection mechanism (called a divisor by Young et al. [YGS95]) that will determine which counter to use for the prediction of each branch. Because the direction of a branch is not by itself sufficient to allow the instruction fetch mechanism to commence fetching new instructions, ....

Tse-Yu Yeh and Yale N. Patt. Alternative Implementations of Two-Level Adaptive Branch Prediction. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 124--134, Gold Coast, Australia, May 19--21, 1992.

CiteSeer.IST - Copyright Penn State and NEC