C. Perleberg and A. Smith, "Branch Target Buffer Design and Optimization", IEEE Transactions on Computers, 42(4): pages 396-412, 1993.
....To exploit instruction-level parallelism, modern processors support out-of-order execution engines. This requires fetching instructions across basic block boundaries to keep the execution units busy. Many instructions will be fetched and issued before a branch target address can be resolved. A BTB [16] is used for branch target address prediction. There are two flavors of BTB design, coupled and decoupled, where the tradeoff between performance and energy is fixed at design time. The coupled BTB design is used in the Alpha 21264 [12] and UltraSPARC II [1], where the BTB is integrated with the ....
C. Perleberg and A. Smith. Branch target buffer design and optimization. IEEE Trans. Computers, 42(4):396--412, 1993.
....performance. By introducing OS-aware prediction into the Agree predictor, up to 7% IPC improvement can be achieved. The performance of the Agree predictor depends largely on branch biases and on whether the biased behavior can be identified the first time the branch is introduced into the BTB [18]. If the branch does not show strongly biased behavior, there is still frequent aliasing between instances of a branch that do not comply with the biasing bit and instances that do. Once we incorporate OS-aware predictions into the Agree predictor, the filtering out of ....
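The agree mechanism referenced in the excerpt above can be sketched as follows. This is a minimal illustration, assuming the biasing bit is set from the branch's first outcome when it enters the BTB and that a table of 2-bit counters predicts whether the branch will agree with that bit. The class name, table size, and PC-based indexing are hypothetical, and the OS-aware extension is not modeled.

```python
class AgreePredictor:
    """Agree-predictor sketch; sizes, names, and indexing are hypothetical."""

    def __init__(self, pht_bits: int = 10):
        self.mask = (1 << pht_bits) - 1
        # 2-bit counters: values 2-3 mean "agree with the biasing bit".
        self.pht = [2] * (1 << pht_bits)  # start at weakly agree
        # Per-branch biasing bit, standing in for the bit kept in the BTB entry.
        self.bias = {}

    def predict(self, pc: int) -> bool:
        if pc not in self.bias:
            return False  # branch not in the BTB yet: fall back to not taken
        agree = self.pht[pc & self.mask] >= 2
        return self.bias[pc] if agree else not self.bias[pc]

    def update(self, pc: int, taken: bool) -> None:
        if pc not in self.bias:
            # First time the branch enters the BTB: its outcome sets the bias.
            self.bias[pc] = taken
            return
        i = pc & self.mask
        if taken == self.bias[pc]:
            self.pht[i] = min(3, self.pht[i] + 1)  # strengthen "agree"
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # move toward "disagree"
```

In this sketch, two branches that map to the same counter but have opposite biases both strengthen the "agree" state, so after their first occurrence both are predicted correctly. A branch whose first outcome is unrepresentative gets a wrong biasing bit, which is the weakness the excerpt describes.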
C. Perleberg and A. Smith, Branch Target Buffer Design and Optimization, IEEE Transactions on Computers, 42(4): pages 396-412, 1993.
....branch and fetch prediction for wide-issue architectures. A BTB entry holds the taken target address for a branch along with other information, such as the type of the branch, conditional branch prediction information, and possibly the fall-through address of the branch. Perleberg and Smith [26] conducted a detailed study of BTB design for single-issue processors. They even looked at using a multi-level BTB design, where each level contains different amounts of prediction information. Because of the cycle time, area costs, and branch miss penalties they were considering at the time of ....
....multi-level BTB design, where each level contains different amounts of prediction information. Because of the cycle time, area costs, and branch miss penalties they were considering at the time of their study, they found that the additional complexity of the multi-level BTB was not cost-effective [26]. Technology has changed since then, and as we show in this paper, a multi-level branch prediction design is advantageous. Yeh and Patt proposed using a Basic Block Target Buffer (BBTB) [42, 43]. The BBTB is indexed by the starting address of the basic block. Each entry contains a tag, type ....
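The BTB entry contents listed in the excerpt above (taken target, branch type, conditional prediction state, optional fall-through address) can be sketched as a direct-mapped table. This is a minimal illustration assuming power-of-two sizing, 4-byte instructions, and full tags; the field and class names are hypothetical, and it models a single-level BTB rather than the multi-level or BBTB designs discussed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BranchType(Enum):
    CONDITIONAL = 0
    UNCONDITIONAL = 1
    CALL = 2
    RETURN = 3
    INDIRECT = 4


@dataclass
class BTBEntry:
    tag: int
    target: int                         # taken target address
    kind: BranchType                    # type of the branch
    counter: int = 2                    # 2-bit conditional state (>= 2 means taken)
    fall_through: Optional[int] = None  # optional fall-through address


class DirectMappedBTB:
    def __init__(self, entries: int = 1024, inst_bytes: int = 4):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.entries = entries
        self.index_bits = entries.bit_length() - 1
        self.offset_bits = inst_bytes.bit_length() - 1  # 2 for 4-byte instructions
        self.table: list[Optional[BTBEntry]] = [None] * entries

    def _split(self, pc: int) -> tuple[int, int]:
        """Return (index, tag) for a branch address."""
        word = pc >> self.offset_bits
        return word & (self.entries - 1), word >> self.index_bits

    def lookup(self, pc: int) -> Optional[BTBEntry]:
        index, tag = self._split(pc)
        entry = self.table[index]
        return entry if entry is not None and entry.tag == tag else None

    def insert(self, pc: int, target: int, kind: BranchType,
               fall_through: Optional[int] = None) -> None:
        index, tag = self._split(pc)
        # Direct-mapped: any previous occupant of this index is overwritten.
        self.table[index] = BTBEntry(tag, target, kind, fall_through=fall_through)
```

Because the table is direct-mapped, two branches whose addresses share the same index bits evict each other; a lookup only hits when the stored tag matches.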
C.H. Perleberg and A.J. Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412, 1993.
....The iteration counter is used to (1) predict when the loop has terminated, and (2) to initialize the loop trip count. The loop termination prediction information described above could be stored directly into a Branch Target Buffer used for predicting target addresses during branch prediction [9, 15]. Instead of doing this, in our study we examine having a small associative buffer on the side to accurately predict the loop termination for loop branches. 2.2 Loop Termination Buffer The Loop Termination Buffer (LTB) is a small hardware structure capable of detecting the impending termination ....
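The use of an iteration counter described above, both to learn a loop's trip count and to predict its termination, can be sketched as a small table keyed by branch address. This is a minimal illustration, not the cited LTB design: it assumes a loop branch is taken on every iteration except the last and that the trip count repeats from one execution of the loop to the next, and it uses hypothetical names and simple FIFO eviction. Confidence tracking is omitted.

```python
from dataclasses import dataclass


@dataclass
class LTBEntry:
    trip_count: int = 0  # taken outcomes in the last complete run (0 = unknown)
    iteration: int = 0   # taken outcomes seen so far in the current run


class LoopTerminationBuffer:
    """Loop-termination sketch for backward loop branches (hypothetical interface)."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.entries: dict[int, LTBEntry] = {}

    def predict(self, pc: int) -> bool:
        """Predict taken unless the learned trip count says this is the exit."""
        e = self.entries.get(pc)
        if e is None or e.trip_count == 0:
            return True  # default for a loop branch: keep looping
        return e.iteration < e.trip_count

    def update(self, pc: int, taken: bool) -> None:
        e = self.entries.get(pc)
        if e is None:
            if len(self.entries) >= self.capacity:
                # FIFO eviction: dicts preserve insertion order.
                self.entries.pop(next(iter(self.entries)))
            e = self.entries[pc] = LTBEntry()
        if taken:
            e.iteration += 1
        else:
            # Loop exited: record how many iterations it ran, then reset.
            e.trip_count = e.iteration
            e.iteration = 0
```

On the first execution of a loop the exit is mispredicted, because the trip count is still unknown. From the second execution on, the counter reaches the recorded trip count and the exit branch is predicted not taken.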
C.H. Perleberg and A.J. Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412, 1993.
....the CPU should predict in advance whether or not the branch is taken and what the target address will be in order to preserve a steady flow through the pipeline. However, the execution path of a branch cannot be easily resolved in advance. Thus, branches typically cause delays in the pipeline [30, 23, 9, 15]. A Branch Target Buffer (BTB) can reduce these pipeline disruptions by predicting the path of the branch and caching information used by the branch. Various pieces of information can be kept in the BTB, including tags associated with the branch address, the branch target address, and branch ....
....Buffer (BTB) can reduce these pipeline disruptions by predicting the path of the branch and caching information used by the branch. Various pieces of information can be kept in the BTB, including tags associated with the branch address, the branch target address, and branch prediction information [23]. However, it has been reported that BTB-based prediction schemes perform poorly for indirect jumps, since the target of an indirect jump can change with every dynamic instance of that branch [9, 30]. In fact, some compilers provide techniques that insert extra conditional branches that check for ....
[Article contains additional citation context not shown here]
C. H. Perleberg and A. J. Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412, April 1993.
....execution. Only branch instructions present a similarly substantial source of overhead in modern microprocessors. While extensive research has been performed to alleviate the performance impact of branches (for example by using sophisticated branch predictors [LCM97], branch target buffers [PeSm93], and return address stacks [KaEm91]), relatively little has been done to address the load latency problem. As opposed to load instructions, latency is not an issue with store instructions because their (slow) memory access takes place after the execution of the store, i.e. the CPU can proceed ....
C. H. Perleberg, A. J. Smith. "Branch Target Buffer Design and Optimization". IEEE Transactions on Computers, 42(4):396-412. April 1993.
....the probability that the functional units can be fully utilized. Branch prediction mechanisms are one means that can be used to increase the window size. However, they are not perfect, so after only a few branches the probability that the correct branch is being followed is too low to be useful [Perleberg and Smith, 1993]. An alternative is to extract coarse-grained parallelism by executing multiple blocks of instructions in parallel. This requires some way of ensuring that data dependencies between blocks are preserved. We examine a particular architecture, the WarpEngine [Cleary et al. 1995]. The facilities ....
Perleberg, C. H. and Smith, A. J. (1993). Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412.
....We explore two options. The first one records the latency tolerance of ib loads, and the second, the temporal locality of indirect branches. We also demonstrate one way of combining these two techniques. 5.1 Related Work Past work on indirect branch prediction can be traced back to [58, 125, 126], where the foundations of Branch Target Buffer (BTB) design were established. A BTB is a cache-like structure that stores targets of previously seen branches. In the context of value prediction, the simplest implementation of a BTB acts like a last-value predictor. It can replace logic ....
C.H. Perleberg and A.J. Smith. Branch Target Buffer Design and Optimization. IEEE Transactions on Computers, 42(4):396--412, April 1993.
....Section 5 includes experimental results. Finally, Section 6 suggests directions for future work and concludes the paper. 2 Related Work on Indirect Branch Prediction The Branch Target Buffer (BTB) is the first predictor that could be used to provide the next target for indirect branches [11, 12, 13]. In its simplest form, a BTB is a last-value predictor; it uses the most recently seen target to predict the next target for a branch. The indexing function is usually formed by taking the lower-order bits of the branch address. An n-bit wrap-around counter normally controls replacement in a ....
....target is replaced on a single misprediction. The 2-bit counter provides hysteresis, allowing a target address to be replaced only after two consecutive mispredictions (we will refer to this BTB variation as BTB2b). More complex replacement strategies have been proposed for set-associative BTBs [13]. In [15] Kaeli and Emma describe a mechanism that accurately predicts the targets of subroutine returns. The mechanism is called a Call Return Stack (RAS) and targets the prediction of subroutine returns by using the inherent correlation between procedure calls and returns to pair up the ....
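The last-value behavior and the BTB2b hysteresis described above can be sketched as follows. This minimal version assumes a tagless table indexed by the low-order bits of the branch address, and it models the hysteresis as a count of consecutive mispredictions, which is one reading of the 2-bit counter in the excerpt. The class name, table size, and `threshold` parameter are hypothetical.

```python
from typing import Optional


class IndirectBTB:
    """Tagless last-value target predictor indexed by low-order PC bits.

    threshold=1 replaces the stored target on every misprediction;
    threshold=2 mimics BTB2b: replace only after two consecutive misses.
    """

    def __init__(self, index_bits: int = 9, threshold: int = 2):
        self.mask = (1 << index_bits) - 1
        self.threshold = threshold
        self.targets: list[Optional[int]] = [None] * (1 << index_bits)
        self.misses = [0] * (1 << index_bits)

    def predict(self, pc: int) -> Optional[int]:
        # Last-value prediction: the most recently kept target for this slot.
        return self.targets[pc & self.mask]

    def update(self, pc: int, actual: int) -> None:
        i = pc & self.mask
        if self.targets[i] == actual:
            self.misses[i] = 0  # correct: clear the consecutive-miss count
            return
        self.misses[i] += 1
        if self.targets[i] is None or self.misses[i] >= self.threshold:
            self.targets[i] = actual  # replace the stored target
            self.misses[i] = 0
```

On the target sequence A, A, B, A, A for one branch, the single-miss variant (`threshold=1`) is correct twice, while the BTB2b-style variant (`threshold=2`) keeps A through the one-off B and is correct three times.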
C.H. Perleberg and A.J. Smith. Branch Target Buffer Design and Optimization. IEEE Transactions on Computers, 42(4):396--412, April 1993.
....(Fig. 1: A branch target buffer. Fig. 2: Partial resolution in a direct-mapped memory. IEEE Transactions on Computers, Vol. 46, No. 10, October 1997, p. 1143.) ... consideration of partial resolution is given. Perleberg and Smith present a detailed analysis of BTB design trade-offs in [14]. Many features are examined, including the presence or absence of the complete branch tag. Including only portions of the tag is not discussed. Yeh and Patt have proposed a taxonomy of two-level branch prediction techniques, and have presented detailed performance studies [20, 21]. Their ....
....more tag bits might be necessary to account for the associativity of the AM29000 BTB, it seems clear that the vast majority of tag bits in the buffer are superfluous. Similarly, the Edgecore Edge 2000 processor is equipped with a 1,024-entry direct-mapped BTB that stores the complete branch tag [14]. Our results indicate that, at most, one tag bit is required to achieve 99.99 percent of the prediction accuracy possible with this configuration. With a 1,024-entry buffer and 32-bit addressing, the complete branch tag is 22 bits. This indicates that the total amount of tag storage can be ....
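The 22-bit figure in the excerpt follows from subtracting the index bits of a 1,024-entry direct-mapped table from a 32-bit address: 32 - log2(1024) = 32 - 10 = 22. The helper names below are hypothetical. Like the excerpt's figure, the sketch counts no byte-offset bits by default. A partial tag is taken here as the low-order k bits of the full tag, which is one plausible choice rather than the cited paper's exact scheme.

```python
def index_bits(entries: int) -> int:
    """log2(entries) for a power-of-two table size."""
    assert entries > 0 and entries & (entries - 1) == 0
    return entries.bit_length() - 1


def full_tag_bits(address_bits: int, entries: int, offset_bits: int = 0) -> int:
    """Tag width of a direct-mapped BTB with the given number of entries."""
    return address_bits - index_bits(entries) - offset_bits


def partial_tag(pc: int, entries: int, k: int, offset_bits: int = 0) -> int:
    """Keep only the k low-order bits of the full tag (one possible partial-tag choice)."""
    return (pc >> (offset_bits + index_bits(entries))) & ((1 << k) - 1)


entries = 1024
full = full_tag_bits(32, entries)
print(full)            # 22
print(entries * full)  # 22528 bits of tag storage with complete tags
print(entries * 1)     # 1024 bits with a single tag bit per entry
```

With word-aligned instructions, passing `offset_bits=2` would shrink the complete tag to 20 bits. The excerpt's point is that even that is far more than the one bit its results suggest is needed.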
C. Perleberg and A. Smith, "Branch Target Buffer Design and Optimization," IEEE Trans. Computers, vol. 42, no. 4, pp. 396-412, Apr. 1993.
....should predict in advance whether or not the branch is taken and what the target address will be in order to preserve a steady flow through the pipeline. However, the execution path of a branch cannot be easily resolved in advance. Thus, branches typically cause delays in the pipeline [Wall 1991; Perleberg and Smith 1993; Chang et al. 1997; Hennessy and Patterson 1996]. A Branch Target Buffer (BTB) can reduce these pipeline disruptions by predicting the path of the branch and caching information used by the branch. Various pieces of information can be kept in the BTB, including tags associated with the branch ....
....Buffer (BTB) can reduce these pipeline disruptions by predicting the path of the branch and caching information used by the branch. Various pieces of information can be kept in the BTB, including tags associated with the branch address, the branch target address, and branch prediction information [Perleberg and Smith 1993]. However, it has been reported that BTB-based prediction schemes perform poorly for indirect jumps, since the target of an indirect jump can change with every dynamic instance of that branch [Chang et al. 1997; Wall 1991]. (28 · Gang-Ryung Uh and David Whalley) ....
[Article contains additional citation context not shown here]
Perleberg, C. H. and Smith, A. J. 1993. Branch target buffer design and optimization. IEEE Transactions on Computers 42, 4 (April), 396--412.
....in §8. 2 Prior Branch and Fetch Prediction Research This section briefly surveys prior work on branch prediction techniques used in this paper. Branch target buffers (BTBs) have been used as a mechanism for branch and instruction fetch prediction, effectively predicting the behavior of a branch [1, 7, 10, 13, 15, 21]. The Intel Pentium is an example of a modern architecture using BTBs; it has a 256-entry BTB organized as a four-way associative cache. Only branches that are taken are entered into the BTB. If a branch address appears in the BTB and the branch is predicted as taken, the stored address is ....
....stack for return instructions. scheme of Pan et al. [12] where we XOR the global history register with the program counter and use this to index into a 4096-entry (1 KByte) PHT. In this model, we store only taken branches in the BTB, since previous studies have shown this to be more effective [2, 13]. If a branch is not taken while it is in the BTB, we leave the branch (target address) in the BTB until it is removed by the LRU replacement policy, since we might need the taken target address again in the near future. In this architecture, the BTB's main purpose is to provide the taken ....
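The taken-only allocation and LRU retention policy described in the excerpt can be sketched as follows, using the Pentium's 256-entry, four-way organization quoted above as defaults. This is a minimal illustration assuming 4-byte instructions and full tags; the class and method names are hypothetical, and it is not a model of the Pentium's actual BTB logic.

```python
from collections import OrderedDict
from typing import Optional


class SetAssociativeBTB:
    """Taken-only, LRU-replaced set-associative BTB sketch."""

    def __init__(self, entries: int = 256, ways: int = 4, inst_bytes: int = 4):
        self.ways = ways
        self.num_sets = entries // ways
        self.offset_bits = inst_bytes.bit_length() - 1
        # One OrderedDict per set: tag -> taken target, least recently used first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def _locate(self, pc: int) -> tuple[int, int]:
        """Return (set index, tag) for a branch address."""
        word = pc >> self.offset_bits
        return word % self.num_sets, word // self.num_sets

    def lookup(self, pc: int) -> Optional[int]:
        """Return the stored taken target on a hit and mark it most recently used."""
        s, tag = self._locate(pc)
        ways = self.sets[s]
        if tag not in ways:
            return None
        ways.move_to_end(tag)
        return ways[tag]

    def update(self, pc: int, taken: bool, target: int) -> None:
        if not taken:
            # Not-taken branches are never allocated; an existing entry is
            # left in place until LRU replacement evicts it.
            return
        s, tag = self._locate(pc)
        ways = self.sets[s]
        if tag in ways:
            ways.move_to_end(tag)
        elif len(ways) >= self.ways:
            ways.popitem(last=False)  # evict the least recently used way
        ways[tag] = target
```

Keeping the entry when the branch falls through means a later taken instance still finds its target, which is the rationale the excerpt gives for this policy.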
Chris Perleberg and Alan Jay Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412, April 1993.
....i.e. the value of PC does not change, while the LA-PC can still move ahead and generate new requests (recall the role of the ORL). As shown in Figure 3.4, the LA-PC is maintained with the help of a branch prediction mechanism, BPT. BPT designs have been thoroughly investigated [Lee & Smith 84, Perleberg & Smith 89] and we will not repeat these studies here. In our experiments we use the Branch Target Buffer (BTB) with the two-bit state-transition design described in [Lee & Smith 84], and we assume that the BTB has been implemented in the core processor for other purposes. As the LA distance increases, the data ....
Perleberg, C. H. and Smith, A. J. (1989). Branch target buffer design and optimization. Technical Report UCB/CSD 89/552, University of California, Berkeley.
....considered in the context of speculation on branch instructions. This can be done in a number of ways. For example, a prediction can be made as to the direction a branch will take and the CPU follows this direction. It has been shown that these mechanisms can achieve up to 93% prediction accuracy [16]. However, after only a few branches the probability that the correct path is being followed is too low to be useful. A more aggressive approach is to follow both paths through a branch, but again this can achieve only a small amount of parallelism because of the exponential explosion in the number ....
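To see why accuracy compounds this way, assume (a simplification the excerpt does not state) that each prediction is correct independently with probability p. The chance that the fetch path is still correct after n consecutive predicted branches is then p^n; with the 93% figure quoted above:

```latex
P_{\text{correct}}(n) = p^{n}, \qquad
P_{\text{correct}}(5) = 0.93^{5} \approx 0.70, \qquad
P_{\text{correct}}(10) = 0.93^{10} \approx 0.48
```

Real branch outcomes are correlated, so this is only a rough illustration of how quickly confidence in the speculative path decays.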
....folding and branch target buffers. They vary in the way in which they predict a branch and whether or not they hold branch target information. They also vary in the location in the instruction pipeline in which they take effect. These forms of dynamic branch prediction are summarised in [13] and [16]. Speculation can be performed on both the data and the control structure of a program. It has been shown in [14] that data value prediction, by making a guess as to the value to be returned by a load, would give performance gains of 4.5% to 23% if it were incorporated into a PowerPC 620. This ....
Perleberg, C. H. and Smith, A. J., "Branch Target Buffer Design and Optimization", IEEE Transactions on Computers, Vol. 42, No. 4, pp. 396-412, Apr. 1993.
No context found.
C. Perleberg and A. Smith, "Branch Target Buffer Design and Optimization", IEEE Transactions on Computers, 42(4): pages 396-412, 1993.
No context found.
C. Perleberg and A. J. Smith, "Branch Target Buffer Design and Optimization," IEEE Transactions on Computers, 42(4):396-412, April 1993.
No context found.
C. Perleberg and A. J. Smith, "Branch Target Buffer Design and Optimization," IEEE Transactions on Computers, 42(4):396-412, April 1993.
No context found.
C. Perleberg and A. J. Smith, "Branch Target Buffer Design and Optimization," IEEE Transactions on Computers, 42(4):396-412, April 1993.
No context found.
C. Perleberg and A. J. Smith, "Branch Target Buffer Design and Optimization," IEEE Transactions on Computers, 42(4):396-412, April 1993.
No context found.
C. Perleberg and A. Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4), 1993.
No context found.
C. Perleberg and A. Smith, "Branch Target Buffer Design and Optimization", IEEE Transactions on Computers, 42(4): pages 396-412, 1993.
No context found.
C. Perleberg and A. Smith, "Branch Target Buffer Design and Optimization", In IEEE Transactions on Computers, 42(4): pages 396-412, 1993.
No context found.
C.H. Perleberg and A.J. Smith. Branch target buffer design and optimization. IEEE Transactions on Computers, 42(4):396--412, 1993.
No context found.
C. Perleberg and A. Smith, Branch Target Buffer Design and Optimization, IEEE Transactions on Computers, 42(4): 396-412, 1993.
No context found.
Chris Perleberg and Alan Jay Smith. Branch Target Buffer Design and Optimization. IEEE Transactions on Computers, 42(4):396--412, April 1993.
CiteSeer.IST - Copyright Penn State and NEC