B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.


This paper is cited in the following contexts:
Delay-Sensitive Branch Predictors for Future Technologies - Jiménez (2002)   (Correct)

....the processor state such that the correct path can be executed. Thus, branch predictors must be highly accurate to avoid mispredictions. Current techniques can achieve correct branch prediction rates of 95% [41], i.e., misprediction rates of 5%, but the high cost of recovering from mispredictions [12] remains one of the largest impediments to performance on current and future processors. Because of the large penalty of a branch misprediction, small improvements in accuracy can have a large impact on performance. As pipelines become deeper to support higher clock rates, the penalty for a ....

Brad Calder and Dirk Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.


Perceptron Learning for Predicting the Behavior of.. - Jiménez, Lin (2001)   (Correct)

....dynamic branch prediction, which predicts the likely direction of conditional branch instructions before the conditions have been decided. Current techniques can achieve correct branch prediction rates of 95% [1], i.e., misprediction rates of 5%, but the high cost of recovering from misprediction [2] remains one of the largest impediments to performance on current and future processors. Small improvements in accuracy can have a large impact on performance; decreasing the misprediction rate from, say, 5% to 4% can decrease the execution time of a typical program by as much as 14%, given ....

Brad Calder and Dirk Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.


Cool-Cache for Hot Multimedia - Unsal, Ashok, Koren, Krishna, Moritz (2001)   (3 citations)  (Correct)

....can be augmented with special load/store instructions which would channel the scalar data to a separate cache area. The implementation is simple: encode a single additional bit in the instruction, thus marking the load/store to be diverted. This is similar to the approach taken by Calder et al. [8] for marking branch instructions. 3.2. Cool Cache Architecture Our caching architecture is completely compiler-managed and is therefore able to leverage static information that is lost in traditional hardware caches. A Cool Cache architecture combines four cache control techniques: 1) fully ....

Calder B., Grunwald D., "Fast and Accurate Instruction Fetch and Branch Prediction", Proceedings of the 21st International Symposium on Computer Architecture, ISCA'94, Chicago, IL, April 1994


Microarchitectural and Compile-Time Optimizations for.. - Kalamatianos (2000)   (1 citation)  (Correct)

....employing speculative execution. The logical components forming the predictor are a Branch Identification Unit (BIU) that indicates that an indirect branch is being fetched. Most implementations assume two bits per entry to indicate whether the branch is conditional, a return or indirect [144]. We also need a PHR that records partial targets from a predetermined branch stream. To use the SFSXS mapping function, we will need to record 3 bits from each one of the last 3 targets. One possible scheme is to record the targets of all indirect branches (PIB path history) in a single PHR. The ....

B. Calder and D. Grunwald. Fast and Accurate Instruction Fetch and Branch Prediction. In Proceedings of the International Symposium on Computer Architecture, pages 2--11, April 1994.


A New Approach to Cache Management - Tyson, Farrens, Matthews, Pleszkun (1995)   (6 citations)  (Correct)

....Our goal is to develop a scheme that will provide the same performance enhancement transparently. In order to select which items should be marked C NA, we turn to the body of work on branch prediction strategies. There has been a great amount written about branch prediction strategies recently [CaGr94, FiFr92, PaS92, Smit81, YeP91, YeP92, YeP93]. Briefly, dynamic branch prediction strategies collect run time information about branch behavior to predict whether a branch will be taken in the future. Typically, these strategies associate several bits of information with a branch instruction. This information is updated each time the branch ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", Proceedings of the 21st Annual Symposium on Computer Architecture, Chicago, Illinois (April 18-21, 1994), pp. 2-11.


Partial Resolution in Branch Target Buffers - Fagin (1997)   (6 citations)  (Correct)

....branch prediction. The simulation studies in [21] use a 1,024 entry, four way set associative history table, indexed by the low order bits of the branch address. Our results suggest that such a table can be made considerably smaller with little or no loss in performance. Calder and Grunwald [1] propose the decoupling of branch determination from the BTB, using special instruction encoding and branch displacements. Their results indicate that this permits the use of a much smaller BTB, and may even justify dispensing with a BTB altogether. Our work is similar, in that reducing the size ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction," Proc. 21st Ann. Int'l Symp. Computer Architecture, pp. 2-11, Apr. 1994.


A Modified Approach to Data Cache Management - Tyson, Farrens, Matthews.. (1995)   (47 citations)  (Correct)

....Our goal is to develop a scheme that will provide the same performance enhancement transparently. In order to select which items should be marked C NA, we turn to the body of work on branch prediction strategies. There has been a great amount written about branch prediction strategies recently [CaGr94, FiFr92, PaS92, Smit81, YeP91, YeP92, YeP93]. Briefly, dynamic branch prediction strategies collect runtime information about branch behavior to predict whether a branch will be taken in the future. Typically, these strategies associate several bits of information with a branch instruction. This information is updated each time the ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", Proceedings of the 21st Annual International Symposium on Computer Architecture, Chicago, Illinois (April 18-21, 1994), pp. 2-11.


Managing Data Caches using Selective Cache Line.. - Tyson, Farrens.. (1997)   (10 citations)  (Correct)

....Our goal is to develop a scheme that will provide the same performance enhancement transparently. In order to select which items should be marked C NA, we turn to the body of work on branch prediction strategies. There has been a great amount written about branch prediction strategies recently [CaGr94, FiFr92, PaS92, Smit81, YeP91, YeP92, YeP93]. Briefly, dynamic [Table 6: Analysis of Average Memory Reference Activity (Improved Static)] ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", Proceedings of the 21st Annual International Symposium on Computer Architecture, Chicago, Illinois (April 18-21, 1994), pp. 2-11.


The Agree Predictor: A Mechanism for Reducing.. - Sprangle, Chappell.. (1997)   (55 citations)  (Correct)

....gcc ranging from 8.62% with a 64K entry PHT up to 33.3% with a 1K entry PHT. Keywords: branch prediction, superscalar, speculative execution, two level branch prediction. 1 Introduction The link between changes in branch misprediction rate and changes in performance has been well documented [1-4, 6]. Yeh and Patt have shown that a two level branch predictor can achieve high levels of branch prediction accuracy [4]. In a two level predictor, the first level generates an index into a Pattern History Table (PHT) using some function of the outcomes of preceding branches. The first level function ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", Proceedings of the 21st International Symposium on Computer Architecture, (April 1994), pp. 2-11.


Analytic Models of Workload Behavior and Pipeline Performance - Squillante, Kaeli, Sinha (1996)   (Correct)

....then use this modeling framework to study the performance impact of branch instructions on different pipeline organizations. Branches are chosen because they can dominate the overall performance of a processor pipeline, and thus continue to be the focus of processor design research (e.g., see [3, 6, 13]). In the interest of space, we only present here a small subset of our modeling and performance results, and we refer the interested reader to an expanded version of this paper [11] for additional background material, references, technical details, results and applications of our modeling ....

....version of the random marked point process exists, which tends not to be a problem in practice. One important area that we plan to investigate with our modeling analysis is the study of branch prediction techniques, which have received considerable attention in the research literature (e.g., see [3, 6, 7, 13]). Many current approaches to branch prediction are based on the patterns of taken and non-taken branches leading up to the current branch. Our approach makes it possible to probabilistically analyze these patterns for various workloads to gain fundamental insights into the problem of branch ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the International Symposium on Computer Architecture, pages 2--11, April 1994.


Understanding the Branch Performance of Object Oriented.. - Radhakrishnan Tang (1998)   (Correct)

....or vtable and the appropriate function is invoked at runtime. It has been known that virtual functions and dynamic dispatch are expensive [1] [9] [17]. However there have also been studies that indicate that the branch execution penalty for C++ programs are not worse than those for C programs [7]. The objectives of this paper are to analyze the branch behavior of object oriented programs and identify the impact of this branch behavior on execution time. The execution of C and C++ programs on the Sun UltraSPARC platform are profiled and analyzed. The choice of C++ to represent OOP is ....

....by the micro architecture and not because of the difference in code generated by the different compilers. Past research on related topics include efforts to characterize OOP instruction mix [4], reduce indirect function calls [5], optimize virtual function calls [1], improve branch prediction [6] [7], analyze the dispatch overhead of the standard virtual function table dispatch [9], etc. Most of this research was based on simulations and executable inspection whereas we also have performance results based on on-chip performance monitoring counters. We make an effort to relate profiling data, ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", In 21st Annual Symposium of Computer Architecture, pp. 2-11, April 1994.


Execution Characteristics of Object Oriented Programs on.. - Radhakrishnan, John (1998)   (2 citations)  (Correct)

....and processor simulation for their study. The cost associated with dynamic dispatch was measured for the C++ programs on a superscalar processor. Other related work includes efforts to optimize virtual function calls [1], reduce indirect function calls [5], and improve branch prediction [6] [7]. 1 A look at the advanced program of 1998 International Symposium on Computer Architecture [14] shows that there are six workload characterization and measurement studies on several platforms, in the first two sessions of the conference. 1.2 Overview In Section 2, we describe the ....

B. Calder and D. Grunwald, "Fast and Accurate Instruction Fetch and Branch Prediction", In 21st Annual Symposium of Computer Architecture, pp. 2-11, April 1994.


Converting Thread-Level Parallelism to.. - Lo, Eggers, Emer, .. (1997)   (43 citations)  (Correct)

....3: Processor instruction latencies. These values are the minimum latencies from when the source operands are ready to when the result becomes ready for a dependent instruction. .... prediction, we use a 256 entry, 4 way set associative branch target buffer and a 2K x 2 bit pattern history table [4]. These structures are shared by all running threads (even if less than 8 are executing), allowing more flexible and therefore higher utilization. Most importantly, these structures are fully available even if only a single thread is executing. Of course, the competition for the shared resources ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium on Computer Architecture, pages 2--11, April 1994.


Exploiting Choice: Instruction Fetch and Issue on.. - Tullsen, Eggers.. (1996)   (145 citations)  (Correct)

.... from multiple threads, we never attempt to fetch from threads that conflict (on an I cache bank) with each other, although they may conflict with other I cache activity (cache fills). Branch prediction is provided by a decoupled branch target buffer (BTB) and pattern history table (PHT) scheme [4]. We use a 256-entry BTB, organized as four way set associative. The 2K x 2 bit PHT is accessed by the XOR of the lower bits of the address and the global history register [18, 30]. Return destinations are predicted with a 12 entry return stack (per context). We assume an efficient, but not perfect, ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium on Computer Architecture, pages 2--11, April 1994.
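
The PHT indexing described in the excerpt above (a 2K-entry table of 2-bit counters addressed by the XOR of the low-order branch address bits and a global history register, a scheme commonly called gshare) can be sketched roughly as follows. This is an illustrative sketch only, not the simulator used in the citing paper: the names `pht_index` and `XorIndexedPredictor`, the initial counter value, and the choice to discard the two lowest address bits are all assumptions made for the example.

```python
PHT_ENTRIES = 2048              # "2K x 2 bit" table from the excerpt
INDEX_MASK = PHT_ENTRIES - 1    # keeps the low 11 bits


def pht_index(branch_addr, global_history):
    """XOR the low address bits with the global history to pick a PHT entry."""
    # Assumption: 4-byte aligned instructions, so the two lowest bits carry
    # no information and are dropped before indexing.
    return ((branch_addr >> 2) ^ global_history) & INDEX_MASK


class XorIndexedPredictor:
    """Hypothetical predictor built on 2-bit saturating counters."""

    def __init__(self):
        # Assumption: counters start at 1 (weakly not-taken).
        self.pht = [1] * PHT_ENTRIES
        self.history = 0

    def predict(self, branch_addr):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[pht_index(branch_addr, self.history)] >= 2

    def update(self, branch_addr, taken):
        i = pht_index(branch_addr, self.history)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & INDEX_MASK
```

Because the history register participates in the index, the same branch can map to different counters depending on the outcomes of the branches that preceded it.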


Hardware Optimizations Enabled by a Decoupled Fetch Architecture - Reinman (2001)   Self-citation (Calder)   (Correct)

No context found.

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.


Optimizations Enabled by a Decoupled Front-End Architecture - Reinman, Calder, Austin (2001)   (4 citations)  Self-citation (Calder)   (Correct)

.... really points to the instruction after the potential branch ending the FTB entry (fetch block). We store only the pre-computed lower bits of the fall through address in the fetch distance along with a carry bit used to calculate the rest of the fall through address in parallel with the FTB lookup [4]. This helps reduce the amount of storage ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.


Quantifying Behavioral Differences Between C and C++ Programs - Calder, Grunwald, Zorn (1995)   (47 citations)  Self-citation (Calder Grunwald)   (Correct)

....not even considered as necessary in C programs, are appropriate for C++ programs. Hardware designers and compiler writers may use our results to construct a new generation of systems that execute C++ programs more efficiently. We have conducted some of this research in related publications [13, 14, 15], but a detailed discussion is beyond the scope of this paper. Section 2 describes related work in the area of program behavior measurement. Section 3 describes the tools we used to collect our measurements and the programs that we measured. Section 4 includes the basic data that we gathered as ....

....calls, indirect procedure calls, and returns. These results imply that different branch prediction architectures are needed for C and C++ programs in order to achieve a high prediction accuracy for the different languages. The implications of these differences are considered in other publications [13, 14, 15], and a detailed discussion is beyond the scope of this paper. As described in §4.4, direct procedure calls and returns are easy to predict, while indirect procedure calls and conditional branches are more difficult to predict. The C programs tend to execute fewer of the branches that are hard ....

Brad Calder and Dirk Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium of Computer Architecture, pages 2--11, April 1994.


Quantifying Behavioral Differences Between C and C++ Programs - Calder (1994)   (47 citations)  Self-citation (Calder Grunwald)   (Correct)

....found in C programs (§4.2). Larger basic blocks imply that instruction scheduling will be simpler and that the importance of conditional branch prediction will decrease, because the cost of mispredicted branches can be amortized over a larger number of instructions. In addition, in related work [4], we found that C programs tended to have more predictable conditional branches; that is, traditional branch prediction mechanisms are more effective for a similar set of C programs. In part, this occurs because C programs tend to have fewer branches, reducing the demands on extant ....

Brad Calder and Dirk Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium of Computer Architecture, April 1994. (to appear).


Threaded Multiple Path Execution - Wallace, Calder, Tullsen (1998)   (34 citations)  Self-citation (Calder)   (Correct)

....are carefully modeled at all levels of the memory hierarchy. Conflict free miss penalties are 6 cycles to the L2 cache, another 12 cycles to the L3 cache, and another 62 cycles to memory. Branch prediction is provided by a decoupled branch target buffer (BTB) and pattern history table (PHT) scheme [1]. We use a 256 entry, four way set associative BTB. The 2K x 2 bit PHT is accessed by the XOR of the lower bits of the address and the global history register [5, 13]. Return destinations are predicted with a 12 entry return stack (per context). The instruction fetch mechanism we assume is the ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium on Computer Architecture, pages 2--11, April 1994.


A Scalable Front-End Architecture for Fast Instruction.. - Reinman, Austin, Calder (1999)   (16 citations)  Self-citation (Calder)   (Correct)

....architecture we model in this paper is an extension of the BBTB design by Yeh and Patt [30, 31] with two changes to their design. The first change is that we do not store basic blocks in our fetch target buffer that are fall through basic blocks or basic blocks with branches that are seldom taken [6]. The BBTB design stores an entry for all basic blocks. Storing non taken basic blocks wastes BBTB entries, and decreases the size of fetch blocks, which requires additional predictions to traverse what could have been one larger fetch block. The second change we made to the BBTB design is that we ....

....been one larger fetch block. The second change we made to the BBTB design is that we do not store the full fall through address in our FTB. Instead, we store only the pre-computed lower bits of the fall through address along with a carry bit used to calculate the rest of the fall through address [6]. This helps reduce the amount of storage for each BBTB entry, since the typical distance between the current fetch address and the BBTB's fall through address is not large. Our Fetch Target Buffer (FTB) design is shown in Figure 2. The FTB table is accessed with the start address of a fetch ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In Proceedings of the 21st International Symposium on Computer Architecture, pages 2--11, April 1994.
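
The space-saving idea in the excerpt above (keep only the low bits of the fall-through address plus one carry bit, and rebuild the upper bits from the current fetch address) can be illustrated with a small sketch. It is only a plausible reading of the truncated excerpt, not the cited design: the field width `LOW_BITS` and the function names `encode_fall_through` and `decode_fall_through` are hypothetical.

```python
LOW_BITS = 6                      # hypothetical width of the stored field
LOW_MASK = (1 << LOW_BITS) - 1


def encode_fall_through(fetch_addr, fall_through_addr):
    """Return the (low_bits, carry) pair that would be stored in an entry."""
    distance = fall_through_addr - fetch_addr
    # The scheme only works when the fall-through address is close enough
    # that its upper bits differ from the fetch address by at most one.
    if not 0 < distance <= LOW_MASK:
        raise ValueError("fall-through address too far from fetch address")
    low = fall_through_addr & LOW_MASK
    carry = int((fall_through_addr >> LOW_BITS) != (fetch_addr >> LOW_BITS))
    return low, carry


def decode_fall_through(fetch_addr, low, carry):
    """Rebuild the full fall-through address from the fetch address."""
    upper = (fetch_addr >> LOW_BITS) + carry
    return (upper << LOW_BITS) | low
```

For example, `encode_fall_through(0x1038, 0x1044)` yields a carry of 1 because the fall-through address crosses a 64-byte boundary, and `decode_fall_through(0x1038, 4, 1)` recovers `0x1044`.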


Instruction Recycling on a Multiple-Path Processor - Wallace, Tullsen, Calder (1999)   (5 citations)  Self-citation (Calder)   (Correct)

....carefully modeled at all levels of the memory hierarchy. Conflict free miss penalties are 6 cycles to the L2 cache, another 12 cycles to the L3 cache, and another 62 cycles to memory. Branch prediction is provided by a decoupled branch target buffer (BTB) and pattern history table (PHT) scheme [2]. We use a 256 entry BTB, organized as four way set associative. The 2K x 2 bit PHT is accessed by the XOR of the lower bits of the address and the global history register [9, 20]. Return destinations are predicted with a 12 entry return stack (per context). The assumed processor has a 9 stage ....

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium on Computer Architecture, pages 2--11, April 1994.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)   (Correct)

No context found.

B. Calder and D. Grunwald. Fast and accurate instruction fetch and branch prediction. In 21st Annual International Symposium on Computer Architecture, pages 2--11, April 1994.
