Nicolas Gloy, Cliff Young, J. Bradley Chen, Michael D. Smith, "An Analysis of Dynamic Branch Prediction Schemes on System Workloads", Proc. 23rd Annual ISCA, 1996
....of the translator, and is analogous to the hardware branch speculation policy of modern processors. Because the translator is implemented in software, its policies can be aggressively tailored for a given application. While the literature on hardware branch prediction is large [YP92, MEP96, GYCS96], we believe that SEA could open up a new trade-off space for branch prediction algorithms based on information collected by SEA software. These new branch prediction schemes could either replace hardware prediction, or work with hardware prediction schemes. Like other SEA functions, software ....
Nicolas Gloy, Cliff Young, J. Bradley Chen, and Michael D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proc. 23rd Annual Symposium on Computer Architecture, pages 12-22, May 1996.
.... The first study that showed dynamic characteristics of loops was presented by Kobayashi [14]. A more recent study presented a thorough examination of the dynamic characteristics of loops [5]. Since then, branch prediction has drawn a lot of attention and several mechanisms have been proposed [2, 6, 7, 9, 11, 12, 13]. Some alternative mechanisms include the branch address cache [28], trace cache [21], and control flow prediction with a tree-like predictor [8]. Seznec et al. have proposed predicting several branches in the same cycle to produce larger traces of instructions using multiple block ahead ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proc. of the 23rd Annual Intl. Symp. on Computer Architecture, pages 12--21, 1996.
.... (or excluding) the operating system on SMT, even for a multiprogrammed workload of SPECInt benchmarks. While we expect OS usage in SPECInt to be low, previous studies have shown that ignoring kernel code, even in such low-OS environments, can lead to a poor estimation of memory system behavior [29, 1]. Second, how does the impact of OS code on an eight-context SMT compare with that of an out-of-order superscalar? SMT is unique in that it executes kernel-mode and user-mode instructions simultaneously. That is, in a single cycle, instructions from multiple kernel routines can execute along with ....
....in several hardware data structures when simulating both SPECInt95 and the operating system on an SMT. The total miss results mirror what other researchers have found in single-threaded processor studies, namely, that the operating system exhibits poorer performance than SPECInt-like applications [1, 29]. The kernel miss rate in the branch target buffer is particularly high, because of two factors: the OS executes so infrequently that it cannot build up a persistent branch target state, and most kernel misses (78%) displace other kernel entries or are mispredictions due to repeated changes in the ....
GLOY, N., YOUNG, C., CHEN, J. B., AND SMITH, M. D. An analysis of dynamic branch prediction schemes on system workloads. In Proceedings of the International Symposium on Computer Architecture (May 1996).
....to accurately predict the control (branch) flow in the program, so that we can execute more useful instructions and avoid stalling/squashing the pipeline. Branch predictors for control flow prediction have been studied extensively with different programs [29, 31, 23, 15] and also with OS effects [8]. The OS affects control flow predictability by introducing the additional user/OS branch aliasing in branch predictor tables. The negative impact of kernel branches on branch prediction has been reported in [8]. We also find that kernel code nearly doubles the misprediction rates in 7 out of 13 ....
....studied extensively with different programs [29, 31, 23, 15] and also with OS effects [8]. The OS affects control flow predictability by introducing the additional user/OS branch aliasing in branch predictor tables. The negative impact of kernel branches on branch prediction has been reported in [8]. We also find that kernel code nearly doubles the misprediction rates in 7 out of 13 of our benchmarks in a Gshare predictor (Figure 1). Branch aliasing characterization shows that user/OS aliasing contributes to up to 24% of all misprediction and 46% of aliasing. ....
N. Gloy, C. Young, J. B. Chen and M. D. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12-21, 1996.
....the control (branch) flow in the program, so that we can execute more useful instructions and avoid stalling/squashing the pipeline. Branch predictors for control flow prediction have been studied extensively with different programs [Yeh91, Young95, Sech96 and Mich97] and also with OS effects [Gloy96]. The OS affects control flow predictability by introducing the additional user/OS branch aliasing in branch predictor tables. The negative impact of kernel branches on branch prediction has been reported in [Gloy96]. We also find that user/OS execution can significantly increase the ....
....different programs [Yeh91, Young95, Sech96 and Mich97] and also with OS effects [Gloy96]. The OS affects control flow predictability by introducing the additional user/OS branch aliasing in branch predictor tables. The negative impact of kernel branches on branch prediction has been reported in [Gloy96]. We also find that user/OS execution can significantly increase the mispredictions in each part (Figure 1). For example, as shown in Figure 1a, kernel code nearly doubles the misprediction rates in 7 out of 13 of our benchmarks in a Gshare predictor. ....
N. Gloy, C. Young, J. B. Chen and M. D. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12-21, 1996
....NT Symposium, USENIX Association, August 1998. 1. Introduction. 1.1 Background. Profile-based optimization is predicated on the assumption that profiles can be obtained that accurately depict how users run the application. There has been only limited research on the viability of this assumption [FF92, GY96]. We address this problem by examining different users' usage patterns of interactive applications on Windows NT. Here the term usage pattern refers to the way a particular individual uses the code in a particular program. A common assumption in profile-based optimization is that people use ....
....prediction. Fisher and Freudenberger [FF92] examined the accuracy of predicting conditional branch directions from previous runs of a program. Their experiments focused on batch computation programs from the SPEC benchmark suite, and used subjectively selected datasets to generate profiles. Gloy et al. [GY96] compared user-only traces and full-system traces for dynamic branch prediction. They used standard traces as well as traces from instrumented runs of selected programs. Our profile analysis is aimed at optimization in general, and our profiles were collected from users' unscripted usage of ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith, "An Analysis of Dynamic Branch Prediction Schemes on System Workloads." In Proceedings of the 23rd Annual International Symposium on Computer Architecture, ACM, pages 12-21, May 1996.
....2-bit update policy is found to perform the best. This result is shown only for the interpreter mode and small instruction footprint benchmarks (richards and deltablue) and does not include kernel code. Although there have been efforts to study branch prediction performance using system workloads [Gloy96 and Sech96] in the past, little work has been done on hardware optimizations. Gloy, Young and Smith [Gloy96] analyze ATOM-generated system traces from the Instruction Benchmark Suite (IBS) and find that user-only traces yield fidelity when the kernel accounts for less than 5% of the total executed ....
....instruction footprint benchmarks (richards and deltablue) and does not include kernel code. Although there have been efforts to study branch prediction performance using system workloads [Gloy96 and Sech96] in the past, little work has been done on hardware optimizations. Gloy, Young and Smith [Gloy96] analyze ATOM-generated system traces from the Instruction Benchmark Suite (IBS) and find that user-only traces yield fidelity when the kernel accounts for less than 5% of the total executed instructions. Their simulation results show that including kernel branches in the branch trace worsens the ....
N. Gloy, C. Young, J. B. Chen and M. D. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12-21, 1996
....to more sophisticated two-level adaptive schemes which exploit patterns in the recent global (GAg, GAs and Gshare) or local (SAg and SAs) branch history, have been shown to be successful at predicting user-level branches. Branch prediction schemes are represented by name.size (as illustrated in [Gloy96]) where name falls into the taxonomy proposed by Yeh and Patt [Yeh93] and size is the number of 2-bit counter entries in the Branch History Table (BHT). The two-level adaptive schemes use Branch History Shift Registers (BHSRs) to record the recent branch history: GAg, GAs and Gshare exploit single ....
....feature is exploited by a BTB rehashing mechanism, which will be discussed in Section 5. As kernel branch execution forms a significant portion of the overall branches, there is an increase in pressure and competition on the underlying branch prediction resource. The study performed by Gloy et al. [Gloy96] suggests that the prediction accuracies generated by the current implementations of dynamic prediction schemes are negatively affected by problems of aliasing. Aliasing occurs when different branch sites are assigned to the same entry of prediction hardware structures such as BHT, BHSRs and BTB. ....
N. Gloy, C. Young, J. B. Chen and M. D. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12-21, 1996
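The aliasing effect described in the context above can be illustrated with a minimal sketch. It assumes a table of 2-bit saturating counters indexed by the low bits of the branch address; the class name, the two branch addresses, the initial counter value, and the 4-byte instruction alignment are all hypothetical choices made for illustration and are not taken from any of the cited papers.

```python
class TwoBitCounterTable:
    """Table of 2-bit saturating counters indexed by low bits of the branch address."""

    def __init__(self, entries):
        assert entries > 0 and entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        # Counter values 0..3; start weakly not-taken (an arbitrary assumption).
        self.counters = [1] * entries

    def index(self, pc):
        # Drop the byte offset of 4-byte instructions (assumed), keep the low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Values 2 and 3 predict taken, 0 and 1 predict not-taken.
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def mispredictions(table, trace):
    """Count mispredictions over a trace of (branch address, taken) pairs."""
    misses = 0
    for pc, taken in trace:
        if table.predict(pc) != taken:
            misses += 1
        table.update(pc, taken)
    return misses


# Two hypothetical branch sites with opposite behavior, interleaved.
USER_BRANCH = 0x1000    # always taken
KERNEL_BRANCH = 0x1040  # never taken
trace = [(USER_BRANCH, True), (KERNEL_BRANCH, False)] * 50

aliased = mispredictions(TwoBitCounterTable(16), trace)    # both sites share entry 0
unaliased = mispredictions(TwoBitCounterTable(32), trace)  # sites use separate entries
print(aliased, unaliased)
```

In this toy trace the two sites map to the same entry of the 16-entry table and keep overwriting each other's state, while the 32-entry table separates them. Real workloads are far less adversarial, so the sketch only shows the mechanism, not realistic magnitudes.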
....have used very short time slices. Our work simulates a range of realistic time slices for a broad range of predictor configurations and predictor areas. We focus on benchmarks that are primarily CPU-bound and our results are directly useful to researchers using the SPEC benchmarks. Gloy, et al. [7] analyzed dynamic branch prediction schemes on system workloads. They quantify the effects of kernel and user interactions on branch prediction accuracy. They argue that modeling context switching by flushing the branch predictor structures after each context switch is unrealistic because ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12--21, May 1996.
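The modeling choice mentioned in the context above (flushing predictor state at every context switch versus letting state persist) can be contrasted with a small sketch. Everything here is an illustrative assumption: the function name, the two hypothetical single-branch processes, the 64-entry table of 2-bit counters, and the slice length. It does not reproduce the methodology of any cited paper.

```python
def run(slices, flush_on_switch, entries=64):
    """Simulate a table of 2-bit counters over a list of time slices.

    Each slice is a list of (branch address, taken) pairs from one process.
    If flush_on_switch is True, the table is reset at the start of every slice.
    """
    counters = [1] * entries  # weakly not-taken initial state (assumed)
    misses = 0
    for trace in slices:
        if flush_on_switch:
            counters = [1] * entries
        for pc, taken in trace:
            i = (pc >> 2) % entries
            if (counters[i] >= 2) != taken:
                misses += 1
            if taken:
                counters[i] = min(3, counters[i] + 1)
            else:
                counters[i] = max(0, counters[i] - 1)
    return misses


# Two hypothetical processes that alternate time slices and use different entries.
proc_a = [(0x1000, True)] * 10   # always-taken branch
proc_b = [(0x2004, False)] * 10  # never-taken branch
slices = [proc_a, proc_b] * 5

persistent = run(slices, flush_on_switch=False)
flushed = run(slices, flush_on_switch=True)
print(persistent, flushed)
```

With persistent state, the always-taken branch is mispredicted only while its counter first warms up; with flushing, that warm-up cost recurs in every slice.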
....have used very short time slices. Our work simulates a range of realistic time slices for a broad range of predictor configurations and predictor areas. We focus on benchmarks that are primarily CPU-bound and our results are directly useful to researchers using the SPEC benchmarks. Gloy, et al. [7] analyzed dynamic branch prediction schemes on system workloads. They quantify the effects of kernel and user interactions on branch prediction accuracy. They argue that modeling context switching by flushing the branch predictor structures after each context switch is unrealistic because ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12--21, May 1996.
....the patterns exhibited during multiple visits to a loop body. We present the design of a table that records path-based loop execution history and allows us to predict multiple loop iterations dynamically. 1 Introduction. Branch prediction has been studied extensively over the past 20 years [2, 4, 8, 11, 14, 18]. The goal of a dynamic branch predictor is to predict the address of the next instruction to be fetched. When the prediction is correct, the target instruction stream can be fetched prior to the resolution of the branch; when the prediction is wrong, a penalty will be imposed to squash the ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proc. 23rd Annual Intl. Symp. on Computer Architecture, pages 12--21, 1996.
....taken from that part of the program, the performance measurement of branch prediction strategies is inaccurate. Using full traces does have a negative effect. The two-level adaptive branch predictors all need time to learn a pattern. The influence of this warm-up period is less for long traces. In [GYCS96], the authors advise against the use of single traces for this type of research. It is true that the actual workload of a CPU can consist of many processes and in multi-threaded architectures context switches can be frequent. Here we are more interested in the patterns of branch outcomes in an ....
....a bit cheaper than others. It also exhibits a low training interval. In [CP99] Antoine Colin and Isabelle Puaut propose a tool to analyze execution time in relation to branch prediction. They also write on BTB construction. They conclude that the tool works. 9.3 On building traces. In [GYCS96], Nicolas Gloy, Cliff Young, J. Bradley Chen and Michael D. Smith evaluate branch prediction strategies (2bc, GAs, PAs, gshare) using traces with a mix of kernel activity and benchmarks. They conclude that mixed activity gives different results as opposed to user activity only. Their advice is to ....
Nicolas Gloy, Cliff Young, J. Bradley Chen, and Michael D. Smith, An analysis of dynamic branch prediction schemes on system workloads, The 23rd Annual International Symposium on Computer Architecture, May 1996.
....behavior of x86 applications including operating system activity and context switching has not been adequately studied. Architects and system designers have realized the importance of including operating system activity in traces and there has been significant progress in this direction recently [6, 2, 7, 8]; however, there is a void in the literature on the execution behavior of x86 code with operating system and multitasking. This work examines the characteristics of a widely used but not so widely understood application base. The chapter is organized as follows. First, in Section 2, we provide some ....
....[18]. They did not have access to traces with actual context switch information but simulated the effect of context switches by flushing branch prediction structures periodically. They also proposed the use of hybrid branch predictors to improve the accuracy of the predictions. Gloy et al. [7] investigated the impact of operating system code and context switches on branch predictors. Using full-system traces from the ATOM tracing environment, they concluded that simulation with user-only traces is useful only if an application has less than 5% operating system activity. It was also ....
N. Gloy, C. Young, J. B. Chen, and M. D. Smith, "An analysis of dynamic branch prediction schemes on system workloads," in Proceedings of the 23rd International Symposium on Computer Architecture (ISCA), pp. 12--21, May 1996.
....is shown only for the interpreting mode and small instruction footprint benchmarks (richards and deltablue) and does not include kernel code. So far, the indirect branch characterization of Java processing, including OS and with different JVM implementations, is not well understood. Gloy et al. [7] analyze system traces from the Instruction Benchmark Suite (IBS) and find that user-only traces yield fidelity when the kernel accounts for less than 5% of the total executed instructions. Their simulation results show that including kernel branches in the branch trace worsens the effects of ....
.... 2-bit saturating counter table (2bc) indexed by branch instruction address to more sophisticated two-level adaptive schemes which exploit patterns in the recent global (GAg, GAs and Gshare) or local (SAg and SAs) branch history, have been shown to be successful at predicting user-level branches [24, 7].
Table 4. Branch Predictor Configurations
Scheme.size (i = 1..6) | PC bits used for BHSR selection | PC bits used for BHT index | BHSR bits used for BHT index (path length) | Total size of scheme (# of BHT entries)
2bc.2^i K | 0 | i+10 | 0 | 2^i K
GAg.2^i K | 0 | 0 | i+10 | 2^i K
GAs.2^i K | 0 | i+6 | 4 | 2^i ....
N. Gloy, C. Young, J. B. Chen and M. D. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 12-21, 1996.
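The contexts above contrast a counter table indexed purely by branch address with schemes that also use recent global branch history. A minimal sketch of one such scheme, gshare, follows. It uses the commonly described formulation in which the branch address is XORed with a global history register to index a table of 2-bit counters; the class name, the table size, the initial counter value, and the branch address are assumptions for illustration only.

```python
class Gshare:
    """Gshare sketch: index = (branch address XOR global history), 2-bit counters."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.history = 0  # global branch history shift register
        self.counters = [1] * (1 << index_bits)  # weakly not-taken (assumed)

    def index(self, pc):
        # Assumes 4-byte instructions, so the two low address bits are dropped.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the history register, newest bit lowest.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def miss_flags(predictor, trace):
    """Return one boolean per branch: True where the prediction was wrong."""
    flags = []
    for pc, taken in trace:
        flags.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return flags


# A hypothetical branch that strictly alternates taken / not-taken.
BRANCH = 0x2000
trace = [(BRANCH, n % 2 == 0) for n in range(200)]
flags = miss_flags(Gshare(index_bits=4), trace)
print(sum(flags), sum(flags[100:]))
```

Because the history register distinguishes "last outcome was taken" from "last outcome was not taken", the alternating pattern lands in different counters and becomes predictable after a short warm-up, which a single per-address 2-bit counter cannot achieve for this trace.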
.... (or excluding) the operating system on SMT, even for a multiprogrammed workload of SPECInt benchmarks. While we expect OS usage in SPECInt to be low, previous studies have shown that ignoring kernel code, even in such low-OS environments, can lead to a poor estimation of memory system behavior [13, 1]. Second, how does the impact of OS code on an 8-context SMT compare with that of an out-of-order superscalar? SMT is unique in that it executes kernel-mode and user-mode instructions simultaneously. That is, in a single cycle, instructions from multiple kernel routines can execute along with ....
....in several hardware data structures when simulating both SPECInt95 and the operating system on an SMT. The total miss results mirror what other researchers have found in single-threaded processor studies, namely, that the operating system exhibits poorer performance than SPECInt-like applications [13, 1]. The kernel miss rate in the branch target buffer is particularly high, because of two factors: the OS executes so infrequently that it cannot build up a persistent branch target state, and most kernel misses (78%) displace other kernel entries or are mispredictions due to repeated changes in the ....
N. Gloy, C. Young, J. Chen, and M. Smith. An analysis of dynamic branch prediction schemes on system workloads. In 23rd Annual International Symposium on Computer Architecture, May 1996.
.... a good prediction, which would have been wrong otherwise) [21]. Young et al. have shown that constructive aliasing is much less likely than destructive aliasing [21]. Recent studies have shown that large or multi-process workloads with a strong OS component exhibit very high degrees of aliasing [11, 5], and require much larger predictor tables than previously thought necessary to achieve a level of accuracy close to an ideal, unaliased predictor table [11]. We therefore expect that new techniques for removing conflict aliasing could provide important gains towards increased branch prediction ....
....were traced using a hardware monitor connected to a MIPS-based DECstation running Ultrix 3.1. The resulting traces include activity from all user-level processes as well as the operating system kernel, and have been determined by other researchers to be a good test of branch prediction performance [5, 11]. Conditional branch counts derived from these traces are given in Table 1. Although we simulated the sdet and video play benchmarks, they exhibited no special behavior compared with the other benchmarks. We therefore omit sdet and video play results from this paper in the interest of saving ....
N. Gloy, C. Young, B. Chen, and M.D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
.... would have been wrong otherwise) [13]. Young et al. have shown that constructive aliasing is rare and much smaller in magnitude than destructive aliasing [13]. Moreover, recent studies have shown that large or multi-process workloads with a strong OS component exhibit very high degrees of aliasing [6, 2], and require much larger predictor tables than previously thought necessary to achieve a level of accuracy close to an ideal, unaliased predictor table [6]. We therefore expect that new techniques for removing aliasing effects could provide important gains towards increased branch prediction ....
....are now in a position to evaluate it. Our analysis includes three components: a description of our experimental setup and the design space explored, an analytical model, and a simulation model. 4.1 Experimental Setup and Design Space. Benchmark Characterization. It has been noted in recent studies [2, 6] that the use of the SPEC benchmark suite, and particularly the use of only user-level instructions to evaluate the performance of branch prediction schemes, can lead to false conclusions. The resulting simulations often underestimate the predictor table sizes required for good performance on ....
N. Gloy, C. Young, B. Chen, and M.D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, May 1996.
....set of portable PARallel MACroS defined at the Argonne National Laboratory. One should note that our traces are user-only and that larger branch predictor structures would probably be needed to achieve the same prediction rate when a complete program including user and kernel activities is traced [GYCS96, SLM96]. In addition, we developed a fully configurable simulator, which reads instruction streams generated by Spy and integrates the feedback control necessary for handling parallelism. It can model superscalar execution pipelines, dynamic scheduling, memory hierarchies and branch prediction as ....
Nicolas Gloy, Cliff Young, J. Bradley Chen, and Michael D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In 23rd International Symposium on Computer Architecture, pages 12--21, May 1996.
....consider what must undoubtedly be today's leading server application: the web server. Web servers have been shown to spend over 85% of their CPU cycles running operating system code [4]; in contrast, the near-ubiquitous SPEC benchmarks execute less than 9% of their instructions in the OS kernel [7]. The figures in Table 1 show that the web server is not unique: the other typical multimedia, commercial, and GUI workloads listed similarly spend between 20% and 90% of their instructions in the kernel. [Table 1 (columns: Benchmark, User, Kernel, % Kernel; dynamic instruction counts in millions): SPEC92 compress 70.9 ....]
....[Table 1 row: Wish 492.5 138.5 21.9] Table 1: Percent Time Spent in Kernel Code. The SPEC benchmarks spend significantly less time in the kernel than a large number of commercial and multimedia workloads, represented here by the IBS, Maynard, and Chen suites. The SPEC and IBS measurements are from [7], the Maynard workloads are from [12], and the Chen measurements are from the data that accompany [4]. Amdahl's law tells us that if we want modern applications such as these to run quickly, the operating system must run quickly as well. Since traditional performance models essentially ....
Gloy, N., Young, C., Chen, J., and Smith, M. "An Analysis of Dynamic Branch Prediction Schemes on System Workloads." Proceedings of the International Symposium on Computer Architecture, May 1996.
....of the translator, and is analogous to the hardware branch speculation policy of modern processors. Because the translator is implemented in software, its policies can be aggressively tailored for a given application. While the literature on hardware branch prediction is large [YP92, MEP96, GYCS96], we believe that SEA could open up a new trade-off space for branch prediction algorithms based on information collected by SEA software. These new branch prediction schemes could either replace hardware prediction, or work with hardware prediction schemes. Like other SEA functions, software ....
Nicolas Gloy, Cliff Young, J. Bradley Chen, and Michael D. Smith. An analysis of dynamic branch prediction schemes on system workloads. In Proc. 23rd Annual Symposium on Computer Architecture, pages 12--22, May 1996.
Nicolas Gloy, Cliff Young, J. Bradley Chen, Michael D. Smith, "An Analysis of Dynamic Branch Prediction Schemes on System Workloads", Proc. 23rd Annual ISCA, 1996
Nicolas Gloy, Cliff Young, J. Bradley Chen, Michael D. Smith, "An Analysis of Dynamic Branch Prediction Schemes on System Workloads", Proc. 23rd Annual ISCA, 1996
Nicolas Gloy, Cliff Young, J. Bradley Chen, Michael D. Smith, "An Analysis of Dynamic Branch Prediction Schemes on System Workloads", In Proc. 23rd Annual Intl. Symp. on Computer Architecture, 1996
Gloy, N., Young, C., Chen, J. B., and Smith, M. D. An Analysis of Dynamic Branch Prediction Schemes on System Workloads. In The Proceedings of the 23rd Annual International Symposium on Computer Architecture (May 1996).
N. Gloy, C. Young, J. Chen, M. Smith, An Analysis of Dynamic Branch Prediction Schemes on System Workloads, Proceedings of the International Symposium on Computer Architecture, May 1996