P. P. Chang, W. Y. Chen, S. A. Mahlke, W. W. Hwu, "Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors," in Proceedings of the 18 Symposium on Microarchitecture, pp. 69-73, Nov 1991.
....the burden of dynamic scheduling to an offline dynamic scheduler and using only minimal hardware support to overlap multiple static schedules. The work presented here compares the performance of statically scheduled machines with that of dynamically scheduled machines. Previous work on this topic [3] compared in-order execution with out-of-order execution. Their results differ significantly from those presented here. They present a less severe degradation in performance for the statically scheduled machine when compared with a dynamically scheduled one. This difference can be attributed to ....
P. P. Chang, W. Y. Chen, S. A. Mahlke, and W. W. Hwu. Comparing static and dynamic code scheduling for multiple-instruction-issue processors. In Proceedings of the 24th Annual International Symposium on Microarchitecture, pages 25--33, 1991.
....of branch units on a static architecture. One important conclusion they made was that there was little performance benefit from having more than 3-4 issue slots for ILP. We also found this to be true in Section 4.1.3 below. General-purpose applications typically have similar issue width bounds [109]. There have been a number of evaluations of static and dynamic architectures for general-purpose processing in the research community, but nearly all of these focus on only one of the two architecture models. With regards to research that has directly compared static and dynamic architectures, ....
....evaluations of static and dynamic architectures for general-purpose processing in the research community, but nearly all of these focus on only one of the two architecture models. With regards to research that has directly compared static and dynamic architectures, we are only aware of two studies [109][110], both made within the IMPACT group. Both of these studies examined static VLIW architectures, in-order superscalar architectures, and out-of-order superscalar architectures. The studies examined two scheduling models, restricted and general. The restricted model did not allow code movement ....
Pohua P. Chang, William Y. Chen, Scott A. Mahlke, and Wen-mei W. Hwu, "Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue
....designed for use in a continuous optimization infrastructure. Although it has been shown previously that traditional techniques can profit from profiling data (e.g., code placement, code scheduling, or register allocation [Pettis and Hansen 1990; Chang et al. 1991; Chang et al. 1992; Chen et al. 1994]), it is not immediately obvious that they can also noticeably benefit from continuous re-optimization. This section will present dynamic trace scheduling, an optimization that enables us to study the impact of our infrastructure on more traditional ....
....level parallelism by statically reordering the instructions in a program but without invalidating program semantics. This is especially beneficial for in-order and VLIW processors that execute instructions in strict program order and are not very tolerant against pipeline stalls and cache misses [Chang et al. 1991]. Instruction scheduling can also be very effective for out-of-order processors. Although out-of-order processors already reorder independent instructions on the fly, this reordering is usually restricted in scope: the processor can only see a relatively small window of instructions at a time ....
Chang, P. P., Chen, W. Y., Mahlke, S. A., and Hwu, W.-M. W. 1991. Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors. In Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). Albuquerque, New Mexico, 25--33.
....is a technical report on the topic of code-expanding optimizations and the added requirements to be met by instruction cache logic. Reference [19] examines the performance problems of machines that use multiple-instruction-issue architectures, but have a limited number of registers. Reference [20] uses the IMPACT compiler to compare results of static and dynamic code scheduling on processors using multiple-instruction-issue architectures. Reference [21] examines the value of compiler-assisted prefetch of data and its impact upon performance and memory interface design. ....
P. P. Chang, W. Y. Chen, S. A. Mahlke, and W. W. Hwu, "Comparing static and dynamic code scheduling for multiple-instruction-issue processors," in Proceedings of the 24th Annual International Symposium on Microarchitecture, pp. 25--33, November 1991.
....speed, has been dealt with by adding high-speed cache memory. However, it is difficult to make a cache both large and fast, so that cache misses are expected to continue to have a significant performance impact. Dynamic scheduling has been proposed as a technique for tolerating memory latency [CCMH91, BP91, GGH92]. By speculating through branches, fetching load instructions, and issuing these as soon as the address is available, the dynamic processor can effectively prefetch data. A non-blocking cache [Kro81] enables this prefetching to be quite effective [BP91]. Figure 1 demonstrates this effect, but also ....
P. Chang, W. Chen, S. Mahlke, and W. Hwu. Comparing static and dynamic code scheduling for multiple-instruction-issue processors. In Proc. of the 24th International Symposium on Microarchitecture, pages 25--33, November 1991.
....and stream buffers [KF95, Jou90] have included only statically scheduled processors. In this paper we study the effectiveness of these hardware techniques when applied to dynamically scheduled processors. Dynamic scheduling has been proposed as a technique for tolerating memory latency [CCMH91, BP91, GGH92]. By speculating through branches, fetching load instructions, and issuing these as soon as the address is available, the dynamic processor can effectively prefetch data. This aspect of dynamically scheduled processors may limit the effectiveness of other hardware schemes for latency tolerance. ....
....techniques This study has compared various hardware techniques for tolerating memory latency. There has also been work on software techniques for tolerating memory latency, such as prefetching [MLG92, TE95] and balanced scheduling [KE93]. Hardware and software techniques are compared in [BP92], [CCMH91], and [CB94]. In general, it appears that software and hardware techniques are complementary. Compile-time optimizations for memory latency tolerance can include large-scale code motion, such as loop transformations, that are beyond the scope of hardware techniques. On the other hand, hardware ....
P. Chang, W. Chen, S. Mahlke, and W. Hwu. Comparing static and dynamic code scheduling for multiple-instruction-issue processors. In Proc. of the 24th International Symposium on Microarchitecture, pages 25--33, November 1991.
....Several studies have been recently conducted to compare some of the merits of static and dynamic scheduling in superscalar processors. Their results have not been very conclusive, however, and fail to examine the interaction between static and dynamic scheduling. Chang et al. [CCMH91] applied superblock scheduling to three different architectures to compare the benefits of hardware and software techniques. In the study, the three architectures differed in their support for code scheduling. The first architecture supported in-order execution with restricted code percolation ....
Pohua P. Chang, William Y. Chen, Scott A. Mahlke, and Wen-mei W. Hwu. Comparing static and dynamic code scheduling for multiple-instruction-issue processors. In Proceedings of the 24th Annual Workshop on Microarchitecture, pages 25--33. SIGMICRO, IEEE, November 1991.
....can again generate MIPS code, execute the scheduled code sequentially, and produce an instruction trace. Using these traces, we have simulated the superscalar execution of the benchmarks in a previous study and found that the simulated execution time matches the time calculated as described above [25]. We also verified that the program output during this execution and trace generation was correct. The execution time result for each compilation is reported as a speedup relative to the compilation for the base microarchitecture. For the register spilling results, we define a metric called ....
P. P. Chang, W. Y. Chen, S. A. Mahlke, and W. W. Hwu, "Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors," in Proceedings of the 24th International Symposium and Workshop on Microarchitecture, Nov. 1991.
P. P. Chang, W. Y. Chen, S. A. Mahlke, W. W. Hwu, "Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors," in Proceedings of the 18 Symposium on Microarchitecture, pp. 69-73, Nov 1991.
P. Chang, W. Chen, S. Mahlke, and W. Hwu. Comparing static and dynamic code scheduling for multiple-instruction-issue processors. In Proc. of the 24th International Symposium on Microarchitecture, pages 25--33, November 1991.