M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.
....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. For example, the HP PA 8000 [19] uses a selection policy that is based on the location of the instruction in the window. We assume the same selection ....
M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....shifting instructions around in the queue every cycle and, depending on the instruction word width, may therefore be a source of considerable power consumption. Studies have shown that overall performance is largely independent of what selection policy is used (oldest first, position based, etc.) [4]. As such, the compaction strategy may not be best suited for low power operation; nor is it critical to achieving good performance. So, in this research project, an initial decision was made to avoid compaction. Even if this means that the select arbitration must be performed over a window size ....
M. Butler and Y. N. Patt. An investigation of the performance of various dynamic scheduling techniques. Proc. ISCA-92, pp. 1-9.
....stations scheme, instructions at the head of the reservation stations can block other ready instructions behind them from issuing. Second, the instruction distribution logic in the in-order reservation stations scheme makes no attempt to minimize the use of inter-cluster bypasses. Butler and Patt [BP92] also report significant performance degradation when the head-only (FIFO) scheduling policy is used with distributed reservation stations. Tomasulo, in his original proposal [Tom67] on dynamic scheduling, proposed distributed reservation stations as an alternative to centralized reservation ....
M. Butler and Y. N. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. In some designs, for example the HP PA 8000 [19], the entry is freed only after the instruction has been committed, i.e. its result ....
M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....shifting instructions around in the queue every cycle and, depending on the instruction word width, may therefore be a source of considerable power consumption. Studies have shown that overall performance is largely independent of what selection policy is used (oldest first, position based, etc.) [10]. As such, the compaction strategy may not be best suited for low power operation; nor is it critical to achieving good performance. So, in this research project, an initial decision was made to avoid compaction. Even if this means that the select arbitration must be performed over a window size ....
M. Butler and Y. N. Patt, "An investigation of the performance of various dynamic scheduling techniques," Proc. ISCA-92, pp. 1-9.
....data flow structure, it is very hard to distinguish the wrong path from the correct path (otherwise, this would provide a way to detect mispredicted branches). This observation led us to a sampling method, using correct path instructions to simulate the wrong path. A similar technique was used in [2]. The whole instruction trace is injected in the simulator, as usual, so that its internal structures (branch predictor, cache, store sets) are kept warm. However, we collect statistics only for one slice every 100 on average. We define a slice as a piece of instruction trace delimited by ....
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th International Symposium on Microarchitecture, 1992.
....number of instructions dispatched each cycle defines the dispatch width. Registers are renamed using a map table during the dispatch process. Instructions in the issue buffer may be issued out of order when all their operands are available, and a max dependent selection mechanism as described in [1] is used when more than one instruction competes for the same functional unit access. To enforce precise interrupt management, a history buffer [23], similar to the active list of the R10000, records the previous mappings discarded by the renaming process during the dispatch stage. Checkpoints [9] ....
M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.
....of a possible degradation in performance. On the other hand, if more hardware complexity can be tolerated, other approaches could be used that, for example, decrease the time required to perform the computation on the critical path. Since the performance functionality tradeoff has been explored by Butler and Patt [1992] and since the hardware complexity of an earliest first approach is not excessive, the processor model used in this work assumes this approach. 3.4 System Model Details The previous section summarizes the factors that are accurately modeled and those that are not. In this section, further ....
Butler, M. and Patt, Y. (1992). An Investigation of the Performance of Various Dynamic Scheduling Techniques. In the Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9.
....techniques This study has compared various hardware techniques for tolerating memory latency. There has also been work on software techniques for tolerating memory latency, such as prefetching [MLG92, TE95] and balanced scheduling [KE93]. Hardware and software techniques are compared in [BP92], [CCMH91] and [CB94]. In general, it appears that software and hardware techniques are complementary. Compile time optimizations for memory latency tolerance can include large scale code motion, such as loop transformations, that are beyond the scope of hardware techniques. On the other hand, ....
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proc. of the 25th International Symposium on Microarchitecture, pages 1--9, December 1992.
....that O is constant for each additional instruction added. The parameter d is the fraction of occurrence of significantly disrupting events (akin to a misguessed branch) which causes the ILP machine to have to restart with a new instruction count. The tradeoff between VLIW and superscalar machines [2] is simply the tradeoff between reducing overhead with a VLIW machine and potentially finding potential parallelism and hence reducing the disruption with a superscalar machine. As before, latency tolerance reduces for all organizations d, the probability of a disruption, and hence improves the ....
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of MICRO-25, December 1992.
....are the algorithms which arbitrate fireable entries of the instruction issue buffer. This buffer can be unified (all entries are linked to all functional units), split (all entries are linked to only one functional unit) or mixed. A similar mechanism is called the node table by Butler and Patt in [BuPa92]. In their paper, they show that most issuing techniques give almost the same performance for a processor featuring a wide degree. We therefore decided to implement the most natural algorithm, named oldest first, where the entry which holds the oldest dispatched instruction has priority over the ....
M. Butler and Y. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.
....are the algorithms which arbitrate issuable entries of the instruction issue buffer. This buffer can be unified (all entries are linked to all functional units), split (all entries are linked to only one unit) or semi unified. A similar mechanism is called the node table by Butler and Patt in [1]. The most natural issuing policy is oldest first, where the entry which holds the oldest dispatched instruction has priority over the others. Nonetheless, [8] showed that using this priority is unnecessary: cheaper policies such as a fixed priority give roughly the same performance. But more ....
....policy is oldest first, where the entry which holds the oldest dispatched instruction has priority over the others. Nonetheless, [8] showed that using this priority is unnecessary: cheaper policies such as a fixed priority give roughly the same performance. But more recently, Butler and Patt in [1] showed that these results do not apply to floating point programs such as tomcatv, since the performance loss is 20%. Thus, we avoid most of the possible degradation of the issuing performance by choosing the oldest first priority since we are not investigating issuing policies. When an ....
M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.
....decoder or branch unit). ILP (instruction level parallelism) is the basis for enhancing the performance of most current high end microprocessors. The obvious issue is, how much ILP is available? Current processor implementations attempt to use 4 or perhaps 8 way instruction issues each cycle [5, 3, 8]. At least at this time, it seems problematic that ILP compilers will efficiently support more than these levels. An interesting question is when (and how) to find ILP. The obvious alternatives are compile time, when the entire scope of the program is available, and run time, when the true state ....
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of MICRO-25, December 1992.
....The four curves in each set represent different values of load latency (5, 10, 20 and 50 cycles). Integer and floating point operations were assumed to take 3 and 4 We found that using a critical path based priority to affect this choice did not significantly affect the results. Experiments in [9] also show that the order in which ready instructions are dispatched has a relatively minor effect. 5 Numerous other loops were simulated, with similar results. The results shown here are from the most frequently executed loops in several of the Perfect Club codes. Figure 1: Utilization ....
Michael Butler and Yale Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, Portland, Oregon, December 1992. IEEE Computer Society Press.
....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. For example, the HP PA 8000 [19] uses a selection policy that is based on the location of the instruction in the window. We assume the same selection ....
M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....number of instructions dispatched each cycle defines the dispatch width. Registers are renamed using a map table during the dispatch process. Instructions in the issue buffer may be issued out of order when all their operands are available, and a max dependent selection mechanism as described in [1] is used when more than one instruction competes for the same functional unit access. To enforce precise interrupt management, a history buffer, similar to the active list of the MIPS R10000, records the previous mappings discarded by the renaming process during the dispatch stage. Checkpoints of ....
M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.
....makes no attempt to minimize the use of inter-cluster bypasses. [Figure 3-16: Comparing against in-order distributed reservation stations. Series: 2-cluster in-order res. stations, 2-cluster FIFO-based. Benchmarks: compress, gcc, go, ijpeg, li, m88ksim, perl, vortex.] Butler and Patt [BP92] also report significant performance degradation when the head-only (FIFO) scheduling policy is used with distributed reservation stations. 3.5 Related Work Tomasulo, in his original proposal [Tom67] on dynamic scheduling, proposed distributed reservation stations as an alternative to ....
M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....associated instruction is issued to the functional unit. A selection policy is used to decide which of the requesting instructions is granted. An example selection policy is oldest first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. The HP PA 8000 uses a selection policy that is based on the location of the instruction in the window. We assume the same selection policy in our study. ....
M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.
....instruction mix. In these simulations, three functional unit types were 3 This ignores multiprocessor loop scheduling and synchronization overhead. 4 It was found that choosing the instructions issued using a critical path based priority did not significantly affect the results. Experiments in [19] also show that the order in which ready instructions are dispatched has a relatively minor effect. 5 Numerous other loops were simulated, with similar results. The results shown here are from the most frequently executed loops in several of the Perfect Club codes. [Figure omitted; axis label W.] ....
Michael Butler and Yale Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, Portland, Oregon, December 1992. IEEE Computer Society Press.
No context found.
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.
....the table to determine its DELAY field, MATCH bit, and SHIFT field. 3.3. Select Logic The select logic for each functional unit grants execution to one ready instruction. If more than one instruction requests execution, heuristics may be used for choosing which instruction receives the grant [1]. The inputs to the select logic are the request signals from each of the functional unit's RSEs, plus any additional information needed for scheduling heuristics such as priority information. Implementations of the select logic are discussed elsewhere [3] and will not be covered in this paper. As ....
M. Butler and Y. Patt, "An investigation of the performance of various dynamic scheduling techniques," in Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.
No context found.
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proc. of the 25th International Symposium on Microarchitecture, pages 1--9, December 1992.