22 citations found.
M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.

This paper is cited in the following contexts:
Quantifying the Complexity of Superscalar Processors - Subbarao Palacharla Norman (1996)   (45 citations)  (Correct)

....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. For example, the HP PA 8000 [19] uses a selection policy that is based on the location of the instruction in the window. We assume the same selection ....

M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


A Circuit Level Implementation of an Adaptive.. - Buyuktosunoglu.. (2001)   (4 citations)  (Correct)

....shifting instructions around in the queue every cycle and depending on the instruction word width may therefore be a source of considerable power consumption. Studies have shown that overall performance is largely independent of what selection policy is used (oldest first, position based, etc.) [4]. As such, the compaction strategy may not be best suited for low power operation; nor is it critical to achieving good performance. So, in this research project, an initial decision was made to avoid compaction. Even if this means that the select arbitration must be performed over a window size ....

M. Butler and Y.N Patt. An investigation of the performance of various dynamic scheduling techniques. Proc. ISCA-92, pp. 1-9.


Complexity-Effective Superscalar Processors - Palacharla (1997)   (161 citations)  (Correct)

....stations scheme, instructions at the head of the reservation stations can block other ready instructions behind them from issuing. Second, the instruction distribution logic in the in order reservation stations scheme makes no attempt to minimize the use of inter cluster bypasses. Butler and Patt [BP92] also report significant performance degradation when the headonly (fifo) scheduling policy is used with distributed reservation stations. Tomasulo, in his original proposal [Tom67] on dynamic scheduling, proposed distributed reservation stations as an alternative to centralized reservation ....

M. Butler and Y. N. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


Quantifying the Complexity of Superscalar Processors - Palacharla, Jouppi, Smith (1996)   (45 citations)  (Correct)

....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. For example, the HP .... In some designs, for example the HP PA 8000 [19], the entry is freed only after the instruction has been committed, i.e. its result ....

M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


An Adaptive Issue Queue for Reduced Power at High.. - Alper Buyuktosunoglu.. (2000)   (19 citations)  (Correct)

....shifting instructions around in the queue every cycle and depending on the instruction word width may therefore be a source of considerable power consumption. Studies have shown that overall performance is largely independent of what selection policy is used (oldest first, position based, etc.) [10]. As such, the compaction strategy may not be best suited for low power operation; nor is it critical to achieving good performance. So, in this research project, an initial decision was made to avoid compaction. Even if this means that the select arbitration must be performed over a window size ....

M. Butler and Y.N Patt, "An investigation of the performance of various dynamic scheduling techniques," Proc. ISCA-92, pp. 1-9.


Data-Flow Prescheduling for Large Instruction Windows in.. - Michaud, Seznec (2001)   (10 citations)  (Correct)

....data flow structure, it is very hard to distinguish the wrong path from the correct path (otherwise, this would provide a way to detect mispredicted branches). This observation led us to a sampling method, using correct path instructions to simulate the wrong path. A similar technique was used in [2]. The whole instruction trace is injected in the simulator, as usual, so that its internal structures (branch predictor, cache, store sets) are kept warm. However, we collect statistics only for one slice every 100 on average. We define a slice as a piece of instruction trace delimited by ....

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th International Symposium on Microarchitecture, 1992.


Multiple-Block Ahead Branch Predictors - Seznec, Jourdan, Sainrat, Michaud (1996)   (32 citations)  (Correct)

....number of instructions dispatched each cycle defines the dispatch width. Registers are renamed using a map table during the dispatch process. Instructions in the issue buffer may be issued out of order when all their operands are available, and a max dependent selection mechanism as described in [1] is used when more than one instruction compete for the same functional unit access. To enforce precise interrupt management, a history buffer [23] similar to the active list of the R10000, records the previous mappings discarded by the renaming process during the dispatch stage. Checkpoints [9] ....

M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.


Memory-System Design Considerations For Dynamically-Scheduled.. - Farkas (1997)   (34 citations)  (Correct)

....of a possible degradation in performance. On the other hand, if more hardware complexity can be tolerated, other approaches could be used that, for example, decrease the time required to perform the computation on the critical path. Since the performance functionality tradeoff has been explored by Butler and Patt [1992] and since the hardware complexity of an earliest first approach is not excessive, the processor model used in this work assumes this approach. 3.4 System Model Details The previous section summarizes the factors that are accurately modeled and those that are not. In this section, further ....

Butler, M. and Patt, Y. (1992). An Investigation of the Performance of Various Dynamic Scheduling Techniques. In the Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9.


Latency Tolerance For Dynamic Processors - Bennett, Flynn (1996)   (1 citation)  (Correct)

....techniques This study has compared various hardware techniques for tolerating memory latency. There has also been work on software techniques for tolerating memory latency, such as prefetching [MLG92, TE95] and balanced scheduling [KE93]. Hardware and software techniques are compared in [BP92], [CCMH91] and [CB94]. In general, it appears that software and hardware techniques are complementary. Compile time optimizations for memory latency tolerance can include large scale code motion, such as loop transformations, that are beyond the scope of hardware techniques. On the other hand, ....

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proc. of the 25th International Symposium on Microarchitecture, pages 1--9, December 1992.


Basic Issues in Microprocessor Architecture - Flynn   (Correct)

....that O is constant for each additional instruction added. The parameter d is the fraction of occurrence of significantly disrupting events (akin to a misguessed branch) which causes the ILP machine to have to restart with a new instruction count. The tradeoff between VLIW and superscalar machines [2] is simply the tradeoff between reducing overhead with a VLIW machine and potentially finding potential parallelism and hence reducing the disruption with a superscalar machine. As before, latency tolerance reduces for all organizations d, the probability of a disruption, and hence improves the ....

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of MICRO-25, December 1992.


Exploring Configurations of Functional Units in an.. - Stéphan Jourdan (1995)   (13 citations)  (Correct)

....are the algorithms which arbitrate fireable entries of the instruction issue buffer. This buffer can be unified (all entries are linked to all functional units) split (all entries are linked to only one functional unit) or mixed. A similar mechanism is called the node table by Butler and Patt in [BuPa92]. In their paper, they show that most issuing techniques give almost the same performance for a processor featuring a wide degree. We therefore decided to implement the most natural algorithm named oldest first where the entry which holds the oldest dispatched instruction has priority over the ....

M. Butler and Y. Patt, "An Investigation of the Performance of Various Dynamic Scheduling techniques", Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992


An Investigation of the Performance of Various.. - Jourdan, Sainrat.. (1995)   (2 citations)  (Correct)

....are the algorithms which arbitrate issuable entries of the instruction issue buffer. This buffer can be unified (all entries are linked to all functional units) split (all entries are linked to only one unit) or semi unified. A similar mechanism is called the node table by Butler and Patt in [1]. The most natural issuing policy is oldest first where the entry which holds the oldest dispatched instruction has priority over the others. Nonetheless, [8] showed that using this priority is unnecessary: cheaper policies such as a fixed priority give roughly the same performance. But more ....

....policy is oldest first where the entry which holds the oldest dispatched instruction has priority over the others. Nonetheless, [8] showed that using this priority is unnecessary: cheaper policies such as a fixed priority give roughly the same performance. But more recently, Butler and Patt in [1] showed that these results do not apply to floating point programs such as tomcatv since the performance loss is 20%. Thus, we avoid most of the possible degradation of the issuing performance by choosing the oldest first priority since we are not investigating issuing policies. When an ....

M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.


What's Ahead in Computer Design? - Flynn Stanford (1997)   (1 citation)  (Correct)

....decoder or branch unit). ILP (instruction level parallelism) is the basis for enhancing the performance of most current high end microprocessors. The obvious issue is, how much ILP is available? Current processor implementations attempt to use 4 or perhaps 8 way instruction issues each cycle [5, 3, 8]. At least at this time, it seems problematic that ILP compilers will efficiently support more than these levels. An interesting question is when (and how) to find ILP. The obvious alternatives are compile time, when the entire scope of the program is available, and run time, when the true state ....

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of MICRO-25, December 1992.


Processor Design Issues for Parallel Machines - Carl Beckmann   (Correct)

....The four curves in each set represent different values of load latency (5, 10, 20 and 50 cycles). Integer and floating point operations were assumed to take 3 and 4 We found that using a critical path based priority to affect this choice did not significantly affect the results. Experiments in [9] also show that the order in which ready instructions are dispatched has a relatively minor effect. 5 Numerous other loops were simulated, with similar results. The results shown here are from the most frequently executed loops in several of the Perfect Club codes. Figure 1: Utilization ....

Michael Butler and Yale Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, Portland, Oregon, December 1992. IEEE Computer Society Press.


Quantifying the Complexity of Superscalar Processors - Palacharla, Jouppi, Smith (1996)   (45 citations)  (Correct)

....is freed for later use. A selection policy is used to decide which of the requesting instructions is granted the functional unit. An example selection policy is oldest ready first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. For example, the HP PA 8000 [19] uses a selection policy that is based on the location of the instruction in the window. We assume the same selection ....

M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


Multiple-Block Ahead Branch Predictors - Seznec, Jourdan, Sainrat, Michaud (1996)   (32 citations)  (Correct)

....number of instructions dispatched each cycle defines the dispatch width. Registers are renamed using a map table during the dispatch process. Instructions in the issue buffer may be issued out of order when all their operands are available, and a max dependent selection mechanism as described in [1] is used when more than one instruction compete for the same functional unit access. To enforce precise interrupt management, a history buffer similar to the active list of the MIPS R10000, records the previous mappings discarded by the renaming process during the dispatch stage. Checkpoints of ....

M. Butler and Y. N. Patt, "An Investigation of the Performance of Various Dynamic Scheduling Techniques," Proceedings of the 25th Annual International Symposium on Microarchitecture, December 1992.


Complexity-Effective Superscalar Processors - Palacharla (1998)   (161 citations)  (Correct)

....makes no attempt to minimize the use of inter cluster bypasses. [Figure 3.16: Comparing against in order distributed reservation stations. Series: 2 cluster in order res. stations, 2 cluster fifo based. Benchmarks: compress, gcc, go, ijpeg, li, m88ksim, perl, vortex.] Butler and Patt [BP92] also report significant performance degradation when the headonly (fifo) scheduling policy is used with distributed reservation stations. 3.5 Related Work Tomasulo, in his original proposal [Tom67] on dynamic scheduling, proposed distributed reservation stations as an alternative to ....

M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


Complexity-Effective Superscalar Processors - Palacharla (1997)   (161 citations)  (Correct)

....associated instruction is issued to the functional unit. A selection policy is used to decide which of the requesting instructions is granted. An example selection policy is oldest first: the ready instruction that occurs earliest in program order is granted the functional unit. Butler and Patt [5] studied various policies for scheduling ready instructions and found that overall performance is largely independent of the selection policy. The HP PA 8000 uses a selection policy that is based on the location of the instruction in the window. We assume the same selection policy in our study. ....

M. Butler and Y. N. Patt. An Investigation of the Performance of Various Dynamic Scheduling Techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, December 1992.


Hardware And Software For Functional And Fine Grain Parallelism - Beckmann (1993)   (16 citations)  (Correct)

....instruction mix. In these simulations, three functional unit types were 3 This ignores multiprocessor loop scheduling and synchronization overhead. 4 It was found that choosing the instructions issued using a critical path based priority did not significantly affect the results. Experiments in [19] also show that the order in which ready instructions are dispatched has a relatively minor effect. 5 Numerous other loops were simulated, with similar results. The results shown here are from the most frequently executed loops in several of the Perfect Club codes. ....

Michael Butler and Yale Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pages 1--9, Portland, Oregon, December 1992. IEEE Computer Society Press.


Demand-Only Broadcast: Reducing Register File and Bypass Power.. - Brown, Patt (2004)   Self-citation (Patt)   (Correct)

No context found.

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.


On Pipelining Dynamic Instruction Scheduling Logic - Stark, Brown, Patt (2000)   (8 citations)  Self-citation (Patt)   (Correct)

....the table to determine its DELAY field, MATCH bit, and SHIFT field. 3.3. Select Logic The select logic for each functional unit grants execution to one ready instruction. If more than one instruction requests execution, heuristics may be used for choosing which instruction receives the grant [1]. The inputs to the select logic are the request signals from each of the functional unit's RSEs, plus any additional information needed for scheduling heuristics such as priority information. Implementations of the select logic are discussed elsewhere [3] and will not be covered in this paper. As ....

M. Butler and Y. Patt, "An investigation of the performance of various dynamic scheduling techniques," in Proceedings of the 25th Annual ACM/IEEE International Symposium on Microarchitecture, 1992.


Latency Tolerant Architectures - Bennett (1998)   (2 citations)  (Correct)

No context found.

M. Butler and Y. Patt. An investigation of the performance of various dynamic scheduling techniques. In Proc. of the 25th International Symposium on Microarchitecture, pages 1--9, December 1992.

CiteSeer.IST - Copyright Penn State and NEC