| P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu. Three superblock scheduling models for superscalar and superpipelined processors. Technical Report CRHC-91-29, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, 1991. |
....[Table 1: Leakage control schemes evaluated (including DHS and DHS Bank).] ....sequentiality of the code [4, 5, 10]. The sequential nature of code can be exploited to predict the next cache line that will be accessed and to mask the penalty of transitioning a cache line from drowsy to active mode just in time for the access. Specifically, we propose a scheme, JITA, that preactivates the next cache line. The leakage ....
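To make the next-line preactivation idea in this excerpt concrete, here is a minimal Python sketch under stated assumptions. It is not the JITA implementation from the citing paper: the function name `count_drowsy_wakeups`, the all-drowsy starting state, the wrap-around successor `(i + 1) % num_lines`, and the optional periodic reset of the whole cache are all hypothetical modelling choices. It only counts how many accesses find their line drowsy, i.e. how often a wake-up penalty would be exposed.

```python
from typing import Iterable


def count_drowsy_wakeups(trace: Iterable[int], num_lines: int,
                         preactivate_next: bool, period: int = 0) -> int:
    """Count accesses that find their cache line in drowsy mode.

    Hypothetical model: every line starts drowsy and an access wakes its line.
    With preactivate_next=True (a JITA-like policy), accessing line i also
    wakes line (i + 1) % num_lines so a sequential successor is ready in time.
    With period > 0, the entire cache is put back to drowsy mode every
    `period` accesses (a simple periodic policy).
    """
    active = [False] * num_lines
    wakeups = 0
    for step, line in enumerate(trace):
        if period and step > 0 and step % period == 0:
            active = [False] * num_lines      # periodic: whole cache goes drowsy
        if not active[line]:
            wakeups += 1                      # the wake-up penalty is exposed here
            active[line] = True
        if preactivate_next:
            active[(line + 1) % num_lines] = True  # wake the next line early
    return wakeups


if __name__ == "__main__":
    loop = [0, 1, 2, 3] * 3  # a small loop body spanning four cache lines
    print(count_drowsy_wakeups(loop, 8, preactivate_next=False, period=4))  # 12
    print(count_drowsy_wakeups(loop, 8, preactivate_next=True, period=4))   # 3
```

On purely sequential code only the first line of each drowsy interval exposes a penalty, which is the effect the excerpt attributes to exploiting code sequentiality.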
P. P. Chang et al. Three superblock scheduling models for superscalar and superpipelined processors. Technical Report CRHC-91-29, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, 1991.
....control and data dependences within each epoch: the transformed code will perform the same operations as the original, but possibly reordered within each control structure and between ambiguous data dependences. However, it is potentially beneficial to move code past control and data dependences [5, 10, 12, 24] to further reduce the critical forwarding path. For example, if a certain path is executed more frequently than alternative paths, then it is advantageous to speculatively schedule the critical forwarding path to exploit this fact. To illustrate, if the else clause is more frequently executed ....
.... builds upon previous dataflow approaches to code motion, namely partial redundancy elimination [18], path-sensitive dataflow analysis [16], and hot paths [2]. Previous work on speculative code motion to exploit a frequently executed path includes trace scheduling [10] and superblock scheduling [5]. There has also been work on aggressive load/store reordering where the runtime check and recovery are performed entirely in software [24] or through a hybrid hardware/software approach [12]. 1.4 Contributions: In the context of thread-level speculation, this work makes the following ....
CHANG, P. P., WARTER, N. J., MAHLKE, S. A., CHEN, W. Y., AND HWU, W. W. Three Superblock Scheduling Models for Superscalar and Superpipelined Processors. Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, 1991.
....extensions to the basic algorithm. Like the simpler method, the inner loops are scheduled first. The part of the bookkeeping phase of TS concerning rejoin points can be omitted if no branches into a trace are allowed. A very simple way to achieve this is to use tail duplication to create superblocks [30, 46] instead of traces: if a basic block can be reached via more than one point, the block is duplicated for each such point. In this way the CFG becomes a tree, apart from back edges. Superblocks can also be used for other optimizations, such as constant propagation (tails get specialized for ....
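The tail-duplication step described above can be sketched for a single trace as follows. This is a minimal illustration, not the IMPACT implementation: the CFG is assumed to be a dict mapping each block name to its successor list, the trace is assumed to contain distinct blocks, and duplicated blocks are named with a trailing apostrophe purely by convention. Starting at the first trace block that has a side entrance, the rest of the trace is copied and every side entrance is redirected to the copy, so the trace can only be entered at its head.

```python
from collections import defaultdict

CFG = dict[str, list[str]]


def tail_duplicate(succs: CFG, trace: list[str]) -> CFG:
    """Remove side entrances into `trace` by duplicating its tail.

    `succs` maps every basic block to its successor blocks; `trace` is a list
    of distinct blocks on a frequently executed path. Returns a new CFG in
    which the trace can only be entered at trace[0].
    """
    preds: dict[str, set[str]] = defaultdict(set)
    for b, ss in succs.items():
        for s in ss:
            preds[s].add(b)

    # First block (after the head) that is reachable from somewhere other
    # than its predecessor on the trace; everything from here on is copied.
    start = next((i for i in range(1, len(trace))
                  if preds[trace[i]] - {trace[i - 1]}), None)
    new = {b: list(ss) for b, ss in succs.items()}
    if start is None:
        return new  # already a superblock: single entry at the head

    tail = trace[start:]
    copy = {b: b + "'" for b in tail}
    pos = {b: i for i, b in enumerate(trace)}

    # The copies form the off-trace path; their edges into the tail stay
    # inside the copied region.
    for b in tail:
        new[copy[b]] = [copy.get(s, s) for s in succs[b]]

    # Redirect each side entrance into the tail to the corresponding copy;
    # the fall-through edge from the trace predecessor is kept.
    for p, ss in succs.items():
        new[p] = [copy[s] if s in copy and p != trace[pos[s] - 1] else s
                  for s in ss]
    return new
```

For the diamond `A -> {B, C}`, `B -> D`, `C -> D`, `D -> E` with trace `[A, B, D, E]`, the sketch copies `D` and `E` and sends `C` to `D'`, leaving `A, B, D, E` with a single entry. Repeating the transformation for every trace in a function is what would turn the CFG into a tree apart from back edges.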
....restricted by: Restriction 1: The definition of a register value must not be live on the other path of the branch. Restriction 2: The instruction must not cause exceptions that may alter the program behavior or even terminate the program execution. Ways to overcome these restrictions include [30]: predicated execution. Both restrictions can be overcome by using predicated (or guarded) execution. The moved instruction gets a predicate (depending on the control condition of the branch) guaranteeing that it will only be executed if the branch is taken in the direction the instruction ....
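The two restrictions quoted above can be expressed as a simple legality check. The sketch below is hypothetical: the `Instr` record, the `MAY_TRAP` opcode set, and the `hoist_mode` function are illustrative assumptions, and a real compiler would compute liveness and exception behavior from its own IR. It returns "speculate" when plain upward motion above the branch satisfies both restrictions, and "predicate" when, following the excerpt, the instruction would instead need a guard derived from the branch condition.

```python
from dataclasses import dataclass

# Hypothetical opcode set: operations assumed able to raise an exception
# (for example, a faulting load or a divide by zero).
MAY_TRAP = frozenset({"load", "div"})


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: tuple[str, ...] = ()


def hoist_mode(instr: Instr, live_on_other_path: set[str]) -> str:
    """Classify how `instr` may be moved above a conditional branch.

    Returns "speculate" when plain upward code motion is safe, or
    "predicate" when the instruction must be guarded by the branch
    condition (predicated execution) in order to be moved.
    """
    # Restriction 1: the value defined must not be live on the other path.
    r1_ok = instr.dest not in live_on_other_path
    # Restriction 2: the instruction must not be able to raise an exception.
    r2_ok = instr.op not in MAY_TRAP
    return "speculate" if r1_ok and r2_ok else "predicate"
```

For example, `hoist_mode(Instr("add", "r1", ("r2", "r3")), {"r4"})` yields "speculate", while a `load` (Restriction 2) or an instruction whose destination is live on the other path (Restriction 1) yields "predicate".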
Chang, P., Warter, N., Mahlke, S., Chen, W., and Hwu, W.-M. Three superblock scheduling models for superscalar and superpipelined processors. Tech. Rep. CRHC-91-29, University of Illinois, 1991.
....[6] is a technical report containing a more thorough treatment of the material in [5]. Reference [7] describes control flow optimizations that the IMPACT compiler also used. Reference [8] is a technical report showing the advantages of scheduling code prior to register allocation. Reference [9] shows the advantages of scheduling superblocks, especially on superpipelined superscalar processors. Reference [10] shows the importance of function inlining in compiling C programs. Reference [11] shows how instruction placement may be improved after function inlining has been performed. ....
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "Three superblock scheduling models for superscalar and superpipelined processors," Tech. Rep. CRHC-91-25, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, October 1991.
....when a branch is taken the opposite way from how it was predicted. Other techniques can also be used in the architecture to return the machine to a precise state after speculatively executing an instruction that should not have been executed. 3.4 Superblock scheduling: Superblock scheduling [5] is a variant of trace scheduling that uses a method called tail duplication to reduce the bookkeeping complexity associated with branches into the middle of a trace. Chang et al. suggest three models for implementing superblock scheduling, with varying hardware support requirements. The main ....
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu. Three superblock scheduling models for superscalar and superpipelined processors. Technical report, Center for Reliable and High-Performance Computing, University of Illinois at Urbana-Champaign.
No context found.
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu. Three superblock scheduling models for superscalar and superpipelined processors. Technical Report CRHC-91-29, Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, 1991.
.... parallelism within each basic block of a program has been shown to be close to 2, hardly enough to justify these new architectures [1], [2]. To increase the available operation-level parallelism, many new scheduling techniques have been developed: trace scheduling [3], superblock scheduling [4], and percolation scheduling [5], to name a few. All of these global compaction techniques are used in the compiler to schedule operations beyond basic-block boundaries and thus create more operation-level parallelism. Since loop execution time dominates total execution time, special consideration must ....
....suggests a different technique based on minimizing the global interbody dependence distance [17]. In an effort to determine the best global compaction technique to use, several techniques were applied to some sample loops. The results of percolation scheduling [18] and superblock scheduling [4], a variation of trace scheduling, were compared to those of loops whose basic blocks had been list scheduled. The findings seem to preclude using trace scheduling or any of its variations. The fundamental idea in trace scheduling is to make the most frequently executed path through the program as ....
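As a point of reference for the list-scheduled basic blocks used as the baseline in this excerpt, a minimal list scheduler for one basic block might look like the following. The dependence-graph encoding, the longest-latency-path priority, and the fixed issue width are assumptions chosen for illustration, not details taken from the citing paper. Each cycle, the scheduler issues up to `width` operations whose operands are available, highest priority first.

```python
def list_schedule(deps: dict[str, list[str]], latency: dict[str, int],
                  width: int) -> dict[str, int]:
    """Assign an issue cycle to each operation of one basic block.

    deps[op] lists the operations whose results `op` consumes; the graph
    must be acyclic. At most `width` operations issue per cycle, and ready
    operations are ranked by their longest latency-weighted path to the end
    of the block (a common priority, assumed here).
    """
    succs: dict[str, list[str]] = {op: [] for op in deps}
    for op, ps in deps.items():
        for p in ps:
            succs[p].append(op)

    height: dict[str, int] = {}

    def h(op: str) -> int:
        # Longest latency-weighted path from `op` to any sink of the DAG.
        if op not in height:
            height[op] = latency[op] + max((h(s) for s in succs[op]), default=0)
        return height[op]

    start: dict[str, int] = {}
    cycle = 0
    while len(start) < len(deps):
        # An op is ready once every producer has issued and its latency elapsed.
        ready = [op for op in deps if op not in start and
                 all(p in start and start[p] + latency[p] <= cycle
                     for p in deps[op])]
        ready.sort(key=lambda op: (-h(op), op))  # tallest first, name breaks ties
        for op in ready[:width]:
            start[op] = cycle
        cycle += 1
    return start
```

With `deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}`, a two-cycle latency for `a`, and unit latency elsewhere, a two-wide machine issues `a` and `b` in cycle 0, `e` in cycle 1, `c` in cycle 2, and `d` in cycle 3. The block's dependence chain, not the issue width, bounds the schedule, which illustrates why parallelism confined to a basic block stays low.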
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "Three superblock scheduling models for superscalar and superpipelined processors," Tech. Rep. CRHC-91-25, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, October 1991.
....is used to ignore all exceptions. The program output is compared against the correct program output for verification. 5.3 Experimental Evaluation of Memory Conflict Buffer: The memory conflict buffer can assist many code reordering techniques, including trace scheduling [36], superblock scheduling [37], and software pipelining [38], [39]. In this section, we report our evaluation results based on superblock scheduling. 5.3.1 Evaluation methodology: The compiler algorithms described in Section 5.2 have been implemented in the IMPACT-I compiler. The performance implication of MCB is evaluated ....
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "Three superblock scheduling models for superscalar and superpipelined processors," Tech. Rep., Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, Dec. 1991.
....instruction, or may cause an incorrect result. Smith, Lam, and Horowitz described a method called boosting, which uses extra hardware to remove both the first and second restrictions without ignoring exceptions [22]. We have shown that boosting and general percolation have similar performance [23]. Currently, we are investigating sentinel scheduling, a very promising new technique that allows the code scheduling flexibility of general percolation without ignoring exceptions and without requiring much extra hardware [24]. The results achieved with general percolation in this paper confirm ....
P. P. Chang, N. J. Warter, S. A. Mahlke, W. Y. Chen, and W. W. Hwu, "Three Superblock Scheduling Models for Superscalar and Superpipelined Processors," Center for Reliable and High-Performance Computing Report CRHC-91-25, University of Illinois at Urbana-Champaign, Oct. 1991.
No context found.
P. P. Chang, N. J. Warter, S. Mahlke, W. Y. Chen, and W.-M. W. Hwu. Three superblock scheduling models for superscalar and superpipelined processors. Technical Report CRHC-91-29, Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, December 1991.
CiteSeer.IST - Copyright Penn State and NEC