M. D. Smith, M. Horowitz, and M. S. Lam. Efficient superscalar performance through boosting. In Proc. ASPLOS-V, October 1992.
....and corrects mismatches in register assignments (Freudenberger et al. [13]). With this approach, hardware support is required to defer faults of potentially faulting instructions moved above branches (e.g., boosting, Smith et al. [23]), to detect overlapping memory operations scheduled out of sequence, and to branch to the compensation code (e.g., memory conflict buffers, Gallagher et al. [14], or the Intel IA-64 ALAT [18]). In contrast, Crusoe native VLIW processors provide an elegant hardware solution that supports .... [27-29 March 2003, San Francisco, California; © 2003 IEEE]
Michael D. Smith, Mark Horowitz, and Monica S. Lam, "Efficient Superscalar Performance Through Boosting," Proc. 5th Int'l Conf. on ASPLOS, October 1992.
....time path. In the next subsection we discuss our code scheduling techniques, which handle cases such as this, in which the duration constraints fail to hold. 7. Code Scheduling. The code scheduling algorithm is inspired by a common compiler strategy used for VLIW and superscalar architectures [2, 5, 6, 8, 22, 26]. In such domains, an optimizing compiler exploits a program's inherent fine-grained parallelism and packs its computations into as many functional units as possible. Thus the objective is to keep each unit busy and to achieve better overall throughput. Our problem context has an entirely ....
M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248-259. ACM Press, October 1992.
....reservation stations, and complete dependency checking and resolution; actual implementations have been considerably scaled back. Depending on the implementation, the speedup provided by a realistic superscalar architecture may be quite modest: Smith et al. report a 1.2 times speedup over scalar in [164], Mahlke et al. report 1.6 times scalar in [108], and Lee et al. report 2.2 times scalar for a 4-instruction window and 1.7 times scalar for a 2-instruction window in [102]. Given these comparatively modest performance results, it seems important to ask whether a simpler architecture would not ....
....scalar machines. It is likely that this type of instruction would be even more useful on a VLIW architecture. 3. Similar Studies. One study closely related to our work is a comparison by Smith et al. between a dynamically scheduled superscalar processor and a static superscalar [162], [164]. In these studies, the dynamic superscalar architecture has a reservation-station-style execution mechanism. The static superscalar is a VLIW-type architecture in which instructions execute in order. Support is included in the static architecture for speculative execution by providing ....
[Article contains additional citation context not shown here]
M. D. Smith, M. Horowitz, M. S. Lam, Efficient Superscalar Performance Through Boosting, Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1992, vol. 20, pp. 248-261.
....on this approach. This paper reports on an initial feasibility study to evaluate a verification approach we have developed employing symbolic simulation and decision procedures. We have applied the Stanford Validity Checker (SVC) [2] to the Instruction Fetch Unit of the TORCH microprocessor design [27, 26]. TORCH was designed by Mark Horowitz's group at Stanford University and extends the MIPS R2000/3000 architecture with a number of new features. Although not of industrial scale and not fabricated, it is of significant size for formal verification (approximately 27,000 lines of Verilog) and has ....
M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248--259, Boston, MA, 1992. IEEE/ACM.
....performance RISC processors are being used for numeric problems. Such problems do not show a high degree of data reuse and therefore render caches ineffective. Consequently, researchers have begun to focus on organizations and technologies such as software-assisted caches [Call91], speculative loads [Smit92], and stream memory controllers [McKe94]. Most software approaches that tackle the memory bandwidth problem focus on reducing the memory bandwidth requirements of a program. One of the fundamental compiler optimizations for reducing a program's memory bandwidth requirements is register allocation ....
Smith, M. D., Horowitz, M., and Lam, M. S., "Efficient Superscalar Performance Through Boosting", Proceedings of the Fifth International Symposium on Architectural Support for Programming Languages and Operating Systems, Boston, MA, October 1992, pp. 248-259.
....instructions must not be overwritten, as we shall see in the next section. The change will be much easier if we have a group of registers to buffer the speculated results. With suitable hardware support, the compiler may simply label the speculative instruction as a boosted instruction [2], and its destination register is mapped to a shadow register. Shadow registers can be thought of as copies of the architectural registers. In figure 2, the compiler can map the destination register x in the predicted branch to x:B1, which denotes the shadow register of register x storing the ....
....speculatively executed instruction uses this operand, the exception tag is transferred. However, under this scheme, the exception may be ignored entirely if the operand is ultimately not required by later instructions. Architectures employing these schemes are called nonexcepting architectures [2], and such schemes are used by proposals like sentinel scheduling [7]. An alternative scheme records the occurrence of the excepting speculatively executed instruction for each conditional branch in a pushdown stack. Since all the speculatively executed instructions have to be re-executed anyway, we ....
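To make the shadow-register idea concrete, here is a minimal Python sketch. It assumes a single boosting level (one shadow copy and one unresolved branch); the class and method names are hypothetical and the model ignores timing, exceptions, and multiple outstanding branches.

```python
from dataclasses import dataclass, field


@dataclass
class ShadowRegisterFile:
    """Toy model of one boosting level (hypothetical): a sequential register
    file plus one shadow copy holding results of instructions boosted above
    a single predicted branch (the x:B1 registers in the text)."""
    regs: dict = field(default_factory=dict)    # architectural (sequential) state
    shadow: dict = field(default_factory=dict)  # speculative state

    def write(self, reg, value, boosted=False):
        # A boosted instruction writes the shadow copy; others write the real register.
        if boosted:
            self.shadow[reg] = value
        else:
            self.regs[reg] = value

    def read(self, reg, boosted=False):
        # Boosted code sees its own speculative results first.
        if boosted and reg in self.shadow:
            return self.shadow[reg]
        return self.regs[reg]

    def resolve_branch(self, predicted_correctly):
        # Correct prediction: commit shadow values. Misprediction: discard them.
        if predicted_correctly:
            self.regs.update(self.shadow)
        self.shadow.clear()
```

A misprediction simply drops the shadow contents, so the sequential state is never clobbered by a boosted write.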
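The tag-propagation behaviour of a nonexcepting scheme, where a speculative fault sets a tag, the tag follows the value into later uses, and the exception surfaces only if a non-speculative consumer needs it, can be sketched as follows. The `Value` type and function names are illustrative assumptions, not an interface from any of the cited papers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A register value plus an exception tag (hypothetical representation)."""
    data: float
    exc: bool = False  # True if a speculative producer would have faulted


def spec_div(a: Value, b: Value) -> Value:
    # Speculative divide never traps: a fault, or a tagged input, sets the tag.
    if a.exc or b.exc or b.data == 0:
        return Value(math.nan, exc=True)
    return Value(a.data / b.data)


def spec_add(a: Value, b: Value) -> Value:
    # The exception tag is transferred to any result computed from a tagged operand.
    return Value(a.data + b.data, exc=a.exc or b.exc)


def use_nonspeculatively(v: Value) -> float:
    # Only a non-speculative consumer reports the deferred exception.
    if v.exc:
        raise ArithmeticError("deferred exception from a speculative instruction")
    return v.data
```

If a tagged value is never consumed non-speculatively, nothing is raised, which is exactly the case in which the exception is ignored.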
[Article contains additional citation context not shown here]
Michael D. Smith, Mark Horowitz and Monica S. Lam, "Efficient Superscalar Performance Through Boosting", Proc. of the 5th Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 1992, IEEE, pp. 248-259
....is to perform speculative code motion at compile time. Operations from subsequent basic blocks are moved to preceding basic blocks. These operations will execute before the branch that they were supposed to follow. A limited form of speculative code motion is provided by the boosting scheme [SLH90, SHL92, Smi92]. Predicated execution [Hsu86] is an architectural feature that permits the conditional execution of individual operations based on a boolean input. It is used to eliminate branches in acyclic regions of the control flow graph [DHB89, DT93]. A limited form of conditional execution appears ....
....from only one path of control, whereas a hyperblock contains instructions from multiple paths of control. If-conversion [AKW83] is used to convert control dependences within the hyperblock to data dependences. Boosting. Boosting is a technique for statically specifying speculative execution [SLH90, SHL92, Smi92]. Boosting converts control dependences into data dependences using a technique similar to if-conversion, and then executes the if-converted operations speculatively, before their guards are available. The guards of a boosted operation include information about the branches on ....
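If-conversion replaces a branch with predicate definitions so that both arms become straight-line guarded operations. The Python sketch below shows the idea on a single if/else; the tuple encoding and the names `if_convert` and `run` are illustrative assumptions. It models predicated writes (evaluate both arms, commit one) and assumes both arms are safe to evaluate; it does not model boosting's commit hardware.

```python
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
# (guard, dest, compute): compute always runs; the write to dest is predicated on guard.
GuardedOp = Tuple[Optional[str], str, Callable[[Regs], int]]


def if_convert(cond: Callable[[Regs], bool],
               then_op: Tuple[str, Callable[[Regs], int]],
               else_op: Tuple[str, Callable[[Regs], int]]) -> List[GuardedOp]:
    """Rewrite `if cond: then_op else: else_op` as straight-line guarded ops.
    The branch disappears: the control dependence becomes a data dependence
    on the predicate registers p and not_p."""
    return [
        (None, "p", lambda r: int(cond(r))),    # define the predicate
        (None, "not_p", lambda r: 1 - r["p"]),  # and its complement
        ("p", then_op[0], then_op[1]),
        ("not_p", else_op[0], else_op[1]),
    ]


def run(ops: List[GuardedOp], regs: Regs) -> Regs:
    """Execute guarded ops in order on a copy of the register state."""
    regs = dict(regs)
    for guard, dest, compute in ops:
        value = compute(regs)             # both arms are evaluated...
        if guard is None or regs[guard]:
            regs[dest] = value            # ...but only a guarded-true write commits
    return regs
```

For example, converting `if c > 0: x = a + 1 else: x = b * 2` yields four branch-free operations, and `run` produces the same `x` as the original branchy code for either value of `c`.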
M.D. Smith, M.A. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Proc., 5th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 248--259, October 1992.
....preceding branches. When speculation is performed, speculated instructions must be prevented from retiring their results when their control-dependent branches are mispredicted. Hardware support identical to that used by speculative out-of-order-issue designs can be used to accomplish this [13], [14], [15]. IV. Details of a miss path scheduler. The previous section introduced some of the hardware structures required for an MPS implementation. The data stored in the def-use table and the reservation table are used to make scheduling decisions. This section details the requirements on this ....
M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient superscalar performance through boosting," in Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, MA, Oct. 1992, pp. 248--259.
....mapping into resources performed in both TS and PS [22], [25] is that some of the greedy code motions have to be undone [8], [21], since they cannot be accommodated within the available resources. More efficient global resource-constrained parallelization techniques have been reported [8], [21], [30], whose key issue is a two-phase scheduling scheme. First, a set of operations available for scheduling is computed globally, and then heuristics are used to select the best one among them. In [8], a global resource-constrained percolation scheduling (GRC-PS) technique is described. After the ....
....same results, GSS leads to smaller parallelization time. 3.3 How our contribution relates to previous work. On the one hand, we keep in our approach some of the major achievements in resource-constrained scheduling of recent years, as follows. a) Like most global scheduling methods [8], [21], [30], our approach also adopts a global computation of available operations. However, our implementation is different, since it is based on a CDFG, unlike the above-mentioned approaches. b) We perform global code motions at once, in a way similar to [21] but different from [8] and [11], where a ....
[Article contains additional citation context not shown here]
M. Smith, M. Horowitz, and M. Lam, "Efficient Superscalar Performance Through Boosting," in Proc. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS--5), pp. 248--259, 1992.
....Techniques such as trace scheduling [30], superblocks [29], and hyperblocks [23] have been developed to expand the size of blocks in which the compiler performs optimization and scheduling. Speculative execution techniques have been developed to allow code motion between basic blocks [33], [34], [35]. As a result of these techniques, the impact of control flow instructions on ILP can be significantly reduced. However, this reduction has exposed a secondary impediment to ILP: ambiguous memory dependences [1]. In much the same way that branches ....
....instruction. Therefore, the exception caused by the divide instruction in the example above would be ignored. However, the exception should be reported if there is no conflict between the preload and the store. Several schemes for precise exception detection and recovery have been proposed [34], [35], [62]. 4.1.6 Discussion of hardware requirements. Chen estimated the hardware requirements for a 2-way set-associative MCB with 32 sets in CMOS technology to be 60,100 transistors [1]. He also estimated the critical path through the MCB to be 13 gate delays, for both preload and store ....
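The preload/store conflict check that an MCB performs can be sketched in software. This is a deliberately simplified, hypothetical model: it is fully associative and compares exact word addresses, whereas the hardware sized above is 2-way set-associative with 32 sets, and it omits the exception handling discussed in the passage.

```python
class MemoryConflictBuffer:
    """Simplified memory conflict buffer (hypothetical model): a preload records
    its destination register and address; a later store to the same address
    marks that register as conflicted; a check tells whether correction code
    must run before the preloaded value is trusted."""

    def __init__(self):
        self.entries = {}      # register -> address read by its preload
        self.conflict = set()  # registers whose preload was invalidated by a store

    def preload(self, reg, addr):
        self.entries[reg] = addr
        self.conflict.discard(reg)

    def store(self, addr):
        # Any preload that read this address is now stale.
        for reg, a in self.entries.items():
            if a == addr:
                self.conflict.add(reg)

    def check(self, reg):
        """Return True if `reg` needs correction code; the entry is then retired."""
        hit = reg in self.conflict
        self.entries.pop(reg, None)
        self.conflict.discard(reg)
        return hit
```

A load hoisted above an ambiguous store thus executes early, and only the rare true conflict pays for re-execution through correction code.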
M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient superscalar performance through boosting," in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 248--259, October 1992.
....scheduling [12]. When such speculation is performed, speculated instructions must be prevented from retiring their results when their dominating branches are mispredicted. Hardware support identical to that used by speculative out-of-order-issue designs can be used to accomplish this [13], [14], [15]. 4 Details of a miss path scheduler. The previous section introduced some of the hardware structures required for miss path scheduling. The data stored in the def-use table and the reservation table are used to make scheduling decisions. Additional logic is needed to interpret the data and ....
M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient superscalar performance through boosting," in Proc. Fifth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, (Boston, Massachusetts), pp. 248--259, Oct. 1992.
....since it is optimal in terms of performance. 2.3 Instruction Boosting Scheduling Model. Instruction boosting, proposed by Smith et al. [9], [11], combines extra hardware support, in the form of shadow register files, with extra compiler support that generates recovery blocks to handle exception recovery. When an exception occurs for a speculated PEI, the exception is recorded with respect to one of the shadow register files. If no exception occurs for a speculated PEI, the results of the speculated instruction are put into the shadow register file. At the commit point for a speculated PEI, the ....
....block with the use of a compiler-generated lookup table. The address of the first instruction in the basic block is used as the index into this table, which contains the address of the appropriate recovery block. This is similar to the method used in Smith's instruction boosting model [11]. The hardware also records the current PC on the stack, so that upon completion of the recovery block, execution can continue where it left off. Execution in the recovery block is performed in a slightly different manner than program code. Almost every instruction in the recovery block is ....
M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient superscalar performance through boosting," in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 248--259, October 1992.
....to be effective, execution must follow the trace almost always. For numeric programs this is often the case, but for programs like compilers (in which the present author takes a special interest) the same cannot be said. Therefore, an approach that avoids compensation code should be used. In [29], a scheduler is presented which only moves an instruction up past a branch if the instruction is safe and legal. Safeness means that the instruction cannot cause an exception; in general, this rules out memory references unless it can be guaranteed that they do not use misaligned or ....
....the original BULLDOG compiler [14] is motivated. If, on the other hand, the on-trace path is only moderately more probable, it might be a good idea to move instructions up past branches only if they can fit into holes in the schedules of earlier blocks in the trace; this is the approach taken in [29]. In that work, each block in the trace is scheduled in order. For each block, local scheduling is performed first, and if there are holes in the resulting schedule, safe and legal high-priority instructions from later blocks are moved up to fill the holes. Given accurate branch prediction ....
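The dispatch into a recovery block described above (table indexed by the address of the block's first instruction, with the current PC saved for the return) can be sketched as follows. The class name, method names, and the addresses in the usage note are hypothetical.

```python
class RecoveryDispatcher:
    """Hypothetical model of the compiler-generated lookup table: the address of
    a basic block's first instruction indexes the address of its recovery block,
    and the current PC is pushed so execution can resume after recovery."""

    def __init__(self, table):
        self.table = dict(table)  # block start address -> recovery block address
        self.return_stack = []    # saved PCs, innermost recovery on top

    def enter_recovery(self, block_start, current_pc):
        # Save where we were, then jump to the block's recovery code.
        self.return_stack.append(current_pc)
        return self.table[block_start]

    def leave_recovery(self):
        # On completion of the recovery block, continue where we left off.
        return self.return_stack.pop()
```

With a table `{0x400: 0x9000, 0x440: 0x9100}`, an exception detected at PC `0x458` in the block starting at `0x440` redirects execution to `0x9100`, and `leave_recovery()` later returns `0x458`.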
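The per-block procedure attributed to [29] above (schedule each block locally, then fill holes with safe and legal high-priority instructions from later blocks) might be sketched as follows. Unit latency, a precomputed `earliest` cycle per candidate, and independence among the moved instructions are simplifying assumptions of this sketch, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    priority: int
    safe: bool         # cannot raise an exception
    legal: bool        # does not clobber a value needed on the other branch path
    earliest: int = 0  # earliest cycle its operands are available (assumed known)


def fill_holes(schedule: List[List[str]], width: int, later: List[Instr]) -> List[Instr]:
    """Move safe and legal instructions from a later block into empty issue slots
    of an already locally scheduled earlier block, highest priority first.
    `schedule` (one list of instruction names per cycle) is modified in place;
    the instructions that could not be moved are returned."""
    remaining = []
    for ins in sorted(later, key=lambda i: -i.priority):
        if not (ins.safe and ins.legal):
            remaining.append(ins)  # never hoisted above the branch
            continue
        for cycle in range(ins.earliest, len(schedule)):
            if len(schedule[cycle]) < width:
                schedule[cycle].append(ins.name)  # fill the first hole that fits
                break
        else:
            remaining.append(ins)  # no hole available: stays in its own block
    return remaining
```

Because only existing holes are filled, the earlier block never gets longer, which is why this approach suits traces whose on-trace path is only moderately more probable.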
Michael D. Smith, Mark Horowitz, and Monica S. Lam. Efficient superscalar performance through boosting. In 5th International Conference on Architectural Support for Programming Languages and Operating Systems, volume 27, pages 248--259, Boston, MA, October 1992.
....two speculation models that accurately report exceptions and permit recovery. These speculation models also support aggressive compile-time speculation. 3.3.3.1 Instruction Boosting. Instruction boosting has been proposed for handling exceptions with compiler-controlled speculative execution [32], [33]. The four problems associated with exception detection and recovery are handled with a combination of hardware support (shadow register files) and compiler-generated recovery blocks. Detecting delayed exceptions is handled by recording an exception condition raised by a speculative instruction in ....
....so with excessive hardware overhead. The scheme requires multiple copies of the register file to implement the shadow registers. Because exception recovery requires recovery code blocks, code size also roughly doubles, which adds significantly to the pressure on the memory system [33]. 3.3.3.2 Sentinel Scheduling. An alternative scheme to enable exception detection and recovery with compiler-controlled speculative execution is sentinel scheduling [34], [35]. Sentinel scheduling is a compiler-based technique that requires few changes to the processor architecture. The four ....
M. D. Smith, M. A. Horowitz, and M. S. Lam, "Efficient superscalar performance through boosting," in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-V), pp. 248--259, October 1992.
....at the joins. In order to avoid the generation of bookkeeping code, there have been approaches that duplicate code below join points prior to scheduling [Hwu et al. 1993] or that perform nonspeculative code motion on a trace without generating redundant bookkeeping code [Freudenberger et al. 1994; Smith et al. 1992]. These techniques are limited in performing useful code motion in the sense that only the trace is examined for available parallelism, and they might generate unnecessary bookkeeping code. DAG-based techniques have advantages in performing useful code motion, since they have a more global view ....
....target machines with many resources [Aiken and Nicolau 1988; Ellis 1985] tend to suffer from inefficiency problems such as code explosion or long compilation time, which have made them difficult to use in practice. Other techniques that target machines with few resources [Bernstein and Rodeh 1991; Smith et al. 1992] are severely restricted; they employ neither renaming nor software pipelining, and code motion is constrained to occur only between basic blocks, so that neither creation of a new basic block nor destruction of an existing basic block is allowed during scheduling. These restrictions are to avoid ....
[Article contains additional citation context not shown here]
Smith, M., Horowitz, M., and Lam, M. 1992. Efficient superscalar performance through boosting. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM Press, New York, 248--259.
....temporarily in shadow structures until the program control flow resolves the branches in question. At this point, the results in the shadow structures are either committed or discarded, depending on the outcome of the branch. Smith et al. [24] identify two characteristics of speculative code motion, safety and legality, that describe how program semantics can potentially be changed by unwanted side effects. The cross product of the two produces four different types of code motion: safe and legal, safe but illegal, legal but unsafe, .... [Footnote: Currently with IBM Corporation at Rochester, Minnesota.]
.... Rather than extending the hardware to contain multiple copies of the entire register file (one for each conditional branch that an instruction is allowed to move across) to prevent destructive side effects, or limiting performance by employing the less aggressive alternatives presented in [24], we claim that non-destructive storage allocation can be performed effectively by the compiler. Modifying the register allocation for instructions that have no implicit side effects is sufficient for guaranteeing semantic correctness for illegal code motion. Without any hardware support, we must ....
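The safety/legality cross product can be written out as a small classifier. The working definitions are assumptions for illustration: "safe" follows the definition quoted elsewhere on this page (the instruction cannot cause an exception), and "legal" is taken to mean the instruction's destination is not live on the path where it should not have executed. The name of the fourth category, cut off in the excerpt, is assumed here to be "unsafe and illegal".

```python
from enum import Enum


class Motion(Enum):
    SAFE_AND_LEGAL = "safe and legal"
    SAFE_BUT_ILLEGAL = "safe but illegal"
    LEGAL_BUT_UNSAFE = "legal but unsafe"
    UNSAFE_AND_ILLEGAL = "unsafe and illegal"  # assumed name for the fourth case


def classify(may_except: bool, dest_live_on_other_path: bool) -> Motion:
    """Classify moving one instruction above a branch (illustrative definitions):
    safe  = the instruction cannot raise an exception;
    legal = its destination is not live on the path where it should not execute."""
    safe = not may_except
    legal = not dest_live_on_other_path
    if safe and legal:
        return Motion.SAFE_AND_LEGAL
    if safe:
        return Motion.SAFE_BUT_ILLEGAL
    if legal:
        return Motion.LEGAL_BUT_UNSAFE
    return Motion.UNSAFE_AND_ILLEGAL
```

Under these definitions, an illegal motion can be repaired by renaming the destination (the register-allocation approach argued for above), while an unsafe one needs some form of exception deferral.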
[Article contains additional citation context not shown here]
M.D. Smith, M.A. Horowitz, M.S. Lam. Efficient superscalar performance through boosting. ASPLOS V, pp. 248-259. Boston, MA, October, 1992.
....time path. In the next subsection we discuss our code scheduling techniques, which handle cases such as this, in which the duration constraints fail to hold. 7. Code Scheduling. The code scheduling algorithm is inspired by a common compiler strategy used for VLIW and superscalar architectures [2, 5, 6, 8, 22, 26]. In such domains, an optimizing compiler exploits a program's inherent fine-grained parallelism and packs its computations into as many functional units as possible. Thus the objective is to keep each unit busy and to achieve better overall throughput. Our problem context has an entirely ....
M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248--259. ACM Press, October 1992.
....block for possible issue. Given the sizes of naturally occurring basic blocks, the need to go beyond a basic block became apparent some time ago, and several techniques to permit control speculation have been developed, both in the context of statically and dynamically scheduled machine models [2,4,5,6,7,8,9,10,11]. To improve the accuracy of control speculation, branch prediction techniques are used. Improving the accuracy of control speculation (especially dynamic techniques) has been the subject of intensive research recently, and a plethora of papers on dynamic and static branch prediction techniques ....
M. D. Smith, M. Horowitz, and M. S. Lam. Efficient superscalar performance through boosting. In Proc. ASPLOS V, October 1992.
....instructions across sections in such a way that all the sections satisfy their derived timing constraints. Such a process is similar to that of code scheduling, which is a well-defined problem for automatic fine-grain (instruction-level) parallelization for superscalar and VLIW processors [1, 4, 7, 8, 22, 28]. However, our problem context has a different goal. In what follows, we sketch a code scheduling algorithm, which moves code from sections that violate their duration constraints into those with more lenient constraints. Code scheduling involves copying or relocating unobservable instructions ....
M. Smith, M. Horowitz, and M. Lam. Efficient superscalar performance through boosting. In Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248--259. ACM Press, October 1992.
....and our conclusion appear in Section 9. 2 Definitions. Speculation: We say that an execution is speculative if it is carried out before the completion of the test that determines whether the operation would have executed in the original program [12]. A speculative execution is correctly predicted [10] if it is carried out in the original sequential program. Lattices: Let $T$ be a square $n \times n$ matrix whose column vectors $t_1, \ldots, t_n$ are linearly independent. Then the set $L(T) = \{\lambda_1 t_1 + \cdots + \lambda_n t_n \mid \lambda_1, \ldots, \lambda_n \in \mathbb{Z}\}$ is called the lattice generated by these vectors [9]. A ....
M. D. Smith, M. A. Horowitz, and M. S. Lam. Efficient superscalar performance through boosting. In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248--259, October 1992.
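The lattice definition can be checked mechanically: a point $x$ lies in $L(T)$ exactly when the unique solution $\lambda$ of $T\lambda = x$ is integral. The sketch below is restricted to $n = 2$ for brevity and solves the system exactly with Cramer's rule; the function name is hypothetical.

```python
from fractions import Fraction


def in_lattice_2d(T, x):
    """Return True if x = lambda_1 t_1 + lambda_2 t_2 with integer lambdas, where
    t_1, t_2 are the columns of the 2x2 matrix T, given as rows [[a, b], [c, d]].
    Solves T @ lam = x exactly with Cramer's rule (n = 2 only)."""
    (a, b), (c, d) = T
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns of T must be linearly independent")
    lam1 = Fraction(x[0] * d - b * x[1], det)
    lam2 = Fraction(a * x[1] - c * x[0], det)
    return lam1.denominator == 1 and lam2.denominator == 1
```

For $T$ with columns $t_1 = (2, 0)$ and $t_2 = (0, 3)$, the point $(4, 6)$ is in the lattice ($\lambda = (2, 2)$), while $(3, 6)$ is not, since it would need $\lambda_1 = 3/2$.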
....number of registers for storing the results of speculatively executed operations. This is a particular aspect of the more general assumption that register allocation occurs after scheduling. However, we do not assume special hardware support for committing speculative results at execution time [14]. Essentially, this mechanism consists of keeping speculative results in a dedicated buffer until the outcome of the required tests is known. Those results are then committed by allowing them to update the register file. On the one hand, such hardware support is unnecessary in our approach, since ....
....often in the tasks of a digital system that are subject to time constraints. Therefore, we assume for simplicity that when an operation (if any) might raise an exception, its speculative execution is inhibited. However, this assumption can be relaxed with special hardware support, as suggested in [14], for instance. A comprehensive overview of hardware support for speculative execution can be found in [6]. In particular, the use of hardware support for speculative execution in HLS is addressed, for instance, in [7]. IV. MODELING AVAILABILITY ANALYSIS. In DFGs containing no conditionals, ....
[Article contains additional citation context not shown here]
M. Smith et al., "Efficient Superscalar Performance Through Boosting," Proc. Int. Conf. Architectural Support for Programming Languages and Operating Systems, pp. 248-259, 1992.
....of the conditional branches in the protocol code. This optimization is similar in approach to what Bershad et al. [9] did by hand for remote procedure calls. To further optimize the code on the frequent I/O path, we can apply the trace-based ideas of global instruction scheduling [15], [26], [31], [66], [101]. To apply this previous work to I/O paths, we are extending these scheduling algorithms with more interprocedural analysis [91]. We are also investigating the applicability of loop fusion [110] and redundant load removal to the protocol stack code. In this way, the compiler could automatically ....
....code and the network protocol code to the compiler, the compiler can now better overlap communication overheads with application computations. To go beyond simple instruction-scheduling overlap, we are investigating ways to apply the idea of compiler-driven speculative execution [14], [67], [101] to the network interface. In the area of high-performance processor architectures, the compiler uses speculative execution techniques to improve the run-time performance of application code by executing some instructions early, before it is known whether that instruction execution will be ....
Michael D. Smith, Mark A. Horowitz, and Monica S. Lam. Efficient superscalar performance through boosting. In Proceedings of 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 248--259, October 1992.
No context found.
M. D. Smith, M. Horowitz, and M. S. Lam. Efficient superscalar performance through boosting. In Proc. ASPLOS-V, October 1992.
No context found.
M. D. Smith, M. Horowitz and M. S. Lam, `Efficient superscalar performance through boosting', Proc. ACM 5th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992, pp.248--259.
No context found.
M. Smith, M. Horowitz and M. Lam. Efficient Superscalar Performance Through Boosting. Proceedings Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 248--261, 1992.
CiteSeer.IST - Copyright Penn State and NEC