| M. D. Smith, M. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 344-354, 1990. |
....the amount of ILP available for the compiler to exploit. Thus, it is an open question whether a simple VLIW machine really outperforms a superscalar machine. Recent VLIW studies proposed hardware mechanisms to remove the restrictions imposed on the compiler's scheduling (e.g., guarding [3] and boosting [11]). We recently proposed a mechanism called predicating [2], which provides the compiler with unconstrained speculative code motion. Although that paper reported a large ILP improvement from the mechanism, as well as its simplicity, it is unknown how much the hardware mechanism imposes a ....
M. D. Smith, M. S. Lam, and M. A. Horowitz, "Boosting Beyond Static Scheduling in a Superscalar Processor," In Proc. 17th Int. Symp. on Computer Architecture, pp.344-355, May 1990.
....from various configurations will be shown in Chapter 8.1.3. With specialized BTB support for indirect jumps [9], even better results should be obtained. Some machines provide other special architectural support for speculative execution of instructions dependent on branches, such as boosting [27] and predicted execution [24, 19]. The relative cost of an indirect jump versus the set of branches it replaces will be affected by such support. The compiler writer must use appropriate cost estimates based on the architectural support available for branches and indirect jumps on the target ....
M. D. Smith, M. S. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th International Symposium on Computer Architecture, pages 344--353, May 1990.
....formats to allow unused operation slots (NOPs) to be left out of the object code [167, 36, 35]. This decreases the size of the object code but adds decoding overhead. The next step, somewhere between VLIW and superscalar, is the static superscalar Torch architecture, described by Smith et al. in [162]. Torch executes instructions in the static order determined by the compiler. The architecture allows access to a set of shadow registers and buffers, allowing the compiler to speculatively schedule instructions across conditionals. Simulations of the architecture show performance in the 1.4 to ....
....on scalar machines. It is likely that these types of instructions would be even more useful on a VLIW architecture. 3 Similar Studies: One study that is closely related to our work is a comparison by Smith et al. between a dynamically scheduled superscalar processor and a static superscalar [162][164]. In these studies, the dynamic superscalar architecture has a reservation-station-style execution mechanism. The static superscalar is a VLIW-type architecture in which instructions execute in order. Support is included in the static architecture for speculative execution by providing ....
[Article contains additional citation context not shown here]
M. D. Smith, M. S. Lam, M. A. Horowitz, Boosting Beyond Static Scheduling in a Superscalar Processor, Proceedings of the 17th Annual International Symposium on Computer Architecture, 1990, vol. 18, pp. 345-353.
....to as general speculation [4]. In such compiler-controlled speculative mechanisms, live register analysis can be performed to locate architectural registers that can hold the speculative values. Boosting is another compiler-controlled mechanism that is able to speculate arbitrary instructions [22]. In this model, each speculated instruction is tagged with the proper execution path to its home block. If the home block is reached along the specified path, the speculative results commit; otherwise, they are cleared. This mechanism requires bits in the instruction opcode to represent directions ....
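The path-tagging scheme in this context can be illustrated with a small model. The sketch below is a minimal Python illustration under stated assumptions, not the Torch encoding: `BoostedOp` and `resolve` are hypothetical names, and a path is assumed to be a tuple of predicted branch directions (`True` for taken), ordered from the first branch the instruction was hoisted above to the last branch before its home block.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoostedOp:
    """Hypothetical model of a boosted (speculated) instruction."""
    dest: str
    value: int
    # Predicted direction of each branch the op was hoisted above, in program
    # order, up to its home block (True = taken). Assumed encoding.
    path: tuple[bool, ...]


def resolve(op: BoostedOp, outcomes: list[bool]) -> str:
    """Compare the branch outcomes resolved so far with the op's tagged path.

    Returns "squash" if any resolved branch disagrees with the tag,
    "commit" once every branch on the path has resolved as predicted
    (the home block was reached along the specified path), else "pending".
    """
    for predicted, actual in zip(op.path, outcomes):
        if predicted != actual:
            return "squash"
    if len(outcomes) >= len(op.path):
        return "commit"
    return "pending"


op = BoostedOp(dest="r4", value=42, path=(True, False))
print(resolve(op, [True]))         # pending: second branch unresolved
print(resolve(op, [True, False]))  # commit: path matched
print(resolve(op, [False]))        # squash: first branch went the other way
```

Checking for a mismatch before checking for completion means a misprediction squashes the result as soon as the offending branch resolves, without waiting for the rest of the path.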
M. D. Smith, M. S. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th International Symposium on Computer Architecture, pages 344--354, May 1990.
....on this approach. This paper reports on an initial feasibility study to evaluate a verification approach we have developed employing symbolic simulation and decision procedures. We have applied the Stanford Validity Checker (SVC) [2] to the Instruction Fetch Unit of the TORCH microprocessor design [27, 26]. TORCH was designed by Mark Horowitz's group at Stanford University and extends the MIPS R2000/3000 architecture with a number of new features. Although not of industrial scale and not fabricated, it is of significant size for formal verification (approximately 27,000 lines of Verilog) and has ....
M. Smith, M. Lam, and M. Horowitz. Boosting beyond static scheduling in a superscalar processor. In 17th International Symposium on Computer Architecture, volume 18-2, pages 344--354, Seattle, WA, May 1990. IEEE/ACM.
....is selected; this process repeats until all traces of execution have been compacted. A limitation of trace scheduling is that it cannot in general migrate instructions above conditional branches. However, such code motions are possible on machines that provide support for speculative execution [Smith90]. In Section 3, we describe various architectural mechanisms that have been proposed to support speculative execution. We review the results of [Chang95], in which three different speculative execution models are compared. Another limitation of trace scheduling is that it does not handle loops very ....
....and downwards past splits, it is not in general safe to move instructions upwards past a conditional branch, as this can destroy the intended semantics of the program. Three specific restrictions on upward motion of an instruction I past a branch B are well known ([Chang91], [Huang94], [Chang95], [Smith90]): 1. I must not cause an exception. 2. I must not overwrite the value of a register that is needed by some other successor of B. 3. I must not alter system memory. On conventional architectures, I can only be moved upwards past B if all three of the above conditions are met. However, it is ....
[Article contains additional citation context not shown here]
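The three restrictions listed in the context above amount to a conjunction of three checks. A minimal Python sketch, assuming each instruction is summarized by its destination register and two hypothetical flags (`may_trap`, `writes_memory`), and that the set of registers needed by B's other successors has already been computed by liveness analysis; `Instr` and `can_hoist_conventionally` are illustrative names only:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical summary of an instruction I that is a candidate for hoisting above B."""
    opcode: str
    dest: str | None     # destination register, or None if I writes no register
    may_trap: bool       # I can raise an exception (e.g. a load or a divide)
    writes_memory: bool  # I alters memory (e.g. a store)


def can_hoist_conventionally(instr: Instr, needed_by_other_successors: set[str]) -> bool:
    """True only if I satisfies all three restrictions on upward motion past B."""
    no_exception = not instr.may_trap                                # restriction 1
    no_clobber = (instr.dest is None
                  or instr.dest not in needed_by_other_successors)  # restriction 2
    no_memory_write = not instr.writes_memory                        # restriction 3
    return no_exception and no_clobber and no_memory_write


needed = {"r3"}  # registers read on B's other successor path
print(can_hoist_conventionally(Instr("add", "r5", False, False), needed))  # True
print(can_hoist_conventionally(Instr("add", "r3", False, False), needed))  # False: clobbers r3
print(can_hoist_conventionally(Instr("lw", "r6", True, False), needed))    # False: load may trap
```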
Michael D. Smith, Monica S. Lam, Mark A. Horowitz, Boosting Beyond Static Scheduling in a Superscalar Processor , Proc. ISCA 90, pp. 344-354.
....does not perform the actual area estimation; rather, it enables the calculation of better estimates by deriving information about the physical implementation that can be exploited by estimation tools. Our examples include: the instruction fetch unit from Torch, a 32-bit superscalar processor [9]; a section of MAGIC, a node controller chip in the FLASH multiprocessor [18]; MIPS Lite, an 8-bit unpipelined integer processor; the blackjack example from [8]; and an 8-bit iterative divider based on an example in [2]. 6.2 Torch instruction fetch unit: The Torch instruction fetch unit model ....
M. D. Smith, et al., "Boosting Beyond Static Scheduling in a Superscalar Processor," in Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 344-354, 1990.
....additional hardware support (e.g., a specialized instruction set) is assumed to handle exceptions during the execution of predicted operations [3]. To allow more aggressive scheduling or prediction across multiple paths, some static prediction approaches also depend on specialized register files [14, 2]. In this paper, we present multiple path prediction (MPP), a compile-time technique originally developed for high-level synthesis [8]. MPP is a code transformation strategy that improves the optimization potential when software pipelining or loop pipelining [1] is applied for loop ....
M. D. Smith, M. S. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In ISCA, pages 344--354, 1990.
....is to perform speculative code motion at compile time. Operations from subsequent basic blocks are moved to preceding basic blocks. These operations will execute before the branch that they were supposed to follow. A limited form of speculative code motion is provided by the boosting scheme [SLH90, SHL92, Smi92]. Predicated execution [Hsu86] is an architectural feature that permits the conditional execution of individual operations based on a boolean input. It is used to eliminate branches in acyclic regions of the control flow graph [DHB89, DT93]. A limited form of conditional execution ....
....from only one path of control, whereas a hyperblock contains instructions from multiple paths of control. If-conversion [AKW83] is used to convert control dependences within the hyperblock to data dependences. Boosting: Boosting is a technique for statically specifying speculative execution [SLH90, SHL92, Smi92]. Boosting converts control dependences into data dependences using a technique similar to if-conversion, and then executes the if-converted operations in a speculative manner before their guards are available. The guards of a boosted operation include information about the ....
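If-conversion, as described in this context, replaces a branch with a predicate that reaches the result through a data dependence. A minimal Python sketch on a two-armed `if` (the function names are hypothetical, and real if-conversion rewrites machine instructions into predicated or guarded form rather than source code): both arms execute unconditionally, and a branch-free select keeps the one chosen by the predicate `p`.

```python
def branchy(c: bool, a: int, b: int) -> int:
    """Original code: x depends on c through a control dependence (a branch)."""
    if c:
        x = a + 1
    else:
        x = b * 2
    return x


def if_converted(c: bool, a: int, b: int) -> int:
    """Same computation after if-conversion: no branch, only data dependences on p."""
    p = int(bool(c))   # predicate computed from the branch condition
    t_then = a + 1     # then-arm, executed unconditionally (guarded by p)
    t_else = b * 2     # else-arm, executed unconditionally (guarded by not p)
    mask = -p          # all one bits (-1) when p == 1, zero when p == 0
    # Branch-free select: keeps t_then when p == 1, t_else when p == 0.
    return (t_then & mask) | (t_else & ~mask)


print(branchy(True, 3, 5), if_converted(True, 3, 5))    # 4 4
print(branchy(False, 3, 5), if_converted(False, 3, 5))  # 10 10
```

Boosting, per the passage above, goes one step further and lets such guarded operations execute before their guard is known; the sketch shows only the conversion itself.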
M.D. Smith, M.S. Lam, and M.A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proc., 17th Annual Internat. Symp. on Computer Architecture, pages 344--354, June 1990.
....general percolation scheduling relaxes the constraint by assuming the availability of non-trapping versions of instructions. This greatly increases the number of instructions that can be speculated. Although this approach has problems detecting all exceptions caused by speculated instructions [14, 19], it remains an effective optimization, especially when the program is known to be correct. Percolation scheduling considers all operations in the program simultaneously, instead of only those on the main traces as in trace scheduling. The scheduled program is first transformed into tree graphs. ....
M. D. Smith, M. S. Lam and M. A. Horowitz, "Boosting Beyond Static Scheduling in a Superscalar Processor," Proc. of the 17th Int'l Symp. on Computer Architecture, May 1990, pp. 344--354.
....architectures, the SPARC [20] for example, have hardware support to annul the instruction in the delay slot following a branch. Torch uses a hardware shadow structure to hold the side effects of boosted instructions in order to commit or squash the side effects of executed speculative instructions [33]. Sentinel scheduling is used to support speculative execution; this technique leaves a sentinel instruction in the home block of every potentially exception-causing instruction that is speculatively executed [26]. The sentinel instruction reports any exceptions that were caused by the speculative ....
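The sentinel mechanism in this context can be sketched as deferred exception reporting. This minimal Python model rests on assumptions not stated in the excerpt: each register carries a hypothetical exception-tag bit that a faulting speculative instruction sets instead of trapping, a tagged source propagates the tag to dependent speculative results, and the sentinel in the home block raises the deferred exception if the tag is set. `Reg`, `spec_div`, `sentinel`, and `DeferredException` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reg:
    value: int | None = None
    exc_tag: bool = False  # set instead of trapping when a speculative op faults


class DeferredException(Exception):
    """Raised by a sentinel when a speculated instruction had faulted."""


def spec_div(regs: dict[str, Reg], dst: str, a: str, b: str) -> None:
    """Speculative divide: never traps; tags dst if it faults or reads a tagged source."""
    ra, rb = regs[a], regs[b]
    if ra.exc_tag or rb.exc_tag or rb.value == 0:
        regs[dst] = Reg(exc_tag=True)  # defer the exception instead of raising it
    else:
        regs[dst] = Reg(ra.value // rb.value)


def sentinel(regs: dict[str, Reg], reg: str) -> None:
    """Executed in the home block: report any exception deferred in reg."""
    if regs[reg].exc_tag:
        raise DeferredException(f"speculative instruction writing {reg} faulted")


regs = {"r1": Reg(10), "r2": Reg(0), "r3": Reg(2)}
spec_div(regs, "r4", "r1", "r3")  # r4 = 10 // 2, no fault
spec_div(regs, "r5", "r1", "r2")  # divide by zero: r5 is tagged, nothing raised yet
spec_div(regs, "r6", "r5", "r3")  # reads tagged r5, so r6 is tagged too
sentinel(regs, "r4")              # home block of the r4 computation: silent
try:
    sentinel(regs, "r6")          # home block of the r6 computation: reported here
except DeferredException as exc:
    print("reported:", exc)
```

If the branch leading to the home block goes the other way, the sentinel never executes and the tagged result is simply never reported.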
M.D. Smith, M. Lam, and M.A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 344--354, May 1990.
....between nodes and controlling all other intra- and inter-node communication. MAGIC (shown in Figure 2.2) is based on a statically scheduled, dual-issue RISC protocol processor core that executes the FLASH firmware. This protocol processor core is derived from the Stanford TORCH design, described in [Smith90]. The cache coherence protocol, communication with I/O devices, and communication between nodes are all implemented in the FLASH firmware. Since the firmware can be changed just by rebooting the system, this approach gives FLASH a degree of flexibility not found in other scalable multiprocessors ....
M. Smith, M. Lam, and M. Horowitz. "Boosting Beyond Static Scheduling in a Superscalar Processor." In Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 344-354, May 1990.
....be realized on larger applications, these results do suggest that promising performance gains can be obtained once exception-related dependences are eliminated using the techniques described in this work. 6 Related Work: Previous work on speculative code motion for superscalar and VLIW processors [8, 24, 19, 13] has some similarities with our work, in that it involves aggressive code motion and recovery from exceptions thrown by speculative instructions. Broadly, our work differs in at least two ways. First, it does not require any hardware support, while these approaches rely on special hardware to ....
M.D. Smith, M.S. Lam, and M.A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proc. 17th International Symposium on Computer Architecture, pages 344--354, May 1990.
....the gaps are filled with instructions from the most likely succeeding block. This requires hardware support to store and then commit or discard the results of speculative operations. Speculative instructions that cause exceptions must have the exceptions suppressed until they are committed. Boosting [Smith et al. 1990; Smith et al. 1992] tags registers with the number of speculative branches upon which they depend. A correctly predicted branch causes the tags to be decremented; an incorrectly predicted branch causes all speculated results to be discarded. A shift register contains a bit that signals whether an ....
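The tag-counting behavior described in this context can be modeled in a few lines. This Python sketch simplifies heavily: it keeps a single shadow copy per register rather than the hardware's shadow register files, ignores the shift-register detail mentioned at the end of the excerpt, and uses the hypothetical class name `BoostingRegisters`. A correctly predicted branch decrements every tag and commits results whose tag reaches zero; a misprediction discards all speculative results.

```python
from __future__ import annotations


class BoostingRegisters:
    """Simplified model: one shadow copy per register, tagged with the number
    of still-unresolved speculative branches the value depends on."""

    def __init__(self) -> None:
        self.committed: dict[str, int] = {}           # sequential (architectural) state
        self.shadow: dict[str, tuple[int, int]] = {}  # reg -> (value, branches pending)

    def boosted_write(self, reg: str, value: int, branches: int) -> None:
        """Buffer a result that was boosted above `branches` unresolved branches."""
        if branches < 1:
            raise ValueError("a boosted result depends on at least one branch")
        self.shadow[reg] = (value, branches)

    def branch_resolved(self, predicted_correctly: bool) -> None:
        """Apply the outcome of the oldest unresolved speculative branch."""
        if not predicted_correctly:
            self.shadow.clear()  # misprediction: discard every speculative result
            return
        remaining: dict[str, tuple[int, int]] = {}
        for reg, (value, pending) in self.shadow.items():
            if pending == 1:
                self.committed[reg] = value            # tag reaches zero: commit
            else:
                remaining[reg] = (value, pending - 1)  # decrement the tag
        self.shadow = remaining


regs = BoostingRegisters()
regs.boosted_write("r1", 7, branches=2)
regs.boosted_write("r2", 9, branches=1)
regs.branch_resolved(True)
print(regs.committed, regs.shadow)  # {'r2': 9} {'r1': (7, 1)}
```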
Smith, M. D., Lam, M. S., and Horowitz, M. A. 1990. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, (Seattle, Wash., May). Comput. Arch. News, 18, 2 (June), 344--354.
....be realized on larger applications, these results do suggest that promising performance gains can be obtained once exception-related dependences are eliminated using the techniques described in this work. 6 Related Work: Previous work on speculative code motion for superscalar and VLIW processors [8, 24, 19, 13] has some similarities with our work, in that it involves aggressive code motion and recovery from exceptions thrown by speculative instructions. The .... (Footnote 9: All JVMs, including those from Sun Microsystems, IBM, and Microsoft, tested by Pugh [22, 17] were found to break the Java memory model.)
....instructions. The general percolation scheduling model [8] uses hardware support for silent exceptions, but may fail to detect an exception that should be thrown. The instruction boosting scheduling model [24] avoids this drawback, but requires greater hardware support in the form of shadow register files and shadow store buffers, which hold the results of speculative instructions. Sentinel scheduling [19] and its variants use less expensive hardware support. Broadly, our work differs in at least two ....
M.D. Smith, M.S. Lam, and M.A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proc. 17th International Symposium on Computer Architecture, pages 344--354, May 1990.
....with software pipeline loop scheduling [7] or straight-line code scheduling [8] are effective for exposing ILP only when branch conditions can be exposed in advance. For applications where accurate branch prediction is not possible, speculative execution is an important source of ILP [9][10][3]. Lack of ILP is intimately tied to the increasingly important problem of coping with high memory latency. As such, speculation can also diminish the negative effects of memory latency. 2 BACKGROUND AND RELATED WORK: Speculative execution refers to the execution of an instruction ....
....since it is optimal in terms of performance. 2.3 Instruction Boosting Scheduling Model: Instruction boosting, proposed by Smith et al., combines extra hardware support in the form of shadow register files with extra compiler support that generates recovery blocks to handle exception recovery [9][11]. When an exception occurs for a speculated PEI, the exception is recorded with respect to one of the shadow register files. If no exception occurs for a speculated PEI, the results of the speculated instruction are put into the shadow register file. At the commit point for a speculated PEI, ....
M. D. Smith, M. S. Lam, and M. A. Horowitz, "Boosting beyond static scheduling in a superscalar processor," in Proceedings of the 17th International Symposium on Computer Architecture, pp. 344--354, May 1990.
....the program or incorrectly overwrites a value when the branch is mispredicted. Various hardware techniques can be used to prevent such hazards. Buffers can be used to store the values of the moved instructions until the branch commits [12][21][22]. If the branch is taken, the values in the buffers are squashed. In this model, exception handling can be delayed until the branch commits. Alternatively, non-trapping instructions can be used to guarantee that a moved instruction does not cause an exception [8]. In this paper we focus on static .... (Technical Report CRHC-91-29, University of Illinois.)
....that cannot cause exceptions and those that do not overwrite a value in the live-out set of the taken path of a conditional branch can be moved above the branch. The general code percolation model strictly enforces Restriction 1 but not Restriction 2. In the speculative code percolation model [22], code motion is unrestricted. In Section 4, we discuss the architecture support required for each model. Examples of code motion can be shown using the assembly code in Figure 6. This is the assembly code of the C code in Figure 1 after superblock formation. The loop has been unrolled once to ....
[Article contains additional citation context not shown here]
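The three code-motion models compared in the context above differ only in which restriction they enforce. A minimal Python sketch, assuming (consistent with the non-trapping instructions mentioned in that excerpt) that general percolation lifts the exception restriction while still protecting live-out values, and that the speculative model relies on hardware buffering to allow any motion; `may_move_above_branch` and its model strings are hypothetical:

```python
def may_move_above_branch(model: str, may_trap: bool, overwrites_live_out: bool) -> bool:
    """Can an instruction be hoisted above a conditional branch under `model`?

    may_trap: the instruction can cause an exception.
    overwrites_live_out: it writes a register in the live-out set of the
    branch's taken path.
    """
    if model == "restricted":
        # Both restrictions apply: no exceptions and no clobbered live-out values.
        return not may_trap and not overwrites_live_out
    if model == "general":
        # Assumed: non-trapping opcodes lift the exception restriction,
        # but live-out values must still be preserved.
        return not overwrites_live_out
    if model == "speculative":
        # Assumed: hardware buffers hold moved results until the branch commits,
        # so code motion is unrestricted.
        return True
    raise ValueError(f"unknown model: {model!r}")


for model in ("restricted", "general", "speculative"):
    # A load (may trap) that writes a register not live on the taken path.
    print(model, may_move_above_branch(model, may_trap=True, overwrites_live_out=False))
```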
M. D. Smith, M. S. Lam, and M. A. Horowitz, "Boosting Beyond Static Scheduling in a Superscalar Processor", Proceedings of the 17th International Symposium on Computer Architecture, June, 1990.
....and non-excepting instructions [6, 7], which extend, but still limit, the compiler's ability to schedule instructions for speculative execution. Recently we proposed a general architectural mechanism called boosting that provides the compiler with an unconstrained model of speculative execution [23]. That paper discusses the ideas that led to the concept of boosting, and it contains a preliminary experiment to justify further research. Since then, we have constructed a complete compiler system and a working hardware model to better understand the capabilities and costs of boosting. Section ....
M.D. Smith, M.S. Lam, and M.A. Horowitz. Boosting Beyond Static Scheduling in a Superscalar Processor. In the Proc. 17th Int. Symp. on Computer Architecture, pp. 344--354, May 1990.
No context found.
M. D. Smith, M. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 344-354, 1990.
No context found.
M. Smith, M. Lam, and M. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 344--354, 1990.
No context found.
Smith, M. D., Lam, M., and Horowitz, M. A. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture (1990), pp. 344--354.
No context found.
Smith, M. D., Lam, M., and Horowitz, M. A. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture (1990), pp. 344--354.
No context found.
M. D. Smith, M. S. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proc. ISCA-17, Seattle, WA, May 1990.
No context found.
M. D. Smith, M. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual Symposium on Computer Architecture, pages 344--354, 1990.
No context found.
M. D. Smith, M. S. Lam, and M. A. Horowitz. Boosting beyond static scheduling in a superscalar processor. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 344--354, May 1990.
CiteSeer.IST - Copyright Penn State and NEC