Y. N. Patt, W. M. Hwu, and M. Shebanow. HPS, a New Microarchitecture: Rationale and Introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108, 1985.
....without data and branch dependencies between them can be executed concurrently, enhancing performance. In this paper the basic computer system allows concurrency exploitation of input code. Such exploitation can be done either with software-based methods [1, 2, 3] or hardware-based methods [8, 12, 13, 14, 16, 17], or a combination of the two [18, 20]. In this work, the precise methods used are unimportant; the important thing is that in some way the concurrency in the code is exploited. We define branch effects loosely as any attribute of a branch which impedes execution of the code past the branch; ....
Patt, Y., Hwu, W., and Shebanow, M. HPS, a New Microarchitecture: Rationale and Introduction. In Proceedings of MICRO-18, pages 103-108. ACM, December, 1985.
....between references to these distinct areas in the stack exposes more parallelism, which can be exploited by simple in-order multiple-issue engines. 5.1.2 Related Work. The fill unit and decoded instruction cache were hardware assists proposed by Patt et al. as part of the HPS design philosophy [86, 84]. They describe the fill unit, which compacts instructions generated from a serial instruction stream into a decoded instruction cache. The use of these hardware assists on an HPS version of the DEC VAX processor was seen to result in significant performance improvements [84, 85]. The idea of a ....
Y.N. Patt, W.-M. Hwu and M.C. Shebanow. HPS, A New Microarchitecture: Rationale and Introduction. In Proceedings of Micro-18, pages 103--108,
....scheduling to efficiently fill pipelines. Hardware to handle multiple conditional jumps can also be effectively exploited by Percolation Scheduling. The design of such an architecture and its advantages are described in [7]. Also suited to take advantage of our system are dataflow microengines [14]. Traditionally, it has been claimed that dataflow architectures require very little compile-time analysis. However, from a pragmatic point of view, this lack of compile-time effort will impose a very heavy burden in terms of communication and synchronization costs, and may lead to extremely ....
Y.N. Patt, W. Hwu, and M.C. Shebanow. HPS, A New Microarchitecture: Rationale and Introduction. Proceedings of the 18th Annual Workshop on Microprogramming, Asilomar, CA, December 1985.
....within a single paradigm. Further, EDS does not require the compiler and run-time software to manage multiple threads of execution as in Hybrid Dataflow or other Multithreading schemes [33, 38, 4, 2]. Conventional architectures utilizing dynamic scheduling have also been proposed and built [34, 1, 26, 35]. In these schemes, the static instructions are interpreted sequentially, yielding a dynamic instruction stream. Execution of the dynamic instruction stream is allowed to proceed out of order, subject only to dependences detected at run time. Unlike these schemes, the EDS hardware is not burdened ....
Yale N. Patt, Wen-Mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108. IEEE Computer Society Press, December 1985.
....of significant amounts of instruction-level parallelism. Therefore, superscalar and VLIW (very large instruction word) machines have been designed, which can execute several instructions in parallel. In order to use these resources the instructions are reordered by the hardware [Tho64, Tom67, PHS85, Soh90] or by compiler techniques like basic block instruction scheduling [LDSM80, HG83, GM86, EK92], trace scheduling [Fis81, Ell85], and software pipelining [RG81, Lam88, Rau94]. To ensure correctness, the order between dependent instructions must be maintained, which restricts reordering and ....
....by the data flow between instructions, but by reusing registers. Several methods for dealing with anti-dependences have been proposed: 2.1 Register Renaming. Anti-dependences can be removed (or at least moved) by register renaming [PW86]. This technique can be implemented in hardware [Tom67, PHS85, Soh90] and as compiler optimization [PW86, Lam88]. Note that only compiler-based renaming techniques can increase the reordering freedom for the compiler. [Fig. 1. Register renaming (flow dependence, anti-dependence)] Figure ....
Yale N. Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In The 18th Annual Workshop on Microprogramming (MICRO-18), pages 103--108, 1985.
....plot cc 254M ss cc 114M Table 4.3: The maximum rearrangement optimization level of each benchmark along with its instruction count while running the measurement input set. 4.4 Microarchitectural Model. The simulated microarchitecture is a 16-wide implementation of the HPS execution model [39, 40], which performs out-of-order execution using the Tomasulo Algorithm [52]. In the HPS model, instructions are issued into a dynamic instruction window (consisting of node tables). In the window, instructions wait for their operands to be generated. Once generated, an instruction is ready to be ....
Y. Patt, W. Hwu, and M. Shebanow, "HPS, a new microarchitecture: Rationale and introduction," in Proceedings of the 18th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 103-107, 1985.
....design strategy. This choice should not be misconstrued as a requirement, though. The multiscalar model is flexible in terms of the strategy used to perform the computation of tasks. Nevertheless, the focus here is on the use of processing units based on superscalar techniques. Both academia [1, 55, 58, 68, 69, 81, 86, 89, 95] and industry [18, 31-36, 61, 77, 79, 82, 91-93] have considered a wide range of different superscalar designs to implement dynamic instruction scheduling. The performance of these designs as well as their costs in ideal and real implementations varies widely. Yet, because most of these ....
Yale N. Patt, Wen-mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108, December 1985.
....level. To do this, we allow multiple loop iterations to execute concurrently on a single processor. This can be accomplished by various methods such as loop unrolling, modulo scheduling [3, 10, 27, 34, 35, 36], multithreading [8, 37, 2, 40], and various forms of dynamic instruction scheduling [38, 32, 6]. In this paper, we are not concerned with the details of any particular technique, but rather focus on the characteristics common to all of them. Let the window size, W, denote the number of loop iterations that execute concurrently on a single processor at any given time. If the loop is ....
....Another important issue in processor design is the instruction issuing mechanism. Conventional processors issue instructions in the order they are encountered in the program text (subject to control flow). Sophisticated von Neumann architectures allow out-of-order execution of instructions [38, 32, 1, 6], as do Dataflow architectures [5, 17, 23, 30, 31, 15]. These architectures dynamically schedule instructions, subject to dependences evaluated at run time, in order to maximally utilize processors or functional units. However, with appropriate compile-time instruction scheduling, more ....
Yale N. Patt, Wen-Mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108. IEEE Computer Society Press, December 1985.
....the amount of parallelism available within such a fixed window size, such as loop transformations [WL91], should work well. 6 Conclusion. Dynamic scheduling approaches date back to the CDC 6600 [Tho64] and the IBM 360/91 [Tom67]. Various techniques for dynamic scheduling have been proposed in [WS84, PHS85, AKT86, Soh90, DT92]. Dynamic scheduling has even been proposed for VLIW processors [Rau93]. This paper inquires into the factors that affect the performance of such a machine. The factors of fetch availability, efficiency, and utility were introduced in order to provide some insight into the ....
Y. N. Patt, W. Hwu, and M. Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proc. of the 18th Annual Workshop on Microprogramming, pages 103--108, December 1985.
....and Patt combined various hardware and software techniques to exploit parallelism [MP91]. They examined hardware speculative execution, dynamic scheduling, and basic block enlargement, and found that this combination of techniques could obtain speedups of close to 6 on the HPS architecture [PMHS85, PHS85]. Though this study looked at some compiler techniques, it did not examine the effect of more aggressive static scheduling on these architectures. Smith, Horowitz, and Lam have argued that the hardware complexities of dynamic scheduling can be avoided through boosting, which provides simple ....
Y.N. Patt, W. Hwu, and M.C. Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108, December 1985.
....time of programs. In this model, instructions not in a branch's domain may be independent of the branch, and able to execute concurrently with it; branches may execute concurrently with branches. Speculative execution, an orthogonal technique, has been used separately to great advantage [9, 11]. An instruction is executed speculatively if it is executed before a prior-occurring branch has executed, or in other words, before it is known for sure which path the branch will take. Traditionally, there are two forms of speculative execution: branch prediction, in which code execution proceeds ....
....stream order is the order of instructions seen by the classic CPU, as indicated or followed by its Program Counter. The static order [15, 18, 19, 24] is the order of the code as it exists in memory, i.e., it is independent of the control flow, or branch executions, of the code. Patt et al. [9] first combined a classic data dependency reduction scheme, the Tomasulo algorithm [16], with branch prediction to yield the HPS model. Like most superscalar machines (e.g., Intel 586, Metaflow [11]), it processes only one branch per cycle at most, and uses the dynamic instruction stream as its ....
Y. Patt, W. Hwu, and M. Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of MICRO-18, pages 103--108. ACM, December 1985.
.... renaming, allowing multiple instances of a given instruction to execute simultaneously [106]. A similar approach based on an instruction dispatch stack was proposed later by Acosta et al. [1]. Dataflow execution of a conventional dynamic instruction stream was also proposed in the HPS architecture [81]. In all of these approaches, instructions are issued to functional units in the same order encountered in a sequential execution, although instruction issuing can get ahead of instruction completion (thus exploiting pipeline parallelism). When a conditional branch is encountered that is dependent ....
....9.355 77.009 ARC2D 1.000 2.616 18.775 2.734 7.582 56.059 TRFD 1.000 3.156 24.715 2.669 9.137 73.815 FLO52Q 1.000 2.428 16.705 2.291 6.575 49.407 SPEC77 1.000 2.531 17.839 3.602 8.195 54.119 .... multithreading [18, 47, 104, 113, 2, 79, 77, 17, 111] and various forms of dynamic instruction scheduling [106, 81, 14]. The focus here is not on the details of any particular method, but rather on the characteristics common to all of them. Let the window size W denote the number of loop iterations that execute concurrently on a single processor at any given time. If the loop is completely parallel, then the ....
Yale N. Patt, Wen-Mei Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108. IEEE Computer Society Press, December 1985.
....to completion. Table 1 (see Section 1) lists the input data set used for each benchmark and the dynamic instruction counts. We will concentrate on the gcc and perl benchmarks, the two benchmarks with the largest number of indirect jumps. The machine model simulated is the HPS microarchitecture [9, 10]. HPS is a wide-issue out-of-order execution machine, using the Tomasulo algorithm for dynamic scheduling [12]. Checkpointing [3] is used to maintain precise exceptions. Checkpoints are established for each branch; thus, once a branch misprediction is determined, instructions from the correct ....
Yale Patt, W. Hwu, and Michael Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual ACM/IEEE International Symposium on Microarchitecture, pages 103--107, 1985.
....library functions so we are not able to recompile them with the block enlargement optimization. 4.3. The Block-Structured ISA Processor. The block-structured ISA processor modeled in our experiments is a sixteen-wide issue, dynamically scheduled processor that implements the HPS execution model [16, 17]. The processor supports speculative execution as required by block-structured ISAs (see section 2). It can fetch and issue one atomic block each cycle. Each atomic block can contain up to sixteen operations. Dynamic register renaming removes any anti and output dependencies in the dynamic ....
Y. Patt, W. Hwu, and M. Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proceedings of the 18th Annual Microprogramming Workshop, pages 103--107, 1985.
No context found.
Y. N. Patt, W. M. Hwu, and M. Shebanow. HPS, a New Microarchitecture: Rationale and Introduction. In Proceedings of the 18th Annual Workshop on Microprogramming, pages 103--108, 1985.
No context found.
Y.N. Patt, W. Hwu, and M. Shebanow. HPS, a new microarchitecture: Rationale and introduction. In Proc. of the 18th Annual Workshop on Microprogramming, pages 103--108, December 1985.