Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI'99), pages 256--273, Atlanta, GA, 21--24 March 1999. http://ee.yale.edu/papers/usmemo3.ps.gz.
....via data speculation is also obtained by branch effect reduction methods. 2.2 Other Approaches. Probably the most successful high-IPC machine to date is Lipasti and Shen's Superspeculative architecture [10], which achieves an IPC of about seven under realistic hardware assumptions. The Ultrascalar [4, 5] machine achieves asymptotic scalability but realizes only a small amount of IPC, due to its conservative execution model. The Warp Engine [2, 6] uses time tags, like Levo, for a large amount of speculation; however, its realization of time tags is cumbersome, utilizing floating-point numbers ....
D. S. Henry, B. C. Kuszmaul, and V. Viswanath, "The Ultrascalar Processor: An Asymptotically Scalable Superscalar Microarchitecture," in HIPC '98, December 1998, URL: http://ee.yale.edu/papers/HIPC98-abstract.ps.gz.
....produced by the previous producer instructions. For this discussion, it is assumed that all false dependencies due to WAR or WAW hazards have been removed by register renaming², assuming either a renaming scheme similar to the Ultrascalar's datapath, which effectively allows for unlimited renaming [9, 7], or an infinite pool of physical registers (a method for dealing with a finite pool of physical registers is presented in Section 3.4.2). The timestamp assigned to $R_{dest}$ is $R_{dest} := \max(R_{arg1}, R_{arg2}) + Lat_{op}$ (1), which captures the notion that I cannot run until both ....
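Equation (1) can be illustrated with a small Python sketch. It walks a straight-line program in order and assigns each destination register the later of its two operand timestamps plus the operation latency. The `Instr` record, the register names, and the latencies are hypothetical. As in the text, the sketch assumes renaming has already removed WAR and WAW hazards, and it treats never-written registers as ready at time 0 (an assumption).

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Hypothetical instruction: dest <- op(arg1, arg2), taking `lat` cycles."""
    dest: str
    arg1: str
    arg2: str
    lat: int  # Lat_op, the operation's latency

def assign_timestamps(program, initial=None):
    """Apply R_dest := max(R_arg1, R_arg2) + Lat_op to each instruction in order.

    Assumes renaming has removed WAR/WAW hazards, so a register's
    timestamp is just the completion time of its latest producer.
    Registers never written (and not in `initial`) are ready at time 0.
    """
    ts = dict(initial or {})
    result = []
    for ins in program:
        ready = max(ts.get(ins.arg1, 0), ts.get(ins.arg2, 0))
        ts[ins.dest] = ready + ins.lat
        result.append(ts[ins.dest])
    return result

program = [
    Instr("r1", "r0", "r0", 1),  # ready at 0 + 1 = 1
    Instr("r2", "r0", "r0", 3),  # independent of r1: 0 + 3 = 3
    Instr("r3", "r1", "r2", 1),  # waits for both: max(1, 3) + 1 = 4
]
print(assign_timestamps(program))  # [1, 3, 4]
```

The third instruction cannot start before its slower operand is ready, which is exactly what the `max` term expresses.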
....in Wall's study on the limits of instruction-level parallelism [20]. The second is a Wrap Around window, in which window entries are continuously recycled in a fashion similar to a circular queue. This policy was also presented in Wall's study [20] and is used in the Ultrascalar processor [9]. The Compressing window makes window entries available as soon as the instruction in the entry has completed. This is accomplished by shifting all other outstanding instructions up in the buffer (compressing out the vacancies). This policy is implemented in Alpha's 21264 microprocessor [11] ....
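The difference between the two window policies can be sketched in Python. The class names and the `insert`/`complete`/`free` interface are hypothetical. The sketch also assumes that a wrap-around slot is recycled only when the instruction at the head of the circular queue has completed, whereas the compressing window frees an entry as soon as its own instruction completes.

```python
class WrapAroundWindow:
    """Circular-queue window: slots are recycled only from the head, in order."""
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size   # each slot: [instruction, completed?]
        self.head = 0                # oldest outstanding instruction
        self.count = 0

    def insert(self, name):
        if self.count == self.size:
            return False             # window full
        tail = (self.head + self.count) % self.size
        self.slots[tail] = [name, False]
        self.count += 1
        return True

    def complete(self, name):
        for entry in self.slots:
            if entry is not None and entry[0] == name:
                entry[1] = True
        # Free slots only while the head instruction has completed.
        while self.count and self.slots[self.head][1]:
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1

    def free(self):
        return self.size - self.count


class CompressingWindow:
    """Compressing window: a completed entry leaves at once and the
    younger entries shift up, preserving age order (index 0 = oldest)."""
    def __init__(self, size):
        self.size = size
        self.entries = []

    def insert(self, name):
        if len(self.entries) == self.size:
            return False             # window full
        self.entries.append(name)
        return True

    def complete(self, name):
        self.entries.remove(name)    # list removal shifts younger entries up

    def free(self):
        return self.size - len(self.entries)


wrap, comp = WrapAroundWindow(4), CompressingWindow(4)
for w in (wrap, comp):
    for n in "ABCD":
        w.insert(n)
    w.complete("B")
    w.complete("C")
print(wrap.free(), comp.free())  # 0 2
```

With B and C complete but the oldest instruction A still outstanding, the wrap-around window has no free slot, while the compressing window has two. The compressing window pays for this by shifting the outstanding entries, as described above.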
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI '99), pages 256--273, March 1999.
....to TRS in Section 2. Section 3 describes the minimalist instruction set architecture (ISA) we use. Section 4 defines the operational semantics of the ISA using a simple in-order execution processor ($P_B$). In Section 5 we explain the implementation of the ISA in the Ultrascalar processor ($P_U$) [3] and define its operational semantics. In Section 6 we formally prove that $P_U$ is a correct implementation of the ISA by showing that $P_B$ and $P_U$ can simulate each other. The proof is very similar to, and based on, the correctness proof for an out-of-order speculative processor in [1]. 2 Term ....
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI'99), pages 256--273, Atlanta, GA, 21--24 March 1999. http://ee.yale.edu/papers/usmemo3.ps.gz.
....point, interrupts, or a privileged mode. In order to meet the area constraint, we have chosen a RISC instruction set with only 16 32-bit registers.¹ Many of the concepts used in the UltraC2K were motivated by our previous theoretical results on asymptotically optimal superscalar processors [3, 4, 1]. In contrast, this work focuses on solving the engineering problems of building a real 8-issue processor. The processor in Figure 1 illustrates the ideas behind the UltraC2K. For simplicity of illustration, the figure assumes a processor with only four outstanding instructions and four logical ....
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI'99), pages 256--273, Atlanta, GA, 21--24 March 1999. http://ee.yale.edu/papers/usmemo3.ps.gz.
....it makes no sense to provide more memory bandwidth than the total instruction issue rate. This paper contributes two new architectures, the Ultrascalar II and a hybrid Ultrascalar, and compares them to each other and to a third, the Ultrascalar I. We described the Ultrascalar I architecture in [7] but did not analyze its complexity in terms of L. For L equal to 64 64-bit values, as is found in today's architectures, the improvement in layout area over the Ultrascalar I is dramatic. This paper does not evaluate the benefits of larger issue widths, window sizes, or of providing more ....
....shall show how to implement the Ultrascalar I with only logarithmic gate delays, but for ease of understanding, we start with an explanation of circuits that have linear gate delay. The description given here stands on its own, but a more in-depth description of the Ultrascalar I can be found in [7]. Figure 1 shows the datapath of an Ultrascalar I processor implemented in linear gate delay. Eight outstanding instructions are shown, analogous to an eight-instruction window in today's superscalars. Each of the eight instructions occupies one execution station. An execution station is ....
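The contrast between linear and logarithmic gate delay can be illustrated for a single logical register using the generic recursive-doubling (Hillis-Steele) parallel-prefix technique, shown here in Python. This is not the Ultrascalar I circuit itself. The sketch assumes each station either writes the register or passes its input through, ignores the cyclic wrap-around of a real window, and uses hypothetical function names. The serial loop models a chain of per-station MUXes (one step per station). The doubling version computes the same station inputs in about log2(n) rounds of an associative "last writer wins" operator.

```python
# Each execution station is modeled, for ONE logical register, as a pair
# (writes, value): writes is True if the station's instruction produces
# that register. Index 0 is the oldest station. Simplification: no
# cyclic wrap-around; the committed register-file value enters first.

def combine(left, right):
    """Associative 'last writer wins' operator on (writes, value) pairs."""
    return right if right[0] else left

def linear_forwarding(committed, stations):
    """Serial chain: one step per station, like a ripple of MUXes."""
    inputs, cur = [], (True, committed)
    for s in stations:
        inputs.append(cur[1])      # value this station reads
        cur = combine(cur, s)      # value it passes to the next station
    return inputs

def log_forwarding(committed, stations):
    """Recursive doubling (Hillis-Steele scan): ceil(log2(n + 1)) rounds."""
    xs = [(True, committed)] + list(stations)
    step, rounds = 1, 0
    while step < len(xs):
        # Every element combines with the one `step` positions older;
        # the comprehension reads the previous round's list.
        xs = [combine(xs[i - step], xs[i]) if i >= step else xs[i]
              for i in range(len(xs))]
        step, rounds = step * 2, rounds + 1
    # Element i now summarizes committed + stations[0..i-1], i.e. the
    # value station i reads; the last element is the window's output.
    return [v for _, v in xs[:-1]], rounds

NOP = (False, None)  # station that does not write this register
stations = [NOP, (True, 5), NOP, NOP, (True, 7), NOP, NOP, NOP]
print(linear_forwarding(10, stations))  # [10, 10, 5, 5, 5, 7, 7, 7]
print(log_forwarding(10, stations))     # ([10, 10, 5, 5, 5, 7, 7, 7], 4)
```

For eight stations, the doubling version needs four rounds: nine elements, including the committed register-file value. The chain needs eight sequential steps. Associativity of `combine` is what allows the rounds to be reordered into a tree.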
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI '99), pages 256--273, Atlanta, GA, 21--24 March 1999. http://ee.yale.edu/papers/usmemo3.ps.gz.
....The oldest instruction in the buffer is Instruction A, pointed to by the Head pointer. The youngest, most recently fetched, is Instruction H, pointed to by the Tail pointer. This work was partly motivated by our previous theoretical results on asymptotically optimal superscalar processors [3, 6]. In contrast, this work focuses on understanding the engineering problems of the wide-issue processors of the near future. Figure 2(a) also shows a linear-gate-delay implementation of a CSP circuit. A CSP circuit with a linear gate delay consists of a ring of operators and MUXes. We attach ....
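Assuming CSP here denotes a cyclic segmented prefix, the linear-gate-delay ring can be sketched in Python. Each station combines its predecessor's output with its own value unless its segment bit is set. When the bit is set, a MUX selects the station's own value and cuts the ring. Setting the segment bit at the Head station makes the ring start at the oldest instruction. The function name, the slot layout, and the example question ("has every instruction from the head up to this one completed?") are illustrative assumptions, not taken from the paper.

```python
import operator

def cyclic_segmented_prefix(values, segment, op):
    """Linear-delay cyclic segmented prefix over a ring of stations.

    out[i] = values[i]                  if segment[i] is set
           = op(out[i - 1], values[i])  otherwise, with i - 1 taken mod n.
    At least one segment bit must be set, or the ring has no start.
    """
    n = len(values)
    start = segment.index(True)      # ValueError if no segment bit is set
    out = [None] * n
    for k in range(n):               # n sequential steps around the ring
        i = (start + k) % n
        if segment[i]:
            out[i] = values[i]       # MUX picks the station's own value
        else:
            out[i] = op(out[(i - 1) % n], values[i])
    return out

# Hypothetical 8-station window: Head (instruction A) sits in slot 5,
# so age order is slots 5, 6, 7, 0, 1, 2, 3, 4 (A through H).
head = 5
# Hypothetical completion flags per slot: C (slot 7) and G (slot 3) are not done.
done = [True, True, True, False, True, True, True, False]
segment = [slot == head for slot in range(8)]
print(cyclic_segmented_prefix(done, segment, operator.and_))
# [False, False, False, False, False, True, True, False]
```

Only slots 5 and 6 (A and B) report True. Every instruction from the head through them has completed, while C's incomplete status propagates around the ring to all younger stations.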
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI'99), pages 256--273, Atlanta, GA, 21--24 March 1999. http://ee.yale.edu/papers/usmemo3.ps.gz.
....since it makes no sense to provide more memory bandwidth than the total instruction issue rate. This paper contributes two new architectures, the Ultrascalar II and a hybrid Ultrascalar, and compares them to each other and to a third, the Ultrascalar I. We described the Ultrascalar I architecture in [7] but did not analyze its complexity in terms of L. For L equal to 64 64-bit values, as is found in today's architectures, the improvement in layout area over the Ultrascalar I is dramatic. This paper does not evaluate the benefits of larger issue widths, window sizes, or numbers of logical ....
Dana S. Henry, Bradley C. Kuszmaul, and Vinod Viswanath. The Ultrascalar processor---an asymptotically scalable superscalar microarchitecture. In The Twentieth Anniversary Conference on Advanced Research in VLSI (ARVLSI'99), Atlanta, GA, 21--24 March 1999. (To appear.) http://ee.yale.edu/papers/usmemo3.ps.gz.
CiteSeer.IST - Copyright Penn State and NEC