12 citations found.
G.R. Beck, D.W.L. Yen, and T.L. Anderson, "The Cydra 5 Mini-Supercomputer: Architecture and Implementation," J. Supercomputing 7, May 1993, pp. 143-180.

This paper is cited in the following contexts:
Modulo Schedule Buffers - Merten, Hwu (2001)   (1 citation)

....multiply was issued prior to the interrupt and has not yet completed. Furthermore, if that instruction is allowed to complete before the interrupt is actually taken, then the value of r1 would be prematurely overwritten with the result of the multiply. Hardware techniques, such as snapshot buffers [15] and replay buffers [16] have been proposed to save the result and its relative write back time upon a context switch. These features are often costly to implement and are not present in the TI architecture. Therefore, in the TI processors, interrupts must be postponed during any portion of the ....

G. R. Beck, D. W. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: Architecture and implementation," The Journal of Supercomputing, vol. 7, pp. 143--180, January 1993.


Instruction Cache Designs for a Class of.. - Conte, Banerjia..

....reside in the same frame to ease the requirements on the i-fetch mechanism [14]. This requires NOPs, thereby violating RSI. The Cydrome Cydra 5 VLIW machine used a split encoding such that instruction cache blocks were composed of either one MultiOp or multiple one-Op MultiOps called UniOps [15] [16]. Cache blocks composed of one MultiOp are in an uncompressed form and those composed of UniOps are padded with NOPs, if needed for cache block alignment. It is also non-RSI. Another commercial VLIW architecture, the Multiflow TRACE family of machines, used a compressed encoding [17]. Nops ....

G. R. Beck, D. W. L. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: architecture and implementation," J. Supercomputing, vol. 7, no. 1, pp. 143--180, Jan. 1993.


NextPC computation for a banked instruction cache for a.. - Banerjia, Menezes, Conte (1996)

....MultiOp contains an Op for each functional unit in the machine. 2.2 Related Work. Only two classes of encodings, uncompressed and compressed, have been introduced so far. Other classes of encodings can also be used for VLIW architectures, such as a frame encoding [5], split encoding [6] [7], and a packet encoding [8], among others. The more germane issue for this report is work related to instruction fetch mechanisms, particularly those that deal with variably sized instructions and NextPC computation. A review of the issue might help to illustrate the (potential) problem. One step ....

G. R. Beck, D. W. L. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: architecture and implementation," J. Supercomputing, vol. 7, pp. 143--180, Jan. 1993.


Explicit Multi-Threading (XMT) Bridging Models for.. - Vishkin, Dascal.. (1998)

....of MCCs in a cover does not exceed the number of modules, an adequate solution to the assignment problem follows simply by assigning all the values in one MCC to the same module. A solution that requires a large number of registers was implemented by the context register matrix in the Cydra 5 [BYA, DT93], where each functional unit has a dedicated row for writes in the register matrix, and each row can be read in parallel by all functional units. This structure permits conflict free register reads and writes for every functional unit. Two recent solutions rely on the following general idea. ....

G. R. Beck, D. W. L. Yen and T. L. Anderson. "The Cydra 5 minisupercomputer: architecture and implementation". The Journal of Supercomputing 7, 143--180, 1993.


Structural And Static Analysis Techniques For Enhancing Compiler.. - Crozier (1999)

....promotion and instruction merging optimizations have been applied. These optimizations are discussed in Chapter 3. 2.1.2 Survey of predicated execution in commercial systems. Predication is first seen in the Cydra 5, a VLIW multiprocessor system utilizing a directed-dataflow architecture [4] [7]. Each Cydra 5 instruction word contains seven operations, each of which is individually predicated. An additional source operand added to each operation specifies a predicate located within the predicate register file. The predicate register file is an array of 128 Boolean (one bit) registers. ....

G. R. Beck, D. W. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: Architecture and implementation," The Journal of Supercomputing, vol. 7, no. 1, pp. 143--180, January 1993.


Supporting Predicated Execution: Techniques And Tradeoffs - McCormick (1996)   (1 citation)

....supported a select instruction which selects one of two source operands to copy to the destination operand. The HP PA-RISC architecture [8] supports instruction nullification by which some instructions can squash the execution of the next instruction. The Cydra 5 had extensive predication support [9], [10]. The predication support in the Cydra 5 was mainly used to enhance the ability of modulo scheduling to expose ILP across loop iterations in numerical code. Each instruction in this architecture had a predicate operand. The predicates were defined by compare operations and special loop ....

G. R. Beck, D. W. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: Architecture and implementation," The Journal of Supercomputing, vol. 7, pp. 143--180, January 1993.


Influence of Variable Time Operations in Static.. - Borensztejn, Barrado, .. (1999)

....of cycles needed to execute the program, that is, to maximize instruction level parallelism. Scheduling can be done at different levels of the code: at the basic block level, at loop level or globally [2]. A special approach that schedules instructions at the loop level is software pipelining [1, 3, 4, 6, 8], where instructions belonging to different iterations of the loop overlap. Software pipelining replaces the original instructions of the loop body with a sequence of long instructions known as loop kernel. The number of long instructions that compose the loop kernel is known as Initiation ....

....model. 4.1 Software Pipelining. Optimal scheduling for limited resources is known to be an NP-complete problem [5]. This is true also for software pipelining. As with other NP-complete problems, software pipelining can be formalized as maximizing or minimizing a function subject to a system of equations. [4, 6] show how optimal solutions can be found for small sized dependence graphs by solving such a system with linear programming, but paying a large CPU time penalty. On the other hand, heuristics have improved and reach more than 90% of optimal schedules [7, 2]. Most heuristics are based on Modulo ....

G. Beck, D. Yen and T. Anderson: "The Cydra 5 minisupercomputer: Architecture and implementation", The J. Supercomputing, 7, pp. 143-180, 1993.


Architectural Support for Compiler-Synthesized.. - August, Connors.. (1997)   (16 citations)

....interface to support sophisticated run-time branch prediction schemes. At the architectural level, a set of branch instructions are defined which base their decision on a predicate. These branch instructions are similar to those discussed in [1] and defined in the IBM RS6000 [2], Cydrome Cydra 5 [3], HPL PlayDoh [4], and SPARC V9 [5]. Each branch requires a previously executed instruction to set a predicate. Therefore, more instructions are potentially required in the compare and branch model, such as in the HP PA-RISC architecture [6]. However, in future architectures that support ....

G. R. Beck, D. W. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: Architecture and implementation," The Journal of Supercomputing, vol. 7, pp. 143--180, January 1993.


A Fast Interrupt Handling Scheme for VLIW Processors - Özer, Sathaye, Menezes.. (1998)   (5 citations)

....operations that can be executed in parallel are determined at compile time, rather than at run time as in superscalar processors [5]. VLIW architectures [3], [9] are therefore classified as statically scheduled architectures. Commercial examples of VLIW systems include the Cydrome Cydra 5 [1] [9] and Multiflow TRACE [3], [7]. Embedded VLIW processors include Texas Instruments TMS320C62 [11], Philips TriMedia TM1000 [12], and Chromatic Mpact [15]. In VLIW processors, independent operations are grouped into a single long instruction in order to extract instruction level parallelism (ILP) ....

G.R. Beck, D.W.L. Yen, T.L. Anderson, "The Cydra 5 Minisupercomputer: Architecture and Implementation," The Journal of Supercomputing, 7, 1993.


A Comparison of Full and Partial Predicated Execution Support for .. - Mahlke (1995)   (32 citations)

....an additional source operand to hold a predicate specifier. In this manner, every instruction may be predicated. Additionally, a set of predicate-defining opcodes are added to efficiently manipulate predicate values. This approach was most notably utilized in the Cydra 5 minisupercomputer [8] [13]. Full predicate execution support provides the most flexibility and the largest potential performance improvements. The other approach is to provide partial predicate support. With partial predicate support, a small number of instructions are provided which conditionally execute, such as a ....

G. R. Beck, D. W. Yen, and T. L. Anderson, "The Cydra 5 minisupercomputer: Architecture and implementation," The Journal of Supercomputing, vol. 7, pp. 143--180, January 1993.


From Algorithm Parallelism to Instruction-Level Parallelism: An.. - Vishkin (1997)   (7 citations)

....of MCCs in a cover does not exceed the number of modules, an adequate solution to the assignment problem follows simply by assigning all the values in one MCC to the same module. A solution that requires a large number of registers was implemented by the context register matrix in the Cydra 5 [BYA, DT93], where each functional unit has a dedicated row for writes in the register matrix, and each row can be read in parallel by all functional units. This structure permits conflict free register reads and writes for every functional unit. Basis for comparison: For comparison with our new solution, ....

G. R. Beck, D. W. L. Yen and T. L. Anderson. "The Cydra 5 minisupercomputer: architecture and implementation". The Journal of Supercomputing 7, 143--180, 1993.


EPIC: Explicitly Parallel Instruction Computing - Schlansker, al. (2000)   (18 citations)

No context found.

G.R. Beck, D.W.L. Yen, and T.L. Anderson, "The Cydra 5 Mini-Supercomputer: Architecture and Implementation," J. Supercomputing 7, May 1993, pp. 143-180.

CiteSeer.IST - Copyright Penn State and NEC