M. R. Thistle and B. J. Smith. A processor architecture for horizon. In Supercomputing '88, pages 35--41, November 1988.
....approach permits the specification of distinct assumed latencies for different occurrences of the same opcode. Although this can be quite useful, it is rather extravagant in its use of instruction bits. The Horizon architecture provides for such a latency specification per MultiOp instruction [17]. Presumably, the value specified is the minimum of the assumed latencies across all operations within a single instruction. The second approach has two subcases, depending on how the assumed latency is deposited into the ELR. One option is to provide all the assumed latencies in the program ....
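The per-instruction option can be illustrated with a minimal Python sketch. It assumes, as the passage only presumes, that a MultiOp instruction carries one latency field equal to the minimum assumed latency of its operations. `Operation`, `assumed_latency`, and `instruction_latency` are hypothetical names for illustration, not part of any Horizon specification.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    opcode: str
    assumed_latency: int  # hypothetical: latency (in cycles) the compiler assumed


def instruction_latency(ops: list[Operation]) -> int:
    """Collapse per-operation assumed latencies into one per-instruction field.

    Follows the presumption in the text: the single value stored with the
    MultiOp instruction is the minimum over all of its operations.
    """
    if not ops:
        raise ValueError("a MultiOp instruction needs at least one operation")
    return min(op.assumed_latency for op in ops)


multi_op = [Operation("fadd", 3), Operation("load", 5), Operation("iadd", 1)]
print(instruction_latency(multi_op))  # 1
```

The bit-saving trade-off is visible here: one field per instruction replaces three per-operation fields, but the larger assumed latencies (3 and 5) are no longer encoded.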
Thistle, M.R., and Smith, B.J. A processor architecture for Horizon. In Proc. Supercomputing '88, (Orlando, Florida, November 1988), 35-41.
....the first machines to incorporate multithreading was the CDC 6600 Peripheral Processor [3]. In this machine, the primary rationale for multithreading was the large difference between processor and memory speeds. More recent commercial attempts include the Denelcor HEP [4] and the Tera Horizon [5]. Both of these machines use multithreading as a means of hiding memory latency in a highly parallel machine intended to execute parallel algorithms. Some interesting research proposals incorporating multithreading include [6], based on the MIPS-X architecture, and [7], which investigated various ....
Mark R. Thistle and Burton J. Smith. A processor architecture for Horizon. In Proceedings Supercomputing '88, 1988.
....are called finely multithreaded processors, and the others are called coarsely multithreaded processors or block multithreaded processors. Several processor designs have used multithreading to mask communication and synchronization latencies, or to utilize deep pipelines effectively, e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. By multithreading a processor such that an instruction from a different thread can be initiated every cycle (or every few cycles), pipeline bubbles due to pipeline dependencies or processor stalls due to memory latency can be prevented. Processors in message-passing multicomputers often maintain ....
Mark R. Thistle and Burton J. Smith. A Processor Architecture for Horizon. In Proceedings of Supercomputing '88, November 1988.
....are constructed from reconfigurable logic. An Active Page is, therefore, the data and its associated functions. This greatly reduces the traffic between the processor and memory for these applications. 2.9 Tera MTA The Tera Multi-Threaded Architecture (MTA) [2], which is the successor of Horizon [24, 35], is a highly parallel machine that seeks to achieve performance by emphasizing throughput and parallelism over speed and complexity. The architecture supports simultaneous multithreading (SMT), which allows the interleaving of code from different threads. A thread is merely a subunit of ....
M. R. Thistle and B. J. Smith. A Processor Architecture for Horizon. Proceedings of Supercomputing
....on multithreaded architectures has been motivated by two concerns: tolerating latency and bridging synchronization waits by rapid context switches. Three different approaches to multithreaded architectures can be distinguished [Silc98]: cycle-by-cycle interleaving [Halstead88, Papadopulos91, Thistle88], block interleaving [Agarwal92, Sigmund96], and the simultaneous multithreading approach [Tullsen95, Lo97]. 2.2.2 Cycle-by-Cycle Interleaving In this interleaving model, the processor switches to a different thread after each instruction. An instruction of the same thread is fed in the ....
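The cycle-by-cycle interleaving model described above can be sketched as a round-robin issue loop. This is a simplified Python model, assuming one issue slot per cycle, no stalls, and that a thread leaves the rotation once it has no instructions left; the function name `interleave` and the trace format are hypothetical.

```python
from collections import deque


def interleave(threads: dict[str, list[str]]) -> list[tuple[int, str, str]]:
    """Return (cycle, thread, instruction) triples for cycle-by-cycle interleaving.

    Each cycle, one instruction is issued from the thread at the head of a
    round-robin queue; that thread then moves to the back of the queue, so
    consecutive cycles come from different threads whenever more than one
    thread still has work.
    """
    ready = deque((name, deque(instrs)) for name, instrs in threads.items() if instrs)
    trace = []
    cycle = 0
    while ready:
        name, instrs = ready.popleft()
        trace.append((cycle, name, instrs.popleft()))
        if instrs:  # thread still has instructions: rejoin the rotation
            ready.append((name, instrs))
        cycle += 1
    return trace


for step in interleave({"A": ["a0", "a1"], "B": ["b0", "b1"]}):
    print(step)
```

With two threads, consecutive instructions from thread A issue two cycles apart; that gap is what lets a pipeline dependency or memory access resolve without a stall, and it widens as more threads join the rotation.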
M. Thistle and B. J. Smith, "A processor architecture for Horizon," Proc. of Supercomputing '88, pp. 35--41, Nov. 1988.
....are constructed from reconfigurable logic. An Active Page is, therefore, the data and its associated functions. This greatly reduces the traffic between the processor and memory for these applications. 2.9 Tera MTA The Tera Multi-Threaded Architecture (MTA) [2], which is the successor of Horizon [24, 35], is a highly parallel machine that seeks to achieve performance by emphasizing throughput and parallelism over speed and complexity. The architecture supports simultaneous multithreading (SMT), which allows the interleaving of code from different threads. A thread is merely a subunit of ....
M. R. Thistle and B. J. Smith. A Processor Architecture for Horizon. Proceedings of Supercomputing 1988, November 1988.
....to exploit this; however, EDS more elegantly exploits fine-grained parallelism and achieves latency tolerance within a single paradigm. Further, EDS does not require the compiler and run-time software to manage multiple threads of execution as in Hybrid Dataflow or other multithreading schemes [33, 38, 4, 2]. Conventional architectures utilizing dynamic scheduling have also been proposed and built [34, 1, 26, 35]. In these schemes, the static instructions are interpreted sequentially, yielding a dynamic instruction stream. Execution of the dynamic instruction stream is allowed to proceed out of ....
Mark R. Thistle and Burton J. Smith. A processor architecture for Horizon. In Proceedings Supercomputing '88, pages 35--41. IEEE Computer Society Press, November 1988.
....machine may issue only one instruction per cycle, but if there is no hardware devoted to preserving the correct execution order of operations, the compiler will have to schedule them with full knowledge of dependences and latencies. Another proposed independence architecture, the Horizon [TS88], encodes an integer N into each operation. The architecture guarantees that all of the next N operations in the instruction stream are independent of the current operation. The task of the hardware is to ensure that no more than N subsequent instructions are issued before the operation is ....
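The lookahead rule can be modeled as an in-order issue loop. This is a minimal Python sketch, assuming one instruction issued per cycle and reading the truncated last sentence as "before the operation is complete" (its result available `latency` cycles after issue). `Instr`, `latency`, `lookahead`, and `issue_cycles` are hypothetical names. An instruction at position j may issue only once every earlier instruction i with j - i > N_i has completed.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int    # hypothetical: cycles from issue until the result is available
    lookahead: int  # N: the next N instructions are independent of this one


def issue_cycles(program: list[Instr]) -> list[int]:
    """Cycle in which each instruction issues under lookahead-based interlocking.

    In order, at most one issue per cycle. Instruction j waits until every
    earlier instruction i that it might depend on (j - i > N_i) has completed.
    No register dependence checking is done: the N values are trusted.
    """
    issued: list[int] = []
    cycle = 0
    for j in range(len(program)):
        for i in range(j):
            if j - i > program[i].lookahead:
                cycle = max(cycle, issued[i] + program[i].latency)
        issued.append(cycle)
        cycle += 1
    return issued


prog = [
    Instr("load", latency=5, lookahead=2),  # next 2 instructions don't need the load
    Instr("add1", latency=1, lookahead=0),
    Instr("add2", latency=1, lookahead=0),
    Instr("use", latency=1, lookahead=0),   # 3 slots after the load: must wait for it
]
print(issue_cycles(prog))  # [0, 1, 2, 5]
```

The load's lookahead of 2 lets `add1` and `add2` issue back to back; `use`, three slots later, stalls until cycle 5, when the load completes.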
M.R. Thistle and B.J. Smith. A processor architecture for Horizon. In Proc. Supercomputing, pages 35--41, November 1988.
....University, Durham, N.C. 27708, U.S.A., sandeep@cs.duke.edu. A preliminary version of this paper appeared in Int'l Parallel Processing Symp. 1995 [19]. .... as hypercubes [1]. Examples of machines with such topologies include the MasPar MP-1 [3], Intel Paragon, MIT J-Machine [6], Tera HORIZON [17], Cray T3D [4, 13] and Polymorphic Torus [9]. A torus is a mesh with wrap-around links. Although meshes and tori are generally regarded as close families, there are still some distinctions: i) as opposed to meshes, all nodes of a torus are topologically symmetric, ii) a torus has a smaller ....
M. R. Thistle and B. J. Smith. A processor architecture for Horizon. In Supercomputing, pages 35--41, 1988.
....(typically, a RISC architecture), which efficiently executes sequential threads with a set of registers and an advanced control pipeline. The concept of multithreading is not exclusive to the extension of dataflow architectures. For instance, the Denelcor HEP [1] and the Tera Computing System [6] are multithreaded computers in the sense that they execute and control multiple threads in a single pipeline. Dally's J-Machine [16] does not interleave multiple threads, but it can switch between threads very quickly; thus, we can say that it actually supports multithreaded computation. In ....
Thistle, M.R. and Smith, B.J.: A Processor Architecture for Horizon, Proc. of IEEE Supercomputing Conference, pp.35-41 (1988).
....in shared-memory multiprocessors. Our goal is to select mechanisms which reduce the communication latency between two computational threads. We wish to reduce this latency, since communication is often on the critical path of an application. Other approaches, such as multithreaded processors [22], offer mechanisms to hide latency; this will increase the efficiency of the multiprocessor system but may not decrease the time required to complete a particular application. Dally and Wills [9] cite three universal mechanisms which are required to support parallel computation: naming, ....
Mark R. Thistle and Burton J. Smith. A processor architecture for Horizon. Technical Report SRCTR -88-010, Supercomputing Research Center, Institute for Defense Analyses, August 1988.
....that the compiler has to specify independent operations whose operands are available in a given cycle; in other words, the compiler performs the scheduling. The hardware performs the binding of operations and transports to hardware resources. An example is the Horizon architecture as presented in [12]; each operation encodes a number H, which specifies that the next H operations can execute concurrently with it. The hardware can execute these H (or fewer) operations concurrently, without having to test whether they are dependent or whether their operands are ready. The hardware still has to allocate (bind) these operations to ....
M. R. Thistle and B. J. Smith. A processor architecture for horizon. Proc. Supercomputing, pages 35--41, November 1988.
....into data transports to the compiler [3]. This can be depicted as an extension of a scheme presented in [4]. Figure 1 shows a division between the responsibilities of the compiler and the hardware. It shows five classes of architectures: superscalar, dataflow machines, independence architectures [5], operation-triggered (OT) VLIWs, and transport-triggered (TT) VLIWs. Superscalars shift most responsibilities to the hardware, while VLIWs shift most responsibilities to the compiler. The figure clearly shows that TT VLIWs depend even more on compilers than OT VLIWs. TTAs have much in common with ....
M.R. Thistle and B.J. Smith. A processor architecture for horizon. Proc. Supercomputing, pages 35--41, November 1988.
....to exploit additional parallelism, which can only come from the loop level. To do this, we allow multiple loop iterations to execute concurrently on a single processor. This can be accomplished by various methods such as loop unrolling, modulo scheduling [3, 10, 27, 34, 35, 36], multithreading [8, 37, 2, 40], and various forms of dynamic instruction scheduling [38, 32, 6]. In this paper, we are not concerned with the details of any particular technique, but rather focus on the characteristics common to all of them. Let the window size, W, denote the number of loop iterations that execute ....
.... access. Assuming the architecture contains a register scoreboard [19, 29] and the cache interface allows instruction issuing to proceed beyond a miss [14], instructions that do not depend on the cache-missing load could be issued to keep the pipeline busy while the miss is being serviced [37, 4]. With static scheduling, instructions are constrained to be issued in the order generated by the compiler, which does not take into account the occasional long-latency cache miss. 4.3 Static vs. Dynamic Scheduling Experiments To assess the magnitude of this potential advantage of dynamic ....
[Article contains additional citation context not shown here]
Mark R. Thistle and Burton J. Smith. A processor architecture for Horizon. In Proceedings Supercomputing '88, pages 35--41. IEEE Computer Society Press, November 1988.
....is a promising approach. Research on multithreaded architectures has been motivated by two concerns: tolerating latency and bridging synchronization waits by rapid context switches. Three different approaches to multithreaded architectures are distinguished: cycle-by-cycle interleaving [117,173,202,215], block interleaving [2,3,106], and simultaneous multithreading [221,222]. 4.1 Cycle-by-cycle interleaving In the cycle-by-cycle interleaving model, the processor switches to a different thread after each instruction. In principle, an instruction of the same thread is fed into the pipeline after ....
....parallel MIMD computer. The machine was designed for up to 256 processing elements and up to 512 memory modules in a 16 × 16 × 6 node internal network. Like HEP, it employed a global address space and memory-based synchronization through the use of full/empty bits at each location [94,148,215]. Each processor supported 128 independent instruction streams by 128 register sets, with context switches occurring at every clock cycle. Tera MTA (Multi-Threaded Architecture): The Tera MTA [9,11,12] is based on the Horizon architecture and is currently under construction by Tera Computer Company ....
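Memory-based synchronization with full/empty bits can be imitated in software to show its semantics. This Python sketch assumes the common producer/consumer convention: a synchronizing write waits until the word is empty, stores, and marks it full; a synchronizing read waits until it is full, loads, and marks it empty. The class `SyncWord` and the methods `write_ef`/`read_fe` are hypothetical names, and a `threading.Condition` stands in for the hardware's per-word tag bit and waiting.

```python
import threading


class SyncWord:
    """A single memory word tagged with a full/empty bit (software model)."""

    def __init__(self) -> None:
        self._value = None
        self._full = False                # the full/empty tag bit
        self._cv = threading.Condition()  # stands in for hardware waiting

    def write_ef(self, value) -> None:
        """Wait until empty, store the value, then mark the word full."""
        with self._cv:
            self._cv.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cv.notify_all()

    def read_fe(self):
        """Wait until full, load the value, then mark the word empty."""
        with self._cv:
            self._cv.wait_for(lambda: self._full)
            self._full = False
            self._cv.notify_all()
            return self._value


word = SyncWord()
out = []


def consumer() -> None:
    for _ in range(5):
        out.append(word.read_fe())


t = threading.Thread(target=consumer)
t.start()
for i in range(5):
    word.write_ef(i)  # blocks until the consumer has emptied the word
t.join()
print(out)  # [0, 1, 2, 3, 4]
```

Because each write must wait for the previous value to be consumed, the word behaves like a one-slot channel, which is the producer/consumer ordering the tag bit enforces in hardware.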
M. Thistle and B.J. Smith, A processor architecture for Horizon, in Proc. Supercomputing '88, Nov. 1988, pp. 35-41.
....13, 17] architectures have been proposed based on the multithreaded paradigm. In this paper, we propose a Scalable Multithreaded Architecture that exploits Large Locality, henceforth referred to as SMALL. The architecture of SMALL was influenced by some of the other multithreaded architectures [8, 10, 18]. The significance of the proposed architecture, however, lies in making appropriate decisions on various design alternatives in a consistent manner to achieve high performance, in terms of both speedup and processor utilization. The rest of this paper is organized as follows. Our execution model ....
....While executing instructions in a pipelined manner, data hazards can occur due to inter-instruction dependencies, causing the pipeline to freeze or stall for a few cycles. In multithreaded architectures, interleaved execution of threads can be used effectively to avoid such data-dependent stalls [2, 10, 17, 18]. In the event of a data hazard a .... [Fig. 2: The Architecture of SMALL]
[Article contains additional citation context not shown here]
Mark R. Thistle and Burton J. Smith. A processor architecture for Horizon.
....to, say, a loop exit.) In that case we need an unbranch. Looked at another way, a branch with a variable number of delay slots must be in the same basic block as the original branch. However, in practice, basic blocks are far too short to make this practical. Another related idea is the Horizon [6] project: encoding in each instruction the number of subsequent instructions that are independent of the results of this instruction. Like future branches, the Horizon project relies upon compiler technology to determine data dependencies. However, like the variable number of delay slots ....
Mark R. Thistle and Burton J. Smith. "A processor architecture for Horizon". In Proceedings of Supercomputing '88, pages 35--41, 1988.
....cycle, removes an instruction from this queue and dispatches it into the instruction pipeline. A number of more conservative architectures have been proposed that maintain the state of several conventional instruction streams and switch among these based on some event: either every instruction [Hals88, Smit78, This88], every load instruction [Ianu88, Nikh89], or every cache miss [Webe89]. While the approach is conceptually appealing, the available evidence for its effectiveness is drawn from a small sample of programs under hypothetical implementations [Arvi88, Ianu88, Webe89]. No coherent framework for ....
Thistle, M.R., and Smith, B.J., "A Processor Architecture for Horizon". Supercomputing '88, Florida, November 1988, pp. 35-41.
....for full utilization are easier to extract. 2.3.1 Commercial machines Only two companies, Denelcor and Tera (both founded by Burton Smith), have tried to manufacture multithreaded machines for commercial use. Burton Smith's companies have produced three machines: HEP [25], Horizon [26], and Tera [27]. HEP sold a few machines, Horizon was never completed, and Tera should be introduced as a prototype in late 1996. All three architectures support cycle-by-cycle multithreading and full/empty bits for synchronization. HEP used a global register file with support for eight thread ....
M. R. Thistle and B. J. Smith, "A Processor Architecture for Horizon", Proceedings of Supercomputing, 1988
.... networks have recently received a lot of attention for their better scalability to larger networks, as opposed to more complex networks such as hypercubes [BP95]. Examples of machines with such topologies include the MasPar MP-1 [Mas], Intel Paragon, MIT J-Machine [DDF+89], Tera HORIZON [TS88], Cray T3D [Cra93, Oed93], Polymorphic Torus [LM89], Fujitsu AP1000, and iWarp [BCC+88]. A torus is a mesh with wrap-around links. Although meshes and tori are generally regarded as close families, there are still some distinctions: i) as opposed to a mesh, all nodes of a torus are ....
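The effect of the wrap-around links can be checked with a short hop-count calculation. This Python sketch assumes minimal (shortest-path) routing on a k-ary, n-dimensional mesh and torus; the helper names `mesh_hops`, `torus_hops`, and `diameter` are hypothetical. On a torus each coordinate difference d can be covered in either direction around a ring of k nodes, so its cost is min(d, k - d).

```python
from itertools import product


def mesh_hops(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Minimal hop count on a mesh: Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_hops(a: tuple[int, ...], b: tuple[int, ...], k: int) -> int:
    """Minimal hop count on a k-ary torus: each dimension may wrap around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def diameter(k: int, dims: int, dist) -> int:
    """Largest minimal hop count over all node pairs (brute force)."""
    nodes = list(product(range(k), repeat=dims))
    return max(dist(a, b) for a in nodes for b in nodes)


k = 8
print(diameter(k, 2, mesh_hops))                         # 14
print(diameter(k, 2, lambda a, b: torus_hops(a, b, k)))  # 8
```

For an 8 × 8 network the wrap-around links cut the diameter from 14 to 8 hops, and on the torus every node sees the same distance distribution, which is the symmetry noted in point i).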
M. R. Thistle and B. J. Smith. "A Processor Architecture for Horizon". In Supercomputing, pages 35--41, 1988.
....into data transports to the compiler [2]. This can be depicted as an extension of a scheme presented in [3]. Figure 1 shows a division between the responsibilities of the compiler and the hardware. It shows five classes of architectures: superscalar, dataflow machines, independence architectures [4], operation-triggered (OT) VLIWs, and transport-triggered (TT) VLIWs. Superscalars shift most responsibilities to the hardware, while VLIWs shift most responsibilities to the compiler. The figure clearly shows that TT VLIWs depend even more on compilers than OT VLIWs. TTAs have much in common with ....
M.R. Thistle and B.J. Smith. A processor architecture for horizon. Proc. Supercomputing, pages 35--41, November 1988.
....locality for parallelism, a highly optimizing compiler may work hard improving locality and trade the parallelism thereby saved for more speed. On the other hand, if there is sufficient parallelism the compiler has a relatively easy job. The Tera architecture is derived from that of Horizon [6, 9, 10]; although they are highly similar multistream MIMD systems, there are many significant differences between the two designs. 2 Interconnection Network The interconnection network is a three-dimensional mesh of pipelined packet-switching nodes, each of which is linked to some of its neighbors. ....
M. R. Thistle and B. J. Smith. A processor architecture for Horizon. In Proceedings of Supercomputing '88, pages 35--41, Orlando, Florida, November 1988.
No context found.
M. R. Thistle and B. J. Smith. A processor architecture for horizon. In Supercomputing '88, pages 35--41, November 1988.
No context found.
M.R. Thistle and B.J. Smith. A processor architecture for Horizon. In Proceedings of Supercomputing, 1988.
No context found.
M.R. Thistle, B.J. Smith, "A processor architecture for horizon", Proc. Supercomputing, Nov. 1988, pp. 35--41.
CiteSeer.IST - Copyright Penn State and NEC