| Halstead, R. H. & Fujita, T. [1988], `MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing', Proceedings of the 15th Annual International Symposium on Computer Architecture, pp. 443--451. |
....technique. In [Agarwal92] and [Thekkath94] it is shown that hardware multithreading can significantly improve processor utilization. A large number of designs have been proposed and/or implemented which incorporate hardware multithreading; examples include HEP [Smith81], Horizon [Thistle88], MASA [Halstead88], Tera [Alverson90], April [Agarwal95], and the M-Machine [Dally94b]. Most of these designs are capable of executing instructions from a different thread on every cycle, allowing even single-cycle pipeline bubbles in one thread to be filled by instructions from another. An extreme model of ....
Robert H. Halstead Jr., Tetsuya Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing", Proc. ISCA '88, pp. 443-451.
....threads to hide memory latencies and pipeline delays has been examined in several different studies and machines. Gupta and Weber explore the use of multiple hardware contexts in multiprocessors [10] but the context switch overheads they used are too large to mask pipeline latencies. MASA [13] as well as HEP [29] and TERA [3] use fine grain multithreading to issue an instruction from a different context on every cycle in order to mask pipeline latencies. However, with the required round robin scheduling, single thread performance is degraded by the number of pipeline stages. The zero ....
HALSTEAD, R. H., AND FUJITA, T. MASA: a multithreaded processor architecture for parallel symbolic computing. In 15th Annual Symposium on Computer Architecture (May 1988), IEEE Computer Society, pp. 443--451.
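The context above notes that strict round-robin interleaving degrades single-thread performance by the number of pipeline stages. A minimal sketch of that effect follows, assuming a toy model in which a thread may not issue again until its previous instruction has left a pipeline of `pipeline_depth` stages. The function name and the model are illustrative assumptions, not the pipeline of MASA, HEP, or Tera.

```python
def simulate_interleaving(num_threads: int, pipeline_depth: int, cycles: int) -> list[int]:
    """Count instructions issued per thread under round-robin interleaving.

    Assumed toy model: one issue slot per cycle, and a thread becomes
    eligible again only pipeline_depth cycles after its last issue.
    """
    ready_at = [0] * num_threads   # first cycle at which each thread may issue
    issued = [0] * num_threads     # instructions issued so far, per thread
    next_thread = 0                # round-robin starting point
    for cycle in range(cycles):
        for k in range(num_threads):
            t = (next_thread + k) % num_threads
            if ready_at[t] <= cycle:
                issued[t] += 1
                ready_at[t] = cycle + pipeline_depth
                next_thread = (t + 1) % num_threads
                break              # at most one issue per cycle
    return issued


if __name__ == "__main__":
    depth, cycles = 4, 16
    for n in (1, 2, 4):
        counts = simulate_interleaving(n, depth, cycles)
        print(n, "threads:", counts, "utilization", sum(counts) / cycles)
```

Under these assumptions a lone thread fills only one issue slot in every `pipeline_depth` cycles, while `pipeline_depth` or more ready threads keep every slot busy.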
....tolerating latencies and increasing processor utilization in a large scale multiprocessor. It accomplishes this by rapidly switching control of the processor to a different thread whenever a high latency operation is encountered. While previous multithreaded designs switch contexts at every cycle [27, 12], Alewife's multithreaded processor [2] switches contexts only on synchronization faults and remote cache misses. This style is called block multithreading [20] and has the advantage of high single thread performance. Since multithreading introduces novel methods for manipulating threads, we ....
R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, New York, June 1988. IEEE.
.... Control Processors of the Control Data 6600 computer architecture of the early 1960s to provide several virtual peripheral processors [3]. More recently, the Denelcor HEP computer [4] and some proposed architectures, including the Multithreaded Processor Architecture for Parallel Symbolic Computing [5], the Circulating Context RISC Multiprocessor [6], and the Cyclic Pipeline Computer [7], have all used multi-threaded pipelining to provide virtual multiprocessing. 1.3. Potential Advantages of Multi-threaded Pipelining. Data Hazards: Because each instruction in the pipeline of a multi-threaded ....
Halstead Jr., R. H. and Fujita, T., "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," pp. 443-451 in The 15th Annual International Symposium on Computer Architecture, Conference Proceedings, Computer Society Press of the IEEE, Washington, D.C. (1988).
....latency seen by the processors. Relaxed memory consistency models [1, 5, 8] hide latency by allowing buffering and pipelining of memory references. Prefetching techniques [11, 16, 21, 23] hide the latency by bringing data close to the processor before it is actually needed. Multiple contexts [3, 12, 13, 26, 29] allow a processor to hide latency by switching from one context to another when a high latency operation is encountered. Our primary objective in this paper is to characterize the benefits and costs of these four latency hiding techniques in a systematic and consistent manner. Although one can ....
....contexts, is that it can be implemented using existing commercial processors, as has been done in DASH [18]. 6 Multiple Context Processors. Although prefetching is useful for many applications, it requires explicit programmer or compiler intervention. Processors with multiple hardware contexts [3, 12, 13, 26, 29] do not have this disadvantage. They make use of increased concurrency to hide latency. Each processor has several processes assigned to it, which are kept as hardware contexts. When the context that is currently running encounters a long latency operation, it is switched out and another context ....
R. H. Halstead, Jr. and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proc. Int. Symp. Comput. Arch., pages 443-451, June 1988.
....caches to help reduce the latency seen by the processor. More recently, weaker memory consistency models [1, 4, 6, 7] have been proposed that allow buffering and pipelining of memory references to hide latency. Still another technique is the use of processors with multiple hardware contexts [2, 10, 11, 26]. These processors tolerate latency by switching from one context to another when they encounter a high latency memory access. The various techniques that have been proposed are not mutually exclusive, but are complementary and offset the limitations of one another. In this paper, we evaluate ....
Robert H. Halstead, Jr. and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443-451, June 1988.
....are called finely multithreaded processors, and the others are called coarsely multithreaded processors or block multithreaded processors. Several processor designs have used multithreading to mask communication and synchronization latencies, or to utilize deep pipelines effectively, e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. By multithreading a processor such that an instruction from a different thread can be initiated every cycle (or every few cycles), pipeline bubbles due to pipeline dependencies or processor stalls due to memory latency can be prevented. Processors in message passing multicomputers often maintain ....
....to mitigate these problems, such as special frames for pipeline state, might adversely impact the processor cycle time. The opposing goals of high single thread performance and fast context switches have been previously addressed largely in their extremes. Finely multithreaded processors [2, 3, 5, 7] that disallow the execution of consecutive instructions from the same process can support very fast context switches, because the various instructions in the pipeline at any given time are independent. Consequently, they can use multithreading to utilize deep pipelines efficiently, in addition to ....
R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443-451, IEEE, New York, June 1988.
.... 1986). Recent work has been focused on run-time techniques for improving the efficiency of Actor-based languages (Taura et al. 1993, Chien et al. 1992, Chien et al. 1993) and hardware mechanisms for speeding up fine grain computations (Dally 1990, Halstead and Fujita 1988). These techniques and mechanisms, complemented with the techniques and performance results described in this paper, demonstrate that Actor languages, particularly SYMPAL, can be efficiently implemented. 4.1 Implicit COOP languages. The languages closely related to SYMPAL are MENTAT (Grimshaw ....
Halstead, R. H. Jr and Fujita, T. (1988) MASA: a multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual Symposium on Computer Architecture. IEEE Computer Society, New York.
....execution appears to be a key ingredient in general purpose parallel computing systems. Many researchers suggest that processors should support multiple instruction streams and switch very rapidly between them in response to remote memory reference latencies or synchronization [AI87, Smi90, HF88, ALKK90, ACC 90]. However, the proposed architectural solutions make thread scheduling invisible to the compiler, preventing it from applying optimizations that might reduce the cost of thread switching or improve scheduling based on analysis of the program. Inherently parallel languages, ....
R. H. Halstead, Jr. and T. Fujita. MASA: a Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proc. of the 15th Int. Symp. on Comp. Arch., pages 443--451, Hawaii, May 1988.
....Within each locality a single process operates independently of other localities. The CPU within each locality utilizes pipelining to take advantage of fine grained parallelism. Therefore, a program need not have a level of parallelism larger than the number of localities, i.e. multithreading [AgC91, AlC90, HaF88, Smi81, KuC91] is not used. A parallel program can now be viewed as a set of localities, and interprocessor communication can be viewed as the movement of data between localities. This is distinguished from message passing (where this activity is generally thought of as I/O) in that the data being moved are ....
R. H. Halstead and T. Fujita, "MASA: a multithreaded processor architecture for parallel symbolic computing," 15th Annual International Symposium on Computer Architecture, June 1988, pp. 443-451.
....hardware. Several machines based on this idea were proposed by Flynn and others in the early 1970s [20, 22, 24, 66]. An attempt at a commercial machine of this type was the Denelcor HEP [69, 43]. Microprocessors proposed with similar multiplexing of the execution pipeline include MASA [29] and METRIC [68], though these machines context switch on a demand basis rather than in fixed rotation. Processors with short but nonzero latency context switches have been proposed to mask long latencies of multiprocessor arrays [1, 82, 2, 51]. Sharing of the instruction issue unit is not ....
Robert H. Halstead and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proc. 15th IEEE International Symposium on Computer Architecture, 1988.
....Many message passing systems also use multitasking to reduce the impact of communication [AtS88]; however, none currently utilize true multithreading. Multithreaded processors for parallel systems fall into two categories, medium and fine grained. Medium grained multithreaded systems include MASA [HaF88] and ALEWIFE [AgL90]. Both of these are designed for symbolic computation and support a form of data synchronization with a mechanism similar to the full/empty bits in HEP [Smi81]. MASA was primarily designed for executing programs written in a parallel version of LISP. Therefore, it is optimized ....
R. H. Halstead and T. Fujita, "MASA: a multithreaded processor architecture for parallel symbolic computing," 15th Annual International Symposium on Computer Architecture, June 1988, pp. 443-451.
....by this thesis towards understanding some of the performance and design issues of multiple hardware context architectures. 1.1 Using Multithreading to Hide Long Latencies. Multithreading processors tolerate the long memory latencies inherent in large scale shared-memory multiprocessors [Smi81, HF88, KS88, ALKK90, ACC 90, Chi91, DKC 94, LGH94]. A program executing on a multithreaded processor consists of one or more instruction streams, each of which is called a thread. The processor hardware supports multiple contexts, i.e. general purpose registers, program counter, etc., enabling ....
....for memory operations. Thus, processor utilization is increased, and an application makes progress even when some of its threads are stalled. Multithreaded architectures are usually classified as fine grain or coarse grain, depending on the context switch policy. Fine grain machines [Smi81, HF88, KS88, ACC 90, LGH94] have a small context switch interval, typically switching to a different thread every 3 cycle. They must thus provide a fast, zero cycle switching mechanism. By ensuring that instructions in the pipeline are from different program threads, these machines are capable of ....
R. H. Halstead and T. Fujita. MASA:A multithreaded processor architecture for parallel symbolic computing. 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.
....on switching, and the main tradeoff is between better processor utilization and the higher cache miss rate which results. The general idea is to have multiple hardware contexts on the processor and simply switch between them. Some examples of studies of this approach follow. The MASA architecture [Halstead and Fujita 1988] assigned each processor a fixed number of hardware task frames. Each task frame was capable of storing a complete process context and consisted of a set of auxiliary registers (such as the program counter) and a set of general-purpose registers. Since the number of processes may exceed the number ....
R. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing, Proc. 15th Ann. Int. Symp. on Computer Arch., May 1988, Honolulu, pp. 443-451.
....threads and arrange the instructions of each thread into a fixed sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 10, 23] or message arrivals in message passing multicomputers [2, 17, 29, 44]. Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If ....
R. H. Halstead, Jr. and T. Fujita, MASA: A multithreaded processor architecture for parallel symbolic computing, in Proc. of the 15th Annual Intl. Symposium on Computer Architecture, Honolulu, HI, IEEE Computer Society Press, Los Alamitos, CA, 1988, pp. 443--451.
....networks have long memory access latencies. Together, they can cause considerable performance degradation. Multithreaded architectures address the second problem of long latencies by context switching between threads and executing useful instructions while waiting for memory accesses to complete [4, 10, 12, 18]. This decreases processor idle time, i.e. improves processor utilization. However, frequent context switching can exacerbate the first problem by increasing inter thread conflict misses from the combined working sets of multiple threads. Therefore the improved processor utilization could be ....
R. H. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. Proceedings of 15th ISCA, pages 443--451, May 1988.
....where synchronization is implicit, and strict coarse grain execution, where locality is highly exploited. Such advantages make multithreading a candidate for general purpose parallel computing, and many researchers have been interested in multithreaded models, architectures, and abstract machines[2, 3, 4, 8, 10]. Even with such new parallel computational models and improvements in processor speeds, the memory system in parallel computing is still a stumbling block to achieve comparable performance of computer systems. Especially in the parallel computations with centralized global data structures, ....
....needs not to be computed again for later tag checking when I-store requests arrive. For example, let A be an I-array whose lower and upper bounds are 0 and 31, respectively. Figure 8 shows the tag area of A after a certain sequence of I-fetches and I-stores. When a bundled read request for A[0], A[4], A[8], A[12], A[16], and A[20] is received, two masks are computed. These are two tag words, t1 and t2, and two masks, m1 and m2: t1 = 1111 1111 1111 1100 0000 0000 0000 0000; t2 = 1000 0000 0000 0000 1010 0000 0000 0000; m1 = 0111 0111 0111 0111 1111 1111 1111 1111; m2 = 0111 0111 ....
R. H. Halstead, Jr. and T. Fujita. "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing". In Proc. 15th Annual Int'l Sympo. on Computer Architecture, pp. 443-451, June 1988.
....a file of local general purpose registers, instruction fetch and dispatch logic, and a branch unit for executing control instructions. Duplicating contexts to support a number of active threads is not excessively costly, given that several previous processors, including HEP [Jord83], MASA [HaFu88], PRISC 1 [ShNi89], and Sparcle [Agar 93], have also supported multiple register files or register cache and multiple instruction pointers. Concurro has a relatively minor additional burden of a small instruction buffer, simple pre-decoder, and control logic for each context. It is assumed that ....
R.H. Halstead, Jr. and T. Fujita, "MASA: A multithreaded processor architecture for parallel symbolic computing," Proc. 15th Ann. Int'l Symp. on Computer Architecture, pp. 443--451, May 1988.
....threads and arrange the instructions of each thread into a fixed sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 10, 23] or message arrivals in message passing multicomputers [2, 17, 29, 44]. Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If ....
R. H. Halstead, Jr. and T. Fujita, MASA: A multithreaded processor architecture for parallel symbolic computing, in Proceedings of the 15th Annual International Symposium on Computer Architecture, Honolulu, Hawaii, May 1988, pp. 443--451.
....there were many configurations for which the addition of a few hardware contexts brought as much or greater performance than a larger multiprocessor with fewer than the optimal number of contexts. 1 Introduction Multiple hardware contexts are a technique for tolerating long instruction latencies [3, 4, 12, 21]. When one thread interlocks for a multi cycle operation, rather than waiting until the operation completes, a multithreaded processor switches to another context and continues execution. For coarse grain multithreaded processors, a context switch is triggered by long latency memory references ....
R. H. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.
....is a promising approach. Research on multithreaded architectures has been motivated by two concerns: tolerating latency and bridging of synchronization waits by rapid context switches. Three different approaches of multithreaded architectures are distinguished: cycle by cycle interleaving [117,173,202,215], block interleaving [2,3,106], and simultaneous multithreading [221,222]. 4.1 Cycle by cycle interleaving. In the cycle by cycle interleaving model the processor switches to a different thread after each instruction. In principle, an instruction of the same thread is fed in the pipeline after ....
....producer consumer basis by a full/empty bit synchronization on a data memory word. MASA, Multilisp Architecture for Symbolic Applications: The MASA was a multithreaded processor architecture for parallel symbolic computation with various features intended for effective Multilisp program execution [85,117]. MASA featured a tagged architecture, multiple contexts, fast trap handling, and a synchronization bit in every memory word. Its principal novelty was the use of multiple contexts both to support interleaved execution from separate instruction streams and to speed up procedure calls and trap ....
R.H. Halstead, Jr. and T. Fujita, MASA: A multithreaded processor architecture for parallel symbolic computing, in Proc. 15th ISCA, May 1988, pp. 443-451.
....aggressive data communication and synchronization mechanisms between threads to exploit more fine-grained parallelism. In addition, multiple functional units can be shared among threads for better utilization. Many concurrent multiple threaded processor architectures have been proposed and studied [1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19]. Some of them [4, 7, 11, 14] are primarily for increasing system throughput by allowing multiple programs (one program for each thread) to be run concurrently. In this paper, we focus on models that are primarily for speeding up the execution of one single program. Among them, models such as ....
....threads to exploit more fine-grained parallelism. In addition, multiple functional units can be shared among threads for better utilization. Many concurrent multiple threaded processor architectures have been proposed and studied [1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19]. Some of them [4, 7, 11, 14] are primarily for increasing system throughput by allowing multiple programs (one program for each thread) to be run concurrently. In this paper, we focus on models that are primarily for speeding up the execution of one single program. Among them, models such as Simultaneous Multithreading [16] ....
Robert H. Halstead, Jr. and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, May 30-- June 2, 1988.
....and support fast context switching between computations without blocking processors. Multithreaded architectures that have been studied or developed recently are based on the dataflow model, such as T [15], TAM [6], EM-4 [20], EM-X [12], and DAVRID [9], or the von Neumann model, such as HEP [23], MASA [7], Alewife [1], DASH [13], and J-Machine [17]. Dataflow based multithreaded architectures usually can tolerate latency and synchronization naturally by applying dataflow computing rule over inter thread while they can exploit the locality of computation by applying von Neumann computing rule over ....
R. H. Halstead and T. Fujita, MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing, In Proc. of the 15th Int'l Sympo. on Computer Architecture, 1988.
....thread switching. Multithreaded architecture is characterized by hardware support for fast context switching and efficient synchronization. Multithreaded architectures are categorized into two groups according to their computational models: von Neumann model and hybrid model. HEP [2], Tera [3], MASA [4], J-Machine [5], and Alewife [6] are based on the former model. Iannucci's machine [7], P-RISC [8], T [9], TAM [10], and DAVRID [11, 12] are based on the latter model. Von Neumann multithreaded architectures keep conventional model with additional hardware support, however, parallelism is restricted due ....
R. H. Halstead Jr. and T. Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," In Proc. 15th Int'l Sympo. on Computer Architecture, 1988.
....of the proposed architecture is evaluated using deterministic discrete event simulation. Initial simulation results indicate that the proposed architecture can achieve high performance in terms of both speedup and processor utilization. 1 Introduction Hybrid von Neumann dataflow architectures [1, 2, 6, 9, 12, 13, 17] have the potential to satisfy the increasing demands on very high computation speed. In the hybrid computation model, a program is a partially ordered graph of nodes. The nodes, called threads, consist of a sequence of instructions which are executed in the conventional von Neumann way. ....
....models efficient instruction execution order for implicit synchronization between instructions, (ii) dataflow model's cheap context switch to mask long memory latency, and (iii) dataflow-like synchronization between threads to exploit parallelism. Several uniprocessor [10, 8] and multiprocessor [1, 2, 6, 8, 9, 12, 13, 17] architectures have been proposed based on the multithreaded paradigm. In this paper, we propose a Scalable Multithreaded Architecture that exploits Large Locality, henceforth referred to as SMALL. The architecture of SMALL was influenced by some of the other multithreaded architectures [8, 10, ....
R. H. Halstead Jr and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. pages 443--451.
....several concurrent threads. This allows the processor to quickly switch among those threads, although switching outside of that small set is no faster than on a conventional processor. Multithreaded processors may interleave successive instructions from different threads on a cycle by cycle basis [27,16,24,19], or as blocks of instructions [8,3]. Although the techniques introduced in this paper are applicable to both forms of multithreading, this discussion will concentrate on block multithreading. 3.1 Segmented Register Files. Figure 2 describes a typical implementation of a multithreaded processor ....
....or as blocks of instructions [8,3]. Although the techniques introduced in this paper are applicable to both forms of multithreading, this discussion will concentrate on block multithreading. 3.1 Segmented Register Files. Figure 2 describes a typical implementation of a multithreaded processor [27, 16, 3, 28]. This processor partitions a large register set into a few register frames, each of which .... [Figure 1. Advantage of fast context switching. A processor idling on remote accesses or synchronization points (top) compared with rapid context switching between threads (bottom).] ....
Robert H. Halstead Jr. and Tetsuya Fujita. MASA: a multithreaded processor architecture for parallel symbolic computing. In 15th Annual Symposium on Computer Architecture, pages 443--451. IEEE Computer Society, May 1988.
....cycle, removes an instruction from this queue and dispatches it into the instruction pipeline. A number of more conservative architectures have been proposed that maintain the state of several conventional instruction streams and switch among these based on some event, either every instruction [Hals88, Smit78, This88], every load instruction [Ianu88, Nikh89], or every cache miss [Webe89]. While the approach is conceptually appealing, the available evidence for its effectiveness is drawn from a small sample of programs under hypothetical implementations [Arvi88, Ianu88, Webe89]. No coherent framework for ....
Halstead, R.H., Jr., and Fujita, T., "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing". Proc. of the 15th Annual Int. Symp. on Comp. Arch., Honolulu, Hawaii, June 1988, pp. 443-451.
....In the TreeAdd example, thread 14 does not start till both future values have been returned; both backto instructions (9 and 13) specify the same synchronization slot containing the thread pointer 14. 4 Architectural Support for the Dynamic SPMD Model. A multi-threaded node architecture [1, 2, 10, 13, 14] is best characterized as an architecture which attempts to efficiently support the execution of multiple threads of control via the introduction of multiple copies of fast memory required in executing multiple active threads, and mechanisms for storing suspended threads, selecting an active ....
R. H. Halstead Jr and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, 1988.
....In the equation below, $T_X$ is the time that process $X$ takes to complete. $X:Y$ denotes the sequence of two processes, $X|Y$ denotes parallel execution of two processes. $T_{A:B} = T_A + T_B$ (1) $T_{A|B} = \max(T_A, T_B)$ (2) This model also covers multithreaded architectures, such as the HEP [7] or MASA [8], provided that the two processes A and B can reside in the pipeline at the same time. If the number of contexts is exhausted, more time will be spent. Below, we introduce $U_X$, which is the fraction of the pipeline utilised by $X$. $U_{A:B} = \frac{U_A T_A + U_B T_B}{T_A + T_B}$ (3) $T_{A:B} = T_A + T_B$ (4) $U_{A|B} =$ ....
R. H. Halstead, Jr and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pp. 443--451, June 1988.
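The timing model quoted in the context above can be sketched as follows. The operators between terms were lost in extraction, so this sketch assumes that sequential times add, that parallel time is the maximum, and that sequential utilisation is the time-weighted average of the two utilisations. The function names are hypothetical, and the parallel utilisation is left out because its formula is truncated in the excerpt.

```python
def seq_time(t_a: float, t_b: float) -> float:
    """Time of A followed by B; assumes the lost operator in Eq. (1) is '+'."""
    return t_a + t_b


def par_time(t_a: float, t_b: float) -> float:
    """Time of A in parallel with B, per Eq. (2): the slower process dominates."""
    return max(t_a, t_b)


def seq_utilisation(u_a: float, t_a: float, u_b: float, t_b: float) -> float:
    """Pipeline utilisation of A followed by B.

    Assumed reading of Eq. (3): each process's utilisation weighted by
    its running time, divided by the total sequential time.
    """
    return (u_a * t_a + u_b * t_b) / seq_time(t_a, t_b)


if __name__ == "__main__":
    print(seq_time(3, 5))                      # sequential composition
    print(par_time(3, 5))                      # parallel composition
    print(seq_utilisation(0.5, 2, 1.0, 2))     # time-weighted utilisation
```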
....have long memory access latencies. Together, they can cause considerable performance degradation. Multithreaded architectures address the second problem of long latencies, by context switching to another thread, and executing useful instructions while waiting for a memory access to complete [4, 11, 13, 19]. This decreases processor idle time, i.e. improves processor utilization. However, frequent context switching can exacerbate the first problem by increasing inter thread conflict misses from the combined working sets of multiple threads. Therefore the improved processor utilization could be ....
R. H. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.
....several concurrent threads. This allows the processor to quickly switch among those threads, although switching outside of that small set is no faster than on a conventional processor. Multithreaded processors may interleave successive instructions from different threads on a cycle by cycle basis [76,47,65,54]. This prevents pipeline bubbles due to data dependencies between instructions, or long memory latencies. Other processors interleave blocks of instructions from each concurrent thread [29,4,23]. This exploits conventional processor pipelines, and performs well when there is insufficient ....
....insufficient parallelism. While the techniques introduced in this research are equally applicable to both forms of multithreading, we will usually discuss them in terms of block interleaving. 1.3.1 Segmented Register Files. Figure 1-4 describes a typical implementation of a multithreaded processor [76, 47, 4, 5]. This processor partitions a large register set among a small set of concurrent threads. Each register frame holds the registers of a different thread. A frame pointer selects the current active frame. Instructions from the current thread refer to registers using short offsets from the frame ....
[Article contains additional citation context not shown here]
Robert H. Halstead Jr. and Tetsuya Fujita. "MASA: a multithreaded processor architecture for parallel symbolic computing." In 15th Annual Symposium on Computer Architecture, pages 443--451. IEEE Computer Society, May 1988.
....architectures is whether interleaving of thread execution is supported. Processing elements that support the interleaved execution of threads use instruction execution pipelines to share the computational resources among several threads by switching to a new thread on every clock tick [Smi81], [HF88], [ACC 90]. This interleaved scheme has the disadvantage that there must be enough threads ready for execution to hide the expected latency. Another technique, called block multithreading, switches to a new thread if a high latency operation occurs. The Sparcle architecture [AKK 93] uses this ....
Robert H. Halstead and Tetsuya Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. IEEE, 1988.
....it as well. To tolerate the latency of reading remote memory, we must separate the request for data from the use of that data, while finding enough useful parallelism to keep the processor busy in between. The two main techniques for accomplishing this are prefetching [4, 23] and multithreading [2, 9, 14, 15]; the distinction between the two is that prefetching finds the parallelism within a single thread of execution, while multithreading exploits parallelism across multiple threads. To hide the latency within a single thread, the request for the data (i.e. the prefetch request) must be moved back ....
....stall times are small or are dominated by barrier stalls, it is less clear that the additional overhead of supporting multithreading will be worthwhile. 6. Related Work. Both prefetching and multithreading have been studied previously in the context of tightly coupled multiprocessors [2, 8, 9, 14, 15, 20]. Prefetching for software DSMs has been studied by Dwarkadas et al. [7] and by Bianchini et al. [3]. The former study focused on compilation techniques to automatically insert prefetches into numeric applications. The latter study examined binding prefetches which were launched at ....
R. H. Halstead, Jr. and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, June 1988.
....only to user level code and it does not allow protection between threads. Its major advantage is that it requires no specialized hardware support for multithreading. Multithreaded machines have also been used to support functional languages. One of the earliest multithreaded systems is MASA [35], from MIT. The goal of MASA was efficient execution of Multilisp programs. MASA supports cycle-by-cycle multithreading, futures, and full/empty bits. PACE [2], from the University of Essex, was designed for efficient graph reduction. 3. ARCHITECTURE. This chapter provides the overview ....
R.H. Halstead Jr., T. Fujita, "MASA: a multithreaded processor architecture for parallel symbolic computing", Proceedings of the 15th International Symposium on Computer Architecture, 1988.
....i.e. enough to fill all the pipe stages and mask the memory latency. Recent commercial and research projects in fine grained multistreaming have focused on multiprocessor systems, e.g. Tera Computer System [Alve90], DART [ShBu91], and CCMP [Butn86], or special purpose processors, e.g. MASA [Hals88] and the multistreamed CRAY 1 [Farr91]. In [Nemi90, Nemi91] the DISC processor used a technique called dynamic interleaving, a fine grained multistreaming technique developed for real time systems that allows multiple instructions from a given stream in the pipeline at a given time. Using dynamic ....
R. H. Halstead and T. Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," Proceedings of The 15th Symposium on Computer Architecture, June 1988, pp. 443-451.
....APRIL has the ability to tolerate the latency of both memory requests and synchronization. Interleaved processors have been proposed to solve two problems: stalls due to both pipeline dependencies and memory latency. Two representative examples are the Denelcor HEP [23] and Halstead's MASA [8]. In the HEP multiprocessor, interleaved contexts were used to remove almost all pipeline stalls. A context could only issue an instruction every 8 cycles, which corresponded to the pipeline depth. Since an instruction could never encounter a pipeline dependency, no hardware or compiler resources ....
Robert H. Halstead, Jr. and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, June 1988.
....method for tolerating latencies and increasing processor utilization in a large scale multiprocessor. It accomplishes this by rapidly switching the processor to a different thread whenever a high latency operation is encountered. While previous multithreaded designs switch contexts at every cycle [53, 21], Alewife's multithreaded processor [1] switches contexts only on synchronization faults and remote cache misses. This style is called block multithreading [33] and has the advantage of high single thread performance. In this thesis, we considered multithreaded processors as providing additional ....
R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, New York, June 1988. IEEE.
....multiprocessing was an earlier term used in this context by Flynn and Podvin [12]. The prototypical multithreaded machine is the HEP [33]. In the HEP, the processor switches every cycle between eight processor resident threads. Cycle-by-cycle interleaving of threads is also used in other designs [12, 30, 18]. Such architectures are termed finely multithreaded. Although fine multithreading offers the potential of high processor utilization, it results in relatively poor single thread performance and low processor utilization when there is not enough parallelism to fill all the hardware contexts. In ....
R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, New York, June 1988. IEEE.
....been proposed to eliminate and/or hide the large memory latencies required in interprocessor communication. Of the hardware mechanisms, coherent caches [Tang76, Cens78, YenW85], relaxed memory models [Dubo88, Sche88, Adve90, Ghar90], data prefetching [LeeR87, Port89, Mowr91], and multithreading [Smit78, Hals88, Agar90] are considered the most promising. Equally important are program transformations which make it possible to exploit the benefits offered by the hardware mechanisms. For example, it is known that without some amount of loop restructuring, such as unimodular transformations (loop interchange, loop ....
Halstead, R.H., Jr., and Fujita, T., "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing", Proc. of the 15th Annual Int. Symp. on Comp. Arch., Honolulu, Hawaii, June 1988, pp. 443-451.
....switch to a process not in this set. Saving and restoring the entire process state on a context switch is wasteful since only a few (typically 2 or 3) words of this state are required by the next instruction of the new process. If context switches are frequent, as in some types of multicomputers [6, 5], much of the restored state may not be used before it is saved for the next context switch. This unnecessary data movement is required because the granularity of binding between names and registers is very coarse. A large block of physical registers is allocated to a single process. To switch ....
Robert H. Halstead Jr. and Tetsuya Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In 15th Annual Symposium on Computer Architecture, pages 443--451. IEEE Computer Society, May 1988.
....a single task or from different tasks, be present in the processor. Assuming that it is possible to saturate the processor with ready-to-execute threads, the pernicious effects of very large memory latencies are alleviated. Multithreading has been used or proposed in some parallel architectures [Hals88, Smit78], such that the processor switches between threads on every cycle independent of the behavior of the thread. Although this approach simplifies the design, it may increase the turnaround time of a task, as each thread executes one instruction every s cycles, where s is the depth of the execution ....
Halstead, R.H., Jr., and Fujita, T., "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing". Proc. of the 15th Annual Int. Symp. on Comp. Arch., Honolulu, Hawaii, June 1988, pp. 443-451.
....been in use for almost two years, providing a wealth of data on the nature of threads, locality and synchronization [29]. This experience has directly influenced the T design. Many researchers have arrived at similar mechanisms starting from other vantage points (e.g. Halstead and Fujita's MASA [17] and Maurer's SAM [22]). There are very interesting comparisons to be made between T and recent multithreaded architectures, notably the Tera Computer System [3] (a descendant of the HEP), Alewife [1], and the J Machine with Message Driven Processors (MDPs) [12]. Undoubtedly, aspects of these ....
....processor may still idle when there is a cache miss. A possible solution for the distributed cache coherence problem is to use directories; and for the cache miss problem, to multiplex between a small number of contexts to cover cache loading. Implementing this appears non-trivial (see MIT's MASA [17] and Alewife [1] and Stanford's DASH [21], for example). In any case, the proposals in this paper may be seen as orthogonal to cacheing solutions. Further, while distributed cacheing helps in the remote load situation, it does not offer anything for the synchronizing load problem. 3.2 ....
R. H. Halstead Jr. and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proc. 15th Ann. Intl. Symp. on Computer Architecture, Honolulu, Hawaii, June 1988.
No context found.
Halstead, R. H. & Fujita, T. [1988], `MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing', Proceedings of the 15th Annual International Symposium on Computer Architecture pp. 443--451.
No context found.
R. Halstead and T. Fujita. MASA: a multithreaded processor architecture for parallel symbolic computing. In 15th Annual International Symposium on Computer Architecture, pages 443--451, 1988.
No context found.
R. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.
No context found.
R.H. Halstead, Jr. and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the Fifteenth International Symposium on Computer Architecture, 1988.
No context found.
R. H. Halstead, Jr. and T. Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," Proc. Int. Symp. Comput. Arch., pp. 443-451, 1988.
No context found.
Robert H. Halstead, Jr. and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, Honolulu, Hawaii, May--June 1988.