72 citations found.
Halstead, R. H. & Fujita, T. [1988], `MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing', Proceedings of the 15th Annual International Symposium on Computer Architecture pp. 443--451.

This paper is cited in the following contexts:

Design and Evaluation of the Hamal Parallel Computer - Grossman (2002)   (1 citation)  (Correct)

....technique. In [Agarwal92] and [Thekkath94] it is shown that hardware multithreading can significantly improve processor utilization. A large number of designs have been proposed and/or implemented which incorporate hardware multithreading; examples include HEP [Smith81], Horizon [Thistle88], MASA [Halstead88], Tera [Alverson90], April [Agarwal95] and the M-Machine [Dally94b]. Most of these designs are capable of executing instructions from a different thread on every cycle, allowing even single-cycle pipeline bubbles in one thread to be filled by instructions from another. An extreme model of ....

Robert H. Halstead Jr., Tetsuya Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing", Proc. ISCA '88, pp. 443-451.


The M-Machine Multicomputer - Fillo, Keckler, Dally, Carter.. (1995)   (22 citations)  (Correct)

....threads to hide memory latencies and pipeline delays has been examined in several different studies and machines. Gupta and Weber explore the use of multiple hardware contexts in multiprocessors [10] but the context switch overheads they used are too large to mask pipeline latencies. MASA [13] as well as HEP [29] and TERA [3] use fine grain multithreading to issue an instruction from a different context on every cycle in order to mask pipeline latencies. However, with the required round robin scheduling, single thread performance is degraded by the number of pipeline stages. The zero ....

HALSTEAD, R. H., AND FUJITA, T. MASA: a multithreaded processor architecture for parallel symbolic computing. In 15th Annual Symposium on Computer Architecture (May 1988), IEEE Computer Society, pp. 443--451.


Waiting Algorithms for Synchronization in Large-Scale.. - Lim, Agarwal (1991)   (3 citations)  (Correct)

....tolerating latencies and increasing processor utilization in a large-scale multiprocessor. It accomplishes this by rapidly switching control of the processor to a different thread whenever a high-latency operation is encountered. While previous multithreaded designs switch contexts at every cycle [27, 12], Alewife's multithreaded processor [2] switches contexts only on synchronization faults and remote cache misses. This style is called block multithreading [20] and has the advantage of high single-thread performance. Since multithreading introduces novel methods for manipulating threads, we ....

R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, New York, June 1988. IEEE.


The Effect of Cache on the Performance of a Multi-Threaded.. - MacIntyre, Preiss (1991)   (1 citation)  (Correct)

.... Control Processors of the Control Data 6600 computer architecture of the early 1960s to provide several virtual peripheral processors [3]. More recently, the Denelcor HEP computer [4] and some proposed architectures including the Multithreaded Processor Architecture for Parallel Symbolic Computing [5], the Circulating Context RISC Multiprocessor [6] and the Cyclic Pipeline Computer [7] have all used multi-threaded pipelining to provide virtual multiprocessing. 1.3. Potential Advantages of Multi-threaded Pipelining. Data Hazards: Because each instruction in the pipeline of a multi-threaded ....

Halstead Jr., R. H. and Fujita, T., "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," pp. 443-451 in The 15th Annual International Symposium on Computer Architecture, Conference Proceedings, Computer Society Press of the IEEE, Washington, D.C. (1988).


Comparative Evaluation of Latency Reducing and.. - Gupta, Hennessy.. (1991)   (103 citations)  (Correct)

....latency seen by the processors. Relaxed memory consistency models [1, 5, 8] hide latency by allowing buffering and pipelining of memory references. Prefetching techniques [11, 16, 21, 23] hide the latency by bringing data close to the processor before it is actually needed. Multiple contexts [3, 12, 13, 26, 29] allow a processor to hide latency by switching from one context to another when a high latency operation is encountered. Our primary objective in this paper is to characterize the benefits and costs of these four latency hiding techniques in a systematic and consistent manner. Although one can ....

....contexts, is that it can be implemented using existing commercial processors, as has been done in DASH [18]. 6 Multiple Context Processors. Although prefetching is useful for many applications, it requires explicit programmer or compiler intervention. Processors with multiple hardware contexts [3, 12, 13, 26, 29] do not have this disadvantage. They make use of increased concurrency to hide latency. Each processor has several processes assigned to it, which are kept as hardware contexts. When the context that is currently running encounters a long-latency operation, it is switched out and another context ....

R. H. Halstead, Jr. and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proc. Int. Symp. Comput. Arch., pages 443-451, June 1988.


Tolerating Latency Through Software-Controlled Prefetching in.. - Mowry, Gupta (1991)   (232 citations)  (Correct)

....caches to help reduce the latency seen by the processor. More recently, weaker memory consistency models [1, 4, 6, 7] have been proposed that allow buffering and pipelining of memory references to hide latency. Still another technique is the use of processors with multiple hardware contexts [2, 10, 11, 26]. These processors tolerate latency by switching from one context to another when they encounter a high latency memory access. The various techniques that have been proposed are not mutually exclusive, but are complementary and offset the limitations of one another. In this paper, we evaluate ....

Robert H. Halstead, Jr. and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443-451, June 1988.


Performance Tradeoffs In Multithreaded Processors - Agarwal (1991)   (38 citations)  (Correct)

....are called finely multithreaded processors, and the others are called coarsely multithreaded processors or block multithreaded processors. Several processor designs have used multithreading to mask communication and synchronization latencies, or to utilize deep pipelines effectively, e.g. [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. By multithreading a processor such that an instruction from a different thread can be initiated every cycle (or every few cycles) pipeline bubbles due to pipeline dependencies or processor stalls due to memory latency can be prevented. Processors in message passing multicomputers often maintain ....

....to mitigate these problems, such as special frames for pipeline state, might adversely impact the processor cycle time. The opposing goals of high single thread performance and fast context switches have been previously addressed largely in their extremes. Finely multithreaded processors [2, 3, 5, 7] that disallow the execution of consecutive instructions from the same process can support very fast context switches, because the various instructions in the pipeline at any given time are independent. Consequently, they can use multithreading to utilize deep pipelines efficiently, in addition to ....

R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443-451, IEEE, New York, June 1988.


SYMPAL: a software environment for implicit concurrent.. - Aridor, Cohen, Yehudai (1997)   (Correct)

.... 1986) Recent work has been focused on run-time techniques for improving the efficiency of Actor-based languages (Taura et al. 1993, Chien et al. 1992, Chien et al. 1993) and hardware mechanisms for speeding up fine-grain computations (Dally 1990, Halstead and Fujita 1988). These techniques and mechanisms, complemented with the techniques and performance results described in this paper, demonstrate that Actor languages, particularly SYMPAL, can be efficiently implemented. 4.1 Implicit COOP languages. The languages closely related to SYMPAL are MENTAT (Grimshaw ....

Halstead, R. H. Jr and Fujita, T. (1988) MASA: a multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual Symposium on Computer Architecture. IEEE Computer Society, New York.


Compiling Dataflow into Threads - Efficient Compiler-Controlled.. - Schauser (1991)   (2 citations)  (Correct)

....execution appears to be a key ingredient in general-purpose parallel computing systems. Many researchers suggest that processors should support multiple instruction streams and switch very rapidly between them in response to remote memory reference latencies or synchronization [AI87, Smi90, HF88, ALKK90, ACC 90]. However, the proposed architectural solutions make thread scheduling invisible to the compiler, preventing it from applying optimizations that might reduce the cost of thread switching or improve scheduling based on analysis of the program. Inherently parallel languages, ....

R. H. Halstead, Jr. and T. Fujita. MASA: a Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proc. of the 15th Int. Symp. on Comp. Arch., pages 443--451, Hawaii, May 1988.


A Preliminary Performance Evaluation of the Seamless Parallel.. - Fineberg, al. (1992)   (Correct)

....Within each locality a single process operates independently of other localities. The CPU within each locality utilizes pipelining to take advantage of fine-grained parallelism. Therefore, a program need not have a level of parallelism larger than the number of localities, i.e. multithreading [AgC91, AlC90, HaF88, Smi81, KuC91] is not used. A parallel program can now be viewed as a set of localities, and interprocessor communication can be viewed as the movement of data between localities. This is distinguished from message passing (where this activity is generally thought of as I/O) in that the data being moved are ....

R. H. Halstead and T. Fujita, "MASA: a multithreaded processor architecture for parallel symbolic computing," 15th Annual International Symposium on Computer Architecture, June 1988, pp. 443-451.


Instruction-Processing Optimization Techniques For VLSI.. - Bunda (1993)   (1 citation)  (Correct)

....hardware. Several machines based on this idea were proposed by Flynn and others in the early 1970s [20, 22, 24, 66]. An attempt at a commercial machine of this type was the Denelcor HEP [69, 43]. Microprocessors proposed with similar multiplexing of the execution pipeline include MASA [29] and METRIC [68], though these machines context switch on a demand basis rather than in fixed rotation. Processors with short but nonzero latency context switches have been proposed to mask long latencies of multiprocessor arrays [1, 82, 2, 51]. Sharing of the instruction issue unit is not ....

Robert H. Halstead and Tetsuya Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proc. 15th IEEE International Symposium on Computer Architecture, 1988.


The Seamless Computation Model for Efficient Use of Parallel.. - Fineberg (1992)   (Correct)

....Many message passing systems also use multitasking to reduce the impact of communication [AtS88]; however, none currently utilize true multithreading. Multithreaded processors for parallel systems fall into two categories, medium and fine grained. Medium-grained multithreaded systems include MASA [HaF88] and ALEWIFE [AgL90]. Both of these are designed for symbolic computation and support a form of data synchronization with a mechanism similar to the full/empty bits in HEP [Smi81]. MASA was primarily designed for executing programs written in a parallel version of LISP. Therefore, it is optimized ....

R. H. Halstead and T. Fujita, "MASA: a multithreaded processor architecture for parallel symbolic computing," 15th Annual International Symposium on Computer Architecture, June 1988, pp. 443-451.


Design and Performance of Multithreaded Architectures - Thekkath (1995)   (Correct)

....by this thesis towards understanding some of the performance and design issues of multiple hardware context architectures. 1.1 Using Multithreading to Hide Long Latencies. Multithreading processors tolerate the long memory latencies inherent in large-scale shared-memory multiprocessors [Smi81, HF88, KS88, ALKK90, ACC 90, Chi91, DKC 94, LGH94]. A program executing on a multithreaded processor consists of one or more instruction streams, each of which is called a thread. The processor hardware supports multiple contexts, i.e. general-purpose registers, program counter, etc., enabling ....

....for memory operations. Thus, processor utilization is increased, and an application makes progress even when some of its threads are stalled. Multithreaded architectures are usually classified as fine grain or coarse grain, depending on the context switch policy. Fine grain machines [Smi81, HF88, KS88, ACC 90, LGH94] have a small context switch interval, typically switching to a different thread every 3 cycle. They must thus provide a fast, zero cycle switching mechanism. By ensuring that instructions in the pipeline are from different program threads, these machines are capable of ....

[Article contains additional citation context not shown here]

R. H. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.


Performance Implications of Context Switches on Misses to DRAM - Meerdervoort (1999)   (Correct)

....on switching, and the main tradeoff is between better processor utilization and the higher cache miss rate which results. The general idea is to have multiple hardware contexts on the processor and simply switch between them. Some examples of studies of this approach follow. The MASA architecture [Halstead and Fujita 1988] assigned each processor a fixed number of hardware task frames. Each task frame was capable of storing a complete process context and consisted of a set of auxiliary registers (such as the program counter) and a set of general-purpose registers. Since the number of processes may exceed the number ....

R. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing, Proc. 15th Ann. Int. Symp. on Computer Arch., May 1988, Honolulu, pp. 443-451.


Space-Efficient Scheduling of Multithreaded Computations - Blumofe, Leiserson (1998)   (30 citations)  (Correct)

....threads and arrange the instructions of each thread into a fixed sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared-memory multiprocessors [1, 10, 23] or message arrivals in message-passing multicomputers [2, 17, 29, 44]. Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If ....

R. H. Halstead, Jr. and T. Fujita, MASA: A multithreaded processor architecture for parallel symbolic computing, in Proc. of the 15th Annual Intl. Symposium on Computer Architecture, Honolulu, HI, IEEE Computer Society Press, Los Alamitos, CA, 1988, pp. 443--451.


Impact of Sharing-Based Thread Placement on Multithreaded.. - Radhika Thekkath (1994)   (17 citations)  (Correct)

....networks have long memory access latencies. Together, they can cause considerable performance degradation. Multithreaded architectures address the second problem of long latencies by context switching between threads and executing useful instructions while waiting for memory accesses to complete [4, 10, 12, 18]. This decreases processor idle time, i.e. improves processor utilization. However, frequent context switching can exacerbate the first problem by increasing inter thread conflict misses from the combined working sets of multiple threads. Therefore the improved processor utilization could be ....

R. H. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. Proceedings of 15th ISCA, pages 443--451, May 1988.


Effects of Data Bundling in Non-strict Data Structures - Rho, Han, Kim, Hwang (1995)   (Correct)

....where synchronization is implicit, and strict coarse-grain execution, where locality is highly exploited. Such advantages make multithreading a candidate for general-purpose parallel computing, and many researchers have been interested in multithreaded models, architectures, and abstract machines [2, 3, 4, 8, 10]. Even with such new parallel computational models and improvements in processor speeds, the memory system in parallel computing is still a stumbling block to achieve comparable performance of computer systems. Especially in the parallel computations with centralized global data structures, ....

....needs not to be computed again for later tag checking when I-store requests arrive. For example, let A be an I-array whose lower and upper bounds are 0 and 31, respectively. Figure 8 shows the tag area of A after a certain sequence of I-fetches and I-stores. When a bundled read request for A[0], A[4], A[8], A[12], A[16] and A[20] is received, two masks are computed. These are two tag words, t_1 and t_2, and two masks, m_1 and m_2. t_1 = 1111 1111 1111 1100 0000 0000 0000 0000, t_2 = 1000 0000 0000 0000 1010 0000 0000 0000, m_1 = 0111 0111 0111 0111 1111 1111 1111 1111, m_2 = 0111 0111 ....

R. H. Halstead, Jr. and T. Fujita. "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing". In Proc. 15th Annual Int'l Sympo. on Computer Architecture, pp. 443-451, June 1988.


Waiting Algorithms for Synchronization in Large-Scale.. - Lim, Agarwal (1991)   (18 citations)  (Correct)

....tolerating latencies and increasing processor utilization in a large-scale multiprocessor. It accomplishes this by rapidly switching control of the processor to a different thread whenever a high-latency operation is encountered. While previous multithreaded designs switch contexts at every cycle [27, 12], Alewife's multithreaded processor [2] switches contexts only on synchronization faults and remote cache misses. This style is called block multithreading [20] and has the advantage of high single-thread performance. Since multithreading introduces novel methods for manipulating threads, we ....

R.H. Halstead and T. Fujita. MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443--451, New York, June 1988. IEEE.


Superscalar Performance in a Multithreaded Microprocessor - Gunther (1993)   (3 citations)  (Correct)

....a file of local general-purpose registers, instruction fetch and dispatch logic, and a branch unit for executing control instructions. Duplicating contexts to support a number of active threads is not excessively costly, given that several previous processors, including HEP [Jord83], MASA [HaFu88], PRISC 1 [ShNi89] and Sparcle [Agar 93], have also supported multiple register files or register cache and multiple instruction pointers. Concurro has a relatively minor additional burden of a small instruction buffer, simple pre-decoder, and control logic for each context. It is assumed that ....

R.H. Halstead, Jr. and T. Fujita, "MASA: A multithreaded processor architecture for parallel symbolic computing," Proc. 15th Ann. Int'l Symp. on Computer Architecture, pp. 443--451, May 1988.


Multithreading in Rapid Prototyping Target Platforms - Jurgen Niehaus Karsten   (Correct)

No context found.

Halstead, R. H. & Fujita, T. [1988], `MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing', Proceedings of the 15th Annual International Symposium on Computer Architecture pp. 443--451.


Balanced Multithreading: Increasing Throughput via a.. - Tune, Kumar, Tullsen, .. (2004)   (Correct)

No context found.

R. Halstead and T. Fujita. MASA: a multithreaded processor architecture for parallel symbolic computing. In 25th pages 443--451, 1998.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)   (Correct)

No context found.

R. Halstead and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In 15th Annual International Symposium on Computer Architecture, pages 443--451, May 1988.


Hardware and Software Mechanisms for Multithreading in.. - Bradford (2001)   (Correct)

No context found.

R.H. Halstead, Jr. and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the Fifteenth International Symposium on Computer Architecture, 1988.


Processor Management Policies for Multiprocessors - Yu (1994)   (Correct)

No context found.

R. H. Halstead, Jr. and T. Fujita, "MASA: A Multithreaded Processor Architecture for Parallel Symbolic Computing," Proc. Int. Symp. Comput. Arch., pp. 443-451, 1988.

CiteSeer.IST - Copyright Penn State and NEC