14 citations found.
Wolf-Dietrich Weber and Anoop Gupta. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results. In Proc. 16th International Symposium on Computer Architecture, pages 273--280, 1989.


This paper is cited in the following contexts:
Synchronisation in a Multithreaded Processor - Sen, Muller, May (2000)   (Correct)

....i.e. the fork join processes. From our results we found that the number of contexts required was dependent on the number of execution units available as well as the number of communication points. Performance generally peaks at four contexts, which corroborates with the results found by Weber [9]. It would be naive to assume by adding more contexts and more execution units that performance would scale accordingly. This is due to instruction resource availability. There may be instructions ready to be executed but no available resources to accommodate them. For example, if all threads ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results. In Proc. 16th International Symposium on Computer Architecture, pages 273--280, 1989.


Thread Prioritization: A Thread Scheduling Mechanism for.. - Fiske, Dally (1995)   (3 citations)  (Correct)

....[Figure 1: Multiple context processor with N contexts.] be loaded in a software scheduling queue. To allow a traditional RISC pipeline design, we assume a block multithreading model [23, 3], in which blocks of instructions are executed from each context in turn, rather than a cycle-by-cycle interleaving of instructions from the different contexts [20, 15, 13]. At any given time, the processor is executing one of the loaded threads. A context switch occurs when the processor switches ....

....switch time is the time spent in switching between two active contexts, and is an important parameter in determining the efficiency of context switching for tolerating latency. In order to effectively tolerate latency, the total context switch time in a multiple context processor must be small [1, 23]. The context switch time consists of a number of components. For instance, in the APRIL processor [3] the time required is 11 cycles, which is used to drain the processor pipeline (5 cycles) and execute a trap handler which saves state, and chooses the next context to execute using a round ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: preliminary results. In Proceedings of the 16th Annual International Symposium on Computer Architecture, pages 273--280, Jerusalem, Israel, May 1989. ACM.


Local Memory Reference Behavior of Fine-Grain.. - Motomura, Papadopoulos (1993)   (Correct)

....explicit receive, or split phase transaction. Unfortunately, rapidly switching a processor among distinct contexts will, in general, decrease the locality of the processor's instruction, activation frame (stack frame) and heap reference streams with a commensurate increase in cache miss ratios [1, 2]. This study is concerned with quantifying, both analytically and experimentally, the effects on instruction and activation frame cache performance of an aggressive fine grained multithreaded compiling model, the threaded abstract machine (TAM) model developed by Culler et al. [3]. The instruction ....

....execution traces for two classes of applications: loop parallel dense matrix arithmetic (matrix multiply MMP) and procedure parallel event simulations (Monte Carlo neutral particle transport GAMTEB). Previous studies obtained traces by artificially interleaving traces from several contexts [1, 2]. 2. Working Set Model. Surprisingly, we have determined a working set model for local activation frame references that closely matches uniprocessor working sets for stack and heap data. The multithreaded working set is well described by the power law function $W(t) = \alpha t^{\beta}$, the size of the ....

[Article contains additional citation context not shown here]

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings of 16th Annual International Symposium on Computer Architecture, IEEE, June 1989, pages 273-280.


\Xi\Pi \Theta \Gamma\Delta - McGill University   (Correct)

....to good data locality. We believe that the multithreaded program execution model has great potential to offset the performance loss that is due to imperfect data partitioning with its ability to tolerate communication and synchronization latencies inherent in multiprocessor architectures [1, 3, 11, 13, 14]. With multithreading support, a processor switches quickly among a set of ready threads whenever it encounters a long latency operation. Thus, by overlapping the communication on one thread with computation in other threads, the processor utilization as well as the program execution time ....

Wolf-Dietrich Weber and Anoop Gupta, "Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results," in Proc. of the 16th Ann. Intl. Symp. on Computer Architecture, Jerusalem, Israel, pp. 273--280, May--Jun. 1989.


Closing the Window of Vulnerability in Multiphase Memory.. - Kubiatowicz (1993)   (20 citations)  (Correct)

....invalidations required to enforce coherence. Applying basic pipelining ideas, resource utilization can be improved by allowing a processor to transmit more than one memory request at a time. Multiple outstanding transactions can be supported using software prefetch [7, 24], rapid context switching [28, 4], or weak ordering [1]. Studies have shown that the utilization of the network, processor, and memory systems can be improved almost in proportion to the number of outstanding transactions allowed [22, 16]. Allowing multiple outstanding transactions in a cache based multiprocessor opens the window ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, pages 273--280, New York, June 1989.


Relaxed Consistency and Synchronization in Parallel Processors - Zucker (1992)   (3 citations)  (Correct)

....latency will not be observed. Another example of reducing the penalty on a cache miss is to do other useful work while waiting for a memory request. For example, if there is cheap context switching, then when there is a cache miss the processor can switch to another context in just a few cycles [6, 93]. Despite all these techniques, the degrading effects of high memory latency can be quite significant. Additional solutions to the problem are still needed. A promising way to deal with high memory latency is to have a system that implements a relaxed model of memory consistency. When using a ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results. In 16th Annual International Symposium on Computer Architecture, pages 273--280, 1989.


The Effects of Architecture on the Performance of Latency.. - Rajat Mukherjee   (Correct)

....Version 9 architecture reference allows for stacked traps, which would reduce the switch overhead by 3-4 cycles. Previous studies have examined the effectiveness of context switch overheads of less than 15 processor cycles, therefore effectively eliminating software approaches from consideration [13]. Our approach is appropriate for large scale systems where typical miss latencies are hundreds of processor cycles. Moreover, the assumptions used in some studies are not realistic, e.g. the small cache sizes used in [7] result in very large miss rates and therefore tend to reduce the intermiss ....

....was traditionally considered to have a deleterious effect on the cache performance, but we show that positive cache interference due to context switching can significantly improve performance. In a related study, all programs chosen were reported to demonstrate negative cache interference [13]. When extending the principle of latency hiding to multiprocessor environments, network bandwidth becomes an important concern; if the network is incapable of handling the higher bandwidth required to sustain high utilization processors, the benefits of latency hiding can easily be offset by ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings of the 16th International Symposium on Computer Architecture, New York, June 1989.


The MIT Alewife Machine: A Large-Scale Distributed-Memory.. - Agarwal (1991)   (167 citations)  (Correct)

....a context switch can be done in about 14 cycles. By maintaining a separate PC and PSR for each context, a custom processor could switch contexts even faster. Even with 14 cycles of overhead and four processor resident contexts, multithreading significantly improves the system performance. See [35] for additional evidence of the success of multithreaded processors. • The emulation of multiple hardware contexts in the SPARC floating point unit is achieved by modifying floating point instructions in a context dependent fashion as they are loaded into the FPU and by maintaining four ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, pages 273--280, New York, June 1989. IEEE.


Latency Tolerance through Multithreading in Large-Scale .. - Kurihara, Chaiken.. (1991)   (12 citations)  (Correct)

....processor. While previous architectures have implemented multithreading with cycle-by-cycle interleaving of instructions from different processes [11, 12, 16, 21] (termed fine multithreading), we use the same name for systems that interleave blocks of instructions from different processes as well [3, 23] (termed coarse or block multithreading). Block multithreaded processors do not force a context switch every cycle and can achieve high single thread performance. They switch between threads only on long latency memory requests or synchronization attempts. We are using such a block multithreaded ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, IEEE, New York, June 1989.


Integrated Shared-Memory and Message-Passing Communication in.. - Kubiatowicz (1998)   (2 citations)  (Correct)

....waiting for a response. Applying basic pipelining ideas, resource utilization can be improved by allowing a processor to transmit more than one memory request at a time. Multiple outstanding transactions can be supported using software prefetch [20, 85], multithreading via rapid context switching [115, 5], or weak ordering [1]. Studies have shown that the utilization of the network, processor, and memory systems can be improved almost in proportion to the number of outstanding transactions allowed [63, 50]. Multithreading may be implemented with either polling or signaling mechanisms. Polling ....

....is initiated at the same time. By maintaining a separate PC and PSR for each context, a more aggressive processor design could switch contexts much faster. However, even with 14 cycles of overhead and four processor-resident contexts, multithreading can significantly improve system performance [115, 63]. Appendix A shows a more detailed use of the Sparcle context switching mechanisms to implement featherweight threads for user level active message handlers. The code shown in this appendix illustrates how the value in the WIM is combined with the nextf and prevf instructions to achieve fast ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, pages 273--280, New York, June 1989.


APRIL: A Processor Architecture for Multiprocessing - Agarwal (1990)   (186 citations)  (Correct)

....[Figure 5: Relative sizes of the cache, network and overhead components that affect processor utilization; x-axis: Processes p.] which corresponds to our initial SPARC based implementation of APRIL. This result is similar to that reported by Weber and Gupta [26] for coarse grain multithreaded processors. The main reason a low degree of multithreading is sufficient is that context switches are forced only on cache misses, which are expected to happen infrequently. The marginal benefits of additional processes is seen to decrease due to network and cache ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, IEEE, New York, June 1989.


The MIT Alewife Machine: A Large-Scale.. - Agarwal, Chaiken, .. (1991)   (167 citations)  (Correct)

....switch can be done in about 11 cycles. By maintaining a separate PC and PSR for each context, a custom processor could switch contexts even faster. We show that even with 11 cycles of overhead and four processor resident contexts, multithreading significantly improves the system performance. See [40] for additional evidence of the success of multithreaded processors. • The effect of multiple hardware contexts in the SPARC floating point unit is achieved by modifying floating point instructions in a context dependent fashion as they are loaded into the FPU and by maintaining four different ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, IEEE, New York, June 1989.


Closing the Window of Vulnerability in Multiphase.. - Kubiatowicz, Chaiken, .. (1992)   (20 citations)  (Correct)

....invalidations required to enforce coherence. Applying basic pipelining ideas, resource utilization can be improved by allowing a processor to transmit more than one memory request at a time. Multiple outstanding transactions can be supported using software prefetch [5, 6], rapid context switching [7, 8], or weak ordering [9]. Studies have shown that the utilization of the network, processor, and memory systems can be improved almost in proportion to the number of outstanding transactions allowed [10, 11]. Allowing multiple outstanding transactions in a cache based multiprocessor opens the window ....

Wolf-Dietrich Weber and Anoop Gupta. Exploring the Benefits of Multiple Hardware Contexts in a Multiprocessor Architecture: Preliminary Results. In Proceedings 16th Annual International Symposium on Computer Architecture, pages 273--280, New York, June 1989.


Multithreaded Architectures: Principles, Projects and Issues - Dennis, Gao (1994)   (4 citations)  (Correct)

....overflows to be handled by software [29]. A major feature of the Alewife multiprocessor is its use of block multithreading to reduce context switching overhead. This idea was evaluated by the DASH project at Stanford and found to be a promising approach to efficient tolerance of memory latency [128]. In block multithreading each processor is equipped to hold context information for several sequential threads that make up an active set. Switching between active threads is then very fast compared with switching to a thread not in the active set. By minimizing the occurrence of events that move ....

....last may be accomplished as a consequence of changing the memory map. If physical addresses are used as cache tags, no cache resetting is required for context switching and register saving/restoring form the remaining portion of context switching overhead. Early studies for the DASH multiprocessor [128] have tested the suggestion of providing a small number of register sets to support low cost switching among a group of processes (block multithreading). Because the number of processes waiting for service will often be larger than the number of register sets, the full advantage of having multiple ....

Wolf-Dietrich Weber and Anoop Gupta, "Exploring the benefits of multiple hardware contexts in a multiprocessor architecture: Preliminary results," in Proceedings of the 16th Annual International Symposium on Computer Architecture, Jerusalem, Israel, pp. 273--280, May--June 1989.


CiteSeer.IST - Copyright Penn State and NEC