40 citations found.
J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Oct. 1994.


This paper is cited in the following contexts:

First 50 documents

Data Locality Optimizations for Multigrid Methods on Structured.. - Weiß

.... [Table fragment: HP PA 8600: 1.5 Gbyte/s, 56 instr, 512 K, 1 M; Intel Pentium 3: 1.1 Gbyte/s, 40 ROPs, 16 K, 16 K, 256 K; MIPS R12000: 0.5 Gbyte/s, 48 instr, 32 K, 32 K.] Table 2.2: Memory peak bandwidth, out-of-order capability, and on-chip cache sizes of microprocessor chips in 2001 [Mic00]. ...ing [LGH94] and out-of-order execution [HP96] to tolerate at least some memory access latency. However, these techniques are not able to compensate for a latency of over 100 cycles, especially since instructions executed while loading data might access other data themselves, which may again lead to idle time ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, October 1994.


An Analysis of Software Interface Issues for SMT Processors - Redstone (2002)   (1 citation)

....SMT under a variety of application-level workloads. Some workloads examined include SPEC (92 and 95) [82, 81], SPLASH-2 [44], MPEG-2 decompression [68], and a database workload [43]. Evaluations of other multithreading [52] and CMP architectures have similarly been limited to application code only [3, 35, 15, 2, 71, 41, 30] or PALcode [91]. Our study is the first to measure operating system behavior on a simultaneous multithreading architecture. SMT differs significantly from previous architectures with respect to operating system execution, because kernel instructions from multiple threads can execute ....

LAUDON, J., GUPTA, A., AND HOROWITZ, M. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (October 1994).


Handling Long-latency Loads in a Simultaneous Multithreading.. - Tullsen, Brown (2001)   (9 citations)

....between threads each cycle. Simultaneous multithreading outperforms previous models of hardware multithreading primarily because it hides short latencies (which can often dominate performance on a uniprocessor) much more effectively. For example, neither fine-grain multithreaded architectures [2, 8], which context switch every cycle, nor coarse-grain multithreaded architectures [1, 12], which context switch only on long-latency operations, can hide the latency of a single-cycle integer add if there is not sufficient parallelism in the same thread. What has not been shown previously is that ....

....only at the long-latency load problem, and makes no attempt to address any other machine latency. Because coarse-grain architectures allow only one thread to have access to execution resources at any time, they always flush stalled threads completely from the machine. Fine-grain multithreading [2, 8] could potentially have shared scheduling resources which exhibit this problem, depending on the architecture. However, these architectures (e.g., the Cray Tera MTA [2]) have traditionally been coupled with in-order execution, where scheduling windows only need to keep a few instructions per ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Oct. 1994.
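The fine-grain versus coarse-grain distinction drawn in the Tullsen and Brown excerpt (switching every cycle versus switching only on long-latency operations) can be illustrated with a toy cycle-level model. The Python sketch below is a hypothetical illustration, not the interleaving scheme of Laudon, Gupta, and Horowitz or any other cited design. It assumes a single-issue, in-order pipeline, 1-cycle compute ops ('c'), misses ('m') that block a thread for a fixed miss_latency, and, for the coarse-grain case only, a fixed switch_penalty. All names (Thread, issue, run_fine_grain, run_coarse_grain) are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Thread:
    """Toy hardware context (hypothetical model): a program of ops plus scheduling state."""
    ops: List[str]      # 'c' = 1-cycle compute op, 'm' = long-latency load (miss)
    pc: int = 0         # index of the next op to issue
    ready_at: int = 0   # earliest cycle at which this thread may issue again

    def done(self) -> bool:
        return self.pc >= len(self.ops)

    def ready(self, cycle: int) -> bool:
        return not self.done() and self.ready_at <= cycle


def issue(t: Thread, cycle: int, miss_latency: int) -> None:
    """Issue t's next op; a miss blocks the thread for miss_latency cycles."""
    op = t.ops[t.pc]
    t.pc += 1
    t.ready_at = cycle + (miss_latency if op == "m" else 1)


def run_fine_grain(programs: Sequence[str], miss_latency: int) -> int:
    """Each cycle, issue from the next ready thread in round-robin order (zero-cost switch)."""
    threads = [Thread(list(p)) for p in programs]
    n, cycle, nxt = len(threads), 0, 0
    while not all(t.done() for t in threads):
        for k in range(n):
            i = (nxt + k) % n
            if threads[i].ready(cycle):
                issue(threads[i], cycle, miss_latency)
                nxt = (i + 1) % n  # rotate so another thread gets the next slot
                break
        cycle += 1  # one issue slot per cycle; an empty slot is an idle cycle
    return max(t.ready_at for t in threads)  # cycle at which the last op completes


def run_coarse_grain(programs: Sequence[str], miss_latency: int, switch_penalty: int) -> int:
    """Run one thread until it blocks, then switch to a ready thread, paying switch_penalty."""
    threads = [Thread(list(p)) for p in programs]
    n, cycle, cur = len(threads), 0, 0
    while not all(t.done() for t in threads):
        if threads[cur].ready(cycle):
            issue(threads[cur], cycle, miss_latency)
            cycle += 1
            continue
        # Current thread is blocked or finished: look for another ready thread.
        for k in range(1, n):
            cand = (cur + k) % n
            if threads[cand].ready(cycle):
                cur = cand
                cycle += switch_penalty  # assumed cost of draining/refilling the pipeline
                break
        else:
            cycle += 1  # no other thread is ready: idle for one cycle
    return max(t.ready_at for t in threads)
```

Under these made-up parameters, run_fine_grain(["cm", "cm"], miss_latency=5) returns 8 and run_coarse_grain(["cm", "cm"], miss_latency=5, switch_penalty=2) returns 10. The numbers depend entirely on the assumed latencies and penalty and say nothing about the measured results of any cited paper.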


An Analysis of Operating System Behavior on a.. - Redstone, Eggers, Levy (2000)   (9 citations)

....SMT under a variety of application-level workloads. Some workloads examined include SPEC (92 and 95) [42, 41], SPLASH-2 [22], MPEG-2 decompression [35], and a database workload [21]. Evaluations of other multithreading and CMP architectures have similarly been limited to application code only [3, 18, 6, 2, 37, 20, 14] or PALcode [47]. Our study is the first to measure operating system behavior on a .... [Table fragment. Columns: Metric; SMT: Apache only, Apache+OS, change; Superscalar: Apache only, Apache+OS, change. Branch misprediction rate (%): SMT 4.4, 9.1, 2.1x; superscalar 3.3, 7.4, 2.2x. BTB misprediction rate (%): SMT 36.7, 59.6, 62%; superscalar 31.1, 55.3, 77%. L1 Icache miss ....]

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In 6th International Conference on Architectural Support for Programming Languages and Operating Systems, October 1994.


Simultaneous Static Threading for VLIW/EPIC Processors - Özer

....it separately. Those separately scheduled threads are merged statically to decrease static code density or to optimize for execution time. SST merges threads at run time, taking advantage of dynamic events. There are other multithreaded architectures, such as multiple-context processors [14], [15], [16], [17], [18] and concurrent multithreading [19], [20]. These rely partly or fully on dynamic instruction scheduling. 5 Conclusion: In this paper, a new paradigm called Simultaneous Static Threading (SST) is proposed for VLIW/EPIC processors. SST combines ISA, compiler and hardware for ....

J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," in Proc. Sixth Int'l. Conf. on Architectural Support for Programming Languages and Operating Systems, (San Jose, CA), Oct. 1994.


Software-Controlled Multithreading Using Informing Memory.. - Todd Mowry Sherwyn (1998)   (8 citations)

....is to use a form of multithreading [1, 12, 15] whereby a long-latency access from one thread is overlapped with the computation from other parallel threads. 1.1 Previous Work on Multithreading: Several researchers have proposed and evaluated hardware-based multithreading schemes in the past [1, 2, 6, 12, 15]. Throughout this paper, we use the term multithreading to refer to multithreading for the sake of latency tolerance, as opposed to more general forms of multithreading. These schemes can be broken down into roughly three categories: fine-grained, coarse-grained, and simultaneous ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the 6th ASPLOS, pages 308--318, October 1994.


Process Prefetching for a Simultaneous Multithreaded.. - Goncalves, Sagula.. (1999)

....used in this work. Sections V and VI show the results and conclusions, respectively. II. RELATED WORKS: The main objective of a multithreaded architecture is to maximize processor utilization in the presence of high-latency operations, like those caused by I-cache misses or data dependencies [LAU 94]. Unfortunately, such latencies can only be concealed if there are enough instructions available from other threads previously stored in the L1 I-cache during the context switch. Laudon, Gupta, and Horowitz [LAU 94] proposed a technique for multithreaded execution called Interleaving, which could be applied to traditional superscalar processors to allow the execution of single-threaded as well as multithreaded applications without enlarging the hardware. Otherwise, Govindarajan and Nemawarkar [GOV 92] ....

Laudon, J. et al.: Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. Proceedings of the International Conference on ASPLOS, Oct. 1994.


Bus Utilization Analysis of Multithreaded Shared-Bus.. - Giorgi, Foglia, Prete (1997)

.... operation [2, 24]; switch on every instruction fetch is limited by the large number of contexts necessary to keep the execution pipeline full [25]; switch on miss works with a smaller degree of multithreading but causes inter-thread conflict misses, which can break the locality of each thread [26], and may also cause a loss of shared data that requires extra coherence overhead [15]; instruction interleaving performs better on a uniprocessor, but the difference from the previous scheme becomes negligible on multiprocessors [26]. Other variants are: simultaneous multithreading [27], switch on block of instructions, switch on every load [28], and switch on load miss. Once a switch is decided, one of the available contexts is activated, and several options have been presented in the literature: thread prioritization [29], MRU ....

J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A multithreading technique targeting multiprocessors and workstations," in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 308--318, Oct. 1994.


Improving Prediction for Procedure Returns with.. - Kevin Skadron Pritpal (1998)   (22 citations)

....requires substantial extra hardware (per-path contexts, more fetch, rename, and issue bandwidth, and a larger instruction window), this extra hardware overlaps substantially with other likely directions for future microprocessors. These include clustered approaches [12], multithreaded processors [23], and particularly simultaneous multithreading (SMT) [34]. This paper mentions multipath execution because the design of the return address stack proves critical to multipath performance. A single, unified stack does not function properly in a multipath processor. With concurrent paths ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proc. ASPLOS-VI, pages 308--318, Oct. 1994.


Dynamic Load Balancing Issues In The Earth Runtime System - Kakulavarapu (1999)

.... has proved to be difficult to support due to the higher relative cost of communication latencies [142]. Multithreading is a promising approach to overcoming two major pitfalls of conventional parallel computing, and of fine-grain parallelism in particular: communication and synchronization latencies [30, 13, 10, 49, 55, 131, 75, 83, 82, 170, 112, 72, 76, 102, 106, 134, 138, 148, 149, 137, 162, 161]. Multithreaded languages efficiently manage the low computation-to-communication ratio (R/C) in fine-grain parallelism by supporting several threads of control per node and switching to a new thread whenever a long-latency operation is encountered. The fine-grain threads offer better ....

James Laudon, Anoop Gupta, and Mark Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proc. of the Sixth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, Calif., Oct. 1994.


Design and Performance of Multithreaded Architectures - Thekkath (1995)

....performance and design issues of multiple hardware context architectures. 1.1 Using Multithreading to Hide Long Latencies: Multithreaded processors tolerate the long memory latencies inherent in large-scale shared-memory multiprocessors [Smi81, HF88, KS88, ALKK90, ACC+90, Chi91, DKC+94, LGH94]. A program executing on a multithreaded processor consists of one or more instruction streams, each of which is called a thread. The processor hardware supports multiple contexts (i.e., general-purpose registers, program counter, etc.), enabling several threads to be loaded simultaneously. At a ....

....Thus, processor utilization is increased, and an application makes progress even when some of its threads are stalled. Multithreaded architectures are usually classified as fine-grain or coarse-grain, depending on the context switch policy. Fine-grain machines [Smi81, HF88, KS88, ACC+90, LGH94] have a small context switch interval, typically switching to a different thread every 3 cycle. They must thus provide a fast, zero-cycle switching mechanism. By ensuring that instructions in the pipeline are from different program threads, these machines are capable of tolerating pipeline ....

[Article contains additional citation context not shown here]

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, October 1994.


Improving Pointer-Based Codes Through Cache-Conscious Data.. - Chilimbi, Larus, Hill (1998)   (14 citations)

....performance is dominated by memory references. Moreover, the large cost disparity undercuts the fundamental random-access memory (RAM) model used by most programmers to design data structures and algorithms. Many hardware and software techniques such as prefetching [32, 12], multithreading [28, 45], non-blocking caches [23], dynamic instruction scheduling, and speculative execution have attempted to reduce or tolerate memory latency. These techniques require complex hardware and compilers, but have proven ineffective for many programs [10, 36]. The fundamental problem with these ....

James Laudon, Anoop Gupta, and Mark Horowitz. "Interleaving: A multithreading technique targeting multiprocessors and workstations." In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, San Jose, California, 1994.


A Novel Multicast Scheme to Reduce Cache Invalidation.. - Zhou, Shi, Tang (2000)   (1 citation)

....Climbing Program, National Natural Science Foundation of China under grant no. 69896250 1 and 69973046. ....ated with remote cache operations, several latency-tolerating schemes have been proposed, including relaxed memory consistency models [8], [9], prefetching [10], and multiple-context processors [11]. However, latency-tolerating techniques cannot fundamentally reduce the communication frequency or improve the communication efficiency, which are two important aspects of reducing the cache coherence overhead. We assume that there are n processor nodes having the shared readable copies, i.e. ....

J. Laudon, A. Gupta, and Mark Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 308--318, 1994.


A Novel Multicast Scheme to Reduce Cache Invalidation.. - Zhou, Shi, Tang (2000)   (1 citation)

....Climbing Program, National Natural Science Foundation of China under grant no. 69896250 1. 1 loss associated with remote cache operations, several latency-tolerating schemes have been proposed, including relaxed memory consistency models [8], [9], prefetching [10], and multiple-context processors [11]. However, latency-tolerating techniques cannot fundamentally reduce the communication frequency or improve the communication efficiency, which are two important aspects of reducing the cache coherence overhead. We assume that there are n processor nodes having the shared readable copies, i.e. ....

J. Laudon, A. Gupta, and Mark Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, 1994.


Characterizing and Removing Branch Mispredictions - Skadron (1999)

....schemes and evaluated their performance on a range of programs. While the hardware requirements of a four-path approach are not insignificant, this technique substantially complements other likely directions for future microprocessors. These include clustered approaches, multithreaded processors [56], and particularly simultaneous multithreading (SMT) [99]. Multipath speculation at conditional branches is largely an orthogonal way to make use of this hardware at times when other multithreading mechanisms do not. For example, Wallace, Calder, and Tullsen describe a system that combines SMT ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Oct. 1994.


Multipath Execution: Opportunities and Limits - Ahuja, Skadron, Martonosi, Clark (1998)   (8 citations)

....schemes and evaluated their performance on a range of programs. While the hardware requirements of a 4-path approach are not insignificant, this technique substantially complements other likely directions for future microprocessors. These include clustered approaches, multithreaded processors [9], and particularly simultaneous multithreading (SMT) [15]. Multipath speculation at conditional branches is largely an orthogonal way to make use of this hardware at times when other multithreading mechanisms do not. For example, Wallace, Calder, and Tullsen describe a system that combines SMT ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proc. ASPLOS-VI, pages 308--318, Oct. 1994.


DataScalar Architectures and the SPSD Execution Model - Burger, Kaxiras, Goodman (1996)

.... in processor performance, increases of main memory sizes, and increases in the disparity between processor and memory cycle times [5]. Memory latency tolerance and reduction techniques such as non-blocking caches [20, 11], hardware and software prefetching [6, 8, 7, 14, 19, 22], multithreading [21, 27], and out-of-order execution [32, 30] may reduce memory-related processor stalls until available memory bandwidth is saturated. A recent study showed that aggressively latency-tolerant processors owe fully half of their memory stall times to limited off-chip bandwidth [4]. Although scaling up ....

James Laudon, Anoop Gupta, and Mark Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, October 1994.


Evaluating the Performance of Multithreading and Prefetching.. - Bianchini, Lim (1996)   (1 citation)

....performance. To address this limitation, Alewife introduced the idea of block multithreading, where caches reduce the number of remote cache misses and context switches occur only on such misses. This allows for a less aggressive implementation of multithreading [4]. Some recent architectures [21, 24] promise both single-cycle context switches and high single-thread performance by allowing instructions from multiple threads to interleave arbitrarily in the processor pipeline. The Tera MTA [5] takes an extreme approach by providing 128 contexts on each processor. At each cycle, the processor ....

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 308--318, San Jose, CA, October 1994. ACM.


Limits On The Performance Benefits Of Multithreading And.. - Lim, Bianchini (1995)   (7 citations)

....To address this limitation, Alewife introduced the idea of block multithreading, where caches reduce the number of remote cache misses and context switches occur only on such misses. This allows Alewife to use a less aggressive implementation of multithreading [3]. Some recent hardware designs [18, 21] try to provide both single-cycle context switches and high single-thread performance by allowing instructions from multiple threads to coexist and interleave arbitrarily in the processor pipeline. The Tera MTA [4] takes an extreme approach by providing each processor with 128 hardware contexts ....

James Laudon, Anoop Gupta, and Mark Horowitz. Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 308--318, San Jose, CA, October 1994. ACM.


Balanced Multithreading: Increasing Throughput via a.. - Tune, Kumar, Tullsen, .. (2004)

No context found.

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Oct. 1994.


Proceedings of 12th Intl Conference on Parallel.. - Initial Observations Of

No context found.

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, Oct. 1994.


Efficient Remapping Mechanisms for an Adaptable Memory System - Zhang (2002)

No context found.

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308-318, San Jose, CA USA, Oct. 1994.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

J. Laudon, A. Gupta, and M. Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 308--318, October 1994.


Hardware and Software Mechanisms for Multithreading in.. - Bradford (2001)

No context found.

James Laudon, Anoop Gupta, and Mark Horowitz. Interleaving: A multithreading technique targeting multiprocessors and workstations. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.



CiteSeer.IST - Copyright Penn State and NEC