B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In Proceedings of the Nineteenth International Symposium on Computer Architecture, 1992.


This paper is cited in the following contexts:
Processor Management Policies for Multiprocessors - Yu (1994)   (Correct)

....to avoid the long latency by exploiting the locality of reference. However, processor utilization is not improved much even though the cache hit rate is high (measurements with the DASH [11] multiprocessor show 20-28% utilization with a 66-80% hit ratio with a release consistency memory model [64]). Multithreading has been introduced to tolerate the network latency by overlapping remote memory access of one thread with the computation of other threads [61] 65]. Analysis of the multithreaded processor has been studied in [66] 67]. However, we believe that it would be preferable to reduce ....

B.Boothe and A.Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors," Proc. Int. Symp. Comput. Arch., pp.214-223, 1992.


Dynamic Characteristics of Multithreaded Execution in.. - Hirofumi Sakane.. (1995)   (1 citation)  (Correct)

....data allocated to another processor. This remote memory latency is often regarded as the main bottleneck towards high performance. Various techniques have been developed to reduce/tolerate/hide the latency, including data partitioning, runtime load balancing, multithreading, coherent cache, etc. [1, 2, 3, 4, 6, 7]. Of these techniques, multithreading is known to be a latency tolerance approach. The idea behind multithreading is to overlap computation and communication such that the effect of communication is minimal, if not negligible. Studies have indicated that multithreading is effective for a ....

....with packet based dataflow execution for synchronization and message handling support. The EMC-Y processor consists of Switching Unit (SU), Input Buffer Unit (IBU), Matching Unit (MU), Execution Unit (EXU), Output Buffer Unit (OBU) and Memory Control Unit (MCU). [Figure 2: Circular Omega Network (12 PEs); node labels are (ca, ga), where ca is the column address and ga is the group address.] The EXU includes a packet generation mechanism and a RISC pipeline which executes a sequential ....

[Article contains additional citation context not shown here]

Boothe,B., Ranade,A. Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors, Proc. 19th Annual Int. Symp. on Computer Architecture, (1992), pp.214-223.


Analysis of Pipeline Stall Effects in Block Multithreaded.. - Zuberek (2000)   (Correct)

....arguably the best example of such improvements. Two popular approaches to instruction level parallelism are known as superscalar and VLIW (very long instruction word) architectures in which several instructions can be issued in a single processor cycle [15] and instruction level multithreading [1, 3, 4], and in particular, block multithreading [2, 6] Block multithreading tolerates long latency memory accesses and synchronization delays by switching to another thread rather than waiting for the completion of a long latency operation which, in a distributed memory system, can require a hundred or ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Design and Performance of Multithreaded Architectures - Thekkath (1995)   (Correct)

.... the HEP, several flavors of multithreaded architectures have been proposed or built [HF88, Ian88, KS88, ALKK90, ACC 90, Chi91, KD92, HKN 92, LGH94, TEL95]. There have also been a lot of studies on the performance of multithreaded architectures [WG89, SBCvE90, Aga92, PW91, DT91, YST 94, BR92, NGGA93, TE94b, TE94a]. A recent example of a fine grained multithreaded multiprocessor is the Tera computer [ACC 90]. Like the HEP, the Tera's processor has 128 hardware contexts and switches every cycle with a zero cycle penalty, ensuring that all instructions in the pipeline 4 are from ....

....switching, and the second uses fine grain multithreading to improve uniprocessor performance. The Boothe and Ranade Proposal: To avoid excessive context switching in a multithreaded processor, Boothe and Ranade propose an explicit context switch instruction to group together shared accesses [BR92]. This grouping is done by the compiler based on dataflow and data dependence analysis of the program. The compiler explicitly inserts the context switch instruction after scheduling a group of memory accesses. As they show, this can improve the latency hiding property of multithreading, but ....

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. 19th Annual International Symposium on Computer Architecture, pages 214--223, May 1992.


Performance Modeling of Multithreaded Distributed Memory.. - Zuberek Department Of (1999)   (Correct)

....switches that control the traffic between the nodes. Instruction reordering is one of the approaches used to alleviate the problem of divergent processor and memory performances. Multithreading is another approach which combines software (compilers) and hardware (multiple thread contexts) means [1, 5, 15]. Multithreading is an architectural approach to tolerating long latency memory accesses and synchronization delays in distributed memory systems. The general idea is quite straightforward. When a long latency memory operation occurs, the processor instead of waiting for its completion ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Approximate Performance Evaluation of Multi- Threaded Distributed .. - Zuberek (1999)   (Correct)

....results obtained by simulation of a detailed model of the analyzed architecture. 1. Introduction Multithreading is a technique of tolerating long latency memory accesses and synchronization delays in distributed memory systems. The basic idea of multithreading is quite straightforward [BH95, BR92]; instead of waiting for the completion of long latency memory accesses (which in distributed memory systems can require hundreds of processor cycles) the processor suspends the current thread, switches to another thread waiting for execution (provided such a thread is available) and executes ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.
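
The switch-on-long-latency idea described in the Zuberek excerpt above can be illustrated with a small cycle-level sketch. This is only an illustration under stated assumptions, not a model taken from any of the cited papers: every thread runs for a fixed `run_length` cycles, then waits a fixed `latency` cycles for its memory access, and context switches cost nothing. The function name and all parameters are hypothetical.

```python
def simulate_block_multithreading(num_threads, run_length, latency, total_cycles):
    """Return processor utilization for a toy block-multithreaded processor.

    Assumptions (illustrative only): fixed run length between long-latency
    accesses, fixed access latency, zero-cost context switches.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may run again
    busy = 0                      # cycles spent executing instructions
    cycle = 0
    while cycle < total_cycles:
        # Choose the thread whose outstanding access completes earliest.
        t = min(range(num_threads), key=lambda i: ready_at[i])
        if ready_at[t] > cycle:
            # No thread is ready: the processor idles until one is.
            cycle = ready_at[t]
            continue
        # Run the thread until its next long-latency access (or end of run).
        run = min(run_length, total_cycles - cycle)
        busy += run
        cycle += run
        # The thread is suspended while its access is outstanding.
        ready_at[t] = cycle + latency
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 5):
        u = simulate_block_multithreading(n, run_length=10, latency=40,
                                          total_cycles=5000)
        print(f"threads={n} utilization={u:.2f}")
```

With a run length of 10 and a latency of 40 cycles, a single thread keeps the processor busy for only a fraction of the time, and adding threads raises utilization until the latency is fully overlapped.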


Space-Efficient Scheduling of Multithreaded Computations - Blumofe, Leiserson (1998)   (30 citations)  (Correct)

....threads and arrange the instructions of each thread into a fixed sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 10, 23] or message arrivals in message passing multicomputers [2, 17, 29, 44] Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If ....

B. Boothe and A. Ranade, Improved multithreading techniques for hiding communication latency in multiprocessors, in Proc. of the 19th Annual Intl. Symposium on Computer Architecture, Gold Coast, Australia, IEEE Computer Society Press, Los Alamitos, CA, 1992, pp. 214--223.


A Queuing Model of Multithreading: A Case Study - Vlassov Thorelli   (Correct)

..... 5 1: Introduction Most of recent scalable shared memory architectures typically provide different combinations of latency reducing and tolerating mechanisms, such as coherent cacheing, weak ordering, data prefetching, and multithreading [3, 6, 8]. Multithreading [11, 15, 18] is used for hiding long memory latency in multiprocessor systems, and aims to increase system efficiency. A number of threads are allocated to a processing node which switches thread contexts according to some context switch policy, such as switch on cache misses, ....

....[9] and the Tera MTA [4] can be mentioned as examples of multiprocessors whose nodes support fine multithreading. Block multithreading explicitly aims at tolerating long remote memory latency or synchronization latency in large scale multiprocessors. Studies of this multithreading technique [1, 2, 6, 8, 26, 31] allowed the conclusion that a block multithreaded processor with as few as 2-4 hardware supported contexts can achieve high efficiency by switching contexts on cache misses. However, as mentioned in [19], one can intuitively assume that more threads are required to cover long ....

B. Boothe, A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors", Proc. 19th Ann. Int. Symp. on Comp. Arch., pp. 241-223, 1992.


Space-Efficient Scheduling of Multithreaded Computations - Blumofe, Leiserson (1993)   (30 citations)  (Correct)

....threads and arrange the instructions of each thread into a fixed sequential order at compile time. At run time, a scheduler dynamically orders execution of the threads. Other systems employ schedulers that dynamically order threads based on the availability of data in shared memory multiprocessors [1, 10, 23] or message arrivals in message passing multicomputers [2, 17, 29, 44] Rapid execution of a multithreaded computation on a parallel computer requires exposing and exploiting parallelism in the computation by keeping enough threads concurrently alive to keep the processors of the computer busy. If ....

B. Boothe and A. Ranade, Improved multithreading techniques for hiding communication latency in multiprocessors, in Proceedings of the 19th Annual International Symposium on Computer Architecture, Gold Coast, Australia, May 1992, pp. 214--223.


Exploiting the Locality of Data Structures in.. - Kim, Kim, Rhee, Kim, ..   (Correct)

....that IB structure is superior to I structure over several benchmarks. 1 Introduction The conventional von Neumann multiprocessors and dataflow computers are in the opposite ends of a spectrum of multiprocessor systems. They have complementary advantages each other. The multithreaded architectures[3,4,5,7] based on a hybrid of dataflow and von Neumann computational models are attractive in the sense that we can exploit the advantages of both models. In addition, they opened a road to the use of conventional microprocessor in highly parallel machine, thus exploiting the locality of computation and ....

B. Boothe and A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessor," Proc. 19th Annual Int'l Symp. on Computer Architecture, pp. 156-167, May 1992.


Latency Tolerance: A Metric for Performance Analysis of.. - Shashank Nemawarkar   (Correct)

....to another computation thread, when a long latency is encountered. Multiple outstanding requests for multiple threads at a processor increase the latencies. An informal notion of latency tolerance is that if the processor utilization is high due to multithreading, then the latencies are tolerated [3, 5]. However, there is no clear understanding of the latency tolerance. Performance of multithreaded architectures has been studied using analytical models [2, 1] and simulations of single and multiple processor systems [10, 9, 3]. Kurihara et al. [6] show how the memory access costs are reduced ....

....[Figure 3: $tol_{network}$ at $R = 20$; axes: $p_{remote}$, number of threads $n_t$, tolerance index $tol_{network}$.] Impact of a Thread Partitioning Strategy: A thread partitioning strategy strives to minimize communication overheads and to maximize the exposed parallelism [5]. Let us assume that our thread partitioning strategy varies $n_t$ and adjusts $R$ such that $n_t \times R$ is constant. 2 Figure 4 shows $tol_{network}$ with respect to $n_t$ and $R$. We highlight certain values of $n_t \times R$ from Figure 4 in Table 3 and Figure 5. Table 3 shows that at a fixed value of p ....

[Article contains additional citation context not shown here]

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessor. In Proc. of the 19th ISCA, 1992.


The Effectiveness of Multiple Hardware Contexts - Thekkath, Eggers (1994)   (21 citations)  (Correct)

....multiple threads in a single cache architecture increases cache conflicts. Both aspects of multithreading performance, processor utilization and cache network traffic, have been investigated in the past, using studies based on simulation and analytical modeling of multithreaded processors [2, 6, 19, 26]. In this paper we address the performance bottom line, evaluating whether the benefits outweigh the drawbacks for coarse grain multithreading, and whether the extra hardware needed for multiple contexts has adequate payoffs. We approach the questions from several different angles. First, we ....

....that varies the number of contexts, the network latency, context switch times, and remote reference rate. (They also incorporate cache performance degradation in their model.) The study shows that few contexts cannot effectively hide very long memory latencies. The work by Boothe and Ranade [6] proposes an explicit context switch instruction to group together shared accesses, and hence avoid excessive context switching. As they show, this can improve multithreading's latency hiding property, but the processor incurs the extra cost of managing multiple memory requests per thread. A bigger ....

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. 19th Annual International Symposium on Computer Architecture, pages 214--223, May 1992.


Timed Petri Net Models of Multithreaded Multiprocessor.. - Govindarajan, Suciu, al. (1997)   (Correct)

....partially ordered threads, and a thread consists of a sequence of instructions which are executed in the conventional von Neumann model. Scheduling of different threads follows the data driven approach. Switching from one thread to another is performed according to one of the following policies [7]:
• Switching on every instruction: the processor switches from one thread to another every cycle. In other words, it interleaves the instructions from different threads on a cycle by cycle basis [21].
• Switching on block of instructions: blocks of instructions from different threads are ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Performance Bounds for Distributed Memory Multithreaded.. - Zuberek, Govindarajan (1998)   (Correct)

....Since careful allocation of the CPU resource is vital for efficient execution of many applications, a larger thread size has to be tolerated so that suitable scheduling decision can be made. Several multithreaded architectures have been proposed which differ in the implementation of multithreading [1, 4, 5, 6, 8, 10, 12]. They differ in two basic aspects, in the number of instructions executed before switching to another thread (one, several, as many as possible) and the cause of context switching (every load, remote load) It is assumed in this paper that context switching can be performed very efficiently (in ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Timed Petri Net Models of Multithreaded Multiprocessor.. - Govindarajan, Suciu.. (1997)   (Correct)

....ordered threads, and a thread consists of a sequence of instructions which are executed in the conventional von Neumann model. Scheduling of different threads follows the data driven approach. In a multithreaded model, switching from one thread to another follows one of the following policies [BR92]:
• Switch on every instruction: The processor switches from one thread to another every cycle. In other words, it interleaves the instructions from different threads on a cycle-by-cycle basis [Sm81].
• Switch on block of instructions: Blocks of instructions from different threads are ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Software-Based Communication Latency Hiding for Commodity.. - Strumpen (1996)   (2 citations)  (Correct)

....some thread, ready for useful computation when another thread blocks in some operation, for example a communication operation. Multithreading is now widely accepted as a means for hiding latencies, not only for memory load operations but also for hiding interprocessor communication latencies [5, 7, 12, 13, 15]. For multicomputers, multithreading and asynchronous message passing, such as offered by Intel's NX interface [22], are widely used latency hiding mechanisms. Nevertheless, they require careful structuring of parallel programs. With multithreading the programmer must identify independent threads ....

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In 19th Annual International Symposium on Computer Architecture, pages 214--223, Queensland, Australia, May 1992.


Advanced Vector Architectures - Espasa (1997)   (Correct)

....schemes. 7.8 RELATED WORK Multithreading for scalar programs has received much attention in recent years [ALKK90, Aga92, TEE 95, TEL96, HKN 92, EJK 96] and has been found to be generally useful. The latency properties of multithreading have been asserted by several researchers [LB96, BR92] Research has produced many alternative multithreaded designs [GHG 91, GNL95, GB96] most focusing on extending high performance RISC cores with extra instructions or synchronization primitives to exploit thread level parallelism. Designs combine several degrees of hardware and software ....

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In Proceedings of the 19th Annual International Symposium on Computer Architecture, pages 214--223, May 1992.


Evaluating the Performance of Multithreading and Prefetching.. - Bianchini, Lim (1996)   (1 citation)  (Correct)

....strategies. We significantly extend these contributions by modeling software prefetching both in terms of processor utilization and gain, defining a range of run lengths where prefetching is profitable, and comparing this range to experimentally observed run lengths. Boothe and Ranade [8] use a compiler based approach that combines prefetching with multithreading. In their scheme, hardware provides prefetch and context switch instructions. The compiler issues prefetches for a group of shared memory accesses before context switching. Grouping prefetches increases run lengths and ....

B. Boothe and A. Ranade. Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors. In Proceedings of the 19th International Annual Symposium on Computing Architecture (ISCA), pages 214--223, Gold Coast, Australia, June 1992. IEEE.


Data Prefetching for High-Performance Processors - Chen (1993)   (24 citations)  (Correct)

....91] If several threads are assigned to a processor, memory latencies can be masked by rapidly context switching to a different thread rather than waiting for a memory reference to complete. The two key issues for implementing multiple context processors are: when is context switching performed [Boothe Ranade 92, Laudon et al. 92] 4 and what defines a context [Hum Gao 91]. Variations on these issues include conditional switch, switch on cache miss, and switch every cycle. The cache coherence, or cache consistency, problem [Archibald Baer 86] arises in shared memory multiprocessors where several ....

Boothe, B. and Ranade, A. (1992). Improved multithreading techniques for hiding communication latency in multiprocessors. In Proc. of the 19th Annual Intl. Symp. on Computer Architecture, pages 214--223.


Thread Integration for Error Detection and Performance - Dean (1997)   (1 citation)  (Correct)

....and perhaps on a multi threaded processor. Running multiple threads in a uniprocessor requires context switching. The overhead of switching between the two threads can be significant when performed by the operating system. This reduces system throughput and increases response times [Agar 92], [Boot 92]. Alternatively a uniprocessor capable of supporting simultaneous multithreading can be used. However, a recent study [Chou 96] indicates that significant hardware and possibly cycle time overheads are required to implement a uniprocessor that is capable of supporting simultaneous multithreading. ....

B. Boothe, A. Ranade. "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors," Proceedings of the 19th International Symposium on Computer Architecture, pp. 213-223, May 1992


Latency Tolerance: A Metric for Performance Analysis of.. - Nemawarkar, Gao   (Correct)

....[Table 2: Network Latency Tolerance, with R = 10 and R = 20.] Impact of a Thread Partitioning Strategy on Latency Tolerance: Performance objectives of a thread partitioning strategy are to minimize communication overheads and to maximize the exposed parallelism [25, 9]. Recall from Section 2 that our model assumes the threads as iterations of a doall loop. So, performance related questions are: How many iterations should be grouped into each thread? And, how do the workload parameters affect the tolerance? Let us assume that our thread partitioning strategy ....

....$tol_{network}$ is surprisingly high. In Figure 7, when R L, the memory subsystem has more effect on $tol_{network}$, so $tol_{network}$ lines for $n_t \times R$ = constant converge (the next section discusses the details). When R L, we note that:
• A high value of $n_t \times R$ exposes more computation at a time, so $tol_{network}$ is high.
• A high $R$ (than a high $n_t$) provides better latency tolerance, as ....
[Footnote 5: This is similar to the grouping of accesses by Boothe [9] to improve $R$. For a large grouping, the message size will affect $S$, the routing delay on switch. Here, we will ignore this effect.]

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessor. In Proc. of the 19th Int'l. Symp. on Computer Architecture, 1992.


General Purpose Parallel Computing - McColl (1993)   (64 citations)  (Correct)

....is available in the machine, then the available parallel slackness can be effectively exploited to hide the kind of network latencies one finds in distributed memory architectures. The only requirement is that the processors provide efficient support for multithreading and fast context switching [44, 228]. Latency tolerance via multithreading is likely to be more effective on large scale general purpose parallel computing systems than the use of complex caching schemes for latency reduction [111, 175] The idea of exploiting parallel slackness can even be carried over into the area of sequential ....

....distributed memory architectures are based on conventional microprocessors [119, 120, 208] We need alternative processor designs which can support a very large number of lightweight threads simultaneously, and can provide fast context switching, message handling, address translation, hashing etc. [44, 69, 71, 130, 274]. If such designs are not produced then we may find that the processors, and not the communications network, will be the bottleneck in the system. We need to continue to develop improved networks for communication [69, 76, 170, 209, 222] and synchronisation [40, 138, 159] There is currently great ....

B Boothe and A Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In Proc. 19th Annual International Symposium on Computer Architecture, 1992. To appear.


Performance Balancing in Multithreaded Multiprocessor Systems - Zuberek, al. (1998)   (Correct)

....nodes) In the multithreaded execution model, a program is a collection of partially ordered threads, and a thread consists of a sequence of instructions which are executed in the conventional von Neumann model. Switching from one thread to another can be performed according to different policies [4]; switching on every load [3, 7] is assumed in this paper. That is, if the currently executed instruction issues an operation of accessing either a local or a remote memory location, the execution of the current thread suspends, and another ready thread is selected for execution. When the ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214-223, 1992.


Performance Characterization of A Multithreaded Architecture.. - Fi Ts   (Correct)

....of computation and communication overlap on the performance of processor, memory and network subsystems has not been systematically studied. So far, performance evaluation studies of multithreaded architectures have focused on how the design trade offs affect the performance on benchmark programs [5, 22, 21, 13]. The performance gains on a benchmark program is a combined effect of all architectural and program workload optimizations, and to optimize the performance, the number of parameters (which may need to be considered) is too large. For traditional multiprocessor systems, studies like Woo et al. ....

....to analyze and optimize the performance of a program workload on a multithreaded multiprocessor system like the EARTH. Many studies based on simulation and system measurements report the performance of multithreaded architectures using benchmark program workloads, e.g. Agarwal et al. [1] Boothe [5], Hum et al. [11] Thekkath [21] and Weber [23] Their approach shows the combined effectiveness of multithreading along with various optimizations of program workload and architectures. However, the impact of individual parameters on the performance is not known, e.g. it is not clear how many ....

[Article contains additional citation context not shown here]

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessor. In Proc. of the 19th Int'l. Symp. on Computer Architecture, 1992.


A Fine-Grain Multithreading Superscalar Architecture - Mat Loikkanen (1996)   (5 citations)  (Correct)

....to be the execution of a multi threaded program on a processor that is able to exploit the existence of multiple threads in increasing performance. The execution of multiple threads, or contexts, on a multithreading processor has been considered as a means for hiding the latency of long operations [1, 8] and for feeding more independent instructions to a processor with lookahead out of order capabilities and, thus, exposing more instruction level parallelism [2, 3, 8] The negative effect that a long latency instruction can have on the performance of a processor is clear: processing generally ....

Boothe, B.; Ranade, A. "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors," 19th Annual International Symposium on Computer Architecture, pp. 214-223, May 1992.


Performance Evaluation of Multithreaded Architectures with Data.. - Vlassov   (Correct)

....multithreading. Data caching allows keeping copies of remote data in local memory and decreases remote access ratio. A number of caching techniques, such as non blocking and prefetching caches, have been developed to eliminate enough of the remote memory accesses and to hide long memory latency [3, 4, 5, 6]. In spite of the memory consistency problem in shared memory multiprocessors which needs system resources to maintain cache coherency, caching continues to be a subject of considerable interest. Caching already includes data prefetching, even if a cache is not specially constructed to support ....

....before the data is actually needed by the running process. Like caching, explicit prefetching provides local access to remote data, which have been requested by prefetch operations executed in advance. Mainly, explicit prefetching is used by compilers as a technique of code optimization [3, 5, 9]. Its efficiency strongly depends on predictability of remote reference sequences. Multithreading is a general solution to the latency problem. A number of threads is assigned to the same processing node and shares its resources: processing time, memory, etc. When an active thread becomes ....

[Article contains additional citation context not shown here]

B. Boothe and A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors", in Proc. 19th Ann. Int. Symp. on Comp. Arch., pp. 241-223, 1992.


Performance Analysis of Multithreaded Architectures using an.. - Nemawarkar, Gao (1996)   (2 citations)  (Correct)

....$(R + L + 2\,p_{remote}\,S_{obs})$. This value represents the wait time for a thread at all queueing nodes. Each thread spends a duration $R$ at the processor, so for $n_t$ threads, we obtain $U_p$ as $n_t R / (\text{wait time at all queueing nodes})$. Similar approach of an assumed, fixed network load is used by Boothe [10] and Thekkath [29] to study various aspects in multithreading. This naive model works well when $n_t = 1$. Let us assume that $S_{obs}$ is 27.33, i.e. its unloaded value. Substituting values of $R$, $L$ and $p_{remote}$ in $(R + L + 2\,p_{remote}\,S_{obs})$ we obtain $U_p$ as $10 / (10 + 10 + 2 \times p_{remote} \times 27.33)$ ....

....approximation to apply AMVA. Our results show the effectiveness of multithreading to tolerate long latencies. In particular, we have identified the role of network capacity on the network latency and processor utilization. Simulation studies also report the performance benefits of multithreading [31, 10, 29]. Weber [31] shows the differences in performance gains due to multithreading, because of variations in the bus traffic. While Thekkath's results [29] indicate the need for tuning multithreaded workload, Boothe [10] suggests compiling techniques for multithreading. While confirming these results, ....

[Article contains additional citation context not shown here]

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessor. In the 19th ISCA, 1992.
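
The naive fixed-network-load estimate quoted in the Nemawarkar and Gao excerpt above can be worked through numerically. The sketch below assumes the per-thread wait time is the sum R + L + 2·p_remote·S_obs, which is one reading of the excerpt rather than a confirmed formula from the paper. The function name, the cap at 1.0, and the sample value p_remote = 0.2 are illustrative assumptions.

```python
def naive_utilization(n_t, R, L, p_remote, S_obs):
    """Naive processor utilization U_p under an assumed fixed network load.

    Assumed form (a reading of the excerpt, not confirmed by the source):
        wait = R + L + 2 * p_remote * S_obs
        U_p  = n_t * R / wait
    The result is capped at 1.0, which is an added assumption so that the
    value stays a valid utilization for larger n_t.
    """
    wait = R + L + 2.0 * p_remote * S_obs
    return min(1.0, n_t * R / wait)


if __name__ == "__main__":
    # R = 10, L = 10 and S_obs = 27.33 follow the excerpt;
    # p_remote = 0.2 is an arbitrary sample value.
    u = naive_utilization(n_t=1, R=10, L=10, p_remote=0.2, S_obs=27.33)
    print(f"U_p = {u:.4f}")
```

Because the network delay is held at its unloaded value, this estimate ignores the extra contention that additional threads create, which is consistent with the excerpt's remark that the naive model works well only when there is a single thread.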


The Nexus Approach to Integrating Multithreading and Communication - Foster (1996)   (145 citations)  (Correct)

....execution; heterogeneous environments, in which parallel computers are embedded into networks, with variable and unpredictable latencies; and the integration of parallel computers into client server applications. Multithreading has proven useful for overlapping computation and communication or I/O [7, 21], for load balancing [14] and for implementing process abstractions in parallel languages [10, 12, 25, 28, 33, 44]. A difficult issue that must be addressed if multithreading is to be used successfully in distributed memory environments is the integration of threads and communication. In ....

R. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. ACM SIGARCH Computer Architecture News, 20(2), 1992.


Hardware And Software For Functional And Fine Grain Parallelism - Beckmann (1993)   (16 citations)  (Correct)

....It does not require single assignment properties, and readily handles anti and output dependences. Chapters 2, 3 and 4 describe compiler algorithms needed to support our hardware, that are both practical and efficient. Multithreading has received a great deal of attention in the recent literature [18, 47, 104, 113, 2, 79, 77, 17, 111]. Almost all of these proposals involve changes to the hardware organization. Waldspurger and Weihl describe an approach in which the compiler can partition a large physical register file to provide efficient use of registers and a large number of threads, given a limited physical register file ....

....3.111 8.379 61.060
MDG 1.000 3.142 24.560 4.237 10.662 74.915
DYFESM 1.000 3.255 25.807 2.589 9.355 77.009
ARC2D 1.000 2.616 18.775 2.734 7.582 56.059
TRFD 1.000 3.156 24.715 2.669 9.137 73.815
FLO52Q 1.000 2.428 16.705 2.291 6.575 49.407
SPEC77 1.000 2.531 17.839 3.602 8.195 54.119
....multithreading [18, 47, 104, 113, 2, 79, 77, 17, 111], and various forms of dynamic instruction scheduling [106, 81, 14]. The focus here is not on the details of any particular method, but rather on the characteristics common to all of them. Let the window size W denote the number of loop iterations that execute concurrently on a single processor at ....

Bob Boothe and Abhiram Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In Proceedings of the 19th International Symposium on Computer Architecture, pages 214--223, May 1992.


Hiding Miss Latencies with Multithreading on the Data.. - Muller, Stallard, Warren (1995)

....latency hiding are more effective for their machine. The benefit of multithreading varied according to the application from good to very poor. Many of the negative effects can be attributed to the interference between threads in the small (4 Kbytes) direct-mapped coherent caches. Boothe and Ranade [8] produced results for predicted performance for much larger machines, up to 1024 nodes. Their model was quite simple, assuming a fixed latency for the network and hence not modelling the network congestion expected with larger machines. The simulations for this work were based on a smaller number ....

B. Boothe and A. Ranade. Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors. In Proc. of the 19th ISCA, pp 214--223, Gold Coast, Australia, May 1992. IEEE Computer Society Press.


Timed Colored Petri Net Models of Distributed Memory.. - Zuberek.. (1998)   (1 citation)

....interprocess communication and to alleviate operating system overheads. Several multithreaded architectures have recently been proposed which differ in the implementation of multithreading [1, 3, 5, 7, 9]. Switching from one thread to another can be performed under different circumstances [4]:
• Switching on every instruction: one instruction is picked from each of runnable threads and is inserted into the processor's pipeline; if there are many threads, then each stage of the pipeline is executing an instruction from a different thread, and no instruction dependency problems ....

Boothe, B. and Ranade, A., "Improved multithreading techniques for hiding communication latency in multiprocessors"; Proc. 19-th Annual Int. Symp. on Computer Architecture, pp.214--223, 1992.


Hardware and Software Mechanisms for Multithreading in.. - Bradford (2001)

No context found.

B. Boothe and A. Ranade. Improved multithreading techniques for hiding communication latency in multiprocessors. In Proceedings of the Nineteenth International Symposium on Computer Architecture, 1992.


Real Time Behavior Of Multithreaded Processors - Lioupis, al.

No context found.

B. Boothe, A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors", Proc. of the 19th ISCA, 1992.


Exploring Cache Performance in Multithreaded Processors - Lioupis, Milios

No context found.

B. Boothe, A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors", Proc. of the 19th ISCA, 1992.


Exploring the Cache Design Space in a Multithreaded Processor - Dimitris Lioupis

No context found.

B. Boothe, A. Ranade, "Improved Multithreading Techniques for Hiding Communication Latency in Multiprocessors", Proc. of the 19th ISCA, 1992.