23 citations found.
R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.


This paper is cited in the following contexts:
Analysis of the Effectiveness of Multithreading for.. - Pattery, Lee, Won (2002)

....details are discussed. Finally, Section 6 presents a conclusion and future direction. 2. Related Work Multithreading was originally proposed for tolerating long latency events in parallel processing systems, such as cache misses or local memory misses that require remote memory accesses [7]. This is done by executing a new thread of computation on a new hardware context, thereby overlapping memory latency with computation. Multithreading has also been proposed for modern superscalar processors to exploit thread level parallelism (TLP). Multiple threads of execution are generated ....

R. Thekkath and S. J. Eggers, The Effectiveness of Multiple Hardware Contexts, 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VI), San Jose, California, October 1994, pp. 328-337.


Bus Utilization Analysis of Multithreaded Shared-Bus.. - Giorgi, Foglia, Prete

....The global amount of shared data may increase, because the number of running processes is far larger than in the case of single thread processors. The overall performance is a function of cache interference between multiple contexts, which therefore depends on the features of the application [15, 16], the number of contexts per processor, the context switching strategy, the cache features, and the coherence protocol. Trace driven simulation has been typically used to evaluate the performance of such machines [17, 18]. Some important aspects concerning accuracy [19, 20, 21] must be taken into ....

.... of contexts necessary to keep the execution pipeline full [25]; switch on miss works with a smaller degree of multithreading but causes inter thread conflict misses, which can break the locality of each thread [26] and also may cause a loss of shared data that requires extra coherence overhead [15]; instruction interleaving performing better on a uniprocessor, but the difference with the previous scheme becomes negligible on multiprocessors [26]. Other variants are: simultaneous multithreading [27], switch on block of instructions, switch on every load [28], and switch on load miss. Once a ....

R. Thekkath and S. J. Eggers, "The effectiveness of multiple hardware contexts," in Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 328--337, October 1994.


ILP versus TLP on SMT - Mitchell, Carter, Ferrante, Tullsen (1999)   (3 citations)

....increasingly coarser, thread level parallelism (also known as fork and join parallelism). Distributed memory machines enable message passing parallelism. Architectural evaluation: Thekkath and Eggers evaluated the benefit of a varying number of hardware contexts with a fixed number of threads [15]. They found that multiple contexts benefit locality optimized codes more .... [Appeared in the Proceedings of Supercomputing 99.] [Figure 10: Predicting performance of integer sort with fourth degree terms gives an R ....; axis: Implementations (from best to worst); series: predicted, actual]

Radhika Thekkath and Susan J. Eggers. The effectiveness of multiple hardware contexts. In Architectural Support for Programming Languages and Operating Systems, pages 328--337, San Jose, CA, October 1994.


Latency Tolerance: A Metric for Performance Analysis of.. - Shashank Nemawarkar

....is high due to multithreading, then the latencies are tolerated [3, 5]. However, there is no clear understanding of the latency tolerance. Performance of multithreaded architectures has been studied using analytical models [2, 1] and simulations of single and multiple processor systems [10, 9, 3]. Kurihara et al. [6] show how the memory access costs are reduced with 2 threads. Our conjecture, however, is that the memory access cost is not a direct indicator of how well the latency is tolerated. The objectives of this paper are to quantify the latency tolerance, to analyze the latency ....

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Proc. of the 6th ASPLOS, 1994.


\Xi\Pi \Theta \Gamma\Delta - McGill University

....to good data locality. We believe that the multithreaded program execution model has great potential to offset the performance loss that is due to imperfect data partitioning with its ability to tolerate communication and synchronization latencies inherent in multiprocessor architectures [1, 3, 11, 13, 14]. With multithreading support, a processor switches quickly among a set of ready threads whenever it encounters a long latency operation. Thus, by overlapping the communication on one thread with computation in other threads, the processor utilization as well as the program execution time ....

Radhika Thekkath and Susan J. Eggers, "The Effectiveness of Multiple Hardware Contexts," in Proc. of the Sixth Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, San Jose, Calif., pp. 328--337, Oct. 1994.


Evaluating the Performance of Multithreading and Prefetching.. - Bianchini, Lim (1996)   (1 citation)

....Tera does not have caches, and requires a large number of concurrent threads and high network bandwidth to achieve high processor utilization. Previous experimental research on multithreading performance shows that multithreading is effective at tolerating memory latencies for some applications [17, 33, 18, 32]. Previous analytical research [2, 29, 27, 14] focuses on modeling processor utilization and predicting the number of contexts needed for good processor utilization. In contrast, this paper combines analytical models and experimental measurements in a novel way. It defines a range of run lengths ....

R. Thekkath and S. Eggers. The Effectiveness of Multiple Hardware Contexts. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 328--337, San Jose, CA, October 1994. ACM.


Limits On The Performance Benefits Of Multithreading And.. - Lim, Bianchini (1995)   (7 citations)

....to full processor utilization, provided enough threads are available, and enough network bandwidth exists. Previous experimental research in evaluating multithreading performance has shown that multithreading can be effective at tolerating memory and synchronization latencies for some applications [15, 30, 29]. Previous analytical research [2, 26, 24, 13] has focused on modeling processor utilization and predicting the number of contexts necessary to achieve a reasonably high level of processor utilization. In contrast, this paper combines analytical models and experimental measurements and applies the ....

Radhika Thekkath and Susan Eggers. The Effectiveness of Multiple Hardware Contexts. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), pages 328--337, San Jose, CA, October 1994. ACM.


Data and Workload Distribution in a Multithreaded.. - Sohn, Sato, Yoo, Gaudiot (1996)   (3 citations)

....be effective in tolerating the latency caused on cache misses for shared-memory applications such as MP3D. Simulated results on the effectiveness of multiple hardware contexts indicated that multithreading is effective for programs which are optimized for data locality by programmers or compilers [23]. The study based on simulated multithreading further indicated that multiple hardware contexts have limited effects on unoptimized programs. Our preliminary study, however, indicated that simple minded data distribution can give performance comparable to that of the best performing algorithms ....

R. Thekkath and S. Eggers, "The Effectiveness of Multiple Hardware Contexts," in Proceedings of ACM International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, California, October 1994, pp. 328-337.


Measurement and Modeling of EARTH-MANNA Multithreaded.. - Nemawarkar, Gao (1995)

.... 571.2 787.2 65.5 a 8 182.7 39.3 678.0 1016 72.3 16 279.6 45.6 874.6 1035 72.3 [Table 1: An Example of Workload Optimization] 7 Related Work: Performance evaluation studies on multithreaded architectures can be classified as: analytical models [21, 5, 3, 14, 18, 1, 25] and trace driven simulations [24, 22]. Queuing Network and Petri Net Models: Petri Net models of multithreaded processors are proposed by Saavedra et al. [21] and Alkalaj and Bopanna [5]. They use state space analyses to derive the system performance, so modeling a large system and the contentions at subsystems, is computationally ....

....[4] Simulations: Weber and Gupta [24] performed trace driven simulations with strategies for switching contexts, constant context switching times, and constant shared bus latencies. They considered a workload with multiple copies of a program trace as multiple threads. Thekkath and Eggers [22] extended a similar approach using an analytical model [2] for the network performance. Waldspurger and Weihl [23] report the results of simulations on a single node of multiprocessor system. They also assume that the network is lightly loaded (no contentions). In this paper, we expanded the set ....

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Proc. of the 6th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 328--337. ACM, Oct. 1994.


Latency Tolerance: A Metric for Performance Analysis of.. - Nemawarkar, Gao

....a memory access or a synchronization). A multithreaded processor performs the computation in a thread, issues a request for a long latency operation, and then switches to another thread. By context switching to other threads (rather than idling), the processor achieves a high processor utilization [3, 27]. A side effect of multiple outstanding requests is to increase the contention at the memory and the interconnection network (IN), which further increases the memory and network latencies. An informal notion of latency tolerance is that if the processor utilization is high due to multithreading, ....

....the performance bottlenecks in the system. Analytical performance evaluation studies by Agarwal [3] and Saavedra-Barrera et al. [23] modeled a multithreaded processor in a cache based multiprocessor system. Willick [31], Johnson [15], and Adve [1] modeled a closed system. Weber [30] and Thekkath [27] simulated bus and network based multithreaded systems. Most of these studies focus on processor performance and report that 4 to 5 threads per processor yield the most performance gains. Johnson [15] and Thekkath [27] also discussed the optimizations of program workload parameters to achieve ....

[Article contains additional citation context not shown here]

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Proc. of the 6th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 328--337. ACM, Oct. 1994.


Performance Characterization of A Multithreaded Architecture.. - Fi Ts

....of computation and communication overlap on the performance of processor, memory and network subsystems has not been systematically studied. So far, performance evaluation studies of multithreaded architectures have focused on how the design trade offs affect the performance on benchmark programs [5, 22, 21, 13]. The performance gain on a benchmark program is a combined effect of all architectural and program workload optimizations, and to optimize the performance, the number of parameters (which may need to be considered) is too large. For traditional multiprocessor systems, studies like Woo et al. ....

....performance of a program workload on a multithreaded multiprocessor system like the EARTH. Many studies based on simulation and system measurements report the performance of multithreaded architectures using benchmark program workloads, e.g. Agarwal et al. [1], Boothe [5], Hum et al. [11], Thekkath [21], and Weber [23]. Their approach shows the combined effectiveness of multithreading along with various optimizations of program workload and architectures. However, the impact of individual parameters on the performance is not known, e.g. it is not clear how many parallel threads were used for ....

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Proc. of the 6th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems, pages 328--337. ACM, Oct. 1994.


When does Dedicated Protocol Processing Make Sense? - Babak Falsafi (1996)   (4 citations)

....as long as the switch time is lower than the round trip time of a miss. Adding processors similarly allows for higher number of simultaneous requests, increasing the request rate. Figure 5 indicates that both multithreading and multiprocessing can utilize the available bandwidth effectively [27]. That is, the request bandwidth is not limited by the switch time. Saturation for both Floating and Fixed occurs quite rapidly as we increase the number of threads and/or processors; this follows from the relatively high protocol processing overheads of our remote miss handlers. Fixed is 30 ....

Radhika Thekkath and Susan J. Eggers. The Effectiveness of Multiple Hardware Contexts. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 328--337, San Jose, California, 1994.


Limits on the Performance Benefits of Multithreading and.. - Lim, Bianchini (1996)   (7 citations)

....Tera does not have caches, and requires a large number of concurrent threads and high network bandwidth to achieve high processor utilization. Previous experimental research on multithreading performance shows that multithreading is effective at tolerating memory latencies for some applications [13, 26]. Previous analytical research [1, 23, 21, 11] focuses on modeling processor utilization and predicting the number of contexts needed for good processor utilization. In contrast, this paper combines analytical models and experimental measurements in a novel way. It defines a range of runlengths ....

R. Thekkath and S. Eggers. The Effectiveness of Multiple Hardware Contexts. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 328--337, San Jose, CA, October 1994. ACM.


Converting Thread-Level Parallelism to.. - Lo, Eggers, Emer, .. (1997)   (43 citations)  Self-citation (Eggers)

....Section 4, large speedups can be attained. In addition, our study somewhat overstates the amount of inter thread interference, because we have not applied compiler optimizations (such as cache tiling [10, 22]) to minimize interference by reducing the size of the working sets. Thekkath and Eggers [24] found that for traditional multithreaded architectures, programmer or compiler based locality optimizations can significantly reduce inter thread interference. We believe that the same should hold for simultaneous multithreading, and this is an area of further research. Figure 7: Categorization ....

....been the subject of several studies. Yamamoto, et al. and Gulati and Bagherzadeh both found that the cache miss rates in simultaneous multithreading processors increased when more threads were used. Neither quantified the direct effect from inter thread interference, however. Thekkath and Eggers [24] examined the effectiveness of multiple contexts on conventional, coarse grained multithreaded architectures. They found that cache interference between threads varied depending on the benchmark. For locality-optimized programs, the total number of misses remained fairly constant as the number of ....

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 328--337, October 1994.


Simultaneous Multithreading: Maximizing On-Chip Parallelism - Tullsen, Eggers, Levy (1995)   (210 citations)  Self-citation (Eggers)

....(from 1.5 IPC to 3.2). The graph shows that there is little advantage to adding more than four threads in this model. In fact, with four threads, the vertical waste has been reduced to less than 3%, which bounds any further gains beyond that point. This result is similar to previous studies [2, 1, 19, 14, 33, 31] for both coarse grain and fine grain multithreading on single issue processors, which have concluded that multithreading is only beneficial for 2 to 5 threads. These limitations do not apply to simultaneous multithreading, however, because of its ability to exploit horizontal waste. Figures ....

....Competition for non execution resources, then, plays nearly as significant a role in this performance region as the competition for execution resources. Others have observed that caches are more strained by a multithreaded workload than a single thread workload, due to a decrease in locality [21, 33, 1, 31]. Our data (not shown) pinpoints the ex.... [Figure: Instructions Issued Per Cycle versus Number of Threads (1 to 8); panels: (a) Fine Grain Multithreading, (b) SM: Single Issue Per Thread, (c) SM: Full Simultaneous Issue]

[Article contains additional citation context not shown here]

R. Thekkath and S.J. Eggers. The effectiveness of multiple hardware contexts. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 328--337, October 1994.


Balanced Multithreading: Increasing Throughput via a.. - Tune, Kumar, Tullsen, .. (2004)

No context found.

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.


Mambo - A Full System Simulator for the PowerPC.. - Bohrer, Elnozahy.. (2004)   (1 citation)

No context found.

R. Thekkath and S. Eggers. The effectiveness of multiple hardware contexts. In International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

R. Thekkath and S. J. Eggers. The effectiveness of multiple hardware contexts. In Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 328--337, October 1994.


Hardware and Software Mechanisms for Multithreading in.. - Bradford (2001)

No context found.

R. Thekkath and S.J. Eggers. The effectiveness of multiple hardware contexts. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, 1994.


Execution and Cache Performance of the Scheduled Dataflow.. - Kavi, Arul, Giorgi (2000)

No context found.

R. Thekkath and S.J. Eggers: "The effectiveness of multiple hardware contexts," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 328--337, October 1994.


Execution Performance of the Scheduled Dataflow Architecture - Kavi

No context found.

R. Thekkath and S.J. Eggers. "The effectiveness of multiple hardware contexts," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 328--337, October 1994.


Execution and Cache Performance of a Decoupled Non-Blocking .. - Kavi, Giorgi, Arul (2000)

No context found.

R. Thekkath and S.J. Eggers, "The effectiveness of multiple hardware contexts," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 328--337, October 1994.


Multithreaded Systems - Kavi, Lee, Hurson

No context found.

Thekkath, R., and Eggers, S. J., "The Effectiveness of Multiple Hardware Contexts," Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems, 1994, pp. 328-337.


CiteSeer.IST - Copyright Penn State and NEC