Hum HHJ et al. A design study of the EARTH multiprocessor. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, 1995. IEEE Computer Society Press: Los Alamitos, CA, 1995.
....of the MAP provide fast single thread execution as well as latency tolerance for better local memory bandwidth utilization. Furthermore, none of the multithreaded machines have multiple clusters for exploiting wide instruction level parallelism. Various machines optimized for dataflow languages [24, 16, 28] provide hardware support for fine grained synchronization between threads (usually via memory synchronization bits) but they do not exploit instruction level parallelism, nor do they provide low cost register based synchronization between threads. The XIMD architecture [33] uses multiple ALUs to ....
HUM, H. H., ET AL. A design study of the EARTH multiprocessor. In International Conference on Parallel Architectures and Compilation Techniques (1995), pp. 59-68.
....83] Our current architecture specifically addresses the third limitation. Some researchers have proposed designs in which the dataflow scheduling is applied only at thread level (i.e. macro dataflow) while each thread is comprised of conventional control flow instructions [Govindarajan 95] [Hum 95] [Sakai 93]. In such hybrid dataflow control flow systems, the instructions within a thread do not retain functional properties, and hence, introduce Write After Write (WAW) and Write After Read (WAR) dependencies. This in turn requires complex hardware to perform dynamic instruction scheduling. ....
....execution of dataflow program utilized in previous architectures required two cycles per (dyadic) instructions. By scheduling dataflow instructions (akin to control flow execution) results in one cycle per instruction. Using analytical models we compared SDF with hybrid architectures (e.g. EARTH [Hum 95]) that use two processors: one (Execution Processor) for executing instructions of a thread and a second processor (Synchronization Processor) to perform thread synchronizations and scheduling of threads. SDF outperformed hybrid architectures both because of the decoupling of memory accesses (not ....
H.H.-J. Hum, et al., "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, June 1995, pp. 59-68.
....we propose a new architecture that addresses the third limitation. The literature has addressed several designs in which the dataflow scheduling was applied only at thread level (i.e. macro dataflow) where each thread was comprised of conventional control flow instructions [Govindarajan 95] [Hum 95] [Sakai 93]. In such systems, the instructions within a thread do not retain functional properties, and hence, introduce write after write (WAW) and write after read (WAR) dependencies. Consequently, deviation from dataflow properties at instruction level requires complex hardware. In our ....
H.H.-J. Hum, et al., "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, June 1995, pp. 59-68.
....such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow control flow organizations have been proposed by several researchers ([Govindarajan 95] [Hum 95] [Sakai 93]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [Hum 95], two processors are used for execution of macro dataflow threads. One processor, Execution ....
....flow organizations have been proposed by several researchers ([Govindarajan 95] [Hum 95] [Sakai 93]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [Hum 95], two processors are used for execution of macro dataflow threads. One processor, Execution Unit (EU) behaves like a traditional RISC processor executing instructions belonging to a thread. The second processor, Synchronization Unit (SU) is responsible for scheduling of threads on EU, remote ....
[Article contains additional citation context not shown here]
H.H.-J. Hum, et al., "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques, Limassol, Cyprus, June 1995, pp. 59-68.
....current architecture specifically addresses the third limitation. Other researchers have proposed hybrid designs in which the dataflow scheduling is applied only at thread level (i.e. macro dataflow) while each thread is comprised of conventional control flow instructions ([Govindarajan 95] [Hum 95] [Sakai 93]). In such systems, the instructions within a thread do not retain functional properties, and hence, introduce Write After Write (WAW) and Write After Read (WAR) dependencies. This in turn requires complex hardware to perform dynamic instruction scheduling. In our system, the ....
H.H.-J. Hum et al. "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, June 1995, pp. 59-68.
....83] Here we present a new architecture that addresses the third limitation. Some researchers have proposed designs in which the dataflow scheduling is applied only at thread level (i.e. macro dataflow) while each thread is comprised of conventional control flow instructions [Govindarajan 95] [Hum 95] [Sakai 93]. In such systems, the instructions within a thread do not retain functional properties, and hence, introduce Write After Write (WAW) and Write After Read (WAR) dependencies. This in turn requires complex hardware to perform dynamic instruction scheduling. In our system, the ....
H.H.-J. Hum, et al., "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, June 1995, pp. 59-68.
....techniques such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow control flow organizations have been proposed by several researchers ([6], [10], [8]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [10], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit (EU) ....
....Architectures Hybrid dataflow control flow organizations have been proposed by several researchers ([6], [10], [8]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [10], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit (EU) behaves like a traditional RISC processor executing instructions belonging to a thread. The second processor, Synchronization Unit (SU) is responsible for scheduling of threads on EU, remote ....
[Article contains additional citation context not shown here]
H. H.-J. Hum, O. Maquelin, K. B. Theobald, X. Tian, X. Tang, G. R. Gao, P. Cupryk, N. Elmasri, L. J. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. S. Nemawarkar, P. Panangaden, X. Xue, & Y. Zhu, A design study of the EARTH multiprocessor, Proc. of the Conference on Parallel Architectures and Compilation Techniques, Limassol, Cyprus, 1995, 59-68.
....requires extensive experimental evaluation. For example, the thread firing rule (which determines when threads are enabled) can be based on either a blocking or a non blocking strategy. The blocking strategy is adopted in Iannucci's Hybrid Architecture [18], the Tera MTA [2], and the EARTH machine [17]. The non blocking strategy is adopted in Monsoon [26, 27], T [23], and the EM-4 [31], among others. The Threaded Abstract Machine (TAM) [11] is a software implemented multithreaded execution that has been ported to a number of platforms (such as the TMC CM-5 and the Cray T3D); it implements the ....
H. Hum, O. Macquelin, K. Theobald, X. Tian, G. Gao, P. Cupryk, N. Elmassri, L. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A design study of the EARTH multiprocessor. In Parallel Architectures and Compilation Techniques, 1995.
....whenever a long latency operation is encountered. Moreover, the benefits of multithreading are not limited to loop based algorithms but apply also to irregular parallelism. EARTH (Efficient Architecture for Running Threads) is a parallel multithreaded environment developed at McGill University [1]. The EARTH programming model has been implemented on several existing, conventional multiprocessors such as MANNA (developed at GMD FIRST, Germany), IBM SP-2, and networks of workstations connected with either Myrinet or TCP/IP. Multithreaded programming support can be provided in two ways. One ....
Herbert H.J. Hum et al., "A Design Study of the EARTH Multiprocessor," in Proc. of PACT'95, Limassol, Cyprus, June 1995, ACM Press.
....map provide fast single thread execution as well as latency tolerance for better local memory bandwidth utilization. Furthermore, none of the previous multithreaded machines have multiple clusters for exploiting wide instruction level parallelism. Various machines optimized for dataflow languages [24, 16, 28] provide hardware support for fine grained synchronization between threads (usually via memory synchronization bits) but they do not exploit instruction level parallelism, nor do they provide low cost register based synchronization between threads. The XIMD architecture [33] uses multiple ALUs ....
Hum, H. H., et al. A design study of the EARTH multiprocessor. In International Conference on Parallel Architectures and Compilation Techniques (1995), pp. 59--68.
....(like PL PS) to alleviate memory latencies to further exploit multithreading. There have been several hybrid architectures proposed where the dataflow scheduling was applied only at thread level (i.e. macro dataflow) with conventional control flow instructions comprising threads (e.g. [5], [7], [14]). In such systems, the instructions within a thread do not retain functional properties, and introduce side effects, WAW and WAR dependencies. Lacking dataflow properties at instruction level requires complex hardware for the detection of data dependencies and dynamic scheduling of ....
....execution of instructions is not new. We have described three examples of decoupled architectures in section 2. There are other systems where separate hardware units have been proposed to handle the synchronization among threads in multithreaded architectures (e.g. Alewife [1], StarT-NG [3], EARTH [7]). We follow this tradition and propose two hardware units for the Scheduled Dataflow. One of the hardware units (EP) will be similar to conventional RISC Pipelines as described previously. The other hardware unit (SP) is responsible for accessing memory to load the initial operands of enabled ....
H.H.-J. Hum, et al., "A design study of the EARTH multiprocessor," Proc. of the Conference on Parallel Architectures and Compilation Techniques, Limassol, Cyprus, June 1995, pp. 59--68.
....and communication to tolerate communication latency. In addition, multithreading provides a natural way to implement nonblocking communication operations, active messages and remote memory copy. Multithreading can be supported by software or even hardware (e.g. MTA [3], Alewife [2], EARTH [23], START [24]). Although it has been popular in single-processor and shared memory processor systems, multithreading on distributed systems encounters more difficulties mainly because of the high latency and low bandwidth communication links underlying distributed memory computers. In this paper, we ....
Herbert H.J. Hum, Olivier Maquelin, Kevin B. Theobald and fourteen others. A design study of the EARTH multiprocessor. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques, pages 59--68, June 1995.
....techniques such as result forwarding (where the results of an instruction are directly supplied to a dependent instruction) cannot be incorporated into the ETS pipeline. 1.3 Hybrid Architectures Hybrid dataflow control flow organizations have been proposed by several researchers ([6], [8], [15]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [8], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit (EU) ....
....Architectures Hybrid dataflow control flow organizations have been proposed by several researchers ([6], [8], [15]). In most of these systems, coarse grained threads represent macro dataflow nodes while each thread includes conventional load store instructions. In one such proposed system, EARTH [8], two processors are used for the execution of macro dataflow threads. One processor, Execution Unit (EU) behaves like a traditional RISC processor executing instructions belonging to a thread. The second processor, Synchronization Unit (SU) is responsible for scheduling of threads on EU, remote ....
[Article contains additional citation context not shown here]
H.H.-J. Hum, et al., "A Design Study of the EARTH Multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques, Limassol, Cyprus, June 1995, pp. 59--68.
.... hiding mechanism, and keep the processing units usefully busy? How can the system automatically balance the computation load across many processing nodes? How much latency can the system tolerate without significant performance penalty? The Efficient Architecture for Running Threads (EARTH) [8, 15] is an architecture and program execution environment that defines a fine grain multithreading model. Multithreading programs for EARTH are written in Threaded C, an explicitly multithreaded extension of the C language. In this paper, we demonstrate that the addition of both a single assignment ....
....the references to elements of the I structure through a blocking mechanism. Instead of requesting a single element of the structure, an entire block of data including the requested element is requested to the node that hosts the I structure. We implement ISSC in the Threaded C language for EARTH [8, 15] systems. The layout of our software cache is the one of a set associative cache. Set-associative software caches have faster cache entry searching time than fully associative caches and better cache utilization than direct mapped caches. The caching address consists of the node number of the host ....
H. H.J. Hum, O. Maquelin, K. B. Theobald, X. Tian, X. Tang, G. Gao, P. Cupryk, N. Elmasri, L. J. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A Design Study of the EARTH Multiprocessor. In PACT 95, June 1995.
....By switching to the execution of other ready threads, the communication latency can be hidden from useful computations as long as there is enough parallelism in an application. Split phased transaction [7, 16] schemes have been used in some multithreaded models, like TAM [4], PRISC [13], and EARTH [5], to achieve communication latency tolerance. By splitting the remote memory access into two phases, requesting and consuming, the processor can continue executing other useful computations without waiting for the .... [Footnote: This research is supported in part by NSF grant # MIP9707125 and INT 9815742.]
....hit 479 ISSC miss 2693 ISSC deferred 1354 Table 1: Latency of EARTH and ISSC operations on EARTH MANNA SPN, measured in number of cycles (1 cycle = 20 ns) 2.2 ISSC implementation on EARTH-MANNA We implemented ISSC [20] in the Threaded C [8] language for EARTH systems. The EARTH [17, 5], Efficient Architecture for Running Threads, is an architecture and program execution environment that defines a fine grain non blocking multi threading model. Our studies are based on an implementation of EARTH on the MANNA machine. MANNA [19] is a 20 node, 40 processor machine. Each node has ....
H. H.J. Hum, O. Maquelin, K. B. Theobald, X. Tian, X. Tang, G. Gao, P. Cupryk, N. Elmasri, L. J. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A Design Study of the EARTH Multiprocessor. In PACT 95, June 1995.
....multithreaded execution environment. This paper describes the design, implementation and performance evaluation of nine dynamic load balancing algorithms running on the EARTH-SP multithreaded multiprocessor testbed, a portable implementation of the EARTH multithreaded program execution model [4] on the IBM SP2 multiprocessor system. In the course of this study we developed a set of generic test cases, which we call stress tests, that measure the performance of the different dynamic load balancing algorithms for specific workload patterns. Based on the experimental results from the ....
....load balancing, distributed memory 1 Introduction Multithreaded execution models [2, 10] have been proposed to address the overheads inherent in multiprocessor systems, such as network and synchronization latencies. The EARTH (Efficient Architecture for Running THreads) architecture model [4] is designed for the efficient parallel execution of both numerical and non numerical programs, by .... [Footnotes: This paper represents views of the author rather than IBM, Toronto. School of Computer Science, McGill University, Montreal, Canada. Dept. of EE and CIS, University of Delaware, Newark, USA.]
[Article contains additional citation context not shown here]
Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Xinan Tang, Guang R. Gao, et al., A Design Study of the EARTH Multiprocessor, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, Limassol, Cyprus, pp. 59-68, ACM Press, June 1995.
No context found.
Hum HHJ et al. A design study of the EARTH multiprocessor. Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), Limassol, Cyprus, 1995. IEEE Computer Society Press: Los Alamitos, CA, 1995.
No context found.
H. H. J. Hum, O. Maquelin, K. B. Theobald, X. Tian, X. Tang, G. R. Gao, et al., "A design study of the EARTH multiprocessor," Proceedings of the Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 59--68, 1995.
No context found.
H.H.J. Hum et al., "A Design Study of the Earth Multiprocessor," Int'l. Conf. Paral. Arch. Compil. Techn., 1995, pp. 59-68.
No context found.
H.H.J. Hum, et al., "A Design Study of the Earth Multiprocessor," Int'l. Conf. Paral. Arch. Compil. Techn., 1995, pp. 59-68.
No context found.
H. Hum, O. Maquelin, K. Theobald, X. Tian, X. Tang, G. Gao, et al., "A Design Study of the EARTH Multiprocessor," in Proc. of the IFIP WG 10.3, PACT'95, pp. 59-68, ACM Press, June 1995.
No context found.
H. Hum, O. Macquelin, K. Theobald, X. Tian, G. Gao, P. Cupryk, N. Elmassri, L. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A design study of the EARTH multiprocessor. In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1995.
No context found.
Herbert H.J. Hum et al. A design study of the EARTH multiprocessor. In Lubomir Bic, Wim Böhm, Paraskevas Evripidou, and Jean-Luc Gaudiot, editors, Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, pages 59-68. ACM Press, 1995.
No context found.
H. Hum, O. Macquelin, K. Theobald, X. Tian, G. Gao, P. Cupryk, N. Elmassri, L. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A design study of the EARTH multiprocessor. In Proc. Int. Conf. on Parallel Architectures and Compilation Techniques, 1995.
No context found.
H. Hum, O. Macquelin, K. Theobald, X. Tian, G. Gao, P. Cupryk, N. Elmassri, L. Hendren, A. Jimenez, S. Krishnan, A. Marquez, S. Merali, S. Nemawarkar, P. Panangaden, X. Xue, and Y. Zhu. A design study of the EARTH multiprocessor. In Parallel Architectures and Compilation Techniques, 1995.