Thinking Machine Corp. CMMD Reference Manual Version 3.0. May 1993.
....throughput. The design reduces cycles spent in context switching as each thread uses a smaller amount of execution context. The cycles which may be wasted otherwise are productively used to process messages from different programs. Network throughput increases because, unlike gang scheduling [123], context switch occurs between lightweight threads, and thus, there is no possibility of packet loss during context switch and no need to flush the network. Finally, the design eases the operating system's burden for fair scheduling in an environment where processors may have the dual ....
Thinking Machine Corporation. CMMD Reference Manual Version 3.0, May 1993.
....for low latency communication. Each 20 byte active message packet can carry up to 16 bytes of payload. Sending and receiving a single packet active message on the CM-5 takes 1.6 µs and 1.7 µs respectively [4]. We used the CMMD message passing library and CMAML (the CMMD active messages layer) [12]. Two other implementations of active messages on the CM-5 exist: the original CMAM library [4] from UC Berkeley and the Strata library from MIT [2]. The time taken to send a message from one node on the CM-5 to another can be modeled as O( M ) where is the overhead, is the transfer rate ....
Thinking Machines Corporation. CMMD Reference Manual Version 3.0, October 1991.
....user. The CMMD library may be used from a variety of familiar programming languages (e.g. C, C++, and f77) and, like the iPSC, provides the user with an independent thread of control for each processing node. CMMD I/O is also layered on top of SFS and, like CFS, provides a variety of I/O modes [39, 23]. CMMD's local independent mode, like mode 0 in CFS, gives each process its own view of the file, and allows each process to make arbitrary requests to the file. In global independent mode each process has a private file pointer, but all other state is shared. For example, if one process performs ....
Thinking Machines Corporation, CMMD Reference Manual Version 3.0, May 1993.
....for low latency communication. Each 20 byte active message packet can carry up to 16 bytes of payload. Sending and receiving a single packet active message on the CM-5 takes 1.6 µs and 1.7 µs respectively [6]. We used the CMMD message passing library and CMAML (the CMMD active messages layer) [14]. Two other implementations of active messages on the CM-5 exist: the original CMAM library [6] from UC Berkeley and the Strata library from MIT [3]. 4.2 Modeling the CM-5. Sending a Message: The time taken to send a message from one node on the CM-5 to another can be modeled as O( M ) ....
Thinking Machines Corporation. CMMD Reference Manual Version 3.0, October 1991.
.... employs compiler controlled stack based scheduling for local messages inspired by [10, 29]. Because our runtime system provides a flexible interface to the compiler, we were able to implement an efficient stack based scheduling mechanism [22]. We have implemented our runtime system on the CM-5 [32, 31]. Primitive operations, such as object creation and message send, have been carefully designed and optimized while maintaining the semantics of the language. The runtime system exposes part of its scheduling mechanism to the compiler so that the compiler can exploit frequently occurring special ....
.... 1 Columns Seq, BP and CP were implemented using a broadcast mechanism based on a hypercube like minimum spanning tree communication structure, built on top of the CM-5 Active Message (CMAM) layer [35], except for column Bcast, which was implemented using the vendor provided CMMD broadcast primitive [32]. [Figure 1: The architecture of the Hal runtime kernel] 3 The Runtime System Architecture. The runtime system is currently running on the CM-5 and on networks of workstations. This paper describes the ....
Thinking Machine Corporation. CMMD Reference Manual Version 3.0, May 1993.
....it adds a one with a call to CMMD set global or. If the status of a processor changes, another call is made to CMMD set global or, and a zero is added to value of the global or. This continues until all processors complete the program section and the value returned by CMMD get global or is a one [Thi93]. The CMMD functions CMMD synch with nodes start and CMMD synch with nodes stop are also used in tandem to ensure that the processors are synchronized while completing a section of the deer component. These two functions are used in the main function of the deer code to provide synchronization ....
....is equal to the total number of sends, P 0 sends a message to indicate that all processors may move to the next section. This method provides the same synchronizations as performed in the PSIMPDEL model with CMMD commands. However, the CMMD synchronizations are performed in hardware [Thi93], and are thus much more efficient than the MPI method which may suffer from high idle times. Chapter 4 Verification and Performance Results The outputs of the vegetation, hydrology, and deer components of the DSIMPDEL model were verified by comparing three selected outputs with values produced ....
Thinking Machines Corporation, Cambridge, Massachusetts. CMMD Reference Manual Version 3.0, May 1993. Appendices
....transmission) and software overlapping of message preparation and receipt. 3. Exchange This test measures the speed of exchanging data values between two nodes. This test is identical to the send reply test except separate send and receive calls were replaced by a single CMMD send and receive call [13]. 4. Virtual Channel This test measures the performance of the CMMD read and write channel calls; it is identical to the simple send test except that the blocking send receive calls were replaced by read write channel calls. Intuitively, a channel is a connection oriented communication protocol, ....
....than that of simple send and send reply. This can be attributed to savings from software overlapping of message preparation and receipt. Table 1 also shows that the latencies for active messages are lower than those for virtual channels. Because active messages are never buffered on arrival [13], the overhead required to do buffering can be eliminated. The average latency for the active message request and reply primitives are comparable to the latencies observed in the Berkeley implementation of active messages [14] 3 Table 2 contains one seeming anomaly: the normalized bandwidth for ....
Thinking Machines Corporation. CMMD Reference Manual Version 3.0, May 1993.
....solver stage. There is very little performance penalty for using separate program units for the filling and factor solve because the DataVault or the SDA has very high data rates. As an illustration, we have run some exercises to demonstrate our claim. There are time data from both CMMD 2 timer [74] and the CM Fortran timer [75, 75] listed in Tables 4.1 and 4.2. In Table 4.1, the elapsed time for writing the moment matrix to a SDA file is measured by the CMMD timer. The writing operation is executed by the CMMD global write under CMMD synchronous sequential mode [74, 77]. The CM Fortran ....
....from both CMMD 2 timer [74] and the CM Fortran timer [75, 75] listed in Tables 4.1 and 4.2. In Table 4.1, the elapsed time for writing the moment matrix to a SDA file is measured by the CMMD timer. The writing operation is executed by the CMMD global write under CMMD synchronous sequential mode [74, 77]. The CM Fortran Utility library [75] provides a SO mode which is compatible with almost all CM systems. One can see that the extra effort in using a high performance storage device to utilize both the message passing paradigm and the data parallel paradigm is justified. For this moderate sized ....
Thinking Machines Corporation, CMMD Reference Manual Version 3.0, May 1993.
....programming languages like C, C++, and f77. Under CMMD the user sees multiple threads of control, one for each PN. CMMD I/O (again layered on top of SFS) provides a variety of I/O modes: in some, action is taken by a single PN; in others, all PNs cooperatively perform a parallel I/O [Thi93c, BGST93]. 3 Tracing Methodology. The CM-5 at the National Center for Supercomputing Applications (NCSA) was chosen as our target machine because this is one of the most widely used CM-5 machines in the United States. The user population is distributed all across the nation and there are ....
Thinking Machines Corporation. CMMD Reference Manual Version 3.0, May 1993.
No context found.
Thinking Machine Corp. CMMD Reference Manual Version 3.0. May 1993.