Thinking Machines Corporation. CMMD Reference Manual, December 1992.
....If we assume that the data section of a message, fi, is four 4-byte words, and the header and trailer are an additional 4 bytes, then fl = 16 4 152 20 1:27s packet. To support these claims, we give the following empirical results. Using the message passing library of the CM-5, CMMD version 3.0 [41], each processing node swaps a packet with another node a fixed distance away. Figure 3 confirms that there is a linear relationship between message length and total transmission time on the CM-5. We find that when the regular block permutation consists of only local communications, i.e. messages ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual, 3.0 edition, May 1993.
....total bandwidth is approximately 11.8 Mbytes/second [16], which is a limit imposed by the software overhead of handling messages. My experiments were run on a 128-node CM-5 system that runs version 7.4.0 Final of the CMOST operating system and version 3.3 of the CMMD message passing library [93]. Each CM-5 node contains a SPARC v7 processor that runs at 32 MHz, and 32 Mbytes of physical memory. Each node has a direct-mapped, 64K unified cache [94]. The CM-5 network has several properties that are unusual for multiprocessor networks. First, the interconnection network provides multiple ....
....costs a great deal more. The CMAML library provides a bulk data movement facility called scopy; it uses hard-wired active messages to avoid sending a handler address in data packets. The compiler we used on the CM-5 is gcc 2.6.3, with the optimization level set at -O2. In CMMD terminology [93], the programs are node programs; that is, there is no master program running on the host. The CM-5 was one of the platforms for the research on active messages [95]. Active messages reduce the software overhead for message handling: messages are run as handlers, which consist of user code that ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual, version 3.0 edition, May 1993.
....for network transport, in which case the sending node is blocked until the message can be injected into the network and the handler executes immediately upon the message arriving at the receiving node. We have implemented our algorithms on the CM-5 using CMMD and CMAML (the CMMD active messages layer) [11]. 3. Scheduling Strategy. We assume that each processor knows only its sending vector sendl. The scheduling algorithms can be classified into two groups: (i) algorithms that require the global n × n communication matrix COM at each processor; (ii) algorithms that require the receiving ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual, version 3.0 edition, December 1992.
....al. [12] presents the parallel I/O modes of the machine, while the article by Hillis and Tucker [49] provides a general overview and presents examples of current applications of the CM-5. For details the interested reader can also refer to the CM-5 technical summary [96] and the CMMD library guide [97]. 7.3 The suite of test problems. To assess the relative performance of the three splittings we used MNETGEN [4], a derivative of NETGEN [57], to generate three hundred random multicommodity network problems (MC), one hundred with linear objectives and two hundred with quadratic objective ....
Thinking Machines Corporation, Cambridge, MA. CMMD reference manual, version 3.0, 1993.
....distributed computing opportunities that are currently available to the user community. 1. CM-5: The CM-5 is a MIMD machine with processing nodes connected together using a fat tree. Each processing node has four vector units and a scalar unit. The vector units work in SIMD mode. We have used CMMD [TMC, 1993a], an explicit message passing language, and CM Fortran [TMC, 1993b] to implement the algorithms. 2. CM-200: The CM-200 is a SIMD machine with the interconnection network forming a hypercube. Each node of the hypercube has a VLSI chip with 16 bit serial processing elements and routing hardware. Each ....
....to s subdomains in our partitioning scheme. One processor from each of the s processor groups, except the last one, is selected to compute the state variables on the corresponding subdomain interface. The implementation uses a message passing SPMD paradigm, using the CMMD message-passing library [TMC, 1993a] to communicate between the processing nodes. Each processing node is viewed as an SPMD computer with four vector processors, and nodal CM Fortran is used to program them. This gives us the flexibility and efficiency of the message passing style without sacrificing the programmability of a ....
Thinking Machines Corporation. 1993. CMMD Reference manual. Cambridge, MA.
....is parallelized by letting each node maintain its own copy of k while distributing n and the corresponding columns of the P matrix across the nodes. After every iteration, the nodes communicate their subresults, i.e. partial multiplier sums, to each other using built-in combiner operators [39]. The experimental work is based on the CTI Siemens ECAT 921 PET scanner, which generates 384 × 80 sinograms and 128 × 128 images for arbitrary offsets and zoom factors; the P matrix is designed accordingly. We use both simulated phantom and real scan data. The simulated phantom data ....
Thinking Machines Corporation, Cambridge, MA, CMMD Reference Manual, May
....of the same size (e.g. checkpoint files). The generic format allows a code to re-read a file without the partition size restriction. Finally, the serial order format allows other codes and workstations to access the data. 2.2.2 CMMD Communication Library. The CMMD communication library [30] provides a set of message passing routines for programs written using the message passing programming model. The CMMD library supports data network functions such as synchronous messaging (both the source node and destination node block until the message has been delivered), asynchronous ....
....the address of the handler function on the receiving node; the rest of the packet contains the arguments to be passed to the function. Active message primitives are part of the CMMD active message layer (CMAML), a protocolless transport layer upon which the higher-level CMMD functions are built [30]. There are three tests in this set of active message benchmarks. The first test, active request, measures the latency to send a request active message using the CMAML request primitive; this primitive transmits data using the left data network interface. The second test, active reply, measures ....
[Article contains additional citation context not shown here]
Thinking Machines Corporation. CMMD Reference Manual, Version 3.0 Beta, Dec 1992.
....network for global operations. To keep our study architecturally general, we do not use the vector units, nor do we use the control network. 3.2 Strata. For efficient communication and accurate timing, we made extensive use of the Strata communications library [BB94]. Strata is a CMMD-compatible [Thi93b] package with high performance communication and extensive support for timing and debugging. Section 6 discusses the impact of using Strata's communication primitives. The standard CMMD timers on the CM-5 require kernel calls for each timing event, which cost hundreds of cycles. Strata, however, ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual (Version 3.0), May 1993.
....the reference manual for the Strata communications package. Strata was originally developed for Thinking Machines' CM-5 [TMC92] and also runs on the PROTEUS simulator [BDCW92]. Many of the original goals derived from inadequacies in Thinking Machines' low-level communication software, CMMD [TMC93]. Berkeley's CMAM [vECGS92, LC94] arose from similar considerations, but primarily focused on active messages. The current version of CMMD inherited much from CMAM; similarly, Thinking Machines is currently integrating many of the features of Strata into CMMD. Section 6.5 summarizes the ....
....TAM project ("threaded abstract machine"), which led to the active messages paper [vECGS92] and brought the technique to commercial supercomputers including the CM-5. Berkeley's active message layer for the CM-5, called CMAM, was integrated into CMMD 3.0 by Thinking Machines to produce TMC's CMAML [TMC93]. Strata's active message layer was influenced by both CMAML and PROTEUS message passing code. The original motivation for providing a new active message library was Robert Blumofe's work on better deadlock-free protocols [BBK94] and the importance of using both sides of the network. Guy ....
[Article contains additional citation context not shown here]
Thinking Machines Corporation. CMMD Reference Manual, Version 3.0, May 1993.
....services, current and likely future routing hardware features, and how a typical messaging layer bridges the two. 2.1 Communication Services. Application programs expect messaging layers to provide a minimal set of communication services. Most messaging layers provide the following services [5, 24, 12]: Message Delivery, Message Ordering, Deadlock Overflow Safety, Reliable Delivery. First, the most basic communication service is message delivery. Second, messages between a particular sender and receiver should be delivered in order of transmission. There is some debate about ....
Thinking Machines Corporation, Cambridge, Massachusetts. CMMD Reference Manual, V3.0, May 1993.
....OR operation on a single bit from all nodes and then returns the result to each node. All nodes are assumed to participate in Control Network tasks unless they explicitly abstain (set an abstain flag). Nodes may change their abstain bits only when the network is quiescent. 2.4 CMMD. CMMD [11, 12] is the CM-5 message passing library designed for node-level interprocessor communications. It supports the host/node programming model. In this model, the initiator program, or host code, runs on the partition manager (the front end). During initial setup, the partition manager broadcasts the ....
Thinking Machine Corp. The CMMD Reference Manual, 1.1 edition, January 1992.
....with the computations. 1.2 Objective of the Thesis. In this thesis the possibilities to communicate efficiently on a CM-5 parallel computer are investigated. This is done by evaluating the message passing features of the CM Message Passing Library (CMMD) supplied by Thinking Machines Corporation [12]. In particular, the possibilities to overlap communication and computation are evaluated. Then an ANN algorithm, the Real Time Recurrent Learning (RTRL) algorithm, is implemented on the CM-5. This implementation is written in the C language and the CMMD features are used for message passing among the ....
Thinking Machines Corporation. CMMD Reference Manual, May 1993.
.... , AC with CMMD). The following subsections give a brief description of the software packages which were used to rate the performance of the CM-5, and explain why the choice of that respective platform was made. 2.1 CMMD 3.1. CMMD 3.1 is the CM-5 message passing library for MIMD code [12]. Code is written as though it will be executed on an arbitrary processing node, and all synchronization and message passing between the nodes are explicitly stated via CMMD library function calls. We found this package to be useful when testing the communications performance between nodes in the ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual, 3.0 edition, May 1993.
....parallel code was developed on the 32-node CM-5 installed at the Northeast Parallel Architectures Center using the message passing library CMMD. Since the code is written in FORTRAN 77, the four vector units (VUs) on each node may be accessed only by writing non-portable assembly language code [13]. CMMD also provides Virtual Channels and an Active Message Interface which considerably reduce the message latency. However, in keeping with the main design consideration of constructing portable code, none of the above functionalities were incorporated. The details of the parallel implementation ....
Thinking Machines Corporation, CMMD Reference Manual (1993).
....services, current and likely future routing hardware features, and how a typical messaging layer bridges the two. 2.1 Communication services. Application programs typically expect a basic set of communication services from messaging layers. Most messaging layers provide the following services [3, 25, 11, 10]: 1. Message Delivery 2. Message Ordering 3. Deadlock Overflow Safety 4. Reliable Delivery. First, the most basic service is data movement from the sender to the receiver. Second, messages between a particular sender and receiver should be delivered in order of transmission. Although not ....
Thinking Machines Corporation. CMMD Reference Manual, V3.0, May 1993.
....best solution and termination conditions. Split-C facilitates easy coding of the synchronization problem which obtains its data via message passing, while allowing the data for the subproblems to be physically distributed across the processors. Much of this can also be carried out using CMMD [18], the message passing library of the CM-5. However, Split-C enables the code to be written in a more readily portable manner. The current implementation uses MINOS 5.4, a newer version of [14] to solve both the parallel subproblems and the synchronization problem. MINOS uses a quasi-Newton ....
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual, Version 3.0, 1993.
....messages were sent and received. As an extreme example, if all messages are sent before any is received, a large machine will simply crash when the number of virtual channels has been exhausted. In the CMMD message passing library (version 3.0), each outstanding send requires a virtual channel [51] and the number of channels is limited. Instead, we used a protocol which alternates sends with receives (Figure 3.6). The problem is thus reduced to ordering the messages to be sent. For example, sending messages in order of increasing destination address gives low throughput since virtual ....
....results and compares them with related work. Our platform is the Connection Machine CM-5 with SPARC vector units [49]. Each processing node has 32 Mbytes of memory and can perform floating point operations at a peak rate of 128 Mflop/s [52]. We use the CMMD communication library (version 3.0) [51]. The vector units are programmed in CDPEAC, which provides an interface between C and the DPEAC assembly language for vector units [50]. The rest of the program is written in C. The experiments sketched here included three input distributions: the uniform distribution, the Plummer distribution [1] ....
Thinking Machine Corporation. CMMD Reference Manual, 1993.
....means comprehensive; however, it serves our purpose in the sense that it provides us with a list of services that are required to support HPDC. The software tools studied include EXPRESS [5], PICL [3], PVM [6], ISIS [2], the iPSC communication library [4], and the CM5 communication library (CMMD) [7]. These tools were selected because of their availability at the Northeast Parallel Architecture Center at Syracuse University and also the following two reasons: (1) they support most potential computing environments, i.e. parallel, homogeneous and heterogeneous distributed systems; and (2) they ....
Thinking Machines Corporation, Massachusetts, Cambridge. Cmmd reference manual, version 1.1. 1992.
....loop invariant, the compiler could not move them out of the loop. 4.2. Preliminary library performance. We implemented our data parallel library on Thinking Machine Corp. CM-5, and compared its performance to the code generated by the C compiler. The library uses the message passing library CMMD [8], and executes in an SPMD model. All processors begin execution and work independently until the synchronization points at communication operations. The library supports general block cyclic distribution, but for comparison purposes, block distribution is used as the C compiler. Since the C ....
Thinking Machines Corporation. CMMD Reference Manual, April 1995.
....solutions (and very often provably optimal ones). We implemented our algorithm in C on a Thinking Machines CM-5 multiprocessor [Thi91] with 2 partitions of 32 SPARC processors each. To coordinate the processors we used the CMMD v.3.1 message passing library provided by Thinking Machines Inc. [Thi93]. The communication overhead of our Genetic Algorithm is very low, as the program spends less than 3% of total time in communications between processors. We have also tried to solve some of these problems using the GRASP code of Pardalos, Resende et al. This GRASP code uses a Branch and Bound ....
Thinking Machines Corporation. CMMD Reference Manual, May 1993.
....research [CSBS95], [BCL 95], [CS95] and researchers with a clear message passing bias. [Footnote 1: For example, one of the original developers of the CMMD message passing library [Thi93b] on the CM5.] Second, we perform all experiments on the same hardware. The MIT Alewife multiprocessor provides a uniquely integrated environment for comparing cache coherent distributed shared memory, message passing, and DMA. 1.5 Data Transfer. Even when an application receives no benefit from caching, cache coherent shared memory hardware is still an extremely efficient communication ....
....our iterative applications to favor message passing because they had little data reuse. However, we found shared memory to be an extremely efficient communication mechanism even without the benefits of caching. [Footnote 1: For example, one of the original developers of the CMMD message passing library [Thi93b] on the CM5.] Although we expected clear gains from the superior synchronization semantics of message passing with active messages [E 92], we were able to overcome the synchronization difficulties of shared memory by shifting from an ....
[Article contains additional citation context not shown here]
Thinking Machines Corporation, Cambridge, MA. CMMD Reference Manual (Version 3.0), May 1993.
....of active message send provided by the Strata [13] communications library, SendBothLRPollBoth. In our previous study [1] we found that using SendBothLRPollBoth, which polls on every message injection, keeps the network clear of congestion and results in superior performance to TMC's CMAML [14] functions. Differences were as high as a factor of three in communications overhead and ranged from 25 to 30 percent in overall applications performance. 7 Results. Our results demonstrate that, using partitioned inverses, speedups are possible where substitution would only produce slowdowns. ....
Thinking Machines Corporation, Cambridge, MA, CMMD Reference Manual (Version 3.0), May 1993.
No context found.
Thinking Machines Corporation. CMMD Reference Manual, December 1992.
No context found.
Thinking Machines Corporation. CMMD Reference Manual. Cambridge, MA, Feb. 1993.
No context found.
Thinking Machines Corporation, Massachusetts, Cambridge. Cmmd reference manual, version 1.1. 1992.