33 citations found.
Paul Pierce. The NX message passing interface. Parallel Computing 20(4), April 1994, pp. 463-80.

This paper is cited in the following contexts:
Relaxed Synchronization Message Passing - Alpert, Philbin

....others in syntax and semantics. The section following that describes the API of cBSP, a RSMP library designed, developed, tested, and evaluated for this research. 2.2 TMP and RSMP APIs 2.2.1 A Traditional Message Passing API The TMP API used in this work is that of NX/2 [Pie88, PR94, Pie94a, Pie94b]. The functionality of the communication calls in NX/2 shares its behavior and semantics with those of other popular message passing interfaces, including MPI [GLS94]. Those used for this research are • csend(tag, buf, len, dest, PID) sends a message of size len, from address buf, with tag tag ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4), April 1994.


Relaxed Synchronization Message Passing - Alpert, Philbin

....data structures without risking program correctness [AG95]. Somewhat analogous techniques applied to the message-passing model produce RSMP. Historical approaches to message passing library design testify to the popularity of single-message, point-to-point communication [GBD+94, GLS94, Pie94a]. RSMP takes a different approach. To reduce aggregate message overhead, synchronization constraints are relaxed, widening the time window during which communication can occur, permitting messages to be coalesced. Relaxing synchronization constraints, coalescing messages, eliminating the need for ....

....and others in syntax and semantics. The section following that describes the API of cBSP, a RSMP library designed, developed, tested, and evaluated for this research. 2.2 TMP and RSMP APIs 2.2.1 A Traditional Message Passing API The TMP API used in this work is that of NX/2 [Pie88, PR94, Pie94a, Pie94b]. The functionality of the communication calls in NX/2 shares its behavior and semantics with those of other popular message passing interfaces, including MPI [GLS94]. Those used for this research are • csend(tag, buf, len, dest, PID) sends a message of size len, from address buf, with ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4), April 1994.


Ornl/tm-13682 - Computer Science And

....or default MPI communication protocol, as it is what we would expect to be optimal knowing nothing else about the platform. There are many other MPI communication commands that can be used to implement SWAP and SENDRECV. Our choices are primarily historical, reflecting the capabilities of the NX [11] communication library more than MPI. However, little is actually being ignored. The communication patterns, message sizes, and buffer addresses vary throughout the code, and the MPI persistent communication requests are not appropriate for this code. The MPI synchronous commands are also unlikely ....

P. Pierce, The NX message passing interface, Parallel Computing, 20 (1994), pp. 463--480.


A Structured Approach to Parallel Programming - Massingill (1998)   (1 citation)

....and support for reduction operations and file input/output. We have developed for this archetype an implementation consisting of a code skeleton and an archetype-specific library of communication routines, with versions based on Fortran M [40], Fortran with p4 [17], and Fortran with NX [60]. The implementation is described in detail in [57]; it has been used to run applications on the IBM SP, the Intel Delta, the Intel Paragon, and a network of Sun workstations. 7.3 Applications This section discusses the second phase of the archetypes-related experimental work, in which we used ....

P. Pierce. The NX message-passing interface. Parallel Computing, 20(4):463--480, 1994.


The Cranium Network Interface Architecture: Support for Message.. - McKenzie (1997)

....we state the thesis of this dissertation which concerns the architecture and implementation of a network interface for an MPP computer system. 1.3 The role of the network interface in message passing In general, scalable parallel computers use a style of communication known as message passing [7, 8, 9]. The invocation of each message is stated explicitly in the program, by means of send and receive commands. A simple message consists of a block of data, whose size may range from a single bit to millions of bits. A simple message is unidirectional and asynchronous. After the sender ....

....In Chapter 3 we explain the difficulty of interfacing processors to adaptive routers, through the use of a specific design example. We then specify the Cranium architecture. Chapter 4 describes the software interface for Cranium, and compares it with other message passing systems such as Intel NX [7]. Chapter 5 describes a simulation environment that was developed to evaluate the performance of Cranium, based on the Talisman processor simulator [27] and the Chaos router [15]. Chapter 6 characterizes the performance of Cranium. We begin with an analysis of the basic latency and throughput ....

[Article contains additional citation context not shown here]

Paul Pierce. The NX message passing interface. Parallel Computing 20(4), April 1994, pp. 463-80.


cBSP: Zero-Cost Synchronization in a Modified BSP Model - Alpert, Philbin (1997)   (4 citations)

.... based on specific types of application programming interfaces (APIs). These APIs have been designed around experience and need, integrating techniques of concurrent programming (e.g. Mutual Exclusion, Monitors, Threads) and interprocessor communications (e.g. PVM, RPC, NX, MPI) [BDG+91, BN84, Pie94, SOHL+95]. Because these APIs are not models of computation, it is difficult to use them to study or to predict the behavior of parallel algorithms running on parallel machines. Recent efforts such as the LogP model [CKP+92] can be used to study the behavior of message passing programs ....

....time, each process allocates a region of memory for messages from each other process. Because message buffers are completely cleared at the end of each superstep, there is no need for a buffer to have a strict, regular format. This contrasts with the VMMC implementation of NX/2 [ADFL96, Pie94]. Because NX messages can be consumed out of order, and because there is no time at which it is known that a message buffer will be completely empty, NX messages are sent to clearly defined slots in the message buffer. In cBSP, messages of different sizes fill the buffer space as needed with ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4), April 1994.


Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert, Chen.. (1998)   (12 citations)

....headers. Finally, at the applications level, our software evaluations draw on prior work on several programming models. The shared virtual memory used here relates to a significant body of prior SVM research [16, 30, 33, 47]. We also leverage off of the NX model for message passing programs [38]. 6 Conclusions We constructed a 16-node prototype SHRIMP system and experimented with applications using various high-level APIs. We found that the SHRIMP multicomputer performs quite well for applications that do not perform very well with traditional network interfaces. Using applications ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4), April 1994.


Parallel CG-Methods - Automatically Optimized For.. - Eisenbiegler..

....the IBM SP1 machine [15]. All of those machines work with precisely the modelled running times, in practice. However, the LogP model ignores that in most parallel architectures long messages can be transmitted with a significantly higher bandwidth than short messages (e.g. IBM SP-2 [1], Paragon [17]). This discrepancy is even more true for clusters of workstations and PCs because the costly overhead for communication may be saved. Therefore, it is not possible to predict the execution time of a program using a wide range of message sizes or to predict the execution times of different ....

P. Pierce, The NX message passing interface, Parallel Computing, 20(4), 1994, pp. 463--480.


A System Software Architecture for High-End Computing - Greenberg, Brightwell.. (1997)   (8 citations)

....address space. 5.2 Portals as organizers Portals are more than just a mechanism for moving data. They represent a way for applications to manage the flow of data. Rather than just supplying buffers to smooth out temporal or speed mismatches, the portal organizes data. Many systems supply handlers [25, 30] which can be associated with certain message events such as message arrival. A completely general handler system will, of course, allow the data to be organized. However, issues of deadlock and interleaving of handler instructions with network interface instructions can be quite complex and ....

P. Pierce. The NX message passing interface. Parallel Computing, 1993.


Chapter in Wiley Encyclopedia of Electrical and.. - Dongarra, Fagg.. (1999)

.... certain types of calculations), lack of message identification and filtering at the receiver's end (only one type compared to up to three used on later systems), and initially a high software overhead compared to simpler protocols such as active messages which had direct access to hardware [29]. 4.1.4 The German SUPRENUM project SUPRENUM was a German project to develop supercomputing expertise in Europe by developing a supercomputer and the required software infrastructure. Although only five machines were delivered, this 5 Gflop MPP system and the follow-up ESPRIT GENESIS project led ....

P. Pierce, The NX message passing interface, Parallel Computing 20(4) (1994) 463--480.


Experimental Validation of Parallel Computation Models on the.. - Juurlink (1998)   (2 citations)

.... wormhole routing, and the routing algorithm first sends messages in the horizontal direction to their destination column and then in the vertical direction to their destinations (dimension-ordered routing). We implemented several simple benchmark programs using the NX message passing library [Pie94]. NX is the programming interface supplied by Intel. All performance figures have been obtained using release level 1.4 of the OSF/1 operating system. A technical detail is that a non-blocking receive must be posted before the message actually arrives, or otherwise the OS must store the message ....

P. Pierce. The NX message passing interface. Parallel Computing, 20:463--480, 1994.


Design Choices in the SHRIMP System: An Empirical Study - Matthias Blumrich (1998)   (12 citations)

....headers. Finally, at the applications level, our software evaluations draw on prior work on several programming models. The shared virtual memory used here relates to a significant body of prior SVM research [16, 31, 34, 47]. We also leverage off of the NX model for message passing programs [39]. 6 Conclusions We constructed a 16-node prototype SHRIMP system and experimented with applications using various high-level APIs. We found that the SHRIMP multicomputer performs quite well for applications that do not perform very well with traditional network interfaces. Using applications built ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4), April 1994.


An Overview of Message Passing Environments - McBryan (1994)   (31 citations)

....calls are supported at the host. These features were fixed in NX2 which provides full symmetry between host and node operations. 3.4. The Intel iPSC2 Computer 3.4.1. Intel iPSC2 Programming: NX2 NX2 is the current Intel operating system for the iPSC/860 and Paragon computers [4,5], see Table 5. The NX2 operating system introduced some small changes to the NX1 system, and added extra capabilities such as interrupt-driven communication. Again NX2 represents each processor by an integer node number in the range [0, P-1], but now the host is denoted by the integer P. As in ....

P. Pierce, "The NX Message Passing Interface", later in this volume.


Simulative And Experimental Analysis Of Communication.. - Roger Butenuth

....at the destination node without the need of software interaction on intermediate nodes (hardware routing). In contrast to distributed systems, the network of parallel computers is usually regular. Popular topologies are meshes of various degrees. The Intel Paragon uses a 2-dimensional grid (Pierce 1994b) whereas the Cray T3D uses a 3-dimensional torus (Adams 1995). Other possibilities are multistage networks, which can be found in the IBM SP-2 (Bala 1994). The probability of network errors is often in the range of memory errors and therefore usually ignored in parallel computers. Different ....

P. Pierce. 1994a. "The NX message passing interface." Parallel Computing, Pages 463-480, Vol. 20, 1994.


A Tool for Distributed Application Development Based.. - Antoniol, Fiutem..

....differ in the way in which services are offered. Among these, two main categories can be identified: • proprietary environments, provided together with MPP computers. Some examples are: Caltech's CROS [1], IBM's External User Interface (EUI) [2], Meiko's Computing Surface (CS) [4], Intel's NX [3], and CMMD for the Connection Machine CM-5 [5]. Although the services offered are actually the same, the environments are incompatible; • portability environments, which implement an MPE specification on different hardware platforms often exploiting native MP environments. Among these, some ....

P. Pierce, The NX message passing interface, Parallel Computing (1994) 463-480.


Implementing Communication Latency Hiding in High-Latency.. - Volker Strumpen (1995)   (2 citations)

....program send a message to each other before invoking the corresponding receive operation. If both messages are larger than the capacities of the send and receive buffers, both processes block in their send calls: the processes deadlock. Interfaces like Intel's NX message passing system [4] or IBM external user interface [2] leave the problem of deadlock avoidance with the programmer. We solve this problem by fragmenting messages on top of the transport layer into sizes not bigger than the send or receive buffers, and reading pending data in a receive buffer before writing. This ....

....has arrived, and the buffer contains the expected data. The idea of splitting the send and receive primitives into two calls is not new. It has been introduced to utilize hardware support for latency hiding. Intel's NX asynchronous message passing primitives isend, irecv and msgwait are an example [4]. 3 Experimental Results To illustrate the benefit of our latency hiding implementation, we present experiments performed on two processor configurations. These experiments investigate the dependency of gain G on granularity. An explicit finite difference solver based on a five-point stencil, has ....

P. Pierce. The NX Message Passing Interface. Parallel Computing, 20(4):463--480, 1994.
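
The deadlock described in the first excerpt above, and the fragmentation remedy, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not code from the cited paper or from the NX library: the function `exchange`, its parameters, and the round-robin scheduling are hypothetical, and each bounded receive buffer is modeled as a plain byte counter.

```python
def exchange(msg_len, buf_cap, fragment):
    """Simulate two processes that each send msg_len bytes to the other.

    Hypothetical model: each process owns one receive buffer of buf_cap bytes.
    fragment=False: a process reads only after its whole send has completed.
    fragment=True:  a process drains pending input before every partial write.
    Returns True if both messages are delivered, False if no progress is possible.
    """
    to_send = [msg_len, msg_len]  # bytes each process still has to write
    inbox = [0, 0]                # bytes waiting in each process's receive buffer
    received = [0, 0]             # bytes each process has consumed so far

    while received != [msg_len, msg_len]:
        progress = False
        for me in (0, 1):
            peer = 1 - me
            # Reading is allowed always when fragmenting, otherwise only
            # once the blocking send has finished.
            if (fragment or to_send[me] == 0) and inbox[me] > 0:
                received[me] += inbox[me]
                inbox[me] = 0
                progress = True
            # Write as many bytes as currently fit into the peer's buffer.
            if to_send[me] > 0:
                chunk = min(to_send[me], buf_cap - inbox[peer])
                if chunk > 0:
                    to_send[me] -= chunk
                    inbox[peer] += chunk
                    progress = True
        if not progress:
            return False  # both processes are stuck in their send calls
    return True


if __name__ == "__main__":
    print("blocking sends, large messages:", exchange(10, 4, fragment=False))
    print("blocking sends, small messages:", exchange(3, 4, fragment=False))
    print("fragmented sends, large messages:", exchange(10, 4, fragment=True))
```

In this toy model the unfragmented exchange stalls as soon as both buffers are full, because neither side reaches its receive; reading pending data before each write keeps buffer space available.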


Generalized Subspace Correction Methods For Parallel.. - Kolm, Arbenz, Gander (1995)   (5 citations)

....j and resumes program execution as soon as the message has left the application space. recv(buf) blocks the node application until a message arrives in the memory buffer. These two routines are similar to the blocking send and receive functions provided by Intel's NX message passing interface [27]. For simplicity we assume that p processors are available and the system matrix, A, is split into p overlapping parts, (A_i), i = 1, ..., p, according to a chosen regular block decomposition. The vector x representing the approximate solution is distributed in an overlapped fashion such that the ....

....on a single or a few processors without swapping data to the slow disk memory. Clearly, disk usage should be avoided for the sake of efficiency. All the computations reported here are done with a C implementation of the RPSC algorithm using message passing primitives from Intel's NX library [27]. The codes are run with IEEE arithmetic and compiled with the compiler switch -O3, making the compiler do basic scalar and pipelining optimizations where possible. We use a slightly modified definition of speedup and efficiency of base p_0: S_{p_0}(p) = T(p_0) · p_0 / T(p), E_{p_0}(p) = S ....

P. Pierce, The NX message passing interface, Parallel Comput., 20 (1994), pp. 463--480.


Software-Based Communication Latency Hiding for Commodity.. - Strumpen (1996)   (2 citations)

.... Multithreading is now widely accepted as a means for hiding latencies, not only for memory load operations but also for hiding interprocessor communication latencies [5, 7, 12, 13, 15]. For multicomputers, multithreading and asynchronous message passing, such as offered by Intel's NX interface [22], are widely used latency hiding mechanisms. Nevertheless, they require careful structuring of parallel programs. With multithreading the programmer must identify independent threads of control that can be executed concurrently. With asynchronous message passing, a nonblocking send or recv ....

P. Pierce. The NX Message Passing Interface. Parallel Computing, 20(4):463--480, April 1994.


Integrating Task and Data Parallelism with the Group.. - Mani Chandy (1995)   (2 citations)

....is based on a specialized version of the group communication archetype and consists of developing an archetype implementation (code library and program skeleton) plus one or more applications that make use of it. The programs described here are written in either Fortran M [11] or Fortran with NX [15], but we have also developed archetype-based programs in Fortran with p4 [4], Fortran with PVM [17], and CC++ [6]. 8.1 Spectral methods archetype In this archetype, described in the example in §7.1, data is a two-dimensional array, and the computation consists of a sequence of row operations ....

P. Pierce. The NX message-passing interface. Parallel Computing, 20(4):463--480, 1994.


LogGP: Incorporating Long Messages into the LogP.. - Alexandrov.. (1995)   (127 citations)

.... accurately predict communication performance when only fixed-sized short messages are sent [CKP+93, CDMS94, LC94]. However, many existing parallel machines have special support for long messages which provide a much higher bandwidth than short messages (e.g. IBM SP-2 [BBB+94], Paragon [Pie94], Meiko CS-2 [BCM94], Ncube/2 [SV94]). Even low overhead communication architectures, such as the generic active message specification [CKL+94], support bulk transfers. The LogP model only deals with short messages and does not adequately model machines with support for long messages. Our ....

P. Pierce. The NX message passing interface. Parallel Computing, 20(4), April 1994.
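
The gap between short-message and long-message modeling noted in the excerpt above can be made concrete with the cost expressions commonly given for the two models. The sketch below is illustrative only: it assumes the usual statements of the models (a train of n short messages costs o + (n-1)·max(g, o) + L + o under LogP, and one k-byte message costs o + (k-1)·G + L + o under LogGP), and the function names, the short-message size `w`, and all parameter values are hypothetical rather than measurements from any machine.

```python
import math


def logp_time(k, w, L, o, g):
    """Time to move k bytes as a train of w-byte short messages under LogP.

    L = latency, o = per-message overhead, g = gap between messages.
    """
    n = math.ceil(k / w)  # number of short messages needed
    return o + (n - 1) * max(g, o) + L + o


def loggp_time(k, L, o, G):
    """Time to move k bytes as one long message under LogGP.

    G = gap per byte, i.e. the reciprocal of long-message bandwidth.
    """
    return o + (k - 1) * G + L + o


if __name__ == "__main__":
    # Hypothetical parameters, in arbitrary time units.
    k, w, L, o, g, G = 1024, 8, 10, 2, 4, 0.1
    print("LogP estimate: ", logp_time(k, w, L, o, g))
    print("LogGP estimate:", loggp_time(k, L, o, G))
```

With a per-byte gap G much smaller than g/w, the single long message is predicted to finish well before the equivalent train of short messages, which is the effect the excerpt says LogP alone does not capture.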


Small, Scalable, and Efficient, Microkernels for Highly.. - Butenuth, Heiss

....load code used by more than one program only once and therefore saves memory. In the example, the segment number 3 might be one of these code segments. 6. Communication The idea of minimal processes discourages the addressing of messages to processes, as done in many message passing environments [1, 11, 13]. Instead, there are independent communication objects, called channels. A channel is an object with operations send and receive that can exist independently from any process. A communication takes place when both sender and receiver have executed their respective operations. If the sender arrives ....

P. Pierce. The NX Message Passing Interface, Parallel Computing, vol. 20, pp. 463-480, 1994.


Latency-driven Programming of Computer Networks - Strumpen (1995)

....send and receive operations to register the message with the runtime layer, and the corresponding synchronization routine that blocks a thread until the requested transfer is finished. This interface is similar to the isend, irecv, and msgwait routines of Intel's NX message passing interface [18]. The (ANSI C) syntax of the asynchronous message passing operations is as follows: int send(int pid, int sno, void (*mf)(margvt), margvt ma); int recv(int pid, int sno, void (*mf)(margvt), margvt ma); int msync(int mid); Both the send and recv operation return a unique message identifier, ....

....with the latency hiding capability of a specialized parallel machine, we present some experimental data, obtained from an Intel Paragon system. This multicomputer employs an additional communication processor per node in order to enhance the communication capabilities. Intel's NX programming model [18] offers asynchronous communication by means of the isend, irecv, and msgwait routines. Since no user-level continuations are available to run a benchmark similar to the one above, a different experiment was designed: Consider a fixed communication volume and a varying amount of computation, which is ....

P. Pierce. The NX Message Passing Interface. Parallel Computing, 20(4):463--480, April 1994.
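
The split-phase pattern these excerpts describe (start a transfer, keep computing, then wait on the returned identifier) can be sketched in a few lines. This is a loose analogy built on Python's standard library, not the NX API: the names `isend`, `irecv`, and `msgwait` are borrowed only for readability, their signatures here are hypothetical, and an in-process queue stands in for the network.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: a thread pool plays the communication processor,
# and a queue plays the link between sender and receiver.
_pool = ThreadPoolExecutor(max_workers=4)
_channel = queue.Queue()


def isend(payload):
    """Start a send and return a handle immediately."""
    return _pool.submit(_channel.put, payload)


def irecv():
    """Post a receive and return a handle immediately."""
    return _pool.submit(_channel.get)


def msgwait(handle):
    """Block until the operation behind the handle has finished."""
    return handle.result()


def overlapped_exchange(payload, work):
    """Overlap a transfer with local computation."""
    recv_id = irecv()         # post the receive before the message arrives
    send_id = isend(payload)  # start the send without blocking
    local = work()            # useful computation while the transfer proceeds
    msgwait(send_id)
    return msgwait(recv_id), local


if __name__ == "__main__":
    data, local = overlapped_exchange([1, 2, 3], lambda: sum(range(10)))
    print(data, local)
```

Posting the receive first mirrors the advice quoted elsewhere on this page that a non-blocking receive should be in place before the message arrives.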


Cranium: An Interface for Message Passing on Adaptive Packet.. - Mckenzie (1994)   (10 citations)

....improve program portability and readability, compared with using the specific message passing primitives provided by the architecture. We now show a simple example of run time support for Cranium using the semantics of csend() and crecv(), the most basic method of communication under NX [8]. Figure 4 illustrates how the NX commands are implemented in terms of Cranium commands, for the case of medium length messages (between 32 and 32K bytes). In the figure, italics represent pseudocode, and roman type represents NI commands. In this example, X is the source node and Y is the ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing 20(4), April 1994, pp. 463-480.


Communication Latency Hiding - Model and. . . - Strumpen (1994)

....does not suffice to exchange messages in a safe way, because deadlocks may occur due to bounded buffer implementations. The solution to this problem is left to the programmer. But not only with the socket interface, also IBM's external user interface [5] and Intel's NX message passing interface [18] burden the programmer with deadlock avoidance. Furthermore, message transfer might be inefficient due to protocol implementation details. The design decisions of the protocol, presented in this report, are partially based on empirical studies to ensure highest possible performance in a portable ....

....TCP receive buffer. As mentioned above, if both messages are larger than the capacities of the corresponding send and receive buffers, both processes block in their send calls: the processes deadlock. Pierce discusses this, and a more complicated example, for the Intel NX message passing system [18]. Bala et al. [5] describe how to write a safe program with the IBM external user interface. However, in both systems, deadlocks can occur, and the programmer is responsible for deadlock avoidance. For UNIX stream sockets, the default send and receive buffer size is 4 Kbyte. Although ....

[Article contains additional citation context not shown here]

Pierce P., "The NX Message Passing Interface," Parallel Computing 20(4), 1994, 463--480.


On the Use and Performance of Explicit Communication Primitives.. - Qin, Baer (1996)   (2 citations)

....issues, we have concentrated on the latter, imposing a global coherence strategy for prefetching and bulk data transfers. The communication primitives that instruct the system to perform efficient data transfers resemble the asynchronous send/receive operations in message passing interfaces [4, 24], prefetching [23, 6, 14] and poststore [19] commands, non-blocking (bulk) read (get) and write (put) operations in the split-phase assignment statement of Split-C [9], and explicit communication mechanisms [26]. The common idea is the overlap of communication with computation. The differences ....

P. Pierce. The NX message passing interface. Parallel Computing, 20(4):463--480, April 1994.


A Users' Guide To Pstswm - Worley, Toonen (1995)

....distribution of PSTSWM, the PVM-only version used in the ParkBench v1.0 suite of benchmark codes, and the MPI-only version developed for inclusion in the next generation of the ParkBench suite. The PICL message passing model is fairly rich, representing a substantial subset of both the NX [12] and MPI [2] low-level primitives, and has been adequate for obtaining efficient message passing performance on most current message passing systems. But PICL message passing is not as efficient as the native commands on all systems. For example, the SHMEM remote read/write commands are ....

P. Pierce, The NX message passing interface, Parallel Computing, 20 (1994), pp. 463--480.


Generalized Subspace Correction Methods (Extended Abstract) - Kolm, Arbenz, Gander

....constraints. The result then follows from Theorem 3.2. 4. Numerical Results. We restrict the experiments to the PSC method due to its natural parallelism. The implementation is done in C for a 96-node Intel Paragon XP/S5 at ETH Zurich using message passing primitives from Intel's NX library [8]. As model problem we consider the elliptic partial differential equation −Δu 96 x u x y u y = g, (x, y) ∈ (0, 1) × (0, 1) (3) with Dirichlet boundary conditions. The differential equation is discretized over a 300 × 300 grid using centered differences for ....

P. Pierce, The NX message passing interface, Parallel Computing, 20 (1994), pp. 463--480.


LogGP: Incorporating Long Messages into the LogP.. - Alexandrov.. (1995)   (127 citations)

.... accurately predict communication performance when only fixed-sized short messages are sent [CKP+93, CDMS93, LC94]. However, many existing parallel machines have special support for long messages which provide a much higher bandwidth than short messages (e.g. IBM SP-2 [BBB+94], Paragon [Pie94], Meiko CS-2 [BCM94], Ncube/2 [SV94]). Even low overhead communication architectures, such as the generic active message specification [CKL+94], support bulk transfers. The LogP model only deals with short messages and does not adequately model machines with support for long messages. Our ....

P. Pierce. The NX message passing interface. Parallel Computing, 20(4), April 1994.


The CLAM Approach to Multithreaded Communication on.. - Gomez, Mascarenhas, Rego (1996)

.... within communication services layered upon TCP/IP or UDP protocols (such as in PVM [1], P4 [2], MPI [3], for example, which execute on workstation clusters) provided by the OS kernel, or upon specialized communication facilities tied to a given hardware environment (e.g. Intel NX library [4], IBM EUI [5, 6]). Though this model has proven adequate for many distributed applications with known or well structured process interactions, there is a recognized need for improved communicability in situations where process interactions are either unstructured or unpredictable or both. To ....

Paul Pierce. The NX Message Passing Interface. Parallel Computing, 20(4):463--480, April 1994.


Merl -- A Mitsubishi Electric Research Laboratory - Http Www Merl (1998)

No context found.

Paul Pierce. The NX message passing interface. Parallel Computing 20(4), April 1994, pp. 463-80.


Practical Parallel Divide-and-Conquer Algorithms - Hardwick (1997)   (1 citation)

No context found.

P. Pierce. The NX message passing interface. Parallel Computing, 20(4):463--480, April 1994.


COSY - An Operating System for Highly Parallel Computers - Burke, Heiß (1996)   (2 citations)

No context found.

P. Pierce: "The NX Message Passing Interface", Parallel Computing, vol. 20, pp. 463-480, 1994.

CiteSeer.IST - Copyright Penn State and NEC