R. Butler and E. Lusk, "User's guide to the p4 parallel programming system," Tech. Rep. ANL-92/17, Argonne National Laboratory, Oct. 1992. Version 1.4.
.... This encompasses theoretical models such as CSP[2], CCS[3] and the Actor model[4]; programming languages such as Occam[5], LOTOS[6] and various flavors of concurrent object-oriented and object-based languages; remote procedure call schemes; and programming tools such as PVM[7], MPI[8], P4[9], PARMACS[10], etc. Consider the following simple example of a concurrent application where the two active entities (i.e., processes) p and q must cooperate with each other. The process p at some point produces two values which it must pass on to q. The process q, in turn, must perform some ....
R. Butler and E. Lusk, "User's guide to the p4 parallel programming system," Tech. Rep. ANL-92/17, Argonne National Laboratory, Oct. 1992. Version 1.4.
....overlapping communication and computation which produced only a marginal improvement of 6 . The communication overhead of the network architecture could be reduced using a better networking technology such as ATM (Asynchronous Transfer Mode) and/or a low-latency message-passing library such as p4 [Butler et al. 1992]. These improvements, coupled with a larger problem size, could make the parallel performance of the CG scheme on the workstation network more attractive. Execution times for Domain Decomposition on a single processor as a function of the number of subdomains are shown in Figure 11. The minimum ....
Butler, R. and Lusk, E. 1992. User's guide to the p4 parallel programming system. Technical Report ANL-92/17. Argonne National Laboratory.
....share this characteristic. This encompasses theoretical models such as CSP[1, 2], CCS[3] and the Actor model[12]; programming languages such as Occam[7], LOTOS[8] and various flavors of concurrent object-oriented languages; and concurrent programming tools such as PVM[13], MPI[14, 15], P4[16], PARMACS[17], etc. Consider the following simple example of a concurrent application where the two active entities (i.e., processes) p and q must cooperate with each other. The process p at some point produces two values which it must pass on to q. The process q, in turn, must perform some ....
R. Butler and E. Lusk, "User's guide to the p4 parallel programming system," Tech. Rep. ANL-92/17, Argonne National Laboratory, Oct. 1992. Version 1.4.
....on the UDP protocol. Tables 4.1 and 4.2 show communication round-trip latency and bandwidth on an IP network for the APA and several workstation cluster message packages [91]. TCGMSG [92] uses static TCP sockets, which are created at the time the program is started. In the configuration tested, p4 [93] and PVM [14] both use dynamic TCP sockets, which are created on demand at first communication. The application under test is a simple program that sends a packet to a single other node, which immediately sends the packet back. This procedure was repeated 100 times, and the mean, variance, minimum, ....
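The round-trip test described in the excerpt above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the cited authors' benchmark: it uses plain TCP sockets on the loopback interface instead of p4, PVM, or TCGMSG, and the helper names (`echo_once`, `recv_exact`, `round_trip_times`, `summarize`) and the 64-byte packet size are hypothetical. Only the statistics the excerpt names before it is cut off (mean, variance, minimum) are reported.

```python
import socket
import statistics
import threading
import time


def echo_once(server_sock):
    """Accept one connection and send every received chunk straight back."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # peer closed the connection
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes, since TCP may deliver a packet in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before full reply arrived")
        buf.extend(chunk)
    return bytes(buf)


def round_trip_times(packet_size=64, repeats=100):
    """Time `repeats` send-and-echo round trips; returns seconds per trip.

    Assumption: both endpoints run on 127.0.0.1, so the numbers reflect
    local socket overhead, not a real workstation network.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    echo_thread = threading.Thread(target=echo_once, args=(server,), daemon=True)
    echo_thread.start()

    payload = b"x" * packet_size
    samples = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        # Disable Nagle's algorithm so small packets are sent immediately.
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(repeats):
            start = time.perf_counter()
            client.sendall(payload)
            reply = recv_exact(client, packet_size)
            samples.append(time.perf_counter() - start)
            if reply != payload:
                raise RuntimeError("echoed packet does not match")

    echo_thread.join()
    server.close()
    return samples


def summarize(samples):
    """Mean, sample variance, and minimum of the round-trip times."""
    return {
        "mean": statistics.mean(samples),
        "variance": statistics.variance(samples),
        "min": min(samples),
    }


if __name__ == "__main__":
    print(summarize(round_trip_times()))
```

Running the module prints the three statistics for 100 loopback round trips. Absolute values depend on the host and say nothing about the figures in Tables 4.1 and 4.2 of the citing work.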
....these segments are not exported through the interface to client code. Recent work has reported experiences adding user-level threads to PVM [95]. PVM does not provide direct support for shared memory, synchronization, or memory management. 4.7.2 p4 Developed at the Argonne National Laboratory, p4 [93] provides a send/receive model on workstation clusters via TCP sockets. The p4 library is a thin layer above the socket interface, which generally reduces problems of initialization and shutdown. The p4 interface also provides shared memory primitives on machines that provide them but does not ....
R. Butler and E. Lusk, "User's guide to the p4 parallel programming system," Argonne National Laboratory, Argonne, IL, Tech. Rep. ANL-92/17, June 1992.
....to a ring communication structure. 4 Model Validation We consider the performance of the benchmark molecular dynamics simulations executed on the Intel Touchstone Delta. The original program was rewritten in C with the p4 message-passing library developed at Argonne National Laboratory [16]. This library allows the code to be portable to other machines such as the Paragon, IBM SP-1, and CM-5. We developed analytical models of the sequential and parallel execution time based upon the costs described in the previous sections. The values of the parameter t_op were obtained ....
Lusk, Ewing and Ralph Butler, "User's Guide to the p4 Parallel Programming System," Technical Report ANL-92/17, Argonne National Laboratory, October 1992.
....model, with different approaches to determine which sections of the algorithm are performed in parallel and which ones are performed sequentially. The three parallel implementations presented in this section are based on the stabilized BlockCG Algorithm 2.3, and we use the P4 library of routines [Butler and Lusk (1992)] to manage the parallel environments on different machines (versions of these implementations are also available using the PVM library of routines [Beguelin, Dongarra, Geist et al. 1992]). For the following, we will focus on the processes and the tasks they perform in each implementation, rather ....
R. Butler and E. Lusk, (1992), User's Guide to the p4 Parallel Programming System. Mathematics and Computer Science Division, Argonne National Laboratory.
....selection under some circumstances would be altered to exploit the attributes of the various machines which may directly impact the program's throughput. Last is the general heterogeneous processing scheme. This classification includes systems such as PVM (Geist et al. 1993, Sunderam 1990), p4 (Butler and Lusk 1994a, Butler and Lusk 1994b) and HAsC (Scott and Potter 1994, Scott and Potter 1993, Scott 1994), which are designed to distribute a program's tasks for execution in order to exploit the variety of machine architectures and configurations available on a network of heterogeneous machines. In these ....
....circumstances would be altered to exploit the attributes of the various machines which may directly impact the program's throughput. Last is the general heterogeneous processing scheme. This classification includes systems such as PVM (Geist et al. 1993, Sunderam 1990), p4 (Butler and Lusk 1994a, Butler and Lusk 1994b) and HAsC (Scott and Potter 1994, Scott and Potter 1993, Scott 1994), which are designed to distribute a program's tasks for execution in order to exploit the variety of machine architectures and configurations available on a network of heterogeneous machines. In these systems, the application ....
Butler, R., E. Lusk, "User's Guide to the p4 Parallel Programming System", Argonne National Laboratory, Argonne, IL, Preprint MCS-P362-0493, April 1994.
.... be created; and (4) the login name of the user which the processes should be created under (optional). These processes are then created either using the Unix rsh command [Sun 1994] or through the use of a server process that is manually started by the user prior to the process creation operation [Butler and Lusk 1992]. Processes created using this mechanism are said to be p4 managed. This means that the processes created are able to use p4's message-passing primitives to communicate with other processes. The first process created by the user at the Unix prompt is also p4 managed [Butler and Lusk 1993]. p4 ....
....where the new processes are created on the same machine as the invoking process, and communicate with the parent process through shared memory. The processes created using this mechanism are limited, as they are unable to use the p4 message-passing primitives to communicate with other processes [Butler and Lusk 1992]. PVM provides dynamic process creation. The first process of an application can be created either by the user running a program at the Unix shell prompt, or by using the console utility provided in the PVM package to spawn one or more processes. These processes can then create further tasks using ....
R. Butler and E. Lusk, "User's Guide to the p4 Parallel Programming System", Technical Report ANL-92/17, Mathematics and Computer Science Division, Argonne National Laboratory, Illinois, USA, October 1992.
....POSYBL, Glenda, C-Linda, etc. Due to the high-level abstraction provided by the model, providing an efficient implementation is not straightforward. However, tests on workstation clusters ([4, 13]) have shown that some implementations can compete with message-passing libraries. 1.2.2 P4 P4 ([1]) is a library developed at the Argonne National Laboratory. It is mainly a message-passing library. However, it uses shared memory whenever possible, implicitly when doing message passing on shared memory machines or explicitly by providing a set of shared memory functions for use on ....
R. Butler, E. Lusk. "User's Guide to the p4 Parallel Programming System".
....the number of atoms assigned to a processor. 4 Model Validation We analyzed the performance of the benchmark molecular dynamics simulation executed on the Intel Touchstone Delta. The original program was rewritten in C with the p4 message-passing library developed at Argonne National Laboratory [12]. This library allows the code to be portable to other machines such as the Paragon, IBM SP, and CM-5. We developed analytical models of the sequential and parallel execution time based upon the costs described in the preceding sections. The parallel execution time is the sequential time divided ....
Lusk, E. and R. Butler, "User's Guide to the p4 Parallel Programming System," Technical Report ANL-92/17, Argonne National Laboratory, October 1992.
.... parallel execution of the simulation on either a distributed system or on a highly parallel machine [11]. This also implies the existence of the p4 software package on the user's computer, which might not be installed on all systems, although it is available cost-free from Argonne National Labs [3]. IV. Application in Research and Teaching The primary goal of the GUI development was to assist both neural network professionals and novice users in performing neural network experimentations on heterogeneous simulators. From the professionals' perspective, the two embedded neural network ....
R. Butler, E. Lusk, "User's Guide to the p4 Parallel Programming System," Technical Report ANL-92/17, Argonne National Laboratory, Argonne, IL, 1992.
.... The algorithm is suitable for implementation on a distributed platform since the communication graph is simple and the total number of communications is small (Figure 1). Our implementation uses p4, which supports parallel programming for both distributed environments and highly parallel computers [1]. It helps to create the master and the slave processes and provides easy means of communication between them. Another advantage of using p4 for neural network implementation is its ability to port directly from a distributed to a highly parallel platform [4]. 2.1 Epoch-based Cooperation In ....
R. Butler and E. Lusk, "User's Guide to the p4 Parallel Programming System," Argonne National Lab., November, 1992.
.... the code has been parallelized and, in a first implementation, the communications have been realized through the shared memory, and the synchronizations effected using locks; in a second one we have used the P4 package (Portable Programs for Parallel Processors) developed at Argonne National Laboratory by Butler and Lusk (1992), a portable parallel programming environment, currently available on a wide range of multiprocessors, that includes a set of message-passing primitives. For additional information, we refer to (Boyle, Butler, Disz et al. 1987). 5.1.1 The shared-memory-like implementation In this first ....
R. Butler and E. Lusk, (1992), User's Guide to the p4 Parallel Programming System. Mathematics and Computer Science Division, Argonne National Laboratory.
....of which corresponds and provides input to a different context of the multithreaded processor model. The cache models have been developed using a customized version of the DineroIII simulation tool (Hill 1987). The SPLASH suite (Singh et al. 1991) has been used as a set of benchmarks making use of the p4 (Butler and Lusk 1992; Butler and Lusk 1993) synchronization primitives. Each Spy instance monitors and traces a different software thread of a SPLASH benchmark, which is implemented as a UNIX process. Three different classes of synchronization points may be encountered by the software threads of a benchmark, namely ....
Butler R. and E. Lusk. 1992. "User's Guide to the p4 Parallel Programming System", Technical Report ANL-92/17. Mathematics and Computer Science Division, Argonne National Laboratory (Oct.).
....passing routines as well as operations on sequential and distributed arrays. This library can be ported to different shared memory, distributed memory or shared virtual memory machines. Support for clusters of workstations is available using general message-passing systems including PVM [21], P4 [22, 23], PARMACS [24] and MPI. For machines like the Intel Paragon XP/S and iPSC/860, the Meiko CS, the Connection Machine CM-5, the KSR-1 and the Alliant FX/2800, the implementation achieves greater efficiency by using the native message-passing systems. Though ADAPTOR is a powerful tool, it does not ....
R. Butler and E. Lusk, "User's Guide to the P4 Parallel Programming System," Tech. Rep. ANL-92/17, Argonne National Laboratory, October 1992.
....protocols (TCP/IP and MPI) because: TCP/IP: greater flexibility and user control; higher performance; wide usage and support. MPI: programming ease; a widely accepted standard for distributed computing applications. Since our MPI implementation, MPICH, runs on top of P4 [2], which in turn uses TCP/IP, a performance comparison of TCP/IP and MPI gives a good estimate of the overheads and advantages associated with using the higher-level abstraction of MPI as opposed to TCP/IP sockets. We also compared three different networks to understand the inherent hardware and ....
Ralph Butler and Ewing Lusk, "User's Guide to the p4 Parallel Programming System", Version 1.3, Argonne National Laboratory, ANL-92/17, August 1993.