T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344, Houston, March 1991. SIAM. Available via anonymous ftp from hpc.uh.edu as pub/articles/siam91.
....of this F matrix to different processors. 3 Replicated Data Method. The most commonly used technique for parallelizing MD simulations of molecular systems is known as the replicated data (RD) method [24]. Numerous parallel algorithms and simulations have been developed based on this approach [5, 8, 9, 16, 17, 18, 22, 25]. Typically, each processor is assigned a subset of atoms and updates their positions and velocities for the duration of the simulation, regardless of where they move in the physical domain. To explain the method, we first define x and f as vectors of length N which store the position and total ....
....faster on the Paragon, the two sets of RD timings (filled and open squares) are similar. Both curves show a marked roll-off in parallel efficiency above 64 to 128 processors due to the poor scaling of the expand and fold operations. This is typical of the results reported in references [8, 9, 16, 17, 18, 22, 25] for RD implementations of other macromolecular codes such as CHARMM, AMBER, and GROMOS on a variety of parallel machines. Parallel efficiencies as low as 10 to 15% on a few dozen to hundreds of processors are reported, and in some cases the overall speedup is even reduced as more processors are ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....[13, 33] and is beyond the scope of this paper. The most commonly used technique for parallelizing short-range MD simulations of molecular systems is known as the replicated data (RD) method [31]. Numerous parallel algorithms and simulations have been developed based on this approach [7, 11, 12, 18, 21, 23, 29, 32]. Typically, each processor stores a copy of all the atom positions in the simulation. It uses this vector of information to compute nonbonded forces for the subset of atoms assigned to it. The bonded force computation can be simply parallelized in this scheme, since each processor can compute ....
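The replicated data scheme described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the implementation of any of the cited codes: atoms move in one dimension, the pair force is a hypothetical spring acting within a cutoff, atoms are dealt to processors round-robin, and the P processors are simulated one after another. Each simulated processor holds the full position vector x, computes forces for the pairs (i, j) whose first atom i is in its own subset, and the partial force vectors are then combined by a global sum (the "fold" step).

```python
from typing import List


def pair_force(xi: float, xj: float, cutoff: float) -> float:
    """Hypothetical 1-D pair force on atom i due to atom j: a unit spring within a cutoff."""
    d = xi - xj
    return -d if abs(d) < cutoff else 0.0


def local_forces(x: List[float], my_atoms: List[int], cutoff: float) -> List[float]:
    """Partial force vector computed by one processor from its replicated copy of x.

    The processor handles the pairs (i, j) with i in its subset and j > i and applies
    Newton's third law, so it also writes f[j] for atoms it does not own.
    """
    f = [0.0] * len(x)
    for i in my_atoms:
        for j in range(i + 1, len(x)):
            fij = pair_force(x[i], x[j], cutoff)
            f[i] += fij
            f[j] -= fij
    return f


def fold(partials: List[List[float]]) -> List[float]:
    """Global sum of the per-processor partial force vectors."""
    return [sum(column) for column in zip(*partials)]


def replicated_data_forces(x: List[float], nprocs: int, cutoff: float) -> List[float]:
    """Simulate one force evaluation of the replicated data method on nprocs processors."""
    n = len(x)
    # Round-robin assignment (an assumption): it roughly evens out the triangular j > i loop.
    subsets = [list(range(p, n, nprocs)) for p in range(nprocs)]
    partials = [local_forces(x, subsets[p], cutoff) for p in range(nprocs)]
    return fold(partials)


if __name__ == "__main__":
    positions = [0.0, 0.5, 1.7, 2.0, 3.1, 3.3]
    print(replicated_data_forces(positions, nprocs=3, cutoff=1.0))
```

Because every processor may touch any entry of f, the fold must combine full length-N vectors; the expand step that redistributes updated positions after integration is omitted here.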
.... of CHARMM for a 32-processor Intel iPSC/860 [23]; of a CHARMM-like code with long-range forces for a 24-processor transputer machine [18]; of AMBER for a 128-processor nCUBE [12] and a 512-processor Fujitsu AP1000 [29]; of GROMOS for a 64-processor nCUBE and a 128-processor Intel iPSC/860 [11]; and of general molecular simulation codes for a 64-processor Intel iPSC/860 [32] and an IBM workstation cluster [21]. All of these efforts show reduced parallel efficiencies as more processors are used due to the scaling problems inherent in the RD approach. Depending on the parallel machine and ....
[Article contains additional citation context not shown here]
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....must communicate with many surrounding processors to acquire needed information. The extra communication results in lower parallel efficiencies. For these reasons, particle decomposition methods have been the method of choice in organic MD simulation codes that have been parallelized to date [5, 7, 12]. They have the additional advantage that the extra 2-, 3-, and 4-body forces that must be computed in organic simulations within the topology of the molecules are easily divided among the processors in a load-balanced fashion because each processor knows the positions of all atoms. Recently, ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....of molecular systems. This is because the duplication of information makes for straightforward computation of additional three- and four-body force terms. Parallel implementations of state-of-the-art biological MD programs such as CHARMM and GROMOS using this technique are discussed in [13, 17]. Force decomposition methods which systolically cycle atom data around a ring or through a grid of processors have been used on MIMD [26, 49] and SIMD machines [16, 57]. Other force decomposition methods that use the force matrix formalism we discuss in Sections 3 and 4 have been presented in ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....effective. While these techniques show great promise for improving the run time of very large-scale problems, they do little to help with small- to modest-sized systems in the range 50K atoms, which is the focus of this paper. Many implementations of parallel molecular dynamics have been developed [4, 6, 8, 9, 14, 19, 20], but few groups have addressed issues related to the use of massively parallel machines with 100K to 1M processors for small- to modest-size systems. In this paper we address two main issues: a good decomposition method that can take advantage of a massively parallel system and the communication ....
Clark, T. W., J. A. McCammon, and L. R. Scott, "Parallel Molecular Dynamics." Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, 338-344. SIAM, 1992.
....simulation of biological molecules (and arbitrary molecules) using molecular dynamics or stochastic dynamics. In addition, energy minimization and analysis programs are provided. The molecular dynamics program, GROMOS, has been parallelized for the distributed memory architecture of the iPSC/860 [2, 3]. Hereafter, GROMOS refers to the molecular dynamics program unless context indicates otherwise; the parallel version of GROMOS refers to the distributed memory version. The nonbonded force calculation and pairlist generation routines, which account for at least 80% of the computation, were ....
Terry W. Clark, J. A. McCammon, and L. Ridgway Scott. Parallel molecular dynamics. Technical Report 101, Dept. of Mathematics, University of Houston, Houston, TX 77204-3476, November 1991.
....The efficacy of load balancing influences the maximum number of nonbonded pairs managed by a processor. The initial distribution of nonbonded pairs over processors is based on the iteration space of the triangular loop used for the pairlist construction (Subroutines Npbal and Nbpmlx) [4, 2]. This approximate, geometry-dependent distribution is less than satisfactory when a long-range force is calculated in the pairlist generation subroutine; this initial distribution serves as the starting point for an iterative scheme. 5 LOAD BALANCING THE NONBONDED FORCE CALCULATION. As a ....
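One way to read an initial distribution "based on the iteration space of the triangular loop" is to split the outer pairlist loop into contiguous blocks whose inner-loop counts are roughly equal, rather than giving each processor the same number of rows. The sketch below does that for the loop `for i in range(n): for j in range(i + 1, n)`; the function names and the midpoint assignment rule are illustrative assumptions, not the logic of the GROMOS subroutines named above. It returns inclusive, 0-based (istart, iend) pairs, with (0, -1) marking an empty range.

```python
from typing import List, Tuple


def triangular_partition(n: int, nprocs: int) -> List[Tuple[int, int]]:
    """Contiguous (istart, iend) blocks of rows with roughly equal triangular work.

    Row i of the loop `for j in range(i + 1, n)` costs n - 1 - i inner iterations.
    Each row goes to the processor whose equal share of the total work contains the
    midpoint of that row's work interval; the midpoints never decrease with i, so
    every processor receives a contiguous block of rows.
    """
    work = [n - 1 - i for i in range(n)]
    total = sum(work)
    owner = []
    done = 0
    for w in work:
        mid = done + w / 2
        owner.append(min(nprocs - 1, int(mid * nprocs / total)) if total else 0)
        done += w
    ranges = []
    for p in range(nprocs):
        rows = [i for i, o in enumerate(owner) if o == p]
        ranges.append((rows[0], rows[-1]) if rows else (0, -1))
    return ranges


def block_work(n: int, istart: int, iend: int) -> int:
    """Inner-loop iterations performed for rows istart..iend (inclusive)."""
    return sum(n - 1 - i for i in range(istart, iend + 1))


if __name__ == "__main__":
    n, nprocs = 100, 4
    for p, (lo, hi) in enumerate(triangular_partition(n, nprocs)):
        print(p, lo, hi, block_work(n, lo, hi))
```

An equal-rows split of the same loop would give the first processor far more pairs than the last, which is why the row counts depend on p. As the text notes, such a geometry-blind split is only a starting point once the actual pairlist lengths are known.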
[Article contains additional citation context not shown here]
Terry W. Clark, J. A. McCammon, and L. Ridgway Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....Thus, for some problems, it is more important to execute many time steps on a modest-size problem than few time steps on a large-size problem. We analyze the use of current and future MPPs for these modest-sized problems. Many implementations of parallel molecular dynamics have been developed [2, 3, 5, 6, 9, 11, 14, 15], but very little work has addressed issues related to the use of machines with 50,000 processors for modest-sized problems. In this paper we focus on a fine-grained decomposition of molecular dynamics applications that can be parallelized beyond the number of atoms in the system. In particular, ....
Clark, T. W., J. A. McCammon, and L. R. Scott, "Parallel Molecular Dynamics," Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, 338--344. SIAM, 1992.
....first is the development of efficient and inherently parallelizable algorithms to do the interparticle force calculations, which happen to consume the bulk of the computation time in these codes. A number of parallel algorithms have been designed and implemented for both SIMD and MIMD architectures [7, 1, 8, 9]. The second is an implementation issue. Most often, the programmer is required to explicitly distribute large arrays over multiple local processor memories, and keep track of which portions of each array reside on which processors. In order to access a given element of a distributed array, ....
Terry W. Clark, J. A. McCammon, and L. R. Scott, Parallel molecular dynamics, in Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....petaflop performance. This class of machines is likely to be memory-limited because of cost. Molecular dynamics is a good application to examine because it has modest memory requirements. Many implementations of parallel molecular dynamics have been developed for the first two classes of MPPs [3, 4, 6, 7, 11, 16, 17], but few groups have addressed issues related to the use of the third class, particularly for small- to modest-sized problems. In this paper we focus on a fine-grained decomposition of the molecular dynamics algorithm that parallelizes beyond the number of atoms in the system. Traditional ....
Clark, T. W., J. A. McCammon, and L. R. Scott, "Parallel Molecular Dynamics," Proc. 5th SIAM Conference on Parallel Processing for Scientific Computing, 338-344. SIAM, 1992.
....to speed up the simulation. An example of this is in the biological molecular dynamics (MD) community, where many-body calculations are used to atomistically simulate bonded molecular systems such as polymers and proteins. Several recent parallel implementations of state-of-the-art MD codes [6, 10, 17, 21] have all used some kind of particle decomposition technique because of the limitations of spatial decompositions discussed above. Unfortunately, these parallel implementations exhibit poor scaling when P becomes large due to the cost of the all-to-all communication step. To alleviate these ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conf. on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....structures, common to older Fortran programs, is also apparent in Gromos. 3 UHGROMOS. Over the past several years, a parallel implementation of promd has been developed by Clark, McCammon, Scott, and v. Hanxleden of the University of Houston and the Texas Center for Advanced Molecular Computation [2, 4, 3]. Their work has resulted in two parallel implementations: UHGromos and EulerGromos. These implementations differ primarily in the approach to problem decomposition; UHGromos uses an atom-based decomposition approach with replicated data structures, while EulerGromos employs a spatial ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....to speed up the simulation. An example of this is in the biological molecular dynamics (MD) community, where many-body calculations are used to atomistically simulate bonded molecular systems such as polymers and proteins. Several recent parallel implementations of state-of-the-art MD codes [6, 7, 11, 18, 22] have all used some kind of particle decomposition technique because of the limitations of spatial decomposition methods discussed above. Unfortunately, these parallel implementations all exhibit poor scaling when P becomes large due to the cost of the all-to-all communication step. In this ....
....processors and 3 to 3.5 times faster on 1024 processors than does the particle decomposition algorithm. As mentioned in §2, the dramatic roll-off in the particle decomposition timings is typical of the performance degradation seen in other parallel implementations of this kind of MD simulation [6, 7, 11, 18, 22] as P increases and the communication portion of the algorithm begins to dominate. The loss of efficiency in both algorithms is more pronounced on the Intel Paragon because of its 2-D mesh architecture. As discussed in §2, systems like the liquid crystal simulation presented here are difficult ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proc. 5th SIAM Conf. on Parallel Processing for Scientific Computing, pages 338--344. SIAM, 1992.
....$O(N^2)$ work to loop over pairs of atoms. In the replicated algorithm, data parallelism is exploited simply by subdividing the pairlist data structures inb and jnb, while replicating the other principal arrays, which include the coordinate and velocity arrays, x and v, and the forces, f [1, 3, 16]. It is shown in [4] for the replicated algorithm that the ratio of computation to communication is $R_{comp/comm} = \mathrm{Pairs}_{ave}/N = \mathrm{INB}_{ave}/P$, where forces are accumulated using an $O(N)$ communication algorithm with $2 \times \log P$ messages per process. Thus for fixed $\mathrm{INB}_{ave}$, the cost of ....
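The replicated-algorithm ratio of computation to communication, $\mathrm{Pairs}_{ave}/N = \mathrm{INB}_{ave}/P$, can be checked with a few lines of arithmetic. The sketch assumes $\mathrm{INB}_{ave}$ is the average pairlist length per atom, so that each processor handles about $N \cdot \mathrm{INB}_{ave}/P$ pairs, and counts the messages for the force accumulation as $2\log_2 P$, rounded up when P is not a power of two. Both are interpretations, and the function names are hypothetical. The ratio compares pair evaluations with force entries communicated, not run times.

```python
import math


def comp_comm_ratio(n_atoms: int, inb_ave: float, nprocs: int) -> float:
    """Pair interactions per processor divided by the O(N) words exchanged per step.

    Assumes inb_ave is the average pairlist length per atom, so each processor
    handles about n_atoms * inb_ave / nprocs pairs; the result equals inb_ave / nprocs.
    """
    pairs_per_proc = n_atoms * inb_ave / nprocs
    return pairs_per_proc / n_atoms


def messages_per_process(nprocs: int) -> int:
    """Message count 2 * log2(P) for the force accumulation, rounded up (base 2 assumed)."""
    return 2 * math.ceil(math.log2(nprocs))


if __name__ == "__main__":
    # Hypothetical system: 10,000 atoms with about 100 pairlist neighbors each.
    for p in (1, 4, 16, 64, 256):
        print(p, comp_comm_ratio(10_000, 100.0, p), messages_per_process(p))
```

N cancels out of the ratio: with $\mathrm{INB}_{ave}$ fixed, each doubling of P halves the work per communicated force entry while the message count keeps growing. This is the scaling limit behind the roll-off in replicated data timings.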
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344, Houston, March 1991. SIAM. Available via anonymous ftp from hpc.uh.edu as pub/articles/siam91.
....the data structures representing the pair list tend to be the most space-consuming in the program. Within a time step, the computation for each atom is independent of the computation for all other atoms and therefore is inherently parallel. We base this report on the replicated approach [3], where we distribute the pairlist data structures, inb and jnb, while replicating the other principal arrays, which include the coordinate and velocity arrays, x and v, and the forces, f. For studies on more aggressive distributions we refer the interested reader to the literature [2]. 2.1 ....
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344, Houston, March 1991. SIAM. Available via anonymous ftp from kacha.chem.uh.edu as pub/articles/siam91.
....with data decompositions propagated by programmer-supplied control structures. This methodology, while crude relative to globally viewed, automatic decompositions [2], can provide greater flexibility. Moreover, industrial-strength codes have been ported to IPfortran in a short amount of time [1]. With the IPfortran programming model, the logical processor number pinpoints the data item in an expression through the use of the (Section 3) where i p represents a different variable from i q, if p ≠ q. 1 Computational workload translates to loop iterations in the targeted technical ....
....in IPfortran. Each processor will compute a collection of the iterations, $\mathrm{ISTART}_p \le I \le \mathrm{IEND}_p$. In the following code, we will assume this information is in the arrays ISTART(p) and IEND(p), known at all processors. The number may depend on p due to the triangular shape of the double loop [1]. Each processor will have a (local) copy of part of the JNB array. In fact, the original code can be used, the only exception being that it may be preferable to keep the local INB array distinct. C Pairlist computation: Pfortran version J0 = 0 JJ = 0 for I = ISTART(myProc) IEND(myProc) ....
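The distributed pairlist step can be sketched as follows. Each processor runs the original triangular neighbor search only for its own rows, ISTART(p) to IEND(p), and stores the result in a local INB (neighbor count per atom) and a local JNB (concatenated neighbor indices). This is a Python illustration rather than the Pfortran code in the excerpt. Plain Euclidean distance with no periodic boundaries, 0-based indices, and the function name `local_pairlist` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def local_pairlist(x: Sequence[Point], istart: int, iend: int,
                   rcut: float) -> Tuple[List[int], List[int]]:
    """Pairlist for rows istart..iend (inclusive) of the triangular j > i loop.

    Returns (inb, jnb): inb[k] is the number of neighbors of atom istart + k, and
    jnb lists those neighbors row after row, so sum(inb) == len(jnb).
    """
    inb: List[int] = []
    jnb: List[int] = []
    for i in range(istart, iend + 1):
        count = 0
        for j in range(i + 1, len(x)):
            if math.dist(x[i], x[j]) < rcut:
                jnb.append(j)
                count += 1
        inb.append(count)
    return inb, jnb


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (0.0, 0.9, 0.0),
             (3.0, 3.0, 3.0), (3.5, 3.0, 3.0)]
    # Two processors: rows 0..1 and rows 2..4.
    print(local_pairlist(atoms, 0, 1, 1.0))  # -> ([2, 0], [1, 2])
    print(local_pairlist(atoms, 2, 4, 1.0))  # -> ([0, 1, 0], [4])
```

Concatenating the processors' local lists in rank order reproduces the serial INB and JNB. This is why the original loop body can be reused unchanged with only the loop bounds altered.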
[Article contains additional citation context not shown here]
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 338-344, Houston, March 1991. SIAM. Available via anonymous ftp from hpc.uh.edu as pub/articles/siam91.
....and writing a parallel program from scratch. Our examples will illustrate the need for modification of algorithms to achieve scalability. The remainder of this paper is organized as follows. Section 2 describes the GROMOS code [10], a standard molecular dynamics program that we are parallelizing [4]. Section 3 gives a short description of the parallel languages used, IPfortran and Fortran D. Sections 4, 5, and 6 each describe the parallelization of one phase of GROMOS. Section 7 gives some performance results, and in Section 8 we discuss some lessons learned from this project. 2 Molecular ....
....equations of motion to determine the new atomic momenta and positions, 3. save data as appropriate for post-analysis. The pairwise, nonbonded interactions dominate the computation with $O(N^2)$ time complexity [5] and therefore are key considerations in both the model and its implementation [4]. The molecular dynamics program used in this study is from the GROMOS (GROningen MOlecular Simulation) suite designed for the dynamic modeling of biomolecules [10]. GROMOS provides programs for the simulation of biological molecules (and arbitrary molecules) using molecular dynamics or stochastic ....
[Article contains additional citation context not shown here]
Terry W. Clark, J. A. McCammon, and L. Ridgway Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....with data decompositions propagated by programmer-supplied control structures. This methodology, while crude relative to globally viewed, automatic decompositions [CHKKS92], can provide greater flexibility. Moreover, industrial-strength codes have been ported to IPfortran in a short amount of time [CMS91]. With the IPfortran programming model, the logical processor number pinpoints the data item in an expression through the use of the (Section 2) where i p represents a different variable from i q, if p ≠ q. Since each Fortran program operates on its own memory, the question arises: what can be ....
Terry W. Clark, J. Andrew McCammon and L. Ridgway Scott, "Parallel Molecular Dynamics," Proc. Fifth SIAM Conf. on Parallel Proc. for Sci. Comp., 1991, to be published by SIAM.
....: k. 5. Waveform relaxation. In simulating molecular dynamics on parallel computers, an entirely different orientation may be appropriate. The conventional approach parallelizes the work to be done at each time step, as in (6), with the end of one time step being a global synchronization point [6, 18]. Waveform relaxation instead computes entire trajectories in parallel before synchronization occurs [26, 34]. The idea is as follows, presented without numerical discretization. Suppose the ordinary differential equation to be integrated is $u' = f(u)$, where $u$ denotes a vector, $u = (u_1, \ldots$ ....
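To make the contrast with step-by-step parallelism concrete, here is a minimal sketch of Jacobi waveform relaxation, a standard variant of the technique, applied to a discretized problem. Unlike the continuous presentation in the text, it assumes a linear system $u' = Au$ integrated with forward Euler on a fixed time grid; the matrix, step size, and function names are illustrative. In each sweep every component $u_i$ is integrated over the whole time window using the other components' trajectories from the previous sweep. The components could therefore be handled by different processors that synchronize only once per sweep.

```python
from typing import List

Matrix = List[List[float]]


def euler_coupled(a: Matrix, u0: List[float], h: float, nsteps: int) -> List[List[float]]:
    """Reference: forward Euler on the full coupled system u' = A u (time-major)."""
    n = len(u0)
    traj = [list(u0)]
    for _ in range(nsteps):
        u = traj[-1]
        traj.append([u[i] + h * sum(a[i][j] * u[j] for j in range(n)) for i in range(n)])
    return traj


def jacobi_waveform_relaxation(a: Matrix, u0: List[float], h: float,
                               nsteps: int, sweeps: int) -> List[List[float]]:
    """Jacobi waveform relaxation for u' = A u with forward Euler (component-major).

    waves[i][k] approximates u_i at time k * h. Within a sweep each component is
    integrated over the whole window independently, using the other components'
    waveforms from the previous sweep; only whole waveforms are exchanged.
    """
    n = len(u0)
    # Initial guess: every component held constant over the window.
    waves = [[u0[i]] * (nsteps + 1) for i in range(n)]
    for _ in range(sweeps):
        new_waves = []
        for i in range(n):  # independent per component: could run on separate processors
            w = [u0[i]]
            for k in range(nsteps):
                coupling = sum(a[i][j] * waves[j][k] for j in range(n) if j != i)
                w.append(w[k] + h * (a[i][i] * w[k] + coupling))
            new_waves.append(w)
        waves = new_waves  # synchronization point: exchange entire trajectories
    return waves


if __name__ == "__main__":
    a = [[-1.0, 0.5], [0.3, -2.0]]
    ref = euler_coupled(a, [1.0, 2.0], 0.01, 50)
    for sweeps in (1, 5, 50):
        wr = jacobi_waveform_relaxation(a, [1.0, 2.0], 0.01, 50, sweeps)
        err = max(abs(wr[i][k] - ref[k][i]) for i in range(2) for k in range(51))
        print(sweeps, err)
```

With explicit Euler, each sweep makes one more time step agree with the fully coupled integration, so after as many sweeps as time steps the two trajectories coincide up to rounding. In practice one would stop much earlier, once successive waveforms agree to a tolerance.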
T. W. Clark, J. A. McCammon & L. R. Scott, Parallel Molecular Dynamics, Proc. Fifth SIAM Conf. on Parallel Proc. for Sci. Comp., J. Dongarra et al., eds., Philadelphia: SIAM, 1992, 338--344.
....to distribute JNB in such a way that the cumulative lengths of the parts of JNB assigned to each processor are roughly equal. This is unlikely to be the case immediately after computing INB and JNB. A global accumulation of INB provides each processor a complete picture of the total work load [5]. The balanced work load can be visualized as JNB divided into groups of contiguous JNB segments consisting of the neighbors of blocks of atoms, with the lengths of the segments approximately the same for each processor [4, 5]. In the molecular system described in Section 5, each atom typically has on the order of 100 neighbors. However, just as with pairlist generation with a long-range force calculation, it is hard to predict how imbalanced the load could be before balancing is done. Once the load is balanced, the ....
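The balancing idea above, cutting JNB into contiguous segments of about equal length once every processor knows the global INB, amounts to a prefix sum followed by a search for cut points. The sketch below is one way to do it under stated assumptions. `inb` is the globally accumulated neighbor count per atom in atom order. Cuts fall only between atoms, so each processor gets a contiguous block of atoms and the matching contiguous JNB segment. The name `balance_jnb` is hypothetical. With these rules each segment's length differs from the ideal total/P by at most the largest single INB entry.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple


def balance_jnb(inb: List[int], nprocs: int) -> List[Tuple[int, int, int, int]]:
    """Split atoms into contiguous blocks with roughly equal JNB segment lengths.

    Returns one (first_atom, last_atom, jnb_offset, jnb_length) tuple per processor;
    an empty block has last_atom == first_atom - 1 and jnb_length == 0.
    """
    # offsets[i] is where atom i's neighbors start in the global JNB array.
    offsets = [0] + list(accumulate(inb))
    total = offsets[-1]
    cuts = [0]
    for p in range(1, nprocs):
        target = total * p / nprocs
        # First atom whose JNB segment starts at or beyond this processor's share.
        cuts.append(bisect_left(offsets, target, cuts[-1], len(inb)))
    cuts.append(len(inb))
    return [(cuts[p], cuts[p + 1] - 1, offsets[cuts[p]],
             offsets[cuts[p + 1]] - offsets[cuts[p]])
            for p in range(nprocs)]


if __name__ == "__main__":
    inb = [3, 120, 5, 90, 7, 60, 100, 2, 80, 33]
    for block in balance_jnb(inb, 4):
        print(block)
```

Because the cuts are placed at atom boundaries, a single atom with an unusually long neighbor list cannot be split across processors. Finer balance would require splitting an atom's neighbor list, which this sketch does not attempt.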
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....and writing a parallel program from scratch. Our examples will illustrate the need for modification of algorithms to achieve scalability. The remainder of this paper is organized as follows. Section 2 describes the GROMOS code [10], a standard molecular dynamics program that we are parallelizing [6]. Section 3 gives a short description of the parallel languages used, IPfortran and Fortran D. Sections 4, 5, and 6 each describe the parallelization of one phase of GROMOS. Section 7 gives some performance results, followed by conclusions in Section 8. 2 Molecular dynamics. First developed for ....
....bonded and nonbonded forces on each atom as the analytical gradient of a potential-energy function of the atom positions. The pairwise, nonbonded interactions dominate the computation with $O(N^2)$ time complexity [5] and therefore are key considerations in both the model and its implementation [6]; see Sections 4 and 5. 2. Integrate Newton's equations of motion to determine the new atomic momenta and positions. By removing uninteresting, high-frequency motions, larger time steps can be taken, resulting in more efficient computer utilization [19, 20]. This usually involves the constraining ....
[Article contains additional citation context not shown here]
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.
....if $n_d$ becomes larger, the overhead associated with a traversal of the subboxes to locate the atoms increases. We also use our subbox structure to limit our search for nonbonded interaction partners of a given atom, which allows us to avoid the naive $O(N^2)$ pairlist generation algorithm [4, 21]. For that purpose it is advantageous if $\mathrm{box}_d$ is an integral fraction of $R_{cut}$ [17]. The hierarchical decomposition should also be able to balance the workload for the trivial case of a system with constant density. Therefore it must be possible to create subdomains of equal size; for all $d$, $n_d$ ....
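The subbox search can be illustrated with a minimal cell-list sketch. Atoms are binned into cubic subboxes of side $R_{cut}/m$ for an integer $m$ (one reading of "an integral fraction of $R_{cut}$"), and candidate partners are taken only from subboxes within $m$ cells in each direction. That window is enough to find every pair closer than $R_{cut}$. The sketch assumes a non-periodic 3-D domain and uses only Python's standard library. The name `cell_pairlist` is hypothetical, and the brute-force function is included only for comparison.

```python
import itertools
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cell_pairlist(pos: Sequence[Point], rcut: float, m: int = 1) -> List[Tuple[int, int]]:
    """All pairs (i, j), i < j, closer than rcut, found through subboxes of side rcut / m.

    Two atoms closer than rcut differ by less than m * side in every coordinate, so
    their subbox indices differ by at most m; scanning those neighbor subboxes suffices.
    """
    side = rcut / m
    cells: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for idx, p in enumerate(pos):
        key = (math.floor(p[0] / side), math.floor(p[1] / side), math.floor(p[2] / side))
        cells[key].append(idx)
    pairs = set()
    shifts = list(itertools.product(range(-m, m + 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in shifts:
            # .get does not insert into the defaultdict, so cells is not modified here.
            for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                for i in members:
                    if i < j and math.dist(pos[i], pos[j]) < rcut:
                        pairs.add((i, j))
    return sorted(pairs)


def brute_force_pairlist(pos: Sequence[Point], rcut: float) -> List[Tuple[int, int]]:
    """Naive O(N^2) reference."""
    return [(i, j) for i in range(len(pos)) for j in range(i + 1, len(pos))
            if math.dist(pos[i], pos[j]) < rcut]


if __name__ == "__main__":
    rng = random.Random(0)
    atoms = [(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(300)]
    print(len(cell_pairlist(atoms, 1.5)), len(brute_force_pairlist(atoms, 1.5)))
```

Larger $m$ gives smaller subboxes and a tighter fit to the cutoff sphere, at the cost of visiting $(2m+1)^3$ subboxes per atom's cell. This mirrors the traversal-overhead trade-off the excerpt mentions for larger $n_d$.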
....to computation ratio. Load imbalance is another effect detracting from ideal speedup when decreasing the atom-to-processor ratio. We did not use load balancing in these runs. 3.5 EulerGromos vs. UHGromos. We are also interested in how EulerGromos performs relative to its cousin, UHGromos [4]. UHGromos is a parallelization of Gromos using the replicated algorithm [4, 21]. The replicated algorithm replicates the full force and coordinate arrays at each processor. A global sum of the forces is required at every time step due to a lack of locality [5]. In Figure 8, the total time for ....
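The per-step global force sum in a replicated algorithm is typically done with a collective exchange rather than by funnelling everything through one processor. The sketch below simulates recursive doubling on P processors, assuming P is a power of two. In stage s each processor adds the buffer of the partner whose rank differs in bit s, so after $\log_2 P$ stages every processor holds the complete force vector. This is the simplest such scheme, and it exchanges a full length-N vector in every stage. It is an illustration, not UHGromos's communication routine, and `global_sum` is a hypothetical name.

```python
from typing import List, Tuple


def global_sum(local: List[List[float]]) -> Tuple[List[List[float]], int]:
    """Recursive-doubling all-reduce (sum) simulated over P = len(local) processors.

    Returns every processor's final buffer and the number of exchange stages.
    """
    p = len(local)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes P is a power of two")
    bufs = [list(v) for v in local]
    stages = 0
    mask = 1
    while mask < p:
        # One stage: processor r swaps buffers with partner r ^ mask and both add.
        bufs = [[mine + theirs for mine, theirs in zip(bufs[r], bufs[r ^ mask])]
                for r in range(p)]
        mask <<= 1
        stages += 1
    return bufs, stages


if __name__ == "__main__":
    # Each of 8 processors holds a partial force vector for N = 3 atoms.
    partial = [[float(r), 1.0, 2.0 * r] for r in range(8)]
    result, stages = global_sum(partial)
    print(result[0], stages)  # -> [28.0, 8.0, 56.0] 3
```

A lower-volume variant splits the vector while reducing (recursive halving) and then reassembles it (recursive doubling). This keeps the data sent per processor O(N) at the cost of $2\log_2 P$ messages. Either way, the communication does not shrink as P grows while the per-processor force work does.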
[Article contains additional citation context not shown here]
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, pages 338--344, Houston, TX, March 1991.
No context found.
T. W. Clark, J. A. McCammon, and L. R. Scott. Parallel molecular dynamics. In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, Houston, TX, March 1991.