B. R. Brooks and M. Hodoscek. Parallelization of CHARMm for MIMD machines. CDA, 7:16-22, Dec. 1992.
....of ProtoMol [7], a framework for MD that uses encapsulation and generic programming to provide an extensible component platform for parallel algorithms for MD. The emphasis on design and extensibility distinguishes ProtoMol from other excellent MD programs, such as GROMOS [15], Amber [18], and CHARMM [2]. A program with similar goals, which provided the initial inspiration for ProtoMol, is NAMD2 [8]. However, NAMD2's design goal is primarily high scalability, which forces the algorithm developer to consider the parallelization scheme of the program from the outset. We present an approach that ....
....slaves and has both communication and load balancing difficulties. b) The atom decomposition, which is based on data replication; it is an easy but memory-expensive approach with poor scaling due to global communication. Programs using this decomposition include GROMOS [15], Amber [18], CHARMM [2], and an early version of EGO. c) The force decomposition, which involves either force matrix or systolic loop methods. It scales better than atom decomposition by reducing communication costs through the use of a block decomposition, as in LAMMPS [13] and CHARMM [6], and as discussed in ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMm for MIMD machines. CDA, 7:16--22, Dec. 1992.
....difficulties, especially for a large number of processors. Atom decomposition based on data replication is an easy but memory-expensive approach. It has poor scaling properties due to global communication [91]. Programs using this decomposition include UHGromos [22], Amber [121], CHARMM [15], Moldy [98], and an early version of EGO [31]. Systolic or hypersystolic loop algorithms [78, 114] are a possible remedy to reduce the memory usage and to improve the scaling. Force decomposition involves either force matrix or systolic loop methods. It scales better than atom decomposition ....
....5 The PROTOMOL Framework MD programs may substantially differ in design or level of implementation, but they are essentially all based on Algorithm 1. The proposed MD component-based framework distinguishes PROTOMOL [63] from other well-known MD programs, such as GROMOS [119], Amber [121], and CHARMM [15]. A program with similar goals, which provided the initial idea behind PROTOMOL, is NAMD2 [66]. However, NAMD2's design goal is primarily high scalability. By revisiting and learning from these MD programs, three main requirements were addressed during the design of the framework: 1. Allow ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7:16--22, December 1992.
....Most often, the dynamics of such systems must be studied over several nanoseconds to develop significant understanding of the phenomena being studied. At the same time, the time scale of atomic interactions requires that one simulates their behavior in time steps as small as one femtosecond (10^-15 seconds). In spite of the dramatically increased speeds of individual processors, the number of computational steps required to complete relevant simulations is prohibitive on any single-processor computer. Parallel computing provides the potential for making the required large-scale simulations ....
....maintaining proxy patches on each processor ensures that data communication is not visible to other components of the program. The generic force computation objects make it easy to add new types of forces, such as those required in steered molecular dynamics or free energy calculations (sections 5.3.1 and 5.3.3). This modular organization is further enhanced by using C++, a language that encourages modularity, and Converse, a portable parallel runtime framework that allows multiple parallel paradigms to coexist in a single application. As a result, each module can be written in a ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, page 16, July 1992.
....$O(N^2)$ work to loop over pairs of atoms. In the replicated algorithm, data parallelism is exploited simply by subdividing the pairlist data structures inb and jnb, while replicating the other principal arrays, which include the coordinate and velocity arrays, x and v, and the forces, f [1, 3, 16]. It is shown in [4] for the replicated algorithm that the ratio of computation to communication is $R_{comp/comm} = Pairs_{ave}/N = INB_{ave}/P$, where forces are accumulated using an $O(N)$ communication algorithm with $2\,\Theta(\log P)$ messages per process. Thus for fixed $INB_{ave}$, the cost of ....
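The ratio quoted in this excerpt, read as R = Pairs_ave / N = INB_ave / P, can be checked with a few lines of Python. This is only a sketch under stated assumptions: it takes INB_ave to be the average number of pairlist entries per atom and Pairs_ave to be the average number of pairs handled by one process (so Pairs_ave = N * INB_ave / P). The function names and the sample values are hypothetical and do not come from the cited paper.

```python
def pairs_per_process(n_atoms, inb_ave, n_procs):
    """Average number of pair interactions handled by one process.

    Assumption: the N * INB_ave pairlist entries are split evenly over P processes.
    """
    return n_atoms * inb_ave / n_procs


def comp_comm_ratio(n_atoms, inb_ave, n_procs):
    """Computation-to-communication ratio R = Pairs_ave / N.

    Communication is O(N) per process in the replicated algorithm,
    so dividing the pair work by N leaves INB_ave / P.
    """
    return pairs_per_process(n_atoms, inb_ave, n_procs) / n_atoms


if __name__ == "__main__":
    # Hypothetical system size and pairlist density, for illustration only.
    for p in (16, 32, 64, 128):
        print(p, comp_comm_ratio(10000, 200, p))
```

Under these assumptions the ratio does not depend on N at all: for a fixed INB_ave it halves every time the processor count doubles, which is the scaling limit the excerpt goes on to discuss.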
Bernard R. Brooks and Milan Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7(12):16--22, 1992.
.... architectures [18, 26, 32], languages and compilers [7, 13, 17, 25], and software systems [10]. Efforts to improve molecular dynamics performance include sequential algorithms addressing the pairlist calculation [1, 22, 30] and numerous vectorization [3, 14, 15, 29] and parallelization efforts [4, 5, 6, 11, 12, 23]. A common application of a benchmark uses total time to ascertain the efficacy of an algorithm or computing system. The details about the benchmark that should be considered will vary according to the goals of the study and what is being measured. To compare two parallel molecular dynamics ....
Bernard R. Brooks and Milan Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7(12):16--22, 1992.
....of this F matrix to different processors. 3 Replicated Data Method The most commonly used technique for parallelizing MD simulations of molecular systems is known as the replicated data (RD) method [24]. Numerous parallel algorithms and simulations have been developed based on this approach [5, 8, 9, 16, 17, 18, 22, 25]. Typically, each processor is assigned a subset of atoms and updates their positions and velocities for the duration of the simulation, regardless of where they move in the physical domain. To explain the method, we first define x and f as vectors of length N which store the position and total ....
....atoms can be simulated in 3.53 seconds/timestep, about 165 times faster than a C90 processor. Timing results for a macromolecular simulation of myoglobin using the force model of Equations 1 and 2 are shown in Figure 8. This is a prototypical protein benchmark proposed by Brooks et al. [5], who have done extensive testing of a variety of machines with CHARMM for this problem. A 2534-atom myoglobin molecule (with an adsorbed CO) is surrounded by a shell of solvent water molecules for a total of 14,026 atoms. The resulting ensemble is roughly spherical in shape. The benchmark is a ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7:16--22, 1992.
....[13, 33] and is beyond the scope of this paper. The most commonly used technique for parallelizing short-range MD simulations of molecular systems is known as the replicated data (RD) method [31]. Numerous parallel algorithms and simulations have been developed based on this approach [7, 11, 12, 18, 21, 23, 29, 32]. Typically, each processor stores a copy of all the atom positions in the simulation. It uses this vector of information to compute non-bonded forces for the subset of atoms assigned to it. The bonded force computation can be simply parallelized in this scheme, since each processor can compute ....
....to be CHARMM-compatible in the sense that it uses the same force equations as CHARMM [6]. Since the RD and FD methods both use the same communication primitives, ParBond simply has a switch that partitions the force matrix either by rows or sub-blocks as in Figures 1 and 4. Brooks et al. [7] have done extensive benchmarking with CHARMM on a variety of machines with a large prototypical protein simulation. A 2534-atom myoglobin molecule (with an adsorbed CO) is surrounded by a shell of solvent water molecules for a total of 14,026 atoms. The resulting ensemble is roughly spherical in ....
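The replicated data scheme described in this excerpt can be illustrated with a toy, single-process simulation of the ranks. This is a sketch, not the cited implementation: the 1D positions, the linear `pair_force` law, the round-robin assignment of atoms to ranks, and the plain Python sum standing in for a global reduction are all assumptions made for illustration.

```python
def pair_force(xi, xj):
    # Toy linear attraction (assumption); stands in for a real non-bonded force law.
    return xj - xi


def serial_forces(x):
    """Reference: total force on every atom, computed by one process."""
    n = len(x)
    return [sum(pair_force(x[i], x[j]) for j in range(n) if j != i)
            for i in range(n)]


def replicated_data_forces(x, num_procs):
    """Emulate the replicated data method with num_procs simulated ranks.

    Every rank sees the full position vector x, fills in forces only for
    the atoms assigned to it, and the full-length partial force vectors
    are then summed element-wise (the role a global sum would play).
    """
    n = len(x)
    partials = []
    for rank in range(num_procs):
        local = [0.0] * n                      # full-length force array on each rank
        for i in range(rank, n, num_procs):    # round-robin atom assignment (assumption)
            for j in range(n):
                if j != i:
                    local[i] += pair_force(x[i], x[j])
        partials.append(local)
    # Element-wise sum over ranks; every rank would end up with this vector.
    return [sum(p[i] for p in partials) for i in range(n)]


if __name__ == "__main__":
    positions = [0.0, 1.0, 3.0, 6.0]
    print(serial_forces(positions))
    print(replicated_data_forces(positions, 2))
```

The sketch makes the memory cost visible: every simulated rank allocates a force array of length N even though it fills only about N/P entries, and the final reduction touches all N entries regardless of P.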
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7:16--22, 1992.
....i=1 execution time of processor i

Table 10: Weighted and Unweighted Binary Dissection vs. Data Replication

Processors   Unweighted        Weighted          Replicated (1)
             (sec/iteration)   (sec/iteration)   (sec/iteration)
16           5.38              4.86              4.95
32           3.51              2.59              2.65
64           2.27              1.35              1.53
128          1.37              0.83              0.94

(1) Results from [4] and Table 10.

We present these results to make clear that we have found that our runtime support and optimizations can be, and are, being used to port challenging application codes. These results are comparable to all other implementations of which we are aware [4]. We should also note that the ....
....1.37 0.83 0.94 (1) Results from [4] and Table 10. We present these results to make clear that we have found that our runtime support and optimizations can be, and are, being used to port challenging application codes. These results are comparable to all other implementations of which we are aware [4]. We should also note that the optimization scalability function for the weighted coordinate bisection load balancing optimization function takes on values of 1.11, 1.35, 1.69 and 1.65 for 16, 32, 64 and 128 processors, respectively. As one would expect, a better load balancing procedure becomes ....
B. R. Brooks and M. Hodoscek. Parallelization of charmm for mimd machines. Chemical Design Automation News, 7:16, 1992.
....of molecular systems. This is because the duplication of information makes for straightforward computation of additional three- and four-body force terms. Parallel implementations of state-of-the-art biological MD programs such as CHARMM and GROMOS using this technique are discussed in [13, 17]. Force decomposition methods which systolically cycle atom data around a ring or through a grid of processors have been used on MIMD [26, 49] and SIMD machines [16, 57]. Other force decomposition methods that use the force matrix formalism we discuss in Sections 3 and 4 have been presented in ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7:16--22, 1992.
....effective. While these techniques show great promise for improving the run time of very large scale problems, they do little to help with small to modest sized systems in the range 50K atoms, which is the focus of this paper. Many implementations of parallel molecular dynamics have been developed [4, 6, 8, 9, 14, 19, 20], but few groups have addressed issues related to the use of massively parallel machines with 100K to 1M processors for small to modest size systems. In this paper we address two main issues: a good decomposition method that can take advantage of a massively parallel system and the communication ....
Brooks, B. R. and M. Hodoscek, "Parallelization of CHARMM for MIMD machines," Chemical Design Automation News, 7:16-22, 1992.
....In this project, we used the CHARMM [1] MD program. This code was selected for two reasons. First, CHARMM is one of the most commonly used MD programs. Second, the program had already been parallelized for the iPSC/860 and the GP Paragon supercomputers by Bernie Brooks and Milan Hodoscek of NIH [2]. The parallel version of CHARMM is described in figure 1. CHARMM used a loosely synchronous parallel algorithm based on a replicated data, SPMD algorithm. Each node maintains a copy of the key data structures (e.g. the coordinate and force arrays) and the parallelism is expressed by assigning ....
....of nodes, the performance gain of the Paragon over the iPSC/860 is due to the 25% faster clock on the i860 XP CPU. 4 CHARMM Optimization iPSC/860 CHARMM ran without modification on the Paragon supercomputer. The program used a global communication package optimized for the hypercube architecture [2]. This package worked well on the Paragon supercomputer, but it required a power-of-two number of nodes. To get away from this restriction and hopefully improve performance, the first optimization was to replace the NIH global communication package with the InterCom package [3]. To optimize the ....
B.R. Brooks and M. Hodoscek, "Parallelization of CHARMM for MIMD machines," Chemical Design and Automation News, vol 7, p 16, 1992.
....Thus, for some problems, it is more important to execute many time steps on a modest size problem than few time steps on a large size problem. We analyze the use of current and future MPPs for these modest sized problems. Many implementations of parallel molecular dynamics have been developed [2, 3, 5, 6, 9, 11, 14, 15], but very little work has addressed issues related to the use of machines with 50,000 processors for modest sized problems. In this paper we focus on a fine grained decomposition of molecular dynamics applications that can be parallelized beyond the number of atoms in the system. In particular, ....
Brooks, B. R. and M. Hodoscek, "Parallelization of CHARMM for MIMD machines," Chemical Design Automation News, 7:16--22, 1992.
....by delaying the off-processor accumulation until the end of the energy calculation loops. This savings would require a further compromise in the sequential operation ordering, but at least one group has found that this modification does not have an adverse effect on the computational results [4]. These results can be compared with the ones presented by Brooks in [4]. Their implementation replicates all the data structures on all the processors. For 64 processors on the Intel iPSC/860, Brooks reports a total time of 1521.1 seconds for 1000 timesteps for the carboxy myoglobin benchmark, ....
....calculation loops. This savings would require a further compromise in the sequential operation ordering, but at least one group has found that this modification does not have an adverse effect on the computational results [4]. These results can be compared with the ones presented by Brooks in [4]. Their implementation replicates all the data structures on all the processors. For 64 processors on the Intel iPSC/860, Brooks reports a total time of 1521.1 seconds for 1000 timesteps for the carboxy myoglobin benchmark; this compares to the 368 seconds we required to carry out 100 iterations of ....
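The two totals quoted in this excerpt cover different numbers of steps, so it helps to normalize them to time per step before comparing. The short sketch below does only that arithmetic. The function name is hypothetical, and because the excerpt is truncated, it is an assumption that the two runs are otherwise comparable (same benchmark, processor count, and machine).

```python
def seconds_per_step(total_seconds, steps):
    """Normalize a total run time to time per timestep (or iteration)."""
    return total_seconds / steps


def per_step_comparison():
    # Figures as quoted in the excerpt: 1521.1 s for 1000 timesteps
    # versus 368 s for 100 iterations.
    brooks = seconds_per_step(1521.1, 1000)
    citing = seconds_per_step(368.0, 100)
    return brooks, citing


if __name__ == "__main__":
    brooks, citing = per_step_comparison()
    print(f"Brooks: {brooks:.4f} s/step, citing paper: {citing:.4f} s/step")
```

Normalized this way, the replicated implementation comes to roughly 1.52 seconds per step against 3.68 seconds per iteration for the citing paper's run, subject to the comparability caveat above.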
B. R. Brooks and M. Hodoscek, Parallelization of charmm for mimd machines, Chemical Design Automation News, 7 (1992), p. 16.
....petaflop performance. This class of machines is likely to be memory limited because of cost. Molecular dynamics is a good application to examine because it has modest memory requirements. Many implementations of parallel molecular dynamics have been developed for the first two classes of MPPs [3, 4, 6, 7, 11, 16, 17], but few groups have addressed issues related to the use of the third class, particularly for small to modest sized problems. In this paper we focus on a fine grained decomposition of the molecular dynamics algorithm that parallelizes beyond the number of atoms in the systems. Traditional ....
Brooks, B. R. and M. Hodoscek, "Parallelization of CHARMM for MIMD machines," Chemical Design Automation News, 7:16-22, 1992.
....ia( and ib( are regenerated in the conditional statement S. Then, since the index translation information stored in local memory cannot be reused, the globally indexed data items should be dereferenced whenever the access patterns change. In adaptive applications such as DSMC [3] and CHARMM [5], data access patterns change frequently and irregular data distribution is preferred for better performance over regular data distribution. Thus, minimization of the dereferencing cost is crucial for efficient processing of such applications on distributed-memory multicomputers. In such cases, ....
B. R. Brooks and M. Hodoscek. Parallelization of CHARMM for MIMD machines. Chemical Design Automation News, 7, 1992.
....programs are very complicated and computationally intensive. Implementing them on massively parallel processing systems not only reduces execution times but also allows users to solve larger problems. In the past, a number of algorithms have been designed and implemented for MIMD architectures [4]. However, certain difficulties arise when trying to achieve high performance with large numbers of processors, due to the irregular computational structure of MD programs. This paper presents an implementation of the CHARMM program that scales well on MIMD distributed memory machines. This ....
....well on MIMD distributed memory machines. This program is parallelized using a set of efficient runtime primitives called the CHAOS runtime support library [5]. Several parallel MD algorithms use the replicated approach, i.e. the entire system's coordinates and forces are stored at each processor [4]. The replicated approach eliminates the overhead in maintaining the data distributions, but it would require more memory space. This prohibits users from simulating larger systems. Furthermore, the method is not scalable for large numbers of processors because each processor has to communicate ....
B. R. Brooks and M. Hodoscek. Parallelization of charmm for mimd machines. Chemical Design Automation News, 7:16, 1992.
....programs are very complicated and computationally intensive. Implementing them on massively parallel processing systems not only reduces execution times but also allows users to solve larger problems. In the past, a number of algorithms have been designed and implemented for MIMD architectures [4]. However, certain difficulties arise when trying to achieve high performance with large numbers of processors, due to the irregular computational structure of MD programs. This paper presents an implementation of the CHARMM program that scales well on MIMD distributed memory machines. This ....
....well on MIMD distributed memory machines. This program is parallelized using a set of efficient runtime primitives called the CHAOS runtime support library [5]. Several parallel MD algorithms use the replicated approach, i.e. the entire system's coordinates and forces are stored at each processor [4]. The replicated approach eliminates the overhead in maintaining the data distributions, but it does require more memory space. This prohibits users from simulating larger systems. Furthermore, the method is not scalable for large numbers of processors because each processor has to communicate all ....
B. R. Brooks and M. Hodoscek. Parallelization of charmm for mimd machines. Chemical Design Automation News, 7:16, 1992.
No context found.
B. R. Brooks and M. Hodoscek. Parallelization of CHARMm for MIMD machines. CDA, 7:16-22, Dec. 1992.
No context found.
B. R. Brooks and M. Hodoscek. Parallelization of CHARMm for MIMD machines. CDA, 7:16--22, Dec. 1992.
No context found.
B. R. Brooks and M. Hodoscek. Parallelization of CHARMm for MIMD machines. CDA, 7:16-22, Dec. 1992.
No context found.
B. R. Brooks and M. Hodoscek, `Parallelization of charmm for mimd machines', Chemical Design Automation News, 7, 16 (1992).
No context found.
B. R. Brooks and Milan Hodoscek, Parallelization of CHARMM for MIMD Machines, Chemical Design Automation News 7, 16 (1993).
No context found.
B. R. Brooks and M. Hodoscek, `Parallelization of charmm for mimd machines', Chemical Design Automation News, 7, 16, (1992).