Parallelizing Molecular Dynamics using Spatial Decomposition
In Scalable High Performance Computing Conference, 1993
Cited by 29 (6 self)
Several algorithms have been used for parallel molecular dynamics, including the replicated algorithm and those based on spatial decompositions. The replicated algorithm stores the entire system's coordinates and forces at each processor, and therefore has a low overhead in maintaining the data distribution. Spatial decompositions distribute the data, providing better locality and scalability with respect to memory and computation. We present EulerGromos, a parallelization of the Gromos molecular dynamics program which is based on a spatial decomposition. EulerGromos parallelizes all molecular dynamics phases, with most data structures using O(N/P) memory. This paper focuses on the structure of EulerGromos and analyses its performance using molecular systems of current interest in the molecular dynamics community. EulerGromos achieves performance increases with as few as twenty atoms per processor. We also compare EulerGromos with an earlier parallelization of Gromos, UHGromos, wh...
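As a rough illustration of the contrast drawn in this abstract, the sketch below assigns atoms to processors by spatial cell, so that each processor stores only the atoms in its own region (about N/P for a uniformly dense system) instead of all N as in the replicated algorithm. It is a minimal Python sketch, not EulerGromos code: the function name `spatial_decomposition`, the cubic box, and the regular grid of `grid**3` processors are assumptions made for illustration only.

```python
from collections import defaultdict


def spatial_decomposition(coords, box, grid):
    """Map each processor rank to the atom indices inside its spatial cell.

    coords: list of (x, y, z) positions with 0 <= x, y, z <= box
    box:    edge length of the simulation box (assumed cubic here)
    grid:   number of cells per axis, so there are P = grid**3 processors
    """
    cell = box / grid
    owned = defaultdict(list)
    for i, (x, y, z) in enumerate(coords):
        # Clamp so a coordinate lying exactly on the upper face stays in range.
        cx = min(int(x / cell), grid - 1)
        cy = min(int(y / cell), grid - 1)
        cz = min(int(z / cell), grid - 1)
        rank = (cx * grid + cy) * grid + cz
        owned[rank].append(i)
    return owned


# Hypothetical four-atom system in a 2 x 2 x 2 processor grid.
atoms = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5), (0.5, 1.5, 1.5), (1.9, 1.9, 1.9)]
owned = spatial_decomposition(atoms, box=2.0, grid=2)
```

In a real code, atoms near cell boundaries would also have to be exchanged with neighboring processors and reassigned as they move; the sketch leaves that out.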
Evaluating Parallel Languages for Molecular Dynamics Computations
In Scalable High Performance Computing Conference, 1992
Cited by 13 (9 self)
Computational molecular dynamics is an important application requiring large amounts of computing time. Parallel processing offers the possibility of much better performance on scientific computation, but irregular problems like molecular dynamics have proven difficult to map onto parallel machines. In this paper, we describe the practicalities of porting a basic molecular dynamics computation to a distributed-memory machine. In the process, we show how program annotations can aid in parallelizing a moderately complex code. We also argue that algorithm replacement may be necessary in parallelization, a task which cannot be performed automatically. We close with some results from a parallel GROMOS implementation. 1 Introduction The purpose of this paper is to examine the practicalities of parallelizing the basic algorithms of molecular dynamics for distributed-memory multiprocessors using annotations to sequential Fortran programs. This set of algorithms represents an important class o...
Parallelization Strategies for a Molecular Dynamics Program
In Intel Supercomputer University Partners Conference, Timberline, 1992
Cited by 11 (1 self)
A molecular-dynamics program typically takes several man-years to write and therefore is representative for a large class of scientific programs whose rewriting should not be taken lightly. This paper discusses two Intel hypercube adaptations, UHGROMOS and EulerGROMOS (in progress), of a "dusty-deck" molecular-dynamics code, GROMOS. UHGROMOS uses a low-impact parallelization strategy to minimize modifications to GROMOS. In UHGROMOS, the nonbonded force computation, which usually accounts for at least 90% of the computation time in GROMOS, is the focus for parallelization. This simple approach results already in acceptable performance for typical applications and parallel-computer resources. However, the lack of spatial locality in data distribution limits overall processor utilization. To overcome this limitation, EulerGROMOS uses a spatial decomposition which enhances locality and permits the design of scalable data-structures and algorithms for all parts of GROMOS. The two implementati...
Handling Irregular Problems with Fortran D - A Preliminary Report
In Proceedings of the Fourth Workshop on Compilers for Parallel Computers, 1993
Cited by 5 (0 self)
Compiling irregular applications written in a data parallel, HPF-like language presents a challenging problem of growing importance. A project addressing this problem is the extension of the Fortran D compiler at Rice University to handle such codes. Generality and robustness have been major design objectives throughout this extension, allowing for arbitrary control flow and irregular accesses to multidimensional arrays. Even though this project is still in progress, it can already handle real-world codes fairly well, such as the non-bonded force calculation routine which is critical to molecular dynamics. This paper is a first report on the experiences gained from extending the Fortran D compiler for irregular problems. Since the theoretical background underlying this project has already been described to some degree in previous publications, this paper focuses on the practical aspects of the implementation. 1 Introduction Several research projects have aimed at providing a "machine...
Value-Based Distributions in Fortran D: A Preliminary Report
Center for Research on Parallel Computation, Rice University, 1993
Cited by 1 (0 self)
Compiling irregular applications written in a data-parallel, High Performance Fortran-like language presents a challenging problem of growing importance. One principal difficulty with irregular problems is the general lack of access locality when distributing data and computation naively, based on simple functions of array and iteration indices. To address this problem, we extend classical, index-based data distributions to value-based distributions. This type of distribution allows the programmer to conveniently express for example the locality characteristics of an underlying physical problem. This gives the compiler an opportunity to improve both inter- and intra-processor locality, resulting in better performance and scalability, even when this locality is not inherent in the original data structures and access patterns. This paper reports on the experience gained from implementing value-based distributions in a Fortran 77D compiler prototype. We propose a natural extension of inde...
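The difference between index-based and value-based distributions described in this abstract can be sketched in a few lines. The Python below is only an illustration and does not reproduce Fortran D syntax or the compiler's actual mapping: `block_owner`, `value_owner`, and the equal-width partition of the value range `[lo, hi]` are hypothetical choices.

```python
def block_owner(i, n, p):
    """Index-based BLOCK distribution: the owner depends only on the index i."""
    block = -(-n // p)  # ceil(n / p) elements per processor
    return i // block


def value_owner(value, lo, hi, p):
    """Value-based distribution: the owner depends on the stored value,
    here by cutting [lo, hi] into p equal-width intervals (an assumption)."""
    width = (hi - lo) / p
    # Clamp so that value == hi maps to the last processor.
    return min(int((value - lo) / width), p - 1)


# Hypothetical x coordinates of four atoms, stored in an order
# that is unrelated to their position in space.
x = [0.1, 3.9, 0.2, 3.8]
by_index = [block_owner(i, len(x), 2) for i in range(len(x))]
by_value = [value_owner(v, 0.0, 4.0, 2) for v in x]
```

With the index-based mapping, the two atoms near x = 0 end up on different processors; with the value-based mapping they share one, which is the kind of locality the abstract refers to.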
Parallel GROMOS Users Guide
this document. The main point is that BALJNB3 has the same O(P) work as BALJNB, but it is less precise, although tolerably so. For greater precision in partitioning the nonbonded force calculation with NTN=2 (remember NTN=1 is almost exact), the user can use the subroutine BALJNB4. BALJNB4 performs a global communication of JNB permitting a better attempt at load balancing of this charge-group based storage scheme. BALJNB4 requires that storage be provided for the entire JNB at each processor, a considerable drawback. The distribution of GROMOS comes with BALJNB3 in place for NTN=2; this routine has been found satisfactory with less overhead than BALJNB4. To substitute BALJNB4 for BALJNB3, edit the file force.pf and recompile. (For a further discussion of load-balancing techniques see [2, 6].)
[Figure from Section 5, "Load Balancing the Nonbonded Force Calculation": plot against timestep]
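To make the idea of partitioning the nonbonded force calculation more concrete, the sketch below splits a list of per-charge-group pair counts into contiguous ranges of similar total work. It is not the BALJNB, BALJNB3, or BALJNB4 algorithm, whose details are not given in this excerpt; the function `balance_contiguous` and the assumption that work is proportional to the number of pairs per charge group are purely illustrative.

```python
def balance_contiguous(pair_counts, p):
    """Split charge groups 0..n-1 into p contiguous ranges of similar work.

    pair_counts[i] is the (hypothetical) number of nonbonded pairs stored
    for charge group i. Returns a list of p half-open ranges (start, end).
    """
    total = sum(pair_counts)
    bounds = [0]
    running = 0
    for i, count in enumerate(pair_counts):
        running += count
        # Close the current range once the cumulative work reaches the
        # next multiple of total / p (integer form avoids rounding issues).
        while len(bounds) < p and running * p >= total * len(bounds):
            bounds.append(i + 1)
    # Any ranges still open end at the last charge group.
    while len(bounds) <= p:
        bounds.append(len(pair_counts))
    return [(bounds[k], bounds[k + 1]) for k in range(p)]


# Hypothetical pair counts for five charge groups, split over three processors.
counts = [4, 1, 1, 2, 4]
ranges = balance_contiguous(counts, 3)
work = [sum(counts[a:b]) for a, b in ranges]
```

A scheme like this needs the complete list of pair counts in one place, which mirrors the trade-off noted above between better balance and the cost of global communication and storage.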
Parallel Molecular Dynamics
In Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, 1992
We describe the results of a preliminary port of a large molecular dynamics (MD) code, gromos [15], to a distributed-memory parallel computer. The objectives of this study were threefold. First, we wanted to assess the suitability of our software tools [14] for porting existing Fortran codes (dusty decks). This involved developing Fortran versions of previously developed parallel extensions to the C programming language. Secondly, we wanted to be able to quantify various components of an MD simulation with respect to communication and computation costs. Here the objective was to have timing data that would aid in the design of a scalable code that could execute efficiently on a large number of processors. Finally, we wanted to see if execution speeds comparable to a Cray could be achieved easily with a modest number of current-generation nodes. Such performance seemed feasible based on a rough assessment of current processor performance and communication speed. One of our principa...

