W. Rankin, J. Board, A portable distributed implementation of the parallel multipole tree algorithm, IEEE Symposium on High Performance Distributed Computing [Duke University Technical Report 95-002].
....p 2 : 44) A detailed error estimation of MG is given in [107]. In [13, pp. 6 31] the performance is compared with the direct method for a 2-dimensional system of charges and dipoles. In [107] MG is compared against the fast multipole method that is implemented in the parallel program DPMTA [95] for water systems. As expected, experiments from [107] show that for s 1, MG converges to the direct method and the error drops to zero monotonically. The work increases monotonically with s until s is large enough to encompass all pairs, and then remains constant. Furthermore, it was indicated in ....
.... expansion utilizing the multipole expansion of the distant subcell (local expansion). Then the contributions are passed from parent cells to the children subcells (downward pass). FMM has been implemented for different MD applications [11, 48, 73] and is especially suited for parallelization [93, 95, 96, 123]. However, optimized FMM codes tend to be rather elaborate. More importantly, they do not conserve energy during MD simulations unless unusually high accuracy is enforced, e.g. 12th-order multipoles [10, 128]. Typically, the electrostatic problem comes with usually small distributions of monopole ....
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....have been conducted using it. NAMD relies on the message-driven communication model provided by Converse to adaptively overlap computation with communication. It also uses the multilingual capabilities of Converse to incorporate the DPMTA PVM-based full electrostatics library from Duke University [11]. The current version of NAMD is optimized to run efficiently on dedicated parallel machines. This required optimizing the communication so that each timestep can be completed in less than a second on machines with hundreds of processors. The parallel molecular dynamics algorithm has the following ....
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....X-PLOR coordinate and molecular structure files. In addition to nonperiodic simulations, NAMD2 can use periodic boundary conditions over any combination of the three coordinate axes. It performs cutoff simulations or full electrostatic simulations employing multiple timestepping using the DPMTA [40] or DPME [46] libraries. NAMD2 can connect to VMD [18], the visualization component of MDSCOPE [30], to allow monitoring of and interaction with ongoing simulations. We will first describe the core structure of NAMD2. The specific algorithms and interesting data structures used for force ....
....adapts itself to the communication architecture. Our benchmarks above describe only cutoff electrostatics performance of NAMD2. The long-range interactions in NAMD can be computed using the DPMTA library developed at Duke University. The parallel performance of DPMTA has been documented in [40]. The performance of the long-range interactions phase is orthogonal to that of the rest of NAMD2. In the future, we plan to experiment with other methods for long-range electrostatics, such as particle-particle particle-mesh methods, and with tighter integration of such libraries to improve ....
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. In press.
....calculations consume a small fraction of the total computation time, particularly when combined with multiple timestepping methods, but their contribution to scalability must still be addressed. The parallelization of these methods is the subject of ongoing research by ourselves and others [14, 15, 16]. The next section describes a general methodology that we are developing for effective parallelization of dynamic and irregular computations, and how it is supported by the Charm parallel programming system. Section 3 briefly explains the basic parallel structure and algorithm used in NAMD, ....
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....simulating the electrostatic and van der Waals forces between all pairs of atoms. NAMD, like most other molecular dynamics programs, uses a cutoff radius for computing non-bonded forces (although it also allows the user to perform full-range electrostatics computation using the DPMTA library [9]). Two decomposition schemes are usually employed to parallelize molecular dynamics simulations: force decomposition and spatial decomposition. In the more commonly used scheme, force decomposition, a list of atom pairs is distributed evenly among the processors, and each processor computes forces ....
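The force decomposition mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes a round-robin assignment of unordered atom pairs to processors; the function name `split_pairs` and its parameters are hypothetical and are not taken from NAMD or from the cited paper.

```python
from itertools import combinations


def split_pairs(n_atoms, n_procs):
    """Deal all unordered atom pairs round-robin across processors.

    Hypothetical helper for illustration only; a real MD code would
    first restrict the pair list to atoms within the cutoff radius.
    """
    pairs = list(combinations(range(n_atoms), 2))
    # Processor p receives pairs p, p + n_procs, p + 2 * n_procs, ...
    return [pairs[p::n_procs] for p in range(n_procs)]


chunks = split_pairs(n_atoms=6, n_procs=4)
sizes = [len(chunk) for chunk in chunks]
print(sizes)
```

Dealing the pairs round-robin keeps the per-processor counts within one of each other, which is one simple reading of "distributed evenly".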
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....simulation steps, the long-range forces are still a major component of the computation time. A number of researchers have worked on more efficient N-body solvers for both chemical and astronomical simulations. For NAMD, we chose an efficient implementation of the fast multipole algorithm in the DPMTA [14] library developed by researchers at Duke University. NAMD was originally designed as a message-driven program [4]. The complex dependencies between each patch and its 26 neighbors made the overlapping of communication and computation provided by a message-driven design attractive. Charm [9] was ....
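The figure of 26 neighbors per patch in the excerpt above is what results if patches are taken to be cells of a regular 3D grid, an assumption made here for illustration. The sketch below enumerates them; `neighbor_patches` is a hypothetical name, not a NAMD function.

```python
from itertools import product


def neighbor_patches(ix, iy, iz):
    """Return grid indices of the patches adjacent to patch (ix, iy, iz).

    Assumes patches form a regular 3D grid; boundary handling (periodic
    wrap or clipping) is deliberately left out of this sketch.
    """
    return [
        (ix + dx, iy + dy, iz + dz)
        for dx, dy, dz in product((-1, 0, 1), repeat=3)
        if (dx, dy, dz) != (0, 0, 0)  # skip the patch itself
    ]


neighbors = neighbor_patches(0, 0, 0)
print(len(neighbors))  # 3**3 - 1 = 26
```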
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....of the program, NAMD 1, was built using a message-driven design. Two variants of the program were initially developed. One variant used Charm++, which provided support for straightforward expression of the message-driven design. The other variant used PVM in order to allow us to use the DPMTA [20] library developed by collaborators at Duke University. The DPMTA library provides efficient long-range electrostatic force computation, which is necessary for some simulations. The PVM variant of NAMD originally had a message-driven design, but new features tended to be added around rather than ....
W. Rankin and J. Board. A portable distributed implementation of the parallel multipole tree algorithm. IEEE Symposium on High Performance Distributed Computing, 1995. [Duke University Technical Report 95-002].
....in particular, has attracted considerable attention because the running time is proportional to n. However, implementation of the fast multipole method requires considerable care and analysis, so only sophisticated implementations are able to achieve the order n running time. Board and coworkers [32, 31, 40] have developed several sequential and parallel packages for computing electrostatic force fields and potentials using fast multipole algorithms. These implementations have been done with considerable care but are geared to molecular dynamics simulations where it is reasonable to assume a uniform ....
W. T. Rankin and J. A. Board, A portable distributed implementation of the parallel multipole tree algorithm, Technical report 95-002, Department of Electrical Engineering, Duke University, Durham, North Carolina, 1995.
....Cray T3D [9], a distributed memory machine with support for one-sided remote put/get memory operations. We use Illinois Fast Messages (FM) [22, 26] for our message passing implementation. We selected these applications because they rely on sophisticated PBDSs for efficiency, and previous studies [3, 14, 29, 34, 35, 37] show that locality optimizations are crucial to achieve good performance. Our codes are in ICC and are adapted from the SPLASH 2 version. We concentrate on the force computation phases because they dominate the sequential execution time (95% for Barnes-Hut and 90% for FMM) and exhibit ample ....
....of access hoisting. Although configurations differ, our speedup of over 42 on 64 nodes for Barnes-Hut is competitive with other studies both on MPPs [3, 14] and on shared memory architectures [34], and the speedup of FMM, 54-fold on 64 nodes, also compares favorably with other implementations [29, 35]. 6 Discussion and Related Work The goal of DPA is to generalize loop- and array-oriented tiling [1, 4, 24, 32] and communication optimizations [20] to pointer-based computations. Although developed independently, our use of non-blocking threads labeled by pointers is similar to the recent cache ....
William T. Rankin and John A. Board Jr. A portable distributed implementation of the parallel multipole tree algorithm. Technical Report 95-002, Duke University, Department of Electrical Engineering, 1995.
.... modifiability, portability, and compatibility with X-PLOR (a program for determining three-dimensional structures from crystallographic diffraction or NMR data). Full electrostatics are computed using the Distributed Parallel Multipole Tree Algorithm (DPMTA) developed at Duke University [13]. NAMD is written in C++, using an object-oriented and highly modular design. This design facilitates modification of algorithms and techniques. Communication in NAMD is accomplished via PVM, making it portable across a wide range of computing platforms. The input and output file formats used by ....
W. T. Rankin and J. A. Board Jr., A portable distributed implementation of the parallel multipole tree algorithm, in Proceedings of the Fourth IEEE International Symposium on High Performance Distributed Computing, Los Alamitos, CA, 1995, IEEE Computer Society Press, pp. 17--22.
....[Fig. 3. Performance and scaling behavior of D-PMTA on the Cray T3D when simulating 100,000 particles. Plot title: DPMTA Scaling on Cray T3D; x-axis: Number of Processors; legend: C90, Ideal, DPMTA.] a message passing code which runs both on workstation clusters and on tightly coupled machines such as the Cray T3D [9]. Figure 3 shows the parallel performance of D-PMTA on a moderately large simulation on the Cray T3D; the scalability is not affected by adding the macroscopic option. 3 Ewald summation Ewald summation was invented in 1921 [6] to permit the efficient computation of lattice sums arising in solid ....
W. T. Rankin and J. A. Board, Jr., A Portable Distributed Implementation of the Parallel Multipole Tree Algorithm, Proceedings, Fourth IEEE International Symposium on High Performance Distributed Computing, IEEE Computer Society Press (1995), pp. 17--22.
....are described briefly in the section on work to date, below. Complete descriptions of these two areas are contained in the two technical papers included with this document. The first paper has been published in the Proceedings of the 1995 IEEE Symposium on High Performance Distributed Computing [13]. The second paper represents unpublished work. The third area regarding distributed load balancing is described in the section on future work, below. 2 Background The main motivation behind this research is to address some of the issues behind the performance of N-body computations in Molecular ....
....of multipole mathematics has been developed. From this research, an efficient multipole library has been produced, based upon simplified Taylor series multipole equations developed by prior members of the research group [8, 44]. A portable distributed implementation of the PMTA program [3, 13] (D-PMTA) has been created utilizing the above multipole library. It implements many of the features developed by our research group, runs on a wide variety of parallel and serial platforms, and scales very well up to a large number (64) of processors. This implementation is currently used for ....
W. T. Rankin and J. A. Board Jr. A portable distributed implementation of the parallel multipole tree algorithm. In Proceedings of the
No context found.
W. Rankin, J. Board, A portable distributed implementation of the parallel multipole tree algorithm, IEEE Symposium on High Performance Distributed Computing[Duke University Technical Report 95-002].