Results 1 - 10 of 68
Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms
Cited by 48 (32 self)
NAMD is a portable parallel application for biomolecular simulations. NAMD pioneered the use of hybrid spatial and force decomposition, a technique now used by most scalable programs for biomolecular simulations, including Blue Matter and Desmond, developed by IBM and D. E. Shaw respectively. NAMD is developed using Charm++ and benefits from its adaptive communication-computation overlap and dynamic load balancing. This paper focuses on new scalability challenges in biomolecular simulations: using much larger machines and simulating molecular systems with millions of atoms. We describe new techniques we have developed to overcome these challenges. Since our approach involves automatic adaptive runtime optimizations, one interesting issue is the harmful interaction between multiple adaptive strategies and how to deal with it. Unlike most other molecular dynamics programs, NAMD runs on a wide variety of platforms, ranging from commodity clusters to supercomputers. It also scales to large machines: we present results for up to 65,536 processors on IBM's Blue Gene/L and 8,192 processors on Cray XT3/XT4, in addition to results on NCSA's Abe, SDSC's DataStar, and TACC's LoneStar cluster, to demonstrate efficient portability. Since our IPDPS'06 paper two years ago, two new highly scalable programs named Desmond and Blue Matter have emerged, which we compare with NAMD in this paper.
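The hybrid spatial-plus-force decomposition the abstract credits to NAMD can be sketched in a few lines: space is cut into "patches" at least one cutoff wide (spatial decomposition), and each neighboring patch pair becomes a separate "compute" object that can be placed on any processor (force decomposition). This is a minimal toy illustration, not NAMD's implementation; all names and parameters are illustrative.

```python
# Toy sketch of hybrid spatial + force decomposition.
from itertools import product

def make_patches(box, cutoff):
    """Partition a cubic box into patches at least one cutoff wide."""
    n = max(1, int(box // cutoff))        # patches per dimension
    return list(product(range(n), repeat=3))

def neighbor_pairs(patches):
    """Every patch pair (including self-pairs) within one cell of each other."""
    idx = {p: i for i, p in enumerate(patches)}
    pairs = []
    for p in patches:
        for d in product((-1, 0, 1), repeat=3):
            q = tuple(a + b for a, b in zip(p, d))
            if q in idx and idx[q] >= idx[p]:   # count each pair once
                pairs.append((p, q))
    return pairs

def assign(pairs, nprocs):
    """Force decomposition: spread compute objects over processors."""
    return {pair: i % nprocs for i, pair in enumerate(pairs)}

patches = make_patches(box=32.0, cutoff=8.0)    # 4 x 4 x 4 = 64 patches
computes = neighbor_pairs(patches)              # one compute per patch pair
placement = assign(computes, nprocs=16)
```

The point of the hybrid scheme is that the number of compute objects grows much faster than the number of patches, giving the load balancer many more movable units of work than a purely spatial decomposition would.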
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born
- J. Chem. Theory Comput.
, 2012
Cited by 28 (2 self)
We present an implementation of generalized Born implicit solvent all-atom classical molecular dynamics (MD) within the AMBER program package that runs entirely on CUDA-enabled NVIDIA graphics processing units (GPUs). We discuss the algorithms that are used to exploit the processing power of the GPUs and show the performance that can be achieved in comparison to simulations on conventional CPU clusters. The implementation supports three different precision models: the contributions to the forces are calculated in single-precision floating-point arithmetic but accumulated in double precision (SPDP), or everything is computed in single precision (SPSP) or in double precision (DPDP). In addition to performance, we have focused on understanding the implications of the different precision models for the outcome of implicit solvent MD simulations. We show results for a range of tests, including the accuracy of single-point force evaluations and energy conservation, as well as structural properties pertaining to protein dynamics. The numerical noise due to rounding errors within the SPSP precision model is sufficiently large to lead to an accumulation of errors which can result in unphysical trajectories for long-time-scale simulations. We recommend the use of the mixed-precision SPDP model, since the numerical results obtained are comparable with those of the full double-precision DPDP model and the reference double-precision CPU implementation, but at significantly reduced computational cost. Our implementation provides performance for GB simulations on a single desktop that is on par with, and in some cases exceeds, that of traditional supercomputers.
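The three precision models can be illustrated on a toy force-accumulation problem: many small per-pair contributions summed into one total, with the per-term arithmetic and the running accumulation done in independently chosen precisions. This is a minimal NumPy sketch of the idea, not AMBER's GPU code; all names are illustrative.

```python
# Sketch of SPDP / SPSP / DPDP precision models on a toy accumulation.
import numpy as np

rng = np.random.default_rng(0)
contribs = rng.normal(scale=1e-4, size=1_000_000)    # tiny force terms

def accumulate(values, compute_dtype, accum_dtype):
    """Compute per-term values in compute_dtype, accumulate in accum_dtype."""
    terms = values.astype(compute_dtype)
    total = accum_dtype(0.0)
    for chunk in np.split(terms, 1000):              # chunked, GPU-style
        total = accum_dtype(total + chunk.sum(dtype=accum_dtype))
    return float(total)

dpdp = accumulate(contribs, np.float64, np.float64)  # all double (reference)
spdp = accumulate(contribs, np.float32, np.float64)  # single compute, double sum
spsp = accumulate(contribs, np.float32, np.float32)  # all single
```

The mixed SPDP model keeps the cheap part (per-term arithmetic) in single precision while protecting the error-prone part (the long accumulation) with double precision, which is why the paper finds it tracks the DPDP reference.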
Biomolecular modeling and simulation: a field coming of age
, 2011
Cited by 14 (6 self)
We assess the progress in biomolecular modeling and simulation, focusing on structure prediction and dynamics, by presenting the field's history, metrics for its rise in popularity, early expressed expectations, and current significant applications. The increases in computational power, combined with improvements in algorithms and force fields, have led to considerable success, especially in protein folding, specificity of ligand/biomolecule interactions, and interpretation of complex experimental phenomena (e.g. NMR relaxation, protein-folding kinetics, and multiple conformational states) through the generation of structural hypotheses and pathway mechanisms. Although far from a general automated tool, structure prediction is notable for cases, especially by knowledge-based approaches, in which predicted protein and RNA structures preceded the experiment. Thus, despite early unrealistic expectations and the realization that computer technology alone will not quickly bridge the gap between experimental and theoretical time frames, ongoing improvements to enhance the accuracy and scope of modeling and simulation are propelling the field onto a productive trajectory to become a full partner with experiment and a field in its own right.
Microsecond molecular dynamics simulation shows effect of slow loop dynamics on backbone amide order parameters of proteins
- J. Phys. Chem. B
, 2008
Cited by 14 (1 self)
A molecular-level understanding of the function of a protein requires knowledge of both its structural and dynamic properties. NMR spectroscopy allows the measurement of generalized order parameters that provide an atomistic description of picosecond and nanosecond fluctuations in protein structure. Molecular dynamics (MD) simulation provides a complementary approach to the study of protein dynamics on similar time scales. Comparisons between NMR spectroscopy and MD simulations can be used to interpret experimental results and to improve the quality of simulation-related force fields and integration methods. However, apparent systematic discrepancies between order parameters extracted from simulations and experiments are common, particularly for elements of noncanonical secondary structure. In this paper, results from a 1.2 µs explicit solvent MD simulation of the protein ubiquitin are compared with previously determined backbone order parameters.
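The generalized order parameter the abstract refers to is, in the standard Lipari-Szabo model-free framework, S² = ½(3 Σᵢⱼ ⟨μᵢμⱼ⟩² − 1), computed from the unit bond vector μ (e.g. backbone N-H) over the trajectory: S² = 1 for a rigid bond and S² → 0 for an isotropically tumbling one. A minimal sketch on synthetic vectors, standing in for a real trajectory:

```python
# Lipari-Szabo generalized order parameter from unit bond vectors.
import numpy as np

def order_parameter(mu):
    """S^2 = 0.5 * (3 * sum_ij <mu_i mu_j>^2 - 1) for unit vectors mu (T x 3)."""
    m = np.einsum('ti,tj->ij', mu, mu) / len(mu)   # time-averaged <mu_i mu_j>
    return 0.5 * (3.0 * np.sum(m * m) - 1.0)

rng = np.random.default_rng(1)

# A rigid bond (no internal motion) gives S^2 = 1.
fixed = np.tile([0.0, 0.0, 1.0], (500, 1))

# Isotropic reorientation of the bond vector drives S^2 toward 0.
v = rng.normal(size=(200_000, 3))
iso = v / np.linalg.norm(v, axis=1, keepdims=True)

print(order_parameter(fixed), order_parameter(iso))
```

Slow loop motions, the paper's subject, lower the time-averaged tensor ⟨μᵢμⱼ⟩ only when the simulation is long enough to sample them, which is why microsecond trajectories change the computed S² for flexible regions.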
A scalable parallel framework for analyzing terascale molecular dynamics simulation trajectories
- In Proceedings of the 2008 ACM/IEEE conference on Supercomputing
, 2008
Cited by 11 (1 self)
As parallel algorithms and architectures drive the longest molecular dynamics (MD) simulations towards the millisecond scale, traditional sequential post-simulation data analysis methods are becoming increasingly untenable. Inspired by the programming interface of Google's MapReduce, we have built a new parallel analysis framework called HiMach, which allows users to write trajectory analysis programs sequentially, and carries out the parallel execution of the programs automatically. We introduce (1) a new MD trajectory data analysis model that is amenable to parallel processing, (2) a new interface for defining trajectories to be analyzed, (3) a novel method to make use of an existing sequential analysis tool called VMD, and (4) an extension to the original MapReduce model to support multiple rounds of analysis. Performance evaluations on up to 512 cores demonstrate the efficiency and scalability of the HiMach framework on a Linux cluster.
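The programming model described above can be illustrated with a toy analysis: the user writes a sequential-looking per-frame "map" step and a "reduce" step, and the framework is what distributes the map over cores. Here plain `map()` stands in for the distributed execution, and all names are illustrative rather than HiMach's actual API.

```python
# MapReduce-style trajectory analysis: per-frame map, associative reduce.
from functools import reduce
import numpy as np

def map_frame(frame):
    """Per-frame analysis: radius of gyration of one frame (N x 3 coords)."""
    com = frame.mean(axis=0)
    return float(np.sqrt(((frame - com) ** 2).sum(axis=1).mean()))

def reduce_rg(acc, rg):
    """Combine partial results into a running (count, sum)."""
    return (acc[0] + 1, acc[1] + rg)

rng = np.random.default_rng(2)
traj = [rng.normal(size=(100, 3)) for _ in range(64)]   # 64 synthetic frames
count, total = reduce(reduce_rg, map(map_frame, traj), (0, 0.0))
print(f"mean Rg over {count} frames: {total / count:.2f}")
```

Because each frame is mapped independently and the reduce step is associative, the map phase parallelizes trivially across cores or nodes, which is the property the framework exploits.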
An Online Approach for Mining Collective Behaviors from Molecular Dynamics Simulations
- Conference on Computational Molecular Biology
, 2009
Cited by 8 (5 self)
Collective behavior involving distally separate regions in a protein is known to widely affect its function. In this paper, we present an online approach to study and characterize collective behavior in proteins as molecular dynamics simulations progress. Our representation of MD simulations as a stream of continuously evolving data allows us to succinctly capture spatial and temporal dependencies that may exist and to analyze them efficiently using data mining techniques. By using multi-way analysis we identify (a) parts of the protein that are dynamically coupled, (b) constrained residues/hinge sites that may potentially affect protein function, and (c) time points during the simulation where significant deviation in collective behavior occurred. We demonstrate the applicability of this method on two different protein simulations, for barnase and cyclophilin A. For both proteins we were able to identify constrained/flexible regions, showing good agreement with experimental results and prior computational work. Similarly, for the two simulations, we were able to identify time windows where there were significant structural deviations. Of these time windows, for both proteins, over 70% show collective displacements in two or more functionally relevant regions. Taken together, our results indicate that multi-way analysis techniques can be used to analyze protein dynamics and may be an attractive means to automatically track and monitor molecular dynamics simulations.
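A stripped-down version of this kind of coupling detection: given a stream of per-residue displacements over time, a correlation matrix flags residue pairs that move together. This sketch uses synthetic data and plain pairwise correlation; the paper's multi-way (tensor) analysis is richer, and all thresholds here are illustrative.

```python
# Detecting dynamically coupled residues from displacement correlations.
import numpy as np

rng = np.random.default_rng(3)
T, R = 2000, 6                       # time steps, residues

# Residues 0 and 5 share a hidden collective mode; the rest are noise.
mode = rng.normal(size=T)
disp = rng.normal(scale=0.5, size=(T, R))
disp[:, 0] += mode
disp[:, 5] += mode

corr = np.corrcoef(disp.T)           # R x R correlation matrix
coupled = [(i, j) for i in range(R) for j in range(i + 1, R)
           if abs(corr[i, j]) > 0.5]
print(coupled)                       # → [(0, 5)]
```

Because the correlation matrix can be updated incrementally as new frames arrive, this style of analysis fits the online, streaming setting the abstract describes.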
Incorporating Flexibility in Anton, a Specialized Machine for Molecular Dynamics Simulation
- In Proc. 14th International Symposium on High-Performance Computer Architecture (HPCA 2008)
, 2008
Cited by 8 (7 self)
An effective special-purpose supercomputer for molecular dynamics (MD) requires much more than high-performance acceleration of computational kernels: such accelerators must be balanced with general-purpose computation and communication resources. Achieving this balance was a significant challenge in the design of Anton, a parallel machine that will accelerate MD simulations by several orders of magnitude. Anton executes its most computationally demanding calculations on a highly specialized, enormously parallel, but largely non-programmable high-throughput interaction subsystem (HTIS). Other elements of the simulation have a less uniform algorithmic structure, and may also change in response to future advances in physical models and simulation techniques. Such calculations are executed on Anton's flexible subsystem, which combines programmability with the computational power required to avoid "Amdahl's Law" bottlenecks arising from the extremely high throughput of the HTIS. Anton's flexible subsystem is a heterogeneous multiprocessor with 12 cores, each organized around a 128-bit data path. This subsystem includes hardware support for synchronization, data transfer, and certain types of particle interactions, along with specialized instructions for geometric operations. All aspects of the flexible subsystem were designed specifically to accelerate MD simulations, and although it relies primarily on what may be regarded as "general-purpose" processors, even this subsystem contains more application-specific features than many recently proposed "specialized" architectures.
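The "Amdahl's Law" concern the abstract raises can be put in numbers: if the HTIS makes the pairwise-interaction work essentially free, total speedup is still capped by whatever fraction of work remains on the rest of the machine. The figures below are illustrative, not Anton's.

```python
# Amdahl's Law: overall speedup when only part of the work is accelerated.
def amdahl(accel_fraction, speedup):
    """Overall speedup when accel_fraction of the runtime is sped up by `speedup`."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / speedup)

# Accelerating 95% of the runtime 1000x yields barely 20x overall...
print(round(amdahl(0.95, 1000), 1))   # → 19.6
# ...so the residual work must be fast too, hence the flexible subsystem.
print(round(amdahl(0.999, 1000), 1))  # → 500.3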
Acceleration of an Asynchronous Message Driven Programming Paradigm on IBM Blue Gene/Q
Cited by 7 (2 self)
IBM Blue Gene/Q (BG/Q) can scale to tens of Petaflop/s with 16 cores and 64 hardware threads per node. However, significant effort is required to fully exploit its capacity across applications spanning multiple programming models. In this paper, we focus on the asynchronous message-driven parallel programming model Charm++. Since its asynchronous behavior differs substantially from MPI's, porting it efficiently to BG/Q presents a challenge. On the other hand, the significant synergy between BG/Q software and Charm++ creates opportunities for effective utilization of BG/Q resources. We describe several novel fine-grained threading techniques in Charm++ that exploit hardware features of the BG/Q compute chip. These include the use of L2 atomics to implement lockless producer-consumer queues that accelerate communication between threads, fast memory allocators, and hardware communication threads that are awakened via low-overhead interrupts from the BG/Q wakeup unit. Bursts of short messages are processed using the ManytoMany interface to reduce runtime overhead. We also present techniques to optimize NAMD computation via Quad Processing Unit (QPX) vector instructions, and to accelerate the message rate via communication threads in order to optimize the Particle Mesh Ewald (PME) computation. We demonstrate the benefits of these techniques on two benchmarks: a 3D Fast Fourier Transform and the molecular dynamics application NAMD. For the 92,000-atom ApoA1 system, we achieved 683 µs/step with PME every four steps and 782 µs/step with PME every step.
Architectural constraints to attain 1 Exaflop/s on three scientific application classes
- Accepted for the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2011)
, 2011
Cited by 6 (0 self)
The first Teraflop/s machine became operational in 1997, and it took more than 11 years for a Petaflop/s performance machine, the IBM Roadrunner, to appear on the Top500 list. Efforts have begun to study the hardware and software challenges of building an exascale machine. It is important to understand and meet these challenges in order to attain Exaflop/s performance. This paper presents a feasibility study of three important application classes to formulate the constraints that these classes will impose on the machine architecture for achieving a sustained performance of 1 Exaflop/s. The application classes considered in this paper are classical molecular dynamics, cosmological simulations, and unstructured grid computations (finite element solvers). We analyze the problem sizes required for representative algorithms in each class to achieve 1 Exaflop/s, and the hardware requirements in terms of the network and memory. Based on the analysis for achieving an Exaflop/s, we also discuss the performance of these algorithms for much smaller problem sizes. Keywords: application scalability; exascale; performance analysis; molecular dynamics; cosmology; finite element methods
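The flavor of this kind of feasibility analysis is a back-of-envelope constraint: given a per-atom cost per MD step and a target time per step, the sustained-Exaflop/s requirement fixes the problem size. All constants below are assumed illustrative values, not the paper's numbers.

```python
# Problem size needed to sustain 1 Exaflop/s on a toy MD cost model.
flops_per_atom_step = 1.0e4      # assumed short- plus long-range work per atom
target_flops = 1.0e18            # 1 Exaflop/s sustained
step_time = 1.0e-3               # assumed 1 ms wall-clock per MD step

atoms_needed = target_flops * step_time / flops_per_atom_step
print(f"{atoms_needed:.0e} atoms")   # → 1e+11 atoms
```

Inverting the same relation shows the tension the paper examines: holding the machine at 1 Exaflop/s while shrinking the system pushes the time per step down toward the network-latency floor.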
Molecular dynamics simulations on high-performance reconfigurable computing systems
- ACM Trans. Reconfigurable Technol. Syst.
, 2010
Cited by 5 (2 self)
The acceleration of molecular dynamics (MD) simulations using high-performance reconfigurable computing (HPRC) has been much studied. Given the intense competition from multicore and GPUs, there is now a question whether MD on HPRC can be competitive. We concentrate here on the MD kernel computation: determining the short-range force between particle pairs. In one part of the study, we systematically explore the design space of the force pipeline with respect to arithmetic algorithm, arithmetic mode, precision, and various other optimizations. We examine simplifications and find that some have little effect on simulation quality. In the other part, we present the first FPGA study of the filtering of particle pairs with nearly zero mutual force, a standard optimization in MD codes. There are several innovations, including a novel partitioning of the particle space, and new methods for filtering and mapping work onto the pipelines. As a consequence, highly efficient filtering can be implemented with only a small fraction of the FPGA's resources. Overall, we find that, for an Altera Stratix-III EP3ES260, 8 force pipelines running at nearly 200 MHz can fit on the FPGA, and that they can perform at 95% efficiency. This results in an 80-fold per-core speedup for the short-range force, which is likely to make FPGAs highly competitive for MD.
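The pair filtering this study maps onto the FPGA is, in software terms, a cell-list scheme: particles are binned into cells one cutoff wide, so only pairs in the same or adjacent cells are candidates, and an exact distance check discards the rest. A minimal sketch of the principle (not the paper's hardware design), with an exhaustive check confirming the filter loses no pairs:

```python
# Cell-list filtering of particle pairs within a cutoff (non-periodic box).
from collections import defaultdict
from itertools import product
import numpy as np

def cell_list_pairs(pos, cutoff):
    """Return all index pairs (i < j) with |r_i - r_j| < cutoff."""
    cells = defaultdict(list)
    for i, p in enumerate(pos):                  # bin by cell of width cutoff
        cells[tuple((p // cutoff).astype(int))].append(i)
    pairs = []
    for c, members in cells.items():             # same or adjacent cells only
        for d in product((-1, 0, 1), repeat=3):
            nb = tuple(a + b for a, b in zip(c, d))
            for i in members:
                for j in cells.get(nb, ()):
                    if j > i and np.linalg.norm(pos[i] - pos[j]) < cutoff:
                        pairs.append((i, j))
    return pairs

rng = np.random.default_rng(4)
pos = rng.uniform(0.0, 16.0, size=(300, 3))
fast = sorted(cell_list_pairs(pos, cutoff=4.0))
brute = sorted((i, j) for i in range(300) for j in range(i + 1, 300)
               if np.linalg.norm(pos[i] - pos[j]) < 4.0)
print(fast == brute, len(fast))
```

The candidate set shrinks from all O(N²) pairs to the 27-cell neighborhood of each particle, which is why, in the paper's setting, the filter can run in a small fraction of the FPGA's resources while keeping the force pipelines fed.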