Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth
Citations
504 |
The Lattice Boltzmann Equation for Fluid Dynamics and Beyond
- Succi
- 2001
Citation Context ...$\ldots / c^2 + \frac{9}{2}\frac{(e_i \cdot u(r))^2}{c^4} - \frac{3}{2}\frac{u(r) \cdot u(r)}{c^2})$, (7) with the lattice velocity $c = \Delta x/\Delta t$ and the weights $w_i = 4/9$ for $i = 0$, $1/9$ for $i = 1, 2, 3, 4$, and $1/36$ for $i = 5, 6, 7, 8$. (8) Weakly compressible approximation of NSE (1) can be recovered from the LBM equations (4–8) by Chapman-Enskog expansion [18]. Approximation is valid in the limit of low Mach number $M$, with a compressibility error in the order of $\sim M^2$ [13]. Deng et al. [19] formulated a BGK collision based LBM model for the convection-diffusion equation (2). In analogy with equation (5), the diffusivity $D_l$ of solute in the liquid phase is related to the relaxation parameter $\tau_C$ for the solute transport in the liquid phase, lattice spacing $\Delta x$, and simulation time step $\Delta t$ by $D_l = \frac{\tau_C - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (9) Similarly, thermal diffusivity $\alpha$ is related to the relaxation parameter for the heat transfer $\tau_T$ as follows $\alpha = \frac{\tau_T - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (10) Corresponding LBM equation for the solute transport is $g_i(r + e_i\Delta t, t + \Delta t) = g_i(r, t) + \frac{1}{\tau_C}(g_i^{eq}(r, t) - g_i$... |
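Relations (9) and (10) can be inverted to pick a relaxation parameter that matches a given diffusivity at a fixed lattice spacing and time step. The following is a minimal sketch of that arithmetic, not the authors' code; the function names are illustrative, and the value of `D_l` is a hypothetical placeholder rather than a property taken from the paper.

```python
def diffusivity(tau, dx, dt):
    """Equations (9)/(10): diffusivity = (tau - 0.5)/3 * dx^2/dt."""
    return (tau - 0.5) / 3.0 * dx**2 / dt


def relaxation_for(diff, dx, dt):
    """Equations (9)/(10) solved for the relaxation parameter tau."""
    return 0.5 + 3.0 * diff * dt / dx**2


dx = 0.3e-6     # lattice spacing in m (illustrative)
dt = 15.47e-9   # time step in s (illustrative)
D_l = 3.0e-9    # HYPOTHETICAL solute diffusivity in m^2/s

tau_C = relaxation_for(D_l, dx, dt)
print(f"tau_C = {tau_C:.6f}")
```

Because the diffusive term is small compared with 0.5 for these magnitudes, the resulting relaxation parameter sits just above 0.5, which is the regime where a BGK scheme is usually treated with care.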
189 |
Lattice BGK models for Navier-Stokes equation
- Qian, d’Humieres, et al.
- 1992
Citation Context ...or the fluid flow. Relaxation parameter $\tau_u$ specifies how fast each particle distribution function $f_i$ approaches its equilibrium $f_i^{eq}$. Kinematic viscosity $\nu$ is related to the relaxation parameter $\tau_u$, lattice spacing $\Delta x$, and simulation time step $\Delta t$ by $\nu = \frac{\tau_u - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (5) The macroscopic fluid density $\rho$ and velocity $u$ are obtained as the moments of the distribution function $\rho = \sum_{i=0}^{8} f_i$, $\rho u = \sum_{i=0}^{8} f_i e_i$. (6) Depending on the dimensionality $d$ of the modeling space and a chosen set of the discrete velocities $e_i$, the corresponding equilibrium particle distribution function can be found [17]. For the D2Q9 lattice, the equilibrium distribution function $f_i^{eq}$, including the effects of convection $u(r)$, is $f_i^{eq}(r) = w_i \rho(r) \left(1 + 3\frac{e_i \cdot u(r)}{c^2} + \frac{9}{2}\frac{(e_i \cdot u(r))^2}{c^4} - \frac{3}{2}\frac{u(r) \cdot u(r)}{c^2}\right)$, (7) with the lattice velocity $c = \Delta x/\Delta t$ and the weights $w_i = 4/9$ for $i = 0$, $1/9$ for $i = 1, 2, 3, 4$, and $1/36$ for $i = 5, 6, 7, 8$. (8) Weakly compressible approximation of NSE (1) can be recovered from the LBM equati... |
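Equations (6) to (8) can be checked numerically: taking the moments of the equilibrium distribution should return the density and velocity it was built from. The sketch below assumes a particular ordering of the nine D2Q9 velocities (rest, axis-aligned, diagonal), which the excerpt does not specify; the function names are illustrative.

```python
# D2Q9 weights from equation (8) and discrete velocities in units of c.
# ASSUMPTION: index 0 is the rest velocity, 1-4 are axis-aligned, 5-8 diagonal.
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]


def equilibrium(rho, ux, uy, c=1.0):
    """Equilibrium distributions f_i^eq from equation (7)."""
    uu = ux * ux + uy * uy
    feq = []
    for w, (ex, ey) in zip(W, E):
        eu = c * (ex * ux + ey * uy)  # e_i . u, with e_i scaled by c
        feq.append(w * rho * (1 + 3 * eu / c**2
                              + 4.5 * eu**2 / c**4
                              - 1.5 * uu / c**2))
    return feq


def moments(f, c=1.0):
    """Density and velocity as moments of f, equation (6)."""
    rho = sum(f)
    ux = sum(fi * c * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * c * ey for fi, (_, ey) in zip(f, E)) / rho
    return rho, ux, uy


rho, ux, uy = moments(equilibrium(1.2, 0.05, -0.02))
print(rho, ux, uy)
```

The round trip works because the weights sum to one and their second moment is isotropic, so the quadratic terms in (7) cancel in the density sum and drop out of the momentum sum by symmetry.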
176 | Lattice-Gas Cellular Automata and Lattice Boltzmann Models, - Wolf-Gladrow - 2000 |
65 | Lattice–Gas Cellular Automata : Simple models of complex Hydrodynamics - Rothman, Zaleski - 1997 |
65 |
The mathematical theory of non-uniform gases: an account of the kinetic theory of viscosity, thermal conduction, and diffusion in gases. Cambridge Univ Pr
- Chapman, Cowling
- 1991
Citation Context ...m distribution function $f_i^{eq}$, including the effects of convection $u(r)$, is $f_i^{eq}(r) = w_i \rho(r) \left(1 + 3\frac{e_i \cdot u(r)}{c^2} + \frac{9}{2}\frac{(e_i \cdot u(r))^2}{c^4} - \frac{3}{2}\frac{u(r) \cdot u(r)}{c^2}\right)$, (7) with the lattice velocity $c = \Delta x/\Delta t$ and the weights $w_i = 4/9$ for $i = 0$, $1/9$ for $i = 1, 2, 3, 4$, and $1/36$ for $i = 5, 6, 7, 8$. (8) Weakly compressible approximation of NSE (1) can be recovered from the LBM equations (4–8) by Chapman-Enskog expansion [18]. Approximation is valid in the limit of low Mach number $M$, with a compressibility error in the order of $\sim M^2$ [13]. Deng et al. [19] formulated a BGK collision based LBM model for the convection-diffusion equation (2). In analogy with equation (5), the diffusivity $D_l$ of solute in the liquid phase is related to the relaxation parameter $\tau_C$ for the solute transport in the liquid phase, lattice spacing $\Delta x$, and simulation time step $\Delta t$ by $D_l = \frac{\tau_C - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (9) Similarly, thermal diffusivity $\alpha$ is related to the relaxation parameter for the heat transfer $\tau_T$ as follows $\alpha = \frac{\tau_T - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (10... |
63 |
A Model for Collision Processes in Gases.
- Bhatnagar, Gross, et al.
- 1954
Citation Context ...n a d-dimensional lattice. Primary variables of LBM are particle distribution functions $f_i$. Particle distribution functions represent portions of a local particle density moving in the directions of discrete velocities. For a lattice representation DdQz, each point in the d-dimensional lattice links to neighboring points with z links that correspond to velocity directions. We chose D2Q9 lattice, utilizing nine velocity vectors $e_0$–$e_8$ in two dimensions, as shown in Fig. 1. Distribution functions $f_0$–$f_8$ correspond to velocity vectors $e_0$–$e_8$. Using the collision model of Bhatnagar-Gross-Krook (BGK) [16] with a single relaxation time, the evolution of distribution functions is given by $f_i(r + e_i\Delta t, t + \Delta t) = f_i(r, t) + \frac{1}{\tau_u}\left(f_i^{eq}(r, t) - f_i(r, t)\right)$ (4) where $r$ and $t$ are space and time position of the lattice site, $\Delta t$ is the time step, and $\tau_u$ is the relaxation parameter for the fluid flow. Relaxation parameter $\tau_u$ specifies how fast each particle distribution function $f_i$ approaches its equilibrium $f_i^{eq}$. Kinematic viscosity $\nu$ is related to the relaxation parameter $\tau_u$, lattice spacing $\Delta x$, and simulation time step $\Delta t$ by $\nu = \frac{\tau_u - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (5) The macroscopic fluid density $\rho$ and velocity... |
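The collision part of equation (4) and the viscosity relation (5) are short enough to write out directly. This is a minimal sketch for a single lattice site, with the equilibrium values passed in by the caller; the streaming of post-collision values to the neighbors at $r + e_i\Delta t$ is deliberately left out, and the function names are illustrative rather than taken from the paper's code.

```python
def bgk_collide(f, feq, tau):
    """Relax each f_i toward f_i^eq with a single relaxation time (equation 4,
    collision part only; streaming to neighboring sites is not shown)."""
    return [fi + (fe - fi) / tau for fi, fe in zip(f, feq)]


def kinematic_viscosity(tau, dx, dt):
    """Equation (5): nu = (tau - 0.5)/3 * dx^2/dt."""
    return (tau - 0.5) / 3.0 * dx**2 / dt


def time_step(tau, dx, nu):
    """Equation (5) solved for the time step at a given viscosity."""
    return (tau - 0.5) / 3.0 * dx**2 / nu


f = [1.0, 2.0, 3.0]
feq = [2.0, 2.0, 2.0]
print(bgk_collide(f, feq, tau=1.0))  # tau = 1 lands exactly on the equilibrium
print(bgk_collide(f, feq, tau=2.0))  # tau = 2 moves halfway toward it
```

With $\tau_u = 1$ each distribution is replaced by its equilibrium value in a single step, which is why that choice is often convenient for setting the time step from (5).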
56 | HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience,
- Adhianto, Banerjee, et al.
- 2010
Citation Context ...ndex. This can lead to reduction in computational time in the order-of-magnitudes. We chose to implement “propagation optimized” [22] storage of the distribution functions, where the first two array indices represent x and y lattice coordinates and the last index of the array represents the nine components of the distribution function. Further serial optimization, not considered in this work, could be achieved by combining collision and streaming steps, loop blocking [22, 23], or by elaborate improvements of propagation step [24]. The performance analysis of the code was done using HPCToolkit [25]. It revealed that some of the comparably computationally intensive loops introduced more computational cost than others because not all the constants utilized in them were reduced to single precision. Consolidation of the constants into a single precision improved serial performance. Also, it was found that array section assignments in the form a[1:len-1] = a[2:len] required temporary storage with an addition... |
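The array section assignment mentioned at the end of this excerpt has overlapping source and destination ranges, which is why a temporary copy can be needed. The Python sketch below is only an analogy (0-based lists instead of 1-based Fortran arrays): the slice form copies the right-hand side first, while an explicit loop in increasing index order produces the same result in place. Whether the authors made exactly this change is not stated in the excerpt.

```python
def shift_left_with_temporary(a):
    """Slice analogue of a[1:len-1] = a[2:len]: the right-hand slice is
    materialized as a new list before it is assigned."""
    a[0:len(a) - 1] = a[1:len(a)]
    return a


def shift_left_in_place(a):
    """Same shift with an explicit loop. Walking upward is safe because each
    element is read before it is overwritten."""
    for i in range(len(a) - 1):
        a[i] = a[i + 1]
    return a


print(shift_left_with_temporary([1, 2, 3, 4]))
print(shift_left_in_place([1, 2, 3, 4]))
```

Both calls leave the last element unchanged and move every other element one position toward the front.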
46 | On the single processor performance of simple lattice Boltzmann kernels.
- Wellein, Zeiser, et al.
- 2006
Citation Context ...cation, a visual difference between results in single and double precision calculations was negligible. Ordering of the loops over two dimensions of arrays has a profound impact on the computational time. In Fortran, matrices are stored in memory in a column-wise order, so the first index of an array is changing fastest. To optimize the cache use, the data locality needs to be exploited, thus the inner-most computational loop must be over the fastest changing array index. This can lead to reduction in computational time in the order-of-magnitudes. We chose to implement “propagation optimized” [22] storage of the distribution functions, where the first two array indices represent x and y lattice coordinates and the last index of the array represents the nine components of the distribution function. Further serial optimization, not considered in this work, could be achieved by combining collision and streaming steps, loop blocking [22, 23], or by elaborate improvements of propagation step [24]. The performance analysis of the code was done using HPCToolkit [25]. It revealed that some of the comparably computationally intensive loops introduced more computational cost... |
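The effect of loop order on memory access can be seen by computing the linear offsets that a column-major array of shape (nx, ny, 9) would visit. The sketch below only models the address arithmetic and does not measure cache behavior; the function names and the small grid size are illustrative assumptions.

```python
def column_major_offset(i, j, k, nx, ny):
    """0-based linear offset of element (i, j, k) in a column-major
    (Fortran-order) array whose first two extents are nx and ny."""
    return i + nx * (j + ny * k)


def visit_order(nx, ny, nq, x_innermost):
    """Offsets touched when looping over all elements, with either the
    x index or the y index in the inner-most loop."""
    offsets = []
    for k in range(nq):
        if x_innermost:
            for j in range(ny):
                for i in range(nx):
                    offsets.append(column_major_offset(i, j, k, nx, ny))
        else:
            for i in range(nx):
                for j in range(ny):
                    offsets.append(column_major_offset(i, j, k, nx, ny))
    return offsets


def max_stride(offsets):
    """Largest jump in memory between consecutive accesses."""
    return max(abs(b - a) for a, b in zip(offsets, offsets[1:]))


print(max_stride(visit_order(4, 3, 9, x_innermost=True)))
print(max_stride(visit_order(4, 3, 9, x_innermost=False)))
```

With the first index in the inner-most loop every access is adjacent to the previous one; with the loops swapped, consecutive accesses are at least nx elements apart, which is the locality loss the excerpt describes.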
37 | Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers - Sukop, Thorne - 2006 |
19 | Accelerating scientific computations with mixed precision algorithms.
- Baboulin, Buttari, et al.
- 2009
Citation Context ...eps. The physical properties for the Al-Cu alloy considered in the simulations are listed in Table 1. The relaxation parameter τu = 1 was chosen for the fluid flow. With a lattice spacing of ∆x = 0.3 µm, this leads to the time step of ∆t = 15.47 ns. For the solute transport and temperature, the relaxation parameters were set according to their respective diffusivities to follow the same time step. 6. Serial optimization Reducing accuracy of data representation together with corresponding reduction of computational cost present a simple option of saving computational time and storage resources [21]. Along with the default, double precision variables, we implemented an optional single precision data representation. This resulted in 50% reduction in memory and processing time requirements. Undesirable consequence of reducing accuracy was that the results in single precision representation differed significantly from double precision results. We found that the number of valid digits in single precision was not large enough to represent small changes in the temperature. To achieve better accuracy, we changed the temperature T representation to a sum of T0 and ∆T = T − T0, where T0 is the in... |
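The loss of small temperature changes in single precision, and the benefit of storing only the offset $\Delta T = T - T_0$, can be reproduced with a few lines. The sketch below emulates single precision by rounding through the `struct` module; the reference temperature and the per-step increment are hypothetical values chosen to make the effect visible, not data from the paper.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


T0 = 900.0         # HYPOTHETICAL reference temperature, K
dT_step = 1.0e-5   # HYPOTHETICAL temperature change per time step, K
steps = 1000

T_direct = f32(T0)  # full temperature stored in single precision
dT = f32(0.0)       # only the offset from T0 stored in single precision
for _ in range(steps):
    T_direct = f32(T_direct + dT_step)
    dT = f32(dT + dT_step)

print("direct storage:", T_direct - T0)
print("offset storage:", dT)
```

Near 900 the spacing between adjacent single-precision values is about 6e-5, so an increment of 1e-5 is rounded away every step when the full temperature is stored, while the offset, which stays close to zero, accumulates it.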
4 |
Comparison of different propagation steps for lattice Boltzmann methods
- Wittmann, Zeiser, et al.
- 2013
Citation Context ...ner-most computational loop must be over the fastest changing array index. This can lead to reduction in computational time in the order-of-magnitudes. We chose to implement “propagation optimized” [22] storage of the distribution functions, where the first two array indices represent x and y lattice coordinates and the last index of the array represents the nine components of the distribution function. Further serial optimization, not considered in this work, could be achieved by combining collision and streaming steps, loop blocking [22, 23], or by elaborate improvements of propagation step [24]. The performance analysis of the code was done using HPCToolkit [25]. It revealed that some of the comparably computationally intensive loops introduced more computational cost than others because not all the constants utilized in them were reduced to single precision. Consolidation of the constants into a single precision improved serial performance. Also, it was found that array section assignments in the ... |
3 |
A quantitative dendrite growth model and analysis of stability concepts,
- Beltran-Sanchez, Stefanescu
- 2004
Citation Context ...ute in the liquid phase $C_l$ is computed by LBM. The interface equilibrium concentration $C_l^{eq}$ is calculated as [6] $C_l^{eq} = C_0 + \frac{T^* - T^{eq}_{C_0}}{m_l} + \frac{\Gamma K \left[1 - \delta \cos(4(\phi - \theta_0))\right]}{m_l}$ (17) where $T^*$ is the local interface temperature computed by LBM, $T^{eq}_{C_0}$ is the equilibrium liquidus temperature at the initial solute concentration $C_0$, $m_l$ is the slope of the liquidus line in the phase diagram, $\Gamma$ is the Gibbs-Thomson coefficient, $\delta$ is the anisotropy coefficient, $\phi$ is the growth angle, and $\theta_0$ is the preferred growth angle, both measured from the x-axis. $K$ is the interface curvature and can be calculated as [20] $K = \left[\left(\frac{\partial f_s}{\partial x}\right)^2 + \left(\frac{\partial f_s}{\partial y}\right)^2\right]^{-3/2} \times \left[2\frac{\partial f_s}{\partial x}\frac{\partial f_s}{\partial y}\frac{\partial^2 f_s}{\partial x \partial y} - \left(\frac{\partial f_s}{\partial x}\right)^2\frac{\partial^2 f_s}{\partial y^2} - \left(\frac{\partial f_s}{\partial y}\right)^2\frac{\partial^2 f_s}{\partial x^2}\right]$. (18) The cellular automaton (CA) algorithm is used to identify new interface cells. The CA mesh is identical to the LB mesh. Three types of the cells are considered in the CA model: solid, liquid, and interface. Every cell is characterized by the temperature, solute concentration, crystallographic orientation, and fraction of solid. The state of each cell at each time step is determined from the state of itself and its neighbors at previous time ste... |
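The curvature expression (18) can be evaluated with central finite differences of the solid-fraction field. The sketch below assumes the standard level-set form of that expression (sum of squared first derivatives raised to the power -3/2, second derivatives in the last two terms) and applies it to a smooth test function rather than to a real solid-fraction field; the function names and the step size `h` are illustrative.

```python
def curvature(fs, x, y, h=1e-2):
    """Interface curvature K from equation (18), with derivatives of the
    callable fs(x, y) approximated by central differences of step h."""
    fx = (fs(x + h, y) - fs(x - h, y)) / (2 * h)
    fy = (fs(x, y + h) - fs(x, y - h)) / (2 * h)
    fxx = (fs(x + h, y) - 2 * fs(x, y) + fs(x - h, y)) / h**2
    fyy = (fs(x, y + h) - 2 * fs(x, y) + fs(x, y - h)) / h**2
    fxy = (fs(x + h, y + h) - fs(x + h, y - h)
           - fs(x - h, y + h) + fs(x - h, y - h)) / (4 * h**2)
    grad2 = fx * fx + fy * fy
    return (2 * fx * fy * fxy - fx * fx * fyy - fy * fy * fxx) / grad2**1.5


def paraboloid(x, y):
    """Test field that decreases away from the origin, so its level sets are
    circles; it stands in for a solid fraction that is highest at the center."""
    return -(x * x + y * y)


# On the circle of radius 5 through (3, 4) the curvature should be 1/5.
print(curvature(paraboloid, 3.0, 4.0))
```

For a field that decreases outward from the solid, this sign convention gives a positive curvature for a convex solid, consistent with the Gibbs-Thomson term in (17) acting on the tips of a growing dendrite.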
2 |
A New Lattice Bhatnagar-Gross-Krook Model for the Convection-Diffusion Equation with a Source Term,
- Deng, Shi, et al.
- 2005
Citation Context ...$\ldots))^2 / c^4 - \frac{3}{2}\frac{u(r) \cdot u(r)}{c^2})$, (7) with the lattice velocity $c = \Delta x/\Delta t$ and the weights $w_i = 4/9$ for $i = 0$, $1/9$ for $i = 1, 2, 3, 4$, and $1/36$ for $i = 5, 6, 7, 8$. (8) Weakly compressible approximation of NSE (1) can be recovered from the LBM equations (4–8) by Chapman-Enskog expansion [18]. Approximation is valid in the limit of low Mach number $M$, with a compressibility error in the order of $\sim M^2$ [13]. Deng et al. [19] formulated a BGK collision based LBM model for the convection-diffusion equation (2). In analogy with equation (5), the diffusivity $D_l$ of solute in the liquid phase is related to the relaxation parameter $\tau_C$ for the solute transport in the liquid phase, lattice spacing $\Delta x$, and simulation time step $\Delta t$ by $D_l = \frac{\tau_C - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (9) Similarly, thermal diffusivity $\alpha$ is related to the relaxation parameter for the heat transfer $\tau_T$ as follows $\alpha = \frac{\tau_T - 0.5}{3}\frac{\Delta x^2}{\Delta t}$. (10) Corresponding LBM equation for the solute transport is $g_i(r + e_i\Delta t, t + \Delta t) = g_i(r, t) + \frac{1}{\tau_C}\left(g_i^{eq}(r, t) - g_i(r, t)\right)$, (11) an... |
1 |
Three dimensional simulation of solutal dendrite growth using lattice Boltzmann and cellular automaton methods,
- Eshraghi, Felicelli, et al.
- 2012
Citation Context ... convection effects. Although such large 2D domains may not be necessary to capture a representative portion of a continuum structure, we expect that the outstanding scalability and parallel performance shown by the model will allow simulations of 3D microstructures with several thousands of dendrites, effectively enabling continuum-size simulations. A similar scalability study with a 3D version of the model [11] is currently under way and will be published shortly. Acknowledgment This work was funded by the US Army Corps of Engineers through contract number W912HZ-09-C-0024 and by the National Science Foundation under Grant No. CBET-0931801. Computational resources at the MSU HPC2 center (Talon) and XSEDE (Kraken at NICS, Gordon at SDSC, Lonestar at TACC) were used. Computational packages HPCToolkit [25], PerfExpert [28], and Cray PAT [29] were used to assess the code performance and scalability bottlenecks. Images were made using OpenDX with dxhf5 module [30] and Paraview tool http://www.paraview.o... |
1 |
Comparison of phase-field and cellular automaton models for dendritic solidification in Al–Cu alloy,
- Choudhury, Reuther, et al.
- 2012
Citation Context ... 17 days using 41,472 cores of Kraken, or about 6 days using all 112,000 cores. 9. Conclusions The presented model of dendritic growth during alloy solidification, incorporating effects of melt convection, solute diffusion, and heat transfer, shows a very good parallel performance and scalability. It allows simulations of unprecedented, centimeter size domains, including ten millions of dendrites. The presented large scale solidification simulations were feasible due to 1) CA technique being local, highly parallelizable, and two orders of magnitude faster than alternative, phase-field methods [27], 2) local and highly parallelizable LBM method, convenient for simulations of flow within complex boundaries changing with time, and 3) availability of the extensive computational resources. The domain size and number of dendrites presented in the solidification simulation of this work are the largest known to the authors to date, particularly including convection effects. Although such large 2D domains may not be... |
1 |
Using Cray Performance Analysis Tools,
- CRAY
- 2011
Citation Context ...3D microstructures with several thousands of dendrites, effectively enabling continuum-size simulations. A similar scalability study with a 3D version of the model [11] is currently under way and will be published shortly. Acknowledgment This work was funded by the US Army Corps of Engineers through contract number W912HZ-09-C-0024 and by the National Science Foundation under Grant No. CBET-0931801. Computational resources at the MSU HPC2 center (Talon) and XSEDE (Kraken at NICS, Gordon at SDSC, Lonestar at TACC) were used. Computational packages HPCToolkit [25], PerfExpert [28], and Cray PAT [29] were used to assess the code performance and scalability bottlenecks. Images were made using OpenDX with dxhf5 module [30] and Paraview tool http://www.paraview.org/. This study was performed with the XSEDE extended collaborative support guide of Reuben Budiardja at NICS. An excellent guide and consultation on HPCToolkit was provided by John Mellor-Crummey, Department of Computer Science, Rice University. The authors also acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources, training, and consultation (James Brown, Ashay Rane... |