
Accepted Manuscript

GPU-Accelerated Large Eddy Simulation of Stirred Tanks

Shuli Shu, Ning Yang

PII: S0009-2509(18)30049-6
DOI: https://doi.org/10.1016/j.ces.2018.02.011
Reference: CES 14035

To appear in: Chemical Engineering Science

Received Date: 12 December 2017


Revised Date: 26 January 2018
Accepted Date: 9 February 2018

Please cite this article as: S. Shu, N. Yang, GPU-Accelerated Large Eddy Simulation of Stirred Tanks, Chemical
Engineering Science (2018), doi: https://doi.org/10.1016/j.ces.2018.02.011

GPU-Accelerated Large Eddy Simulation of Stirred Tanks

Shuli Shu, Ning Yang*


State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences, P. O.
Box 353, Beijing 100190, China

*Corresponding author
Email address: nyang@ipe.ac.cn

Abstract
A fast and economical solver, accelerated by the Graphics Processing Unit (GPU) of a single graphics card in a
desktop computer, is developed for the simulation of stirred tanks, integrating the Lattice Boltzmann Method
(LBM), coherent Large Eddy Simulation (LES) and the Immersed Boundary Method (IBM). The grid resolution
can reach a maximum of 13.8 million nodes, resolving considerable details of the flow field in stirred tanks with
only a desktop computer. Meanwhile, the computational speed is 1,500-fold faster than that of a traditional
transient Computational Fluid Dynamics (CFD) simulation based on the Navier-Stokes equations implemented
on 16 CPU cores. It takes less than two minutes to advance the simulation by one impeller revolution, enabling
transient simulation of stirred tanks over longer physical times. We find that at least 4,000 impeller revolutions
must be simulated to obtain a clearly time-independent macro-instability frequency. With the coherent LES
model and a reasonable grid resolution, the fast solver is able to resolve more than 95% of the total Turbulent
Kinetic Energy (TKE) in the highly turbulent region, reproducing the monotonic decrease of TKE along the
radial direction much better than the Smagorinsky model. The average velocity and TKE predicted by the fast
solver are in better agreement with the experimental data in the literature. The solver is therefore well suited to
the fast simulation of stirred tanks using only a desktop computer, without the need for much finer grid
resolution or supercomputers.

Graphical abstract
This figure shows the non-dimensional instantaneous velocity magnitude distributions (U/Utip) in two typical
planes (θ=0° and z=T/3) in our simulation. With a desktop computer equipped with a Graphics Processing Unit
(GPU), the flow in stirred tanks can be resolved with considerable detail without the need for supercomputers
or computer clusters.

Keywords: Stirred Tank, LBM, Large Eddy Simulation, GPU acceleration

Highlights
1. A GPU-accelerated LBM-LES solver for desktop computation of stirred tanks is developed.
2. A 1,500-fold speedup over transient CFD simulation with 16 CPU cores was achieved.
3. Coherent LES can resolve more turbulent kinetic energy than the Smagorinsky model.
4. Coherent LES reproduces the monotonic decrease of turbulent kinetic energy along the radial direction.

1. Introduction

Stirred tanks are widely used in chemical and process industries for the mixing of immiscible or miscible
fluids, crystallization or chemical reactions. Stirred tanks are involved in manufacturing nearly half of all
chemicals worldwide by value (Bashiri et al., 2016). A stirred tank is characterized by a vessel equipped with
rotational impellers. The fluid flow generated by impellers is generally turbulent and the mixing is therefore
enhanced. One critical step in the design and optimization of stirred tanks is to estimate their performance in
the development phase. Compared to experimental studies, Computational Fluid Dynamics (CFD) is expected
to serve as an alternative tool to accelerate process development. The ultimate requirement of industry for CFD
simulation is fast and reliable prediction using easily accessible computational resources, e.g., desktop
computers or workstations, even though supercomputers are becoming more popular.

Over the past three decades, CFD simulation of stirred tanks has advanced considerably. A large number
of modeling techniques for stirred tanks were developed such as the black-box method (Harvey and Greaves,
1982), the inner-outer iterative method (Brucato et al., 1998), the multiple reference frame method (MRF)
(Luo et al., 1994), the Snapshot method (Ranade and Dommeti, 1996) and sliding mesh method (Bakker et al.,
1997). These methods can be classified into two categories. The first four are steady-state methods in which
the impeller remains stationary. By contrast, the impeller motion with respect to the static walls is directly
simulated in the sliding mesh method, which was reported to be more accurate than others (Yeoh et al., 2005),
and yet both the transient simulation and the dynamic reconstruction of a body-fitted mesh are time-consuming.
In these CFD solvers, many iterations are required in solving the Poisson equations of a huge matrix in the
whole flow domain. In the sliding mesh method, the grid system is dynamically generated, and the
computation data needs to be interpolated onto the updated grid system, which further increases the
computational time. These intrinsic properties make most CFD solver structures, algorithms and multiphase
models poorly compatible with the high-performance computational architecture of the Graphics Processing
Unit (GPU) (Zhao, 2008). Moreover, manual body-fitted mesh generation for stirred tanks may take about 80%
of the analysis time (Zhang, 2012).

The Lattice Boltzmann Method (LBM) is a novel approach that is inherently highly parallel and has explicit
time-marching properties. Over the past two decades, LBM has been used as a fluid flow solver for the direct
numerical simulation of immiscible multiphase flow such as bubble dynamics (Shu and Yang, 2013), high
Reynolds number (up to 10 million) turbulent flows (Shu and Yang, 2017) and the simulation of stirred tanks.
There are several pioneering studies about the parallel computation of LBM simulation of stirred tanks based
on many CPUs (Central Processing Unit) (Derksen and Van den Akker, 1999; Derksen, 2003, 2011, 2012;
Gillissen and Van den Akker, 2012; Hartmann et al., 2004a,b, 2006a,b). A large number of CPUs are needed
to accelerate the LBM simulation. Moreover, several hours or even about one day were required to simulate
only one impeller revolution of flow for laboratory-scale (Eggels, 1996; Tyagi et al., 2007) or industrial-scale

(Derksen et al., 2007) stirred tanks. The newly emerging GPUs offer high computational performance and
lower power consumption, and are becoming widely used in scientific computation. The theoretical peak
performance of a single latest NVIDIA Tesla V100 GPU graphics card is about 15 TeraFLOPS in single
precision, faster than that of the world's fastest CPU-based supercomputer in the 2001 TOP500 list. The
continuous evolution of GPUs enables a desktop computer to be powerful enough to handle large-scale
simulations without resorting to supercomputers. LBM is well suited to the GPU architecture and to transient
flow simulation due to its explicit time-marching characteristics and localized calculations. The compatibility
of the GPU architecture with the numerical algorithm of LBM could lead to more efficient simulations.

Although GPU-accelerated simulations have been used in solving LBM (Janßen and Krafczyk, 2011;
Tolke and Krafczyk, 2008; Xian and Takayuki, 2011; Xiong et al., 2012), the applications of GPU technique
for stirred tank simulations are seldom reported. It is therefore of practical significance to utilize LBM and
GPU architecture to accelerate or upgrade the stirred tank simulation for industrial applications. Moreover, the
Immersed Boundary Method (IBM) (Peskin, 2003), which explicitly treats complex or moving boundaries,
can greatly save the time of manual mesh generation. Complex or curved boundaries are represented by a set
of Lagrangian marker points, and the boundary conditions are implemented by imposing additional forces or
source terms on the equations at the Eulerian mesh nodes adjacent to the boundaries. Therefore, there is no
need to generate body-fitted grids in IBM. Instead, uniform cubic grids can always be used for the flow solver,
and the Lagrangian points on the boundary surfaces can be easily generated by Computer Aided Design (CAD)
software. Implementations of IBM in stirred tank simulations have been conducted in previous works (Derksen
and Van den Akker, 1999; Derksen, 2003, 2011, 2012; Gillissen and Van den Akker, 2012; Hartmann et al.,
2004a,b, 2006a, b).

Large Eddy Simulation (LES) has been widely used and proved to be more accurate and promising
(Hartmann et al., 2004a; Min, 2006). It is generally acknowledged that the grid resolution in LES has to be fine
enough to resolve more than 80% of the turbulent kinetic energy (TKE) (Pope, 2001). This may hinder its
industrial application, especially for GPU-accelerated computation: although GPU-accelerated LBM enables
high temporal resolution and long-time simulation of stirred tanks, GPU memory capacity is still limited, and
even the latest NVIDIA V100 card has only 16 Gigabytes of memory, since fast memory is expensive.
Therefore, more advanced LES models that perform well at a reasonable grid resolution are required. An index
based on the resolved TKE has been developed to estimate the grid resolution quality for bulk flow, in which a
larger resolved fraction of TKE indicates better grid resolution (Celik et al., 2005). This suggests that a coarser
grid may suffice when an advanced LES model resolves more TKE. Hence, it is of great significance to
assess the various LES models by the resolved TKE with the GPU-accelerated computation.

In this paper, we attempt to evaluate the possibility or potential to accelerate the LBM-LES simulation of
stirred tanks with a desktop computer of a single GPU graphics card. The Multiple-Relaxation-Times (MRT)
LBM (D'Humieres et al., 2002; Lallemand and Luo, 2000) is used as a fluid flow solver, together with the IBM

for modeling complex moving boundaries and structured meshes to discretize the flow domain of stirred tanks.
The parallel computation code was developed and implemented on a desktop computer of a single graphics
card. The computational efficiency (performance) of the GPU-accelerated simulation was estimated in terms of
MLUPS (the million lattice units updates per second) (Xian and Takayuki, 2011) when using the maximum
number of grid nodes that a graphics card can accommodate. This gives a reference for the capability of
desktop computation with a single graphics card. With the GPU-accelerated LBM solver, we are able to estimate the macro-
instability in stirred tanks with thousands of revolutions’ simulation. We also try to assess the grid resolution
requirement of two different LES models with the percentage of the resolved TKE, and the accuracy of two
different LES models by the predictions of time-averaged properties.

2. Models and numerical methods

2.1 Flow solver

LBM, unlike traditional CFD methods based on the Navier-Stokes equations, solves a discretized form of the
continuous Boltzmann equation in space and time. Currently, there are two main types of models for the
collisional term in the Boltzmann equation, i.e., Single Relaxation Time (SRT) models (Qian et al., 1992) and
MRT models (Lallemand and Luo, 2000). The MRT models have more adjustable parameters and have proved
to be more stable. In this work, the D3Q19 lattice model and the MRT model are used, and the
governing equations are formulated as

$$\mathbf f\left(\mathbf x+\mathbf e_i\Delta t,\,t+\Delta t\right)=\mathbf f\left(\mathbf x,t\right)-M^{-1}\!\left[S\left(\mathbf m-\mathbf m^{eq}\right)-\left(I-\tfrac{1}{2}S\right)\mathbf F_m\right] \qquad (1)$$
where f is the discrete velocity distribution function vector, e_i the discrete velocity, m the discrete velocity
distribution function vector in moment space, m^eq the equilibrium moment vector, and F_m the force term in
moment space (Guo et al., 2002). S is the diagonal collision matrix, S = diag(s_0, s_1, ..., s_{b-1}), where b is the
number of discrete velocities in the lattice model and b=19 in this work. I is the identity matrix, and M is the
transformation matrix. The details of e_i, m^eq, M and F_m of the D3Q19 lattice model are summarized in the
Appendix.

The collision matrix was chosen according to D'Humieres et al. (2002): in the D3Q19 model, s_9 = s_11 = s_13 =
s_14 = s_15 = 1/τ_0, s_0 = s_3 = s_5 = s_7 = 0, s_1 = 1.19, s_2 = s_10 = s_12 = 1.4, s_4 = s_6 = s_8 = 1.2, and
s_16 = s_17 = s_18 = 1.98. The non-dimensional relaxation time τ_0 is related to the kinematic viscosity of the
fluid through ν_0 = c_s²(τ_0 − 0.5). To solve Eq. (1), the discrete velocity distribution function vector f is first
transformed into moment space,

$$m_i = M_{ij}\,f_j \qquad (2)$$

and the collision step is then implemented by

$$\mathbf m' = \mathbf m - S\left(\mathbf m-\mathbf m^{eq}\right)+\left(I-\tfrac{1}{2}S\right)\mathbf F_m \qquad (3)$$

Then m is transformed to the discrete velocity distribution. The propagation step is implemented in the

5
discrete velocity distribution function space,

fi  x  ei t , t  t   fi '  x, t   M ij1m'j (4)

The macroscopic density ρ and velocity u can be calculated by

$$\rho=\sum_i f_i \qquad (5)$$

$$\rho\,\mathbf u=\sum_i f_i\,\mathbf e_i+\tfrac{1}{2}\mathbf F\,\Delta t \qquad (6)$$

where F is the force in the Cartesian coordinate system.

2.2 Turbulence modeling

LES directly resolves the larger eddies, while the sub-grid flow or small eddies, which are assumed to be
universal and isotropic, are modeled through a sub-grid eddy viscosity. The standard Smagorinsky model
(Smagorinsky, 1963) with a constant model parameter Cs has commonly been used in stirred tank simulations,
but the value differs in the literature, e.g., 0.1 (Eggels, 1996; Hartmann et al., 2004a) or 0.12 (Derksen and Van
den Akker, 1999). Gillissen and Van den Akker (2012) found that Cs varied with the spatial location. Yang (2015)
also pointed out that Cs=0.18 gives reasonable results for isotropic turbulence, whereas for flows near solid
walls it should be reduced to 0.1. Hence it is more reasonable to evaluate Cs according to the local flow details.
Recently, a coherent LES model (Kobayashi, 2005) with a spatially variable model parameter was developed, in
which the sub-grid viscosity (νt) was calculated by

$$\nu_t = C\,\Delta^2\,\left|\bar S\right| \qquad (7)$$

where $\left|\bar S\right|$ is the magnitude of the resolved strain rate. The filter width Δ is set equal to the grid
size, and the model parameter varies with the local flow details,

$$C = C_{coh}\,F_{CS}^{3/2}, \qquad F_{CS}=\frac{Q}{E} \qquad (8)$$
where Q is the second invariant of the velocity gradient tensor,

$$Q = -\frac{1}{2}\,\frac{\partial u_j}{\partial x_i}\,\frac{\partial u_i}{\partial x_j} \qquad (9)$$

and E is the magnitude of the velocity gradient,

$$E = \frac{1}{2}\left(\frac{\partial u_j}{\partial x_i}\right)^{2} \qquad (10)$$
Ccoh is a model parameter and set as 0.01 in this work. According to the Chapman-Enskog expansion, the strain
rate S_ij is related to the non-equilibrium part of the velocity distribution function and can be formulated as
(Krafczyk et al., 2003)

$$S_{ij} = -\frac{1}{2\rho\tau c_s^{2}}\,Q_{ij} = -\frac{1}{2\rho\tau c_s^{2}}\sum_k e_{ki}\,e_{kj}\left(f_k - f_k^{eq}\right) \qquad (11)$$

where τ = τ_0 + 3ν_t. Q_ij can be computed from the discrete velocity distribution function in moment space and the
equilibrium moments, as shown in the Appendix.
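
To make Eqs. (7)-(10) concrete, the short device-function sketch below evaluates the coherent-structure
parameter and the sub-grid viscosity from a velocity-gradient tensor. This is our own illustrative code, not the
authors' implementation: the function name, the gradient input (which in the paper would instead come from
the non-equilibrium moments via Eq. (11)), the convention |S̄| = sqrt(2 S̄_ij S̄_ij) and the clipping of F_CS to
[0,1] are all assumptions.

// Illustrative sketch (not the authors' code): coherent-structure sub-grid viscosity,
// given the resolved velocity-gradient tensor dudx[i][j] = du_i/dx_j at one node.
__device__ float coherent_sgs_viscosity(const float dudx[3][3],
                                        float delta,   // filter width = grid size
                                        float ccoh)    // model constant, 0.01 in this work
{
    float Q  = 0.0f;   // second invariant of the velocity gradient, Eq. (9)
    float E  = 0.0f;   // velocity-gradient magnitude, Eq. (10)
    float S2 = 0.0f;   // S_ij * S_ij, used for |S| in Eq. (7)

    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            Q += -0.5f * dudx[j][i] * dudx[i][j];
            E +=  0.5f * dudx[j][i] * dudx[j][i];
            float Sij = 0.5f * (dudx[i][j] + dudx[j][i]);
            S2 += Sij * Sij;
        }
    }
    float Fcs = (E > 1e-20f) ? Q / E : 0.0f;   // coherent-structure function, Eq. (8)
    Fcs = fminf(fmaxf(Fcs, 0.0f), 1.0f);       // illustrative safeguard: keep F_CS in [0,1]
    float C    = ccoh * powf(Fcs, 1.5f);       // C = Ccoh * Fcs^(3/2)
    float Smag = sqrtf(2.0f * S2);             // assumed convention |S| = sqrt(2 S_ij S_ij)
    return C * delta * delta * Smag;           // Eq. (7)
}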

2.3 Modeling of complex geometry

In this simulation, IBM was used to represent the complex curved boundaries and rotational parts of the
stirred tank, whose surfaces were marked with a set of discrete points. The boundary condition was
implemented by applying forces to the fluid nodes adjacent to the boundaries. The force Fb exerted on a
boundary marker point located at xb was calculated from (Kang and Hassan, 2011),

$$\mathbf F_b\left(\mathbf x_b,\,t+\Delta t\right) = 2\rho\,\frac{\mathbf U_d - \mathbf U^{*}_{fluid}\left(\mathbf x_b,\,t+\Delta t\right)}{\Delta t} \qquad (12)$$
Ud is the desired or imposed velocity at the boundary marker points, determined by the motion of the
boundaries, and U*_{fluid,b} denotes the velocity at a boundary marker point interpolated from the velocities of
the adjacent fluid nodes,

$$\mathbf U^{*}_{fluid,b} = \sum_{ijk} G\left(\mathbf x_b - \mathbf x_{ijk}\right)\mathbf u^{*}_{ijk} \qquad (13)$$

where u*_{ijk} is the velocity calculated from Eq. (6) without the force term. The interpolation function was
obtained from

$$G\left(\mathbf x_b - \mathbf x_{ijk}\right) = G(\mathbf r) = g\left(x_b - x_{ijk}\right)\,g\left(y_b - y_{ijk}\right)\,g\left(z_b - z_{ijk}\right) \qquad (14)$$

where

$$g(r) = \begin{cases} 1 - |r|, & |r| \le 1 \\ 0, & |r| > 1 \end{cases} \qquad (15)$$

Then the forces exerted on the boundary marker points Fb were cast onto the adjacent fluid nodes x_{ijk} as the force Ff,

$$\mathbf F_f = \sum_b G\left(\mathbf x_{ijk} - \mathbf x_b\right)\mathbf F_b \qquad (16)$$

More details of the algorithm for LBM-IBM simulation can be found in Lu et al. (2012).

3. Code implementation

Although originally designed for graphics rendering, GPUs have been applied to general-purpose
computation due to their powerful arithmetic engines capable of running thousands of lightweight threads
concurrently. GPUs offer higher theoretical performance and bandwidth, lower energy consumption, and a
better price-performance ratio than CPUs, attracting an increasing number of researchers to scientific
computing on GPUs. In this work, we employed an NVIDIA Tesla C2050 graphics card built with 14
multithreaded streaming multiprocessors, each of which has 32 scalar processor cores, giving 448 cores in
total. The clock frequency of each core is 1.15 GHz and the theoretical peak single-precision performance of
an NVIDIA Tesla C2050 is 1.03 TeraFLOPS.

The logical arrangement of the basic computational tasks is closely related to the architecture of GPU
hardware. Here we used CUDA®, a parallel computing platform and programming model developed by
NVIDIA, for coordinating the computational tasks on the GPU and CPU. In CUDA, the GPU is called the device
and the corresponding CPU the host. The functions that run on the GPU are executed in parallel by many
threads. Threads are aggregated into blocks, which are then grouped into a grid. The threads within a block are
executed on the same streaming multiprocessor. The GPU devices have several different
memory spaces, i.e. the global, local, texture, constant, shared and register memory. The only two types of
memory that actually reside on the GPU chip are the register and shared memory. The local, global, constant,
and texture memory are all off-chip. Each type of memory on the device is different in terms of speed, scope
and lifetime. Readers can refer to the CUDA® manual for more details.

LBM simulation is a typical SIMD (single instruction multiple data) process and is well suited to GPU
computation. The major computational cost of LBM lies in the collision process, which only involves local flow
information, e.g., Eqs. (2)-(4). The distribution functions f_i of each grid point can be loaded into register
memory. Direct matrix operations involving many loops for the transfer between distribution functions and
moment space in Eqs. (2) and (4) were avoided to save computational cost. For example, m_i was calculated
from the direct expansion of the right-hand side of Eq. (2), i.e., m_i = M_{i,0}f_0 + M_{i,1}f_1 + ... + M_{i,18}f_18.
Variables such as m and m' were stored as temporary variables in on-chip memory to save global memory and
accelerate the computation. A similar strategy was adopted for the calculation of Eqs. (5)-(6), and shared
memory was used for the propagation step in Eq. (4). More details on memory allocation in GPU computation
can be found in Tolke and Krafczyk (2008).
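
The following CUDA fragment sketches this strategy: one thread per lattice node, distributions held in
registers, and moments evaluated by direct expansion rather than a stored-matrix loop. It is illustrative only
(kernel name, array layout and the restriction to the conserved moments plus the energy moment are our
assumptions, not the authors' code); the velocity ordering follows the Appendix.

// Illustrative fragment (not the authors' kernel): each thread handles one lattice
// node, keeps its 19 distributions in registers, and evaluates moments by direct
// expansion of m_i = M_ij f_j. Only density, momentum and the energy moment e are
// shown; the remaining rows of M are expanded in the same way.
__global__ void collide_sketch(const float* __restrict__ fin, float* rho_out,
                               float* ux_out, float* uy_out, float* uz_out,
                               int nNodes)
{
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= nNodes) return;

    // Load the 19 distributions of node n into registers (structure-of-arrays layout).
    float f[19];
#pragma unroll
    for (int q = 0; q < 19; ++q) f[q] = fin[q * nNodes + n];

    // Row 0 of M: density (Eq. 5).
    float rho = f[0]+f[1]+f[2]+f[3]+f[4]+f[5]+f[6]+f[7]+f[8]+f[9]
              + f[10]+f[11]+f[12]+f[13]+f[14]+f[15]+f[16]+f[17]+f[18];

    // Rows 3, 5, 7 of M: momentum components (Eq. 6 without the force term).
    float jx = f[1]-f[2]+f[7]-f[8]+f[9]-f[10]+f[11]-f[12]+f[13]-f[14];
    float jy = f[3]-f[4]+f[7]+f[8]-f[9]-f[10]+f[15]-f[16]+f[17]-f[18];
    float jz = f[5]-f[6]+f[11]+f[12]-f[13]-f[14]+f[15]+f[16]-f[17]-f[18];

    // Row 1 of M: energy moment e, written out term by term instead of a matrix loop.
    float e = -30.0f*f[0]
              -11.0f*(f[1]+f[2]+f[3]+f[4]+f[5]+f[6])
              + 8.0f*(f[7]+f[8]+f[9]+f[10]+f[11]+f[12]+f[13]+f[14]+f[15]+f[16]+f[17]+f[18]);
    (void)e; // the relaxed moments would be combined and mapped back via M^{-1}

    rho_out[n] = rho;
    ux_out[n]  = jx / rho;
    uy_out[n]  = jy / rho;
    uz_out[n]  = jz / rho;
}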

Apart from the appropriate utilization of GPU memory, the computation speed also depends on the way the
forces at the boundary marker points are calculated. There are two choices: first, assign one thread to each
marker point, search all the fluid nodes adjacent to that marker point, and then calculate the interpolated
velocity; second, assign one thread to each grid node and search the adjacent marker points. The computational
complexity of the former is much smaller than that of the latter, and the computation can be significantly
accelerated. Therefore, in this work Fb at the marker points was only cast onto the adjacent fluid nodes to save
computational time. An atomic operation should be used to avoid read-write conflicts when distributing the
forces of the marker points onto the adjacent fluid nodes. An atomic operation reads or writes a value in
memory without interference from any other thread, which guarantees that a race condition cannot occur.
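
A minimal sketch of this choice (one thread per marker point, forces cast only to the adjacent fluid nodes via
Eqs. (14)-(16), atomicAdd to prevent read-write conflicts) is given below; the kernel name, data layout and
bounds handling are our assumptions, not the authors' code.

// Illustrative sketch (not the authors' code): one thread per Lagrangian marker,
// spreading its force to the 2x2x2 neighbouring fluid nodes with the hat kernel
// of Eqs. (14)-(15). atomicAdd avoids read-write conflicts when several markers
// touch the same node. Positions are in lattice units; Fx, Fy, Fz are node force fields.
__global__ void spread_marker_forces(const float3* __restrict__ xb,   // marker positions
                                     const float3* __restrict__ Fb,   // marker forces, Eq. (12)
                                     float* Fx, float* Fy, float* Fz, // Eulerian force arrays
                                     int nMarkers, int nx, int ny, int nz)
{
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m >= nMarkers) return;

    float3 p = xb[m], F = Fb[m];
    int i0 = (int)floorf(p.x), j0 = (int)floorf(p.y), k0 = (int)floorf(p.z);

    for (int dk = 0; dk <= 1; ++dk)
    for (int dj = 0; dj <= 1; ++dj)
    for (int di = 0; di <= 1; ++di) {
        int i = i0 + di, j = j0 + dj, k = k0 + dk;
        if (i < 0 || i >= nx || j < 0 || j >= ny || k < 0 || k >= nz) continue;

        // Hat function g(r) = 1 - |r| for |r| <= 1, Eq. (15), applied in each direction.
        float w = (1.0f - fabsf(p.x - i)) * (1.0f - fabsf(p.y - j)) * (1.0f - fabsf(p.z - k));
        int idx = (k * ny + j) * nx + i;
        atomicAdd(&Fx[idx], w * F.x);   // Eq. (16)
        atomicAdd(&Fy[idx], w * F.y);
        atomicAdd(&Fz[idx], w * F.z);
    }
}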

4. Numerical setups

In this paper, a baffled stirred tank with a standard Rushton turbine (Hartmann et al., 2004a) was
simulated. The tank diameter T equals the liquid height H (150 mm). The impeller has 6 blades, and the details
of tank geometry are depicted in Fig. 1. The fluid kinematic viscosity is 1.53×10⁻⁵ m²/s and the fluid density is
1,039 kg/m³. The rotational speed of the impeller is 2,672 rpm or N=44.53 rev/s, producing a flow of Reynolds
number 7,300.

Fig. 1. Geometry of the stirred tank with a Rushton turbine.

Table 1. Simulation cases.

Case Hardware Flow solver Wall Boundary Moving Parts Turbulence model Grid Number
L1 1 GPU card MRT-LBM IBM IBM Coherent LES 240³ (1.38×10⁷)
L2 1 CPU core MRT-LBM IBM IBM Coherent LES 240³ (1.38×10⁷)
L3 1 GPU card MRT-LBM IBM IBM Smagorinsky LES 240³ (1.38×10⁷)
F1 16 CPU cores ANSYS Fluent Body-fitted mesh Sliding mesh Smagorinsky LES 1.26×10⁷
F2 16 CPU cores ANSYS Fluent Body-fitted mesh Sliding mesh Smagorinsky LES 3.01×10⁵
F3 16 CPU cores ANSYS Fluent Body-fitted mesh Sliding mesh Transient SST k-ω 3.01×10⁵

In LBM-IBM simulations, a bounce-back boundary condition was implemented at the bottom to achieve
the no-slip boundary, and a free-slip boundary condition was applied at the top boundary. The complex
geometry, such as the rotational parts and the stationary parts (e.g., the wall and baffles), was generated and
marked with marker points using commercial CAD software. For Cases L1-L3, the number of marker points was
20,079 for the rotational parts and 233,770 for the stationary parts. The time step Δt was set to 1/(1500N) in
Cases L1-L3. A uniform grid was set for all the LBM-IBM simulations with the same grid number assigned for
each direction (nx, ny, nz).
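
For orientation, the lattice-unit parameters implied by this setup can be estimated as follows. This is our own
illustrative arithmetic and assumes that the 240³ grid spans the tank diameter T; the paper does not state the
exact domain mapping.

$$\Delta x=\frac{T}{240}=6.25\times10^{-4}\ \mathrm{m},\qquad \Delta t=\frac{1}{1500N}\approx1.50\times10^{-5}\ \mathrm{s},$$

$$\nu_{\mathrm{lat}}=\nu\,\frac{\Delta t}{\Delta x^{2}}\approx5.9\times10^{-4},\qquad \tau_0=\frac{\nu_{\mathrm{lat}}}{c_s^{2}}+\frac{1}{2}\approx0.502 .$$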

5. Results and discussion

5.1 GPU acceleration

The computational performance can be estimated by MLUPS (Xian and Takayuki, 2011),

$$\mathrm{Performance}=\frac{n_x\times n_y\times n_z\times L_N}{t_{comp}\times10^{6}}\ \ (\mathrm{MLUPS}) \qquad (17)$$

where n_x, n_y, n_z are the lattice numbers in the x, y, z directions respectively, L_N denotes the number of
updated time steps, and t_comp is the computational time for those time steps at the given lattice numbers.
The performance number represents the computational task (updating steps × lattice number) accomplished
per second. It is jointly affected by both the computational hardware and the software (models and algorithm).
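
As a quick check of Eq. (17), inserting the Case L1 figures reported below (a 240³ grid updated in 0.079 s per
time step, i.e., L_N = 1) gives

$$\mathrm{Performance}=\frac{240^{3}\times1}{0.079\times10^{6}}\approx175\ \mathrm{MLUPS},$$

consistent with the 174.98 MLUPS measured in Fig. 2.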

The maximum grid number can be estimated from the available memory of one NVIDIA Tesla C2050
graphics card (2687 MByte) and the memory requirement of each grid point. In the D3Q19 model, at least 45
variables need to be stored in GPU memory for each grid point, including 38 velocity distribution functions for
times t and t+Δt, the density, three fluid velocity components and three force components. Memory must also
be assigned to the Lagrangian marker points. Hence more than 180 (45×4) Bytes are required for each grid
point and 36 Bytes for each marker point in single-precision computation. The total memory required for the
marker points was 8.72 MBytes. The theoretical number of grid nodes that can be used for the LBM simulation
was therefore 1.49×10⁷ (245³) for a single GPU graphics card. Owing to the memory requirement of temporary
variables and visual display, only 1.38×10⁷ (240³) grid nodes were used in Cases L1 and L2 to test the
computational performance of a single GPU graphics card and a single CPU core, respectively. Here the
computational model and algorithm were kept the same, and hence the difference in computational
performance arises mainly from the hardware. The theoretical peak single-precision performance of the Tesla
C2050 GPU card is about 1.03 TeraFLOPS, 56.97 times that of a single core of an Intel E5520 CPU (18.08 GigaFLOPS).
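
The quoted node count follows from simple arithmetic (our own illustration; the figures imply 1 MByte = 10⁶ Bytes):

$$\frac{2687\times10^{6}\ \mathrm{Bytes}}{45\times4\ \mathrm{Bytes/node}}\approx1.49\times10^{7}\ \mathrm{nodes}\approx245^{3}.$$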

Fig. 2 indicates that the performance of the GPU card for the stirred-tank simulation is about 174.98
MLUPS, achieving 61.09-fold speedup over a single core of an E5520 CPU (2.86 MLUPS). The GPU
acceleration is remarkable, and the speedup is superlinear. It takes about 118.50 seconds for a single GPU card
to update the flow simulation by one impeller revolution. The GPU acceleration using only one GPU card is
notable, economical and energy-saving. It is also easy to install multiple GPU cards in a desktop PC to further
accelerate the computation.

Fig. 2. Comparison of the performance of a single NVIDIA Tesla C2050 GPU card and a single Intel E5520 CPU core.

Table 2. Comparison of computational speed for LBM simulation and traditional CFD simulation.

Case F1 F2 F3 L1 L2
Hardware Intel Xeon E5-2860 Intel Xeon E5-2860 Intel Xeon E5-2860 NVIDIA C2050 Intel Xeon E5520
Core number 16 CPU cores 16 CPU cores 16 CPU cores 1 GPU card 1 CPU core
Theoretical peak performance 358.4 GigaFLOPS 358.4 GigaFLOPS 358.4 GigaFLOPS 1.03 TeraFLOPS 18.08 GigaFLOPS
Flow solver ANSYS Fluent ANSYS Fluent ANSYS Fluent LBM LBM
Iterations per physical time step 20 20 20 1 1
Computational time per physical time step (tcomp/LN) 120 s 3.45 s 5.9 s 0.079 s 4.82 s
(Note: the numerical setting of different cases is different, as listed in Table 1)

We then compared the computational performance with different hardware and models, as listed in Table
2. The computation time was obtained from the wall-clock time recorded either in the CUDA code or in the
commercial CFD package. The time needed to update 1,500 time steps, i.e., one impeller revolution, was
measured three times in all the simulation cases, and the average time needed for one time-step calculation was
then obtained. All the simulations listed in Table 2 were solved in a time-dependent manner,
and the operational condition and the time steps in all the cases were kept the same. Cases L1 and L2 denote
the LBM simulations accelerated by one GPU graphics card or one CPU core respectively. In Cases F1-F3, the
parallel simulations using the commercial CFD package (Ansys Fluent) were implemented on 16 CPU cores.
The theoretical peak FLOPS of one CPU core in Case L2 was about 5% of that of 16 CPU cores in Case F1.

However, the computational speed in Case L2 using one CPU core (4.82s per time step) was 24.9 times faster
than that of Case F1. This indicates that the LBM-IBM solver for stirred tanks, even in the single-core CPU
architecture, is more efficient than the traditional transient CFD solver using the Sliding Mesh method and
many CPU cores.

The theoretical peak FLOPS of the single GPU graphics card in Case L1 was only 2.94 times that of the 16
CPU cores in Case F1. However, the computational speed of Case L1 (0.079 s per time step) was 1,519 times
that of Case F1 (120 s per time step), even though the grid number of Case L1 (240³ or 1.38×10⁷) was similar to
that of Case F1 (1.26×10⁷). This indicates that the GPU acceleration is
remarkable.

This remarkable acceleration is related not only to the higher theoretical peak performance of the GPU, but
also to the differences in complexity, parallelism and data-access time between the algorithms of the two
modeling approaches, as well as to the excellent compatibility of the LBM algorithm with GPU hardware. The CFD algorithm
generally involves the complex calculation of fluxes on the faces of each cell, requiring the complex
relationship of neighboring nodes. The pressure-Poisson equations are solved in most of the CFD algorithms in
the entire domain at each time step. By contrast, the LBM computation is almost local, and hence the parallel
efficiency is intrinsically much higher than that of CFD simulation, which is very compatible with the
architecture of GPU hardware. Moreover, the data access in CFD simulation might be more time consuming
due to the complex data structure for the unstructured grid systems. By contrast, the structured cubic grids are
generally used in LBM even for complex geometry, which makes the data access easier and faster.
Furthermore, dynamic grid generation and data interpolation from the old grid system to the new one are
required in many CFD simulations of stirred tanks, which brings additional computational cost. By contrast,
there is no need to regenerate the grid system when IBM is used for complex geometry in LBM; if necessary,
simply increasing the density of the structured grid is adequate for resolving more details in the bulk phase.
These advantages greatly enhance the computational efficiency of LBM.

5.2 Instantaneous flow field

The non-dimensional instantaneous velocity magnitude distributions (U/Utip) in the two typical planes
(θ=0° and z=T/3) of Case L1 are illustrated in Fig. 3. The simulation resolves considerable details of the
complex flow structures and shows the highly asymmetric instantaneous flow. Behind each blade were regions
of higher velocity magnitude. The instantaneous velocity magnitude was non-uniform at the same radial
positions in the z=T/3 plane. The velocity magnitude around the impeller blades was much larger than in the
rest of the tank. The highest velocity magnitude was about 2.2 times the linear velocity of the blade tip Utip, and
therefore the blade tip velocity can only be used as a qualitative estimate of the maximum fluid velocity magnitude.

Fig. 3. Instantaneous velocity magnitude at two typical planes: (a) θ=0° plane; (b) z=T/3 plane (anti-clockwise rotation).
Fig. 4 compares the instantaneous flow field distribution in the horizontal plane across the turbine disk
(z=T/3) for different cases (Case F2, Case F3 and Case L1) in Table 1. Case F1, with a larger grid number, is not
presented due to the prohibitive computational requirements of the parallel simulation on 16 CPU cores. Case
F2 can to some extent capture the non-uniform structure around the impellers, and the flow evolution looks
periodic. However, the instantaneous velocity distributions around the baffles in this simulation were not
periodic and might actually be unphysical. The flow field was quite uniform in Case F3. Case L1 can not only
capture the non-uniform details, but also resolve the periodic flows around both the impeller blades and the baffles.
The maximum velocity magnitude in the z=T/3 plane was 1.6Utip in Case F2, 1.3Utip in Case F3 and 2.2Utip in
Case L1. The instantaneous flow details were averaged out in Case F3 as the transient SST k-ω model was
based on the Reynolds-averaged approach. The LES with fine grids in Case L1 can capture more instantaneous
flow details.

Fig. 4. Instantaneous velocity magnitude (U/Utip): (a) Case F2; (b) Case F3; (c) Case L1.

Fig. 5. Flow details in a vertical plane in Case L1: (a) the θ=0° plane; (b) around the blades.

Fig. 5 depicts more flow details in a vertical plane (the θ=0° plane) in Case L1. The fluid flow was
discharged from the impellers along the radial direction and separated into the upward or downward streams
near the baffles. The separated streams then moved upwards or downwards, and turned back towards the shaft
or centerline of the tank, establishing two large-scale circulations. The simulation reveals the circulating cells
at various scales and complex vortices. The details of the so-called trailing vortex (Escudié and Liné, 2003)
developed in the wake of blades were captured. The trailing vortices can promote fluid mixing or particle
breakage.

Macro-Instability (MI) is one of the characteristics of the large-scale flow in stirred tanks; it has notable
impacts on heat and mass transfer and imposes mechanical loads on the tank internals and impeller shaft,
causing massive vibrations of the impellers. The instantaneous flow field around the shaft in the z=T/8 plane at four different time
instants is depicted in Fig. 6. The first snapshot was recorded at the 100th impeller revolution, and the time
interval between every two consecutive snapshots was six revolutions. Even though the phase of impeller
blades was kept the same, the flow around the shaft always varied, reflecting the MI effects in stirred tanks as
reported in the literature (Hartmann et al., 2004b; Roussinova et al., 2000, 2003).

Fig. 6. Instantaneous flow field around the shaft at the same phase of impeller blades at z=T/8 plane.

Fig. 7. Locations of two monitor points for MI analysis.


To further analyze the MI phenomenon and evaluate the simulation accuracy of the instantaneous flow, the
time series of the radial velocities at two monitoring points given in Fig. 7 are illustrated in Fig. 8. Point A was
located in the outflow region parallel to the impeller plane (θ=30°, z=T/3, r=T/4), and Point B was located in
the bulk region around the shaft (θ=30°, z=T/8, r=T/12). The radial velocities at the two points were recorded
in the transient simulation. The fluctuation of radial velocity at Point B was smaller than that of Point A, as
shown in Fig. 8(a)-(f). The probability distribution function (PDF) of the non-dimensional radial velocity

(Urad/Utip) was based on the statistics of the radial velocity from the 100th to the 200th revolution. As illustrated in
Fig. 8(e), the fluctuation of the radial velocity Urad at Point A lies between -0.25Utip and 1.25Utip, which can
already be captured with the 10-revolution statistics shown in Fig. 8(c). However, the fluctuation of the radial
velocity Urad at Point B lies between -0.23Utip and 0.23Utip and cannot be captured within 10 revolutions, as
illustrated in Fig. 8(d); it can only be captured with a longer-time simulation. Capturing such fluctuations is
significant for understanding the so-called micro-mixing.

Fig. 8. Time series of radial velocities at Point A (top) and Point B (bottom)
(a)-(d): radial velocity as a function of impeller revolution; (e) and (f): probability distribution function (PDF) of the radial
velocity (Urad/Utip)

The Discrete Fourier Transform (DFT) method can be used to analyze the velocity fluctuations. This method
was then adopted for the power spectral density (PSD) analysis of the time series of radial velocity at Point A,
as shown in Fig. 9. A distinct peak appears at f≈6N and a smaller peak emerges at f=0.039N. The peak at f=6N
corresponds to the interaction frequency between the six impeller blades and the baffles, which is 6 times the
rotational speed N. This is consistent with the results of Roussinova et al. (2003) and Hartmann et al.
(2004b). The low-frequency peak at f=0.039N is related to the MI of the stirred tank.
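
For reference, a standard DFT-based (periodogram) estimate of the PSD of a uniformly sampled series
u_rad(nΔt_s), n = 0, ..., N_s−1, is

$$\hat U_k=\sum_{n=0}^{N_s-1}u_{\mathrm{rad}}\!\left(n\,\Delta t_s\right)e^{-2\pi\mathrm{i}\,kn/N_s},\qquad \mathrm{PSD}\!\left(f_k\right)=\frac{\Delta t_s}{N_s}\,\bigl|\hat U_k\bigr|^{2},\qquad f_k=\frac{k}{N_s\,\Delta t_s};$$

the paper does not state which estimator, windowing or averaging was actually applied, so this is only an
indicative definition.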

Fig. 9. Power spectral density analysis of the time series of radial velocity at Point A.
Fig. 10 shows the power spectral density of the radial velocity at Point B for different total sampling
times. A distinct peak emerges at f/N=0.015 for a total sampling time of 600 impeller revolutions. There were
two distinct peaks, i.e., f/N=0.01 and f/N=0.041 for the total sampling time of 1,500 revolutions, and three
distinct peaks for the total sampling time of 4,000 revolutions, i.e., f/N=0.011, f/N=0.025 and f/N=0.116. The
total sampling time has a large influence on the peak frequency. Galletti et al. (2004) reported two distinct
peaks at f/N=0.015 and f/N=0.106 for a Rushton turbine stirred tank (6,300<Re<13,600), and only the
simulation of 4,000 revolutions captured a second distinct peak close to this value (f/N=0.116). In our
simulation, all the above tests captured the peak at about f/N=0.01. This suggests that MI can be accurately
characterized only with long physical-time simulations. Roussinova et al. (2003) also reported that 5,000
revolutions of experiments were needed to obtain distinct peaks. In summary, a simulation of at least 4,000
revolutions is needed to reproduce time-independent MI frequencies, and hence GPU-accelerated computation
is of critical importance to the MI study of stirred tanks.

Fig. 10. Power spectral density of the time series of the radial velocity at Point B with different total sampling times (Case L1): (a) 600
revolutions; (b) 1,500 revolutions; (c) 4,000 revolutions.

5.3 Quantitative estimation of mean flow

The behavior of the main flow is pertinent to the macromixing processes of stirred tanks. Small-scale
mixing and particle breakage or coalescence are closely related to the turbulence intensity, and it is therefore
critical to understand the turbulent properties in stirred tanks. The total TKE is composed of the resolved and
unresolved TKE, defined as $k=\frac{1}{2}\overline{|\mathbf u'|^{2}}+\nu_t^{2}/(C_k\Delta)^{2}$ (Gillissen and Van den Akker, 2012; Yeoh et al., 2004). The
first term on the right-hand side represents the grid-resolved turbulence and the second term the unresolved
sub-grid turbulence; the model constant $C_k$ was chosen as 0.05 (Gillissen and Van den Akker, 2012) in this
work. The turbulent dissipation rate also consists of grid-resolved and unresolved contributions and can be
computed by $\varepsilon=(\nu+\nu_t)\,\overline{|\bar S|^{2}}$. The Kolmogorov length scale (η) represents the smallest length scale or
smallest eddy scale in turbulent flow, $\eta=(\nu^{3}/\varepsilon)^{1/4}$. The minimum eddy size in the inertial sub-range is
reported to be about 10 times the Kolmogorov length scale (Tennekes and Lumley, 1972) and is often used to
estimate the maximum size of droplets in turbulence (Luo and Svendsen, 1996; Wang et al., 2003). The ratio of
grid size to the Kolmogorov scale is depicted in Fig. 11. The maximum ratio is about 8.8 in the wake of the
impellers, and the ratio is much higher near the impeller regions than elsewhere.
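
The quantity plotted later in Fig. 14 follows directly from this decomposition; in our notation, the resolved
fraction is

$$\frac{k_{\mathrm{res}}}{k}=\frac{\tfrac{1}{2}\overline{|\mathbf u'|^{2}}}{\tfrac{1}{2}\overline{|\mathbf u'|^{2}}+\nu_t^{2}/(C_k\Delta)^{2}} .$$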

Fig. 11. Ratio of grid size to Kolmogorov length scale (η) at (a) the θ=0° plane, (b) the z=T/3 plane.
To quantitatively estimate the simulation accuracy, the coherent LES (Case L1) and the Smagorinsky LES
(Case L3) in this work were compared with the experimental data and the Smagorinsky LES results of
Hartmann et al. (2004a). Fig. 12 compares the phase-averaged axial distribution of radial velocity, tangential
velocity and turbulent kinetic energy predicted by different turbulence models and the experimental data of
Laser Doppler velocimetry (LDA) at three different radial locations (2r/D=1.1, 1.5, 1.9). It should be pointed
out that in the two LES cases of this work, the grid number (1.38×10⁷) was kept the same as that of
Hartmann et al. (2004a). The MRT method was used for the collisional term in this work, in contrast to the
SRT method in Hartmann et al. (2004a). The predictions of radial velocity in Cases L1 and L3 were fairly good
and better than those of Hartmann et al. (2004a). The coherent LES model was the most accurate, especially in
the region around z=T/3. All the models overestimated the tangential velocity around the blade tips (i.e.,
2r/D=1.1; z=0.32~0.34T).

The deviations of the two LES cases of this work from the experimental data (Hartmann et al., 2004a)
were much smaller than those of the Smagorinsky LES/SRT predictions. Although the tangential velocities at
2r/D=1.5 and 1.9 were also reasonably predicted by the Smagorinsky LES/SRT model, the deviations from the
experimental data were larger, and it underestimated the TKE at 2r/D=1.1 while overestimating it at 2r/D=1.5 and 1.9.

The two LES cases were almost identical to the experimental data in the region away from the blade tips
(2r/D=1.5 and 1.9). Moreover, there was no marked difference in the radial or tangential velocity distributions
between the Smagorinsky/MRT and the coherent/MRT models. The experimental data indicate that the
maximum TKE decreases with increasing radial position. However, the Smagorinsky/MRT model predicted a
larger maximum TKE at 2r/D=1.5, which is not consistent with the experiments. The coherent LES model
reflects this trend, showing the reduction along the radial direction. Overall, the coherent/MRT LES model was
more reliable and gave the best predictions.

Fig. 12. Axial distribution of phase-averaged data at three different radial locations (2r/D=1.1, 1.5, 1.9)

(a) radial velocity, (b) tangential velocity, (c) turbulent kinetic energy (S: Smagorinsky LES; C: Coherent LES).

Fig. 13 gives the resolved and unresolved parts of the TKE predicted by the coherent and Smagorinsky LES
models. Both parts decreased with increasing radial position. The resolved TKE of the Smagorinsky LES model
at 2r/D=1.5 was larger than that at 2r/D=1.1, whereas physically the turbulence intensity at 2r/D=1.1 should be
much larger than that at 2r/D=1.5.
Fig. 14 illustrates the ratio of the resolved TKE to total TKE. The percentage of the resolved TKE of both
the coherent and Smagorinsky LES models increased along the radial direction. The percentage of the resolved
TKE in the high-TKE region was much smaller than that in the lower-TKE region. The coherent LES model
resolved almost 95% of the total TKE in the highly turbulent region at 2r/D=1.1, whereas the Smagorinsky
model only resolved about 85%. The coherent LES model resolved more than 98% of the total TKE at 2r/D=1.5
and 1.9, whereas the Smagorinsky model only resolved about 92%. As suggested by Pope (2001), the grid
resolution of an LES should be fine enough to resolve 80% of the total TKE. With the same grid resolution, the
percentage of resolved TKE predicted by the coherent LES model is about 10 percentage points higher than
that of the Smagorinsky model in this simulation. Hence, more grid nodes are required for the Smagorinsky
model to attain a similar percentage of resolved TKE, and the coherent LES model can therefore be applied on
coarser grid systems.

Fig. 13. Comparison of resolved and unresolved turbulent kinetic energy at three different radial locations

(a) Smagorinsky LES; (b) coherent LES.

Fig. 14. Ratio of resolved turbulent kinetic energy to total turbulent kinetic energy.

6. Conclusions
In this paper, a fast and economical solver implemented on an easily accessible desktop computer with a
single GPU graphics card was developed for stirred tanks. The solver integrates the advantages of LBM,
coherent LES and IBM for parallel computation. Compared to traditional CFD simulations, the GPU-
accelerated LBM simulation on such a simple desktop computer proves to be more practical and efficient for
stirred tank simulations. Even with a grid resolution of 8.8 times the smallest Kolmogorov length scale, the
simulation is still about 1,500-fold faster than the CFD simulation with 16 CPU cores. Superlinear acceleration
was achieved, with a speed-up ratio of more than 60 compared to the LBM simulation using a single Intel
E5520 CPU core. It only took about two minutes to advance the simulation by one impeller revolution with
this GPU-accelerated desktop computation. This enables the study of macro-instability in stirred tanks, which
requires the simulation of several thousands of revolutions. We found that a calculation of at least 4,000
impeller revolutions was needed to achieve a time-independent prediction of the macro-instability frequency.
The requirement of high grid resolution for LES conflicts with the limited GPU memory capacity of a desktop
PC and may hinder its application to industrial-scale problems. We demonstrate that with the coherent LES
model and a reasonable grid resolution, the fast solver was able to resolve more than 95% of the total turbulent
kinetic energy in the highly turbulent region, and reproduced the monotonic decrease of TKE along the radial
direction much better than the Smagorinsky model. The average velocity and TKE predicted by the fast solver
were in better agreement with the experimental data in the literature. Furthermore, this solver can be extended
to the hydrodynamics of complex fluids such as non-Newtonian flows (Xiao et al., 2014) as well. The solver is
therefore well suited to the fast simulation of stirred tanks on a desktop computer without the need for much
finer grid resolution or supercomputers.

Acknowledgements

Financial support from the National Key Research and Development Program of China (2017YFB0602500),
the National Natural Science Foundation of China (Grant No. 91634203) and the Chinese Academy of Sciences
(Grant No. 122111KYSB20150003) is acknowledged.

Nomenclature
Latin
b number of discrete velocity in lattice model
C model parameter
Ccoh model parameter
Ck model parameter for unresolved turbulent kinetic energy
Cs model parameter
cs sound speed in LBM
D diameter of the impeller, mm
F force in Cartesian system
Fm force term in moment space

Fb force exerted on a boundary marker point
Ff feedback force on fluid nodes from marker point
f discrete velocity distribution function vector
f frequency
G Interpolation function
H height of standard Rushton turbine stirred tank
I identity matrix
k total turbulent kinetic energy
m discrete velocity distribution function vector in moment space
m^eq equilibrium moment vector
N rotational speed
nx grid number assigned for x direction
ny grid number assigned for y direction
nz grid number assigned for z direction
Q second invariable of strain rate
Re Reynolds number
S diagonal matrix or collision matrix
T diameter of standard Rushton turbine stirred tank
tcomp computational time
Ud desired or imposed velocity on boundary marker points
Ufluid velocity of boundary marker points interpolated from their adjacent fluid nodes
v0 kinematic viscosity of fluid
vt sub-grid viscosity
Greek letters
Δ filtered width in LES
ε turbulent dissipation rate
η Kolmogorov length scale
θ angle
ρ density of fluid
τ0 non-dimensional relaxation time
Abbreviations
CAD Computer Aided Design
CFD Computational Fluid Dynamics
CPU Central Processing Unit
CUDA Compute Unified Device Architecture

DFT Discrete Fourier Transform
FLOPS Floating Point Operations Per Second
GPU Graphics Processing Unit
IBM Immersed Boundary Method
LBM Lattice Boltzmann Method
LES Large Eddy Simulation
LDA Laser Doppler velocimetry
LN updating time step
MI Macro-Instability
MLUPS Million Lattice Units updates Per Second
MRF Multiple Reference Frame
MRT Multiple Relaxation Time
PSD power spectral density
SIMD single instruction multiple data
Smag Smagorinsky model
SRT Single Relaxation Time
ST stirred tank with a standard Rushton turbine
Subscripts
b number of velocity distribution function
rad radial
tip outer tip of impeller

References

D'Humieres, D., Ginzburg, I., Krafczyk, M., Lallemand, P., Luo, L.S., 2002. Multiple-relaxation-time lattice
Boltzmann models in three dimensions. Philos Trans A. 360, 437-451.
Derksen, J., Van den Akker, H.E.A., 1999. Large eddy simulations on the flow driven by a Rushton turbine.
AIChE J. 45, 209-221.
Derksen, J.J., 2003. Numerical simulation of solids suspension in a stirred tank. AIChE J. 49, 2700-2714.
Derksen, J.J., 2011. Blending of miscible liquids with different densities starting from a stratified state.
Comput. Fluids. 50, 35-45.
Derksen, J.J., 2012. Direct Simulations of Mixing of Liquids with Density and Viscosity Differences. Ind. Eng.
Chem. Res. 51, 6948-6957.
Escudié, R., Liné, A., 2003. Experimental analysis of hydrodynamics in a radially agitated tank. AIChE J. 49,
585-603.
Galletti, C., Paglianti, A., Lee, K.C., Yianneskis, M., 2004. Reynolds number and impeller diameter effects on

instabilities in stirred vessels. AIChE J. 50, 2050-2063.
Gillissen, J.J.J., Van den Akker, H.E.A., 2012. Direct numerical simulation of the turbulent flow in a baffled
tank driven by a Rushton turbine. AIChE J. 58, 3878-3890.
Guo, Z., Zheng, C., Shi, B., 2002. Discrete lattice effects on the forcing term in the lattice Boltzmann method.
Phys. Rev. E. 65, 046308.
Hartmann, H., Derksen, J.J., Montavon, C., Pearson, J., Hamill, I.S., van den Akker, H.E.A., 2004a.
Assessment of large eddy and RANS stirred tank simulations by means of LDA. Chem. Eng. Sci. 59, 2419-
2432.
Hartmann, H., Derksen, J.J., van den Akker, H.E.A., 2004b. Macroinstability uncovered in a Rushton turbine
stirred tank by means of LES. AIChE J. 50, 2383-2393.
Hartmann, H., Derksen, J.J., van den Akker, H.E.A., 2006a. Mixing times in a turbulent stirred tank by means
of LES. AIChE J. 52, 3696-3706.
Hartmann, H., Derksen, J.J., van den Akker, H.E.A., 2006b. Numerical simulation of a dissolution process in a
stirred tank reactor. Chem. Eng. Sci. 61, 3025-3032.
Kang, S.K., Hassan, Y.A., 2011. A comparative study of direct-forcing immersed boundary-lattice Boltzmann
methods for stationary complex boundaries. Int. J. Numer. Meth. Fl. 66, 1132-1158.
Krafczyk, M., Tolke, J., Luo, L.S., 2003. Large-eddy simulations with a multiple-relaxation-time LBE model.
Int. J. Mod. Phys. B. 17, 33-39.
Lallemand, P., Luo, L.S., 2000. Theory of the lattice Boltzmann method: Dispersion, dissipation, isotropy,
Galilean invariance, and stability. Phys. Rev. E. 61, 6546-6562.
Lu, J., Han, H., Shi, B., Guo, Z., 2012. Immersed boundary lattice Boltzmann model based on multiple
relaxation times. Phys. Rev. E. 85, 016711.
Luo, H., Svendsen, H.F., 1996. Theoretical model for drop and bubble breakup in turbulent dispersions. AIChE
J. 42, 1225-1233.
Peskin, C.S., 2003. The immersed boundary method. Acta Numerica 11.
Qian, Y.H., D'Humieres, D., Lallemand, P., 1992. Lattice BGK model for Navier-Stokes equation. Europhys.
Lett. 17, 479-484.
Roussinova, V., Kresta, S.M., Weetman, R., 2003. Low frequency macroinstabilities in a stirred tank: scale-up
and prediction based on large eddy simulations. Chem. Eng. Sci. 58, 2297-2311.
Roussinova, V.T., Grgic, B., Kresta, S.M., 2000. Study of Macro-Instabilities in Stirred Tanks Using a Velocity
Decomposition Technique. Chem. Eng. Res. Des. 78, 1040-1052.
Shu, S., Yang, N., 2013. Direct Numerical Simulation of Bubble Dynamics Using Phase-Field Model and
Lattice Boltzmann Method. Ind. Eng. Chem. Res. 52, 11391-11403.
Shu, S., Yang, N., 2017. Numerical study and acceleration of LBM-RANS simulation of Turbulent Flow.
Accepted by Chin. J. Chem. Eng.
Tennekes, H., Lumley, J.L., 1972. A first course in turbulence. MIT press.

Tolke, J., Krafczyk, M., 2008. TeraFLOP computing on a desktop PC with GPUs for 3D CFD. Int. J. Comput.
Fluid D. 22, 443-456.
Wang, T., Wang, J., Jin, Y., 2003. A novel theoretical breakup kernel function for bubbles/droplets in a
turbulent flow. Chem. Eng. Sci. 58, 4629-4637.
Xian, W., Takayuki, A., 2011. Multi-GPU performance of incompressible flow computation by lattice
Boltzmann method on GPU cluster. Parallel Comput. 37, 521-535.
Xiao, Q., Yang, N., Zhu, J.H., Guo, L.J., 2014. Modeling of Cavern Formation in Yield Stress Fluids in Stirred
Tanks. AIChE J. 60, 3057-3070.
Yeoh, S.L., Papadakis, G., Yianneskis, M., 2004. Numerical simulation of turbulent flow characteristics in a
stirred vessel using the LES and RANS approaches with the sliding/deforming mesh methodology. Chem. Eng.
Res. Des. 82, 834-848.
Yeoh, S.L., Papadakis, G., Yianneskis, M., 2005. Determination of mixing time and degree of homogeneity in
stirred vessels with large eddy simulation. Chem. Eng. Sci. 60, 2293-2302.
Zhang, Y.J., 2012. Image-based geometric modeling and mesh generation. Springer Science & Business Media.

Appendix
The discrete velocity vector e_i in the D3Q19 model is (in the column ordering used throughout this Appendix)

$$\mathbf e_i=\begin{pmatrix}
0&1&-1&0&0&0&0&1&-1&1&-1&1&-1&1&-1&0&0&0&0\\
0&0&0&1&-1&0&0&1&1&-1&-1&0&0&0&0&1&-1&1&-1\\
0&0&0&0&0&1&-1&0&0&0&0&1&1&-1&-1&1&1&-1&-1
\end{pmatrix}$$

The discrete velocity distribution functions in moment space for the D3Q19 model are computed from the
discrete velocity distribution functions,

$$\mathbf m=\left(\delta\rho,\,e,\,\varepsilon,\,j_x,\,q_x,\,j_y,\,q_y,\,j_z,\,q_z,\,3p_{xx},\,3\pi_{xx},\,p_{ww},\,\pi_{ww},\,p_{xy},\,p_{yz},\,p_{xz},\,m_x,\,m_y,\,m_z\right)$$

The equilibrium moments m^eq for the D3Q19 model are computed from the macroscopic variables (D'Humieres et al.,
2002),

$$\mathbf m^{eq}=\left(\delta\rho,\,e^{eq},\,\varepsilon^{eq},\,\rho u_x,\,-\tfrac{2}{3}\rho u_x,\,\rho u_y,\,-\tfrac{2}{3}\rho u_y,\,\rho u_z,\,-\tfrac{2}{3}\rho u_z,\,2u_x^{2}-u_y^{2}-u_z^{2},\,0,\,u_y^{2}-u_z^{2},\,0,\,u_xu_y,\,u_yu_z,\,u_xu_z,\,0,\,0,\,0\right)$$

with

$$e^{eq}=-11\,\delta\rho+19\left(u_x^{2}+u_y^{2}+u_z^{2}\right),\qquad \varepsilon^{eq}=-\tfrac{475}{63}\left(u_x^{2}+u_y^{2}+u_z^{2}\right)$$
The force term in moment space F_m for the D3Q19 model is

$$\mathbf F_m=\Big(0,\;38\,\mathbf F\!\cdot\!\mathbf u,\;0,\;F_x,\;-\tfrac{2}{3}F_x,\;F_y,\;-\tfrac{2}{3}F_y,\;F_z,\;-\tfrac{2}{3}F_z,\;6F_xu_x-2\,\mathbf F\!\cdot\!\mathbf u,\;\mathbf F\!\cdot\!\mathbf u-3F_xu_x,$$
$$2F_yu_y-2F_zu_z,\;F_zu_z-F_yu_y,\;F_xu_y+F_yu_x,\;F_zu_y+F_yu_z,\;F_zu_x+F_xu_z,\;0,\;0,\;0\Big)$$

The transformation matrix M_ij for the D3Q19 model is

$$M=\left(\begin{array}{rrrrrrrrrrrrrrrrrrr}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
-30&-11&-11&-11&-11&-11&-11&8&8&8&8&8&8&8&8&8&8&8&8\\
12&-4&-4&-4&-4&-4&-4&1&1&1&1&1&1&1&1&1&1&1&1\\
0&1&-1&0&0&0&0&1&-1&1&-1&1&-1&1&-1&0&0&0&0\\
0&-4&4&0&0&0&0&1&-1&1&-1&1&-1&1&-1&0&0&0&0\\
0&0&0&1&-1&0&0&1&1&-1&-1&0&0&0&0&1&-1&1&-1\\
0&0&0&-4&4&0&0&1&1&-1&-1&0&0&0&0&1&-1&1&-1\\
0&0&0&0&0&1&-1&0&0&0&0&1&1&-1&-1&1&1&-1&-1\\
0&0&0&0&0&-4&4&0&0&0&0&1&1&-1&-1&1&1&-1&-1\\
0&2&2&-1&-1&-1&-1&1&1&1&1&1&1&1&1&-2&-2&-2&-2\\
0&-4&-4&2&2&2&2&1&1&1&1&1&1&1&1&-2&-2&-2&-2\\
0&0&0&1&1&-1&-1&1&1&1&1&-1&-1&-1&-1&0&0&0&0\\
0&0&0&-2&-2&2&2&1&1&1&1&-1&-1&-1&-1&0&0&0&0\\
0&0&0&0&0&0&0&1&-1&-1&1&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&1&-1&-1&1\\
0&0&0&0&0&0&0&0&0&0&0&1&-1&-1&1&0&0&0&0\\
0&0&0&0&0&0&0&1&-1&1&-1&-1&1&-1&1&0&0&0&0\\
0&0&0&0&0&0&0&-1&-1&1&1&0&0&0&0&1&-1&1&-1\\
0&0&0&0&0&0&0&0&0&0&0&1&1&-1&-1&-1&-1&1&1
\end{array}\right)$$

Q_ij in the D3Q19 MRT model is calculated from the following relations:

$$P_{xx}=\tfrac{1}{57}\left(30\,\delta\rho+e\right)+p_{xx}$$

$$P_{yy}=P_{xx}-1.5\,p_{xx}+0.5\,p_{ww}$$

$$P_{zz}=P_{yy}-p_{ww}$$

$$P_{xy}=p_{xy},\qquad P_{yz}=p_{yz},\qquad P_{xz}=p_{xz}$$

$$Q_{mn}=P_{mn}-\left(\tfrac{1}{3}\delta\rho\,\delta_{mn}+j_m j_n\right)$$
