
Computers and Mathematics with Applications 79 (2020) 66–87


An efficient algorithm for the calculation of sub-grid distances for higher-order LBM boundary conditions in a GPU simulation environment
D. Mierke *, C.F. Janßen, T. Rung
Institute for Fluid Dynamics and Ship Theory, Hamburg University of Technology, Am Schwarzenberg-Campus 4, 21073 Hamburg,
Germany

Article info
Article history: Available online 16 August 2018
Keywords: LBM; Grid generation; Sub-grid distances; Linear-interpolated bounce-back; Graphics processing unit (GPU)

Abstract
This paper presents a new and efficient algorithm for the calculation of sub-grid distances in the context of a lattice Boltzmann method (LBM). LBMs usually operate on equidistant Cartesian grids and represent moving geometries by either using immersed boundary conditions or dynamic fill algorithms in combination with slip or no-slip boundary conditions. In order to obtain sufficiently high geometric accuracy, the sub-grid distances from Eulerian fluid nodes on uniform and structured grids to a tessellated triangular surface mesh have to be calculated. The proposed algorithm extends a previously published grid generation procedure by an efficient calculation of sub-grid distances. The algorithm is optimized for massively parallel execution on graphics processing units (GPUs). Based on a linearized representation of the obstacle surface, surface normal vectors are computed and stored, which then serve to compute the sub-grid distances. This saves GPU memory, re-uses information that is available from the surface voxelization step, and has shown to be very accurate and efficient for the implementation in a state-of-the-art LBM-GPU solver.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

In computational fluid dynamics, efficient numerical approaches based on accurate and robust modeling of viscous flows are in high demand. In this context, the lattice Boltzmann method (LBM) has become a valuable alternative to conventional approaches for solving a variety of complex flow problems. The LBM offers performance-related advantages in terms of data locality and parallel computing. When dealing with solid bodies in the flow, boundary conditions are an essential part of the computational technique. Since the method typically operates on equidistant Cartesian grids, high-order solid boundary conditions rely on sub-grid distances from the wall-adjacent lattice nodes to the obstacle surface. Consequently, a fast and accurate approach is necessary for their determination. This holds in particular for moving obstacle boundaries, where the distances change in every time step and a re-mapping is necessary.
In this contribution, we present a novel and accurate approach to determine the sub-grid distances efficiently. The new
method is advantageous for three main reasons:

1. The method is based on a very efficient, thread-parallel surface voxelization technique [1]. The base method was found
to be very suitable for a GPU implementation, as it scales well with the available number of threads. The suggested
method uses a similar parallelization pattern and, hence, benefits similarly from massively-parallel architectures.


Fig. 1. The linear-interpolated bounce-back (LIBB) boundary condition [3].

2. The method for calculating sub-grid distances re-uses information that is already available from the grid generation step (surface voxelization). The sub-grid distances are reconstructed from the normal distance vector information available at the wall-adjacent lattice nodes. This leads to a high computational efficiency of the full algorithm, which does not significantly slow down the overall CFD computations.
3. Additional memory-saving simplifications are tested. Exploiting the fact that the normal distance vectors only take values in a specific range, more suitable data types with a reduced codomain can be used. This way, the memory consumption can be further decreased by a factor of up to 4 without significantly reducing the accuracy or the computational performance.

After a quick presentation of the basics of the LBM and the resulting requirements for efficient grid generation techniques
in Section 2, the new algorithm for the calculation of LBM sub-grid distances is presented in Section 3. A detailed convergence
study with state-of-the-art validation cases in Section 4 shows that the proposed numerical method is able to reproduce
accurate results in a very competitive computational time. Finally, selected applications demonstrate that the method is a
viable tool for fast and efficient simulations of complex fluid–structure interactions, see Section 5.

2. Scope of the work

The newly proposed grid generation method is designed for lattice Boltzmann based solvers that solve the lattice
Boltzmann equation (LBE)

$$
f_i(t + \Delta t,\ \mathbf{x} + \Delta t\, \mathbf{e}_i) - f_i(t, \mathbf{x}) = \Omega_i \qquad (1)
$$

on equidistant Cartesian grids with grid spacing ∆x, time step size ∆t and a set of particle velocities ei with the corresponding
particle distribution function (PDF) fi [2]. The left hand side of the LBE contains the transient change of PDFs and the PDF
advection, while the discrete collision operator Ωi on the right hand side models the particle interactions. The velocity
discretization reduces the original continuous BE, which is a complex integro-differential equation in space, time and particle
velocity space, to a set of discrete Boltzmann equations, which merely have to be discretized in space and time. If the lattice
constant c is chosen consistent with grid spacing and time step size, c = ∆x/∆t, the discretized Boltzmann equation is
reduced to the simplified form as presented above in Eq. (1). This implies that a particle (or a PDF, respectively) is advected
exactly onto one of its next neighbors during a discrete time step.
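To make this advection property concrete, the following minimal sketch (plain Python/NumPy, not part of the elbe solver; all names are illustrative) streams a D2Q9 population array so that each PDF lands exactly on its next neighbor, assuming periodic boundaries for brevity:

```python
import numpy as np

# D2Q9 particle velocities e_i in lattice units (c = dx/dt = 1)
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])

def stream(f):
    """Advection step of Eq. (1): shift each f_i by its velocity e_i.

    f has shape (9, nx, ny); periodic shifts stand in for boundary handling."""
    return np.stack([np.roll(f[i], shift=(E[i, 0], E[i, 1]), axis=(0, 1))
                     for i in range(9)])

f_post_collision = np.random.rand(9, 16, 16)   # dummy post-collision PDFs
f_streamed = stream(f_post_collision)          # each PDF moved to one neighbor
```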
The algorithm consists of a collision and a subsequent advection step. This procedure implies that right after the advection
step, particle distribution functions are missing near the boundaries of the computational domain and boundary conditions
have to be applied. In analogy to conventional Navier–Stokes solvers, various different types of boundary conditions are
available. No-slip boundaries are often modeled by a simple bounce-back scheme (SBB) [3], where the PDFs that hit the
obstacle surface during propagation are bounced-back into the computational domain, fI = fi with I as the opposite
direction to i. However, this corresponds to a fluid–solid interface that is approximately half-way between the two nodes
under consideration, which is why SBB typically is referred to as half-way bounce-back. Apart from a viscosity-dependent
approximation error, the no-slip wall is found to be in the middle of two lattice nodes. For complex flow problems with
arbitrarily shaped boundaries, such a strong simplification is not admissible at all, as it would severely reduce the quality
of the resulting flow fields. To overcome this problem, a linear-interpolated bounce-back (LIBB) boundary condition was
proposed [3] and is shown in Fig. 1.

Here, the PDFs are interpolated, depending on the actual sub-grid distance q of the wall-near lattice node to the obstacle
surface:
$$
f_{I,A}^{\,t+1} =
\begin{cases}
(1 - 2q)\, f_{i,A}^{\,t} + 2q\, f_{i,S}^{\,t} - 6\rho w_i \dfrac{\mathbf{e}_i \cdot \mathbf{v}_\parallel}{c^2}, & q \le 0.5 \\[10pt]
\dfrac{2q - 1}{2q}\, f_{I,E}^{\,t} + \dfrac{1}{2q}\, f_{i,S}^{\,t} - 3\rho w_i \dfrac{\mathbf{e}_i \cdot \mathbf{v}_\parallel}{q\, c^2}, & q > 0.5
\end{cases}
\qquad (2)
$$
with the sub-grid distance
$$
q = \frac{\overline{AB}}{\overline{AS}} \quad \text{with} \quad q \in [0, 1), \qquad (3)
$$
the velocity of the obstacle surface $\mathbf{v}_\parallel$ and the lattice weighting factors $w_i$.
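For illustration, a minimal Python sketch of the interpolation rule in Eq. (2) for a single boundary link is given below. The argument names (post-collision PDFs f_i_A and f_i_S at the fluid node A and the solid node S, the reflected PDF f_I_E at the node E referenced in Eq. (2), density rho, weight w_i, velocity set entry e_i and wall velocity v_wall) are assumptions made for this example and not identifiers of the elbe solver:

```python
import numpy as np

def libb_missing_pdf(q, f_i_A, f_i_S, f_I_E, rho, w_i, e_i, v_wall, c=1.0):
    """Return the missing PDF f_{I,A}^{t+1} according to Eq. (2)."""
    ev = float(np.dot(e_i, v_wall))                     # e_i . v_parallel
    if q <= 0.5:
        return (1.0 - 2.0 * q) * f_i_A + 2.0 * q * f_i_S \
               - 6.0 * rho * w_i * ev / c**2
    return (2.0 * q - 1.0) / (2.0 * q) * f_I_E + f_i_S / (2.0 * q) \
           - 3.0 * rho * w_i * ev / (q * c**2)
```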
LIBB schemes have successfully been applied in many two- and three-dimensional LBM simulations, including rather
complex engineering applications with geometries of arbitrary shapes [4]. For conventional simulations, the sub-grid
distances q are computed once, during the preprocessing stage of the simulation. In case of a fluid–structure interaction
problem, where the geometries potentially move, the sub-grid distances q have to be recomputed efficiently at every time
step.

2.1. LBM grid generation

Grid generation in the context of the LBM essentially is related to the voxelization of triangular meshes on Eulerian grids.
Many of these methods have been proposed in recent years, both in the field of LBM and the field of computer graphics, as the
underlying technique (voxelization) is – by nature – related to rendering purposes. Two classes of voxelization techniques
can be identified. Solid voxelization methods identify all the interior voxels of a solid body, as e.g. reviewed in detail by Liao
[5]. Solid voxelization, however, is not in the scope of this work, as we are interested in the identification of boundary
nodes only. In the case where only the surface of an object is voxelized, surface voxelization, the algorithmic challenge is to
identify as many voxels as possible to obtain a closed surface, but at the same time mark as few voxels as possible to save
computational resources. The fundamentals of surface voxelization techniques have been discussed in e.g. Cohen-Or and
Kaufman [6] and Huang et al. [7]. A topological approach to surface voxelization can be found in Laine [8].
Existing LBM grid generators seem to be quite independent of the research that took place in the field of graphics
processing and visualization, mostly due to historical reasons. Serial and parallel implementations of voxelization methods
of various kinds on both CPU and GPU architectures can be found. Following Freudiger [9] and Geller [4], three distinct
methods for the LBM grid generation around complex three-dimensional geometries can be identified:

- Half-plane based algorithms, that are based on the fact that for convex geometries all edges between arbitrary pairs of
vertices are located inside the geometry. One of the main drawbacks of this method consequently is the limitation to
convex polyhedra [10].
- The solid-angle algorithm, that sums up the relative angles from the point under consideration to each polyhedron vertex. The result indicates whether the point is inside or outside the polyhedron, see Haines [11] (2D) and Carvalho and Cavalcanti [12] (3D).
- Ray crossing algorithms, that emit a number of rays from the point under consideration and count the ray intersections
with the surface of the polyhedron [13–15].

All the previously mentioned techniques focus on solid voxelization. Every grid node is tested individually and the
wall-near nodes are not calculated explicitly. For an increase in efficiency, these three algorithms can be combined with
space-partitioning schemes based on octree or k-d tree data structures. Alternatively, surface voxelization techniques can
be applied. Janßen et al. [1] recently presented a very efficient surface voxelization algorithm, that was designed for massively
parallel processors such as GPUs. The method identifies the first layer of Eulerian lattice nodes inside a solid obstacle that
is described by a triangulated surface mesh, as depicted in Fig. 2. In the case where a solid voxelization is required, an
additional fill algorithm is applied to fill the inside. All grid nodes inside the axis-aligned bounding box of each discrete
segment of the geometry are processed by one computational thread on the GPU. Thanks to a unique dissection of the inner
body shell, the algorithm can be executed in a thread-parallel environment and, on top, benefits from the high performance
of GPU devices. The authors of [1] report durations for the grid update that are below the wall-clock time of a single LBM
time step. Hence, the method is well-suited for the application in fluid–structure interaction problems and serves as a basis
of the higher-order additions that are developed in the scope of this work.

2.2. Methods to calculate and store the normalized sub-grid distance

The algorithms for the calculation of sub-grid distances are closely related to the procedures that are used for the actual
grid generation process. Ray crossing algorithms, that are conceptually based on clipping rays, allow for the calculation of q
on-the-fly: simply calculating the length of a ray that is intersecting the geometry yields the sub-grid distance q. Half-plane
based or solid-angle algorithms need an additional reconstruction step, as they cannot provide the distance information for

(a) The discrete obstacle geometry to be mapped to the grid. (b) Determine first inner and outer node layer along each discrete geometry segment. (c) Generate an axis aligned grid bounding box for each discrete geometry segment. (d) Process only the grid nodes inside the intersecting area ∩ by one thread each. (e) Resulting grid. Apply flood fill algorithm inside the geometry (optional). (f) Legend.

Fig. 2. Two-dimensional example of the grid generator by Janßen et al. [1]. The green area marks the first node layer along the discrete segment of a
geometry. Only the first closed inner node layer of the geometry is marked as obstacle (red squares). The boundary condition is processed on the first
closed outer node layer (blue triangles) of the geometry. (For interpretation of the references to color in this figure legend, the reader is referred to the web
version of this article.)

free. The recently proposed grid generator [1] that is the basis for this work needs to be extended to calculate the distances
as well.
When developing a new algorithm for a high-performance computing implementation, attention has to be given to the
hardware details of the actual computational device as well. The algorithm has to take care of capacity and bandwidth of the
corresponding memory architecture to harness the full power of the computational device. Furthermore, when using SIMT
architectures, if conditions in the key parts of an algorithm should be avoided, as they introduce instruction branches which
can significantly slow down the computation [16].
From an algorithmic point of view, three different options for calculating q exist:

1. Calculation of q on-the-fly, when applying Eq. (2), without storing them explicitly. This method is rather inefficient,
since each computational thread has to deal with both the determination of q and the interpolation of the missing
particle distribution functions f afterwards. In particular, the determination of q is strongly memory bounded, if no
search tree technique (like a k-d tree for the vertices of the geometry) is used to identify the correct geometry segment
that is assigned to the node under consideration. Moreover, information that may be available in the grid generation
step cannot be re-used and may have to be calculated twice. The advantage of this method is that no additional device memory is required, since the q are generated locally.
2. Generate q during the grid generation process and store them in extra memory. Since it is not possible to know
the required memory size (i.e. the number of q per specific lattice node) in advance, a flexible storage solution is
required. In a CPU environment, data types of varying lengths can be used. On a GPU, however, no additional memory
can be allocated permanently by a computational thread, so that the maximum theoretically required memory per
node would have to be allocated. Nevertheless, the method is more efficient than the previous one, because many
intermediate results are already available during grid generation and can be reused. This concept has been successfully
applied in various CPU-based LBM frameworks, e.g. VirtualFluids [4,9].
3. Generate q during the grid generation process and store them in allocated, but temporarily unused memory. This
technique was recently suggested by Obrecht et al. [17] to overcome the issues of the previously presented technique.
Instead of allocating extra memory for the q, the sub-grid information is stored at the wall-adjacent grid nodes
inside the obstacle. As these nodes do not carry out collision and propagation steps, the memory that was initially
allocated for storing the particle distribution functions f can be used (see also Fig. 3a). This technique was successfully
implemented and tested in a GPU context [17]. One advantage is that no additional memory is required. Nevertheless,
if the LBM-propagation step is implemented as a push scheme in contrast to the pull scheme in [17], certain

(a) Conflict-free storing of the q at the obstacle grid node in reversed direction. (b) Conflict-ridden storing of the q at the obstacle grid node due to a race condition. Values of q and f are stored to the same storage.

Fig. 3. Option 3 in action, with a push propagation scheme that stores the normalized sub-grid distances q at the same memory location as the unused
particle distribution functions f at the obstacle nodes. During one time step, the f (black arrows) are propagated to the neighboring nodes, while the q
(blue arrows) are stored inside the unused storage of the f at the obstacle grid nodes. This can lead to a race-condition and is not conflict-free for specific
geometry constellations. For a propagation step based on a pull scheme, this race-condition does not occur, see Obrecht et al. [17]. (For interpretation of
the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Reconstruction of the normalized sub-grid distance $q = \lVert \mathbf{p}_q - \mathbf{p}_A \rVert / \lVert \mathbf{p}_S - \mathbf{p}_A \rVert$. The position of the link point $\mathbf{p}_q$ can be calculated depending on $\mathbf{r}_1$, $\mathbf{r}_2$ and $\mathbf{e}_i$ as shown later. The obstacle distance vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ are located outside respectively inside the obstacle geometry at the grid nodes $\mathbf{p}_A$ resp. $\mathbf{p}_S$ and are orthogonally pointing to the nearest obstacle geometry segment.

configurations may lead to memory race-conditions. This can happen if a particle distribution function f as well as a
sub-grid distance q are to be stored at the same memory location (Fig. 3b). This does not occur for rather academic
test geometries, but turns into a real problem for complex, arbitrarily shaped geometries. Moreover, in the case where
information on the wall distance or the wall normal direction are required, several non-local read operations are
needed to gather the q information to reconstruct the normal vectors, independent of the propagation scheme.

None of the three methods is well suited for our target applications: complex fluid–structure interaction simulations with multiple, potentially colliding rigid bodies in a high-performance computing environment such as GPUs.

3. Details of the proposed method

The new method is based on a piecewise linear approximation of the geometry, either based on polygon segments (2D)
or planar surface triangles (3D). The sub-grid distances q are reconstructed using the orthogonal obstacle distance vectors
r1 and r2 at the first outer and inner grid nodes denoted as pA and pS in Fig. 4. The obstacle distance vectors ri have to point
to the nearest segment of the geometry to discretize the geometric shape with second-order accuracy. This allows a unique
reconstruction of the sub-grid distances q, without having to access the full geometry information again and at comparably
low additional cost.
Moreover, in comparison to methods that store the full set of q in extra memory, the additional memory requirements
only depend on the grid dimension m instead of the number of non-zero link directions n − 1 for a DmQn-LBM model.
For the D3Q27-LBM model, e.g., the additional number of floating point variables per node decreases from up to 26 to just
about 3. In the same way, the utilized memory bandwidth is reduced considerably. Furthermore, the computational cost to
reconstruct the q is significantly lower than to calculate them from scratch, as the information on ri is available from the
surface voxelization step anyway. Additionally, the normal vector information at the wall-adjacent nodes can be re-used
as well, e.g. for simulations involving slip boundary conditions, additional free-surface capturing models, turbulent wall
functions or RANSE-based turbulence models, that all rely on the wall distance information or the wall normal vectors.

Fig. 5. Triangle of a triangulated surface mesh, consisting of its vertex points $\mathbf{p}_i$, edges $\mathbf{v}_i$, the unit normal $\mathbf{n}$ and the orthogonal vector $\mathbf{v}_P$ pointing to a grid node, as well as its unit edge and unit vertex normal vectors $\mathbf{h}_i$ and $\mathbf{k}_i$, respectively.

In the following two subsections, the surface voxelizer by [1] is briefly reviewed and the necessary extensions to overcome
previous limitations of the method are presented, before explaining the actual calculation of the sub-grid distances q. In
Section 3.4 the limits of the method are discussed, which are then quantitatively examined in Section 4.

3.1. Step 1: surface voxelization and generation of the orthogonal distance vectors r

To reconstruct the sub-grid distances q, orthogonal distance vectors r1 and r2 are required. These vectors describe the
characteristic shape of the discrete obstacle geometry and can be generated by the employed grid generator on-the-fly.
Note that the calculation of the r1 and r2 for the first inner and the outer closed grid node layer of the obstacle geometry
is sufficient (red squares and blue dots in Fig. 2). Since it is irrelevant whether a grid node is present inside or outside the
geometry, the previously introduced index notation will be neglected in the following. Important to the proposed method
is that the obstacle distance vectors r always describe the shortest distance to the geometry by pointing orthogonally to
the closest discrete segment of the geometry. Moreover, it is possible to handle all geometry segments independently by
processing all grid nodes inside the bounding box of each segment by one computational thread.

3.1.1. Basics
Consider a triangle as a planar segment of a three-dimensional discrete geometry, consisting of its vertex points $\mathbf{p}_1$, $\mathbf{p}_2$ and $\mathbf{p}_3$, its edge vectors $\mathbf{v}_1 = \mathbf{p}_2 - \mathbf{p}_1$, $\mathbf{v}_2 = \mathbf{p}_3 - \mathbf{p}_2$ and $\mathbf{v}_3 = \mathbf{p}_1 - \mathbf{p}_3$ and its outward-pointing unit normal $\mathbf{n} = (\mathbf{v}_1 \times \mathbf{v}_2)/\lVert \mathbf{v}_1 \times \mathbf{v}_2 \rVert$, see Fig. 5.
The vector $\mathbf{v}_P$, pointing orthogonally from the triangle to an arbitrary grid node point $\mathbf{p}_P$, is calculated by projection
$$
\mathbf{v}_P = \frac{\mathbf{v}_{1P} \cdot \mathbf{n}}{\mathbf{n} \cdot \mathbf{n}}\, \mathbf{n} = (\mathbf{v}_{1P} \cdot \mathbf{n})\, \mathbf{n}, \qquad (4)
$$
where $\mathbf{v}_{1P}$ points from the vertex point $\mathbf{p}_1$ to $\mathbf{p}_P$:
$$
\mathbf{v}_{iP} = \mathbf{p}_P - \mathbf{p}_i \quad \forall\, i \in \{1, 2, 3\}. \qquad (5)
$$


Subsequently, the orthogonal distance vector is determined and normalized by
$$
\mathbf{r} = -\frac{1}{\Delta x}\, \mathbf{v}_P. \qquad (6)
$$
Note that, by definition, the obstacle distance vectors $\mathbf{r}$ point to the nearest geometry segment, so that their absolute magnitude is bounded. In two-dimensional space, the maximum possible length of the vector $\mathbf{r}$ can be calculated with $r_{max}(\phi) = \sqrt{|\sin(2\phi)| + 1} \le \sqrt{D}$, where $D$ describes the dimension, see Fig. 6. A more general formulation, that is applicable in 3D as well, can be developed with the approach by [1]:
$$
r_{max} = |\hat{r}_x| + |\hat{r}_y| + |\hat{r}_z| \le \sqrt{D}, \quad \text{with} \quad \hat{\mathbf{r}} = \frac{\mathbf{r}}{\lVert \mathbf{r} \rVert}. \qquad (7)
$$
As discussed in [1], the thickness $\Delta$ of the first wall-adjacent layer of nodes is defined as
$$
\Delta = \Delta x\, (|n_x| + |n_y| + |n_z|). \qquad (8)
$$
Subsequently, it is possible to determine whether a grid node is located inside the first node layer of a geometry segment (green area in Fig. 2). If the orthogonal distance $s = \Delta x\, \lVert \mathbf{r} \rVert = |\mathbf{v}_{1P} \cdot \mathbf{n}|$ is smaller than the layer thickness,
$$
0 \le s < \Delta, \qquad (9)
$$
the lattice node is inside the first layer and has to be marked.
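A compact sketch of Eqs. (4) to (9) for a single triangle and grid node is given below (plain Python/NumPy, not the GPU kernel; the function and variable names are chosen for this example only):

```python
import numpy as np

def obstacle_distance_vector(p1, p2, p3, p_P, dx):
    """Orthogonal distance vector r of Eq. (6) and first-layer test of Eq. (9)."""
    v1, v2 = p2 - p1, p3 - p2
    n = np.cross(v1, v2)
    n /= np.linalg.norm(n)                 # outward unit normal (Fig. 5)
    v1P = p_P - p1                         # Eq. (5) for i = 1
    vP = np.dot(v1P, n) * n                # Eq. (4): projection onto n
    r = -vP / dx                           # Eq. (6)
    s = abs(np.dot(v1P, n))                # orthogonal distance to the plane
    thickness = dx * np.sum(np.abs(n))     # Eq. (8)
    in_first_layer = 0.0 <= s < thickness  # Eq. (9)
    return r, in_first_layer
```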

Fig. 6. Two-dimensional illustration of the range of values of all valid obstacle distance vectors r which point orthogonally to the nearest geometry segment
to form a first node layer.

Fig. 7. Two-dimensional illustration of the unit edge normal vector $\mathbf{h}_i$ of two neighboring triangles, which share the edge vector $\mathbf{v}_i$ (pointing into the paper plane). The separation plane $H_i$ (dashed) is spanned by $\mathbf{h}_i$ and $\mathbf{v}_i$. It allows one to determine which geometry segment is the nearest to a given grid node and, therefore, to which segment the obstacle distance vectors $\mathbf{r}$ (red and blue, respectively) have to point. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.1.2. Required extensions


While the general grid generation procedure was explained in detail by Janßen et al. [1], special care has to be taken at the boundaries of the geometry segments to determine unambiguously which segment is the closest to a given grid node, in order to achieve a second order accurate discretization of the geometrical shape. Since the grid nodes inside the axis aligned bounding box of a geometry segment are processed in parallel by one thread each, race conditions may occur at grid nodes of overlapping bounding boxes near the boundaries of adjacent geometry segments. These race conditions primarily result in obstacle distance vectors which do not point to the nearest geometry segment – leading to an inaccurate geometry discretization – and, secondly, in unrepeatable results. Both must be avoided, which is achieved with additional conditions based on separation planes at the edges respectively the vertices of the triangles that clarify the responsibility of the computational threads in a geometrical manner. The two additional conditions are discussed in the following.
1. For each triangle edge, a separation plane Hi is constructed that is shared with an adjacent neighboring triangle. Such a
plane is defined by Hi = { x = pi + v vi + h hi | x ∈ R3 , v, h ∈ R } with the shared triangle edge vi and the edge unit normal
vector hi facing towards the outside of the geometry, see Fig. 7 for a two-dimensional illustration:
$$
\mathbf{h}_i = \frac{\mathbf{n} + \mathbf{n}_{N_i}}{\lVert \mathbf{n} + \mathbf{n}_{N_i} \rVert}, \qquad (10)
$$
with nNi as the unit normal vector of the adjacent triangle that shares the edge vi . For clarity, the plane normal vector nHi has
to face towards the focused triangle:

nHi = hi × vi . (11)

A grid point pP is closer to the focused triangle than to the neighboring one, if each of its relative vectors viP is facing the
same side of the separation plane Hi as its normal vector nHi :

viP · nHi ≥ 0 ∀ i ∈ {1, 2, 3} . (12)

A two-dimensional example of the resulting obstacle distance vectors is shown in Fig. 7.


2. Similarly, but only required in three-dimensional space, further conditions have to be met based on a separation plane
located at each vertex pi of the triangle. This separation plane Ki = { x = pi + k ki + v (ki × n) | x ∈ R3 , k, v ∈ R } is spanned

Fig. 8. Reconstruction of the normalized sub-grid distance $q = (\mathbf{v}_q \cdot \mathbf{e}_i)/(\mathbf{e}_i \cdot \mathbf{e}_i)$ by using orthogonal obstacle distance vectors $\mathbf{r}_1$ and $\mathbf{r}_2$. The vectors are located outside and inside the obstacle geometry at the grid nodes $\mathbf{p}_A$ respectively $\mathbf{p}_S$ and are orthogonally pointing to the nearest obstacle geometry segment. Here: $\mathbf{e}_i = \frac{1}{\Delta x}(\mathbf{p}_S - \mathbf{p}_A)$ cuts the segment 1 at the link point $\mathbf{p}_q$, since $\mathbf{n}_B \cdot \mathbf{r}_B \ge 0$ is valid.

by the unit vertex normal vector ki and the vector ki × n, while the normal vector of the plane

nKi = (ki × n) × ki = n − (ki · n) ki (13)


faces the focused triangle, see Fig. 5. The unit vertex normal vector ki is determined by the weighted sum of the unit normal
vectors nω of the set Ω of all triangles sharing the same vertex pi :
$$
\mathbf{k}_i = \frac{\hat{\mathbf{k}}_i}{\lVert \hat{\mathbf{k}}_i \rVert} \quad \text{with} \quad \hat{\mathbf{k}}_i = \sum_{\omega \in \Omega} \alpha_\omega\, \mathbf{n}_\omega. \qquad (14)
$$

The weighting factor αω marks the opening angle of the triangle ω spanned by the vectors uω and wω at the vertex pi :
$$
\alpha_\omega = \sphericalangle(\mathbf{u}_\omega, \mathbf{w}_\omega) = \arcsin \frac{\lVert \mathbf{u}_\omega \times \mathbf{w}_\omega \rVert}{\lVert \mathbf{u}_\omega \rVert\, \lVert \mathbf{w}_\omega \rVert}. \qquad (15)
$$
A grid point pP is closer to the focused triangle than to any of the neighboring ones, if each of its relative vectors viP is facing
the same side of the separation plane Ki as its normal vector nKi :

viP · nKi ≥ 0 ∀ i ∈ {1, 2, 3} . (16)


In summary, to avoid race conditions and to obtain a second order accurate discretization of the geometrical shape, the
obstacle distance vector r of a grid point pP has to point orthogonally to the surface of the focused triangle if the conditions
in (9) and (12) as well as (16) are all valid. The computational cost can be reduced, if the plane normal vectors nHi and nKi are
precomputed once for each triangle. Since both normal vectors locally stick to each triangle, a recalculation or transformation
is only required if the triangle rotates in time. This allows a highly efficient and unambiguous generation of the orthogonal
distance vectors r for arbitrary grid points pP at moderate computational effort.
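As an illustration of how the two additional conditions can be evaluated per grid node, the following sketch assumes that the plane normals of Eqs. (11) and (13) have already been precomputed for the focused triangle and only checks the inequalities (12) and (16); it is a plain Python/NumPy sketch with illustrative names, not the actual GPU kernel:

```python
import numpy as np

def is_responsible_triangle(p, n_H, n_K, p_P):
    """True if the focused triangle is the nearest segment for grid node p_P.

    p    : list of the three vertex points p_i
    n_H  : list of the three edge separation-plane normals, Eq. (11)
    n_K  : list of the three vertex separation-plane normals, Eq. (13)
    """
    for i in range(3):
        v_iP = p_P - p[i]                        # relative vector, Eq. (5)
        if np.dot(v_iP, n_H[i]) < 0.0:           # edge condition, Eq. (12)
            return False
        if np.dot(v_iP, n_K[i]) < 0.0:           # vertex condition, Eq. (16)
            return False
    return True
```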

3.2. Step 2: Reconstruction of normalized sub-grid distances q

Once the orthogonal distance vectors r1 and r2 are available, the normalized sub-grid distances q can be reconstructed.
This reconstruction is only based on the grid nodes pA respectively pS , see Fig. 8.
Since r1 and r2 describe the local and linearized shape of the discrete obstacle geometry, the accuracy of the reconstruction
depends on the ratio of the grid spacing to the discrete geometry resolution. This is investigated and discussed in detail in
Section 4. First, the method will be presented in 2D, before modifications and extensions are shown for the 3D application.
Note that the point pA is located outside and the point pS is located inside the obstacle geometry. The obstacle distance vectors
r1 and r2 are pointing from the outside respectively inside to the geometry segment under consideration. The outward-
pointing surface normal vectors of unit length, n1 and n2 , can be computed via
$$
\mathbf{n}_1 = -\frac{\mathbf{r}_1}{\lVert \mathbf{r}_1 \rVert} \quad \text{and} \quad \mathbf{n}_2 = \frac{\mathbf{r}_2}{\lVert \mathbf{r}_2 \rVert}. \qquad (17)
$$
First, it has to be determined which of the obstacle geometry segments 1 or 2 is cut by the grid spacing vector
$$
\mathbf{e}_i = \frac{1}{\Delta x} (\mathbf{p}_S - \mathbf{p}_A), \qquad (18)
$$
see Fig. 8, where $\Delta x$ describes the scalar grid spacing. For this purpose, the segment intersection point $\mathbf{p}_B$ respectively the vector $\mathbf{v}_B$ has to be determined, e.g. with the two-dimensional cross product $\mathbf{a}\, \hat{\times}\, \mathbf{b} \equiv \det(\mathbf{a}\ \mathbf{b})$ by [18]:
$$
\mathbf{v}_B = \frac{\big((\mathbf{e}_i + \mathbf{r}_2)\, \hat{\times}\, (\mathbf{e}_i + \mathbf{r}_2 + \check{\mathbf{r}}_2)\big)\, \check{\mathbf{r}}_1 - \big(\mathbf{r}_1\, \hat{\times}\, (\mathbf{r}_1 + \check{\mathbf{r}}_1)\big)\, \check{\mathbf{r}}_2}{\check{\mathbf{r}}_1\, \hat{\times}\, \check{\mathbf{r}}_2} \quad \text{with} \quad \check{\mathbf{r}}_i \perp \mathbf{r}_i, \qquad (19)
$$

where $\check{\mathbf{r}}_i$ is an arbitrary vector orthogonal to $\mathbf{r}_i$. Subsequently, the orthogonal vector $\mathbf{r}_B$ to $\mathbf{e}_i$ can be calculated by projection:
$$
\mathbf{r}_B = \mathbf{v}_B - \frac{\mathbf{v}_B \cdot \mathbf{e}_i}{\mathbf{e}_i \cdot \mathbf{e}_i}\, \mathbf{e}_i. \qquad (20)
$$
Furthermore, the vertex normal $\mathbf{n}_B$ has to be calculated by the normalized sum of the segment normals:
$$
\mathbf{n}_B = \frac{\mathbf{n}_1 + \mathbf{n}_2}{\lVert \mathbf{n}_1 + \mathbf{n}_2 \rVert}. \qquad (21)
$$
Depending on the orientation of the vectors $\mathbf{n}_B$ and $\mathbf{r}_B$ to each other, it is possible to determine the relative distance $\mathbf{v}_q$ to the intersection point alias link point $\mathbf{p}_q$:
$$
\mathbf{v}_q =
\begin{cases}
\mathbf{v}_{q1}, & \text{if } \mathbf{n}_B \cdot \mathbf{r}_B \ge 0 \quad (\mathbf{e}_i \text{ cuts segment 1}) \\
\mathbf{v}_{q2}, & \text{if } \mathbf{n}_B \cdot \mathbf{r}_B < 0 \quad (\mathbf{e}_i \text{ cuts segment 2}),
\end{cases}
\qquad (22)
$$
whereby the normalization of $\mathbf{n}_B$ in Eq. (21) is optional. The link normal vector
$$
\mathbf{n}_q =
\begin{cases}
\mathbf{n}_1, & \text{if } \mathbf{n}_B \cdot \mathbf{r}_B \ge 0 \quad (\mathbf{e}_i \text{ cuts segment 1}) \\
\mathbf{n}_2, & \text{if } \mathbf{n}_B \cdot \mathbf{r}_B < 0 \quad (\mathbf{e}_i \text{ cuts segment 2})
\end{cases}
\qquad (23)
$$
can be established in a similar way, while the vectors $\mathbf{v}_{q1}$ and $\mathbf{v}_{q2}$ are given by
$$
\mathbf{v}_{q1} = \frac{\mathbf{r}_1 \cdot \mathbf{r}_1}{\mathbf{r}_1 \cdot \mathbf{e}_i}\, \mathbf{e}_i \quad \text{and} \quad \mathbf{v}_{q2} = \frac{(\mathbf{e}_i + \mathbf{r}_2) \cdot \mathbf{r}_2}{\mathbf{r}_2 \cdot \mathbf{e}_i}\, \mathbf{e}_i. \qquad (24)
$$
Finally, the normalized sub-grid distance $q$ can be calculated by projection and normalization:
$$
q = \frac{\mathbf{v}_q \cdot \mathbf{e}_i}{\mathbf{e}_i \cdot \mathbf{e}_i}. \qquad (25)
$$
In the case where the orthogonal distance vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ point to the same geometry segment (and therefore $\mathbf{r}_1 \parallel \mathbf{r}_2$), the computation reduces to
$$
q = \frac{\lVert \mathbf{r}_1 \rVert}{\lVert \mathbf{r}_1 \rVert + \lVert \mathbf{r}_2 \rVert}, \quad \text{if } \mathbf{r}_1 \parallel \mathbf{r}_2, \qquad (26)
$$
which can also be used for an approximate determination of $q$ if $\mathbf{r}_1$ and $\mathbf{r}_2$ are nearly parallel.
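The complete two-dimensional reconstruction can be summarized in a few lines. The following sketch mirrors Eqs. (17) to (26) in plain Python/NumPy and is meant as a reference implementation of the formulas only, not of the CUDA kernel; the near-parallel tolerance eps is an assumption of this example:

```python
import numpy as np

def cross2(a, b):                       # two-dimensional cross product, det(a b)
    return a[0] * b[1] - a[1] * b[0]

def reconstruct_q_2d(r1, r2, e_i, eps=1e-12):
    """Normalized sub-grid distance q from r1, r2 and the link vector e_i."""
    if abs(cross2(r1, r2)) < eps:                    # r1 (nearly) parallel to r2
        return np.linalg.norm(r1) / (np.linalg.norm(r1) + np.linalg.norm(r2))  # Eq. (26)
    n1 = -r1 / np.linalg.norm(r1)                    # Eq. (17)
    n2 = r2 / np.linalg.norm(r2)
    rc1 = np.array([-r1[1], r1[0]])                  # arbitrary vector orthogonal to r1
    rc2 = np.array([-r2[1], r2[0]])
    vB = (cross2(e_i + r2, e_i + r2 + rc2) * rc1
          - cross2(r1, r1 + rc1) * rc2) / cross2(rc1, rc2)          # Eq. (19)
    rB = vB - np.dot(vB, e_i) / np.dot(e_i, e_i) * e_i              # Eq. (20)
    nB = n1 + n2                                     # Eq. (21), normalization optional
    if np.dot(nB, rB) >= 0.0:                        # e_i cuts segment 1, Eq. (22)
        vq = np.dot(r1, r1) / np.dot(r1, e_i) * e_i                 # Eq. (24)
    else:                                            # e_i cuts segment 2
        vq = np.dot(e_i + r2, r2) / np.dot(r2, e_i) * e_i
    return np.dot(vq, e_i) / np.dot(e_i, e_i)                       # Eq. (25)

# simple flat-wall check: wall at 30 % of the link length between p_A and p_S
print(reconstruct_q_2d(np.array([0.3, 0.0]), np.array([-0.7, 0.0]),
                       np.array([1.0, 0.0])))        # -> 0.3
```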

3.2.1. Extension to 3D
In three-dimensional space, the point pB is located on an edge that is shared by the triangles 1 and 2. Therefore, rB is the
shortest vector between the skew lines described by the edge and the vector ei . The intersection line L = { x = u + s w |
x ∈ R3 , s ∈ R } between the two triangles is given by a position vector u and the orientation vector

w = n1 × n2 (27)

and is marked by the red dot in Fig. 8. Since there is an infinite number of solutions for $\mathbf{u}$ on the intersection line, one component of $\mathbf{u}$ has to be set to zero. It is important that the corresponding entry of $\mathbf{w}$ is not equal to zero. As an example, for $u_z \equiv 0$, the vector $\mathbf{u} = (u_x\ u_y\ u_z)^T$ can be determined by
$$
\begin{pmatrix} u_x \\ u_y \end{pmatrix} =
\begin{bmatrix} n_{1x} & n_{1y} \\ n_{2x} & n_{2y} \end{bmatrix}^{-1}
\begin{pmatrix} \mathbf{r}_1 \cdot \mathbf{n}_1 \\ \mathbf{r}_2 \cdot \mathbf{n}_2 \end{pmatrix}. \qquad (28)
$$

Afterwards, the perpendicular vector $\mathbf{r}_B$ between the skew lines $\mathbf{e}_i$ and $L$ can be calculated via
$$
\mathbf{r}_B = (\mathbf{u} \cdot \mathbf{n})\, \mathbf{n} \quad \text{with} \quad \mathbf{n} = \frac{\mathbf{e}_i \times \mathbf{w}}{\lVert \mathbf{e}_i \times \mathbf{w} \rVert}. \qquad (29)
$$
Then, the calculation of the normalized sub-grid distance q can be continued with Eq. (21).
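A sketch of this three-dimensional modification, again only reproducing Eqs. (27) to (29) in plain Python/NumPy and assuming that $w_z \neq 0$ (so that $u_z = 0$ is an admissible choice), could look as follows:

```python
import numpy as np

def r_B_3d(n1, n2, r1, r2, e_i):
    """Perpendicular vector r_B between the link e_i and the triangle intersection line."""
    w = np.cross(n1, n2)                                   # Eq. (27)
    A = np.array([[n1[0], n1[1]],                          # 2x2 system of Eq. (28),
                  [n2[0], n2[1]]])                         # valid only for w_z != 0
    b = np.array([np.dot(r1, n1), np.dot(r2, n2)])
    ux, uy = np.linalg.solve(A, b)
    u = np.array([ux, uy, 0.0])                            # position vector with u_z = 0
    n = np.cross(e_i, w)
    n /= np.linalg.norm(n)
    return np.dot(u, n) * n                                # Eq. (29)
```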

3.3. Further memory usage reduction

Apart from the computational efficiency, the proposed grid generator offers room for further improvements concerning
the required memory of the obstacle distance vectors r, which are allocated at each grid node. Floating-point numbers are
typically stored as either 32-bit single-precision, 64-bit double-precision or sometimes 16-bit half-precision floating-point
variables, according to the IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2008). Due to the very limited co-domain of the obstacle distance vectors $\mathbf{r}$ of $0 < \lVert \mathbf{r} \rVert \le r_{max} \le \sqrt{D}$, following Eq. (7), some of the available memory switch states are not used at all, e.g. exponents above 1, and valuable storage space remains unused. Consequently, in this section, alternative

(a) Equidistant Cartesian coordinates from $[-a, -a]$ to $[a, a]$. (b) Stretched polar coordinates.

Fig. 9. Discrete two-dimensional coordinate systems to store the obstacle distance vectors $\mathbf{r}$ in tailor-made fixed-point precision data types with $n = 2^4 = 16$ switch states for each axis. Note that the condition $0 < \lVert \mathbf{r} \rVert \le r_{max}$ has to be valid.

data types are investigated to find a data type for the obstacle distance vectors $\mathbf{r}$ that offers a higher memory efficiency while keeping the same (or almost the same) level of accuracy for the overall procedure.
A first step to reduce the memory consumption would be to switch from single-precision to half-precision floating-point variables of 2 bytes in length for the obstacle distance vectors $\mathbf{r}$. Nevertheless, not all possible memory states offered by floating-point types will be used. To achieve an even higher memory utilization, an appropriate storage data type has to be developed. While for conventional fixed-point data types the data range cannot be modified, tailor-made data types can be used to achieve the desired behavior. Data types of either 2 bytes (short) or 1 byte (char) in size offer $n = 2^{2 \cdot 8} = 65\,536$ respectively $n = 2^{1 \cdot 8} = 256$ discrete states that can be mapped to the co-domain of $\mathbf{r}$ by a proper mapping procedure. Two possible mapping procedures are depicted in Fig. 9: a Cartesian and a polar coordinate mapping. Due to the different distributions of the discrete coordinates in both systems, the accuracy of the transformation varies, in particular for the shortest and the longest obstacle distance vectors $\mathbf{r}$.
A mapping to discrete equidistant Cartesian coordinates is a rather conservative approach. Discrete coordinates outside
the area 0 < ∥r ∥ ≤ rmax will never be used, so that the available memory states are not fully utilized. Moreover, due
to the low number of only four discrete coordinates near the origin, short obstacle distance vectors r will be represented
inaccurately.
In a polar mapping, the memory states are fully utilized and short obstacle distance vectors $\mathbf{r}$ will be represented as accurately as longer ones, see Fig. 9b.
In Fig. 9, both coordinate systems are exemplarily discretized with $n = 16 \in \{2^j \mid j \in \mathbb{N}^*\}$ states for each coordinate axis. In case of the Cartesian system, it can be seen in Fig. 9a that the shortest obstacle distance vectors $\mathbf{r}$ can only be transformed to $2^D = 4$ possible discrete coordinates $\hat{\mathbf{x}} \in \{(7, 7), (7, 8), (8, 7), (8, 8)\}$, while the stretched polar system offers $n$ discrete coordinates $(\hat{r}, \hat{\phi}) = (1, i)$ with $i \in \{0, \ldots, n - 1\}$, resulting in a significantly higher approximation accuracy. In contrast, obstacle distance vectors in polar coordinates that are nearly $r_{max}$ long can only be approximated by $n$ discrete coordinates $(\hat{r}, \hat{\phi}) = (n - 1, i)$, whereas in Cartesian coordinates about $4n$ discrete states are available.
The maximum possible length of an arbitrary obstacle distance vector $\mathbf{r}$ can either be determined by Eq. (7) or by
$$
r_{max} =
\begin{pmatrix} \cos\theta \cos\phi \\ \cos\theta \sin\phi \\ \sin\theta \end{pmatrix}
\cdot
\begin{pmatrix} \operatorname{sgn} \cos\phi \\ \operatorname{sgn} \sin\phi \\ \operatorname{sgn} \sin\theta \end{pmatrix},
\quad \text{with } \theta = 0 \text{ if } D = 2, \qquad (30)
$$
where $\phi \in [-\pi, \pi)$ describes the azimuth and $\theta \in [-\pi/2, \pi/2]$ the latitude of $\mathbf{r}$ in spherical coordinates.
to discrete Cartesian coordinates (Fig. 9a) is done via
⌊ ⌉
n−1
fCartesian : U → V , x ↦ → x̂ ≡
D D
(x + a 1) (31)
2a

with 1 = (1, . . . , 1)T , the sets U = {x ∈ R | −a ≤ x ≤ a}, V = {i ∈ N | 0 ≤ i < n} and the maximum side width

1+ D
a= . (32)
2
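As an illustration, a sketch of this Cartesian fixed-point mapping for a hypothetical 1-byte (char) encoding with n = 256 states per axis is given below; np.rint stands in for the rounding brackets of Eq. (31) and all names are chosen for this example:

```python
import numpy as np

D, n = 3, 256
a = (1.0 + np.sqrt(D)) / 2.0                                        # Eq. (32)

def encode_cartesian(r):
    return np.rint((n - 1) / (2.0 * a) * (r + a)).astype(np.uint8)  # Eq. (31)

def decode_cartesian(r_hat):
    return r_hat.astype(np.float64) * 2.0 * a / (n - 1) - a         # inverse mapping

r = np.array([0.4, -0.7, 0.1])
assert np.allclose(decode_cartesian(encode_cartesian(r)), r, atol=a / (n - 1))
```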

Fig. 10. Unfavorable resolution relationship between the grid spacing ∆x and the characteristic geometry segment length l: the normalized sub-grid distance q is wrongly calculated based on the point p′q instead of pq. The reason is that the orthogonal distance vectors r1 and r2 are pointing to non-adjacent geometry segments and no information of the intermediate segment is present in the grid. This leads to the wrong segment intersection point pB, which then is used to determine p′q.

Transformations to stretched polar or spherical coordinates (Fig. 9b) are done via
$$
f_{\mathrm{polar}}: U^D \to V^D, \quad \mathbf{x} \mapsto
\begin{cases}
\hat{r} \equiv \max\!\left(1, \left\lfloor \dfrac{n - 1}{r_{max}}\, \lVert \mathbf{x} \rVert \right\rceil\right) \\[12pt]
\hat{\phi} \equiv \left\lfloor \dfrac{n - 1}{2\pi} \left(\operatorname{arctan2}(y, x) + \pi\right) \right\rceil \\[12pt]
\hat{\theta} \equiv \left\lfloor \dfrac{n - 1}{\pi} \left(\arcsin\!\left(\dfrac{z}{\lVert \mathbf{x} \rVert}\right) + \dfrac{\pi}{2}\right) \right\rceil, \quad \text{only if } D = 3.
\end{cases}
\qquad (33)
$$
Note that the maximum function prevents a transformation to a vector $\mathbf{r}$ with zero length and therefore allows $\hat{r} = 0$ to be used for debugging purposes. Keep in mind that the transformation to discrete stretched spherical coordinates could be further optimized, e.g. by an appropriate balancing of the number of switch states for the individual coordinate axes and thereby of the size of the discrete spacings $\Delta\hat{r} = \frac{r_{max}}{n - 1}$, $\Delta\hat{\phi} = \frac{2\pi}{n - 1}$ and $\Delta\hat{\theta} = \frac{\pi}{n - 1}$. Nevertheless, the transformation given by Eq. (33) is a reasonable choice and provides sufficiently accurate results, as shown later. For the proposed method, transformations to discrete stretched spherical coordinates are preferred, because the switch state utilization is higher and both short and long vectors $\mathbf{r}$ can be approximated sufficiently accurately, unlike with Cartesian coordinates.
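A corresponding sketch of the stretched spherical mapping of Eq. (33) for a hypothetical 1-byte (char) encoding is shown below; as a simplification, $r_{max}$ is approximated here by the constant $\sqrt{D}$, which slightly over-covers the admissible co-domain, and all names are illustrative:

```python
import numpy as np

D, n = 3, 256
r_max = np.sqrt(D)        # simplification; Eq. (30) gives a direction-dependent bound

def encode_spherical(r):
    length = np.linalg.norm(r)
    r_hat = max(1, int(round((n - 1) / r_max * length)))                       # Eq. (33)
    phi_hat = int(round((n - 1) / (2 * np.pi) * (np.arctan2(r[1], r[0]) + np.pi)))
    theta_hat = int(round((n - 1) / np.pi * (np.arcsin(r[2] / length) + np.pi / 2)))
    return np.array([r_hat, phi_hat, theta_hat], dtype=np.uint8)

def decode_spherical(c):
    length = c[0] * r_max / (n - 1)
    phi = c[1] * 2 * np.pi / (n - 1) - np.pi
    theta = c[2] * np.pi / (n - 1) - np.pi / 2
    return length * np.array([np.cos(theta) * np.cos(phi),
                              np.cos(theta) * np.sin(phi),
                              np.sin(theta)])

r = np.array([0.4, -0.7, 0.1])
print(decode_spherical(encode_spherical(r)))   # close to r
```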

3.4. Concluding remarks on the expectations on convergence behavior and accuracy

The accuracy of the proposed method is characterized by the relationship between the LBM grid resolution and the
discrete geometry. Both are determined by the grid spacing ∆x respectively the characteristic geometry segment length
l, which is given by the segment length normalized with the grid spacing. The method only yields a numerically exact
solution for q if the orthogonal distance vectors r1 and r2 are pointing to neighboring (or identical) geometry segments. If any additional segments are located between them, the presented procedure for the calculation of the intersection point pB may fail, as may the subsequent calculations based on pB respectively vB and rB. This implication is also depicted
in Fig. 10.
To reduce this error, geometries have to be discretized fine enough. This is also recommended from a physical point of
view: the resolution of geometry and computational grid typically should be in the same order of magnitude, so that the
resulting error is almost negligibly small. Similar problems can occur, if two geometries are too close to each other, resulting
in orthogonal obstacle distance vectors r that are pointing to different geometry sets, essentially yielding a race condition.
For problems like this a workaround is to check, if the computed sub-grid distance q is in the range [0, 1) or not. If it is found
to be invalid, the method falls back to the simple bounce-back scheme by assuming q = 0.5.
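The fallback described above amounts to a one-line validity check, sketched here for a single link (illustrative name, not part of the solver code):

```python
def sanitize_q(q):
    """Return q if it lies in the admissible range [0, 1), otherwise the SBB value 0.5."""
    return q if 0.0 <= q < 1.0 else 0.5
```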

4. Convergence study

In this section, the proposed method for the reconstruction of the normalized sub-grid distances q is analyzed in detail.
Two test cases are considered: a parametric circle that is discretized by a polygon (2D) and a sphere that is discretized
by equally sized triangles (3D). The discrete geometries are mapped to Cartesian LBM grids of different resolution and the
orthogonal obstacle distance vectors r are computed. Then, the normalized sub-grid distances q are reconstructed based on
r according to Eq. (25). The numerically exact reference solutions for both cases, q̂, are calculated based on the analytic circle
(2D) resp. a simple ray casting algorithm (3D). The error is measured by the L1 and L2 error norms:

$$
L_q^1 \equiv L^1(q, \hat{q}) = \frac{\sum_i |q_i - \hat{q}_i|}{\sum_i |\hat{q}_i|} \qquad (34)
$$

(a) κ = 1/4, n = 16, l2D ≈ 1.56, L1q ≈ 0.097, L2q ≈ 0.11, parametric shaping depicted in green. (b) κ = 1/4, n = 64, l2D ≈ 0.39, L1q ≈ 0.020, L2q ≈ 0.035. (c) κ = 1/32, n = 512, l2D ≈ 0.39, L1q ≈ 2.1 × 10^-3, L2q ≈ 5.0 × 10^-3. (d) κ = 1/512, n = 16 384, l2D ≈ 0.2, L1q ≈ 1.7 × 10^-4, L2q ≈ 3.5 × 10^-4.

Fig. 11. Four selected configurations from the two-dimensional convergence study of a circle. The circle is discretized by n polygon segments, with the normalized curvature κ and the normalized characteristic geometry segment length l2D. In addition, the L1q and L2q error norms of the reconstructed q are given. The inner respectively outer orthogonal distance vectors r are depicted in red respectively blue. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

$$
L_q^2 \equiv L^2(q, \hat{q}) = \sqrt{\frac{\sum_i (q_i - \hat{q}_i)^2}{\sum_i \hat{q}_i^2}}. \qquad (35)
$$

In the case of a constant assumption $q_i = 0.5$ and randomly uniformly distributed reference values $\hat{q}_i \in [0, 1)$, both L1q and L2q converge to 0.5. This corresponds to the behavior of LBM's simple bounce-back (SBB) method, which constantly assumes q = 0.5 independently of the real discrete q, which can be arbitrary. That means, whenever L1 or L2 is less than 0.5, the proposed method is more accurate than a simple bounce-back scheme.
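For reference, a short sketch of the two error norms in plain Python/NumPy, together with a numerical check of the SBB limit mentioned above (names are illustrative):

```python
import numpy as np

def l1_norm(q, q_ref):
    return np.sum(np.abs(q - q_ref)) / np.sum(np.abs(q_ref))        # Eq. (34)

def l2_norm(q, q_ref):
    return np.sqrt(np.sum((q - q_ref) ** 2) / np.sum(q_ref ** 2))   # Eq. (35)

q_ref = np.random.rand(1_000_000)            # uniformly distributed reference distances
q_sbb = np.full_like(q_ref, 0.5)             # constant half-way assumption (SBB)
print(l1_norm(q_sbb, q_ref), l2_norm(q_sbb, q_ref))   # both approach 0.5
```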
In the following, different configurations in 2D and 3D are analyzed to assess the convergence behavior of the algorithm.
As the test case is based on a piecewise linear approximation of the curved surfaces, the tests are run for varying normalized
curvatures κ = 1/R with R being the number of lattice nodes per radius of the respective geometry. Moreover, the number of
polygon segments and triangles n is varied. Results are also plotted over the characteristic segment length l. Also note that
the reconstructions of q are made in single arithmetic precision while the error norm measurements are made in double
arithmetic precision.

4.1. Two-dimensional convergence study: mapping of a circle

In 2D, the parametric circle is approximated by a polygon with n segments and a characteristic segment length $l_{2D} = (2/\kappa) \sin(\pi/n)$. The test cases vary in the normalized curvature $\kappa = 1/2^i$ with $i \in \{1, \ldots, 9\}$ as well as the number of segments in circumferential direction $n = 2^j$ with $j \in \{2, \ldots, 22\}$.
Fig. 11 shows the geometry and the resulting orthogonal distance vectors r for four selected settings. For the case depicted
in Figs. 11a and 11b, a rather high normalized curvature was chosen, so that the discrepancy between the parametric shape
and the polygonal approximation becomes visible. In Fig. 11a, the low number of segments n = 16 and the resulting large
characteristic segment length l2D ≈ 1.56 yields relatively high error norms, L1q ≈ 0.097 and L2q ≈ 0.11. A further increase of
the number of segments to n = 64 (yielding l2D ≈ 0.39), leads to a significant reduction of the error norms to L1q ≈ 0.020

Fig. 12. Convergence behavior of the reconstructed sub-grid distance q for three different refinement levels of a discretized circle. The green point marks
the exact solution. The reconstructed solutions are given by amber points: (a) a coarse polygon with l2D ≈ 1, (b) a finer polygon with l2D ≈ 0.5, and (c)
an infinitely fine polygon with l2D → 0. Note, that an infinite number of polygon segments does not lead to the exact solution. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)

and L2q ≈ 0.035 even for this under-resolved high-curvature case, see Fig. 11b. Figs. 11c and 11d show examples for higher-resolved cases with 32 and 512 lattice nodes per radius and 512 resp. 16 384 polygon segments. The error norms of the discrete approximation of q become negligibly small, i.e. $O(10^{-3})$ and $O(10^{-4})$, respectively.
Results for the full study are plotted in Fig. 13. In Fig. 13a, the L1q and L2q error norms are plotted over the number of polygon segments n for nine different dimensionless curvatures κ. It can clearly be seen that the convergence depends on two contributions: firstly, a sufficiently high resolution of the curvature (in terms of lattice nodes per radius), which reduces the error introduced by the piecewise linear approximation, and secondly, an adequately high number of polygon segments in circumferential direction, which reduces the error induced by the basic concept of the mapping algorithm. For large non-dimensional curvatures κ, a further increase in the number of polygon segments does not improve the result. Only for sufficiently small κ do L1q and L2q converge with the order $O(n^{-2}, \kappa)$, before, for all cases, an asymptotic limit is reached. Analyzing convergence in terms of κ yields the order $O(\kappa)$, at least for sufficiently fine polygons. This strong coupling of κ and n is also confirmed by the behavior for even finer polygon discretizations with $n \approx O(2^{18})$. Here, a further increase in n results in an increase of the errors, almost with the order $O(n, \kappa)$. The reason for this is depicted in Fig. 12: even for an infinitely finely resolved polygon, the sub-grid distances q are not reconstructed exactly, because the orthogonal obstacle distance vectors point to the tangents of the circle. No information of the infinitely short polygon segments in between is used. This confirms what was already mentioned in Section 3.4 and emphasizes the importance of a well-chosen discretization level of both the geometry and the computational grid.
In Fig. 13b, the L1q and L2q error norms are plotted over the characteristic polygon segment length l2D for the same set of tests, in order to obtain a recommendation for a proper geometry discretization. It can be seen that for l2D ≈ 0.5, the minimum error is reached, almost independent of the actual curvature κ. This convergence limit is explained by the fact that for l2D ≈ 0.5, exactly two polygon segments fit into one grid cell on the Cartesian LBM grid. This results in obstacle distance vectors that are pointing to two different polygon segments. Any further refinement of the polygon does not add new information on the shape of the geometry; the setup is saturated. As explained in Section 3.4, for l2D ≪ 0.5, information is lost.

4.2. Three-dimensional convergence study: mapping of an icosphere

In three-dimensional space, an icosphere with n equally sized triangles is used to discretize the ideal sphere geometry, see Fig. 14. The icosphere is generated in fixed-point single precision with the open-source software Blender [19]. The characteristic triangle segment length $l_{3D} \approx (4/\kappa)\sqrt{\pi/(\sqrt{3}\, n)}$ is approximately determined by the n-fold divided sphere surface. The test cases vary in the normalized curvature κ with 1/κ ∈ {2, 4, 8, 16, 32, 64, 128, 185} as well as the number of triangles $n = 5 \cdot 4^k$ with k ∈ {1, . . . , 8}. Due to the higher memory consumption in 3D, the test case matrix is not as large as for the 2D analysis in Section 4.1.
Fig. 14 gives an overview of the test bed and exemplarily shows three of the test cases. For the case in Fig. 14a, a relatively high normalized curvature of κ = 1/2 and a coarse discretization with only n = 80 triangle segments (yielding a characteristic triangle segment length of l3D ≈ 1.2) leads to error norms of L1q ≈ 0.27 and L2q ≈ 0.37. Even for such coarse discretizations, significantly lower errors than for an SBB scheme can be observed. Refining the icosphere resolution while keeping κ constant (Fig. 14b) only leads to a small reduction of the error norms, as already revealed by the 2D analysis. Similarly, when refining both n and κ (Fig. 14c), the error norms again are significantly reduced.
In Fig. 15, the error norms for the 3D case are plotted over the number of triangles n and the characteristic triangle segment length l3D for different curvatures κ. In general, a behavior similar to the 2D analysis can be observed, taking into account the smaller parameter space due to the higher memory requirements of the 3D test cases. First, for sufficiently low dimensionless curvatures κ, the error is of order $O(n^{-1}, \kappa)$ up to a value n(κ), beyond which the order changes to $O(\kappa)$.

(a) L1q and L2q error norms over the number of polygon segments n.

(b) L1q and L2q error norms over the characteristic polygon segment length l2D = 2/κ sin (π/n). A choice of l2D ≈ 0.5
yields the lowest error norm for arbitrary κ .

Fig. 13. L1q and L2q error norms of the reconstructed sub-grid distances q in single-precision for different normalized curvatures κ of a discretized two-
dimensional circle. As an orientation, the four selected cases of Fig. 11 are marked (a)–(d) and the error norm limit for the SBB, L1,2 (q = 0.5, q̂) = 0.5, is
highlighted by a dashed horizontal line.

(a) κ = 1/2, n = 80, l3D ≈ 1.2, L1q ≈ 0.27, L2q ≈ 0.37. (b) κ = 1/2, n = 1280, l3D ≈ 0.3, L1q ≈ 0.15, L2q ≈ 0.35. (c) κ = 1/64, n = 20 480, l3D ≈ 2.4, L1q ≈ 0.030, L2q ≈ 0.040.

Fig. 14. Selected test cases from the three-dimensional convergence study of the reconstructed sub-grid distances q with outer and inner orthogonal distance vectors r, marked in blue respectively red. The sphere is discretized by n equally sized triangle segments. Given are the normalized curvature κ, the characteristic triangle segment length l3D and the L1q and L2q error norms. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

A further increase in errors for even higher numbers of triangles could not be observed (yet), but is expected for even higher triangle counts n.
When analyzing the error norms as a function of the characteristic triangle segment length l3D, a convergence limit at l3D ≈ 0.5 can be observed again (Fig. 15b). Since l3D is approximately of the order $O(1/\sqrt{n})$, the error is of the order $O(l_{3D}^2, \kappa)$. For l3D < 0.5, it scales with $O(\kappa)$. In 3D, it is hence recommended to discretize geometries with a characteristic triangle segment length of l3D ≈ 0.5 to maximize the grid generation accuracy.

4.3. Memory reduction

In the previous two subsections, very good results for the 2D and 3D test cases were obtained. For the analyses, the obstacle distance vectors r were stored as 32-bit single-precision floating-point values. As discussed in Section 3.3, more suitable data storage types can be used to store the r with much lower memory consumption, while providing a similar (or only slightly lower) overall accuracy. In the following, the convergence tests are repeated, first with 2-byte half-precision floating-point data types, and afterwards with two integer data types (2-byte short and 1-byte char) in two tailor-made discrete coordinate systems (Cartesian and stretched polar resp. spherical).
For the two-dimensional case, the L2q error norms of the reconstructed sub-grid distances q for half-precision and all
permutations of the data storage types short and char as well as the Cartesian and stretched polar coordinate systems
are given in Fig. 16. In comparison to the reference results in Fig. 13b, no difference can be found for the resulting L2q error
norms in Fig. 16(a), if the obstacle distance vectors are stored as half-precision floating-point values. For the short data type,
only minor differences can be detected in Figs. 16b and 16c. Contrary to that, for the char data type, see Fig. 16d, the error
norms for low non-dimensional curvatures κ are up to two orders of magnitude higher than for the reference case in single-
precision. Still, the error norms are in the order of L2q ≈ 0.05. The increasing error norm is caused by the discrete coordinate
system, as explained in Section 3.3. In case of char in two-dimensional Cartesian coordinates, the absolute error of transforming a single r back and forth can accumulate to $\sqrt{D}\, a/(n-1) \approx 0.0067$ for the length of the vector or to $\arctan(0.5) \approx 0.46$ rad for the orientation of the vector. Additionally, it can be seen in Fig. 16e that, if the r are stored as char in stretched polar coordinates, the resulting error norms for low curvatures are nearly one order of magnitude lower than for the ones stored in Cartesian coordinates. The convergence limit at l2D ≈ 0.5 still occurs for all four cases.
The L2q error norms for the 3D analysis are given in Fig. 17 and show a very similar convergence behavior. In comparison to the reference solution in single-precision, shown in Fig. 15b, in all cases a decreased error norm can be detected for geometries with high normalized curvature. If the r are stored in half-precision or as short, only slight differences can be found between them, see Figs. 17a–17c. However, the char case with the stretched spherical coordinate system and geometries with low normalized curvature shows error norms which are nearly one order of magnitude lower than for Cartesian coordinates. As previously expected and explained in Section 3.3, a back and forth transformation to stretched polar respectively spherical coordinates results in significantly lower reconstruction errors compared to Cartesian coordinates. Even a reduction of the memory consumption from 2 bytes (short) to 1 byte (char) per dimension provides sufficient accuracy in the reconstruction of the sub-grid distances q. This shows that, for the proposed method to store r, the transformation to discrete stretched polar resp. spherical coordinates with only $2^8 = 256$ memory switch states (1 byte, char) per axis is a very good choice.

(a) L1q and L2q error norm over the number of triangles n.

(b) L1q and L2q error norm over the normalized characteristic triangle segment length $l_{3D} \approx (4/\kappa)\sqrt{\pi/(\sqrt{3}\, n)}$. For l3D < 0.5 the error norms are found to be nearly constant.

Fig. 15. L1q and L2q error norm of the reconstructed sub-grid distances q in single-precision for different curvatures κ of a three-dimensional icosphere. The
three selected test cases of Fig. 14 are marked in the diagram, (a)–(c). The error norm limit for the SBB, L1,2 (q = 0.5, q̂) = 0.5, is highlighted by a dashed
horizontal line.

4.4. Summary

The convergence analysis of the proposed algorithm clearly indicates its suitability for mapping arbitrarily shaped discrete
geometries to equidistant Cartesian grids as well as for reconstructing sub-grid distances. The L1 and L2 error norms for

(a) r as a half-precision floating-point vector in R3 .

(b) r as short in Cartesian coordinates. (c) r as short in stretched polar coordinates.

(d) r as char in Cartesian coordinates. (e) r as char in stretched polar coordinates.

Fig. 16. L2q error norms of the two-dimensional reconstructed sub-grid distances q for the five different memory storage types.

reasonable choices of grid and geometry resolutions were found to be in the order of $O(10^{-4})$. Two contributions to the discretization error could be identified. First, there is a model-inherent discrepancy between the sub-grid distances q of a discrete geometry based on a piecewise linear approximation and the sub-grid distances of a parametrically given, perfectly shaped geometry. Second, the proposed reconstruction method introduces an additional error. It mainly depends on the resolution relationship between the grid and the geometry. For well-chosen values of the normalized curvature κ and the number of geometry segments n respectively the characteristic segment length l, both error terms can be reduced to very low values. To achieve the lowest possible error norm for arbitrary κ, a characteristic segment length of l ≈ 0.5 has to be used. Additionally, reduced data types can be used to further reduce the storage requirements of the employed method.

5. Application and performance

This section focuses on the application of the proposed method and on the resulting performance when coupled to a full LBM solver. The LBM solver elbe, the efficient lattice boltzmann environment [20], is used. elbe is a thoroughly validated and
efficient numerical tool for the simulation of two- and three-dimensional nonlinear flow problems [21]. To accelerate the
hydrodynamic computations, elbe is strongly optimized for execution on graphics processing units (GPUs) and allows for simulations in very competitive computation times. As has been shown in various publications, LBM methods are especially well suited for implementation in a GPU context [22], so the Nvidia CUDA toolkit is used for the implementation and parallelization. First, results for a two-dimensional fluid–structure interaction test case are given to show the applicability of the algorithm. Then, with the help of 2D and 3D test cases, the computational advantages in efficiency and memory consumption of the new strategy are discussed.

(a) r as a half-precision floating-point vector in R3. (b) r as short in Cartesian coordinates. (c) r as short in stretched spherical coordinates. (d) r as char in Cartesian coordinates. (e) r as char in stretched spherical coordinates.

Fig. 17. L2q error norms of the three-dimensional reconstructed sub-grid distances q for the five different memory storage types.

5.1. 2D Application: Oscillating cylinder

To demonstrate the applicability of the proposed method, a straightforward test case of an oscillating cylinder in a 2D
channel flow is presented. Simulations using the new method for the unsteady reconstruction of sub-grid distances of
arbitrarily shaped geometries are compared to simulations based on analytically prescribed sub-grid distances.
Mittal and Kumar [23] investigated vortex-induced vibrations of a circular cylinder placed in a uniform flow at Reynolds number Re = 325. The cylinder was modeled by a one-mass oscillator, consisting of a spring–damper system with two
translational degrees of freedom. By varying the spring stiffness k, different natural frequencies were addressed. The original
test case configuration of [23] is reproduced with elbe. The bidirectional fluid–structure-interaction is very sensitive to
errors in the boundary condition and requires a recalculation of the obstacle distance vectors r at every discrete time step.
With higher-order boundary conditions, the flow with density ρ , viscosity ν and approach velocity u around the moving
cylinder with diameter D can be calculated much more accurately compared to a first-order simple bounce-back boundary
condition. The test case is defined by a set of dimensionless parameters, i.e., the Reynolds number Re, the dimensionless cylinder mass m̄, the dimensionless spring frequency Fs, as well as the damping ratio ζ:

Re = uD/ν,   m̄ = 4m/(D²ρ),   Fs = (D/(2πu)) √(k/m),   ζ = c/(2m √(k/m)).   (36)

The movement of the cylinder due to fluid loads, inertia and the spring–damper system is determined by solving two standard ODEs for a linearly damped one-mass oscillator with damping factor c.
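To illustrate the structural side of the coupling, the sketch below recovers the spring stiffness k and the damping factor c from the dimensionless parameters of Eq. (36) and advances one translational degree of freedom of the linearly damped oscillator, m·ẍ + c·ẋ + k·x = F(t), with a simple semi-implicit Euler step. All parameter values and names are illustrative stand-ins; neither the actual time integrator nor the two-degree-of-freedom coupling used in elbe is reproduced here.

```cpp
#include <cmath>
#include <cstdio>

struct Oscillator {
    double m, k, c;   // mass, spring stiffness, damping factor
    double x = 0.0;   // displacement (one translational degree of freedom)
    double v = 0.0;   // velocity

    // One semi-implicit Euler step of m*x'' + c*x' + k*x = F.
    void step(double F, double dt) {
        const double a = (F - c * v - k * x) / m;
        v += dt * a;
        x += dt * v;
    }
};

int main() {
    // Illustrative values (not the elbe setup): flow speed, diameter, mass.
    const double u = 0.1, D = 32.0, m = 4.0;
    const double Fs = 0.42, zeta = 3.3e-4;

    // Invert Eq. (36): the natural angular frequency is sqrt(k/m) = 2*pi*Fs*u/D.
    const double pi    = std::acos(-1.0);
    const double omega = 2.0 * pi * Fs * u / D;
    Oscillator osc{m, m * omega * omega, 2.0 * zeta * m * omega};

    // Drive the oscillator with a toy periodic force near the shedding frequency.
    const double dt = 1.0, fShed = 0.21 * u / D;
    for (int t = 0; t < 20000; ++t) {
        osc.step(1e-4 * std::sin(2.0 * pi * fShed * t * dt), dt);
        if (t % 5000 == 0) std::printf("t = %5d  x = %+.5e\n", t, osc.x);
    }
    return 0;
}
```

In the actual simulation, the forcing term F(t) is replaced by the fluid load evaluated with the momentum exchange method at every discrete time step.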
The channel has a length and width of 32D × 16D and is discretized with 512 × 256 and 1024 × 512 grid nodes. At the
beginning of the simulation, the cylinder is in neutral position at x = 8D and y = 8D. The simulation parameters are set to
Re = 325, m̄ = 4.7273, Fs = 0.42 and ζ = 3.3 × 10−4 as well as Ma = 0.1. The Strouhal number of the stationary cylinder
is St = 0.21, so that no resonance occurs here [23].
The simulations are conducted with simple (SBB), linear- (LIBB) and quadratic-interpolating bounce-back (QIBB) bound-
ary conditions [3]. All numerical computations, as well as the obstacle distance vectors r, are executed in single precision us-
ing floats. The cylinder is discretized by a polygon with n = 256 segments. Depending on the grid resolution, this results in
a non-dimensional curvature of κ = 1/4 resp. κ = 1/8, a normalized polygon segment length of l2D ≈ 0.05 resp. l2D ≈ 0.1 and
an expected discretization error norm of L2q ≈ 3.5 × 10−2 resp. L2q ≈ 2 × 10−2 according to Fig. 13b. The multiple-relaxation-
time collision operator [24] is applied on a D2Q9 grid and in combination with a standard Smagorinsky turbulence model [25]
and the momentum exchange method [26] for the force evaluation. Inlet and outlet of the channel are modeled by velocity
and extrapolation boundary conditions following [27] and [22]. On the channel walls, the standard SBB boundary condition
is applied. At t = 0, the fluid is assumed to be at rest and the velocity field is initialized to zero throughout the computational
domain.
The simulation results are compared to the results of Mittal and Kumar [23] and to a reference solution obtained with elbe for the same simulation setup, but with a perfectly shaped circular geometry. For the latter case, the sub-grid distances q are determined analytically and are then used to feed the QIBB bounce-back scheme, which is why this solution is referred to as the A-QIBB reference solution in the following.
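For a circular cylinder, the analytically prescribed sub-grid distances of the A-QIBB reference can be obtained from a straightforward ray–circle intersection: q is the fraction of the lattice link from the fluid node, along the lattice direction, at which the circle around the cylinder centre is first crossed. The sketch below shows such a computation under our own naming; it is not code taken from elbe.

```cpp
#include <cmath>
#include <cstdio>
#include <optional>

// Analytic sub-grid distance q for a circular obstacle: fraction of the
// lattice link from the fluid node (xf, yf) along the lattice vector (ex, ey)
// at which the circle (centre (cx, cy), radius R) is first intersected.
// Returns an empty optional if the surface is not crossed within one link.
std::optional<double> analyticQ(double xf, double yf, double ex, double ey,
                                double cx, double cy, double R) {
    const double dx = xf - cx, dy = yf - cy;
    const double a = ex * ex + ey * ey;           // squared link length
    const double b = dx * ex + dy * ey;
    const double c = dx * dx + dy * dy - R * R;
    const double disc = b * b - a * c;
    if (disc < 0.0) return std::nullopt;          // link misses the circle
    const double q = (-b - std::sqrt(disc)) / a;  // first intersection
    if (q < 0.0 || q > 1.0) return std::nullopt;  // intersection not on this link
    return q;
}

int main() {
    // Example: cylinder of radius R = 8 centred at (20, 20); fluid node at
    // (11.3, 20) looking in the +x lattice direction towards the cylinder.
    if (auto q = analyticQ(11.3, 20.0, 1.0, 0.0, 20.0, 20.0, 8.0))
        std::printf("q = %.4f\n", *q);   // prints q = 0.7000
    return 0;
}
```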
Fig. 18 shows the trajectories and the in-line as well as the cross-flow phase plots for the coarse and the fine grid with
512 × 256 and 1024 × 512 lattice nodes. All four boundary conditions are compared (SBB, LIBB, QIBB, A-QIBB). For the coarse
grid (Fig. 18a–c), it can clearly be seen that the SBB simulation does not converge to a harmonic oscillation at all, while the
higher-order LIBB and QIBB schemes produce more accurate, periodic oscillations. Moreover, the QIBB and A-QIBB schemes
produce almost identical results. Nevertheless, the trajectories are found to be slightly asymmetric in case of the coarse grid.
Compared to the reference solution of Mittal and Kumar [23], the in-line displacement of the cylinder is overpredicted for
all four cases. Refining the grid leads to significantly better results, as depicted in Fig. 18d–f. All boundary conditions result in
a symmetrical trajectory (Fig. 18d) and a stationary periodic oscillation (Fig. 18e–f). Nevertheless, the SBB setup still shows
a noisy signal in the streamwise direction.
All in all, the two-dimensional analysis of a periodically oscillating cylinder clearly shows that higher-order boundary
conditions lead to much better results than SBB approximations. Therefore, an efficient and accurate algorithm for the
calculation of the sub-grid distances q is mandatory for the application of higher-order boundary conditions at moving
boundaries in transient flows.
To quantify these effects and to supplement the visual inspection of results, the power spectra for the four cases are
analyzed as well. As an example, the power spectrum of the drag coefficient time series is given in Fig. 19. The power
spectra confirm that the lower-order approximations of the cylinder geometry (SBB) lead to noisy, non-periodic simulation
results. For the more accurate representations (LIBB, QIBB), the power spectra show a more pronounced peak and less noise.
Moreover, the power spectrum for the QIBB case coincides with the power spectrum of the A-QIBB reference solution. The
spectra for the lift coefficient and the in-line and cross-flow displacements are significantly less noisy, which is why they
are omitted here. A similar trend can be observed in Fig. 18.
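The power spectra above can be reproduced from the recorded time series of the force coefficients with any standard spectral estimator. The following sketch computes a one-sided power spectrum with a plain DFT and locates the dominant frequency of a synthetic drag-coefficient signal; it is a generic post-processing helper under our own naming and not part of elbe (for long signals a windowed FFT would be preferable).

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// One-sided power spectrum of a uniformly sampled signal via a naive DFT.
// Bin k corresponds to the frequency k / (N * dt) for sampling interval dt.
std::vector<double> powerSpectrum(const std::vector<double>& s) {
    const std::size_t N = s.size();
    const double pi = std::acos(-1.0);
    std::vector<double> P(N / 2 + 1, 0.0);
    for (std::size_t k = 0; k < P.size(); ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            const double phi = 2.0 * pi * static_cast<double>(k * n) / N;
            re += s[n] * std::cos(phi);
            im -= s[n] * std::sin(phi);
        }
        P[k] = (re * re + im * im) / static_cast<double>(N * N);
    }
    return P;
}

int main() {
    // Synthetic "drag coefficient" signal: mean value plus one oscillation.
    const std::size_t N = 1024;
    const double dt = 1.0, f0 = 0.05, pi = std::acos(-1.0);
    std::vector<double> cd(N);
    for (std::size_t n = 0; n < N; ++n)
        cd[n] = 1.67 + 0.017 * std::sin(2.0 * pi * f0 * n * dt);

    const std::vector<double> P = powerSpectrum(cd);
    std::size_t kmax = 1;                       // skip the mean value at k = 0
    for (std::size_t k = 2; k < P.size(); ++k)
        if (P[k] > P[kmax]) kmax = k;
    std::printf("dominant frequency ~ %.4f (input %.4f)\n", kmax / (N * dt), f0);
    return 0;
}
```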
To complete the analysis, several metrics from reference [23] were analyzed for the current case (Table 1). The general
trend that was observed by the visual inspection of trajectories and the analysis of power spectra is confirmed here. First,
the elbe results converge to the A-QIBB reference solution with increasing accuracy of the polygonal approximation (SBB,
LIBB, QIBB). Second, the results for the QIBB case are almost identical with the A-QIBB reference, with only minor deviations
of below 1% for all parameters except for the in-line oscillation amplitude (3% error). When comparing the elbe solutions
with the reference data from [23], this is confirmed. Almost all metrics show a reasonable agreement between the methods,
while the deviation of the in-line oscillation amplitude is the largest. Note that the drag amplitude is quite small, more than 50 times smaller than the lift amplitude, and thus more difficult to predict. Moreover, the resolution of the fine grid with 1024 × 512
lattice nodes might still be slightly too low, with 32 lattice nodes per diameter for Re = 325. For Reynolds numbers between
263 and 365, Rettinger and Rüde [28] suggest at least 36 or even 48 lattice nodes per diameter. Nonetheless, the results
show that high-order boundary conditions are explicitly required and, with that, an efficient and accurate sub-grid distance
determination scheme.

(a) Trajectories (coarse grid). (b) In-line phase plot (coarse grid). (c) Cross-flow phase plot (coarse grid). (d) Trajectories (fine grid). (e) In-line phase plot (fine grid). (f) Cross-flow phase plot (fine grid).

Fig. 18. Trajectories and phase plots of the oscillating cylinder case with Fs = 0.42 and the coarse grid (top) with 512 × 256 lattice nodes, normalized curvature κ = 1/4, normalized segment length l2D ≈ 0.05 and a discretization error of L2q ≈ 3.5 × 10−2, and the fine grid (bottom) with 1024 × 512 lattice nodes, normalized curvature κ = 1/8, normalized segment length l2D ≈ 0.1 and a discretization error of L2q ≈ 3 × 10−2.

Fig. 19. Simulation results for the oscillating cylinder: power spectra of the drag coefficient P(Cd) for the fine grid with 1024 × 512 lattice nodes and Fs = 0.42.

5.2. Performance

After the initial validation and successful application of the sub-grid distance reconstruction scheme, this subsection serves to investigate the calculation costs and the additional memory consumption of the algorithm. Two test cases are simulated with elbe. The first one is the previously mentioned two-dimensional oscillating cylinder on the fine grid. The
second test case is a three-dimensional, freely falling cube, in line with the experiments published in [29]. The latter test case
was already analyzed and validated with elbe in [21], so that in the scope of the present publication, only the performance
of the algorithm is addressed. Both test cases are examined in several different configurations:

1. With and without the use of higher-order bounce-back boundary conditions and the proposed reconstruction method for the calculation of q,
2. Storing the obstacle distance vectors r with conventional (float and half) or tailor-made data types (based on short or char),
3. Using Cartesian or stretched polar resp. spherical coordinates for the tailor-made data types.

The performance for the resulting seven configurations is summarized in Table 2.



Table 1
Metrics for the oscillating cylinder: comparison of SBB, LIBB and QIBB results on the fine grid with 1024 × 512 lattice nodes to the A-QIBB reference solution and the reference data from [23] for Fs = 0.42. For each scheme, "abs." gives the absolute value and "rel." the value relative to the A-QIBB solution in percent.

                                   SBB             LIBB            QIBB            A-QIBB          Ref. [23]
                                   abs.    rel.    abs.    rel.    abs.    rel.    abs.    rel.
Drag amplitude                     0.0498  294.7   0.0158  93.5    0.0170  100.6   0.0169  100.0   0.02
Lift amplitude                     1.4300  99.3    1.4100  97.9    1.4330  99.5    1.4400  100.0   1.27
In-line oscillation amplitude      0.0166  119.4   0.0132  95.0    0.0135  97.1    0.0139  100.0   0.02
Cross-flow oscillation amplitude   0.2362  96.9    0.2384  97.8    0.2422  99.4    0.2437  100.0   0.225
Mean drag coefficient              1.6921  101.3   1.6535  99.0    1.6713  100.1   1.6702  100.0   1.418
Vortex-shedding frequency          0.2154  96.4    0.2242  100.3   0.2227  99.6    0.2235  100.0   0.212

Table 2
Test matrix and results for the processing performance of the boundary condition (BC) for different storage data types.

#   Boundary    Data type of    Coordinate system            BC performance [%]        Additional device
    condition   r component                                  Cylinder      Cube        memory [Byte]
1   SBB         –               –                            109           86          –
2   LIBB        float           –                            100           100         4
3   LIBB        half            –                            96            104         2
4   LIBB        short           Cartesian                    88            75          2
5   LIBB        short           Stretched polar/spherical    77            44          2
6   LIBB        char            Cartesian                    94            75          1
7   LIBB        char            Stretched polar/spherical    67            43          1

The performance is evaluated on an Nvidia M6000 GPU, based on the total computational time for the boundary condition, including the reading of the obstacle distance vectors r as well as the reconstruction of the sub-grid distances q, relative to the original configuration given by the reference case #2. Each case is executed multiple times, with a sufficiently long run-time and a warm start, to obtain a statistically valid mean performance. It can clearly be seen that the additional computational costs of the proposed method are rather moderate. Surprisingly, some of the test cases with reduced memory usage perform nearly as well as (or even better than) the conventional approach of storing r as a vector of floats. The reason for this is that case #2 is limited by the device memory bandwidth, i.e., the wall-clock time for the memory transfers exceeds the wall-clock time for the arithmetic operations. For the cases #4 to #7, the additional arithmetic operations appear to outweigh the benefit of the up to four times lower memory transfer, resulting in a lower performance of the boundary condition procedure. The use of stretched polar respectively spherical coordinate systems leads to a reduced performance compared to the use of Cartesian coordinates, since transformations to and from stretched polar resp. spherical coordinates require more compute-intensive trigonometric operations. All in all, compared to the SBB case, the proposed grid generation method shows a very high computational
performance and consumes only a rather small amount of additional device memory.
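As a closing illustration of the memory footprint, the small sketch below evaluates the device memory needed to store one distance vector r per wall-adjacent node for the data types of Table 2, assuming D components per vector and an illustrative number of wall-adjacent nodes; it simply multiplies out the per-component sizes and reproduces the factor-of-four reduction mentioned above.

```cpp
#include <cstdio>

// Device memory for storing one obstacle distance vector r per wall-adjacent
// node, given D components per vector and the bytes per component of Table 2.
long long bytesForR(long long wallNodes, int D, int bytesPerComponent) {
    return wallNodes * D * bytesPerComponent;
}

int main() {
    const long long wallNodes = 1000000;  // illustrative number of wall-adjacent nodes
    const int D = 3;
    // float: 12e6 B, half/short: 6e6 B, char: 3e6 B, i.e. a factor of four
    // between the conventional float storage and the 1-byte char storage.
    std::printf("float: %lld B, half/short: %lld B, char: %lld B\n",
                bytesForR(wallNodes, D, 4),
                bytesForR(wallNodes, D, 2),
                bytesForR(wallNodes, D, 1));
    return 0;
}
```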

6. Conclusions and summary

In this paper, a novel and accurate algorithm for the efficient calculation of sub-grid distances q was presented. The
method aims at higher-order boundary conditions for moving objects on uniform and structured grids, as present in the
LBM. Here, the sub-grid distances from the Eulerian fluid nodes to a tessellated triangular surface mesh have to be determined accurately, possibly in every discrete time step of the numerical scheme.
Thanks to the stable and straightforward two-step approach that was presented in Section 3, the proposed method is
almost free of instruction branches and therefore highly suitable for massively parallel execution on GPUs. As discussed in
the first part of Section 3.1, the algorithm extends a previously published grid generation procedure by introducing additional
orthogonal distance vectors r on every grid node adjacent to the discrete geometry. Then, as presented in Section 3.2, the sub-
grid distances q are reconstructed out of these vectors r. In addition, these vectors can be used to extract the surface normal
vector of the discrete geometry and the wall-distance of the wall-adjacent grid node, without additional computational costs.
Due to the fact that the distance vectors r describe the shape of the discretized geometry with respect to the discrete
lattice nodes, the accuracy of the reconstructed sub-grid distance q depends on the resolution of both the Cartesian grid
and the geometry itself. Detailed convergence studies in Section 4 revealed the convergence behavior of the algorithm. It
turned out that the accuracy of the reconstruction depends on the normalized geometry curvature κ as well as the length
l of a discrete geometry segment. For maximum accuracy, it is advisable to discretize geometries with approximately two
polygon segments (or triangles) per grid spacing.
Since the memory requirements of the sub-grid distance reconstruction only depend on the spatial dimension, the
algorithm is very economical in terms of memory utilization and memory transfer. The required amount of device memory
was further reduced by a factor of four using tailor-made data types, introduced in Section 3.3, with reduced co-domain
for storing the distance vectors r. The accuracy and performance of the reconstruction did not significantly decrease, as
discussed in Section 4.3. Finally, selected applications and performance investigations in Section 5 demonstrated that the
method is a viable tool for the fast and efficient simulation of complex fluid–structure interaction problems with one or
more moving obstacles.

References

[1] C.F. Janßen, N. Koliha, T. Rung, A fast and rigorously parallel surface voxelization technique for GPU-accelerated CFD simulations, Commun. Comput.
Phys. 17 (2015) 1246–1270. http://dx.doi.org/10.4208/cicp.2014.m414. http://journals.cambridge.org/article_S1815240615000420.
[2] X. He, L.-S. Luo, Theory of the lattice Boltzmann method: From the Boltzmann equation to the lattice Boltzmann equation, Phys. Rev. E 56 (6) (1997)
6811.
[3] P. Lallemand, L.-S. Luo, Lattice Boltzmann method for moving boundaries, J. Comput. Phys. 184 (2) (2003) 406–421.
[4] S. Geller, Ein explizites Modell für die Fluid-Struktur-Interaktion basierend auf LBM und p-FEM (Ph.D. thesis), Braunschweig University of Technology,
2010.
[5] D. Liao, Real-time Solid Voxelization Using Multi-core Pipelining (Ph.D. thesis), George Washington University, Washington, DC, USA, 2009.
[6] D. Cohen-Or, A. Kaufman, Fundamentals of surface voxelization, Graph. Models Image Process. 57 (6) (1995) 453–461. http://dx.doi.org/10.1006/gmip.
1995.1039.
[7] J. Huang, R. Yagel, V. Filippov, Y. Kurzion, An accurate method for voxelizing polygon meshes, in: Proceedings of the 1998 IEEE Symposium on Volume
Visualization, VVS ’98, ACM, New York, NY, USA, 1998, pp. 119–126. http://dx.doi.org/10.1145/288126.288181.
[8] S. Laine, A topological approach to voxelization, in: Eurographics Symposium on Rendering, vol. 32, no. 4, 2013.
[9] S. Freudiger, Entwicklung Eines Parallelen, Adaptiven, Komponentenbasierten Strömungskerns Für Hierarchische Gitter Auf Basis Des Lattice
Boltzmann Verfahrens (Ph.D. thesis), Braunschweig University of Technology, 2009.
[10] N. Blum, Algorithmen und Datenstrukturen: Eine anwendungsorientierte Einführung, Oldenbourg Wissenschaftsverlag, 2004.
[11] E. Haines, Point in polygon strategies, in: P. Heckbert (Ed.), Graphics Gems IV, Academic Press, 1994, pp. 24–46.
[12] P.C.P. Carvalho, P.R. Cavalcanti, Point in polyhedron testing using spherical polygons, in: A. Paeth (Ed.), Graphics Gems V, Academic Press, 1995,
pp. 42–49.
[13] J. O’Rourke, Computational Geometry in C, Cambridge University Press, 1998.
[14] S. Thon, G. Gesquière, R. Raffin, A low cost antialiased space filled voxelization of polygonal objects, in: GraphiCon ’04 Proceedings, GraphiCon, 2004,
pp. 71–78.
[15] M. Szucki, J. Suchy, A voxelization based mesh generation algorithm for numerical models used in foundry engineering, Metall. Foundry Eng. (MaFE)
38 (1) (2012) 43–54.
[16] S. Frey, G. Reina, T. Ertl, SIMT microscheduling: Reducing thread stalling in divergent iterative algorithms, in: Parallel, Distributed and Network-Based
Processing, PDP, 2012 20th Euromicro International Conference on, IEEE, 2012, pp. 399–406.
[17] C. Obrecht, F. Kuznik, B. Tourancheau, J.-J. Roux, Efficient GPU implementation of the linearly interpolated bounce-back boundary condition, Comput.
Math. Appl. 65 (6) (2013) 936–944.
[18] W. Gellert, S. Gottwald, M. Hellwich, H. Kästner, VNR Concise Encyclopedia of Mathematics, second ed., Van Nostrand Reinhold, 1989.
[19] R. Hess, The Essential Blender: Guide to 3D Creation with the Open Source Suite Blender, No Starch Press, 2007.
[20] C.F. Janßen, elbe–efficient Lattice Boltzmann environment. [online], 2017. http://www.tuhh.de/elbe.
[21] C.F. Janßen, D. Mierke, M. Überrück, S. Gralher, T. Rung, Validation of the GPU-accelerated CFD solver elbe for free surface flow problems in civil
and environmental engineering, Computation 3 (3) (2015) 354–385. http://dx.doi.org/10.3390/computation3030354. http://www.mdpi.com/2079-
3197/3/3/354.
[22] C.F. Janßen, Kinetic Approaches for the Simulation of Non-Linear Free Surface Flow Problems in Civil and Environmental Engineering (Ph.D. thesis),
Braunschweig University of Technology, 2010.
[23] S. Mittal, V. Kumar, Finite element study of vortex-induced cross-flow and in-line oscillations of a circular cylinder at low Reynolds numbers, Internat.
J. Numer. Methods Fluids 31 (7) (1999) 1087–1120. http://dx.doi.org/10.1002/(SICI)1097-0363(19991215)31:7<1087::AID-FLD911>3.0.CO;2-C.
[24] D. d’Humières, I. Ginzburg, M. Krafczyk, P. Lallemand, L.-S. Luo, Multiple-relaxation-time lattice Boltzmann models in three dimensions, Phil. Trans.
R. Soc. A 360 (1792) (2002) 437–451.
[25] J. Smagorinsky, General circulation experiments with the primitive equations: I. the basic experiment, Mon. Weather Rev. 91 (3) (1963) 99–164.
[26] M. Bouzidi, M. Firdaouss, P. Lallemand, Momentum transfer of a Boltzmann-lattice fluid with boundaries, Phys. Fluids (1994-Present) 13 (11) (2001)
3452–3459.
[27] Q. Zou, X. He, On pressure and velocity boundary conditions for the lattice Boltzmann BGK model, Phys. Fluids (1994-Present) 9 (6) (1997) 1591–1598.
http://dx.doi.org/10.1063/1.869307.
[28] C. Rettinger, U. Rüde, A comparative study of fluid-particle coupling methods for fully resolved lattice Boltzmann simulations, Comput. & Fluids 154
(2017) 74–89. http://dx.doi.org/10.1016/j.compfluid.2017.05.033. ICCFD8. http://www.sciencedirect.com/science/article/pii/S0045793017302086.
[29] M. Kraskowski, Validation of the RANSE rigid body motion computations, in: 12th Numerical Towing Tank Symposium, vol. 6, 2009, pp. 99–104.
