Professional Documents
Culture Documents
Manuscript ID hpc-20-0135
Complete List of Authors: Melander, Anders; Danmarks Tekniske Universitet, Applied mathematics
and computer science
Strøm, Emil; Technical University of Denmark
Fo
Pind, Finnur; Technical University of Denmark
Engsig-Karup, Allan; Danmarks Tekniske Universitet, Applied
Mathematics and Computer Science
Jeong, Cheol-Ho; Technical University of Denmark
rP
Mathematics
Hesthaven, Jan; EPFL
Note: The following files were submitted by the author for peer review, but cannot be converted to PDF.
You must view these files (e.g. movies) online.
SageH.bst
SageV.bst
http://mc.manuscriptcentral.com/ijhpca
Page 2 of 22 International Journal of High Performance Computing Applications
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Fo
20
21
22
23
rP
24
25
26
ee
27
28
29
rR
30
31
32
ev
33
34
35
36
iew
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60 http://mc.manuscriptcentral.com/ijhpca
International Journal of High Performance Computing Applications Page 3 of 22
1 Journal Title
2 Massively Parallel Nodal Discontinous XX(X):1–20
3 c The Author(s) 2016
4
5
Galerkin Finite Element Method Reprints and permission:
sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/ToBeAssigned
6
7
Simulator for Room Acoustics www.sagepub.com/
SAGE
8
1 1 2,3
9 Anders Melander , Emil Strøm , Finnur Pind , Allan P. Engsig-Karup , Cheol-Ho Jeong3 ,
1
10
Tim Warburton4 , Noel Chalmers4 and Jan S. Hesthaven5
11
12
13
14
Abstract
15
16 We present a massively parallel and scalable nodal Discontinuous Galerkin Finite Element Method (DGFEM) solver
17 for the time-domain linearised acoustic wave equations. The solver is implemented using the libParanumal finite
18 element framework with extensions to handle curvilinear geometries and frequency dependent boundary conditions
19
20 of relevance in practical room acoustics. The implementation is benchmarked on heterogeneous multi-device many-
21 core computing architectures, and high performance and scalability are demonstrated for a problem that is considered
Fo
22 expensive to solve in practical applications. In a benchmark study, scaling tests show that multi-GPU support gives
23 the ability to simulate large rooms, over a broad frequency range, with realistic boundary conditions, both in terms
24
of computing time and memory requirements. Furthermore, numerical simulations on two non-trivial geometries are
rP
25
26 presented, a star-shaped room with a dome and an auditorium. Overall, this shows the viability of using a multi-device
27 accelerated DGFEM solver to enable realistic large-scale wave-based room acoustics simulations using massively
28
ee
parallel computing.
29
30
31 Keywords
rR
36
37 1 Introduction This motivates the use of wave-based methods, where the
38
iew
56 rays that are propagated in the room using the laws of ray Technical University of Denmark, Kgs. Lyngby, Denmark.
4 Department of Mathematics, Virgina Tech, United States.
57 optics. This reduces the computational task considerably, but 5 Chair of Computational Mathematics and Simulation Science, cole
58 the approximation is only appropriate in the high-frequency Polytechnique Fdrale de Lausanne, Lausanne, Switzerland.
59
limit. At low-mid frequencies, wave phenomena such as
60 Corresponding author:
diffraction and interference dominate and other simulation Anders Melander
methods must be used. Email: anders.d.melander@gmail.com
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls [Version: 2017/01/17 v1.20]
Page 4 of 22 International Journal of High Performance Computing Applications
2 Journal Title XX(X)
1
2 the boundary element method (BEM) (Hargreaves et al. et al. is described as efficient, it cannot handle complex
3 2019), and the finite element method (FEM) (Craggs geometries or non-trivial boundary conditions (BCs).
4 1994). Recently, nodal high-order-accurate variants of The goal of this work is to introduce the details of a
5
6 the FEM, such as the spectral element method (SEM) HPC approach to simulate three-dimensional room acoustics
7 (Pind et al. 2019) and the discontinuous Galerkin finite using the DGFEM. To the best of our knowledge, this is
8 element method (DGFEM) (Wang et al. 2019), have been the first time a massively parallel GPU-accelerated wave-
9 applied to simulate room acoustics. High-order methods based DGFEM room acoustic simulator is presented in
10
11 result in better accuracy-per-computational-cost and they the literature. The simulator is written in C/C++ with
12 have the potential to significantly reduce the runtime of Message Passing Interface (MPI) as its backbone, and
13 simulations, and are a must for having good accuracy in the device-interfacing is implemented using the single
14 wave propagation problems over long integration times kernel language for parallel programming Open Concurrent
15
(Kreiss and Oliger 1972). The nodal DGFEM is particularly Compute Abstraction (OCCA) (Medina et al. 2014) for
16
17 well suited for room acoustic simulations, because it portable multi-threading code development across different
18 combines the attractive features of geometric flexibility, many-core hardware architectures. Curvilinear meshes and
19 high-order accuracy, suitability for parallel computing, and locally reacting frequency independent and frequency
20
lean memory usage. In the DGFEM schemes, the solution dependent impedance boundary conditions are supported in
21
Fo
25
3 presents the numerical discretization of the governing
26
27 The major drawback of wave-based methods is the equations. In Section 4, the HPC implementation is described
28 and Section 5 contains several numerical experiments that
ee
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 5 of 22
Melander et al. 3
1
2 2.1.1 Rigid The rigid boundary condition perfectly reflects which can be rewritten using a partial fraction decomposition
3 the impinging wave back into the domain Ω. Mathematically
4 this is expressed as
Q
X Ak
5 Y (ω) =Y∞ + +
λk − jω
6 k=1
! (6)
7 n̂ · v = 0 on ∂Ω, (2) S
X Bk + jCk Bk − jCk
8 + .
αk + jβk − jω αk − jβk − jω
9 where n̂ is the outward pointing normal of the boundary k=1
−∞
29 air Zair = ρc results in a highly absorptive boundary.
30 Then by applying the inverse Fourier transform to (6) and
31
rR
32 inserting the result into (7), the expression for the velocity
33 vn at the boundary becomes
34 2.1.3 Locally-reacting frequency dependent Most real
35 Q
ev
p̂(ω)
(1) (2)
X
39 v̂n = = p̂(ω)Y (ω), (4) 2 Bk ψk (t) + Ck ψk (t) ,
Z(ω)
40 k=1
41 where ω is the angular frequency and (v̂n , p̂) denote where φk is the accumulator for the k’th real poles, and
42 (1) (2)
the Fourier transformed normal velocity and pressure, ψk and ψk are the accumulators for the k’th complex
43
44 respectively. Z(ω) is no longer constant, as in the frequency conjugate poles pairs. These accumulators are defined by a
45 independent case, but is a complex function of the angular set of ordinary differential equations (ODEs)
46 frequency. For a particular surface, Z(ω) can be measured,
47 dφk
e.g., in an impedance tube or using pressure-velocity λk φk = p(t),
48 dt
49 sensors. Alternatively, several material models exist to (1)
dψk (1) (2)
50 estimate surface impedance of many common materials and αk ψk + βk ψk = p(t), (9)
dt
51 constructions. (2)
52 dψk (2) (1)
αk ψk − βk ψk = 0.
53 To incorporate the frequency dependent boundary dt
54 conditions accurately and efficiently into the time-domain Thus, when using the ADE method, an additional set of
55 simulation, the method of auxiliary differential equations
56 ODEs must be solved at the boundary in each time step.
57 (ADEs) can be used (Pind et al. 2019; Dragna et al. 2015). In The added computational cost due to this additional set
58 the ADE method, the surface admittance is approximated as of ODEs is relatively low, as the amount of extra ODEs
59 a rational function, defined as, needed to solve is small compared to the system that must
60
a0 + ... + aN (−jω)N be solved for the acoustic wave equations. In total, there
Y (ω) = , (5) will be Npoles times the number of boundary nodes with
1 + ... + bN (−jω)N
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 6 of 22 International Journal of High Performance Computing Applications
4 Journal Title XX(X)
1
2 frequency dependent boundary condition ODEs to solve. variables vhk and pkh , takes the following form
3 This extra work generally grows significantly slower than the
4 ∂vhk k
Z Z
1
total number of nodes in a domain when the mesh resolution `n (x)dx = − ∇pkh `kn (x)dx
5 D k ∂t ρ Dk
6 increases in the domain. Z
1
n̂ pkh − p∗ `kn (x)dx,
7 +
ρ ∂Dk
8 Z k Z
9 ∂ph k
3 The numerical model ` (x)dx = − ρc2 ∇ · vhk `kn (x)dx
10 D k ∂t n Dk
11
Z
3.1 Spatial discretization + ρc2 n̂ · vhk − v∗ `kn (x)dx.
12 ∂D k
13 The spatial derivatives in (1) are discretized using DGFEM. (13)
14 DGFEM is a high-order method (Hesthaven and Warburton
15 2008) capable of solving the time-domain acoustic partial where the star-labeled terms denote numerical fluxes that
16 propagate the solution between elements. This can be
differential equations within a domain Ω, subject to an
17
18 appropriate set of boundary and initial conditions. The rewritten in a discrete formulation using matrix-based
19 spatial domain is tessellated by K non-overlapping elements operators for integration and differentiation
20 as
21 K dukh 1 1
= − Dxk pkh + Lk pkh − p∗ n̂x ,
Fo
[ (14a)
22 Ω ' Ωh = Dk . (10) dt ρ ρ
23 k=1
dvhk 1 1
= − Dyk pkh + Lk pkh − p∗ n̂y ,
24 In this work, the elements are taken to be tetrahedral dt ρ ρ
(14b)
rP
25
elements in the three-dimensional space. On each of these dwhk 1 1
26 = − Dzk pkh + Lk pkh − p∗ n̂z ,
(14c)
27 elements the solution variables v and p are expressed on dt ρ ρ
28 the k’th element in terms of polynomial basis functions and dpk
ee
32 Np Np
The matrix operations on the right hand side are expressed
33
X X
vhk (x, t) = v̂nk (t)φn (x) = vhk (xkn , t)`kn (x),
34 in terms of the operators defined on the reference elements
n=1 n=1
35 (11)
ev
Np Np
36 Dr = Vr V −1 , Ds = Vs V −1 , Dt = Vt V −1 ,
X X
pkh (x, t) = p̂kn (t)φn (x) = pkh (xkn , t)`kn (x).
37 (15)
n=1 n=1 M = (VV T )−1 .
38
iew
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 7 of 22
Melander et al. 5
1
2 the element edges We are interested in the numerical flux along the outward
3 pointing element boundary normal n̂, which leads to the
4 L = (M)
−1
E, M = (VV T )−1 , (18) creation of the following operator
5
6
7 where E is the column-wise concatenation of mass matrices N = n̂ · F = n̂x Ax + n̂y Ay + n̂z Az . (23)
8 defined in terms of the nodal distributions on the faces.
9 Applying an eigendecomposition on N yields
10
To avoid computing and storing these matrices for every
11 N = RDR−1 , (24)
12 element, the matrix operators are determined for a reference
13 element, and a geometric mapping from the arbitrary mesh
where
14 elements to the reference elements is used, e.g.,
15
n̂x n̂
− n̂ρcx − n̂n̂xz − n̂xy
16 n̂ρcy
Mk = J k (VV T )−1 , (19) n̂
− ρcy 0
1
17 R=
ρc
,
18 n̂z
ρc − n̂ρcz 1 0
where J k is a diagonal matrix storing the computed Jacobian
19 1 1 0 0
20 values of the local map of the k’th element to the reference (25)
21 element. Hence, we need to compute and store the J k ’s c 0 0 0
Fo
22
0 −c 0 0
for each element, and the procedure in this way treats D= .
23 0 0 0 0
24 curvilinear and straight-sided elements in a similar way. By
0 0 0 0
rP
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 8 of 22 International Journal of High Performance Computing Applications
6 Journal Title XX(X)
1
2 By the properties of D+ and D− , this results in the 2016). Thus, in this work, the time step size is given as
3 numerical upwind flux
4 1 1
∆t = CCFL min(∆xl ) , (31)
5 c N2
(n̂ · F)∗ = RD+ R−1 q− − −1 +
h + RD R qh . (28)
6
where ∆xl is the smallest edge length of the mesh elements
7
8 Boundary conditions are weakly enforced through the and CCFL is a constant of O(1).
9 numerical flux terms. For element faces that lie on the
No measures have been taken to automatically address
10 domain boundary, q+ h needs to be defined according to the potential stiffness of the ADEs. Implicit-explicit
11
the boundary conditions. For surface impedance boundary
12 time-stepping (Kennedy and Carpenter 2003) or stiffness
13 conditions, an averaging approach is used to impose
restriction, by means of a constrained multipole mapping
14 Dirichlet conditions, where the neighboring value is given by
process (Wang et al. 2020), could be used to ensure the
15 p+ − + −
h = ph and vh = −vh + 2gv , where n̂ · gv = vn , with
16 ADEs do not cause a numerical instability. However, in
vn given by either (3) or (8). Thus, at the domain boundary
17 all numerical experiments, the spatial scheme is found to
18 nodes, the flux term in (14d) becomes n̂ · (vhk − v∗ ) = n̂ ·
determine the condition on the stable time step size.
19 vh− − vn .
20
21
Fo
25 The semi-discrete system in (14) can be expressed as a open-source meshing tool gmsh (Geuzaine and Remacle
26
general form ODE 2009). When affine (straight-sided) elements are used,
27
28 the mesh generator is used to generate a low-order
ee
dqh
29 = L(qh (t), t), (29) finite element mesh, and the simulator then adds the α-
dt
30
optimized nodes to define the high-order mesh elements, in
31 where L is the spatial operator. This ODE system is
rR
(0)
qh = qnh , elements, the internal high-order mesh nodes are shifted to
36
37
better fit the geometry (Pind et al. 2019). For room acoustic
k(i) = a k(i−1) + ∆tL(q(i−1) , tn + c ∆t),
38 i h i simulations, this is a particularly important feature, since
iew
i ∈ [1, . . . , 5] :
39 q(i) = q(i−1) + b k(i) ,
h h i most real-world rooms contain curved or complex shaped
40 (5)
q n+1
=q , boundary surfaces.
41 h h
42 (30) Currently, gmsh has the ability to generate a high-
43 order curvilinear mesh, however, only with equidistant node
44 where ∆t = tn+1 − tn is the time step size and the distributions. This is sub-optimal, because of the rapid
45
coefficients ai , bi , ci are given in Ref. (Carpenter and growth of the Lebesque constant associated with equidistant
46
47 Kennedy 1994). node distributions (Hesthaven 1998). Our solution here is
48 Explicit time-stepping methods impose a conditional to apply an interpolation post-processing step on the high-
49
stability criterion, ∆t ≤ C1 /(max |λe |), where λe are the order mesh generated by gmsh, where the equidistant nodes
50
51 eigenvalues of the spatial discretization and C1 is a constant are moved to the α-optimized curvilinear node positions.
52 related to the size of the stability region of the time-stepping This can be done by re-interpolating the node position by
53 method (Pind et al. 2019). There are two mechanisms at representing the coordinates as polynomial series, cf. (11).
54
play that influence the maximum allowed time step: 1) Then, let r and re be the r, s and t coordinates in the
55
56 the eigenvalue spectrum’s of the matrix operators, and 2) reference element for α-optimized and equidistant node
57 the stiffness of the ADEs. In DGFEM, the matrix operator distributions, respectively. Using a hierarchical modal basis,
58 eigenvalue spectrum scales as max |λe | ∼ C2 cN 2γ , where the coefficients r̂ of this representation can be determined
59
the C2 constant is dictated by the size of the smallest mesh from
60
element and γ is the highest order of differentiation in
e
the governing equations (γ = 1, here) (Engsig-Karup et al. r = Vr̂, re = V r r̂, (32)
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 9 of 22
Melander et al. 7
1 e
2 where (V r )ij ≡ φj (rei ), using same basis used to define V
3 but defined in terms of the different set of equidistant nodes.
4 This implies the relationship between the node distributions
5
6 e
22
GPU0 into RAM on CPU0 , before it can communicate this
23
24 information to CPUx through MPI, where 0 < x ≤ M − 1
rP
32
a penalty in the scaling of computational performance when
33 The proposed HPC room acoustics simulator is based on
34 adding additional CPU-GPU pairs to run the simulation, as
35 the open source libParanumal framework (Karakus et
ev
(GPU/CPU) systems. A Message Passing Interface (MPI) simulation is expected when using GPUs compared to CPUs.
39
is used to handle communication between CPUs. Note,
40 The communication structure can be seen in Figure 2, where
41 that when we use the term ‘CPU’, it refers to one single the red arrows are the CPU and GPU communications and
42 CPU core, while ‘GPU’ refers to the entire GPU. MPI the blue arrows are the MPI communications.
43 works by treating each CPU core as its own entity, with its
44
own separate memory. Information can be shared between
45
46 CPU cores through the messaging interface. In practice
47 this means that libParanumal splits the domain into
48 different chunks. This is illustrated in Fig. 1. Each CPU
49
core only stores the simulation information needed for the
50
51 computations in elements belonging to its own part of the
52 domain. However, as Fig. 1 illustrates, the CPUs share an
53 interface between them, where information has to be shared
54
for the acoustic wave to propagate from one CPU to the
55
56 next. This means that for some elements, the numerical
57 flux has to be computed based on neighbouring elements
58 that are located on a different CPU, requiring that MPI
59 Figure 2. Communication overview of M total heterogeneous
communication between cores must take place. This is called
60 CPU-GPU pairs. All CPUs can communicate through MPI,
a halo exchange. The same communication is used when GPUs can only communicate through their attached CPU by
GPUs are used. proxy.
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 10 of 22 International Journal of High Performance Computing Applications
8 Journal Title XX(X)
1
2 The implementation of one stage of the time-stepping • GPU : 4 × NVIDIA Tesla V100 SXM2 32 GB,
3 method in (30), including all the interface communication,
4 is summarized in Algorithm 1. The GPU functions, called with the three nodes being connected by InfiniBand. The
5
kernels, must be launched in every iteration of the time compilers used for compilation are:
6
7 stepping algorithm.
8 • GCC v7.4.0,
9 Algorithm 1: One stage of the LSERK time-stepping • OpenMPI v3.1.3,
10 method • OCCA v1.0,
11 (i−1)
Input: Previous stage qh
12 • CUDA compilation tools v10.0.130.
(i)
Result: An updated stage qh
13
1 GPU to CPU memory copy of interface nodes
14 For the scalability tests, we will use the setups of 1, 2, 4, 8,
(CPU/GPU). Simultaneously initiate volume integral
15 and 12 GPUs. Note that when running tests with 4 or less
compute kernel (GPU).
16
2 Initiate the interface MPI communication (CPU). GPUs, they will all be on the same node. This means that
17
3 CPU to GPU memory copy of interface nodes
18 when going from 4 to 8 GPUs there will be an additional
(CPU/GPU).
19 cost of transfer, as internal communication is not as fast as a
4 Initiate surface integral compute kernel (GPU).
20 memory copy within a node.
5 Update based on current stage (GPU).
21
Fo
32 communication work in steps 1, 2 and 3 is finished before 5.1.1 Cube domain As an initial validation of the
33 the volume kernel finishes. This way, communication costs simulator, a convergence test in a cube domain with
34 (x, y, z) ∈ [0, 1] and rigid boundaries is carried out. The
are completely hidden. This is possible because the volume
35
ev
36 kernel is completely asynchronous with respect to the host. simulation is initiated using a Gaussian pulse pressure initial
37 Step 4, however, cannot start until all communication of steps condition
38
iew
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 11 of 22
Melander et al. 9
1
2 boundaries. A convergence test is carried out, where the
3 domain is meshed either using affine mesh elements or
4 curvilinear mesh elements. The reference solution here is
5
6 computed using an extremely fine mesh of affine elements
7 (410 308 elements) and N = 8 basis function order. The
8 numerical error is then defined as e = ||pDGFEM − pDGFEM
fine ||
9 , where p DGFEM
is the simulated solution in 10 randomly
10
11 chosen receiver points at t = 0.01 and pDGFEMfine is the
12 simulated solution in the same receiver points for the fine
13 Figure 3. p-convergence for the rigid cube test case. mesh reference simulation.
14 Figure 5 shows the resulting p-convergence, tested for
15 Figure 4 shows the h-convergence for different orders of three different meshes of varying resolution, using either
16
the basis. Table 1 lists the resulting convergence rates. The
17 affine elements or curvilinear elements. When using affine
18 table also includes convergence rates for a case where the elements, the high-order convergence is destroyed, as the
19 time step size has been reduced significantly (CCFL = 0.02). convergence rate flattens out for orders higher than N ≈
20 A convergence rate in the range O(hN ) and O(hN +1 ) is 3. However, when using the curvilinear elements, spectral
21
Fo
25 meshes not being fine enough for these low orders to give an
26
accurate representation of their convergence. Furthermore, it
27
28 is seen that reducing the time step size has only marginal
ee
32
33
34 (a) Affine elements.
35
ev
36
37
38
iew
39
40
41
42
43 Figure 4. h-convergence for the rigid cube test case.
44
45
46 (b) Curvilinear elements.
47 N CCFL = 1 CCFL = 0.02
48 1 -0.778 -0.778 Figure 5. p-convergence for the cylinder geometry using either
affine or curvilinear meshes.
49 2 -3.239 -3.240
50
3 -3.919 -3.920 Further insights are offered in Figure 6, where h-
51
52 4 -4.618 -4.620 convergence is shown for both the affine and the curvilinear
53 5 -5.752 -5.752 case. It is seen that for the affine elements, a convergence rate
54 6 -6.188 -6.188 of around O(h2 ) is achieved consistent with the geometric
55
7 -7.754 -7.747 approximation, whereas a much higher convergence rate is
56
57 8 -8.161 -8.166 observed for the curvilinear elements. However, the error
58 Table 1. Convergence rates for the h-convergence study. plateaus at around 10−7 , which is likely due to aliasing errors
.
59
that stem from the transformation Jacobians and normals no
60
5.1.2 Cylinder domain Consider now a cylinder shaped longer being constant. This results in inexact integration of
domain, of 1 m height and 1 m radius and having rigid variable terms in the inner products used to compute the
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 12 of 22 International Journal of High Performance Computing Applications
10 Journal Title XX(X)
1
2 matrix based operators. Anti-aliasing techniques could be layer backed by an air cavity (BC2). The surface admittance
3 employed to reduce the error further (Engsig-Karup et al. of the BCs are estimated using Miki’s model and a transfer
4 2016; Kirby and Karniadakis 2003). However, in practical matrix method (Miki 1990; Allard and Atalla 2009). Vector
5
6 room acoustics simulations, error levels below 10−7 are fitting is used for the multipole mapping (Gustavsen and
7 rarely needed and, furthermore, such anti-aliasing techniques Semlyen 1999). The material is mapped in the frequency
8 increases the computational cost of the scheme. range of 50 to 1500 Hz. An estimate of the relative mapping
9 error is given by
10
11
|Y (f ) − Yfit (f )|
12 = , (36)
13 Y (f )
14
15 where Y indicates the mean across frequency. Table 2
16 summarizes the properties of the considered boundary
17 conditions and the multipole mapping. Note that more poles
18
are needed in the mapping of BC1, because the admittance
19
20 response fluctuates more with frequency for this absorber.
21 (a) Affine elements.
Fo
22
23 ID dmat d0 σmat Npoles Re Im
24 BC1 0.02 0.2 50 000 8 0.0216 0.0177
rP
25
BC2 0.05 0.0 10 000 4 0.0213 0.0071
26
27 Table 2. Material properties and multipole mapping error for the
28 BCs used in the boundary validation study. dmat [m] is the
ee
39 5.2 Boundary condition validation between the analytic solution and the simulation is found for
40 both BCs considered, both in terms of magnitude and phase.
41 A validation study of the locally reacting frequency
This confirms the high precision of the BC implementation.
42 dependent boundary conditions is carried out, using a normal
43 incidence single reflection case. For this case, an analytic
44
solution exists (Thomasson 1976). A 6 m × 10 m × 10 m
45
46 rectangular domain is meshed with an unstructured mesh of
47 tetrahedral elements. The frequency dependent impedance 5.3 Heterogeneous multi-device CPU-GPU
48 boundary is the Y Z plane at x = 0. The resolution of the performance
49
mesh is roughly 8 points per wavelength (PPW) at 1 kHz,
50 5.3.1 Strong scaling Strong scaling is characterised by
51 the highest frequency of interest. The source is placed 2
how the runtime changes while keeping the problem size
52 m from the impedance boundary (S : (2, 5, 5) m) and the
53 constant and increasing the number of processing units. Thus
receiver is placed between the source and the boundary
54 the overall workload is constant, while the workload of each
(R : (1, 5, 5) m). The simulation duration is tfinal = 0.02 s,
55 processing unit decreases, as the workload is split among an
56 which ensures that no parasitic reflections from the other
increasing number of processing units.
57 room surfaces arrive at the receiver position. A Gaussian
58 pulse initial condition, centered at the source position, with Four meshes of varying resolutions are constructed,
59
sxyz = 0.2 m2 is used. described in Table 3. These meshes are used to analyse the
60
Two boundary conditions are considered; a porous strong scaling of the simulator for two different polynomial
material mounted on a rigid backing (BC1) and a thin porous orders, N = 4 and N = 6. In all cases, rigid BCs are used.
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 13 of 22
Melander et al. 11
1
2 #GPUs Mesh 1 Mesh 2 Mesh 3 Mesh 4
3 1 1 1 − −
4
5 2 0.630 0.903 1 1
6 4 0.393 0.656 0.851 0.913
7 8 0.290 0.586 0.704 0.743
8 12 0.159 0.384 0.505 0.573
9
Table 4. Strong scaling computational efficiency ηS for the
10
meshes in Table 3 with N = 4. ”−” denotes that running the
11 (a) Magnitude.
simulation was not possible due to it requiring more memory
12 than the GPUs in the setup have.
13
14
15 As two of the meshes with N = 4 were not able to be
16 run on 1 GPU due to memory limitations, a comparison
17
where using M b = 2 GPUs as a baseline is shown in table
18
19 5. These numbers are more directly comparable between
20 meshes. As expected, good scaling occurs for high GPU
21 (b) Phase. counts, provided a high resolution mesh is used to ensure that
Fo
22
the efficiency does not suffer due to the overhead associated
23 Figure 7. Simulation of a single reflection from a locally
24 reacting frequency dependent impedance boundary compared with communication and synchronization of the cores.
rP
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 14 of 22 International Journal of High Performance Computing Applications
12 Journal Title XX(X)
1
2 involved. Because of the increased size of the problem, variables, then we see that at least two million field values
3 meshes 3 and 4 require a minimum of 4 and 8 GPUs, would have to be streamed to and/or from device memory
4 respectively. Thus, only the strong scaling performance for to achieve the target 80% efficiency of the GPU memory
5
6 meshes 1 and 2 is considered. Table 6 shows the strong bus. Through the lens of this simplified model, it is apparent
7 scaling efficiency factor ηS . The scaling efficiency has that the efficiency of a memory bound kernel degrades
8 improved, as compared to the N = 4 case, due to the for small problem sizes which is precisely the regime we
9 increase in DoF. enter when performing a strong scaling study, wherein the
10
11 number of GPUs is increases but the amount of problem
#GPUs Mesh 1 Mesh 2
12 data remains fixed. In practice the kernel launch time can
13 1 1 1
be significantly higher and the achieved throughput can be
14 2 0.813 0.923
significantly lower, however the fundamental limit to strong
15 4 0.525 0.859
scaling remains.
16 8 0.390 0.700
17
18 12 0.235 0.486
19 Table 6. Strong scaling computational efficiency ηS for two of 5.3.2 Weak scaling Weak scaling is characterised by how
20 the meshes in Table 3 with N = 6. the runtime changes when fixing the problem size for each
21 processing unit. Thus the problem size is increased when
Fo
22 We combine the information about mesh 1 and 2 from increasing the number of processing units, to keep the overall
23
Table 4 and 6 in Figure 9. Here we clearly see the efficiency workload constant on each processing unit. The weak scaling
24
rP
TR
29
30 The nature of weak scaling requires us to construct new
31 meshes, these meshes are shown in Table 7. The meshes are
rR
32
structured meshes, with around 55 000 elements per GPU.
33
34
35
ev
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 15 of 22
Melander et al. 13
1
2 Mesh 1 Mesh 2
3 #GPUs Rigid BC1 BC2 Rigid BC1 BC2
4
5 1 1 1 1 1 1 1
6 2 0.868 0.844 0.842 0.926 0.924 0.890
7 4 0.569 0.554 0.551 0.880 0.872 0.846
8 8 0.419 0.413 0.413 0.714 0.708 0.687
9
Table 8. Strong scaling computational efficiency ηS for different
10
boundary conditions.
11
12 Figure 10. Weak scaling computational efficiency for the
13 meshes in Table 7.
14
15 Mesh 1 Mesh 2
16 #GPUs BC1 BC2 BC1 BC2
17 The results for both the strong and weak scaling show
1 +0.43% −0.09% +0.86% + − 2.67%
18 that the number of GPUs one should use depends heavily
19 on the problem size. Adding more compute power for 2 +3.25% +2.89% +1.26% +0.90%
20 an application can greatly decrease the overall runtime 4 +3.01% +3.01% +1.87% +0.86%
21 8 +1.44% +1.06% +1.87% +0.91%
Fo
32
It is therefore of interest to analyze how the runtime of
33 5.3.3 Influence of BCs on performance By having non-
34 the simulator grows as the valid frequency range of the
rigid BCs, an additional computation must take place at
35
ev
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 16 of 22 International Journal of High Performance Computing Applications
14 Journal Title XX(X)
1
2 the initial condition u0 , are related by to ensure valid simulation results up to a certain frequency
3 fmax . This allows us to analyse how the runtime changes
4 u0 = uN e−ω̂N ∆t , (40) as the frequency range of the simulation is extended,
5
6 when using different basis function orders. Two rooms are
where ω̂ is the numerical frequency, which will differ from
7 considered, a 4 m × 2.7 m × 3 m ‘bedroom’ and a 9 m × 7 m
8 the exact frequency ω due to the dispersion of the numerical × 3m ‘classroom’. In both cases, frequency independent BCs
9 scheme. This non-linear equation can be solved numerically are used and a simulation time (impulse response length) of
10 for ω̂, and in this work a Levenberg-Marquardt algorithm
11 1 s is simulated. The rooms are simulated using 4 GPUs.
is used. By comparing the numerical frequency against the
12
13 exact one, the dispersion relationship can be established
14 since cd /c = ω̂/ω, where cd is the numerical wave speed.
Figure 12 shows the measured runtime for the
15 Figure 11 shows an example dispersion relationship for the
16 two rooms. The runtime is measured for fmax =
N = 2, 4, 6 basis function orders. In this dispersion analysis,
17 250, 500, 1000, 2000, 4000 Hz. The runtime increases
18 the time step is kept very small, ∆t = 10−5 s and wave speed
quickly with frequency in all cases, after a certain frequency
19 c = 1 m/s, to ensure that the spatial errors dominate. Thus,
is reached. This is due to the inherent nature of having to
20 the analysis only takes into account the spatial discretization
21 mesh the three dimensional volume of the room. However,
Fo
orders.
36 good fit is observed.
37
38
iew
39
40
41
42
43
44
45
46
Figure 11. 1D dispersion analysis results when using
47
∆t = 0.0001, c = 1 m/s and n = 2 time steps.
48
49
50 (a) Bedroom.
51 N =2 N =4 N =6
52 Points per wavelength 17.7 7.3 6.0
53
54 Table 10. Points-per-wavelength needed to ensure spatial
dispersion errors remain under 2%, based on the 1D dispersion
55 analysis of figure 11.
56
57
58 Table 10 lists the PPW needed to ensure that spatial (b) Classroom.
59
dispersion errors remain under 2% for the dispersion analysis
60 Figure 12. Measured runtime as a function of simulation
example case considered. Using these PPW values as frequency range fmax . The dashed lines are the fitted runtime
4
reference, we can construct meshes with the right resolution model on the form r = afmax .
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 17 of 22
Melander et al. 15
1
2 Bedroom Classroom
3 N a R2 a R2
4
5 2 1.04 · 10−8 0.92 2.74 · 10−8 0.96
6 4 2.11 · 10−10 0.86 8.36 · 10−10 0.91
7 6 1.06 · 10−10 0.85 3.28 · 10−10 0.88
8 4
Table 11. Coefficients a for the runtime model r = afmax and
9 associated R2 values for evaluating the goodness of the fit.
10
11
12 5.5 Large shoebox-shaped rooms
13
14 To give additional insights into the performance of the
15 simulator for challenging practical cases, we consider two
Figure 13. Geometric model of auditorium B341-A21 at DTU.
16 very large shoebox-shaped rooms. The first is 5 000 m3 ,
17
18 which is meant to imitate a large auditorium, and the other
is 20 000 m3 , which is meant to imitate a large concert hall. GPUs results in slightly worse performance compared to
19
20 These rooms are meshed using a structured mesh, N = 4 the case of using 8 GPUs, indicating that communication
21 costs between cores have begun to dominate. For the concert
Fo
25 interest. Thus, it is reasonable to assume that when using drops considerably as the number of GPUs is increased,
26 N = 4 basis order, the accuracy is likely acceptable at and likely would drop further if more than 12 GPUs were
27 used. We have thus shown that for very large rooms, we
frequencies beyond 1000 Hz. For reference, the Schroeder
28
ee
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 18 of 22 International Journal of High Performance Computing Applications
16 Journal Title XX(X)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17 Figure 15. Geometric model of dome shaped room.
18
19
20
21
Fo
22
23
24
rP
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 19 of 22
Melander et al. 17
1
2 simulations far beyond the restricted use-case of small rooms Filter for Room Acoustics Applications, IEEE/ACM Trans.
3 and frequencies below the Schroeder frequency, which wave- Audio, Speech, Lang. Proc. vol. 23.8: pp. 1368-1380.
4 based methods are historically typically associated with. Lopez JJ, Carnicero D, Ferrando N and Escolano J (2013)
5
6 The heterogeneous multi-device scalability benchmarks Parallelization of the finite-difference time-domain method for
7 show that the simulator can handle problems of nearly room acoustics modelling based on CUDA, Math. Comput.
8 arbitrary size, as long as the number of GPUs is selected Mod. vol. 57.7: pp. 1822-1831.
9
to ensure that the workload on each GPU is sufficient, Ainsworth M (2004) Dispersive and dissipative behaviour of
10
11 to avoid costs, related to communication between CPU high order discontinuous Galerkin finite element methods, J.
12 cores starting to dominate the runtime. While the new Comput. Phys. vol. 198.1: pp. 106-130.
13 model has been designed for massive parallelism and Hu F and Atkins H (2002) Two-dimensional wave analysis of
14
scalability utilizing heterogeneous multi-device CPU-GPU the discontinuous Galerkin method with non-uniform grids
15
16 many-core hardware, there are still opportunities for further and boundary conditions, 8th AIAA/CEAS Aeroacoustics
17 improvements of the performance. It is also shown that Conference and Exhibit.
18 using mesh-adaptive high-order numerical methods is highly Eleuterio FT (1999) Riemann Solvers and Numerical Methods for
19 beneficial for room acoustic simulations, due to the improved
20 Fluid Dynamics: A Practical Introduction, Springer-Verlag, 2.
21 physical description such wave-based room acoustic model edition.
Fo
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 20 of 22 International Journal of High Performance Computing Applications
18 Journal Title XX(X)
1
2 Am. vol 139.4: pp. 1822-1832. Naylor GM (1993) ODEON—Another hybrid room acoustical
3 Geuzaine C and Remacle J-F (2009) Gmsh: a three-dimensional model, Appl. Acoust. vol 38.2-4: pp. 131-143.
4
finite element mesh generator with built-in pre- and post- Kennedy CA and Carpenter MH (2003) Additive Runge-Kutta
5
6 processing facilities, Int. J. Num. Methods Eng. vol 79.11: pp. schemes for convection-diffusion-reaction equations, AAppl.
7 1309-1331. Num. Math. vol 44.1: pp. 139-181.
8 Hesthaven J (1998) From Electrostatics to Almost Optimal Nodal Savioja L and Svensson UP (2015) Overview of geometrical room
9
Sets for Polynomial Interpolation in a Simplex, SIAM J. Num. acoustic modeling techniques, J. Acoust. Soc. Am. vol 138.2:
10
11 Analysis. vol 35.2: pp. 655-676. pp. 708-730.
12 Engsig-Karup AP, Eskilsson C and Bigoni D (2016) A stabilised Gustavsen B and Semlyen A (1999) Rational approximation of
13 nodal spectral element method for fully nonlinear water waves, frequency domain responses by vector fitting, IEEE Trans. Pow.
14
J. Comp. Physics. vol 318: pp. 1-21. Del. vol 14.3: pp. 1052-1061.
15
16 Kirby RM and Karniadakis GE (2003) De-aliasing on non-uniform Gustavsen B (2008) Fast Passivity Enforcement for Pole-Residue
17 grids: algorithms and applications, J. Comput. Physics. vol Models by Perturbation of Residue Matrix Eigenvalues, IEEE
18 191.1: pp. 249-264. Trans. Pow. Del. vol 23.4: pp. 2278-2285.
19
20 Okuzono T, Shimizu N and Sakagami K (2019) Predicting Carpenter MH and Kennedy C (1994) Fourth-order 2N-storage
21 absorption characteristics of single-leaf permeable membrane Runge-Kutta schemes, NASA Report TM 109112. NASA
Fo
22 absorbers using finite element method in a time domain, Appl. Langley Research Center.
23
Acoust. vol 151: pp. 172-182. Wang H, Yang J and Hornikx M (2020) Frequency-dependent
24
rP
25 Biot MA (1956) The theory of propagation of elastic waves in a transmission boundary condition in the acoustic time-domain
26 fluid-saturated porous solid. I. Low frequency range. II. Higher nodal discontinuous Galerkin model, Appl. Acoust. vol. 164:
27 frequency range, J. Acoust. Soc. Am. vol 128: pp. 168-191. pp. 107280.
28
ee
Panneton R (2007) Comments on the limp frame equivalent fluid Dumbser M, Kaser M and Toro EF (2007) An arbitrary high-
29
30 model for porous media, J. Acoust. Soc. Am. vol 122.6: pp. order Discontinuous Galerkin method for elastic waves on
31 217-222. unstructured meshes - V. Local time stepping and p-adaptivity,
rR
vol 183: pp. 129-142. Dynamic compressibility of air in porous structures at audible
36 de Bree H, Lanoye R, de Cock S and van Heck J (2005) In frequencies, J. Acoust. Soc. Am. vol. 102.4: pp. 1995-2006.
37
situ, broad band method to determine the normal and oblique Jeong C-H and Brunskog J (2013) The equivalent incidence angle
38
iew
39 reflection coefficient of acoustic materials, Proc. Noise. Vib. for porous absorbers backed by a hard surface, J. Acoust. Soc.
40 Conf. Am. vol. 134.6: pp. 4590-4598.
41 Ducourneau J, Planeau V, Chatillon J and Nejade A (2009) Jeong C-H (2012) Absorption and impedance boundary conditions
42
Measurement of sound absorption coefficients of flat surfaces for phased geometrical-acoustics methods, J. Acoust. Soc. Am.
43
44 in a workshop, Appl. Acoust. vol 70: pp. 710-721. vol. 132.4: pp. 2347-2358.
45 Tamura M (1990) Spatial Fourier transform method of measuring Ingard U (1951) On the Reflection of a Spherical Sound Wave from
46 reflection coefficients at oblique incidence. I. Theory and an Infinite Plane, J. Acoust. Soc. Am. vol. 23.3: pp. 329-335.
47
numerical examples, J. Acoust. Soc. Am. vol 88: pp. 2259- Hesthaven JS and Teng CH (2000) Stable Spectral Methods on
48
49 2264. Tetrahedral Elements, SIAM J. Sci. Comput. vol. 21.6: pp.
50 Hesthaven JS and Warburton T (2008) Nodal Discontinuous 2352-1270.
51
Galerkin Methods – Algorithms, Analysis, and Applications, Thomasson S-I (1976) Reflection of waves from a point source by an
52
53 Springer, New York. impedance boundary, J. Acoust. Soc. Am. vol. 59.4: pp. 780-
54 Bliss DB and Burke SE (1982) Experimental investigation of the 785.
55 bulk reaction boundary condition, J. Acoust. Soc. Am. vol Tomiku R, Otsuru T, Okamoto N, Okuzono T and Shibata T (2011)
56
71.3: pp. 546-551. Finite element sound field analysis in a reverberation room
57
58 Grote MJ, Kray M, Nataf F and Assous F (2017) Time-dependent using ensemble averaged surface normal impedance, Proc.
59 wave splitting and source separation, J. Comput. Phys. vol 330: 40th Int. Cong. Expo. Noise Cont. Eng.
60 pp. 981-996. Hargreaves JA, Rendell LR and Lam YW (2019) A framework
for auralization of boundary element method simulations
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
International Journal of High Performance Computing Applications Page 21 of 22
Melander et al. 19
1
2 including source and receiver directivity, J. Acoust. Soc. Am. bounding surfaces, J. Sound Vib. vol. 309.1: pp. 167-177.
3 vol. 145.4: pp. 2625-2637. Yousefzadeh B and Hodgson M (2012) Energy- and wave-based
4
Harari I and Hughes TJR (1992) A cost comparison of boundary beam-tracing prediction of room-acoustical parameters using
5
6 element and finite element methods for problems of time- different boundary conditions, J. Acoust. Soc. Am. vol. 132.3:
7 harmonic acoustics, Comput. Methods Appl. Mech. Eng. vol. pp. 1450-1461.
8 97.1: pp. 77-102. Wareing A and Hodgson M (2005) Beam-tracing model for
9
Botteldooren D (1995) Finite-difference time-domain simulation of predicting sound fields in rooms with multilayer bounding
10
11 low-frequency room acoustic problems, J. Acoust. Soc. Am. surfaces, J. Acoust. Soc. Am. vol. 118.4: pp. 2321-2331.
12 vol. 98.6: pp. 3302-3308. Gunnarsdóttir K, Jeong C-H and Marbjerg G (2015) Acoustic
13 Craggs A (1995) A finite element method for free vibration of air behavior of porous ceiling absorbers based on local and
14
in ducts and rooms with absorbing walls, J. Sound Vib. vol. extended reaction, J. Acoust. Soc. Am. vol. 137.1: pp. 509-512.
15
16 173.4: pp. 568-576. Jeong C-H, Marbjerg G and Brunskog J (2016) Uncertainty of
17 Bilbao S (2013) Modeling of Complex Geometries and Boundary input data for room acoustic simulations, Proc. Baltic-Nordic
18 Conditions in Finite Difference/Finite Volume Time Domain Acoust. Meet.
19
20 Room Acoustics Simulation, IEEE Trans. Audio, Speech, Lang. Wang H and Hornikx M (2020) Time-domain impedance boundary
21 Proc. vol. 21.7: pp. 1524-1533. condition modeling with the discontinuous Galerkin method for
Fo
22 Bilbao S, Hamilton B, Botts J and Savioja L (2016) Finite Volume room acoustics simulations, J. Acoust. Soc. Am. vol. 147.4: pp.
23
Time Domain Room Acoustics Simulation under General 2534-2546.
24
rP
25 Impedance Boundary Conditions, IEEE Trans. Audio, Speech, Suh JS and Nelson PA (1999) Measurement of transient response
26 Lang. Proc. vol. 24.1: pp. 161-173. of rooms and comparison with geometrical acoustic models, J.
27 Lam YW (2005) Issues for computer modelling of room acoustics Acoust. Soc. Am. vol. 105.4: pp. 2304-2317.
28
ee
in non-concert hall settings, Acoust. Sci. Tech. vol. 26.2: pp. Kirby R (2014) On the modification of Delany and Bazley fomulae,
29
30 145-155. Appl. Acoust. vol. 86: pp. 47-49.
31 Brinkmann F, Aspock L, Ackermann D, Lepa S, Vorlander M Sides DJ and Mulholland KA (1971) The variation of normal layer
rR
32 and Weinzierl S (2019) A round robin on room acoustical impedance with angle of incidence, J. Sound Vib. vol. 14.1: pp.
33
simulation and auralization, J. Acoust. Soc. Am. vol. 145.4: 139-142.
34
35
ev
pp. 2746-2760. Shaw EAG (1953) The acoustic wave guide. II. Some specific
36 Richard A, Fernandez-Grande E, Brunskog J and Jeong C-H (2017) normal acoustic impedance measurements of typical porous
37 Estimation of surface impedance at oblique incidence based on surfaces with respect to normally and obliquely incident waves,
38
iew
39 sparse array processing, J. Acoust. Soc. Am. vol. 141.6: pp. J. Acoust. Soc. Am. vol. 25.2: pp. 231-235.
40 4115-4125. Klein C and Cops A (1980) Angle dependence of the impedance of
41 Miki Y (1990) Acoustical properties of porous materials— a porous layer, Acustica vol. 44.4: pp. 258-264.
42
Modifications of Delany-Bazley models, J. Acoust. Soc. Jap. Jeon C-H (2011) Guideline for Adopting the Local Reaction
43
44 vol. 11.1: pp. 19-24. Assumption for Porous Absorbers in Terms of Random
45 Marbjerg G, Brunskog J, Jeong C-H and Nilsson E (2015) Incidence Absorption Coefficients, Acta Acust. Acust. vol.
46 Development and validation of a combined phased acoustical 97.5: pp. 779-790.
47
radiosity and image source model for predicting sound fields in Aretz M and Vorlander M (2010) Efficient modelling of absorbing
48
49 rooms, J. Acoust. Soc. Am. vol. 138.3: pp. 1457-1468. boundaries in room acoustic FE simulations, Acta Acust.
50 Marbjerg G, Brunskog J and Jeong C-H (2018) The difficulties of Acust. vol. 96.6: pp. 1042-1050.
51 simulating the acoustics of an empty rectangular room with an Vorlander M (2013) Computer simulations in room acoustics:
52
absorbing ceiling, Appl. Acoust. vol. 141: pp. 35-48.
53 Concepts and uncertainties, J. Acoust. Soc. Am. vol. 133.3:
54 Kuttruff H (2009) Room Acoustics, Taylor & Francis, 5th edition, pp. 1203-1213.
55 Ch. 2, 7. Kreiss H-O and Oliger J (1972) Comparison of accurate methods
56
Bradley JS, Reich R and Norcross SG (1999) A just noticeable for the integration of hyperbolic equations, Tellus vol. 24.3: pp.
57
58 difference in C50 for speech, Appl. Acoust. vol. 58.2: pp. 99- 199-215.
59 108. ISO 3382-1 (2009) Acoustics—Measurement of room acoustic
60 Hodgson M and Wareing A (2008) Comparisons of predicted parameters—Part 1: Performance spaces, International Orga-
steady-state levels in rooms with extended- and local-reaction nization for Standardization.
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls
Page 22 of 22 International Journal of High Performance Computing Applications
20 Journal Title XX(X)
1
2 ISO 3382-3 (2009) Acoustics—Measurement of room acoustic
3 parameters—Part 3: Open plan offices, International Organi-
4 zation for Standardization.
5
6 ISO 354(2003) Acoustics—Acoustics—Measurement of sound
7 absorption in a reverberation room, International Organization
8 for Standardization.
9
Engsig-Karup AP, Glimberg S, Nielsen A and Lindbergr O (2013)
10
11 Fast hydrodynamics on heterogeneous many-core hardware,
12 Designing Scientific Applications on GPUs, CRC Press /
13 Taylor & Francis Group, Boca Raton: pp. 251-294.
14
Ren M and Jacobsen F (1993) A method of measuring the dynamic
15
16 flow resistance and reactance of porous materials, Appl.
17 Acoust. vol. 39.4: pp. 265-276.
18 Taflove A and Hagness SC (2013) Computational Electrodynamics:
19
20 The Finite-Difference Time-Domain Method, Artech House,
21 Inc., Boston, 3rd edition, ch. 9.
Fo
http://mc.manuscriptcentral.com/ijhpca
Prepared using sagej.cls