
AIAA JOURNAL

Vol. 55, No. 7, July 2017

High-Order Implicit Large-Eddy Simulations of Flow over a NACA0021 Aerofoil

J. S. Park,∗ F. D. Witherden,† and P. E. Vincent†


Imperial College London, London, England SW7 2AZ, United Kingdom
DOI: 10.2514/1.J055304
In this study the graphical-processing-unit-accelerated solver PyFR is used to simulate flow over a NACA0021
aerofoil in deep stall at a Reynolds number of 270,000 using the high-order flux reconstruction approach.
Wall-resolved implicit large-eddy simulations are undertaken on unstructured hexahedral meshes at fourth- and
fifth-order accuracy in space. It was found that either modal filtering or antialiasing via an approximate L2
projection is required in order to stabilize simulations. Time-span-averaged pressure coefficient distributions on
the aerofoil and associated lift and drag coefficients are seen to converge toward experimental data as the
simulation setup is made more realistic by increasing the aerofoil span. Indeed, the lift and drag coefficients
obtained by fifth-order implicit large-eddy simulation with antialiasing via an approximate L2 projection agree
better with experimental data than a wide range of previous studies. Stabilization via modal filtering, however, is
found to reduce solution accuracy. Finally, performance of various PyFR simulations is compared, and it is found
that fifth-order simulations with antialiasing via an L2 projection are the most efficient. Results indicate that
high-order flux reconstruction schemes with antialiasing via an L2 projection are a good candidate for
underpinning accurate wall-resolved implicit large-eddy simulation of separated, turbulent flows over complex
engineering geometries.

Downloaded by 136.159.213.169 on March 6, 2024 | http://arc.aiaa.org | DOI: 10.2514/1.J055304

I. Introduction

Unsteady turbulent flow with separation is a challenging regime for computational fluid dynamics. Conventional Reynolds-averaged Navier–Stokes approaches are ill suited [1] since they temporally average large-scale unsteady coherent structures, and scale-resolving strategies such as large-eddy simulation (LES) and direct numerical simulation (DNS) are often too expensive for industrial applications. High-order numerical methods for mixed unstructured grids offer the promise of increased simulation accuracy in the vicinity of complex engineering geometries and thus a route to efficient scale-resolving simulations of unsteady turbulent flows with separation in an industrial context.

Over the last decade, various high-order methods for mixed unstructured grids have been developed. One such method is the flux reconstruction (FR) approach [2], which provides a unifying framework for various high-order schemes. Using FR, it is possible to recover the nodal discontinuous Galerkin (DG) schemes of Hesthaven and Warburton [3] and, for linear flux functions, spectral difference schemes, originally proposed by Kopriva and Kolias [4] and later popularised by Wang et al. [5]. When combined with explicit time-marching schemes, FR methods exhibit a significant degree of temporal/spatial locality, which is favorable when considering recent hardware architectures, such as graphical processing units (GPUs).

PyFR [6] is an open-source Python-based framework for solving advection-diffusion-type problems using the FR approach. It is capable of solving the Euler and compressible Navier–Stokes equations on mixed unstructured grids. Furthermore, it is also designed to target a range of modern hardware platforms, including heterogeneous mixtures of CPUs and GPUs, via C/OpenMP, CUDA, and OpenCL backends [7].

When employing the FR approach, it is customary to use a collocation projection in order to project the nonlinear flux function into the polynomial space of the solution. However, in the case of underresolved solutions and nonlinear fluxes, as encountered when simulating turbulent flows, this type of projection can result in aliasing-driven instabilities. This issue is well known and has been theoretically analyzed by Jameson et al. [8] and Hesthaven and Warburton [3], among others. The effect of aliasing within the context of turbulent flow simulations has also been investigated by Gassner et al. [9]. A variety of strategies to mitigate aliasing exists, including filtering [3], adding spectral vanishing viscosity [10], switching to a skew-symmetric formulation [11,12], and antialiasing via an approximate L2 projection with a suitable quadrature rule [13]. Effective mitigation of aliasing-driven instabilities is critical to enabling the industrial adoption of high-order schemes for real-world unsteady flow problems.

In this study, we use PyFR to simulate unsteady turbulent flow over a NACA0021 aerofoil in deep stall, at a Reynolds number of 270,000, using a wall-resolved implicit large-eddy simulation (ILES) approach. In Sec. II, we provide a brief summary of the FR approach, including the extensions required to incorporate modal filtering and antialiasing via an approximate L2 projection. In Sec. III, we provide a detailed description of the NACA0021 test case. In Sec. IV, stability is assessed along with the effects of both modal filtering and antialiasing via an approximate L2 projection. Also, various flow metrics are computed and compared with experimental data, and the performance of various simulations is assessed. Finally, in Sec. V, conclusions are drawn.

Received 22 April 2016; revision received 25 November 2016; accepted for publication 20 March 2017; published online Open Access 7 June 2017. Copyright © 2017 by J. S. Park, F. D. Witherden, and P. E. Vincent. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission. All requests for copying and permission to reprint should be submitted to CCC at www.copyright.com; employ the ISSN 0001-1452 (print) or 1533-385X (online) to initiate your request. See also AIAA Rights and Permissions www.aiaa.org/randp.
*Department of Aeronautics; jin-seok.park@imperial.ac.uk (Corresponding Author).
†Department of Aeronautics.

II. Methodology

A. Flux Reconstruction

For a detailed overview of the multidimensional FR approach, the reader is referred to the work of Witherden et al. [6] and the references therein. In this section, we will outline salient numerical aspects of FR within the context of solving the three-dimensional Navier–Stokes equations on unstructured hexahedral grids. In conservative form, the Navier–Stokes equations can be expressed as

\[ \frac{\partial u_\alpha}{\partial t} + \nabla \cdot \mathbf{f}_\alpha(u, \nabla u) = 0 \tag{1} \]

where $\alpha$ is the field variable index, $u = u(\mathbf{x}, t) = (\rho, \rho v_x, \rho v_y, \rho v_z, E)$ is the solution, $\rho$ is the mass density of the fluid, $\mathbf{v} = (v_x, v_y, v_z)$ is
PARK, WITHERDEN, AND VINCENT 2187

the fluid velocity vector, and $E$ is the total energy per unit volume. For a perfect gas, the pressure $p$ and total energy are related through the ideal gas law

\[ E = \frac{p}{\gamma - 1} + \frac{1}{2} \rho \lVert \mathbf{v} \rVert^2 \tag{2} \]

where $\gamma = c_p / c_v$ is the ratio of specific heats. The flux can be written as $\mathbf{f}_\alpha = \mathbf{f}_\alpha(u, \nabla u) = \mathbf{f}^c_\alpha - \mathbf{f}^v_\alpha$, where $\mathbf{f}^c$ is the inviscid flux given by

\[ \mathbf{f}^c = \begin{pmatrix} \rho v_x & \rho v_y & \rho v_z \\ \rho v_x^2 + p & \rho v_x v_y & \rho v_x v_z \\ \rho v_x v_y & \rho v_y^2 + p & \rho v_y v_z \\ \rho v_x v_z & \rho v_y v_z & \rho v_z^2 + p \\ v_x (E + p) & v_y (E + p) & v_z (E + p) \end{pmatrix} \tag{3} \]

and $\mathbf{f}^v$ is the viscous flux

\[ \mathbf{f}^v = \begin{pmatrix} 0 & 0 & 0 \\ T_{xx} & T_{xy} & T_{xz} \\ T_{xy} & T_{yy} & T_{yz} \\ T_{xz} & T_{yz} & T_{zz} \\ \theta_x & \theta_y & \theta_z \end{pmatrix} \tag{4} \]

where

\[ \theta_i = \sum_{j=1}^{3} v_j T_{ij} - \varsigma \frac{\partial T}{\partial x_i} \tag{5} \]

Here, $\varsigma$ is the thermal conductivity coefficient, $T$ is the absolute static temperature, and $T_{ij}$ is the viscous tensor, which can be determined by the Stokes hypothesis

\[ T_{ij} = \mu \left( \frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i} \right) - \frac{2}{3} \mu \, \delta_{ij} \frac{\partial v_l}{\partial x_l} \tag{6} \]

where $\mu$ is the dynamic viscosity, $\delta_{ij}$ is the Kronecker delta, and summation over the repeated index $l$ is implied.

To solve the system using the FR approach, we start by decomposing the physical domain $\Omega$ into a set of $N_E$ conformal, nonoverlapping, hexahedral elements. Take $\Omega_i$ to refer to the $i$th element in this set. It is convenient, for reasons of both mathematical simplicity and computational efficiency, to work in a transformed space. We accomplish this by introducing a standard element $\tilde{\Omega}$, which exists in a transformed space $\tilde{\mathbf{x}}$. Take $\mathbf{J}_i(\tilde{\mathbf{x}})$ to be the relevant Jacobian matrix for the $i$th element and $J_i(\tilde{\mathbf{x}})$ its determinant. We proceed to associate this standard element with a set of $N_u$ solution points denoted by $\{\tilde{\mathbf{x}}^u_i\} \in \tilde{\Omega}$ and a set of $N_f$ flux points denoted by $\{\tilde{\mathbf{x}}^f_i\} \in \partial\tilde{\Omega}$. At each flux point, we define a polynomial correction function $\mathbf{g}^f_i(\tilde{\mathbf{x}})$ with the property that $\mathbf{n}^f_i \cdot \mathbf{g}^f_j(\tilde{\mathbf{x}}^f_i) = \delta_{ij}$, where $\mathbf{n}^f_i$ is the unit normal vector associated with the $i$th flux point. The specific form of these correction functions determines various properties of the resulting FR scheme, including stability and accuracy [2,14,15]. In particular, by the appropriate choice of correction functions, one can recover the nodal DG scheme [2].

Inside of each element, we represent the solution as a degree $k$ polynomial $u_{i\alpha} \in \mathcal{P}_k(\tilde{\Omega})$. This polynomial can be defined using a nodal basis as

\[ u_{i\alpha}(\tilde{\mathbf{x}}) = \sum_{j}^{N_u} u_{ij\alpha} \, l_j(\tilde{\mathbf{x}}) \tag{7} \]

where $u_{ij\alpha}$ is the value of the $\alpha$ field variable at the $j$th solution point inside of the $i$th element and $\{l_j(\tilde{\mathbf{x}})\}$ are a set of nodal basis functions defined such that $l_j(\tilde{\mathbf{x}}^u_i) = \delta_{ij}$. These polynomials are permitted to be discontinuous between elements. The objective in FR is to obtain an approximation of $\nabla \cdot \mathbf{f}_\alpha$ in the same polynomial space as the solution.

This is accomplished as follows. First, the solution polynomial is evaluated at each of the flux points. As the flux points on the faces of adjacent elements pair up, this results in two distinct values of the solution at a single point in physical space. By employing a local discontinuous Galerkin (LDG)-type method, it is possible to correct these discontinuous solutions to obtain an approximation, which is $C^0$ continuous at the flux points and the gradient of which sits in $\mathcal{P}^3_k(\tilde{\Omega})$. The gradients of this approximate solution, along with the original discontinuous solution defined by Eq. (7), can be used to compute a discontinuous flux within each element as

\[ \tilde{\mathbf{f}}^D_{i\alpha}(\tilde{\mathbf{x}}) = \sum_{j}^{N_u} \tilde{\mathbf{f}}_{ij\alpha}(u_{ij}, \nabla u_{ij}) \, l_j(\tilde{\mathbf{x}}) \tag{8} \]

where the transformed flux is given by

\[ \tilde{\mathbf{f}}_{ij\alpha}(u, \nabla u) = J_i(\tilde{\mathbf{x}}^u_j) \, \mathbf{J}_i^{-1}(\tilde{\mathbf{x}}^u_j) \, \mathbf{f}_\alpha(u, \nabla u) \tag{9} \]

Furthermore, using the solution gradient, it is also possible to compute an approximation of the inviscid and viscous fluxes at each flux point pair. These can then be combined to compute a normal transformed flux, which is common to both flux points. For the inviscid portion, this involves application of an approximate Riemann solver, whereas for the viscous portion, a LDG-type scheme can be employed. This common approximation, however, is generally different from that obtained by evaluating the transformed discontinuous flux of Eq. (8) at the flux point and dotting it with the associated normal vector.

The final step involves scaling the vector correction function associated with each flux point by the difference between the common and discontinuous flux at each flux point. By summing the scaled corrections from all of the flux points and adding these onto $\tilde{\mathbf{f}}^D_{i\alpha}$, the result will be $C^0$ continuous at each of the flux points. The divergence of this polynomial can then be taken to give the desired semidiscretized form of the governing system.

B. Aliasing-Driven Instabilities

Inside of a FR step, it is necessary, at several points, to perform polynomial projections: specifically, the projection of the transformed flux into the polynomial space of the solution within an element, the projection of the transformed normal common interface flux into the polynomial space of the correction function divergence on the face of an element, and the projection of the divergence of the corrected flux into the space of the solution within an element. In the prescription outlined previously, these projections are performed by simply evaluating the relevant functional at the appropriate set of nodal basis points, a so-called collocation projection.

Although extremely efficient, collocation-type projections can result in aliasing, whereby energy from modes that are unresolved is erroneously transferred into those that are. This aliasing can impact not just the accuracy of the solution but also the stability. Aliasing can be mitigated through modal filtering [3], adding spectral vanishing viscosity [10], switching to a skew-symmetric formulation [11,12], and antialiasing via an approximate L2 projection with a suitable quadrature rule [13]. Two of these approaches are considered in this study: modal filtering and antialiasing via an approximate L2 projection. Modal filtering is perhaps the most straightforward approach to suppressing aliasing errors. It is relatively simple to implement and relatively cheap to apply at a given time step. However, it often requires careful parameter tuning in order to achieve a balance between stability and accuracy and even then can be overly dissipative. Antialiasing via an approximate L2 projection, on the other hand, can be relatively complex to implement, requiring significant changes to a codebase, and can be relatively expensive to apply at a given time step. However, if the L2 projection is sufficiently accurate, it can suppress aliasing without additional tuning parameters and without being overly dissipative.
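The difference between a collocation projection and an approximate L2 projection can be illustrated with a minimal one-dimensional sketch. It is not taken from PyFR: the degree k = 3, the choice of P_3 as the "solution", the quadratic "flux", and the function names are all assumptions made for illustration, and unnormalized Legendre polynomials are used in place of an orthonormal basis.

```python
import numpy as np
from numpy.polynomial import legendre as leg

k = 3  # polynomial degree of the solution (assumed for illustration)

def u(x):
    """Degree-k 'solution': the Legendre polynomial P_3 (hypothetical choice)."""
    return leg.legval(x, [0, 0, 0, 1])

def flux(x):
    """A nonlinear 'flux' of the solution; u**2 has degree 2k = 6 > k."""
    return u(x) ** 2

def collocation_projection(f, k):
    """Interpolate f at k+1 Gauss-Legendre points; return Legendre coefficients."""
    x, _ = leg.leggauss(k + 1)
    V = leg.legvander(x, k)          # V[i, j] = P_j(x_i)
    return np.linalg.solve(V, f(x))

def l2_projection(f, k, n_quad):
    """L2 projection onto P_0..P_k using n_quad-point Gauss-Legendre quadrature."""
    x, w = leg.leggauss(n_quad)
    V = leg.legvander(x, k)
    norms = 2.0 / (2 * np.arange(k + 1) + 1)   # integral of P_j^2 over [-1, 1]
    return (V.T @ (w * f(x))) / norms

colloc = collocation_projection(flux, k)
# Five points integrate polynomials up to degree 9 exactly, so for this
# degree-6 flux tested against P_0..P_3 the projection is exact.
l2 = l2_projection(flux, k, n_quad=5)
aliasing_error = colloc - l2

print("collocation coefficients:  ", colloc)
print("L2 projection coefficients:", l2)
print("aliasing error:            ", aliasing_error)
```

In this sketch the two sets of coefficients agree for the mean mode but differ for the P_2 mode, which is where the unresolved content of the degree-6 flux is folded back by the collocation projection. For the nonpolynomial Navier–Stokes fluxes no finite quadrature is exact, so the L2 projection is only approximate.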

C. Modal Filtering

It is known that the effects of aliasing are most pronounced in the highest-frequency modes of the expansion [16–18]. Therefore, one means of stabilizing a simulation is to filter the higher modes of the expansion [3] every $N$ time steps. The modal expansion of the solution can be written as

\[ u_{i\alpha}(\tilde{\mathbf{x}}) = \sum_{j}^{N_u} \hat{u}_{ij\alpha} \, L_j(\tilde{\mathbf{x}}) \tag{10} \]

where $\hat{u}_{ij\alpha}$ are the modal coefficients and $L_j(\tilde{\mathbf{x}})$ are the orthonormal modal basis functions. A popular choice of filter is an exponential filter [3]. Such a filter can be expressed as a diagonal matrix that acts on the vector of modal coefficients as

\[ \Lambda_{ii} = \begin{cases} 1 & \text{if } \deg L_i \le \eta_c \\ \exp\!\left( -\kappa \left( \dfrac{\deg L_i - \eta_c}{\eta_m - \eta_c} \right)^{s} \right) & \text{otherwise} \end{cases} \tag{11} \]

where $\kappa \approx -\log \epsilon$ with $\epsilon$ being the machine precision, $\eta_c < \eta_m$ is a cutoff parameter, $s$ is the strength of the filter, $\deg L_i$ is the degree of the $i$th modal basis function, and $\eta_m = k + 1$. Damping is only applied to modes of which the degree is greater than the cutoff, with higher modes receiving progressively more damping. The rate at which this ramps up is controlled by $s$.

D. Antialiasing via Approximate L2 Projection

The principle behind antialiasing is to compute the modal expansion coefficients of the desired polynomial exactly. Taking the projection of the nonlinear volume flux as an example, we note that in modal form the polynomial inside of the $i$th element can be expressed as

\[ \tilde{\mathbf{f}}^D_{i\alpha}(\tilde{\mathbf{x}}) = \sum_{j}^{N_u} \hat{\mathbf{f}}_{ij\alpha} \, L_j(\tilde{\mathbf{x}}) \tag{12} \]

with the modal coefficients being given by

\[ \hat{\mathbf{f}}_{ij\alpha} = \int_{\tilde{\Omega}} \tilde{\mathbf{f}}_{i\alpha}(\tilde{\mathbf{x}}) \, L_j(\tilde{\mathbf{x}}) \, \mathrm{d}\tilde{\mathbf{x}} \tag{13} \]

where the integration domain $\tilde{\Omega}$ is that of the reference element. It can be readily verified that the resulting polynomial is optimal in that it is the one that minimizes the norm $\lVert \tilde{\mathbf{f}}^D_{i\alpha}(\tilde{\mathbf{x}}) - \tilde{\mathbf{f}}_{i\alpha}(\tilde{\mathbf{x}}) \rVert_{\tilde{\Omega},2}$. The integrals can be approximated through Gaussian quadrature in which

\[ \int_{\tilde{\Omega}} \tilde{\mathbf{f}}_{i\alpha}(\tilde{\mathbf{x}}) \, L_j(\tilde{\mathbf{x}}) \, \mathrm{d}\tilde{\mathbf{x}} \approx \sum_{l}^{N_q} \omega_l \, \tilde{\mathbf{f}}_{i\alpha}(\tilde{\mathbf{x}}_l) \, L_j(\tilde{\mathbf{x}}_l) \tag{14} \]

where $\{\tilde{\mathbf{x}}_l\}$ are the abscissae of an $N_q$-point quadrature rule and $\{\omega_l\}$ are a set of associated weights. In the case of a nonpolynomial integrand, as is the case when solving the Navier–Stokes equations, the result is always approximate on account of $\tilde{\mathbf{f}}_{i\alpha}(\tilde{\mathbf{x}})$ being inherently nonpolynomial. However, the error can be reduced by choosing a quadrature rule of a sufficiently high degree. Generally, this results in $N_q \gg N_u$.

When antialiasing via an approximate L2 projection, it is necessary to evaluate the relevant functional at the chosen set of quadrature points. This requires all arguments of the functional, for example, the solution and its gradients, to be interpolated to these points. Next, the functional itself must be evaluated at the abscissae and a summation performed to obtain the desired modal coefficients. As a consequence, antialiasing via an approximate L2 projection, especially with a large number of points, is more computationally expensive than a simple collocation projection.

Table 1 Summary of simulations

Name    Meshes  k  Stabilization
P3-N    P3      3  None
P3-F    P3      3  Modal filtering
P3-AA   P3      3  L2 projection
P4-N    P4      4  None
P4-AA   P4      4  L2 projection

Fig. 1 View of P3 mesh with $L_z = c$.

Fig. 2 Modal filter coefficients for the P3-F simulation obtained with $\kappa = 36$, $s = 16$, and $\eta_c = 1$.

III. Flow over NACA0021 Aerofoil in Deep Stall

A. Overview

In this study, we use PyFR to simulate unsteady separated turbulent flow over a NACA0021 aerofoil in deep stall. The aerofoil is set at a 60 deg angle of attack to the oncoming freestream flow. The Reynolds number $Re_\infty$ (based on the aerofoil chord $c$, the freestream velocity $u_\infty$, the freestream density $\rho_\infty$, and the fluid viscosity $\mu$) is set at 270,000. The Mach number $Ma_\infty$ (based on the freestream velocity $u_\infty$, the freestream density $\rho_\infty$, and the freestream pressure $p_\infty$) is set at 0.1. The setup exhibits significant unsteady dynamics and is hence a good test case for scale-resolving approaches such as detached-eddy simulation (DES), LES, and DNS. Moreover, significant experimental [19] and computational (from the DESider project) [20,21] data exist for the test case, including time-span-averaged pressure coefficient distributions on the aerofoil, associated lift and drag coefficients, and lift and drag coefficient spectra.

B. Domains and Meshes

Four domains were considered. All had an extent of $30c$ in the stream- and crosswise directions. However, the spanwise extent $L_z$ was varied. Specifically, values of $L_z = c$, $2c$, $4c$, $7c$ were considered. Previous experimental data were obtained with a span of $7.2c$ [19], and previous computational results suggest that simulations should employ a span of at least $4c$ [20]. The leading edge of the aerofoil was located at the center of the cross-streamwise

plane. Riemann-invariant boundary conditions were applied in the far field, a no-slip adiabatic wall boundary condition was applied on the aerofoil surface, and a periodic condition was imposed in the spanwise direction.

For each domain, two quadratically curved hexahedral meshes were made using Gmsh [22], one for simulation with third-order solution polynomials (nominally fourth-order accurate in space), henceforth referred to as P3 meshes, and one for simulation with fourth-order solution polynomials (nominally fifth-order accurate in space), henceforth referred to as P4 meshes. All meshes were unstructured in the stream- and crosswise directions but uniformly extruded in the spanwise direction. Elements adjacent to the aerofoil were sized such that the first (Gauss–Legendre) solution point sat a distance equivalent to $y^+ \approx 1$ from the wall (where $y^+$ is based on a flat plate boundary layer at Reynolds number 270,000). Elements of the P3 meshes had a characteristic size $0.05c$ in the stream- and crosswise directions near the aerofoil and a uniform extent $0.05c$ in the spanwise direction throughout the domain. Elements of the P4 meshes had a characteristic size $0.0625c$ in the stream- and crosswise directions near the aerofoil and a uniform extent $0.0625c$ in the spanwise direction throughout the domain. For both meshes, the majority of elements was concentrated near the aerofoil, as shown in Fig. 1.

In total, the P3 meshes had 80,840 elements per spanwise extent $c$, and the P4 meshes had 51,632 elements per spanwise extent $c$. However, since third-order solution polynomials require 64 solution points within each hexahedral element and fourth-order solution polynomials require 125, the total number of degrees of freedom per spanwise extent $c$ was similar across P3 and P4 meshes ($80{,}840 \times 64 \approx 5.2 \times 10^6$ and $51{,}632 \times 125 \approx 6.5 \times 10^6$ solution points per field variable, respectively).

Table 2 Effect of filter strength $s$ and cutoff $\eta_c$ on the stability of a P3-F simulation with $L_z = c$ up to $t = 400c/u_\infty$ with fixed $\kappa = 36$ and $N = 50$

s    η_c  Outcome
16   1    Stable
32   1    Diverged (at t = 95.87c/u∞)
16   2    Diverged (at t = 57.69c/u∞)
32   2    Diverged (at t = 30.17c/u∞)

Table 3 Effect of application frequency $N$ on the stability of a P3-F simulation with $L_z = c$ up to $t = 400c/u_\infty$ with fixed $\kappa = 36$, $s = 16$, and $\eta_c = 1$

N    Outcome
50   Stable
75   Diverged (at t = 100.54c/u∞)
100  Diverged (at t = 1.92c/u∞)
125  Diverged (at t = 0.92c/u∞)

Fig. 3 Time- and span-averaged $C_p$ distributions on the pressure and suction surfaces of the aerofoil: a) $L_z = c$, b) $L_z = 2c$, c) $L_z = 4c$, and d) $L_z = 7c$.

C. Methodology

The compressible Navier–Stokes equations with constant viscosity were solved using PyFR version 0.8.0 [6]. A DG scheme was used for the spatial discretisation, solution points were located as a tensor product of Gauss–Legendre points within each hexahedron, flux points were located as a tensor product of Gauss–Legendre points on each face of each hexahedron, common inviscid fluxes were computed via a RoeM-type Riemann solver [23], and common viscous fluxes were computed using a LDG approach [24]. An explicit RK45[2R+] scheme [25] with adaptive PI time-step control [26] was used to advance the solution in time. A wall-resolved ILES approach was adopted, and hence no turbulence model was employed.

Simulations on P3 and P4 meshes with $L_z = c$, $2c$, $4c$, $7c$ were started from a freestream initial condition with $Re_\infty = 27{,}000$. Each simulation was then run "natively" without any antialiasing for $100c/u_\infty$ time units using first-order solution polynomials to facilitate the passing of initial transients. After this warmup period, various simulations were restarted on each mesh with $Re_\infty = 270{,}000$. Specifically, P3 meshes with $L_z = c$, $2c$, $4c$, $7c$ were restarted with third-order solution polynomials and 1) antialiasing via modal filtering, henceforth referred to as P3-F simulations; 2) antialiasing via an L2

projection, henceforth referred to as P3-AA simulations; and 3) natively without any antialiasing, henceforth referred to as P3-N simulations. P4 meshes with $L_z = c$, $2c$, $4c$, $7c$ were restarted with fourth-order solution polynomials and 1) antialiasing via an L2 projection, henceforth referred to as P4-AA simulations, and 2) natively without any antialiasing, henceforth referred to as P4-N simulations. All the aforementioned simulations, which are summarized in Table 1, were run for a further $400c/u_\infty$ time units, over which period data were extracted for analysis.

In the previous description, antialiasing via modal filtering refers to the application of Eq. (11) with $\kappa = 36$, $s = 16$, and $\eta_c = 1$ every $N = 50$ time steps. A visual representation of the resulting filter coefficients is shown in Fig. 2. Antialiasing via an L2 projection on the P3 and P4 meshes, with third-order and fourth-order solution polynomials, respectively, refers to use of 9th-order and 11th-order Gauss–Legendre quadrature rules, respectively, to perform an approximate L2 projection of the transformed flux into the polynomial space of the solution (within an element) and an approximate L2 projection of the transformed normal common interface flux into the polynomial space of the correction function divergence (on the face of an element).

We note that the filter parameters were selected based on the results of a series of numerical experiments using a P3 mesh with $L_z = c$ and third-order solution polynomials. First, experiments were performed varying filter strength $s$ and cutoff $\eta_c$, with fixed $\kappa = 36$ and $N = 50$. Outcomes are presented in Table 2. Looking at the table, it is clear that only the strongest filter ($s = 16$ and $\eta_c = 1$) is stable up to $t = 400c/u_\infty$. Next, for this strongest filter, with fixed $\kappa = 36$, $s = 16$, and $\eta_c = 1$, experiments were performed varying application frequency $N$. Outcomes are presented in Table 3. Looking at the table, it is clear that only the highest application frequency ($N = 50$) is stable up to $t = 400c/u_\infty$.
Fig. 4 $C_l$ and $C_d$ obtained from P3-F simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 5 $C_l$ and $C_d$ obtained from P3-AA simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 6 $C_l$ and $C_d$ obtained from P4-AA simulations, along with the experimental data of [19] and previous numerical results [20] (gray markers; see Table 4).

Fig. 7 Percentage errors, relative to experimental data [19], in $C_l$ and $C_d$ obtained from P3-F, P3-AA, and P4-AA simulations.

IV. Numerical Results

A. Stability

It was found that P3-N and P4-N simulations quickly diverged for all $L_z$ considered. However, P3-F, P3-AA, and P4-AA simulations remained stable for all $L_z$. Consequently, all subsequent results are from P3-F, P3-AA, and P4-AA simulations. This finding clearly demonstrates the importance of suppressing aliasing-driven instabilities when using FR schemes.

B. Comparison with Experimental Data

1. Pressure Distributions

Figure 3 plots time- and span-averaged distributions of pressure coefficient $C_p$ on the pressure and suction surfaces of the aerofoil. All simulations achieved good agreement with the experimental data on the pressure surface. P3-AA and P4-AA simulations achieved increasingly good agreement with the experimental data on the suction surface as $L_z$ was increased, visually converging to the experimental data when $L_z \ge 4c$. P3-F simulations exhibited the same trend. However, they failed to visually converge to the experimental data, even when $L_z = 7c$.

Table 4 Marker legend for Figs. 4, 5, 6, and 8

Marker  Approach              L_z
○       DLR (SA DDES)         c
▵       IMFT (k-ω OEM DES)    c
⋄       NLR (X-LES)           c
        NUMECA (SA DES)       c
        TUB (SALSA DES)       c
        TUB (LLR DES)         c
        NTS (SST-SAS)         2c
        NTS (TRRANS)          2c
▴       TUB (CEASM DES)       3.24c
♦       ANSYS (SST-SAS)       4c
▪       NTS (SA DES)          4c
        EADS-M (SA DES)       7.2c

Fig. 8 Percentage errors, relative to experimental data [19], in $C_l$ and $C_d$ obtained from various previous numerical studies [20] (gray markers; see Table 4).

Fig. 9 Plots of PSD of $C_l$ from various PyFR simulations and similar experimental data [19]: a) $L_z = c$, b) $L_z = 2c$, c) $L_z = 4c$, and d) $L_z = 7c$. It should be noted that, while the PyFR data were obtained from $C_l$ (the time- and surface-averaged lift coefficient), the experimental data were obtained from a sectional lift coefficient at a fixed spanwise location. Peak 1 and peak 2 mark distinct peaks in the experimental PSD.

Table 5 $St$ for peak 1 and peak 2 of the PSD of $C_l$ for all simulations (for reference, the experimental $St$ for peak 1 was 0.1994 and for peak 2 was 0.3987 [19])

Span   P3-F             P3-AA            P4-AA
L_z    Peak 1  Peak 2   Peak 1  Peak 2   Peak 1  Peak 2
c      0.1660  0.3320   0.2348  0.4435   0.1855  0.3027
2c     0.1855  0.3418   0.1855  0.3613   0.1953  0.3711
4c     0.1855  0.3809   0.1953  0.3906   0.1953  0.3906
7c     0.1855  0.3809   0.1953  0.3906   0.1953  0.3809

2. Time-Averaged Force Coefficients

Figures 4, 5, and 6 compare the time-averaged lift coefficient $C_l$ and time-averaged drag coefficient $C_d$ with experimental data [19] and computational results from the DESider project [20,21], for P3-F, P3-AA, and P4-AA simulations, respectively (see Table 4 for the marker legend). Figures 7 and 8 plot percentage errors in $C_l$ and $C_d$, relative to experimental data [19], for PyFR simulations and computational results from the DESider project [20,21], respectively.

As one would expect, the overall trends are similar to those observed for the $C_p$ distributions of Fig. 3. When $L_z = c$, all of the PyFR simulations overpredict both $C_l$ and $C_d$. This is especially evident for the P3-F simulation with $L_z = c$. However, as $L_z$ is increased, the PyFR simulations converge toward the experimental data, and when $L_z \ge 4c$, the P3-AA and P4-AA simulations exhibit less than 2.5% deviation in $C_l$ and $C_d$. Finally, we note that the P4-AA simulation with $L_z = 7c$ achieves the best agreement with experimental data and that the resulting errors in $C_l$ and $C_d$ are lower than any of those obtained from the DESider project [20,21].

3. Force Spectra

Figure 9 compares the power spectral density (PSD) of $C_l$ with related experimental data of [19], where $St$ is a Strouhal number based on the freestream velocity $u_\infty$ and the chord $c$. To obtain the PSD, $C_l$ was sampled every $0.025c/u_\infty$ time units, and the PSD was computed using Welch's averaged periodogram method with windows of length 4096 samples and a shift between windows of 10 samples. It should be noted that, while the PyFR data were obtained from $C_l$ (the time- and surface-averaged lift coefficient), the experimental data were obtained from a sectional lift coefficient at a fixed spanwise location. Consequently, it is only meaningful to compare frequencies for peak 1 and peak 2 (see Fig. 9), which are presented in Table 5. We see that the P3-AA and P4-AA simulations with $L_z = 4c$ and $L_z = 7c$ achieved very good agreement with the experimental results.
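The PSD procedure described above can be sketched with a small NumPy implementation of Welch's averaged periodogram, using the stated sampling interval ($0.025c/u_\infty$), window length (4096 samples), and shift (10 samples). The Hann window, the per-segment mean removal, the function name, and the synthetic lift signal with components at $St = 0.2$ and $St = 0.4$ are assumptions made for illustration; the text does not state which window function was used.

```python
import numpy as np

def welch_psd(x, fs, nperseg, shift):
    """One-sided PSD via Welch's method (Hann window assumed, nperseg even)."""
    window = np.hanning(nperseg)
    scale = fs * np.sum(window ** 2)
    starts = range(0, len(x) - nperseg + 1, shift)
    psd = np.zeros(nperseg // 2 + 1)
    for start in starts:
        seg = x[start:start + nperseg]
        seg = (seg - seg.mean()) * window        # remove mean, then taper
        psd += np.abs(np.fft.rfft(seg)) ** 2 / scale
    psd /= len(starts)
    psd[1:-1] *= 2.0                             # fold negative frequencies
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

dt = 0.025                  # sampling interval in units of c / u_inf
fs = 1.0 / dt               # with time in c / u_inf, frequency is a Strouhal number
nperseg = 4096
t = np.arange(16000) * dt   # 400 c / u_inf of data

# Synthetic (hypothetical) lift signal with two tones near the experimental peaks.
cl = 1.0 + 0.3 * np.sin(2 * np.pi * 0.2 * t) + 0.1 * np.sin(2 * np.pi * 0.4 * t)

freqs, psd = welch_psd(cl, fs, nperseg, shift=10)
st_peak1 = freqs[np.argmax(psd)]
high = freqs > 0.3
st_peak2 = freqs[high][np.argmax(psd[high])]
print("peak 1 St:", st_peak1, " peak 2 St:", st_peak2)
```

With these settings the frequency resolution is $fs/4096 \approx 0.0098$ in $St$, so the spacing between neighboring values in Table 5 (for example 0.1855 and 0.1953) corresponds to a single frequency bin.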

Fig. 10 Time-averaged nondimensional pressure contours, with streamlines, obtained from P3-AA simulations with various $L_z$. Streamlines were generated by integrating forward and backward from 50 equispaced seed points located along the line $x = 0.8$.
Fig. 11 Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by the midspan plane for P3-AA simulations with $L_z = c$ and $L_z = 4c$.

Fig. 12 Instantaneous color maps of nondimensional density on the midspan plane. Solid lines denote location of cut planes used for visualization in Fig. 13.

Fig. 13 Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by a plane described in Fig. 12 for P3-AA simulations with $L_z = c$ and $L_z = 4c$.

Fig. 14 Time-averaged nondimensional pressure contours, with streamlines, obtained from P3-F and P3-AA simulations with $L_z = c$. Streamlines were generated by integrating forward and backward from 50 equispaced seed points located along the line $x = 0.8$.

Fig. 15 Instantaneous isosurfaces of the Q criterion colored by nondimensional density and cut by the midspan plane for P3-F and P3-AA simulations with $L_z = c$.

Fig. 16 Wall clock time for each simulation to run $0.1c/u_\infty$ time units, normalized by the wall clock time required for the P4-AA simulation with $L_z = c$ to run $0.1c/u_\infty$ time units.

Fig. 17 Average time step for each simulation as measured over $0.1c/u_\infty$ time units.

C. Discussion

1. Effect of Lz
Increasing Lz toward the experimental value of 7.2c was found to give consistently
better agreement with experimental data, in terms of both time-averaged statistics
and spectral characteristics. This appears to be an improvement compared to previous
results from the DESider project [20,21], which, as illustrated in Fig. 8, exhibited
a degree of scatter in time-averaged statistics as Lz was varied, including cases in
which time-averaged statistics became worse with increasing Lz.
To better understand Lz dependence, Fig. 10 plots time-averaged nondimensional
pressure contours, with streamlines, on the midspan plane obtained from the P3-AA
simulations for the four different Lz. In all cases, there are two primary vortices
behind the suction surface of the aerofoil, which give rise to a low-pressure region.
As Lz is increased, these vortices move farther away from the suction surface, and
the extent of the pressure drop in the wake is also reduced. These observations
suggest that as Lz is increased the two primary vortices become weaker.
Further insight is gained by plotting isosurfaces of the Q criterion, which define
vortical structures [27], obtained from P3-AA simulations with Lz = c and Lz = 4c.
Figure 11 plots such isosurfaces colored by nondimensional density and cut by the
midspan plane. Small-scale vortical structures for the Lz = c case are elongated and
predominately perpendicular to the spanwise direction, whereas those for the Lz = 4c
case appear more isotropic. Also, the large region of low density (indicating the
presence of a large-scale vortical structure) behind the suction surface is more
pronounced for the Lz = c case compared with the Lz = 4c case. This is further
illustrated by Fig. 12, which plots color maps of nondimensional density on the
midspan plane, and Fig. 13, which plots isosurfaces of the Q criterion colored by
nondimensional density and cut by a plane described in Fig. 12.

2. Effect of Antialiasing via Modal Filtering and Approximate L2 Projection
Antialiasing via an L2 projection is found to give consistently more accurate
results than antialiasing via modal filtering. Figure 14 plots time-averaged
nondimensional pressure contours, with streamlines, on the midspan plane obtained
from P3-F and P3-AA simulations with Lz = c. For the P3-F case, the two primary
vortices are closer to the surface of the aerofoil than those obtained by the P3-AA
simulation. The P3-F simulation also has a larger low-pressure region, which
indicates that the primary vortices of the P3-F simulation are stronger than those
of the P3-AA simulation.
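As a rough illustration of what modal filtering does, the following Python sketch damps the modal coefficients of a single element with an exponential filter, a commonly used form (see, for example, [3]). The function names, the filter strength `alpha`, the filter order `order`, the cutoff mode, and the sample coefficients are all illustrative assumptions; they are not the filter settings used in the P3-F simulations.

```python
import math


def exponential_filter(num_modes, alpha=36.0, order=16, cutoff=0):
    """Return damping factors sigma_k for modes k = 0 .. num_modes - 1.

    Modes with k <= cutoff are left untouched; higher modes are scaled by
    exp(-alpha * eta**order), where eta runs from 0 at the cutoff to 1 at
    the highest mode. All parameter values are illustrative assumptions.
    """
    highest = num_modes - 1
    sigma = []
    for k in range(num_modes):
        if k <= cutoff:
            sigma.append(1.0)
        else:
            eta = (k - cutoff) / (highest - cutoff)
            sigma.append(math.exp(-alpha * eta ** order))
    return sigma


def apply_modal_filter(coeffs, sigma):
    """Scale each modal coefficient by its damping factor."""
    return [s * c for s, c in zip(sigma, coeffs)]


# Hypothetical modal coefficients of a degree-3 (P3) solution in one element.
coeffs = [1.0, 0.5, 0.25, 0.125]
sigma = exponential_filter(len(coeffs), alpha=36.0, order=4)
filtered = apply_modal_filter(coeffs, sigma)
```

In this sketch the mean (mode 0) is preserved exactly, while the highest mode is scaled by exp(-36), which is close to double-precision machine epsilon. Intermediate modes are damped progressively, which is the mechanism by which a filter of this kind can remove small-scale content from the solution.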
Further insight is gained by plotting isosurfaces of the Q criterion,
obtained from P3-F and P3-AA simulations with Lz = c. Figure 15
plots isosurfaces of the Q criterion colored by nondimensional
density and cut by the midspan plane. It is clear that the P3-AA
simulation resolves a wider range of small-scale vortical structures
compared to the P3-F simulation. The inability of the filtered
P3-F simulations to accurately resolve small-scale structures likely
underlies their observed lack of convergence to the experimental data
as Lz is increased.
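For reference, the Q criterion [27] is commonly defined as Q = ½(‖Ω‖² − ‖S‖²), where S and Ω are the symmetric (strain-rate) and antisymmetric (rotation-rate) parts of the velocity gradient tensor, so that regions with Q > 0 are rotation dominated. The pure-Python sketch below evaluates Q at a single point from a 3×3 velocity gradient. The function name and the sample tensors are illustrative assumptions, and this is not the post-processing code used to produce the figures.

```python
def q_criterion(grad_u):
    """Return Q = 0.5 * (||Omega||_F^2 - ||S||_F^2) for a 3x3 velocity gradient.

    grad_u[i][j] holds d(u_i)/d(x_j).
    """
    q = 0.0
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])  # strain-rate part
            w_ij = 0.5 * (grad_u[i][j] - grad_u[j][i])  # rotation-rate part
            q += 0.5 * (w_ij * w_ij - s_ij * s_ij)
    return q


# Solid-body rotation about z with angular velocity 2: rotation dominated.
rotation = [[0.0, -2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Planar pure strain: strain dominated.
strain = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
# Simple shear: rotation and strain balance.
shear = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

print(q_criterion(rotation), q_criterion(strain), q_criterion(shear))
```

An isosurface of Q at a small positive threshold therefore picks out vortex cores while excluding shear layers, in which strain and rotation are of comparable magnitude.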

Fig. 18 Average wall clock time per time step for each simulation as measured over 0.1c/u∞ time units, normalized by the wall clock time
required for the P4-AA simulation with Lz = c to run 0.1c/u∞ time units.

Fig. 19 Average time step for the P3-F simulations with various filter application frequencies N as measured over 0.1c/u∞ time units.

D. Computational Cost
A total of 12 simulations are run on a variety of different GPU-accelerated
machines: local clusters at Imperial College London; Emerald at the Science and
Technology Facilities Council Rutherford Appleton Laboratory; the Wilkes cluster at
Cambridge University; and Piz Daint at the Swiss National Supercomputing Centre. All
these machines are equipped with NVIDIA GPUs and are targeted using the CUDA backend
of PyFR.
To compare computational cost across all the simulations, we choose to conduct
performance testing on a single machine, Piz Daint, which is equipped with NVIDIA
K20X GPUs. This is accomplished by running 0.1c/u∞ extra time units after the final
solution for each simulation. The number of NVIDIA K20X GPUs is selected to be
10Lz/c (effectively weak scaling). Hence, when Lz = c, a total of ten GPUs are
employed, whereas at Lz = 7c, a total of 70 GPUs are used. Depending on the order of
accuracy and the type of antialiasing approach, GPU memory load varies from 5 to 10%
of capacity. This setup mirrors how the actual simulations are performed in
practice, with a greater number of GPUs being used at larger spans and loading
toward the limit of strong scaling.
Measurements of the wall clock time, excluding that required for file
input/output, can be seen in Fig. 16. From the figure, it is clear that the P4-AA
simulations are quickest for all Lz. This is attributed to the fact that the P4
meshes are significantly coarser than the P3 meshes. The increase in element size
serves to offset the more severe time-step restriction that comes from increasing
the solution polynomial order, so as to permit a larger overall time step to be
taken, as shown in Fig. 17. This in turn offsets the higher wall clock time per time
step for the P4-AA simulations, shown in Fig. 18, leading to an overall reduction in
time to solution.
Finally, we note that, while modal filtering is generally considered a cheaper
route to antialiasing than an approximate L2 projection, in this case it leads to a
higher wall clock time. This is because the adaptive PI time-step controller [26]
selects a smaller time step for P3-F simulations than P3-AA simulations, as can be
seen in Fig. 17. This reduction in time-step size offsets the reduced wall clock
time per time step for the P3-F simulations, shown in Fig. 18, leading to an overall
increase in time to solution. Further studies revealed that this effect is
proportional to the filter application frequency N, as shown in Fig. 19. Thus, when
employing an adaptive PI time-step controller, it is important to apply the filter
as infrequently as possible. The interaction between modal filtering and adaptive
time stepping remains a topic for further research.

V. Conclusions
In this study, the open-source GPU-accelerated computational fluid dynamics solver
PyFR was used to simulate flow over a NACA0021 aerofoil in deep stall at a Reynolds
number of 270,000 using the high-order FR approach. Wall-resolved ILES were
undertaken on unstructured hexahedral meshes at nominally fourth- and fifth-order
accuracy in space. It was found that either modal filtering or antialiasing via an
approximate L2 projection was required in order to stabilize simulations. Results
for time-span-averaged pressure coefficient distributions on the aerofoil and
associated lift and drag coefficients were seen to converge toward experimental data
as the simulation setup was made more realistic by increasing the aerofoil span.
Indeed, the lift and drag coefficients obtained by fifth-order accurate ILES with
antialiasing via an approximate L2 projection agree with experimental data better
than a wide range of previous studies. Stabilization via modal filtering, however,
was found to reduce solution accuracy relative to experimental results. Finally, the
performance of various PyFR simulations was compared, and it was found that
nominally fifth-order simulations with antialiasing via an approximate L2 projection
were the most efficient. The results indicate that high-order GPU-accelerated FR
schemes with antialiasing via an approximate L2 projection are a good candidate for
underpinning accurate wall-resolved ILES of unsteady, separated, turbulent flows in
the vicinity of complex engineering geometries.

Acknowledgments
The authors would like to thank the Engineering and Physical Sciences Research
Council for their support via an Early Career Fellowship (EP/K027379/1), a Platform
Grant (EP/L000407/1), the Hyper Flux project (EP/M50676X/1), and a Doctoral Training
Grant. Finally, the authors would like to thank the Centre for Innovation for access
to the Emerald graphical processing unit cluster, Cambridge University for access to
the Wilkes graphical processing unit cluster, and the Swiss National Supercomputing
Centre for access to the Piz Daint graphical processing unit cluster. Data related
to this publication is available as electronic Supplementary Material.

References
[1] Slotnick, J., Khodadoust, A., Alonso, J., Darmofal, D., Gropp, W., Lurie, E., and Mavriplis, D., “CFD Vision 2030 Study: A Path to Revolutionary Computational Aerosciences,” NASA/CR 2014-218178, 2014.
[2] Huynh, H. T., “A Flux Reconstruction Approach to High-Order Schemes Including Discontinuous Galerkin Methods,” 18th AIAA Computational Fluid Dynamics Conference, AIAA Paper 2007-4079, 2007.
[3] Hesthaven, J. S., and Warburton, T., Nodal Discontinuous Galerkin Methods: Algorithms, Analysis, and Applications, Springer–Verlag, Berlin, 2008.
[4] Kopriva, D. A., and Kolias, J. H., “A Conservative Staggered-Grid Chebyshev Multidomain Method for Compressible Flows,” Journal of Computational Physics, Vol. 125, No. 1, 1996, pp. 244–261. doi:10.1006/jcph.1996.0091
[5] Wang, Z. J., Liu, Y., May, G., and Jameson, A., “Spectral Difference Method for Unstructured Grids II: Extension to the Euler Equations,” Journal of Scientific Computing, Vol. 32, No. 1, 2007, pp. 45–71. doi:10.1007/s10915-006-9113-9
[6] Witherden, F. D., Farrington, A. M., and Vincent, P. E., “PyFR: An Open Source Framework for Solving Advection–Diffusion Type Problems on Streaming Architectures Using the Flux Reconstruction Approach,” Computer Physics Communications, Vol. 185, No. 11, Nov. 2014, pp. 3028–3040. doi:10.1016/j.cpc.2014.07.011
[7] Witherden, F. D., Vermeire, B. C., and Vincent, P. E., “Heterogeneous Computing on Mixed Unstructured Grids with PyFR,” Computers and Fluids, Vol. 120, Oct. 2015, pp. 173–186. doi:10.1016/j.compfluid.2015.07.016
[8] Jameson, A., Vincent, P. E., and Castonguay, P., “On the Non-Linear Stability of Flux Reconstruction Schemes,” Journal of Scientific Computing, Vol. 50, No. 2, 2012, pp. 434–445. doi:10.1007/s10915-011-9490-6
[9] Gassner, G. J., and Beck, A. D., “On the Accuracy of High-Order Discretizations for Underresolved Turbulence Simulations,” Theoretical and Computational Fluid Dynamics, Vol. 27, Nos. 3–4, 2013, pp. 221–237. doi:10.1007/s00162-011-0253-7
[10] Kirby, R. M., and Sherwin, S. J., “Stabilisation of Spectral/hp Element Methods Through Spectral Vanishing Viscosity: Application to Fluid Mechanics Modelling,” Computer Methods in Applied Mechanics and Engineering, Vol. 195, Nos. 23–24, 2006, pp. 3128–3144. doi:10.1016/j.cma.2004.09.019
[11] Mahesh, K., Constantinescu, G., and Moin, P., “A Numerical Method for Large-Eddy Simulation in Complex Geometries,” Journal of Computational Physics, Vol. 197, No. 1, 2004, pp. 215–240. doi:10.1016/j.jcp.2003.11.031
[12] Gassner, G. J., “A Skew-Symmetric Discontinuous Galerkin Spectral Element Discretization and its Relation to SBP-SAT Finite Difference Methods,” SIAM Journal on Scientific Computing, Vol. 35, No. 3, 2013, pp. A1233–A1253. doi:10.1137/120890144
[13] Mengaldo, G., De Grazia, D., Moxey, D., Vincent, P. E., and Sherwin, S. J., “Dealiasing Techniques for High-Order Spectral Element Methods on Regular and Irregular Grids,” Journal of Computational Physics, Vol. 299, Oct. 2015, pp. 56–81. doi:10.1016/j.jcp.2015.06.032
[14] Vincent, P. E., Castonguay, P., and Jameson, A., “A New Class of High-Order Energy Stable Flux Reconstruction Schemes,” Journal of Scientific Computing, Vol. 47, No. 1, April 2011, pp. 50–72. doi:10.1007/s10915-010-9420-z
[15] Vincent, P. E., Farrington, A. M., Witherden, F. D., and Jameson, A., “An Extended Range of Stable-Symmetric-Conservative Flux Reconstruction Correction Functions,” Computer Methods in Applied Mechanics and Engineering, Vol. 296, Nov. 2015, pp. 248–272. doi:10.1016/j.cma.2015.07.023
[16] Karniadakis, G., and Sherwin, S. J., Spectral/hp Element Methods for Computational Fluid Dynamics, Oxford Univ. Press, Oxford, England, U.K., 2005.
[17] Kirby, R. M., and Karniadakis, G., “De-Aliasing on Non-Uniform Grids: Algorithms and Applications,” Journal of Computational Physics, Vol. 191, No. 1, 2003, pp. 249–264. doi:10.1016/S0021-9991(03)00314-0
[18] Kirby, R. M., and Sherwin, S. J., “Aliasing Errors Due to Quadratic Nonlinearities on Triangular Spectral/hp Element Discretisations,” Journal of Engineering Mathematics, Vol. 56, No. 3, 2006, pp. 273–288. doi:10.1007/s10665-006-9079-5
[19] Swalwell, K. E., “The Effect of Turbulence on Stall of Horizontal Axis Wind Turbines,” Ph.D. Thesis, Monash Univ., Melbourne, Australia, 2005.
[20] Garbaruk, A., Shur, M. L., Strelets, M. Kh., and Travin, A. K., “3 NACA0021 at 60° Incidence,” DESider, A European Effort on Hybrid RANS-LES Modelling: Results of the European-Union Funded Project, 2004-2007, edited by W. Haase, M. Braza, and A. Revell, Springer–Verlag, Berlin, 2009, pp. 127–134, Chap. V Applications: Test Cases.
[21] Garbaruk, A., Leicher, S., Mockett, C. R., Spalart, P. R., Strelets, M. Kh., and Thiele, F. H., “Evaluation of Time Sample and Span Size Effects in DES of Nominally 2D Airfoils Beyond Stall,” Progress in Hybrid RANS-LES Modelling, Vol. 111, edited by S.-H. Peng, P. Doerffer, and W. Haase, Springer–Verlag, Berlin, 2010, pp. 87–99.
[22] Geuzaine, C., and Remacle, J.-F., “Gmsh: A 3-D Finite Element Mesh Generator with Built-in Pre- and Post-Processing Facilities,” International Journal for Numerical Methods in Engineering, Vol. 79, No. 11, Sept. 2009, pp. 1309–1331. doi:10.1002/nme.v79:11
[23] Kim, S.-S., Kim, C., Rho, O.-H., and Hong, S. K., “Cures for the Shock Instability: Development of a Shock-Stable Roe Scheme,” Journal of Computational Physics, Vol. 185, No. 2, March 2003, pp. 342–374. doi:10.1016/S0021-9991(02)00037-2
[24] Cockburn, B., and Shu, C.-W., “The Local Discontinuous Galerkin Method for Time-Dependent Convection-Diffusion Systems,” SIAM Journal on Numerical Analysis, Vol. 35, No. 6, 1998, pp. 2440–2463. doi:10.1137/S0036142997316712
[25] Kennedy, C. A., Carpenter, M. H., and Lewis, R. M., “Low-Storage, Explicit Runge–Kutta Schemes for the Compressible Navier–Stokes Equations,” Applied Numerical Mathematics, Vol. 35, No. 3, Nov. 2000, pp. 177–219. doi:10.1016/S0168-9274(99)00141-5
[26] Hairer, E., and Wanner, G., Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems, Springer–Verlag, Berlin, 1996.
[27] Hunt, J. C. R., Wray, A. A., and Moin, P., “Eddies, Streams, and Convergence Zones in Turbulent Flows,” Proceedings of the Summer Program, Center for Turbulence Research, Center for Turbulence Research, Stanford Univ., 1998, pp. 193–208.

H. Blackburn
Associate Editor
