Deep reinforcement learning based synthetic jet control on disturbed flow over airfoil
Cite as: Phys. Fluids 34, 033606 (2022); doi: 10.1063/5.0080922
Submitted: 5 December 2021; Accepted: 13 February 2022; Published Online: 4 March 2022

Yi-Zhe Wang (王依哲),1 Yu-Fei Mei (梅宇飞),2 Nadine Aubry,3 Zhihua Chen (陈志华),1 Peng Wu (吴鹏),4 and Wei-Tao Wu (吴威涛)2,a)

AFFILIATIONS
1 Key Laboratory of Transient Physics, Nanjing University of Science and Technology, Nanjing 210094, China
2 School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
3 Department of Mechanical Engineering, Tufts University, Medford, Massachusetts 02155, USA
4 Artificial Organ Technology Laboratory, School of Mechanical and Electric Engineering, Soochow University, Suzhou 215137, China

a) Author to whom correspondence should be addressed: weitaowwtw@njust.edu.cn

ABSTRACT
This paper applies deep reinforcement learning (DRL) to the synthetic jet control of flow over an NACA (National Advisory Committee for Aeronautics) 0012 airfoil under weakly turbulent conditions. Based on the proximal policy optimization method, an appropriate strategy for controlling the mass rate of a synthetic jet is successfully obtained at Re = 3000. The effectiveness of the DRL based active flow control (AFC) method is first demonstrated by studying the problem with a constant inlet velocity, where a remarkable drag reduction of 27.0% and a lift enhancement of 27.7% are achieved, accompanied by the elimination of vortex shedding. Then, the complexity of the problem is increased by changing the inlet velocity condition and the reward function of the DRL algorithm. In particular, inlet velocity conditions pulsating at two different frequencies and their combination are applied, where the airfoil wake becomes more difficult to suppress dynamically and precisely, and the reward function additionally contains the goal of saving the energy consumed by the synthetic jets. After training, the DRL agent is still able to find a proper control strategy, where significant drag reduction and lift stabilization are achieved, and the agent trained with the energy-saving objective reduces the energy consumption of the synthetic jets by 83%. The performance of the DRL based AFC proves the strong ability of DRL to deal with fluid dynamics problems, which usually exhibit high nonlinearity, and serves to encourage further investigations on DRL based AFC.
Published under an exclusive license by AIP Publishing. https://doi.org/10.1063/5.0080922

I. INTRODUCTION

Wing performance has a significant impact on the climb rate, payload capacity, and endurance of an airplane and can be improved by structural design or proper flow control. In the past decades, researchers have paid much attention to utilizing passive flow control (PFC) and active flow control (AFC) to improve the performance of airplane wings. The former is fixed at the design stage and cannot be changed during operation, and thus lacks flexibility and adaptability, while the latter is a controllable, proactive approach that has the potential to change the flow characteristics instantaneously, promising larger industrial benefits and broader applications.

Active flow control applied to airfoils has been a topic of investigation for many years, where "active" refers to techniques in which energy is expended to modify the flow.1 Based on different airfoil structures, assembled with or without flaps,2 examples of previous research on AFC include using co-flow jets or synthetic jets3-6 to delay transition and control flow separation, and attaching dielectric barrier discharge (DBD) plasma actuators to the surface of the airfoil to accelerate the air.7-9 Although the methods are distinctive, finding efficient strategies for AFC remains a challenge because of the high nonlinearity of the Navier-Stokes equations, which leads to high-dimensional control parameter spaces. Additional challenges, such as disturbances inherent to the physical environment and errors of sensors and actuators, also exist at the application level and impose hard requirements on the ability of control algorithms to adapt robustly to external conditions.10,11 Hence, for a complex flow problem, the main issue with AFC is that it is difficult to find robust and efficient algorithms that can perform effective control.

In the past few years, the rapidly developing field of artificial intelligence (AI) or machine learning (ML) has become a key solution for problems of diverse disciplines involving big data, high nonlinearity, and high dimensionality.12-14


ML, specifically including genetic programming, supervised/unsupervised learning, deep learning, and reinforcement learning, has already found applications in fluid mechanics. For example, genetic programming has been applied to search for explicit control laws that reduce the recirculation zone behind a backward-facing step15 and to suppress vortex-induced vibrations in numerical simulation environments.16-18 The proper orthogonal decomposition (POD) method has been utilized to construct reduced-order models via deep neural networks, preserving accuracy at significantly lower offline and online cost.19 Additionally, convolutional neural networks have been used to extract features and construct a mapping between the temporal evolution of the pressure signal and the velocity field.20-22 These successful investigations indicate the applicability and promise of machine learning in fluid mechanics.

Compared with other machine learning methods, deep reinforcement learning (DRL) seems more suitable for AFC because of its major advantage that it only requires a well-defined (s_t, a_t, r_t) interface from the environment and can be used as an agnostic tool for control and optimization tasks in both continuous and discrete settings. It allows the agent, usually modeled by a deep neural network, to find an optimized control strategy through trial and error, even when no solution is known a priori. DRL has attracted increasing attention23 following its many successes in robotics control24 and sophisticated game playing such as Go.25,26 Literature on the application of DRL in fluid mechanics from the past few years27 includes optimizing sensor placement,28 aerodynamic shape optimization for a better drag-lift ratio,29,30 flow separation control,31,32 and synthetic jets for drag reduction.33 The challenging systems that have been successfully resolved by DRL have properties of nonlinearity and high dimensionality remarkably similar to AFC.10,34 Consequently, DRL can be treated as a promising avenue for AFC of complex flow systems.

AFC can be either open-loop or closed-loop, depending on whether flow information is collected as feedback to adjust the controller. Recently, Ghraieb et al.35 studied the open-loop control of laminar and turbulent flows using single-step proximal policy optimization (PPO, one of the state-of-the-art DRL algorithms), a novel "degenerate" version of the PPO algorithm. Compared with open-loop controls, well-designed closed-loop controls can be more adaptive and effective over a wider range of flow conditions,36 despite being more challenging. For instance, using the standard PPO method,37 Rabault et al.33 studied closed-loop continuous flow control of a circular cylinder immersed in a laminar channel flow at Reynolds number Re = 100 using a pair of jets at the top and bottom of the cylinder, where approximately 8% drag reduction was achieved. Subsequently, an analogous model equipped with four jets at a higher Reynolds number was studied, which indicated that the flow can still be well controlled, and a 38.7% drag reduction was achieved.10 Lately, the DRL method was applied to control a weakly turbulent flow regime, i.e., Re = 1000, with a remarkable drag reduction of around 30%. All relevant studies indicate the potential of closed-loop DRL in AFC with more complicated flow conditions.

In the current work, based on computational fluid dynamics (CFD) simulations, DRL is applied to the AFC of flow over an airfoil under weakly turbulent conditions. The flow control is achieved through synthetic jets. We first verify the feasibility and efficiency of our DRL-trained AFC with a constant inlet velocity condition at an unprecedented Reynolds number, Re = 3000. Then, the flow condition is made more complicated by applying a time-dependent pulsating incoming flow, where the airfoil wake becomes more difficult to suppress. Specifically, we consider both drag reduction and energy saving of the synthetic jets as optimization aims, which further increases the challenge for the DRL to find a proper control strategy.

II. METHODS

This section is divided into three main parts: description of the studied problem, methodology of the CFD numerical simulation, and application of the DRL algorithm. The DRL algorithm is developed in Python based on the open source framework Tensorforce,38 which builds on the TensorFlow framework.39

A. Problem description

The entire computational domain is a two-dimensional rectangle of size L_x x L_y = 3.5D x 1.4D, as depicted in Fig. 1, where D is the chord length of the NACA0012 airfoil. The leading edge of the airfoil is located on the horizontal centerline, 0.5D away from the left domain boundary. For performing AFC, three synthetic jets are placed on the upper surface of the airfoil, horizontally 0.2D, 0.3D, and 0.4D away from the leading edge. The net mass flow rate of the three jets, which are controlled by the DRL agent, is forced to be zero, and the jets are directed perpendicular to the airfoil surface. The injection velocity of the jets can be positive or negative, corresponding to blowing or suction, respectively. Under this configuration there could be an extra injected momentum; to ensure that the observed drag reduction is the result of flow control rather than some sort of propulsion phenomenon, the injected mass flow rates are kept below 3% of the momentum intercepting the airfoil, and this small propulsion effect is therefore neglected in the following discussion.

B. Numerical simulation method

In the present study, the flow is assumed to be viscous and incompressible. The continuity and momentum equations can be expressed in nondimensional form as

$$\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot(\nabla \mathbf{u}) = -\nabla p + Re^{-1}\,\Delta \mathbf{u}, \qquad \nabla\cdot\mathbf{u} = 0, \tag{1}$$

where u is the nondimensional velocity, t is the nondimensional time, and p is the nondimensional pressure. The characteristic length and density are the chord length D and the fluid density ρ, and the characteristic time is chosen as the duration of one action, Δt_action. The Reynolds number is defined as Re = ŪD/ν, where ν is the kinematic viscosity of the fluid.40

To close Eq. (1), the boundary of the domain is partitioned (see also Fig. 1) into an inflow part Γ_in, a no-slip part Γ_wall, an outflow part Γ_out (free outlet with zero velocity gradient and constant pressure), and three separate jet parts Γ_i (i = 1, 2, 3). Besides the top and bottom boundaries, the no-slip part also includes the surface of the airfoil. The inflow velocity in the streamwise direction (Γ_in) is defined as

$$U_{inlet}(y) = U_{para}(y)\, f(t), \tag{2}$$


FIG. 1. Geometrical description of the configuration and boundary conditions used for simulating the flow past the NACA0012 airfoil immersed in a two-dimensional flow domain. The leading edge of the airfoil is set as the origin of the coordinate system. Γ_in is the inlet boundary, while Γ_out represents the outflow. Γ_wall corresponds to the no-slip boundary conditions implemented at the solid walls. The jets marked by red arcs on the upper surface of the airfoil are denoted by Γ_i (i = 1, 2, 3), with a negligible width.

where U_para(y) is specified as a parabolic profile, referring to previous work,41 and f(t) is a function describing the time-varying characteristics of the inlet in the different cases: one is constant and the three others are time-dependent pulsations,

$$U_{para}(y) = 4U_m(0.7D - y)(0.7D + y)/L_y^2, \tag{3}$$

$$f(t) = \begin{cases} 1 & \text{Constant}, \\ 1 + 0.25\sin(\omega t) & \text{Form I}, \\ 1 + 0.25\sin(2\omega t) & \text{Form II}, \\ 1 + 0.25\sin(0.5\omega t) + 0.25\sin(2\omega t) & \text{Form III}, \end{cases} \tag{4}$$

where L_y = 1.4D is the height of the domain (along the y axis, as depicted in Fig. 1), U_m = 0.45 is the maximum inflow velocity, and ω determines the frequency of the pulsation. Using this velocity profile, the mean velocity can be calculated as

$$\bar{U} = \frac{1}{L_y}\int_{-0.7D}^{0.7D} U_{inlet}(y)\,\mathrm{d}y = \frac{2}{3}U_m. \tag{5}$$

The Reynolds number in the present work is Re = 3000.
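The inlet profile of Eqs. (2)-(5) is compact enough to restate in code. The following Python sketch (names such as u_inlet and omega are illustrative, not taken from the solver setup) evaluates the parabolic profile, the four temporal forms, and the resulting mean velocity.

```python
import numpy as np

D = 1.0          # chord length (characteristic length)
L_y = 1.4 * D    # domain height
U_m = 0.45       # maximum inflow velocity

def u_para(y):
    """Parabolic inflow profile, Eq. (3): zero at y = +/-0.7D, U_m at y = 0."""
    return 4.0 * U_m * (0.7 * D - y) * (0.7 * D + y) / L_y**2

def f(t, form, omega=1.0):
    """Temporal modulation of the inlet, Eq. (4)."""
    if form == "constant":
        return 1.0
    if form == "I":
        return 1.0 + 0.25 * np.sin(omega * t)
    if form == "II":
        return 1.0 + 0.25 * np.sin(2.0 * omega * t)
    if form == "III":
        return 1.0 + 0.25 * np.sin(0.5 * omega * t) + 0.25 * np.sin(2.0 * omega * t)
    raise ValueError(form)

def u_inlet(y, t, form="constant", omega=1.0):
    """Streamwise inlet velocity, Eq. (2)."""
    return u_para(y) * f(t, form, omega)

# Mean velocity over the inlet, Eq. (5): (1/L_y) * integral of U_para = (2/3) U_m
y = np.linspace(-0.7 * D, 0.7 * D, 2001)
U_bar = np.trapz(u_para(y), y) / L_y
print(U_bar, 2.0 / 3.0 * U_m)   # both approximately 0.3
```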


A structured mesh of approximately 12 000 cells has been adopted after examining the grid resolution effect. The grid is refined around the synthetic jet actuators, the surface boundary layer, and the wake flow area to ensure appropriate resolution in the important flow regions (as shown in Fig. 2). The transient incompressible flow solver of OpenFOAM was used, and the PIMPLE algorithm was adopted for velocity-pressure coupling. Details on the grid independence study, the numerical schemes, and the influence of the inlet location are given in the Appendix.

FIG. 2. Mesh of the computational domain. To better capture the influence of the actuations on the flow simulation, the mesh is refined near the airfoil, jets, and wake flow.

The numerical solution is obtained at each time step, and the drag F_D and lift F_L are integrated over the surface of the airfoil, i.e.,

$$F_D = \int (\boldsymbol{\sigma}\cdot\mathbf{n})\cdot\mathbf{e}_x\,\mathrm{d}S \tag{6}$$

and

$$F_L = \int (\boldsymbol{\sigma}\cdot\mathbf{n})\cdot\mathbf{e}_y\,\mathrm{d}S, \tag{7}$$

where σ is the Cauchy stress tensor, n is the unit vector normal to the outer airfoil surface, and e_x = (1, 0), e_y = (0, 1). The drag and lift coefficients are, respectively, normalized as

$$C_D = \frac{2F_D}{\rho \bar{U}^2 D} \tag{8}$$

and

$$C_L = \frac{2F_L}{\rho \bar{U}^2 D}. \tag{9}$$

The Strouhal number (St), which is used to describe the characteristic frequency of oscillating flow phenomena, is defined as

$$St = f_s D/\bar{U}, \tag{10}$$

where f_s is the shedding frequency computed from the periodic evolution of the lift coefficient C_L.

In addition, the control is established on the basis that the total mass flow rate injected by the jets is zero, i.e., the nondimensional mass flow rates satisfy Q_1^* + Q_2^* + Q_3^* = 0. The injected mass flow rates are normalized as

$$Q_i^* = \frac{Q_i}{Q_{ref}} = \frac{U_{jet,i}\, l_{jet}}{\bar{U}\, D}, \quad i = 1, 2, 3, \tag{11}$$

where Q_ref is the reference mass flow rate with respect to the reference length D, and l_jet is the width of the jets, which is only 1.8% of the chord length D. The total flow rate of the jets should be limited so that the aerodynamic force is not changed by the momentum of the jets. The summation of |Q_i^*| is restricted to no more than 7.2%, i.e., Σ_{i=1}^{3} |Q_i^*| < 0.072.

All the information required for the reinforcement learning is extracted from the numerical simulation and transferred to the agent. For the proof-of-feasibility demonstration and to lower the computational cost, the mesh density is kept as low as possible, but it can easily be increased later.

C. Framework and DRL algorithm

As stated previously, we intend to use DRL as a closed-loop control method to seek an appropriate strategy for controlling the flow over the airfoil. Generally, such a problem is set up as a framework that consists of two parts: environment and agent. For the current problem, the environment is the numerical simulation of the flow over the NACA0012 airfoil under specific inlet conditions, and the agent, which is designed based on the proximal policy optimization (PPO) algorithm, interacts with the environment through three channels: (1) the state s_t, an array of probe measurements of pressure and velocity from the simulation; (2) the action a_t, the injection velocity of the jets, given by the learning agent and delivered into the flow environment of the numerical simulation; and (3) the reward r_t, which combines the aims of drag reduction and lift oscillation suppression, as well as saving the energy consumed by the synthetic jets. A simplified overview of the framework used in the present study is schematically depicted in Fig. 3.

FIG. 3. Sketch of the DRL framework utilized and the learning process for one episode. Generally, each episode consists of two stages (the dashed boxes): The first stage comprises the interaction between the agent and the environment over a sufficient number of steps, which is a hyperparameter of the algorithm, with the data exchanged in a closed loop; in the second stage, the agent updates the network parameters based on the experience collected during the interaction and resets the environment for the next episode.

As sketched in Fig. 3, a whole learning episode can be separated into two stages: interaction and evolution. In the interaction stage, the system collects the velocity or pressure data from the probes, and in order to obtain more key information for AFC, the probes are densely placed near the airfoil and its tail, see Fig. 4. The sampled data/signal forms an array that is sent as the state to the actor net and the critic net, see the arrow path from left to right in Fig. 3. The actor net and the critic net are the two major components of the PPO algorithm. The actor net is a two-layer fully connected network with 512 neurons in each layer, with the state as its input; after parameterizing the decision policy distribution, it outputs a certain action, i.e., the jet velocity, which is applied to the simulation environment (the arrow path from the actor net to the environment). The critic net has the same structure as the actor net but different input and output, where the input is a combination of (s_t, a_t, r_t) and the output is the advantage estimator Â_t computed from the state value function V(s),42,43 given by

$$\hat{A}_t = \delta_t + (\gamma\lambda)\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t-1}\delta_{T-1}, \tag{12}$$

where δ_t = r_t + γV(s_{t+1}) − V(s_t), and the discount γ and the GAE (generalized advantage estimation) parameter λ are both hyperparameters. The above data transfer forms a closed loop, and after enough cycle steps, the first stage of interaction is accomplished. Then, DRL proceeds to the second stage, in which the agent calculates the loss function as follows:

$$L_t^{CLIP+VF+S}(\theta) = \hat{\mathbb{E}}_t\left[L_t^{CLIP}(\theta) - c_1 L_t^{VF}(\theta) + c_2 S[\pi_\theta](s_t)\right]. \tag{13}$$

The loss function consists of three subfunctions: the clipped surrogate objective $L_t^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$, which is designed as a constraint [utilizing the clipping function clip(·)] to prevent excessively large policy updates; the squared-error loss $L_t^{VF}(\theta) = \left(V_\theta(s_t) - V_t^{targ}\right)^2$, where $V_\theta(s_t)$ is the value derived from the critic net and $V_t^{targ}$ represents the target value at time t; and the entropy bonus $S[\pi_\theta](s_t)$, where $\pi_\theta$ refers to the current policy with network weights θ. Here c_1, c_2, and ε are hyperparameters; see Table I for the hyperparameter settings.
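To make the quantities in Eqs. (12) and (13) concrete, the sketch below gives a minimal NumPy implementation of the generalized advantage estimator and the clipped surrogate objective. It is illustrative only, omits the entropy and value-function bookkeeping handled internally by Tensorforce, and uses variable names of our own choosing rather than the library's.

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.97):
    """Generalized advantage estimation, Eq. (12).
    rewards: r_0..r_{T-1}; values: V(s_0)..V(s_T) (one extra bootstrap value)."""
    T = len(rewards)
    adv = np.zeros(T)
    gae_t = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual delta_t
        gae_t = delta + gamma * lam * gae_t                      # discounted sum of deltas
        adv[t] = gae_t
    return adv

def clipped_surrogate(logp_new, logp_old, adv, eps=0.2):
    """Clipped surrogate objective L^CLIP entering Eq. (13) (to be maximized)."""
    ratio = np.exp(logp_new - logp_old)                 # probability ratio r_t(theta)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return np.mean(np.minimum(unclipped, clipped))

# toy usage with random data
rng = np.random.default_rng(0)
r = rng.normal(size=10)
v = rng.normal(size=11)
A = gae(r, v)
print(clipped_surrogate(0.01 * rng.normal(size=10), np.zeros(10), A))
```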


FIG. 4. Location of the sampled probes, which are indicated by black dots.

TABLE I. PPO hyperparameters used in the present work.

Hyperparameter            Value
Discount (γ)              0.99
GAE parameter (λ)         0.97
VF coefficient (c_1)      1
Entropy coefficient (c_2) 0.01
Clipping ratio (ε)        0.2

Then, the Adam optimizer is applied to update the network parameters as follows:

$$\theta_t \leftarrow \theta_{t-1} - \alpha\, g(\theta_{t-1}), \tag{14}$$

where α is the learning rate and g(·) represents complex functions with respect to θ, which are omitted here since they are not the focus of this study; for more details, see the related work by Kingma and Ba.44 One term needing special attention is the learning rate α because of its huge impact on the results; it is nearly the most important hyperparameter in any DRL algorithm. A study on the selection of an appropriate learning rate is presented in Sec. III C. After the network update step, the environment is reset for the next episode. For more details on the DRL algorithm, the reader is referred to Ref. 37.

Furthermore, like Rabault et al.,33 after an action is chosen by the agent, the control value (jet velocity) is determined by an exponential decay law as follows:

$$v_{t+1} = v_t + \beta(a - v_t), \tag{15}$$

where v_t is the jet velocity at the previous time step, v_{t+1} is the current jet velocity, a is the action provided by the agent, and β is a coefficient used to adjust the smoothness of the jet velocity variation. Previous related work indicates that the precise value of this coefficient has little effect on the performance of the flow control;45 therefore, we set β = 0.1. In order to find a globally optimal solution, the PPO algorithm has an exploration mechanism; that is, even when the agent has a relatively well-developed policy, the algorithm still has a certain chance (20% in the current work) of choosing a random action. Although this causes an oscillation near the optimal solution, it is generally conducive to the convergence of the entire learning process. Additionally, to prevent the trained agent from only being able to handle a single flow field condition, at the end of each episode there is a 20% probability that the entire environment is completely reset to the initial condition.

A valid and efficient agent should be able to achieve certain goals through specified reward functions. In the present work, there are two kinds of reward functions/policies, targeting drag reduction and reduced lift fluctuation, which can be expressed as

$$r_1 = -\left(\langle C_D\rangle - m_1\right) - 0.2\,\langle |C_L - m_2|\rangle, \tag{16}$$

which, when the energy saving condition is considered, becomes

$$r_2 = -\left(\langle C_D\rangle - m_1\right) - 0.2\,\langle |C_L - m_2|\rangle - \left\langle \sum_{i=1}^{3} |W_i| \right\rangle, \tag{17}$$

where ⟨·⟩ indicates the average over one action step, and m_1 and m_2 are constants used to center the drag and lift coefficients for quicker convergence, with values 0.286 and 0.88516, respectively. In Eq. (16), the first term rewards the reduction of the drag coefficient, while the second term penalizes the lift oscillation. By contrast, Eq. (17) adds an additional term, which aims to save the energy consumed by the synthetic jets and is calculated as

$$W_i = \frac{1}{2}\, Q_i \left(\frac{Q_i}{\rho\, l_{jet}}\right)^2 \Delta t, \tag{18}$$

where Δt is the time step. For simplification, it is reasonable to assume that injecting or sucking the same quantity of fluid costs the same amount of work.
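As an illustration of how Eqs. (15)-(18) enter the training loop, the following Python sketch evaluates the two reward functions and the exponential smoothing of the jet velocity. The helper names and the per-step data layout are assumptions made for illustration, not the exact implementation used in this work.

```python
import numpy as np

M1, M2 = 0.286, 0.88516    # centering constants for C_D and C_L
RHO, L_JET = 1.0, 0.018    # nondimensional density and jet width (1.8% of the chord)

def smooth_action(v_prev, a, beta=0.1):
    """Exponential decay law for the applied jet velocity, Eq. (15)."""
    return v_prev + beta * (a - v_prev)

def jet_energy(Q, dt):
    """Energy consumed by one jet over a time step, Eq. (18)."""
    return 0.5 * Q * (Q / (RHO * L_JET)) ** 2 * dt

def reward_r1(cd_hist, cl_hist):
    """Drag reduction plus lift-fluctuation penalty, Eq. (16)."""
    return -(np.mean(cd_hist) - M1) - 0.2 * np.mean(np.abs(np.asarray(cl_hist) - M2))

def reward_r2(cd_hist, cl_hist, Q_jets, dt):
    """Eq. (17): Eq. (16) plus a penalty on the energy consumed by the jets."""
    w = np.mean([sum(abs(jet_energy(Q, dt)) for Q in step) for step in Q_jets])
    return reward_r1(cd_hist, cl_hist) - w

# toy usage over one action step (two CFD sub-steps, three jets per sub-step)
cd = [0.23, 0.24]
cl = [0.90, 0.88]
Q_per_substep = [[0.010, -0.005, -0.005], [0.008, -0.004, -0.004]]
print(reward_r1(cd, cl), reward_r2(cd, cl, Q_per_substep, dt=5e-4))
print(smooth_action(v_prev=0.0, a=0.02))
```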


The coefficients of the terms in the reward function are determined following two considerations. First, from the perspective of the training process, the reward function is designed to guide the environment modeling/understanding by the neural networks and to prompt the agent to effectively establish a mapping between the states and the optimal actions. The multiple optimization objectives need to be converted into the terms of the reward function, and it is important to balance the magnitude of each term so that every optimization objective is taken into account. Second, according to our experience, a negative reward for mistaken actions and an adversarial positive reward for favorable actions contribute to the convergence of the DRL algorithm. For the drag, lift, and jet flux values in the cases of this article, suitable coefficients are designed for each term as needed.

III. RESULTS AND DISCUSSION

The feasibility of the DRL based flow control method is first verified by comparing it with the baseline (the uncontrolled situation) to demonstrate its effectiveness. Then, the control algorithm is extended to several more complex situations. In addition, the optimization of the training process of the DRL agent is also studied in this section.

A. Effectiveness of DRL based active flow control

The effectiveness of the DRL method is now showcased by studying different flow regimes from simple to complex. It is necessary to declare that the time displayed in all cases has already been nondimensionalized, i.e., t = t*/Δt_action, where Δt_action is the time duration of one action. First, a fundamental case applying the constant parabolic inlet condition of Eqs. (2) and (4) and the reward function of Eq. (16) is analyzed to verify the feasibility of our method, and the result is shown in Fig. 5. In the figure, the left part with the darker gray background represents the flow initialization without control, where the flow reaches a periodically quasi-steady condition and the drag coefficient C_D varies with a period of about 20 nondimensional time units. From t* = 200, the AFC starts, and after an obvious converging phase lasting Δt* = 90 (drawn with a light gray background), the synthetic jets significantly change the flow structure, where a 27.0% drag reduction and a 27.7% lift enhancement are achieved. In other words, the DRL agent has acquired a proper AFC strategy to collaboratively drive the execution of the three jets to achieve the goal described by Eq. (16).

FIG. 5. Time-resolved value of the drag coefficient C_D, lift coefficient C_L, and mass flow rates of the three jets in the case without (baseline curve) and with (controlled curve) active flow control. For the controlled case, the AFC starts from a nondimensional time of 200.

Attention should be focused on the converging phase, that is, the evolution of the field from nondimensional time 200 to 290 (the portion with the light gray background in Fig. 5), which reveals the mechanism of how the AFC works, see Fig. 6. In Fig. 6, the flow fields near the airfoil are presented in order to indicate the stabilization process. It is obvious that vortex shedding is eliminated gradually, and the wake eventually reaches an almost steady state at nondimensional time 250. To show the effect of the AFC on the flow field in more detail, a comparison of the pressure and velocity fields of the controlled and baseline cases is displayed in Fig. 7, where the subfigures are plotted with isolines at four critical moments. By considering the snapshots in the left and right columns together, it can be found that when AFC is applied, the pressure drop along the flow direction is significantly weakened near the upper region of the airfoil. In other words, the agent homogenizes the pressure distribution through the AFC, with the control strategy given by the trained ANN (artificial neural network). Furthermore, the pressure distributions in the wake and in the bottom region of the airfoil are also changed, which leads to a drag reduction and a lift enhancement. Therefore, the DRL agent is capable of finding a proper strategy that guides controlled micro-momentum injection or suction to internally change the flow structure around the airfoil according to the goal of the flow control.

From the perspective of the control algorithm, the exploration and improvement process of the control strategy is indicated by the profiles of the rewards and the averaged drag coefficient, illustrated in Fig. 8. In this study, up to 10 000 calculation steps are set in one training episode, and the network parameters are updated every 20 episodes. Without doubt, more training episodes lead to a better performance. However, considering the trade-off between the agent performance and the time and computational cost, it is not worth spending too much for a tiny enhancement. Therefore, 500 episodes are used for training in the current work. Due to the presence of exploration noise, both curves fluctuate as expected. For better visualization of the training process, the thicker line in Fig. 8(a) is obtained by applying the Savitzky-Golay filter to the step-resolved reward data, and the thick line in Fig. 8(b) is the averaged drag coefficient in each episode. After the entire training process, the ANN is able to form a complete strategy to instruct the agent to perform effective AFC.
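The smoothing used for the curves in Fig. 8(a) can be reproduced with SciPy's Savitzky-Golay filter; the window length and polynomial order below are illustrative choices, not the values used for the figure, and the reward data are synthetic stand-ins.

```python
import numpy as np
from scipy.signal import savgol_filter

# step-resolved rewards collected during training (toy data standing in for Fig. 8(a))
rng = np.random.default_rng(1)
steps = np.arange(5000)
rewards = -0.3 + 0.25 * (1 - np.exp(-steps / 1500)) + 0.05 * rng.standard_normal(steps.size)

# Savitzky-Golay filter: fits a low-order polynomial in a sliding window,
# which preserves the learning trend while suppressing the exploration noise.
smoothed = savgol_filter(rewards, window_length=301, polyorder=3)
```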


FIG. 6. Evolution of the drag coefficient, lift coefficient, jets’ mass rates, and the corresponding velocity fields.

B. Investigation on more complex flow regimes

In consideration of the far more complicated flow regimes encountered in practice and the additional requirement of saving energy, our investigation is extended to more complex flow conditions and control requirements in this subsection. Energy saving is accounted for by the third term in Eq. (17). For the complexity of the flow condition, we introduce three patterns of pulsating inlet velocity; see Eq. (4) for the mathematical expression and Fig. 9 for a visualization of the velocity profiles. To indicate the robustness of the DRL algorithm, the additional energy saving term in the reward function and the different patterns of the pulsating inlet velocity are applied simultaneously. Figure 10 shows the evolution of the drag and lift coefficients of the baseline case and of the controlled cases with and without the energy saving term, where the inlet velocity of form I is applied. Similar to the study in Fig. 5, the uncontrolled phase and the phase of AFC initialization are distinguished by dark and light gray backgrounds. It is obvious in Fig. 10 that the agent considering energy saving is able to reduce the energy consumption of the synthetic jets by 83%, with only a minor extension of the control phase. The difficulty in convergence is believed to be caused by the increased nonlinearity of the reward function, that is, the energy saving term, which is cubic in the flow rate. To improve the convergence performance, we further use the concept of "mass flow rate saving" to replace the concept of "energy saving." As a result, the reward function becomes

$$r_2' = -\left(\langle C_D\rangle - m_1\right) - 0.2\,\langle |C_L - m_2|\rangle - 0.2\left\langle \sum_{i=1}^{3} |Q_i^*| \right\rangle, \tag{19}$$

where the last term penalizes the mass flow rate. Such a modification is reasonable because of the positive correlation between the mass flow rate and the energy. The comparison of the cases applying the reward function with and without the mass flow rate term shows a nearly identical transformation of the drag and lift profiles, as shown in Fig. 11. Although the new reward function yields a slightly worse energy saving performance, namely 59%, the convergence improvement suggests that the above mass-saving reward function is preferred when dealing with problems experiencing convergence issues.

According to these results, it can be concluded that the DRL algorithm is capable of finding the optimal control strategy in a wide range of flow and control conditions, i.e., its robustness has been validated. On the other hand, the results provide strong confidence in using the reward function of Eq. (19) in the following investigation.
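Relative to the sketch given after Eq. (18), the mass-saving variant of Eq. (19) only swaps the cubic energy penalty for a linear penalty on the nondimensional flow rates; a minimal illustration, again with assumed helper names and data layout, is:

```python
import numpy as np

M1, M2 = 0.286, 0.88516

def reward_r2_prime(cd_hist, cl_hist, Q_star):
    """Eq. (19): penalize the summed |Q*_i| instead of the (cubic) jet energy."""
    base = -(np.mean(cd_hist) - M1) - 0.2 * np.mean(np.abs(np.asarray(cl_hist) - M2))
    return base - 0.2 * np.mean(np.sum(np.abs(Q_star), axis=-1))

# Q_star: nondimensional flow rates of the three jets at each sub-step of one action
print(reward_r2_prime([0.23, 0.24], [0.90, 0.88], [[0.010, -0.005, -0.005]]))
```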


FIG. 7. Comparison of the pressure [left, (a)-(d)] and velocity [right, (e)-(h)] fields without (top part of each double panel) and with (bottom part of each double panel) active control.

The investigation is continued by further increasing the complexity of the flow condition. Figure 12 shows the evolution of the drag and lift coefficients of the controlled and baseline cases for the three inlet velocities shown in Fig. 9, where the reward function accounts for energy saving, and all initialization phases are omitted from the illustration for brevity. For the purpose of showing the robustness of the control strategy, each profile contains three complete periods of the flow pulsation. A remarkable drag reduction of approximately 28%, 51%, and 35% is obtained when the inlet waves are of forms I, II, and III, respectively; furthermore, the drag profiles of the controlled cases are significantly smoother than the baseline. The lift enhancement resulting from the AFC is discussed in detail in the following spectrum analysis.

To further analyze the transition of the flow field, the fast Fourier transform (FFT) is applied to investigate the frequency content of the time series of the drag and lift coefficients with and without AFC, where 30 000 numerical samples are used for inlet forms I and II and 60 000 for inlet form III (the sample number is chosen to cover three full cycles). For easier analysis, the drag and lift coefficients have their mean values subtracted before the FFT; thus, the purely oscillatory properties of the coefficients are revealed. Figure 13 shows the frequency-domain analysis of the drag and lift coefficients. In the case of form I, the oscillation of the drag coefficient is obviously weakened; in the case of form II, a very strong oscillation mode is observed, which has the same frequency as the inlet velocity; in the case of form III, a rather different primary frequency composition is presented, where the oscillation is suppressed at most frequencies and only the St of 0.7 is enhanced. For the lift coefficient, the situation is less complex than for the drag: only the St of 0.3 of form III shows enhancement, and the rest of the lift oscillation is weakened.

The results indicate that even for more complex flow regimes caused by highly disturbed incoming flows (10 Hz as in form II), the method can still seek out an appropriate control strategy to stabilize the flow field. To summarize, the above study highlights the strength of the AFC in controlling the flow and shows its significant potential in practical applications.
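A sketch of the spectral analysis described above: the mean is removed from a force-coefficient time series and the single-sided amplitude spectrum is expressed against the Strouhal number St = f D/Ū of Eq. (10); the sampling interval and the toy signal below are placeholders for the values quoted in the text.

```python
import numpy as np

def coefficient_spectrum(signal, dt, D=1.0, U_bar=0.3):
    """Return (St, amplitude) of a drag/lift coefficient time series."""
    x = np.asarray(signal) - np.mean(signal)      # remove the mean before the FFT
    n = x.size
    amp = np.abs(np.fft.rfft(x)) * 2.0 / n        # single-sided amplitude spectrum
    freq = np.fft.rfftfreq(n, d=dt)               # physical frequency
    St = freq * D / U_bar                         # Strouhal number, Eq. (10)
    return St, amp

# toy usage: 30 000 samples of a synthetic C_L signal oscillating at 2 Hz
dt = 5e-4
t = np.arange(30_000) * dt
cl = 1.0 + 0.1 * np.sin(2 * np.pi * 2.0 * t)
St, amp = coefficient_spectrum(cl, dt)
print(St[np.argmax(amp[1:]) + 1])                 # dominant Strouhal number
```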


FIG. 8. Illustration of the learning/training process: profiles of the rewards (a) and drag coefficient (b) obtained at each training episode. The profiles show fluctuation due to
the occasional exploration of the agent. The thicker line in figure (a) is the outcome obtained by applying the Savitzky–Golay filter on step-resolved rewards data, and the thick
line in figure (b) is the averaged drag coefficient in one episode.

C. Optimization of the agent training

In this subsection, we study the process of finding the most appropriate learning rate for obtaining an optimized agent from a machine learning perspective. In the DRL method, the learning rate represents the step size of the parameter update of the neural network containing the control policy, and it is one of the most important hyperparameters of the optimizer. For some forms of the loss function, if an improper learning rate is applied, there is a possibility that the agent misses the optimal policy, which usually corresponds to an oversized learning rate, or that the agent gets trapped in some local optimum and cannot escape from it, which corresponds to an undersized learning rate. Under such circumstances, it seems that a proper learning rate should be selected according to the loss function; however, in many problems the value landscape of the loss function is only available in implicit form. Therefore, there is no clear rule for properly choosing learning rates, and trial and error or experience is usually needed.
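The learning-rate study of this subsection amounts to repeating the training with everything fixed except α. A schematic of such a sweep is shown below, where train_ppo_agent is a stand-in for the actual Tensorforce training routine and is not part of the code used in this work.

```python
import numpy as np

def train_ppo_agent(learning_rate, episodes=500, seed=0):
    """Placeholder for one full DRL training run; returns the per-episode rewards.
    In the actual study this would wrap the Tensorforce PPO agent and the CFD environment."""
    rng = np.random.default_rng(seed)
    # toy learning curve whose convergence speed depends on the learning rate
    progress = 1.0 - np.exp(-np.arange(episodes) * learning_rate * 2e2)
    return -0.4 + 0.3 * progress + 0.03 * rng.standard_normal(episodes)

results = {}
for lr in (1e-3, 1e-4, 1e-5):
    rewards = train_ppo_agent(learning_rate=lr)
    # score each run by the mean reward over the last 50 episodes
    results[lr] = np.mean(rewards[-50:])

best_lr = max(results, key=results.get)
print(results, "best:", best_lr)
```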

FIG. 9. Visualization of the spatiotemporal profile at the inlet boundary. The complexity of the inlet velocity increases gradually from form I to form III. For a mathematical
expression of the inlet velocity, see Eq. (4).


FIG. 10. (a) Evolution of the drag and lift coefficients using different reward functions [Eqs. (16) and (17)] with pulsating inlet boundary condition, form I. (b) Energy expenditure
by applying the reward function with and without accounting for energy saving.

In the following study, the optimizer used in our PPO agent is Adam,44 one of the best algorithms based on the gradient descent method; that is, from any starting location on the value landscape, the optimizer effectively moves downhill by iteration and comes to rest at some local or global optimum. Three learning rates, α = 1 × 10⁻³, 1 × 10⁻⁴, and 1 × 10⁻⁵, are investigated, and a comparison is made for both reward functions with inlet velocity form I.

The performance evaluation mainly comprises the reward value, which reveals the convergence of the training process; see Fig. 14 and Table II. In Fig. 14, a similar trend and result can be observed for the different learning rates, except that the case with a learning rate of α = 1 × 10⁻⁴ lags by dozens of episodes, while the case with a learning rate of α = 1 × 10⁻³ shows a larger difference: it looks overset and pushes the agent toward a bad strategy in the early stage, and although an acceptable control policy is found in the end, such a learning rate is undoubtedly unstable. Table II shows the performance of the agents trained with the different reward functions and learning rates. For the cases using the reward function without energy saving, namely Eq. (16), the agent trained with α = 1 × 10⁻⁴ gives the greatest sacrifice of average lift but obtains the highest drag reduction in return, while the agents using the other two learning rates seem to have found balanced strategies between the drag and the lift. For the reward function with energy saving, the agent using α = 1 × 10⁻⁵ achieves the lowest mass flow rate, which is the additional target; meanwhile, this agent has performance comparable to the agent trained using the reward function without energy saving.

FIG. 11. (a) Evolution of the drag and lift coefficients using different reward functions [Eqs. (16) and (19)] with pulsating inlet boundary condition, form I. (b) Energy expenditure
by applying the reward function with and without the mass saving requirement.


FIG. 12. Evolution of the drag and lift coefficients of the controlled and baseline case with three inlet velocities shown in Fig. 9, where the reward function considers the energy
saving term. The drag and lift coefficients are drawn in red and blue curves, respectively, on the left, and the corresponding average value of the last pseudo-period is shown
on the right.

This subsection focuses on the learning rate to emphasize the influence of the hyperparameters of the algorithm.

IV. CONCLUSION

In this study, based on CFD simulations, DRL is applied to the AFC of flow over an airfoil under weakly turbulent conditions. This state-of-the-art method can utilize artificial neural networks to resolve the strongly nonlinear correlation between the active actuation and the flow condition and to find an optimal strategy for dealing with complex flow patterns. To the best of our knowledge, this may be the first study applying such a policy-based and model-free closed-loop method to active flow control on an airfoil model.

Some constraints are applied to avoid unrealistic situations:

FIG. 13. FFT analysis of the drag coefficient (red line) and lift coefficient (blue line). The top, middle, and bottom rows show the spectrum of the incoming flow, drag and lift
coefficient, respectively.

(1) the limitation of the mass flow rate of the synthetic jets, which ensures that the control effect is derived from the reconstruction of the flow structure rather than from some sort of suction/blowing propulsion; and (2) the smoothing law between two successive actions, which is defined to avoid nonphysical jumps of the actuators. Furthermore, the system uses random initialization to prevent fixed initial states, in order to guarantee the generalization of the control strategy. In addition, with an increase in the Reynolds number, the flow field will become more disordered and turbulent; as a result, it will also be more difficult to find an optimal control strategy, which motivates further investigation of problems with more complex conditions in the future.

The complexity of the environments used for training increases gradually by changing the reward function and the inlet velocity profile. The first training uses a constant inlet velocity, and the trained DRL agent is able to actively control the flow to reduce the drag by around 27.0%, as well as to stabilize the pseudo-periodically fluctuating lift and enhance the average lift by 27.7%. Then, three pulsating inlet velocity profiles are applied; after training, the DRL agent is still able to find a proper control strategy, where significant drag reduction and lift stabilization are obtained. Then, the influence of the learning rate is investigated, where a study is performed for both reward functions with inlet velocity form I. The results show a significant effect of the learning rate on the performance of the trained agent, and the drag reduction and lift stabilization/enhancement even show some trade-off as the learning rate changes.

This work proves the effectiveness of DRL based AFC in cases with an application-oriented geometric configuration and complex flow conditions. Although the computational cost inevitably increases with the problem complexity, the time required for finding an optimized control strategy is still modest considering the high nonlinearity of the studied problems. The surprising performance of the DRL based AFC on the drag reduction of the airfoil encourages its use for saving the energy of airplanes and its application to complex flow control problems, such as flows with more instabilities in boundary layers, stronger turbulence, and flow over objects with more complex geometries.


FIG. 14. Reward of the training process (a) without considering the energy saving term and (b) with considering the energy saving term at different learning rates of α = 1 × 10⁻³, 1 × 10⁻⁴, and 1 × 10⁻⁵.

TABLE II. Performance of the agent trained with different learning rates. Boldface denotes the values of the best performance for each reward function.

Reward     Learning rate (α)   C_D (averaged)    C_L (averaged)    Q* (averaged)
Eq. (16)   1 × 10⁻³            0.235 (↓27.5%)    1.318 (↑31.4%)    /
Eq. (16)   1 × 10⁻⁴            0.218 (↓32.7%)    1.155 (↑15.1%)    /
Eq. (16)   1 × 10⁻⁵            0.228 (↓29.5%)    1.281 (↑27.7%)    /
Eq. (19)   1 × 10⁻³            0.226 (↓30.0%)    1.272 (↑26.8%)    0.013
Eq. (19)   1 × 10⁻⁴            0.214 (↓33.8%)    1.115 (↑11.2%)    0.016
Eq. (19)   1 × 10⁻⁵            0.226 (↓30.0%)    1.276 (↑27.2%)    0.010
Baseline   /                   0.324             1.003             /

ACKNOWLEDGMENTS

This work is supported by the National Key Laboratory of Science and Technology on Helicopter Transmission (Nanjing University of Aeronautics and Astronautics, Grant No. HTL-O-20G06), the Natural Science Foundation of China (No. 11802135), the Fundamental Research Funds for the Central Universities (No. 30919011401), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant No. KYCX21_0262.

AUTHOR DECLARATIONS

Conflict of Interest
The authors have no conflicts to disclose.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.

APPENDIX: NUMERICAL SCHEMES AND MESH INDEPENDENCE STUDY

The transient incompressible flow solver of OpenFOAM was used, and the PIMPLE algorithm was adopted for velocity-pressure coupling. The spatial and temporal discretization schemes are shown in Table III. Considering the numerical simulation cost and accuracy, a grid resolution study is carried out.

TABLE III. Spatial and temporal discretization schemes used in the numerical simulation.

Spatial and temporal discretization schemes in OpenFOAM
timeSchemes         Backward
gradSchemes         Gauss linear
divSchemes          div(phi,U): Gauss linearUpwind grad(U); div((nuEff*dev2(T(grad(U))))): Gauss linear
snGradSchemes       Corrected
laplacianSchemes    Gauss linear corrected
Phys. Fluids 34, 033606 (2022); doi: 10.1063/5.0080922 34, 033606-13


Published under an exclusive license by AIP Publishing
Physics of Fluids ARTICLE scitation.org/journal/phf

FIG. 15. Comparison of the time-averaged velocity profiles obtained by different grid sizes. Grids 1, 2, 3, and 4 correspond to mesh densities 12 000, 18 000, 27 000, and 41 000, respectively. The velocity data are sampled from the positions of X/D = −0.25, 0, 0.25, 0.5, 0.75, 0.95, 1.5, and 2.

FIG. 16. Time-resolved value of the (a) drag coefficient CD and (b) lift coefficient CL obtained from the case with different upstream inlet locations, i.e., 0.5D, 1D, and 1.5D,
which are the distance between the inlet and leading edge of the airfoil.

All the velocity profiles simulated with the four grid sizes are close to each other at the different locations, as shown in Fig. 15. Grids 1, 2, 3, and 4 refer to mesh densities of about 12 000, 18 000, 27 000, and 41 000 cells, respectively.

The effect of the upstream location of the inflow boundary is investigated as well. Based on the same computational grid density, cases with different upstream inlet locations (0.5D, 1D, and 1.5D, which are the distances between the inlet and the leading edge of the airfoil) are simulated, and the evolution of the drag and lift coefficients of the airfoil is compared. The time-resolved values of C_D and C_L are plotted in Fig. 16. It can be observed that the curves fluctuate in a similar pattern except for an anticipated phase difference, i.e., the flow characteristics are not altered by the upstream inlet location.

REFERENCES
1. J. F. Donovan, L. D. Kral, and A. W. Gary, "Active flow control applied to an airfoil," in 36th AIAA Aerospace Sciences Meeting and Exhibit (AIAA, 1998), p. 210.
2. K. Ren, Y. Chen, C. Gao, and W. Zhang, "Adaptive control of transonic buffet flows over an airfoil," Phys. Fluids 32(9), 096106 (2020).
3. J. Zhang, K. Xu, Y. Yang, R. Yan, P. Patel, and G. Zha, "Aircraft control surfaces using co-flow jet active flow control airfoil," in 2018 Applied Aerodynamics Conference (AIAA, 2018), p. 3067.
4. Z. Liu and G. C. Zha, "Transonic airfoil performance enhancement using co-flow jet active flow control," in 8th AIAA Flow Control Conference (AIAA, 2016), p. 3472.
5. T. Research and M. Engineering, "Active flow control over a NACA 0015 airfoil using a ZNMF jet," in 15th Australasian Fluid Mechanics Conference, 2004.
6. M. Gul, O. Uzol, and I. S. Akmandor, "An experimental study on active flow control using synthetic jet actuators over S809 airfoil," J. Phys.: Conf. Ser. 524(1), 012101 (2014).
7. J. J. Wang, K. S. Choi, L. H. Feng, T. N. Jukes, and R. D. Whalley, "Recent developments in DBD plasma flow control," Prog. Aerosp. Sci. 62, 52–78 (2013).
8. T. C. Corke, P. O. Bowles, C. He, and E. H. Matlis, "Sensing and control of flow separation using plasma actuators," Proc. R. Soc. London, Ser. A 369(1940), 1459–1475 (2011).
9. V. Manoj Kumar and C. C. Wang, "Active flow control of flapping airfoil using OpenFOAM," J. Mech. 36(3), 361–372 (2020).
10. H. Tang, J. Rabault, A. Kuhnle, Y. Wang, and T. Wang, "Robust active flow control over a range of Reynolds numbers using an artificial neural network trained through deep reinforcement learning," Phys. Fluids 32(5), 053605 (2020).


11. M. Tadjfar and E. Asgari, "Active flow control of dynamic stall by means of continuous jet flow at Reynolds number of 1 × 10⁶," ASME J. Fluids Eng. 140(1), 011107 (2018).
12. F. Ren, H. Hu, and H. Tang, "Active flow control using machine learning: A brief review," J. Hydrodyn. 32(2), 247–253 (2020).
13. X. Wu, X. Peng, W. Chen, and W. Zhang, "A developed surrogate-based optimization framework combining HDMR-based modeling technique and TLBO algorithm for high-dimensional engineering problems," Struct. Multidiscip. Optim. 60(2), 663–680 (2019).
14. W. Chen, X. Li, and W. Zhang, "Suppression of vortex-induced vibration of a circular cylinder at subcritical Reynolds numbers using shape optimization," Struct. Multidiscip. Optim. 60(6), 2281–2293 (2019).
15. N. Gautier, J. L. Aider, T. Duriez, B. R. Noack, M. Segond, and M. Abel, "Closed-loop separation control using machine learning," J. Fluid Mech. 770, 442–457 (2015).
16. F. Ren, C. Wang, and H. Tang, "Active control of vortex-induced vibration of a circular cylinder using machine learning," Phys. Fluids 31(9), 093601 (2019).
17. Y. F. Mei, C. Zheng, N. Aubry, M. G. Li, W. T. Wu, and X. Liu, "Active control for enhancing vortex induced vibration of a circular cylinder based on deep reinforcement learning," Phys. Fluids 33(10), 103604 (2021).
18. H. Zhu, T. Tang, H. Zhao, and Y. Gao, "Control of vortex-induced vibration of a circular cylinder using a pair of air jets at low Reynolds number," Phys. Fluids 31(4), 043603 (2019).
19. S. A. Renganathan, R. Maulik, and V. Rao, "Machine learning for nonintrusive model order reduction of the parametric inviscid transonic flow past an airfoil," Phys. Fluids 32(4), 047110 (2020).
20. J.-Z. Peng, S. Chen, N. Aubry, Z.-H. Chen, and W. T. Wu, "Time-variant prediction of flow over an airfoil using deep neural network," Phys. Fluids 32(12), 123602 (2020).
21. J.-Z. Peng, S. Chen, N. Aubry, Z. Chen, and W. T. Wu, "Unsteady reduced-order model of flow over cylinders based on convolutional and deconvolutional neural network structure," Phys. Fluids 32(12), 123609 (2020).
22. J.-Z. Peng, N. Aubry, S. Zhu, Z. Chen, and W.-T. Wu, "Geometry and boundary condition adaptive data-driven model of fluid flow based on deep convolutional neural networks," Phys. Fluids 33(12), 123602 (2021).
23. S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annu. Rev. Fluid Mech. 52, 477–508 (2020).
24. S. Gu, E. Holly, T. Lillicrap, and S. Levine, "Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates," in 2017 IEEE International Conference on Robotics and Automation (IEEE, 2017), pp. 3389–3396.
25. D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, and Y. Chen, "Mastering the game of Go without human knowledge," Nature 550(7676), 354 (2017).
26. D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature 529(7587), 484–489 (2016).
27. P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle, and E. Hachem, "A review on deep reinforcement learning for fluid mechanics," Comput. Fluids 225, 104973 (2021).
28. R. Paris, S. Beneddine, and J. Dandois, "Robust flow control and optimal sensor placement using deep reinforcement learning," J. Fluid Mech. 913, A25 (2021).
29. J. Viquerat, J. Rabault, A. Kuhnle, H. Ghraieb, A. Larcher, and E. Hachem, "Direct shape optimization through deep reinforcement learning," J. Comput. Phys. 428, 110080 (2021).
30. X. He, J. Li, C. A. Mader, A. Yildirim, and J. R. R. A. Martins, "Robust aerodynamic shape optimization—From a circle to an airfoil," Aerosp. Sci. Technol. 87, 48–61 (2019).
31. S. Shimomura, S. Sekimoto, A. Oyama, K. Fujii, and H. Nishida, "Experimental study on application of distributed deep reinforcement learning to closed-loop flow separation control over an airfoil," in AIAA Scitech 2020 Forum (AIAA, 2020), p. 0579.
32. D. Fan, L. Yang, Z. Wang, M. S. Triantafyllou, and G. E. Karniadakis, "Reinforcement learning for bluff body active flow control in experiments and simulations," Proc. Natl. Acad. Sci. 117, 26091–26098 (2020).
33. J. Rabault, M. Kuchta, A. Jensen, U. Reglade, and N. Cerardi, "Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control," J. Fluid Mech. 865, 281–302 (2019).
34. D. Gao, H. Meng, Y. Huang, G. Chen, and W.-L. Chen, "Active flow control of the dynamic wake behind a square cylinder using combined jets at the front and rear stagnation points," Phys. Fluids 33(4), 047101 (2021).
35. H. Ghraieb, J. Viquerat, A. Larcher, P. Meliga, and E. Hachem, "Single-step deep reinforcement learning for open-loop control of laminar and turbulent flows," Phys. Rev. Fluids 6(5), 053902 (2021).
36. F. Ren, J. Rabault, and H. Tang, "Applying deep reinforcement learning to active flow control in weakly turbulent conditions," Phys. Fluids 33(3), 037121 (2021).
37. J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," arXiv:1707.06347 (2017).
38. A. Kuhnle, M. Schaarschmidt, and K. Fricke, see https://github.com/tensorforce/tensorforce for "Tensorforce: A TensorFlow library for applied reinforcement learning" (2017).
39. M. Abadi et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (USENIX Association, 2016), pp. 265–283.
40. P. B. S. Lissaman, "Low-Reynolds-number airfoils," Annu. Rev. Fluid Mech. 15(1), 223–239 (1983).
41. M. Schäfer, S. Turek, F. Durst, E. Krause, and R. Rannacher, "Benchmark computations of laminar flow around a cylinder," in Flow Simulation with High-Performance Computers II (Vieweg+Teubner Verlag, 1996), pp. 547–566.
42. J. Schulman, P. Moritz, S. Levine, M. I. Jordan, and P. Abbeel, "High-dimensional continuous control using generalized advantage estimation," in 4th International Conference on Learning Representations, ICLR 2016—Conference Track Proceedings, arXiv:1506.02438 (2016), pp. 1–14.
43. V. Mnih et al., "Asynchronous methods for deep reinforcement learning," in International Conference on Machine Learning (PMLR, 2016), Vol. 48, pp. 1928–1937.
44. D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in 3rd International Conference on Learning Representations, ICLR 2015—Conference Track Proceedings, 2015.
45. J. Rabault and A. Kuhnle, "Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach," Phys. Fluids 31, 094105 (2019).
