
AIAA JOURNAL

Vol. 57, No. 12, December 2019

Machine-Learning-Based Detection of Aerodynamic Disturbances Using Surface Pressure Measurements

Wei Hou,∗ Darwin Darakananda,† and Jeff D. Eldredge‡


University of California, Los Angeles, Los Angeles, California 90095
DOI: 10.2514/1.J058486
Aerodynamic disturbances (due to gusts, maneuvers, or a combination) leave a signature in the pressures exerted
on the wing surface. In this work, the following question is explored: To what extent can the characteristics of these
disturbances be parsed from the measured pressures alone? A supervised learning algorithm based on several layers
of neural networks is applied. The overall machine-learning architecture is trained and tested on aerodynamic
disturbance data generated by an inviscid vortex method applied to a two-dimensional flat plate undergoing a smooth
pitchup maneuver. As a surrogate for an incident gust, the critical leading-edge suction parameter (LESP) is
perturbed, which in turn dynamically changes the flux of the vorticity from the leading edge. The results are used to
train the algorithm to estimate the LESP and angle-of-attack histories from the surface pressure. Two different
approaches are used. In the first, which is a purely machine-learning strategy, a combination of convolutional and
recurrent neural networks that accept surface pressure measurements as input is used. The overall architecture is
shown to achieve an accurate estimation of the LESP and angle of attack. In the second approach, machine learning
is integrated with a dynamical systems framework to learn a dynamical model for the angle of attack and the LESP. It
is shown that this machine-learned system identification approach achieves somewhat higher accuracy than the purely
machine-learning approach, while using fewer parameters. In both approaches, it is shown that overfitting is
mitigated by injecting random noise into the input pressures.

Nomenclature

Ã = discrete system updating matrix in machine-learned system identification algorithm
a1 = bias vector of the first recurrent neural network layer
a2 = bias of the second recurrent neural network layer
c = chord length of the flat plate, m
J = loss function associated with the algorithm's prediction error
K = dimensionless pitch rate of the flat plate; Δα/(2Δtp)
N(μ, Σ²) = normal random variable with mean μ and standard deviation Σ
p^t = surface pressure coefficient at time step t
p̃^t = historical surface pressure coefficients of steps before t
p+, p− = pressure on the upper and lower surfaces of the flat plate, Pa
S = peak variation of σc
T = gust characteristic time, s
t = time, s
t0 = time at which the flat plate starts to pitch up, s
t1 = center time of the variation of the critical leading-edge suction parameter, s
U = velocity of the flat plate, m/s
U(a, b) = uniformly distributed random variable between a and b
ũ = discrete forcing vector in machine-learned system identification algorithm
w = parameters of the neural network
x = state vector of the dynamical system in machine-learned system identification algorithm
x0 = initial condition of the state vector in machine-learned system identification algorithm
yT = time series of the target values for neural network prediction
α = angle of attack, rad
α0 = angle of attack when starting the pitchup process, rad
γ = bound vortex strength of the flat plate, m/s
ΔCp = coefficient of pressure difference between upper and lower surfaces of the flat plate; 2(p+ − p−)/(ρU²)
Δtp = duration of pitchup of flat plate, s
Δα = change of angle of attack in the pitchup process, rad
Θ = weights of the second recurrent neural network layer
θ = transformed coordinate along the chord
Π = weights of the first recurrent neural network layer
ρ = fluid density, kg/m³
σ = leading-edge suction parameter
σc = critical leading-edge suction parameter
ϕ = operation of long short-term memory cell at the second recurrent neural network layer after multiplying the input with the weights and adding the bias
ψ = long short-term memory operation of first recurrent neural network layer and the outputs of the second recurrent neural network layer from the previous time step

Superscripts

t = discrete time step in machine-learned system identification algorithm
′ = ordinary derivative

I. Introduction

IN THIS study, we investigate the potential to apply deep learning to gust detection in unsteady aerodynamics.
During the course of an airborne vehicle's flight, there can be different forms of disturbances that affect the vehicle's aerodynamics. Some of these disturbances are directly attributable to a variety of environmental inhomogeneities, called "gusts," that are incident upon the vehicle. But such direct disturbances inevitably give rise to disturbances in the rigid-body motion of the vehicle. In a small, light flight vehicle, both of these forms of disturbance may induce leading-edge stall in the wings, necessitating some form of flow control to regulate the vehicle's flight. But the effectiveness of such control depends crucially on estimating the state of the flow from available sensors such as inertial measurements and surface-mounted pressure sensors. All aerodynamic disturbances leave traces in the surface pressure. To what extent can these surface signatures be used to detect the disturbances themselves and obtain information about the flowfield?

Deep learning has shown great potential in various fields, such as image recognition [1] and natural language processing [2]. An emerging interest in applying deep learning in traditional engineering fields has manifested itself in attempts to integrate machine learning with dynamical systems and fluid mechanics. There are several notable examples of these integrated studies. Lee et al. [3] used neural networks to reduce the drag in turbulent flow. Milano and Koumoutsakos [4] used wall measurements to reconstruct the near-wall flowfield in a turbulent channel flow. Brunton et al. [5] used a data-driven method to identify the governing equations of dynamical systems. Colvert et al. [6] used a neural network and local vorticity to classify vortex wakes. Ling et al. [7] used deep learning algorithms to improve a Reynolds-averaged Navier–Stokes turbulence model. Tompson et al. [8] used a convolutional neural network to accelerate the single time-step velocity update in an Euler solver. Otto and Rowley [9] used a neural network to identify nonlinear dynamics using Koopman operator theory. Parish and Duraisamy [10] used field inversion and machine learning to find closure for physical models using high-resolution simulation and experimental data. Singh et al. [11] used machine learning to aid the prediction of flow separation over airfoils. Wang et al. [12] used physics-informed machine learning to reconstruct the Reynolds stress in turbulent flow. Gautier et al. [13] used integrated machine learning to give closed-loop control of flow separation. Many more examples are described in the recent review by Brunton et al. [14]. However, there have been few applications of advanced deep learning algorithms to the study of unsteady aerodynamics.

Much of the challenge in applying machine learning to a given problem lies in deciding what one seeks to learn from a given set of input data. In the current context, we generally seek to learn something about the transient disturbances to an airfoil from the time-varying pressure distribution on the airfoil's surface. Incident gusts can take a wide variety of forms. However, ultimately, for flight control purposes, we seek not the characteristics of the gust itself but the response that it elicits in the aerodynamics of the airfoil. That response can be observed in the wake, and often (for large-amplitude gusts or for airfoils near stall) in the separation behavior near the leading edge. Such responses have been demonstrated in several recent experimental and computational studies (e.g., Refs. [15–18]) using various gust forcing modalities.

The overall airfoil response to a gust is attributable to vorticity dynamics, to airfoil kinematics, and to their interactions. Some of the vorticity may be the gust's own, because an airfoil at or near stall at the leading edge is a significant source of vorticity for which the rate of release is disturbed by the gust's incidence. This influence is missing from linearized models of gust interactions, as noted recently in Ref. [19]. In fact, in some cases, the effect of the gust is primarily delivered through the disturbed flux of the leading-edge vorticity and its subsequent interactions with the airfoil and with the vorticity released at the trailing edge [20]. Thus, we seek in this study to learn the time-varying release of vorticity from the leading edge of a flat plate from the measured surface pressures.

Ramesh et al. [21] showed, from high-fidelity simulations, that the release of vorticity from the leading edge of an airfoil can be attributed to a criterion placed on the "leading-edge suction parameter" (a normalized value of the integrated surface pressure in the immediate vicinity of the airfoil nose), which is abbreviated as LESP. If the LESP is within some critical bounds, then no vorticity is released from the leading edge. However, if the LESP exceeds the bounds, then vorticity is released with the strength required to return the LESP to the allowable range. This criterion is well suited for an inviscid vortex model of the aerodynamics because it can be used in lieu of the Kutta condition at the leading edge of a flat plate. Although Ramesh et al. [21] proposed that the critical LESP is invariant (i.e., determined only by the airfoil geometry and Reynolds number), recent work by Darakananda et al. [20,22] has shown that it may vary in time. In particular, these latter studies found, by assimilating surface pressures from high-fidelity simulations with an ensemble of inviscid vortex models of a flat plate at a fixed angle of attack, that their estimated value of the critical LESP varied in time when the flat plate was subjected to transient gusts. More important, the vortex models contained no explicit representation of the gust but reproduced the flow response with high accuracy through the use of the LESP criterion. Thus, in light of the observations made in the previous paragraph, it is possible to represent the influence of a weak gust in a mathematical model by varying the critical LESP in lieu of explicitly introducing the gust.

In this work, we seek to estimate the value of the time-varying LESP from surface pressure data. These pressure data are obtained from an inviscid vortex model in which the critical LESP has been directly specified to vary in time, as shown schematically in Fig. 1. In other words, we ask the following basic question: Can we learn the value of the LESP that has produced, through the aerodynamics, a certain measured surface response? This question may appear superficially simple to address because the LESP itself is due directly to pressure and, in fact (as we will discuss later in the paper), is proportional to the lowest Fourier mode on the plate surface. However, it is important to remember that, in the high-amplitude cases of interest in this work, the flow response (and the surface pressure) is nonlinearly dependent on the disturbed critical LESP, so that the surface pressures along the airfoil chord are not trivially related to the leading-edge suction. Indeed, because of this nonlinearity, it is not clear a priori that a measured distribution of surface pressure can be uniquely attributed to a single disturbance.
Fig. 1 Schematic of sensors (green circles) on a flat plate, shown here undergoing a representative transient change of angle of attack α(t) with a simultaneous time-varying critical LESP σc(t). The inviscid vortex elements are depicted as blue (positive strength) and red (negative strength) circles, with darker colors denoting stronger vortex elements.

There is already some evidence that a unique relationship between pressure and disturbance can be extracted. As mentioned previously, Darakananda et al. [20,22] have already found, using an ensemble Kalman filter, that the critical LESP can be estimated from the measured surface pressures when the plate is at a stationary nominal angle of attack. However, that study assimilated the data into an
ensemble of dynamical models for the flow. In other words, it used the surface pressures to augment an underrepresented description of the dynamics. The present study makes no use of such a dynamical model for interpreting the surface pressure. Furthermore, it investigates situations in which the plate is undergoing a simultaneous pitchup maneuver. Thus, the measured surface pressures contain a combination of effects from plate motion and from the leading-edge disturbance. Such effects are expected to enter the pressure nonlinearly. We investigate in this work whether the LESP and the time-varying angle of attack (AOA) can be independently parsed from the measurements.

In Sec. II, we present some sample surface pressures for various disturbances and maneuvers, and we describe the general characteristics of the inviscid vortex model used for data generation. In Sec. III, we introduce the machine-learning algorithms used in this work. The approach we take relies fundamentally on neural networks. For example, the initial layers of the algorithm apply convolutional neural networks (CNNs), which are known for their utility in image recognition [1], to detect features in the two-dimensional spatiotemporal surface pressure data. But we also seek to capture causal behaviors in the data; for this, we make use of recurrent neural networks (RNNs), specifically in the form known as long short-term memory (LSTM) [2]. From these ingredients, we introduce the neural network structure designed for detecting the LESP, and we describe the generation of training and testing data from a variety of cases with leading-edge disturbances and pitchup maneuvers. Subsequently, in Sec. IV, we present the estimation results of our trained algorithm and compare different methods for preventing overfitting. We also directly apply the algorithm, with some minor modification, to the task of estimating the angle of attack from the same surface pressure data. By comparing the internal values associated with the inner structure of the algorithms, we interpret how the neural network works. Finally, in Sec. V, we propose another algorithm, called machine-learned system identification (MLSID), with fewer parameters to learn, that uses machine learning to identify (from surface pressures) an underlying linear dynamical model for the desired quantities.

II. Problem Formulation

This study focuses on leading-edge disturbances and pitchup maneuvers applied to a two-dimensional infinitely thin flat plate of chord length c translating impulsively from rest at velocity U. The leading-edge disturbances will be referred to throughout this paper as gusts for brevity. However, it should be understood that the incident gust itself is not explicitly introduced in the generation of data; rather, the gust's effect is introduced by disturbing the critical LESP, as discussed in the Introduction (Sec. I). These different disturbances leave different signatures on the surface pressure of the plate, as shown with a few examples in Fig. 2. This figure depicts the coefficient of pressure difference between the upper and lower surfaces of the plate [ΔCp = 2(p+ − p−)/(ρU²)] generated by the inviscid vortex model described in the following. Such time-varying data, sampled uniformly in time, constitute the input in our various machine-learning strategies. The vertical axis of each plot represents the chordwise distribution, with the leading edge at the top and the trailing edge at the bottom. The horizontal axis depicts the time history in convective time units of t* = tU/c. Dark (blue) colors indicate more negative pressures.

Fig. 2 Spatiotemporal contour plots of pressure jump coefficient ΔCp for different disturbances: a) plate impulsively started without a gust [plotted range of ΔCp is (−6, 0)]; b) plate impulsively started and subjected to a gust [plotted range is (−6, 0)]; c) plate pitching up without a gust [plotted range is (−15, 0)]; d) plate pitching up and subjected to a gust [plotted range is (−15, 0)].
The surface pressure data are generated with an inviscid vortex blob method. This method, which is essentially the same as described by Ramesh et al. [21], uses point vortex elements for which the Biot–Savart interactions with each other are regularized through the use of a modified velocity kernel for which the blob radius is set to 0.01c. The method is applied to the flow about a translating flat plate undergoing a simultaneous change in angle of attack. The Kutta condition is applied at the trailing edge, whereas the LESP criterion is used at the leading edge. With the trailing-edge Kutta condition, the bound vortex sheet strength on the plate can be expanded in a standard Fourier form, as in thin airfoil theory, as

γ(θ, t) = 2U [ A0(t) (1 + cos θ)/sin θ + ∑_{n=1}^{∞} An(t) sin(nθ) ]   (1)

where the coordinate along the plate is given by

x = (c/2)(1 − cos θ)   (2)

with the leading edge at x = 0 (θ = 0) and the trailing edge at x = c (θ = π). The LESP σ, which is a signed value defined by Ramesh et al. [21] to be proportional to the strength of the leading-edge suction, is equivalent in these circumstances to the leading coefficient in the Fourier expansion:

σ(t) = 4A0(t)   (3)

It should be noted that the definition of the LESP used in Eq. (3) is four times that of the definition in the work of Ramesh et al. [21], and it represents the coefficient of the 1/sin θ term in the expansion as θ → 0. The LESP criterion monitors the instantaneous value of this parameter to check that it lies within the bounds

−σc(t) ≤ σ(t) ≤ σc(t)   (4)

where σc(t) denotes the current value of the critical LESP. In all cases considered in this work, σ > 0. If σ lies within these bounds, then no vorticity is released from the leading edge. If, instead, it exceeds the bounds, then a new vortex element is released near the edge to ensure that the updated value of σ is set to σc. In most cases, the result of this criterion is that, once the vorticity starts to be shed from the edge, σ tends to remain equal to σc for long time intervals. The strength of the new element is necessarily proportional to the amount by which the critical value is exceeded. It is useful to note that a critical value equal to zero would correspond to enforcing the Kutta condition at the leading edge.
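To make the shedding rule concrete, the following sketch (ours, not the authors' implementation) expresses the criterion in Python; the constant relating the excess suction to the physical strength of the released element is model specific and omitted here:

```python
import numpy as np

def apply_lesp_criterion(sigma, sigma_c):
    # Eq. (4): within the critical bounds, no vorticity is released.
    if abs(sigma) <= sigma_c:
        return sigma, 0.0
    # Outside the bounds: release a new element whose strength is
    # proportional to the excess, and reset sigma to the critical value.
    excess = abs(sigma) - sigma_c
    new_element_strength = np.sign(sigma) * excess  # up to a model-specific factor
    return np.sign(sigma) * sigma_c, new_element_strength
```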
As discussed in the Introduction (Sec. I), we simulate gusts in this work by varying the critical LESP. Specifically, we smoothly vary this critical value from a baseline of 0.44, which corresponds to a typical value found by Ramesh et al. [21] from offline calibration and by Darakananda et al. [20] from data assimilation for undisturbed flow past a flat plate at a moderate angle of attack. We generate a smooth variation in the critical LESP in a compact time interval of t ∈ [t1 − 2T, t1 + 2T], which is symmetric about t1, with a peak denoted by S:

σc(t) = 0.44 + S f((t − t1)/T)   (5)

where the smooth distribution f(τ) is inspired from the smooth ramp function given by Eldredge et al. [23] and is defined as

f(τ) = log[cosh(3(τ + 1)) cosh(3(τ − 1))/cosh²(3τ)] / (2 log cosh 3)  if τ ∈ [−2, 2];  f(τ) = 0 otherwise   (6)

The value of three in the function is a parameter that defines the smoothness of the change of the LESP and is sufficiently large that it ensures that f is approximately equal to zero at the ends of the interval, τ = −2 and τ = 2.

We also prescribe a smooth change in the angle of attack of the plate from α0 to α0 + Δα over an interval of t ∈ [t0, t0 + Δtp] using the same smooth ramp function as given by Eldredge et al. [23]:

α(t) = α0 + (K/6) log[cosh(6(t − t0))/cosh(6(t − t0 − Δtp))] + KΔtp   (7)

where K = Δα/(2Δtp) is the dimensionless pitch rate, and the factor of six is the smoothing parameter of the ramp.
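Both prescribed histories are simple to evaluate; the following Python sketch (function names are ours) implements Eqs. (5-7) directly:

```python
import numpy as np

def ramp(tau, a=3.0):
    # Smooth bump f(tau) of Eq. (6); nonzero only for tau in [-2, 2],
    # with smoothing parameter a = 3 and unit peak at tau = 0.
    tau = np.atleast_1d(np.asarray(tau, dtype=float))
    out = np.zeros_like(tau)
    m = (tau >= -2.0) & (tau <= 2.0)
    out[m] = np.log(np.cosh(a * (tau[m] + 1.0)) * np.cosh(a * (tau[m] - 1.0))
                    / np.cosh(a * tau[m]) ** 2) / (2.0 * np.log(np.cosh(a)))
    return out

def critical_lesp(t, t1, T, S):
    # Eq. (5): smooth gust-like disturbance about the 0.44 baseline,
    # peaking at sigma_c = 0.44 + S when t = t1.
    return 0.44 + S * ramp((t - t1) / T)

def alpha(t, alpha0, K, t0, dtp, a=6.0):
    # Eq. (7): smooth pitchup from alpha0 to alpha0 + 2*K*dtp,
    # where K = dalpha / (2 * dtp) is the dimensionless pitch rate.
    t = np.asarray(t, dtype=float)
    return alpha0 + (K / a) * np.log(np.cosh(a * (t - t0))
                                     / np.cosh(a * (t - t0 - dtp))) + K * dtp
```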
The result of each use of the vortex blob model, with the critical LESP and the angle of attack varied in these manners, is a spatiotemporal distribution of surface pressure ΔCp on the plate, as illustrated in the examples in Fig. 2. These surface pressures are evaluated ("sensed") at the Chebyshev nodes of the plate, which are expressed in chord coordinates as

xk = (c/2)(1 − cos θk)  for k = 0, 1, 2, …, n − 1   (8)

where θk = πk/(n − 1) and n = 128. The pressure points at the leading edge (k = 0) and the trailing edge (k = n − 1) are excluded due to their infinite magnitude. Thus, a total of 126 pressure measurements along the chord are taken. The output of the strategies is the time-varying LESP and/or angle of attack that generated such data.
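The sensor placement of Eq. (8) amounts to a few lines (assuming a unit chord for illustration):

```python
import numpy as np

c = 1.0                                  # chord length (unity here for illustration)
n = 128
theta = np.pi * np.arange(n) / (n - 1)   # theta_k = pi * k / (n - 1)
x = 0.5 * c * (1.0 - np.cos(theta))      # Eq. (8): Chebyshev nodes along the chord
sensors = x[1:-1]                        # drop the singular leading/trailing edges
assert sensors.size == 126               # the 126 pressure measurement locations
```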
Thus, each use of the vortex model with a choice of the gust and maneuver parameters samples a relationship

Y = F(σc(t), α(t))   (9)

where Y ∈ R^{M×N} denotes an M × N array of surface pressure data at M pressure sensor locations sampled at N discrete times. The primary objective of this work is to attempt to invert this relationship between the gust and maneuver parameters and the surface pressure. However, it should be noted that the critical LESP, as an upper bound on the LESP, is clearly unobservable in any data except when the LESP is actually at this bound; we can only seek to estimate the LESP itself. Thus, our goal is to learn the following inverse relationships:

Xg = Hg(Y; ξg),  Xm = Hm(Y; ξm)   (10)

where Xg and Xm ∈ R^N are, respectively, the LESP (with subscript g for gust) and angle-of-attack (with subscript m for maneuver) histories at the same discrete times as the pressure in Y; and ξg and ξm are the sets of the parameters (e.g., weights and biases) that constitute the relationships. For this task of approximating Hg and Hm, as well as determining their respective parameter sets ξg and ξm, we will use the machine-learning methods discussed in the following sections.

III. Machine-Learning Methodology

As discussed previously, the pitchup maneuvers and gusts result in disturbances in the surface pressure that vary both spatially and temporally. According to Chen and Chen [24], a general neural network can approximate nonlinear dynamical systems with certain conditions posed on the activation functions. If we impose stronger conditions, such as that the function we are approximating has certain bounds on its Fourier transform, then by Barron [25], we can achieve an L2 norm of the approximation error that goes like 1/n if the artificial neural network has one layer of n sigmoidal nodes. These results inspired us to use neural networks to identify the functional relationship between the surface pressure difference and the time histories of the LESP and angle of attack. This section describes the components of the machine-learning algorithm and the generation of training data.

A. Convolutional Neural Network

A convolutional neural network is one of the most widely used deep learning network structures, particularly in the field of image recognition [1]. The core of a CNN layer is a linear filtering operation that applies a common "tile" of weights (obtained by training) to each contiguous block (or pixel patch) of the input data. The CNN can be used to extract local features due to its translational invariance property; it can also sometimes be interpreted as applying a discrete version of a local differential operation, such as a gradient or Laplacian. In contrast to a purely linear filter, the CNN also applies a nonlinear activation function to each output pixel. Activation functions can take different forms in machine learning, such as a hyperbolic tangent (which saturates values outside of a certain range) or a rectifying linear unit (RELU), which preserves only the positive part of the input. The result of this set of operations, which is the feature map, can be used as new inputs for the next neural network layer.

Fig. 3 LSTM schematic, adapted from Graves [29].

Usually, a pooling layer is connected to a CNN to select the most salient local feature among a set of adjacent pixels. In particular, maximum pooling selects the maximum value from a small patch of the feature map to pass to the next level. As the surface pressure plots in Fig. 2 suggest, the signatures of the gusts and maneuvers may be subtle and difficult to differentiate from each other. From Boureau et al. [26], it is known that maximum pooling performs well in classification problems with "sparse" features, which are features that are subtle and challenging to activate. The pooling layer also has the benefit of reducing the size of the data: for example, if maximum pooling is applied to a 2 × 2 patch, then the size of the data is reduced by a factor of four. In the case of our spatiotemporal data, we generally apply maximum pooling only along the spatial dimension. Overall, several CNN layers can be applied in succession to approximate a function to high accuracy. The reader can refer to Bishop [27] for a more detailed theoretical background on the CNN.
B. Recurrent Neural Network and Long Short-Term Memory Cell

The recurrent neural network is prevalent in sequential series modeling, such as speech recognition. It is designed to recurrently apply certain operations along a time series. The RNN can be interpreted as a dynamical model for the data, supplying an updating rule that is partly predesigned but also contains weights obtained from training. Long short-term memory is one kind of RNN [2], which was developed to avoid various problems in the basic RNN, such as the exploding and vanishing gradient problems [28].

Each LSTM update step has a specific structure, as shown in Fig. 3 [29], for calculating the output value ht at a certain time level t from the current input vector xt. It also makes use of various other information, including some carried over from the previous time level. This information, carried from one step to the next (the memory in the network), is held in a cell state ct (also known as the hidden state) that is passed to the output through an output gate ot. The cell state depends in part on the current input vector after this vector is combined in a weighted sum jt and then passes through an input gate it. The dependence of this cell state on its previous value is metered by a forgetting gate ft. Thus, the LSTM provides a mechanism for preserving the effect of a past event, like an indicial response function in a linear dynamical system, but goes further by also providing a means of selectively forgetting some of this past information. All of these gates are determined from weighted combinations of the input vector and hidden state, with weights determined from training. The LSTM also builds in nonlinearity by employing activation functions for most of these operations. Readers can refer to Goodfellow et al. [30] for more detailed information on the RNN.
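For reference, one step of a standard LSTM cell, consistent with the gates in Fig. 3, can be written as follows (a generic formulation; the TensorFlow implementation used in this work may differ in minor details):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM update: input gate i, forget gate f, output gate o,
    # candidate value j, cell state c, and output h. The dicts W, U, b
    # hold the trained weights and biases for each of the four gates.
    z = {g: W[g] @ x_t + U[g] @ h_prev + b[g] for g in ("i", "f", "o", "j")}
    i_t = sigmoid(z["i"])            # input gate
    f_t = sigmoid(z["f"])            # forgetting gate
    o_t = sigmoid(z["o"])            # output gate
    j_t = np.tanh(z["j"])            # weighted combination of the input
    c_t = f_t * c_prev + i_t * j_t   # cell state carries the memory forward
    h_t = o_t * np.tanh(c_t)         # output value
    return h_t, c_t
```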
C. Training Data Generation

As we will describe in the next section, the machine-learning algorithm is trained to approximate the mapping from the time series of the surface pressure distribution along the plate to the corresponding LESP history (or, alternatively, the angle-of-attack history). We used the vortex blob method described previously to generate 125,000 sets of training and testing data: datasets that are complete with both their input (surface pressures) and their known output (LESP or angle-of-attack history). Each dataset was generated by designating the characteristics of the LESP disturbance (t1, T, and S) and the angle-of-attack variation (t0, K, α0, and Δα), and running the vortex blob model for two convective time units with a time-step size of ΔtU/c = 0.01. The ranges of the LESP disturbance and pitchup parameters are described in Table 1. All the parameters are chosen from five uniformly spaced values in these ranges, except for T, which is chosen from eight such values. Note that a random value was added to the LESP disturbance amplitude in order to ensure some randomness in the training data. The details of the training process will be described in the next section.

Table 1 Ranges of LESP disturbance and pitchup values used in generating the training and testing data

Parameter   Value
t1 U/c      [0.2, 1.4]
4T          [0, 1.8]
S           [−0.44, 0.36] + uniformly distributed random perturbation ∼ U(0, 0.044)
t0 U/c      [0.2, 1.4]
K           [0, 0.6]
α0          [0, π/4]
Δα          [π/4, π/2]

As one can see from Table 1 and the accompanying explanation, the number of values sampled for each parameter within its range is not rich enough to permit accurate interpolation. In particular, for the LESP variation that represents the gust, the peak values are evenly spaced by 0.2, which is large compared to the accuracy of the prediction this algorithm is trying to achieve. Thus, the trained algorithm must contain an approximation of the system dynamics and does not simply provide an interpolation among the training dataset.
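As a concrete sketch of this sampling (ours; the exact perturbation and ordering are assumptions beyond what is stated above), the case matrix can be enumerated as follows. Note that 5^6 × 8 = 125,000, consistent with the number of datasets quoted earlier:

```python
import itertools
import numpy as np

def levels(lo, hi, num=5):
    return np.linspace(lo, hi, num)

# One value set per case, following Table 1: five levels per parameter,
# except T, which uses eight.
grid = itertools.product(
    levels(0.2, 1.4),               # t1 * U/c
    levels(0.0, 1.8, num=8) / 4.0,  # T (Table 1 tabulates 4T)
    levels(-0.44, 0.36),            # S, before its random perturbation
    levels(0.2, 1.4),               # t0 * U/c
    levels(0.0, 0.6),               # K
    levels(0.0, np.pi / 4),         # alpha0
    levels(np.pi / 4, np.pi / 2),   # dalpha
)

rng = np.random.default_rng(seed=0)  # arbitrary seed
for t1, T, S, t0, K, alpha0, dalpha in grid:
    S = S + rng.uniform(0.0, 0.044)  # random perturbation of the LESP amplitude
    # ...run the vortex blob model for two convective times with this case...
```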
IV. Estimation of LESP Disturbance and Pitchup Maneuver

In this section, we describe a machine-learning strategy for estimating the time histories of the LESP disturbance and the angle of attack that produce a given surface pressure distribution. We will focus the explanation on the algorithm for the LESP; the angle-of-attack strategy is similar and will be described briefly later in this section. The section concludes with a discussion of the historical evolution of our machine-learning architecture.

A. Algorithm Description

Figure 4 depicts the overall neural network architecture we designed for approximating Hg for estimating the discretely time-varying LESP (Xg ∈ R^N) from the surface pressure data Y. These data are input to the algorithm as a two-dimensional matrix of size R^{M×N}. The first dimension M, which we will refer to as the "width" of the input data, corresponds to the discrete spatial locations at Chebyshev nodes distributed between the leading and trailing edges; here, M = 126, which is equal to 128 nodes minus the two values at the edges themselves. The other dimension N, which is the "height" of the input data, corresponds to the number of discrete sample times (with the sampling interval identical to the time-step size of the vortex blob method that generated the data: ΔtU/c = 0.01). Because all cases were run to two convective times, the height of the input data is N = 201. Because the eventual output of the network constitutes scalar time-varying data, it has dimension R^{1×201} (or simply R^{201}). In fact, the height of the data never changes from layer to layer. The machine-learning algorithm and optimization step are implemented using TensorFlow [31].

Fig. 4 Computational graph of the machine-learning algorithm.

The first layer of the network, involving noise injection, is used to prevent overfitting and will be discussed in the following in the context of training. The second layer, which is the data preprocessing block, is designed to treat some spurious values in the input data.
Because we generate these data using an inviscid vortex model, and the interactions between the vortex elements and the plate in this model are not regularized, the elements that come very close to the plate can exert unphysically large (negative) pressures. Although the situation is rare, the resulting high-pressure data points can disrupt the training of the neural network. However, simply eliminating the invalid values can be detrimental to the algorithm because it discards potentially valuable information. In this algorithm, we use a CNN followed by a hyperbolic tangent function to saturate the invalid pressure values, but we retain information about the remaining (physically meaningful) pressures. The CNN contains 16 separate filters, with each using a 5 × 5 tile of weights, so that the output of this layer is of size R^{16×126×201}. The first dimension (16) constitutes the depth of the output volume of the layer.

The preprocessed data are then sent into the first of two core blocks: a series of CNNs, each with a 5 × 5 tile of weights, and maximum pooling layers with a spatial dimension of three. This block's purpose is to detect and amplify features in the pressure data that can be used to discern phenomena and events in the fluid dynamics. Each of the CNN layers contains a separate subfilter to operate on each depth dimension of the input data. The outputs of these subfilters are combined and then sent through some number of other filters: the output of the first CNN, for example, is R^{8×126×201}. Then, the maximum pooling reduces the width of the data by a factor of three: for example, to R^{8×42×201}. Eventually, the output of the CNN block is of size R^{16×5×201}. The basic principle of this deep structure of CNN filters is to identify as many local features as possible that might inform the global behavior inherent in the final estimates of the overall network.

The output of this CNN block, after the first two dimensions are stacked into a vector with a length of 80, is delivered as input to the next: an RNN block consisting of two LSTM layers. These layers resemble a traditional fully connected neural network layer but with the additional powerful ability of an LSTM RNN to selectively retain and discard information. The first RNN layer outputs data of size R^{128×201}, and the second RNN layer reduces the data to size R^{4×201}. The output of this layer is used as input to four linear and nonlinear functions, with each applied to the corresponding channel in the RNN output. The purpose of these functions is to serve as a means of providing rich content for reconstructing the output signal. The final output of the algorithm Xg is obtained by a linear combination of the outputs of these functions, which is followed by a final tanh activation.
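A Keras rendering of this layer sequence is sketched below. This is our reconstruction from the description, not the authors' code: the per-layer activations inside the CNN block, the pooling padding, and the additive form of the noise layer shown here (the multiplicative variant is sketched in Sec. IV.B.2) are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

T_STEPS, M_SENSORS = 201, 126

inputs = tf.keras.Input(shape=(T_STEPS, M_SENSORS, 1))  # spatiotemporal pressure map

# Training-time noise injection (additive stand-in for the paper's multiplicative noise).
x = layers.GaussianNoise(0.05)(inputs)

# Data preprocessing: 16 filters of 5x5 weights with tanh to saturate
# spuriously large pressures while preserving the rest.
x = layers.Conv2D(16, 5, padding="same", activation="tanh")(x)

# CNN block: 5x5 filters, with max pooling along the spatial axis only,
# so the time axis stays at 201 while the width shrinks 126 -> 42 -> 14 -> 5.
for filters in (8, 16, 16):
    x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(1, 3), padding="same")(x)

# Stack depth x width (16 x 5 = 80) into one vector per time step.
x = layers.Reshape((T_STEPS, 16 * 5))(x)

# RNN block: two LSTM layers, 128 units then 4 channels, at every time step.
x = layers.LSTM(128, return_sequences=True)(x)
x = layers.LSTM(4, return_sequences=True)(x)

# Library of channel-wise functions: identity, square, sigmoid, exponential.
def function_library(t):
    return tf.concat([t[..., 0:1],
                      tf.square(t[..., 1:2]),
                      tf.sigmoid(t[..., 2:3]),
                      tf.exp(t[..., 3:4])], axis=-1)

x = layers.Lambda(function_library)(x)

# Final linear combination followed by tanh -> estimated LESP history.
outputs = layers.Dense(1, activation="tanh")(x)

model = tf.keras.Model(inputs, outputs)
```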
B. Training and Testing the Neural Network

A neural network is trained [that is, the network weights and biases, such as ξg in the gust map in Eq. (10), are determined] by minimizing a metric over the training dataset. This metric, called the loss function, is constructed to measure the error between the network estimate and the known output of the input data Y.

1. Training Details

As described earlier in the paper, a total of 125,000 sets of data was generated for training. In addition, a batch of 6075 sets was generated for testing the trained network by randomly varying the values mentioned earlier in the training data generation of Sec. III.

Network training relies on a technique called backpropagation to estimate the gradient of the loss function with respect to the network weights; this gradient is used to adjust the weights, with a step size called the learning rate, within an overall optimization framework that seeks to minimize the loss function. In our training, we used both stochastic gradient descent [31] and the Adam optimizer [32]. Stochastic gradient descent subtracts the gradient of the loss function, weighted by the step size, from the current value of the set of parameters at each iteration. The Adam optimizer is a variation of the stochastic gradient descent method, which has been shown to outperform many other training algorithms. Because the aforementioned algorithms use only the gradient and do not require the Hessian matrix, these optimization methods are computationally efficient. For each iteration, one method is selected based on the output of the current iteration. In each iteration, the method used in that iteration randomly selects, from the overall collection, a minibatch of datasets to backpropagate over; we used 100 sets in each minibatch.

Two loss functions were used in the training procedure. The first was the L1 norm of the difference between the estimated and known output vectors (of length 100N) over the entire minibatch. The second was the mean-squared error (MSE) over the same minibatch. Because the L1 norm is well known to be less sensitive to outliers (the invalid cases), the training was started by minimizing the L1 norm for the first set of iterations. Our experience has shown that this initial training converges quickly. However, because the variation in the LESP history is smooth, we expect the second moment of the error to be low; so, after some number of iterations with the L1 norm, we switched the training to minimizing the MSE. We also lowered the
learning rate from 10⁻³ to 10⁻⁴ during this switch, ensuring a smoother adjustment of weights during this interval. It should be noted that training required some degree of user supervision.
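A minimal loop reflecting this two-phase procedure might look as follows (our sketch: model is the network sketched previously; train_ds is a hypothetical tf.data.Dataset yielding 100-case minibatches; the step counts are placeholders; and only the Adam option is shown, whereas the paper alternates between stochastic gradient descent and Adam):

```python
import tensorflow as tf

def train_phase(model, dataset, steps, loss_fn, learning_rate):
    # One optimization phase: fixed learning rate over random minibatches.
    opt = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    batches = iter(dataset.repeat())
    for _ in range(steps):
        pressures, lesp_true = next(batches)
        with tf.GradientTape() as tape:
            lesp_pred = model(pressures, training=True)
            loss = loss_fn(lesp_true, lesp_pred)
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))

# Phase 1: L1-type loss (robust to the outlying invalid cases) at 1e-3 ...
train_phase(model, train_ds, 8000, tf.keras.losses.MeanAbsoluteError(), 1e-3)
# ... Phase 2: switch to mean-squared error and drop the rate to 1e-4.
train_phase(model, train_ds, 8000, tf.keras.losses.MeanSquaredError(), 1e-4)
```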
Testing of the network was carried out (simultaneously with the training) with minibatches of 100 sets randomly selected from the testing batch of 6075 as well; these were evaluated without backpropagation and weight adjustment. Thus, the testing data were kept anonymous to the training data in order to independently evaluate the network's accuracy. The L1 norm over each testing minibatch was used as the metric to assess the error of the network.

2. Overfitting

Overfitting is a well-known phenomenon in the training of neural networks in contexts where the number of neural network parameters ξg is large as compared with the number of data points in the training data. This phenomenon is similar to the familiar situation of fitting a large-degree polynomial through a small number of data points: the curve can easily be constructed to pass exactly through the data, but it relies too much on the exact values of these points, so that new data points are not accurately estimated. In a deep neural network, the training can fit any set of data as long as the loss function achieves a small value, but this does not necessarily lead to a good estimation for other input data. This can be observed in Fig. 5a (no overfitting prevention strategy), in which the testing metric converges to a significantly larger value (0.015) than that achieved during training (0.01). Even if we continue the training to an even lower value of the loss function, the testing metric remains stuck at 0.015. Thus, overfitting severely limits the accuracy of the algorithm.
We considered two types of strategies to avoid overfitting in this work. Dropout is a technique used widely in the training of neural networks in which some values of the output of a certain layer of the network are randomly set to zero or kept according to some "keep" probability [33]. Using dropout with a keep probability of 0.5, the resulting testing metric value converges to 0.009, as shown in Fig. 5b. However, this approach required the dropout to remain in effect even in testing: if the keep probability was increased to one (i.e., all data were kept), then the testing metric increased to 0.014. Thus, for high accuracy, we still must discard values even during the actual estimation applications. So, in spite of the low metric value, the dropout algorithm discards a significant amount of information, thereby sacrificing the interpretability of the algorithm and limiting the potential for further development and application.

We also explored another approach to avoid overfitting. In the training, we multiplied each entry in the input data by a random number chosen from a Gaussian distribution. In other words, if we denote the original input pressure matrix by P, we generate a matrix N, where Nij ∼ N(μ = 1, σ² = 0.05²) for all i and j, and then we use the elementwise product P̃ = P ∘ N as input to the next layer (the data preprocessing step in Fig. 4). This multiplicative Gaussian noise is inspired somewhat by the work of Li and Liu [34], who introduced adaptive multiplicative Gaussian noise to avoid overfitting. However, in that case, the adaptive noise was applied to the output of each layer, whereas we apply the noise directly to the input. Using this approach during training, we found that the resulting testing metric converges to 0.0077 after the training metric (based on the L1 norm) was reduced to 0.004, as shown in Fig. 5c (multiplicative Gaussian noise). Because the testing metric reached this value after around 16,000 iterations, we deemed that this was a sufficient number of iterations for training. More important, we did not need to introduce the multiplicative noise in the estimation applications after training to achieve low error.
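This input-noise layer can be expressed as a short custom Keras layer (a minimal sketch; the class name is ours):

```python
import tensorflow as tf

class MultiplicativeGaussianNoise(tf.keras.layers.Layer):
    # Training-only elementwise noise P_tilde = P * N, with N_ij ~ N(1, 0.05^2),
    # applied directly to the input pressures as described above.
    def __init__(self, stddev=0.05, **kwargs):
        super().__init__(**kwargs)
        self.stddev = stddev

    def call(self, inputs, training=None):
        if training:
            noise = tf.random.normal(tf.shape(inputs),
                                     mean=1.0, stddev=self.stddev)
            return inputs * noise
        return inputs  # no noise at estimation time, as noted above
```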

One might be tempted to increase the noise variance to ensure an even more robust training procedure. However, there is a limit on the effectiveness of the procedure. In the present case, we found that the neural network can achieve the same value of the training loss function even if the standard deviation of the multiplicative noise is set to 0.4, but the subsequent performance of the network on testing data is severely inaccurate. In other words, if the noise in the input pressure data is large, the function approximated by the neural network no longer provides the relationship between the pressure distribution and the LESP but, rather, some other function that is almost irrelevant to the desired one. In such circumstances, the performance of the network applied to the testing dataset is poor.

Fig. 5 Comparison of the error metrics over 20,000 iterations of training (in orange) and testing (in blue) for three different strategies for preventing overfitting.

In conclusion, by comparing the results of the neural network trained with dropout and with moderate multiplicative Gaussian noise, the latter performs better on the testing dataset. We thus used multiplicative Gaussian noise in the network training process.

C. Results and Discussion

As described previously and shown in Fig. 5c, the value of the L1 norm of the testing dataset is 0.0077. There are a few cases in which the estimation can greatly deviate from the desired value. However, most cases in the testing data show very good agreement with the true LESP. The cumulative distribution of the loss function of the testing is shown in Fig. 6a. Most of the testing cases have an error of less than 0.01, and almost all the cases have an error of less than 0.02. In Figs. 6b–6e, we show an example of estimation in which the L1 norm of the error is close to that of the average across the testing dataset. The estimated LESP agrees well with the actual LESP (and with the critical LESP because the LESP is at its upper bound for all of the simulation). However, because the LESP and its critical value are only surrogates for the desired results (for example, an estimation of the leading-edge vortices generated during an agile flight maneuver subjected to a gust), it is important to compare the flows generated by the actual history of the LESP with the LESP history estimated by the network when they are both used in a vortex blob model. (For this assessment, the angle-of-attack history is assumed known.) The comparison of Figs. 6c and 6d shows similar vorticity distributions in the flowfield for the actual and the estimated cases. Figure 6e shows that the strengths of the vortex elements released into the fluid at the leading edge agree very well, and Fig. 6f shows that the total circulation of the leading-edge vortex is estimated very accurately by the vortex blob model using the estimated LESP history.

It is interesting to look deeper into the structure of the network, and particularly into that of the second RNN layer. Figure 7 depicts the gates and output values of one case. Each color represents a unit of the RNN that corresponds to one function after the second RNN: blue for the identity function, orange for the square function, green for sigmoid, and red for the exponential function; the dotted line is the scaled value of the LESP, which is included for reference. First, the input value of the LSTM cell corresponding to the exponential function converges to zero quickly, as shown in Fig. 7a. However, the same RNN unit has a large variation in the hidden state (Fig. 7c), especially during the time interval when the LESP is changing. This variation propagates to the output gate, where the value of the RNN unit for the exponential function is still significant (Fig. 7b). In addition, the forgetting gate of the same RNN unit converges to zero after the gust (Fig. 7d). However, as with many deep learning architectures, it is difficult to extract intuition from these internal values.

Fig. 6 Representations of a) cumulative distribution of L1 norm of loss in testing data; b) one case of estimate with an L1 norm of 0.0072 (green line is critical LESP, orange line is LESP, and blue line is estimated LESP); c) actual distribution of vortex elements at tU/c = 2 in the testing case shown in Fig. 6b, where color indicates the sign (blue for positive, red for negative) and magnitude (darker is stronger) of the element strength; d) distribution of vortex elements at tU/c = 2 obtained by using the estimated LESP history; e) comparison of strengths of each new vortex blob introduced at the leading edge (orange line is actual strength, blue line is strength obtained by using estimated LESP history); and f) overall circulation in leading-edge vortex (orange line is actual circulation, blue line is circulation obtained by using estimated LESP history).

Fig. 7 Gates and value history of one specific test case of estimation of LESP history: a) input values of second RNN layer (notice the RNN unit corresponding to the exponential function becomes almost zero shortly after the start); b) output values of second RNN layer (the LSTM cells for the exponential, identity, and sigmoid functions have the most fluctuation in values); c) hidden state of second RNN layer (the LSTM cells for the exponential and identity functions have the greatest change, thus retaining the most information throughout); d) forgetting gate of second RNN layer (the RNN unit corresponding to the exponential function converges to zero after the gust).

D. Estimation of Angle of Attack

We apply the same neural network structure to the task of recovering the angle of attack from the surface pressure data, but with two minor changes. We reduce the number of units in the first RNN layer from 128 to 32 in order to save computation time. Also, we change the activation function used in the final output from hyperbolic tangent to a smooth rectifier (RELU), in recognition of the fact that the angle of attack is constrained to vary in a positive range [0, π/2]. After going through the same training and validation process, we obtain an error for angle-of-attack estimation of 0.015, as shown in Fig. 8a. Considering the larger magnitude of the angle of attack as compared with the LESP, we conclude that the trained algorithm for angle-of-attack estimation is as accurate as the neural network for LESP estimation.

E. Evolution of Machine-Learning Structure

As is evident in Fig. 4, the structure of the machine-learning algorithm we used in this work is complex. We arrived at this structure via an evolution through a series of earlier architectures that addressed simpler versions of the problems described in this paper. For brevity, the quantitative results of these earlier efforts are not reported here, but we provide some comments on the progression. In our earliest work, we focused on the prediction of discontinuous changes of the LESP at a constant angle of attack. For this task, we found success by connecting a CNN to an RNN layer. Then, in later work, when we allowed the angle of attack to increase smoothly, as in this paper, we were able to predict a discontinuous change of the LESP with multiple layers of CNN, but fewer than in the architecture shown in Fig. 4. However, as the work of Darakananda et al. [20] showed, a typical gust does not trigger a discontinuous jump in the LESP but, rather, a smooth variation. Thus, our work in this paper has focused only on smooth simultaneous changes in both the angle of attack and the LESP. The depth and complexity of the network shown in Fig. 4 were found to be sufficient to disambiguate these smooth variations in the surface pressure data. We have not attempted to show that this structure is necessary or optimal, however, and it is likely that there are other choices for the number of layers that would lead to similar performance.

It is reasonable to wonder whether similar performance could be achieved with just an RNN/LSTM receiving input directly from the surface pressure sensors. However, the CNN layers serve two important purposes. First, they reduce the dimension of the input to the RNN structure. This operation significantly reduces the number of parameters needed for the neural network because the first RNN layer is the layer with the most parameters. Furthermore, the CNN serves to extract local features in a manner that cannot be achieved by the RNN with lower dimension. The CNN thus provides a means for reducing the number of parameters while also ensuring sufficient network depth so that the universal approximation property of the multilayer perceptron can be exploited.

One distinctive feature of the aforementioned neural network structure is that it involves a small library of nonlinear functions (some of which can be unbounded) instead of the typical sigmoidal functions used in fully connected neural networks. The use of this library allows the network to introduce a mix of different types of signal amplification into the neural network. We observed a significant drop in performance if, instead, we used a fully connected layer with sigmoidal activation functions. Our results from the testing set (an example of which is shown in Fig. 7b) indicate that the exponential and identity functions have similar, as well as the most significant, contributions among all the functions; although this balance of contributions clearly would depend on the nature of the signal we seek to approximate.

V. Machine-Learned System Identification Method

In the last section, we developed a network architecture that learned the functional relationship between surface pressures and the LESP and angle-of-attack histories. In this section, we propose a different method in which we instead attempt to learn a dynamical model that describes this relationship. The advantage of this approach is that the resulting dynamical model is more readily analyzable with standard tools as compared with an obscure deep neural network. We describe this framework as machine-learned system identification. Previous researchers have also proposed combinations of machine learning and dynamical systems approaches.
Fig. 8 Representations of a) error metrics during training of network for estimation of angle of attack [the training error (orange) converges to 0.01, whereas the testing error (blue) converges to 0.015]; b) cumulative distribution of the L1 norm of loss in testing data; and c) one case of angle-of-attack estimation with L1 norm of loss equal to 0.01430411, which is slightly lower than average loss but slightly higher than median loss (0.0143). The orange curve is the actual angle of attack, whereas the blue line is the estimated value.

For example, the cluster reduced-order modeling framework by Kaiser et al. [35] was proposed to identify physical systems using machine-learning models. The main difference is that the method proposed by Kaiser et al. [35] assumes a Markov process in the transition of the dynamics, whereas MLSID does not make this assumption.

To formulate MLSID, we assume that the desired quantity can be described by a linear system of first-order differential equations with forcing:

ẋ = A(p, p̃) x + u(p, p̃),  x(0) = x0   (11)

The first entry of the state vector, x ∈ R^n, is the desired quantity; the other n − 1 entries of x represent internal states. The matrix A and the forcing u both depend on the current vector of surface pressure coefficients p as well as historical values of these pressures, denoted by p̃. The vector x0 is the initial condition. We write this in time-discrete form as

x^t = Ã(p^t, p̃^t) x^{t−1} + ũ(p^t, p̃^t),  x^0 = x0   (12)

where x^t denotes the current value of the state vector at time step t, p^t is the current pressure coefficient distribution on the surface of the plate, and p̃^t is some state variable vector that depends on the pressure history and evolves with time. In fact, this latter vector comprises the cell state (the memory) of an LSTM RNN cell, which is delivered as input to the cell (via the forgetting gate) in the next time step. The objective of the machine learning is to learn the time-varying elements of the matrix Ã and forcing vector ũ. Because Ã ∈ M_{n×n}(R) and ũ ∈ R^n, the total number of outputs in each time step is n² + n. As the schematic in Fig. 9 shows, we simulate the temporal evolution of those values as an RNN with n² + n units. In fact, this RNN block consists of two LSTM RNNs: the first to accept the output of a set of three CNN layers used for preprocessing the pressure data, and the second to populate the values of the matrix Ã and vector ũ. The rationale is that the first RNN layer evolves the output of the CNN layers to another spatiotemporal map such that the evolution features both long-term and short-term time dependency (a characteristic of LSTM). Thus, the output is termed "generated pressure." The second RNN layer subsequently converts the generated pressure to the elements in the updating matrices and the forcing vector. In addition to the dynamical system updating matrices and forcing vectors, we use a fully connected network to generate the initial condition x0 from the pressure information of the first few time steps. These parameter histories are then sent to the dynamical system block, where the state vector is propagated by its update equation.

Fig. 9 Machine-learned system identification structure to develop a dynamical system to estimate LESP and angle of attack (FCNN, fully connected neural network).

During training and
HOU, DARAKANANDA, AND ELDREDGE 5089

testing, the history of the propagated state vector is compiled and the the similar ranges, as shown in the comparison of Figs. 8b and 10b.
estimated value in this state vector is compared with the actual value The learning of initial conditions also contributed to the smoothness of
of that desired quantity, via the loss function. the estimation at the start, giving a better performance overall.

A. Estimation of Angle of Attack

We let n = 4 (i.e., simulate a dynamical system with a state vector of dimension four) to train an MLSID neural network to detect the angle of attack. For the initial condition, in order to reduce the number of parameters, we use pressure information from the first two time steps as input to a one-layer fully connected neural network with a hyperbolic tangent activation function (no hidden layer) to learn the initial conditions of the first two states; the initial conditions of the remaining two states are set to zero. Because the diversity of the initial conditions in the training data is low, this latter approach avoids the overfitting that would arise with the addition of two more values to learn. Although the number of parameters is small as compared to the size of the training data, overfitting problems still emerged during the training of the MLSID network, requiring the use of techniques to mitigate such problems. Indeed, even with such techniques, one can still observe a difference between training and testing errors, showing the presence of overfitting.
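Viewed concretely, this initial-condition network is a single affine map with a hyperbolic tangent activation. The following is a minimal sketch, assuming 64 pressure sensors; the weight values shown are illustrative placeholders, not the trained parameters.

```python
import numpy as np

# Sketch of the initial-condition network for the AOA case: a one-layer
# fully connected map (no hidden layer) with tanh activation that turns
# the first two time samples of surface pressure into the first two
# states; the remaining two states start at zero. The sensor count and
# the (untrained) weights here are illustrative assumptions.
rng = np.random.default_rng(1)
n_sensors = 64                                  # assumed number of pressure taps
W0 = rng.standard_normal((2, 2 * n_sensors)) * 0.01
b0 = np.zeros(2)

def initial_condition(p_first_two):
    """p_first_two: pressures from the first two time steps, flattened."""
    x0_learned = np.tanh(W0 @ p_first_two + b0)        # first two states
    return np.concatenate([x0_learned, np.zeros(2)])   # last two set to zero

x0 = initial_condition(rng.standard_normal(2 * n_sensors))
print(x0.shape)  # (4,)
```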
The training proceeds similarly to the previous section. As shown in Fig. 10a, the metric value for both the training and testing datasets decreases throughout the training but becomes flat after approximately 15,000 training iterations. It can be observed that the variation of the metric value is relatively stronger and less stationary as compared to the algorithm with only a neural network (shown in Fig. 8a). This behavior is expected because the present method involves propagation through matrix multiplication, which is an operation that is not bounded by activation functions, thus adding complexity and instability to the training. Nevertheless, the algorithm achieves better accuracy than the purely machine-learning strategy, and with far fewer parameters. The error distributions on the testing dataset are limited to similar ranges, as shown in the comparison of Figs. 8b and 10b. The learning of initial conditions also contributed to the smoothness of the estimation at the start, giving a better performance overall.

B. Estimation of LESP

Increasing the number of hidden units of the RNN to 64 and the number of internal states of the simulated dynamical system to six, the same MLSID framework can be used to estimate the LESP. To learn the initial conditions of all states as well as the internal states of the LSTM RNN layers, we used the values of the 32 pressure sensors closest to the leading edge from their first five time samples as input to a two-layer fully connected neural network with a single hidden layer and a hyperbolic tangent activation function. However, the overfitting problem is even more severe than for the purely machine-learning algorithm, despite the small number of parameters. Thus, in addition to the Gaussian noise injection layer used in the AOA estimation case, a layer of "truncated" Gaussian noise injection was inserted between the third CNN layer and the generated-pressure RNN layer. The rationale for drawing noise from a truncated Gaussian distribution (a distribution in which values are limited to a finite range about the mean) is the unboundedness of the backpropagation procedure in the RNN. This unboundedness is described in the following.

We can formulate the dynamical system as

$$x_t = \tilde{A}_t x_{t-1} + \tilde{u}_t \tag{13}$$

$$y_t = c^T x_t \tag{14}$$

$$c = (1, 0, 0, 0, 0, 0)^T \tag{15}$$

Fig. 10 Representations of a) metric evolution during training for angle-of-attack estimation using MLSID [the training error (orange) converges to 0.003, whereas the testing error (blue) converges to 0.0105]; b) cumulative distribution of the L1-norm loss on the testing data for angle-of-attack estimation using MLSID; and c) one case of angle-of-attack estimation by the MLSID algorithm with an L1-norm loss of 0.01165, which is slightly higher than the average loss but equal to the median loss (0.01165). The orange curve is the actual angle of attack, whereas the blue line is the estimated value.

where y is the desired output (LESP) and x is the state vector. The L1 loss of the expression is

$$J(w) = \|y - y_T\|_1 \tag{16}$$

where y is the time series of the estimated value, y_T is the time series of the actual target value, and w is the set of parameters to be found during training to minimize J. For a parameter w_i in the set,

$$\frac{\partial J}{\partial w_i} = \sum_t \operatorname{sgn}\left(y_t - y^t_T\right)\frac{\partial y_t}{\partial w_i} \tag{17}$$

$$= \sum_t \operatorname{sgn}\left(y_t - y^t_T\right) c^T \frac{\partial x_t}{\partial w_i} \tag{18}$$

$$= \sum_t \operatorname{sgn}\left(y_t - y^t_T\right) c^T \left( \tilde{A}_t \frac{\partial x_{t-1}}{\partial w_i} + \frac{\partial \tilde{u}_t}{\partial w_i} + \frac{\partial \tilde{A}_t}{\partial w_i}\, x_{t-1} \right) \tag{19}$$

Now, for the last term in Eq. (19), the partial derivative of the updating matrix Ã with respect to the parameters in the RNN block is unbounded if the second Gaussian noise layer, attached to the end of the CNN block, is allowed to apply unbounded noise to the input of the RNN. The derivation is shown in the following.

Because the updating rule of the RNN for some time step t uses the output of the previous time step t − 1 to calculate the hidden states and output for that time step, we treat those outputs and hidden states similarly to the actual input from the preceding CNN block. In this way, we can write one entry (i, j) of the updating matrix Ã_t at that time step as

$$\tilde{A}^t_{ij} = \phi\left(\Theta\,\psi(\Pi z_t + a_1) + a_2\right) \tag{20}$$

where ϕ is the part of the operation of the LSTM cell at the second RNN layer after multiplying the input with the weights and adding the bias, and ψ is the combination of the similar part of the LSTM operation as ϕ, concatenated with the output values from the previous time step of the second RNN layer; the structure may seem cumbersome, but this type of formulation helps simplify the RNN structure in the formulation given in Eq. (20) and the analysis afterward. In addition, Π and Θ are the weights of the first and second RNN layers, respectively; a₁ and a₂ are the biases of the first and second RNN layers, respectively; and z_t is the combined input for the RNN block at that time step (a combination of the output from the preceding CNN block and data from the previous time step).

It is important to note that, because we are only focused on a single entry in Ã_t, the activation function ϕ and its argument are only scalar valued in this discussion. However, the output of the first RNN layer (via the activation function ψ, which acts elementwise upon this output) is vector valued, and so Θ is a correspondingly long vector of weights. In contrast, Π is a matrix of weights, reconciling the length of the input vector z_t with that of the output of the first RNN layer. Biases a₁ and a₂ are vector and scalar valued, respectively.

Let us then consider the gradient of Ã^t_{ij} with respect to one weight Π_{kl} in this matrix, which is a typical entry in the ∂Ã_t/∂w tensor in the backpropagation process. One can write out the derivative as

$$\frac{\partial \tilde{A}^t_{ij}}{\partial \Pi_{kl}} = \nabla\phi\left(\Theta\psi(\Pi z_t + a_1) + a_2\right)\Theta\,\psi_{;k}(\Pi z_t + a_1)\,z^t_l + \nabla\phi\left(\Theta\psi(\Pi z_t + a_1) + a_2\right)\Theta\,J_\psi(\Pi z_t + a_1)\,\Pi\,\frac{\partial z_t}{\partial \Pi_{kl}} + \sum_{n,m} \frac{\partial A^{t-1}_{nm}}{\partial \Pi_{kl}} \tag{21}$$

Fig. 11 Representations of a) metric evolution during training for LESP estimation using MLSID [the training error (orange) converges to 0.005, whereas the testing error (blue) converges to 0.0052]; b) cumulative distribution of the L1-norm loss on the testing data for LESP estimation using MLSID; and c) one case of LESP estimation by the MLSID algorithm with an L1-norm loss of 0.003740, which is lower than the average loss but slightly higher than the median loss (0.003734). The green line is the critical LESP, the orange line is the actual LESP, and the blue line is the estimated LESP.

Table 2 Test results of presented algorithms with high pitch rate (0.6–0.8)

Algorithm                           Quantity to detect   Mean metric   Median metric
Purely machine-learning algorithm   LESP                 0.020         0.017
Purely machine-learning algorithm   AOA                  0.083         0.070
MLSID                               LESP                 0.022         0.022
MLSID                               AOA                  0.084         0.067

Here, the prime (⋅)_{n;k} denotes partial differentiation of the nth output with respect to the kth parameter; in particular, ψ_{;n} represents the partial derivative of the output value of the LSTM operation ψ with respect to the function's nth argument, and J_ψ denotes the Jacobian matrix of the operation ψ. No summation is implied by the repeated indices.

The key revelation of Eq. (21) is that this gradient is proportional to entries in the combined input vector z_t. Part of this input to the first RNN layer comes from the output of the CNN block, which is passed through the Gaussian noise layer. If the noise introduced into z_t is not bounded, the input vector will not be bounded, nor will the gradient itself. One should also notice that similar unbounded terms may appear in the last term of Eq. (21), which, in its explicit form, has a formulation similar to Eq. (21) for each of its summands. Those summands will be unbounded, and so will the summation itself. Thus, during training, the parameters in the first RNN layer might incur unreasonably large changes, and these changes will propagate to other parameters via the network's coupling, generating an instability. This instability can be prevented by truncating the noise applied before the first RNN layer.

One might argue that the multiplicative Gaussian noise applied to the input of the first CNN layer should also be truncated in similar fashion. However, it should be noted that the design of the first CNN layer, for both MLSID and the purely neural network algorithm, serves to preprocess the data and to minimize the effect of the invalid pressure points. The Gaussian noise injected here is not only a regularization measure but also an imitation of the noise in actual measurements; thus, the first CNN layer should be exposed to the full spectrum of the Gaussian random variable without truncation. The second Gaussian layer, by contrast, is entirely within the network itself, with the sole purpose of preventing overfitting; thus, the necessary modification (truncation) should be applied to minimize its effect in disrupting the training.

With the noise injections and about 30,000 iterations of training, a more accurate (shown in the comparison of Figs. 5c and 11a) and more stable (shown in the comparison of Figs. 6a and 11b) estimation of the LESP can be achieved using the MLSID algorithm, with far fewer parameters as compared to the algorithm with only neural network structures.
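A noise layer of this kind can be sketched as follows; this is a minimal NumPy illustration, not the authors' implementation, in which boundedness is obtained by clipping the Gaussian samples to a finite range (the noise level and clip width are assumed values).

```python
import numpy as np

rng = np.random.default_rng(2)

def truncated_gaussian_noise(x, sigma=0.05, clip=2.0):
    """Inject additive Gaussian noise whose samples are limited to a
    finite range about the mean, keeping the RNN input (and hence the
    gradient terms of Eq. (21)) bounded. The noise level `sigma` and
    the clip width (two standard deviations) are assumed values."""
    noise = rng.normal(0.0, sigma, size=x.shape)
    noise = np.clip(noise, -clip * sigma, clip * sigma)  # truncation step
    return x + noise

# Applied during training only, between the third CNN layer and the
# generated-pressure RNN layer:
features = rng.standard_normal(32)      # stand-in for CNN-layer output
rnn_input = truncated_gaussian_noise(features)
```

Clipping is one simple way to bound the samples; any scheme that limits the noise to a finite range about the mean serves the same purpose of keeping the backpropagated gradients bounded.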
VI. Limitations

The algorithms presented in this paper do have certain limitations. Specifically, as is typical of neural networks, these algorithms are generally only capable of interpolation within the range of data on which they are trained. We illustrate this point in this section with another set of testing data in which the dimensionless pitch rates of the flat plate are randomly chosen from a higher range (between 0.6 and 0.8) than the range used for training, whereas all other variables are within the same ranges presented in Sec. III. A total of 4050 such sets of testing data are generated and then used to test the four algorithms presented in this paper. The results are shown in Table 2. From the results, one can observe that all algorithms' performances degrade in this testing range; however, the angle-of-attack detection algorithms are more susceptible to degradation in this higher pitch rate range, whereas the LESP detection algorithms suffer

Fig. 12 Four different examples of tests with pitch rates (between 0.6 and 0.8) higher than those used for training: a) LESP detection using the purely machine-learning algorithm; b) angle-of-attack detection using the purely machine-learning algorithm; c) LESP detection using MLSID; and d) angle-of-attack detection using MLSID. For LESP predictions, the blue line is the prediction, the orange line is the true LESP, and the green line is the critical LESP. The red dashed line is the normalized angle-of-attack change (scaled to lie within the range of the LESP). For angle-of-attack predictions, the orange line is the actual angle-of-attack profile, while the blue line is the prediction.

a relatively smaller loss of accuracy. This is expected because the pitch rate directly affects the surface pressures attributable to the change of angle of attack, whereas the pressures associated with the LESP change are only indirectly modified by the pitch rate. Compared to the results from the previous sections, the error due to this higher range of pitch rates becomes the main source of error in the algorithms.

Representative examples of the algorithms' behaviors for these high-pitch-rate tests are depicted in four separate cases in Fig. 12. Figure 12a shows the LESP detection using the purely machine-learning algorithm. Figure 12b shows the angle-of-attack detection using the purely machine-learning algorithm. Figure 12c shows the LESP detection using MLSID. Figure 12d shows the angle-of-attack detection using MLSID. For LESP predictions, the blue line is the prediction, the orange line is the true LESP, and the green line is the critical LESP. The red dashed line is the normalized angle-of-attack change (scaled to lie within the range of the LESP). For angle-of-attack predictions, the orange line is the actual angle-of-attack profile, whereas the blue line is the prediction. From the LESP detection cases in Fig. 12, the most significant errors occur when the LESP changes concurrently with the high-rate pitchup maneuver. In those cases, the algorithm cannot entirely distinguish the effect of the gust (the change in LESP) from that of the pitchup maneuver. For angle-of-attack detection, the trends predicted by the algorithms are similar to those of the actual pitchup maneuver, but the magnitudes are generally underpredicted. Indeed, it is interesting to observe that the predicted profiles resemble those associated with lower pitch rates from the training set. Overall, the algorithms, although less accurate than when applied to data in the training range, are less erratic than one might expect in an extrapolative role.

VII. Conclusions

In this study, two different uses of a deep learning algorithm were demonstrated in the context of unsteady aerodynamics. First, a purely machine-learning algorithm was introduced to estimate the leading-edge suction parameter (LESP) value. For this task, multiple convolutional neural network layers and two recurrent neural network layers were used to construct the neural network. By introducing a novel form of multiplicative Gaussian noise, the problem of overfitting in the training of the neural network was alleviated. The results show that the trained neural network can accurately recover the value of the LESP for an impulsively starting and pitching-up plate subjected to gusts. This neural network structure was generalized to the task of detecting the angle of attack with minor modification. This work suggests that deep learning can be used to aid the simulation of flow under gusts in realistic cases. In particular, machine learning was used to reveal a relationship between the measured surface pressures and the leading-edge disturbance and airfoil maneuver that led to these pressures.

Another framework was also designed, called machine-learned system identification, in which a neural network was used to identify the time-varying coefficients of a linear dynamical system model for the angle of attack or LESP. After some care was taken in mitigating the overfitting challenges, the machine-learned system identification achieved higher accuracy with fewer parameters for the estimation of both the angle of attack and the LESP. It was also shown that, when some of the parameters of the flight characteristics in the testing dataset lie outside of the range used for training, the performances of both approaches degrade significantly, although not as catastrophically as one might expect for such an extrapolation.

There are several other open questions that ongoing work will seek to address. The requirements of the spatial and temporal sampling of the measured pressures have not yet been explored. It will be interesting to see whether a sparser set of data can be used to achieve similar accuracy. Also, in this work, the generated data have been obtained from a low-order vortex model. It remains to be seen whether the approach will also apply to a scenario in which the data are obtained from a higher-fidelity source, such as direct numerical simulation, or from experiments. Finally, it would be interesting to further analyze the resulting function approximation generated by the machine-learning architecture: for example, in the manner in which it maps input clusters to output clusters, as described in the work of Kaiser et al. [35].

Acknowledgment

Support by the U.S. Air Force Office of Scientific Research (FA9550-14-1-0328 and FA9550-18-1-0440) is gratefully acknowledged.

References

[1] Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P., "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, Vol. 86, No. 11, 1998, pp. 2278–2324.
[2] Hochreiter, S., and Schmidhuber, J., "Long Short-Term Memory," Neural Computation, Vol. 9, No. 8, 1997, pp. 1735–1780.
[3] Lee, C., Kim, J., Babcock, D., and Goodman, R., "Application of Neural Networks to Turbulence Control for Drag Reduction," Physics of Fluids, Vol. 9, No. 6, 1997, pp. 1740–1747.
[4] Milano, M., and Koumoutsakos, P., "Neural Network Modeling for Near Wall Turbulent Flow," Journal of Computational Physics, Vol. 182, No. 1, 2002, pp. 1–26.
[5] Brunton, S. L., Proctor, J. L., and Kutz, J. N., "Discovering Governing Equations from Data by Sparse Identification of Nonlinear Dynamical Systems," Proceedings of the National Academy of Sciences, Vol. 113, No. 15, 2016, pp. 3932–3937.
[6] Colvert, B., Alsalman, M., and Kanso, E., "Classifying Vortex Wakes Using Neural Networks," Bioinspiration and Biomimetics, Vol. 13, No. 2, 2018, Paper 025003. doi:10.1088/1748-3190/aaa787
[7] Ling, J., Kurzawski, A., and Templeton, J., "Reynolds Averaged Turbulence Modelling Using Deep Neural Networks with Embedded Invariance," Journal of Fluid Mechanics, Vol. 807, 2016, pp. 155–166.
[8] Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K., "Accelerating Eulerian Fluid Simulation with Convolutional Networks," arXiv preprint: 1607.03597, 2016.
[9] Otto, S. E., and Rowley, C. W., "Linearly-Recurrent Autoencoder Networks for Learning Dynamics," arXiv preprint: 1712.01378, 2017.
[10] Parish, E. J., and Duraisamy, K., "A Paradigm for Data-Driven Predictive Modeling Using Field Inversion and Machine Learning," Journal of Computational Physics, Vol. 305, 2016, pp. 758–774.
[11] Singh, A. P., Medida, S., and Duraisamy, K., "Machine-Learning-Augmented Predictive Modeling of Turbulent Separated Flows over Airfoils," AIAA Journal, Vol. 55, No. 7, 2017, pp. 2215–2227.
[12] Wang, J.-X., Wu, J.-L., and Xiao, H., "Physics-Informed Machine Learning Approach for Reconstructing Reynolds Stress Modeling Discrepancies Based on DNS Data," Physical Review Fluids, Vol. 2, No. 3, 2017, Paper 034603.
[13] Gautier, N., Aider, J.-L., Duriez, T., Noack, B., Segond, M., and Abel, M., "Closed-Loop Separation Control Using Machine Learning," Journal of Fluid Mechanics, Vol. 770, 2015, pp. 442–457.
[14] Brunton, S., Noack, B., and Koumoutsakos, P., "Machine Learning for Fluid Mechanics," arXiv preprint: 1905.11075, 2019.
[15] Perrotta, G., and Jones, A. R., "Unsteady Forcing on a Flat-Plate Wing in Large Transverse Gusts," Experiments in Fluids, Vol. 58, No. 8, 2017, Paper 101.
[16] Mulleners, K., Mancini, P., and Jones, A. R., "Flow Development on a Flat-Plate Wing Subjected to a Streamwise Acceleration," AIAA Journal, Vol. 55, No. 6, 2017, pp. 2118–2122.
[17] Barnes, C. J., and Visbal, M. R., "Counterclockwise Vortical-Gust/Airfoil Interactions at a Transitional Reynolds Number," AIAA Journal, Vol. 56, No. 7, 2018, pp. 2540–2552.
[18] Hufstedler, E. A. L., and McKeon, B. J., "Vortical Gusts: Experimental Generation and Interaction with Wing," AIAA Journal, Vol. 57, No. 3, 2019, pp. 921–931.
[19] Leung, J. M., Wong, J. G., Weymouth, G. D., and Rival, D. E., "Modeling Transverse Gusts Using Pitching, Plunging, and Surging Airfoil Motions," AIAA Journal, Vol. 56, No. 8, 2018, pp. 3271–3278.
[20] Darakananda, D., da Silva, A. F. d. C., Colonius, T., and Eldredge, J. D., "Data-Assimilated Low-Order Vortex Modeling of Separated Flows," Physical Review Fluids, Vol. 3, No. 12, 2018, Paper 124701.
[21] Ramesh, K., Gopalarathnam, A., Granlund, K., Ol, M. V., and Edwards, J. R., "Discrete-Vortex Method with Novel Shedding Criterion for Unsteady Aerofoil Flows with Intermittent Leading-Edge Vortex Shedding," Journal of Fluid Mechanics, Vol. 751, 2014, pp. 500–538.
[22] Darakananda, D., Eldredge, J., da Silva, A., Colonius, T., and Williams, D. R., "EnKF-Based Dynamic Estimation of Separated Flows with a Low-Order Vortex Model," 2018 AIAA Aerospace Sciences Meeting, AIAA Paper 2018-0811, 2018.
[23] Eldredge, J. D., Wang, C., and Ol, M. V., "A Computational Study of a Canonical Pitch-Up, Pitch-Down Wing Maneuver," 39th AIAA Fluid Dynamics Conference, AIAA Paper 2009-3687, 2009.
[24] Chen, T., and Chen, H., "Universal Approximation to Nonlinear Operators by Neural Networks with Arbitrary Activation Functions and Its Application to Dynamical Systems," IEEE Transactions on Neural Networks, Vol. 6, No. 4, 1995, pp. 911–917.
[25] Barron, A. R., "Universal Approximation Bounds for Superpositions of a Sigmoidal Function," IEEE Transactions on Information Theory, Vol. 39, No. 3, 1993, pp. 930–945.
[26] Boureau, Y.-L., Ponce, J., and LeCun, Y., "A Theoretical Analysis of Feature Pooling in Visual Recognition," Proceedings of the 27th International Conference on Machine Learning (ICML-10), Omnipress, 2010, pp. 111–118.
[27] Bishop, C., Pattern Recognition and Machine Learning, Springer, 2006, Chap. 5.
[28] Pascanu, R., Mikolov, T., and Bengio, Y., "On the Difficulty of Training Recurrent Neural Networks," Proceedings of Machine Learning Research, Vol. 28, 2013, pp. 1310–1318.
[29] Graves, A., "Generating Sequences with Recurrent Neural Networks," arXiv preprint: 1308.0850, 2013.
[30] Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, Cambridge, MA, 2016, Chap. 10.
[31] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al., "TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems," Software Package, 2015, http://tensorflow.org/.
[32] Kingma, D., and Ba, J., "Adam: A Method for Stochastic Optimization," Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[33] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, Vol. 15, No. 1, 2014, pp. 1929–1958.
[34] Li, Y., and Liu, F., "Whiteout: Gaussian Adaptive Noise Regularization in Feedforward Neural Networks," arXiv preprint: 1612.01490, 2016.
[35] Kaiser, E., Noack, B. R., Cordier, L., Spohn, A., Segond, M., Abel, M., Daviller, G., Östh, J., Krajnović, S., and Niven, R. K., "Cluster-Based Reduced-Order Modelling of a Mixing Layer," Journal of Fluid Mechanics, Vol. 754, 2014, pp. 365–414.

P. Lavoie
Associate Editor
