Exploration and prediction of fluid dynamical systems using auto-encoder technology
Cite as: Phys. Fluids 32, 067103 (2020); doi: 10.1063/5.0012906
Submitted: 7 May 2020 • Accepted: 8 May 2020 •
Published Online: 1 June 2020

Lionel Agostini a)

AFFILIATIONS
Department of Aeronautics, Imperial College London, South Kensington, London SW7 2AZ, United Kingdom

a) Author to whom correspondence should be addressed: l.agostini@imperial.ac.uk

ABSTRACT
Machine-learning (ML) algorithms offer a new path for investigating high-dimensional, nonlinear problems, such as flow-dynamical systems.
The development of ML methods, associated with the abundance of data and combined with fluid-dynamics knowledge, offers a unique
opportunity for achieving significant breakthroughs in terms of advances in flow prediction and its control. The objective of this paper is to
discuss some possibilities offered by ML algorithms for exploring and predicting flow-dynamical systems. First, an overview of basic concepts
underpinning artificial neural networks, deep neural networks, and convolutional neural networks is given. Building upon this overview, the concept of Auto-Encoders (AEs) is introduced. An AE constitutes an unsupervised learning technique in which a neural-network architecture
is leveraged for determining a data structure that results from reducing the dimensionality of the native system. For the particular test case
of flow behind a cylinder, it is shown that combinations of an AE with other ML algorithms can be used (i) to provide a low-dimensional
dynamical model (a probabilistic flow prediction), (ii) to give a deterministic flow prediction, and (iii) to retrieve high-resolution data in the
spatio-temporal domain from contaminated and/or under-sampled data.
Published under license by AIP Publishing. https://doi.org/10.1063/5.0012906

I. INTRODUCTION

Fluid mechanics is in the enviable position of having a firmly established, mathematical foundation, in the form of the Navier–Stokes (NS) equations, which describe all single-phase, non-reacting fluid-flow scenarios.

For an incompressible flow, the NS equations can be written as

$$\underbrace{\frac{\partial \mathbf{u}}{\partial t}}_{\text{variation}} + \underbrace{(\mathbf{u}\cdot\nabla)\mathbf{u}}_{\text{convection}} = \underbrace{-\frac{1}{\rho}\nabla p + \mathbf{g}}_{\text{sources}} + \underbrace{\nu\nabla^{2}\mathbf{u}}_{\text{diffusion}}, \qquad (1)$$

where u is the velocity field, ρ is the density, ν is the kinematic viscosity, p is the pressure, and g represents the body acceleration. The motion of any viscous fluid can be accurately predicted, in principle, by solving these equations if the boundary conditions are fully prescribed (Laplace's determinism65). However, the non-linear terms associated with the convective acceleration are the source of instability that manifests itself, at more than very modest speeds, by a wide range of interacting eddies that evolve chaotically in time and space. The non-linear terms are the origin of turbulence—a chaotic (but not random) phenomenon that is characterized by a broad range of spatio-temporal scales, some of which are associated with coherent structures and feature certain aspects of order and organization.15,55,64 This makes most fluid flows complex dynamical systems that contain numerous spatio-temporal degrees of freedom—characterizing the system's "dimensionality."

As the dimensionality becomes ever higher with increasing speed (Reynolds number), solving the NS equations becomes more and more resource-intensive, exceeding present computational capabilities in most practically relevant conditions. A route often taken for predicting the flow is, therefore, to simplify the system by replacing it with an appropriate model, thus reducing the system's dimensionality. The real system is then represented by a simplified reduced-order model, composed of a minimum number of modes able to represent the principal physical and operational characteristics of the flow system. As non-trivial fluid-flow systems are inherently complex—characterized by extremely high dimensionality, a wide range of scales, and strong non-linearity—the identification of the phenomena driving the flow dynamics and


their associated modes remains a major challenge in present-day fluid-flow physics and engineering.

Thanks to recent advances in experimental and computational methods, and the drastic reduction in computational costs due to advances in hardware, both the number and the size of databases that contain the output of measurements and simulations have increased dramatically over the past decade or two. This abundance of data holds great promise for deepening the knowledge of fluid-dynamic systems but poses new challenges for processing the data, which requires novel tools. Candidates for such tools and related techniques are progressively available within the rapidly growing field "Machine Learning (ML)," and present methods are increasingly made applicable to the analysis of spatio-temporal fluid-flow data, aided by recent advances in hardware (Graphics/Tensor Processing Unit—GPU/TPU—platforms, in particular), algorithms, and open-source libraries.

By definition, ML is the field of study that gives computers the ability to learn without being explicitly programmed.61 ML algorithms are categorized as supervised, semi-supervised, and unsupervised (see Fig. 1 and the recent review of Brunton et al.7). The task of supervised learning is to learn a function that maps input to output based on labeled examples. In contrast, unsupervised learning consists of learning unknown patterns in a set of data without a priori information. Finally, semi-supervised learning operates on a large amount of unlabeled data, or with corrective input from an appropriate environment, for training a small amount of labeled data. The specific potential of some ML algorithms lies in their ability to recognize structure in the data and define alternative flow-representing models at much-reduced effort and cost compared with conventional human-driven approaches. By leveraging large data sets that describe the evolution of associated flows, ML algorithms thus offer a new path for deriving flow representations, which are more compact and efficient than the native data sets in the original domains. Several recent reviews7,20,33 show the potential of ML algorithms to radically transform the way data mining is performed in fluid mechanics.

FIG. 1. Schematic representation of a deep neural network.

ML is also becoming increasingly pertinent to the construction of efficient control algorithms for non-linear fluid-flow systems. Within a closed-loop-control framework, the fluid flow as a whole may be regarded as a dynamical system, which is steered toward a desired state by control laws. Inputs are disturbances and actuation signals applied by the controller, whereas the dynamical system's outputs are the cost function—the quantity that has to be optimized—and sensor signals corresponding to the observable quantities. The controller's objective is to determine from the sensors the control laws that must be imposed onto the system for maximizing or minimizing the cost function.

Rapid developments of ML algorithms in fluid mechanics (see the recent reviews of Brunton et al.7 and Brenner et al.6) and modern computational hardware provide the rationale for pursuing the use of data-driven methods and ML algorithms for active flow control. The following two ML-based routes can be taken for implementing flow-control strategies, one being conventional and the other more progressive:

● Model-driven approach. The real system is represented by a simplified, reduced-order model. This model can be described by a set of equations—e.g., linearized Navier–Stokes equations,2,18 resolvent analysis,41 exact coherent structures,25 or by equations coupled with data—e.g., POD-Galerkin14 or data assimilation.16,68 Whichever method is used, the model-driven approach poses major challenges for complex flows: deriving the modes is, generally, extremely resource-intensive. This is especially so for non-linear multi-scale flows. In this case, the flow dynamics can change dramatically in an unpredictable manner due to the complex non-linear characteristics of the flows, and this makes it extremely hard for the model to represent sufficiently well the real phenomena being provoked by control laws.
● Data-driven approach. By taking advantage of recent developments in ML algorithms and modern computational


hardware, data-driven algorithms hold great potential for identifying control strategies by focusing directly on the controller without any knowledge of the dynamics and without recourse to any model—the so-called model-free approach. ML optimizes the system based on data the system generates—thus unlike model-driven control, which optimizes the constrained model.

A whole range of ML methods have been successfully used in fluid mechanics for a variety of desired outcomes; among methods used specifically for "Machine Learning Control (MLC)," evolutionary algorithms and genetic algorithms have been shown to be very effective for designing well-behaved control laws. Evolutionary algorithms are based on the biological principle of natural selection. Starting with an initial population of control laws, which are all competing against one another, only the most effective one for optimizing the objective function is selected. Control laws are improved from generation to generation by applying genetic operations (elitism, mutation, replication, crossover). By "breeding" the most effective control laws, these are selectively improved from generation to generation. Unlike genetic algorithms, which search for the best parameters to apply to the control law, genetic programming control learns the structure of control laws and their parameters.17 Genetic programming has been used for turbulence control in various applications, for example, the mixing layer,52 flow separation,21 turbulent separated shear flow,4 and flow around bluff bodies (e.g., the Ahmed body39,40). Another approach for identifying an optimal control strategy is Reinforcement Learning63 (RL). This is an autonomous, self-learning strategy that, essentially, acquires knowledge by interacting with its environment. In this approach, an agent interacts with its environment through actions. With every action, the agent modifies its environment and gets a reward. The agent's goal is to determine those actions that maximize its long-term rewards. To accelerate convergence, a model of the reward, combining state and action, is learned from the data. In some applications, computer memory is also attached to the agent to remember the best past actions/states. The landscape of the cost function can be learned by a neural network. This method is referred to as Deep Reinforcement Learning (DRL), and it has been successfully used by Rabault et al.57 for reducing drag behind a rotating cylinder.

As any non-trivial fluid-flow system is inherently complex and of high dimensionality, determining control laws directly from its data is hardly feasible because of resource constraints. However, this becomes tenable if the raw data are replaced by low-dimensional flow representations that can be "handled" by MLC algorithms. A possible route is to simplify the system by replacing it with a "coarse" model that reduces the dimensionality of the flow by identifying a small number of modes, which are dominant and thus represent the principal physical and operational characteristics of the flow system. The model is then used as "a bridge" between the high-dimensional complex raw data and the MLC algorithms. Based on the objective defined by a cost function, MLC algorithms select modes that have to be enhanced and/or modes that have to be depressed or suppressed and define a strategy for achieving the purpose of the algorithm. This approach is a hybrid between model-driven and data-driven methods, mutually correcting the deficiencies of both.

Auto-Encoder (AE) technology is a "good candidate" for reducing efficiently the problem dimensionality, as shown by the online article "A look at deep learning for science" by Prabhat,49 which provides an excellent overview of some uses of deep learning technology in science applications. Indeed, several studies presented in this overview were only possible by the use of AEs. This method aims at extracting essential information and representing it in an optimal space, by reducing the dimensionality of the data and conserving only the most essential information. By identifying new coordinate systems, built on non-linear embedding, the AE combined with other ML algorithms allows for the approximation of strongly non-linear dynamics by a sparse, weakly non-linear model,11,43,51 enabling MLC methods to design the most efficient control laws.

After a short introduction to Artificial Neural Networks (ANNs) in Sec. II, the concept of the AE is discussed and compared to the Proper Orthogonal Decomposition (POD) in Sec. III, a conventional technique used to represent flow-structure information. In Secs. V–VII, the low-dimensional dynamical representation obtained by the AE is then used, in conjunction with other machine learning algorithms, to
(i) provide a low-dimensional dynamical model (a probabilistic flow prediction)—in Sec. V,
(ii) give a deterministic flow prediction—in Sec. VI, and
(iii) retrieve high-resolution data in the spatio-temporal domain from contaminated and/or under-sampled data—in Sec. VII.

II. ARTIFICIAL NEURAL NETWORK (ANN) AND DEEP LEARNING

ANNs are computing systems composed of functional nodes, entities transforming and transferring information among one another. The role of each of these entities, called "perceptrons," is similar to that of a neuron in biological neural networks. The notion of a perceptron was first introduced by Rosenblatt59 to define a simple binary network.

A deep neural network (DNN) is an ANN with several layers between the input and output layers. An example of operations performed by a DNN is given in Fig. 1. In this example, a single observation x provides inputs into the first layer of nodes. Each of these inputs is multiplied by a connection weight W1, and a bias (b1) is added for shifting the activation function f1 up or down. This determines which perceptrons should be activated ("fired") and decides what amount and nature of information are transferred to the following layer. As is shown in Fig. 1, the output of a hidden layer l is given by h_l = f_l(W_l h_{l-1} + b_l), where h_{l-1} is the previous-layer output. If a DNN with L layers is considered, the final output y is then given by the relation

$$y = f_L\big(W_L f_{L-1}(\cdots(f_1(W_1 x + b_1))\cdots + b_{L-1}) + b_L\big). \qquad (2)$$

A DNN is a multi-layer stack of simple modules that can be non-linear, transforming the input data successively with increasing levels of abstraction. By adding and combining enough of these modules, any function can be learned, and a DNN can then emulate this function. Results obtained by Hecht-Nielsen,26 based on Kolmogorov's theorem,31 show that a DNN is a "universal approximator," able to approximate any function.


To demonstrate the effect of the use of non-linear activation functions, the functions in Eq. (2) are initially prescribed to be linear [e.g., f(x) = ax]. The transformation operated by the network is then given by

$$y = \left(\prod_{i=1}^{L} a_i W_i\right) x + \sum_{i=1}^{L} \frac{b_i}{\prod_{j=1}^{i} a_j W_j}. \qquad (3)$$

This formula shows clearly that their combination is still a linear function—i.e., the final activation function of the last layer is a linear function of the input into the first layer. Therefore, a DNN with L layers can be replaced by an ANN with a single layer modeling only a linear regression between the input and the output. This is contrasted with the relation given in Eq. (2), for which the DNN corresponds to a non-linear regression model if the activation functions are non-linear.
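The collapse of a purely linear network into a single linear map can be checked numerically. The sketch below, with arbitrarily chosen layer sizes and slopes a_i, verifies that a three-layer linear network and the single matrix obtained by multiplying its scaled weights give the same output; it illustrates the argument and is not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
a = [0.5, 2.0, 1.5]                                  # slopes of the linear activations f_i(x) = a_i x
W = [rng.standard_normal(s) for s in [(6, 4), (5, 6), (3, 5)]]

x = rng.standard_normal(4)

# Layer-by-layer evaluation with linear activations and no bias
h = x
for ai, Wi in zip(a, W):
    h = ai * (Wi @ h)

# Equivalent single-layer map: the product of the scaled weight matrices
W_eq = (a[2] * W[2]) @ (a[1] * W[1]) @ (a[0] * W[0])
print(np.allclose(h, W_eq @ x))                      # True: the deep linear net is one linear regression
```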
Training the network entails finding the parameters θ = {W_l, b_l}, l = 1, ..., L, that minimize the (expected) loss of meaningful information between the output and the target value. Modern neural networks use a technique called backpropagation9,37,60,67 to train the model and discover intricate structures in high-dimensional data. A backpropagation algorithm allows the DNN to adjust its internal parameters used to compute the representation in each layer from the representation in the previous layer. This approach places an increased computational strain on the activation function and its derivative function. To properly adjust the weight vector W_l, a gradient vector is computed by the learning algorithm for each weight, providing information on the error variation as the weight is slightly modified. The weight vector is then adjusted in the opposite direction to the gradient vector.

FIG. 2. Simple representation of how a single-layer neural network tunes its weights (learns) using the backpropagation method and gradient descent.

The concept of backpropagation by which ANNs learn to tune their parameters is illustrated in Fig. 2 by using an electric circuit


as a surrogate for a DNN. The electric current flows from a socket to a light bulb, and its intensity is regulated by two rotary switches. The intensity of the input and output currents is represented by x and y, respectively, and h is the voltage after the first switch (hidden layer). The objective here is to find the configuration of both switches (weights) for which the brightness is optimal (not necessarily maximal), y′ being the optimal intensity. A cost function J can be defined and represented by the relation J = ∥y − y′∥. The optimal brightness can be obtained by minimizing the cost function J by adjusting w1 and w2. The variation of the cost function for small changes in w1 and w2 has to be determined. Although these derivatives cannot be defined directly, they can be estimated by backpropagating the sensitivity of J to variations in the electric current. The route to extracting these derivatives and linking them to ∂J/∂y is referred to as the "chain of sensitivity" or "chain rule,"9,35,67 and this is shown in the second frame in Fig. 2. The role of the backpropagation is to define the amounts by which w1 and w2 have to be adjusted, the adjustments being weighted by the sensitivity. Finding the optimal configuration is an iterative process of making small changes in w1 and w2 and updating the sensitivity estimates until J cannot decrease further. Small weight changes Δw1 and Δw2 are imposed at each iteration in order to perform a step in the downhill direction of the sensitivity curve. This approach is referred to as the "gradient descent" method. Weight increments are proportional to the sensitivity through the constant multiplier lr, called the "learning rate." This constant has to be chosen with care. If it is too large, the learning process will be unstable, while an excessively small value will result in the learning process taking an unreasonable amount of time to converge.
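A compact numerical version of this two-switch example is sketched below. The "circuit" is modeled as y = w2·w1·x, the cost is taken as J = (y − y′)², and the gradients are obtained with the chain rule before the learning-rate update; the numerical values and the quadratic form of the cost are assumptions made only for illustration.

```python
import numpy as np

x, y_opt = 1.0, 0.75          # input intensity and target (optimal) brightness, arbitrary values
w1, w2 = 0.1, 0.1             # initial switch settings (weights)
lr = 0.1                      # learning rate

for it in range(200):
    h = w1 * x                # voltage after the first switch (hidden layer)
    y = w2 * h                # output intensity
    J = (y - y_opt) ** 2      # cost function

    dJ_dy = 2.0 * (y - y_opt)          # sensitivity of J to the output
    dJ_dw2 = dJ_dy * h                 # chain rule: dJ/dw2 = dJ/dy * dy/dw2
    dJ_dw1 = dJ_dy * w2 * x            # chain rule: dJ/dw1 = dJ/dy * dy/dh * dh/dw1

    w1 -= lr * dJ_dw1                  # step in the downhill direction
    w2 -= lr * dJ_dw2

print(w1 * w2 * x, J)         # the product w1*w2 converges so that y approaches y'
```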
Updating weights by backpropagating gradients of the loss defined from the cost function can be an unstable process, and two well-known problems altering the learning process are "vanishing gradients" and "exploding gradients." Vanishing gradients was a significant problem when sigmoid and tanh were the only activation functions available, and only shallow DNNs could be trained. As previously explained and illustrated in Fig. 2, derivatives of activation functions must be computed for propagating information in the network. If the values of the latter are inferior to 1, gradients tend to get smaller and smaller as information is pushed backward in the network, with the result that the neurons embedded in the first layers learn very slowly compared with the neurons in the later layers of the hierarchy. The problem of vanishing gradients implies that DNNs take a long time to train, which degrades the model's accuracy. On the other hand, exploding gradients is the exact opposite problem to vanishing gradients: large gradient errors accumulate, as the magnitude of the weights is high, resulting in excessively large updates to the model's weights. The model oscillates around the minimum or even overshoots the optimum, and the model is thus unable to learn.

Generally, the corruption of the learning process by vanishing gradients can be avoided by using the REctified Linear Unit (RELU)48 function as the activation function, as the gradient is 0 for negative inputs and 1 for positive ones. The RELU is relatively robust toward the vanishing/exploding gradient problem, and for DNNs, the initial values of the weights can be set to particular values, depending on the non-linear activation function used, as defined by Glorot and Bengio.22

FIG. 3. Conceptual representation of auto-encoder training and objectives.

III. DIMENSIONALITY REDUCTION WITH AUTO-ENCODER

An AE is a neural-network architecture that has three parts: an encoder, a bottleneck (latent space), and a decoder. An AE is an unsupervised ML algorithm that is trained to reconstruct its inputs, considered high-dimensional data. As a consequence, the algorithm forces the hidden layer to try to learn good compressed representations of the inputs. A conceptual representation of the AE training process and examples of applications are conveyed in Fig. 3. By reducing the dimensionality of the data and conserving only the most essential information, AE technology offers, in the context of fluid dynamics, a new path for deriving a representation of the flow in question that is more compact and efficient than the native data sets in the original description. Unlike Principal Component Analysis (PCA, closely connected to POD), which is intrinsically a linear regression method, an AE is capable of modeling complex


non-linear functions and consequently is more effective for describing the underlying manifold structure of the data.3,5 The AE can be regarded as a generalization of the POD54 or Dynamic Mode Decomposition (DMD).51,66 The automated extraction of flow features by unsupervised learning algorithms,35 such as the AE combined with other ML algorithms—for instance, clustering methods (to be discussed later)—offers innovative methods for flow modeling and control using low-dimensional dynamical models.

A. Auto-encoder technology: Principles and purpose

The AE's principles were first introduced in the ML community more than 30 years ago.5,34 However, it is only the development of new powerful computational hardware and Python libraries that has enabled its widespread utilization. As previously mentioned, an AE is a multi-layered neural network, composed of an encoder and a decoder, separated by a "bottleneck." As a first step, the input data are compressed into a lower-dimensional space by the encoder. Then, the decoder reconstitutes the original input by using the compressed information embedded in the "latent space," as illustrated in Fig. 3. If some structure or order exists in the input data, in whatever sense (e.g., correlations between features), this structure is "learned" and consequently leveraged when the input is forced through the AE's bottleneck. By reducing the latent-space dimension, the AE has to exploit the natural data structure for identifying the most efficient representation, in order to compress the data without any loss of important information. The size of the bottleneck constrains the amount of information allowed to traverse the full network by imposing a bottleneck dimension (much) smaller than the dimension required to express the original data. As the decoder has only imperfect information following the compression, penalizing the AE according to the reconstruction error forces the encoder and decoder to cooperate to identify the essential attributes of the input data and to determine how to best reconstruct the original input from the "encoded" state. The AE thus aims to learn and describe latent attributes of the input data, and dimensionality reduction can be considered an information-filtering process.

B. AE as a non-linear generalization of the proper orthogonal decomposition

Proper Orthogonal Decomposition (POD) is a conventional method, which is most frequently used in fluid mechanics to analyze the modal structure of flow fields. POD was introduced by Lumley42 and is also known as the Principal Component Analysis (PCA), previously defined by Pearson.53 It is interesting to contrast the basic properties of the POD and the AE in order to clarify the differences in performance.

By their very nature and construction, POD modes are orthogonal to one another. The original field is thus reconstructed by a combination of linearly uncorrelated modes. The projection basis is defined in such a way that most of the variance (energy) is embedded in the first mode, and additional contributions decrease within the following modes. Thus, most of the signal energy is captured by the first few modes, unless the fluctuation field is populated with a wide spectrum of energetic scales, as is the case in highly turbulent flows at a high Reynolds number.

POD is intrinsically a linear regression method, and it essentially learns a linear transformation that projects the data into another space, where the vectors of projection are defined by the variance of the data. The AE methodology is not subject to such linearity constraints, as data dimensionality is reduced by stacking multiple non-linear transformations. The AE is capable of modeling complex non-linear functions and consequently is more effective for describing the underlying manifold structure of the data.3,5

The manner in which a single-layer AE operates on input data to give an output that optimally represents the input by way of a low-order representation of the input data is given by Eq. (2) and illustrated in Fig. 4. For maximizing the likeness between input and reconstruction, the AE has to adjust its weights and biases. If h and g are linear activation functions and no bias is used, then the output is y = WW′x [see also Eq. (3)]. Plaut54 showed that the AE learns to span the same subspace as the POD, where y = WW^T x, if the mean-squared error defines the loss function [J = (x − y)²]. By using non-linear activation functions, non-linear trends in the data set can be identified. Hence, the AE is capable of modeling complex non-linear functions, and it can be seen as a generalization of the POD.3,5,12,45,54

To illustrate the difference between PCA and an ANN with non-linear activation functions, an imaginary set of data is prescribed in Fig. 5. A set of points is organized, broadly, along a curved path, with two lines of different slopes in quadrants II and IV, respectively, characterizing the behavior of the data. The objective is to represent the data as best as possible with minimum effort. The first POD mode Φ1 captures the maximum variance. In this two-dimensional example, the first mode is represented by the red line, and the second mode—the blue line—is perpendicular to it. Although a large proportion of the information is captured by the first mode, the second mode is still required for adequate data representation. The first mode represents the average of the left and right "dispersion branches," revealing the inability of the POD to describe the non-linear trend and making the interpretation of this mode contentious. While PCA attempts to discover a lower-dimensional hyperplane that describes the original data, the AE is able to learn non-linear manifolds (a manifold is defined, in simple terms, as a continuous, non-intersecting surface).
FIG. 4. Data transformation process operated by a vanilla AE.

The compression performance achieved by an AE can be drastically increased with respect to the POD by using non-linear activation functions with bias and by increasing the number of layers (depth). However, the more elaborate the AE becomes, the harder it is to extract the modes and, ultimately, to interpret them. This


means that, by adding layers with a large number of neurons or filters with non-linear activation functions, the potency of the AE for function approximation can be efficiently increased, but the negative aspect is that the interpretability of the data structure in the latent space decreases. Even if the dynamics in the latent space mimics the fluid dynamics in the original space, data in the latent space cannot easily be related directly to the physics of the flow. Nevertheless, the motivation for using an AE with a sufficiently high level of complexity is the opportunity to work within a space of low dimension.

FIG. 5. Conceptual representation of the projection basis using the PCA method against a neural network.

C. Definition of a convolutional auto-encoder (CAE)

In the early stages of the development of computer vision, neural networks were composed of stacked layers of perceptrons, fully inter-connected to one another. Aside from the vast numbers of parameters to be determined and the risk of learning information not connected to relevant input data, such as noise (over-fitting), the main disadvantage of this technique is that input images or volume data are flattened, and consequently, shared information in small neighborhoods cannot be exploited. Convolutional Neural Networks (CNNs) are much more sophisticated techniques,24,32,36 and they have proven extremely effective in areas such as image recognition and classification. They comprise convolution layers with activation functions and pooling layers. The primary purpose of convolutional layers is to extract "features" from the input image. Convolution preserves the spatial relationship between pixels by learning image features using short lines, or small squares or volumes of input data. By considering small portions and using the shared information between the different points within these areas, the CNN learns to recognize patterns. To do this, small matrices, called "filters" or "kernels" in CNN terminology, are spanned across the whole input, identifying and matching patterns within the whole input domain. They act as feature detectors from the original input image. Matrices obtained by sliding filters over the data and computing the dot-product (convolution) are often referred to as "activation maps" or "feature maps." The role of convolutional layers is to identify patterns by tuning the filters and the bias, while the role of pooling layers is to select only the most important of them by reducing the dimensionality of each feature map (pooling is also referred to as sub-sampling or down-sampling).

It can be argued that the CNN pertains to image recognition and cannot be related to, or exploited for, fluid dynamics. However, fluid flows, even if turbulent and thus chaotic, contain influential coherent structures and feature certain aspects of order and organization. By defining "libraries" of filters able to describe coherent scales, the CNN can thus be applied to the identification of patterns featuring in such flows.10,38
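As a small illustration of filters and feature maps, the sketch below slides a hand-written 3 x 3 kernel over a 2D field with SciPy; the vertical-edge kernel and the toy field are arbitrary choices used only to show how a convolution responds to a coherent pattern and how pooling reduces the map.

```python
import numpy as np
from scipy.signal import convolve2d

# A toy 2D "flow field": a smooth background with a sharp vertical front
field = np.zeros((64, 64))
field[:, 32:] = 1.0

# A 3x3 filter (kernel) acting as a vertical-edge detector
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

# Sliding the filter over the field yields the feature (activation) map
feature_map = convolve2d(field, kernel, mode="same", boundary="symm")

# Max pooling: keep only the strongest response in each 2x2 neighbourhood
pooled = feature_map.reshape(32, 2, 32, 2).max(axis=(1, 3))
print(feature_map.shape, pooled.shape)   # (64, 64) (32, 32)
```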
D. Autoencoder cost function: Measure of the "likeness"

The primary goal of most ML algorithms is to construct a model able to map input variables, by identifying their features, to output variables, referred to as a target. Given a training set, the objective is to learn a function f, such that f(x) is a "good" predictor for the corresponding value of y. Whether the predictor is considered "good" or "bad" is subjective, and it is tied to the problem hypothesis. In fact, a single data set can be used for different purposes, and the cost function measuring the model accuracy during the training process must be adapted to the objectives in order for the ML algorithm to identify which features are relevant to the objective.

As the AE objective is to reproduce the input data, the cost function must provide a good measure of the "likeness" between the input data and their reconstruction. One option is to use the mean-squared error, as for the PCA. However, by minimizing the cost function J = (y − x)² (averaged over the total number of observations, or data points, by batch), the AE will first learn the features responsible for large fluctuations. This approach may be adequate for most cases. However, as the fluid-dynamic system is non-linear, large-amplitude phenomena may have their roots in weak phenomena. Hence, a cost function that is not weighted by the amplitude of the fluctuation should be more appropriate. Against this background, the similarity between the output y and the target value x is quantified by the parameter α given by the following equation (values being centered, i.e., $x_i = x_i' + \bar{x}_i$ with $\overline{x_i'} = 0$):

$$\alpha = \frac{2 x' y'}{(x')^2 + (y')^2}. \qquad (4)$$

The profile of α against y′/x′ is shown in Fig. 6(a). Possible values taken by α span the range from −1 to 1, obtained when y′ = −x′ and y′ = x′, respectively. The mean value of α behaves qualitatively in a manner akin to the correlation coefficient $\gamma = \overline{x'y'} / \big(\sqrt{\overline{(x')^2}}\,\sqrt{\overline{(y')^2}}\big)$. However, while the latter measures the linear link between fluctuations, α is more restrictive, as the magnitude is also taken into consideration. For example, if y is linearly correlated with x by the function y′ = ax′, then γ² is equal to 1 whatever the value of a, while α equals 1 if and only if a = 1; thus, y′ = x′.


In order to effectively use an algorithm that is based on the gradient descent method during the training process for converging to the solution y′ = x′, a function j may be derived, using α, as follows: $j(\alpha, \varepsilon) = \frac{1}{\frac{\alpha+1}{2} + \varepsilon} - \frac{1}{1+\varepsilon}$. Figure 6(b) shows that j tends to 0 as y′ tends to x′. The upper limit of j = 1 can be imposed by using ε = 0.618. As mentioned in Sec. II, during the backpropagation process, the values of the weight increments are proportional to the sensitivity. The cost function J is defined such that the gradient can be increased during the training process, namely,

$$J = \frac{1}{\left[\frac{\alpha+1}{2}\right]^{2n} + \varepsilon} - \frac{1}{1+\varepsilon}. \qquad (5)$$

Profiles of J for different values of n are shown in Fig. 6(c). For the present study, there is virtually no difference between fields
reconstructed by an AE trained using this cost function or if the
mean-squared error is used instead. This finding is not surprising, as
the loss of information is almost zero. The cost function introduced
is designed for flows populated by a broad spectrum of eddy sizes;
the advantage of using this method over the standard MSE requires
further investigations.
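Expressed as a training loss, Eqs. (4) and (5) might be sketched as follows for a TensorFlow/Keras model; the small stabilizing constant added to the denominator of α and the default n = 1 are assumptions of this sketch rather than values given in the paper.

```python
import tensorflow as tf

def likeness_loss(n=1, eps=0.618):
    """Cost function J built from the similarity parameter alpha, Eqs. (4) and (5).
    Inputs are assumed to be centred fluctuations x' and y'."""
    def loss(x_true, y_pred):
        # alpha = 2 x'y' / (x'^2 + y'^2), computed point by point
        alpha = 2.0 * x_true * y_pred / (tf.square(x_true) + tf.square(y_pred) + 1e-12)
        # J = 1 / ([(alpha + 1)/2]^(2n) + eps) - 1 / (1 + eps)
        j = 1.0 / (((alpha + 1.0) / 2.0) ** (2 * n) + eps) - 1.0 / (1.0 + eps)
        return tf.reduce_mean(j)
    return loss

# Usage (hypothetical model): model.compile(optimizer="adam", loss=likeness_loss(n=1))
```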

IV. APPLICATION OF AE TO A TWO-DIMENSIONAL FLOW AROUND A CYLINDER
A. AE architecture and parameters



A simulated 2D unsteady flow around a cylinder is used here
as a test case. The flow was computed by the lattice Boltzmann
technique, using a Python code.50 The simulation was performed at
Re = 220, with a two-dimensional nine-velocity square lattice
(D2Q9) implementation of the code. The computational domain is
covered by a mesh of 256 × 88 nodes in the streamwise and spanwise
directions, respectively. A snapshot of the streamwise velocity field
is conveyed in Fig. 8(a).
A summary of the AE architecture and parameters used for the
current study is shown, schematically, in Fig. 7. As the flow struc-
tures are coherent in space and time, volumes of 12 stacked snap-
shots are used as an input tensor, corresponding approximately to
one period of vortex shedding. Both the encoder and the decoder
are built using three 3D convolutional layers [with filter dimensions
given in Fig. 7(b)]. The initialization of the distribution of weights
for each layer relies on a uniform distribution proposed by Glorot
and Bengio.22 By using 3D convolutional layers, the CNN exploits
both spatial and temporal correlations. As shown in Fig. 7, the size of
the input is successively reduced when traversing through the differ-
ent layers of the encoder, and then the input is reflated to its original
size by the decoder. The information propagates from the encoder to
the decoder through a layer composed of three perceptrons forming the bottleneck. Only 3/(12 × 256 × 88) × 100 ≈ 0.001% of the original information is conserved in the latent space, and only this information is
used to reconstruct the input.
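A Keras sketch of an auto-encoder of this type is given below. It follows the description above (three 3D convolutional layers on each side, ELU activations, a three-perceptron bottleneck, a sigmoid output, and a 12 × 256 × 88 input volume), but the filter counts, kernel sizes, strides, and the cropping used to recover the exact input shape are assumptions of this sketch, not the exact settings of Fig. 7.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Encoder: three 3D convolutional layers exploiting spatio-temporal correlations,
# ending in a three-perceptron bottleneck (the latent space)
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(12, 256, 88, 1)),
    layers.Conv3D(8, 3, strides=2, padding="same", activation="elu"),
    layers.Conv3D(16, 3, strides=2, padding="same", activation="elu"),
    layers.Conv3D(32, 3, strides=2, padding="same", activation="elu"),   # -> (2, 32, 11, 32)
    layers.Flatten(),
    layers.Dense(3, name="latent"),
])

# Decoder: expand the latent vector and mirror the encoder with transposed convolutions
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    layers.Dense(2 * 32 * 11 * 32, activation="elu"),
    layers.Reshape((2, 32, 11, 32)),
    layers.Conv3DTranspose(16, 3, strides=2, padding="same", activation="elu"),
    layers.Conv3DTranspose(8, 3, strides=2, padding="same", activation="elu"),
    layers.Conv3DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),  # -> (16, 256, 88, 1)
    layers.Cropping3D(cropping=((2, 2), (0, 0), (0, 0))),                           # trim time axis back to 12
])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")     # or the likeness loss of Eq. (5)
# autoencoder.fit(volumes, volumes, epochs=..., batch_size=...)   # volumes: (N, 12, 256, 88, 1)
```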
FIG. 6. Construction of the cost function J used to quantify the reconstruction error between x and y, Eq. (5), where ε = 0.618. Profiles of (a) α = 2x′y′/[(x′)² + (y′)²], (b) j = 1/[(α+1)/2 + ε] − 1/(1+ε), and (c) J = 1/([(α+1)/2]^{2n} + ε) − 1/(1+ε).

The ability of the AE to perform non-linear regression is conveyed by the "Exponential Linear Unit (ELU)" as the activation function for all layers, except for the final layer, in which the sigmoid function is preferred. The ELU is very similar to the well-known RELU function.48 In fact, for non-negative inputs, both are identical. The main difference between the two functions is the way the


negative inputs are processed: the ELU varies slowly until its output equals a negative constant, whereas the RELU becomes sharply null. In the case of the ELU, the neurons never produce a zero gradient, even for negative inputs, and the training can continue whatever the sign of the input values, unlike ANNs using the RELU. This modification allows the model using the ELU to give more accurate results with faster convergence.13

FIG. 7. AE architecture and parameters: (a) table and (b) graphical representation.

B. Autoencoder: Learning process and reconstruction

The database used for the present study contained around 43 shedding periods, which are sampled over 500 temporal snapshots. A subset that is composed of 128 snapshots (corresponding approximately to ten shedding cycles) is used for the AE training. This number has been chosen to take full advantage of the TPU hardware developed by Google. All codes used to obtain the results in this paper were written using the COLAB platform with version 14 of the Tensorflow library.1

Once the training has been performed, a volume of streamwise velocity fields is randomly chosen from the validation subset and fed into the AE. One of the 12 snapshots is shown in Fig. 8(a). The corresponding reconstructed snapshot from the AE is shown in Fig. 8(b). For measuring the reconstruction error, the parameter ϕ given by the following equation is introduced:


FIG. 8. Snapshot of the streamwise velocity field for a 2D flow around a cylinder: (a) original field; (b) and (c) reconstructed snapshots from the AE and three POD modes, respectively; (d) and (e) reconstruction error defined by ϕ² [Eq. (6)]; (f) probability-weighted distribution of ϕ².

$$\phi_n(x, z) = 100 \times \frac{\left(U^{\,n}_{rec}(x, z) - U^{\,n}_{ori}(x, z)\right)^2}{U^{\,n}_{ori}(x, z)^2}, \qquad (6)$$

where U_ori and U_rec are the original and reconstructed fields, respectively. The reconstruction error defined by ϕ²(x, z) is conveyed in Fig. 8(d). A visual comparison of the input image with the reconstructed one demonstrates the high fidelity of the reconstruction process, with errors occurring mainly in the region just behind the cylinder. The error margin increases up to ϕ² = 15. However, the probability-weighted distribution of ϕ², plotted in blue in Fig. 8(f),


reveals that the error magnitudes are mainly below 4. The mean value of ϕ²(x, z) can be estimated by integrating the curve (⟨ϕ²⟩x,z = ∫ ϕ² Pdf(ϕ²) dϕ²), and it is in the vicinity of 0.7. The information loss is, therefore, insignificant. This result is impressive, especially since the input data have been compressed by more than 99.998%; this demonstrates the ability of the AE to extract the most important features of the dynamical system and to provide an extremely low-dimensional representation.

FIG. 9. Temporal fluctuations of the space-averaged reconstruction errors ⟨ϕ²⟩x,z [Eq. (6)]: blue line, POD reconstruction (first three modes); red and green lines, AE reconstructions from training and validation subsets.

FIG. 10. Identification process of a low-dimensional dynamical model by using the AE combined with the clustering algorithm.

In what follows, the AE reconstruction is contrasted with that achieved with the Proper Orthogonal Decomposition (POD). The application of this method results in a modal decomposition of the input data, with modes ordered according to their energy content. Once the POD modes have been obtained, these can be recombined to give an approximate representation of the original flow. The more modes are used for the reconstruction, the lower the reconstruction error. Hence, the efficiency of the POD may be compared to that of the corresponding reconstruction obtained with the AE. For the same flow analyzed above with the AE, the streamwise velocity field reconstructed from the modes (eigenvectors) and the associated error fields are shown in Figs. 8(c) and 8(e), respectively. As the POD's temporal coefficients describe the fluid dynamics, and because the AE with a bottleneck dimension of 3 yields only three temporal coefficients, only three POD modes will be used for the reconstruction. The principal features of the original field, mainly associated with the large vortices being shed, are evidently reproduced by the POD. However, the error is much larger than the one obtained with the AE, especially behind the cylinder, where the structures, although being relatively small, may play an essential role in the dynamics of the shedding process. In Fig. 8(f), the probability-weighted distribution function of ϕ² for the POD is given by the blue line, and the error level is seen to be much greater than the maximum level obtained with the AE reconstruction. This set of figures thus brings into focus how the


POD reconstruction is outperformed by the AE. Most importantly, the AE can extract the large-scale and small-scale features, despite
the massive compression in the latent space.
The temporal evolution of the spatial average of the error ϕ2
obtained with the AE (red and green lines) and the POD (blue line)
is plotted in Fig. 9. The red portion of the line for the AE corresponds
to the training data. The green line represents the reconstruction
errors obtained when a random field is fed into the AE after the
learning process. In both cases, the error margins are low and very
similar in level. This result is important, as it signifies that the AE
has not learned undue details and noise only present in the train-
ing subset, i.e., there is no “overfitting”—a condition that would be
indicated by a lower error in the reconstruction of the training data.
In comparison with the AE result, the average and the variance of
the reconstruction error for the POD are almost ten times higher,
thus demonstrating the superiority of the AE for finding an optimal
representation.
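Figures 8 and 9 compare the AE against a three-mode POD reconstruction. A POD of the same snapshots can be obtained directly from a singular value decomposition, as sketched below; the snapshot array `u` and its shape are hypothetical, and the mean-squared error printed at the end is only a stand-in for the ϕ²-based comparison of Fig. 9.

```python
import numpy as np

# u: hypothetical array of shape (n_snapshots, nx, nz) holding the streamwise velocity
X = u.reshape(u.shape[0], -1).T                      # snapshot matrix (nx*nz, n_snapshots)
X_mean = X.mean(axis=1, keepdims=True)
Xf = X - X_mean                                      # fluctuating field

# POD via the singular value decomposition
Phi, S, Vt = np.linalg.svd(Xf, full_matrices=False)  # spatial modes, amplitudes, temporal coefficients

# Reconstruction with the first three modes, to be compared with the three-component AE
r = 3
X_rec = X_mean + Phi[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
u_pod = X_rec.T.reshape(u.shape)

print(np.mean((u_pod - u) ** 2))                     # to be compared with the AE reconstruction error
```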

V. FROM FLOW FEATURES TO LOW-DIMENSIONAL DYNAMICAL MODELS
Fluid-mechanics problems of engineering interest are gener-
ally characterized by dynamical systems of very high dimension
and strong non-linearity. Approximating the flow dynamics by low-
dimensional models is then extremely challenging in most practi-
cally relevant cases. Whatever the approach used for modeling the
flow, the first step is to determine which features are the most important and have to be conserved for predicting the flow. The POD,
or its variants such as spectral POD, offers a path for discover-
ing features based on the energy distribution. Indeed, as previously
mentioned, the POD identifies the best-fitting hyperplanes through
the input variables using linear regression; POD modes are, by con-
struction orthogonal. The AE offers more “flexibility” for extract-
ing the most important features relevant to the dynamics, as the
cost function can be adapted to the problem. The AE also offers a
superior route for modeling non-linear dynamic systems as it can
identify non-linear relationships. By leveraging the AE capacity to
identify and represent in a compact manner the important flow fea-
tures, the objective of this section is to determine a probabilistic
low-dimensional dynamical system. In the present study, the dimen-
sionality reduction is performed in five steps (see the illustration in
Fig. 10):
● In the first step, a low-dimensional projection subspace is
identified by using the AE.
● In the second step, this low-dimensional representation is
exploited to efficiently cluster the data set by similarity
principles.
● In the third step, a dynamical model describing the transi-
tion probabilities from one cluster to another is derived.
● Finally, in steps 4 and 5, the decoder layer is used to deter-
mine a low-dimensional dynamical system in the original
data space.
These approaches are detailed in the following paragraphs.

A. Low-dimensionality projection using AE

FIG. 11. (a)–(c) Plots showing a variable in the latent space from different perspectives. Each input volume is mapped to a single black dot. The first 11 data volumes are represented by the colored dots, ranging from blue to red.

The requirement that the latent-space dimensions should be substantially lower than the input tensor implies that the encoder


FIG. 12. Conceptual representation of clustering methods based on "distance" against "connectivity."

and decoder must be able to identify and keep the most meaningful
features. As the flow structures are coherent in space and time, a vol-
ume of 12 stacked snapshots is used as an input tensor, 12 snapshots
corresponding to the vortex-shedding period. The encoder converts
the input volume of 12 × 256 × 88 points to encoded features embed-
ded within a three-component vector. Each data-volume input is
thus reduced to a single point in the 3D latent-space representa-
tion. The projection of the volumes is conveyed by the black dots
in Fig. 11, and the first 11 volumes are represented by the colored
dots, sequentially arranged from blue to red. The encoded features
form a loop, where a revolution is equivalent to a shedding cycle.
Although the physical flow features cannot be interpreted directly
from the components in the latent space, the compact representa-
tion nevertheless reflects a spatio-temporal organization. The low-
dimensionality of the latent space can be leveraged for represent-
ing the underlying temporal dynamics by processing elements or
groups of elements, which will then be used by the decoder to yield
a physical representation of the original field.
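Using the trained `encoder` from the earlier architecture sketch, the projection shown in Fig. 11 might be produced along the following lines; the sliding-window construction of the input volumes and the `snapshots` array are assumptions of this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# snapshots: hypothetical array of shape (n_t, 256, 88) holding the velocity fields
windows = np.stack([snapshots[i:i + 12] for i in range(len(snapshots) - 12)])
windows = windows[..., np.newaxis]                   # (n_volumes, 12, 256, 88, 1)

latent = encoder.predict(windows, verbose=0)         # (n_volumes, 3): one point per volume

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(latent[:, 0], latent[:, 1], latent[:, 2], s=5, c="k")   # the shedding cycle appears as a loop
plt.show()
```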



B. Spectral clustering: Grouping similar events
One path to reducing the problem dimensionality is to group
similar elements or events, and to approximate them by a “mean”
representation. This approach raises several questions: Which flow
features are the most relevant for describing the fluid dynamics?
How can these features be related to a criterion upon which the simi-
larity is based? How can similarity be measured? How should a mean
representation be determined? There are no unambiguous answers
to these questions, and it is only by an iterative process, based on
knowledge and discovery, that answers arise.
Several methods exist for grouping elements (i.e., “cluster-
ing”). They can be arranged, broadly, in two groups: (i) based on
their “distance” to neighbors—the “K-means” clustering method
being probably the most popular algorithm, and (ii) based on their
connectivity—in which case points widely separated may lie in the
same cluster and points even extremely close can be split into differ-
ent clusters. Spectral clustering is one method, among others, based
on the principle of connectivity. To increase clarity, one may con-
sider points distributed in such a way that they form two parallel data
segments (Fig. 12), with all data being clustered in two ways, each in
one of the two above categories. With the first method, based on the
distance, points from both lines are contained in each cluster; their
inclusion is in cluster 1 or cluster 2 depending upon the position of
the points relative to the centers of the segments: points on the left
are in cluster 1 and points on the right are in cluster 2. In the sec-
ond method, based on the principle of connectivity between points,
items that share a common attribute—in this case, position within
an identifiable coherent region—belong to the same cluster.

FIG. 13. Spectral clustering of the data projection into the latent space. Data are distributed into eight different clusters. The three plots (a)–(c) show the same latent variables from different perspectives.

The most significant flow features are automatically selected by the AE. The encoder layer converts the input data to encoded


features embedded within an optimal subspace. By combining this low-dimensional representation of the data and the spectral clustering algorithm, a coarse-grained description of the data is defined. Results from spectral clustering are shown in Fig. 13, each cluster being represented by a different color. The number of clusters is constrained to be 8, and as shown in the histogram in Fig. 14(a), the events are equally distributed over all the states (clusters).

FIG. 14. Probabilistic clustered dynamical model. Probability values are estimated in percentage terms and given by the integers. (a) Probabilities for the flow to stay in its current configuration (same cluster) and (b) probabilities for the flow to navigate from one state to another.

FIG. 15. Low-dimensional dynamical model: (a) map of migration probabilities between modes and (b) modes reconstructed from clusters by the decoder. Integers in figure (a) correspond to the modes shown in figure (b).
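Grouping the latent-space points by connectivity might be done with scikit-learn's SpectralClustering, as sketched below on the `latent` array obtained earlier; the nearest-neighbors affinity and the choice of eight clusters follow the text, while the remaining settings are assumptions.

```python
from sklearn.cluster import SpectralClustering

# latent: (n_volumes, 3) array of encoded features (see the projection sketch above)
clustering = SpectralClustering(
    n_clusters=8,                   # eight flow states, as in Fig. 13
    affinity="nearest_neighbors",   # connectivity-based similarity rather than plain distance
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=0,
)
labels = clustering.fit_predict(latent)   # one cluster label per input volume
```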
C. Low-dimensional dynamical model construction

By exploiting the ability of the AE to represent the most relevant flow-dynamic features in a low-dimensional space, in combination with a spectral clustering algorithm for regrouping events based on their connectivity, a low-dimensional system—its dimension being fixed by the number of clusters—can be determined. Each cluster is associated with a flow state, forming the "skeleton" of the flow-dynamical system. This system can then be represented as a probabilistic Markov chain, corresponding to a stochastic model, which describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event, following the cluster-based reduced-order modeling (CROM)


framework introduced in Kaiser et al.29 Thus, it is necessary to determine the probability of a flow configuration remaining in its present state, Fig. 14(a), and the probability of the flow migrating toward another state, the map of the motions between clusters being shown in Fig. 14(b).
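From the sequence of cluster labels, the transition probabilities of such a Markov model (Fig. 14) can be estimated by counting how often the flow moves from one state to another between consecutive volumes, as in the sketch below; this is a plain frequency estimate and assumes the `labels` array from the clustering sketch.

```python
import numpy as np

n_states = 8
counts = np.zeros((n_states, n_states))
for a, b in zip(labels[:-1], labels[1:]):   # consecutive volumes: state a -> state b
    counts[a, b] += 1

# Row-normalised transition matrix: P[i, j] = probability of moving from state i to state j
P = counts / counts.sum(axis=1, keepdims=True)
print(np.round(100 * np.diag(P)))           # probabilities (%) of staying in the same state, cf. Fig. 14(a)
```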
The flow configuration used in this study is relatively simple, and the low-dimensional dynamical system is depicted in Fig. 15. The skeleton of the low-dimensional dynamical system is obtained using the decoder for reconstructing elements from clusters in the latent space and is shown in Fig. 15(b). The dynamics of the system are illustrated by the orientation and thickness of the arrows in Fig. 15(a).

In this section, it was shown that a model, making probabilistic prediction possible, can be derived by clustering elements from the latent space obtained by the AE. This approach can be useful for providing insight into which control strategy to adopt, by either promoting some modes rather than others or undermining modes leading the flow to a disadvantageous configuration (such as a drag increase47).

VI. PREDICTION OF TEMPORAL EVOLUTION: DETERMINISTIC APPROACH

In Sec. V, a low-dimensional model of the flow dynamical system was obtained using an AE in combination with spectral clustering. Importantly, the flow prediction is probabilistic—providing an approximation of the flow state. However, the AE can also be used to make "conventional," deterministic, predictions of the flow evolution, based on a preceding learning process.

Intuitively, one approach is to use as input a snapshot at a specific time step and "ask" the AE to construct the following time-step snapshot, using the latter as output—an approach similar to the dynamic mode decomposition. In fact, if the activation functions are linear and the MSE is used as the cost function, the AE is a linear operator, which should be similar to the DMD.66 However, the AE is also capable of performing non-linear transformations if the activation functions are non-linear. Therefore, the "time-lagged AE" can be seen as a generalization of the DMD.

FIG. 16. Illustration of the temporal prediction process using the AE: upper row, training; lower row, prediction.

Another possible approach, illustrated in Fig. 16, is to perform the prediction in the latent space, taking advantage of the low dimensionality, in which case only the dynamics of the most relevant features are conserved. Modeling the evolution of latent variables is easier and more cost-effective, as with this approach, the evaluation of new steps scales with the size of the low-dimensional representation and not with the size of the full-dimensional data. The first step is to train the AE to learn how to extract and compact the most valuable information associated with the flow dynamics in the latent space. Once this task is completed, the following step is to predict the temporal evolution of the low-dimensional system by using a Convolutional Neural Network (CNN) in the latent space, which aims to extract the temporal evolution. As is indicated in Fig. 16, once the training phase is completed, the combination of the AE with the CNN can be used to predict the flow evolution from an initial condition. A data volume at t_r represents the initial condition, and this is fed into the AE. The second CNN uses its projection in the latent


Another possible approach, illustrated in Fig. 16, is to perform the prediction in the latent space, taking advantage of the low dimensionality, in which case only the dynamics of the most relevant features are conserved. Modeling the evolution of the latent variables is easier and more cost-effective, as, with this approach, the evaluation of new steps scales with the size of the low-dimensional representation and not with the size of the full-dimensional data. The first step is to train the AE to learn how to extract and compact the most valuable information associated with the flow dynamics in the latent space. Once this task is completed, the following step is to predict the temporal evolution of the low-dimensional system by using a Convolutional Neural Network (CNN) in the latent space, which aims to extract the temporal evolution.

FIG. 16. Illustration of the temporal prediction process using the AE: upper row, training; lower row, prediction.

As indicated in Fig. 16, once the training phase is completed, the combination of the AE with the CNN can be used to predict the flow evolution from an initial condition. A data volume at t r represents the initial condition, and this is fed into the AE. The second CNN uses its projection in the latent space to predict the following time step (at t r+1). The vector obtained is fed into the CNN in order to predict the time step t r+2, and this is repeated within a closed loop. The vectors obtained are decoded "on the fly" for constructing the time-lagged data volumes. The CNN used comprises five convolutional layers with one-dimensional filters, the input being time-series data covering 12 time steps in the latent space.
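A minimal sketch of this latent-space prediction loop is given below. It assumes a trained encoder/decoder pair from the AE and encoded snapshots z of shape (n_snapshots, n_latent); the Conv1D stack is illustrative and does not reproduce the exact five-layer architecture used here.

```python
# Sketch of the latent-space predictor and its closed-loop use. A window of
# 12 consecutive latent vectors is mapped to the next latent vector; during
# prediction, each new vector is appended to the window and decoded
# "on the fly" to rebuild the velocity fields.
import numpy as np
import tensorflow as tf

window, n_latent = 12, 3                          # illustrative sizes

latent_cnn = tf.keras.Sequential([
    tf.keras.layers.Conv1D(32, 3, padding="causal", activation="elu",
                           input_shape=(window, n_latent)),
    tf.keras.layers.Conv1D(32, 3, padding="causal", activation="elu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(n_latent),              # latent vector at the next step
])
latent_cnn.compile(optimizer="adam", loss="mse")
# Training pairs: (z[i:i + window], z[i + window]) for all admissible i.

def rollout(z_init, n_steps, decoder):
    """Closed-loop prediction from an initial window of latent vectors."""
    history = [np.asarray(v) for v in z_init]     # last `window` latent vectors
    fields = []
    for _ in range(n_steps):
        z_next = latent_cnn.predict(np.array(history[-window:])[None], verbose=0)[0]
        history.append(z_next)
        fields.append(decoder.predict(z_next[None], verbose=0)[0])
    return np.array(history[window:]), np.array(fields)
```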
One of the most useful ways to uncover the structure in high-
dimensional data is to project it down to a subspace, such as a 2D
plane, where hidden features may become visible. Of course, the
challenge is to find the plane that best illuminates the data structure.
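For reference, a two-line sketch of such a projection is given below, using the two leading principal components as the plane; the AE's latent space plays the same role through a non-linear mapping. The function name and the flattened snapshot layout are illustrative.

```python
# Project flattened snapshots onto a 2D plane spanned by the two leading
# principal components; hidden structure often becomes visible in this plane.
from sklearn.decomposition import PCA

def project_2d(x):
    """x: snapshots flattened to shape (n_snapshots, n_points)."""
    return PCA(n_components=2).fit_transform(x)
```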
In the present test application, the above process is imple-
mented in order to predict up to 20 shedding cycles. The low-
dimensional data predicted in the latent space are compared with the
original encoded data in Fig. 17. The left-hand side plots show the
time evolution for the three latent space dimensions. The predicted
variations (blue lines) virtually collapse onto the variations result-
ing from feeding the actual fields into the latent space (red lines).
However, closer examination reveals differences in the peaks of the
latent-space vector and also reveals a lag between the prediction and
the actual values [see Fig. 17(b)]. In the phase plot in Fig. 17(d), every
projection of the original event over the 20 cycles is represented by
a black dot, forming a near-continuous loop. Prediction results are
compared to original projection at four selected time steps: ◯ t/T 0
≈ 4, △ t/T 0 ≈ 7, ◽ t/T 0 ≈ 15, and ◇ t/T 0 ≈ 19, with T 0 being the period
for one shedding cycle. At t/T 0 ≈ 4, i.e., after four vortex-shedding periods, the predicted and the encoded data are indicated by the blue
and red symbols, respectively, showing that the lag between the pre-
dicted and the actual set of latent-space values increases. However,
the figure shows that, notwithstanding the lag, the predicted points
are still part of “the correct orbit,” implying that the predicted flow
field should essentially be accurate, except for a slight lag.
The accuracy of the prediction is demonstrated in Fig. 18,
which compares the predicted with the actual flow fields at four dif-
ferent time steps. For each of these values, the predicted velocity field
is reconstructed by the AE/CNN, and a visual comparison with the
original field highlights the prediction accuracy. As seen, even after
19 cycles, the differences between original and predicted fields are
visually insignificant.
The prediction error ϕ2 [Eq. (6)] can be quanti-
fied, and this is represented by the blue line in Fig. 19(a), in com-
parison with the AE-reconstruction from Fig. 9 (red line). Over the
first predicted cycle, the error is contained within the range between
9% and 16%, the mean value being around 12%. The error mar-
gin steadily increases with the number of cycles, the error level
varying between 10% and 50% during the cycle. This increase in
error appears to be at odds with the visually insignificant differ-
ences between the flow fields in Fig. 18, but this is a consequence
of the increasing lag indicated in Fig. 17. In order to evaluate the
effect of the time lag on the spatial prediction accuracy, the error in
Fig. 19(b) is estimated between the predicted field at t/T 0 ≈ 16.71
and time-lagged fields; a massive drop from 50% to less than 10%
is observed when the original field is shifted to the earlier time
step.
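The sketch below indicates how such an evaluation can be set up, under the assumption that Eq. (6) is a relative (percentage) squared-error norm; the exact normalization should follow Eq. (6), and the shift scan mirrors the time-lag test of Fig. 19(b). Names and array layouts are illustrative.

```python
# Prediction-error evaluation: phi2 is assumed to be a relative squared error
# expressed as a percentage; scanning time-shifted reference snapshots shows
# how much of the error is attributable to the phase lag.
import numpy as np

def phi2(u_pred, u_ref):
    """Relative squared error (%) between a predicted and a reference field."""
    return 100.0 * np.sum((u_pred - u_ref) ** 2) / np.sum(u_ref ** 2)

def lag_scan(u_pred_t, u_ref_series, t, max_shift=5):
    """Error between one predicted snapshot and reference snapshots shifted to
    earlier time steps, as in Fig. 19(b); a low value at a non-zero shift
    indicates that the prediction is accurate up to a phase lag."""
    return [phi2(u_pred_t, u_ref_series[t - s]) for s in range(max_shift + 1)]
```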
FIG. 17. Comparison between the original data (red) and the AE prediction (blue) within the embedded space; (a)–(c) temporal evolution along the three latent components; (d) projection in the 3D latent space at four different time steps t/T 0: ◯ ≈ 4, △ ≈ 7, ◽ ≈ 15, and ◇ ≈ 19.

FIG. 18. Velocity fields predicted by the AE combined with a CNN in the latent space alongside the original plots after (a) 4 cycles, (b) 7.6 cycles, (c) 14.8 cycles, and (d) 19.3 cycles.
The use of the AE, for extracting and compacting the information to a low-dimensional representation, in combination with the CNN for the temporal prediction in the latent space, can be claimed to provide an effective method for predicting the flow. However, while the general spatial features are well predicted, the precise temporal evolution suffers from the phase lag discussed above. The accuracy of the CNN can probably be improved by tuning its architecture and the training parameters.

To conclude, the implicit assumption behind dimensionality reduction is the existence of relationships between features of high-dimensional data, i.e., data points lying on, or being close to, a smooth manifold in the high-dimensional input space. By identifying dependencies between data in a high-dimensional space, the manifolds on which the data live are learned by the AE and then leveraged for computing a low-dimensional embedding of the underlying manifold. Input data are represented by the AE in a more compact coordinate system in which manifolds are "unfolded." As only essential information is conserved and represented in lower-space dimensions, the dynamics of the latent variables can be more easily discovered and then leveraged to cost-effectively predict the flow. In the present work, the CNN provides the evolution of the dynamical system in the low-dimensional coordinate system. However, other data-driven methods can also be used, for example, long short-term memory networks27 (LSTMs), which are capable of learning long-term dependencies, connecting previous time-delayed information to the current time step. Several recent works23,30,44 combining the AE with LSTMs show this combination to be an excellent approach for flow prediction. By combining an AE with another data-driven method, SINDy (the sparse-identification method introduced by Brunton et al.8), Champion et al.11 were able to unveil governing equations from raw data.

FIG. 19. Measure of the error between the predicted snapshot and the ground-truth field: (a) time evolution of the prediction error and (b) error between the reconstructed field at t/T 0 ≈ 16.71 and time-shifted snapshots.

VII. SPARSE RECONSTRUCTION

Gathering high-resolution data, in space and time, requires a significant amount of resources. In recent years, several methods have been developed for reconstructing "high-quality" data from contaminated and/or under-sampled data. Particle Image Velocimetry (PIV) is an example that illustrates resolution limitations when seeking high spatio-temporal resolution. An interesting study carried out by Rabault et al.56 shows how a simple ANN can carry out an end-to-end PIV, and it could become a convincing alternative to the traditional PIV algorithm.

FIG. 20. Illustration of the process of sparse reconstruction using the AE and CNN.

The AE method offers a new way for reconstructing high-resolution fields from data collected from a few sensors. The sparse-reconstruction process is shown schematically in Fig. 20. In the present application, this approach is exemplified by data collected with sensors at three locations relatively far behind the cylinder, as shown in Fig. 21. The first step is a two-stage training process. In the first stage, the AE learns to compress information in a low-dimensional space, and in the second stage, the CNN learns how to predict the latent-space vector by taking the sensed data as input. Once both the AE and the CNN are trained, data collected by the sensors can be fed into the CNN and converted to data "intelligible" by the decoder, which can then reconstruct the full-volume data.

FIG. 21. Flow visualization with the location of the sensors used for the sparse reconstruction, represented by the red points.
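A minimal sketch of the second training stage and of the reconstruction step is given below. It assumes a trained decoder, encoded snapshots z of shape (n_snapshots, n_latent), and sensor time series s of shape (n_snapshots, n_sensors); a small fully connected network stands in here for the CNN of Fig. 20, and all names and sizes are illustrative.

```python
# Stage 2 of the sparse-reconstruction training: learn a map from the sensor
# readings to the latent-space vector. At inference time, sensor data are
# converted to a latent vector and passed to the decoder to rebuild the
# full velocity field.
import tensorflow as tf

n_sensors, n_latent = 3, 3                        # illustrative sizes

sensor_to_latent = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="elu", input_shape=(n_sensors,)),
    tf.keras.layers.Dense(32, activation="elu"),
    tf.keras.layers.Dense(n_latent),              # predicted latent-space vector
])
sensor_to_latent.compile(optimizer="adam", loss="mse")
# Training: sensor_to_latent.fit(s, z, epochs=500, batch_size=32)

def reconstruct(s_new, decoder):
    """Sensors -> latent vector -> decoded full-volume field."""
    z_hat = sensor_to_latent.predict(s_new, verbose=0)
    return decoder.predict(z_hat, verbose=0)
```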
Figure 22(a) shows three velocity fields: the left-hand side plot represents the raw field taken for the last time step available, i.e., more than 30 cycles after the last training-set snapshot; the field in the middle is obtained by feeding the raw field into the AE only; and the right-hand side plot shows the velocity field created by the CNN and the decoder from the sensor measurements. The reconstructed field is seen to be close to the raw field, demonstrating the CNN's capacity to provide an accurate prediction in the latent space.

The error Φ2 between the raw field and the sparse reconstruction is indicated by the blue line in Fig. 22(b). The average error is below 1%, which is comparable to the error obtained with the AE-reconstructed field in the training subset, represented by the red line.

FIG. 22. Visualization of the sparse-reconstruction accuracy: (a) velocity fields for the vortex-shedding problem; from left to right: raw field, AE reconstruction, and AE/CNN sparse reconstruction; (b) error estimate between the reconstructed and original fields, defined by Φ2 [see Eq. (6)]; red line, field reconstructed with the AE; blue line, sparse reconstruction with AE/CNN/sensors.

For the present periodic dynamical system, the AE may also be an effective means for recovering high-frequency information. For the present flow configuration, the sampling frequency is around 11/T 0, where T 0 is the period of one shedding cycle. The low-dimensional representation in the latent space constitutes a loop formed by the colored circles in Fig. 11, for which a full revolution is completed over one shedding cycle. As more and more volume snapshots are projected into the latent space, points add up successively (represented by the tiny black dots in Fig. 11), reducing the gap between them and ultimately forming a continuous line; this gap can be associated with the time delay between two snapshots. Thus, as the time gap between successive points in the latent space reduces, the high-frequency resolution of one shedding cycle can be recovered by successively decoding all the points forming the loop.
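One possible realization of this idea, given here only as an assumption about how the accumulated latent points could be exploited (the study does not prescribe a specific procedure), is to order the points by their phase on the latent loop and decode them in that order.

```python
# Assumed procedure: order the accumulated latent points by their phase on
# the latent loop and decode them in that order, so that many coarsely
# sampled cycles are folded into one finely resolved shedding cycle.
import numpy as np

def decode_one_cycle(z, decoder):
    """z: latent points of shape (n_snapshots, n_latent) accumulated over many cycles."""
    centred = z[:, :2] - z[:, :2].mean(axis=0)        # two components spanning the loop
    phase = np.arctan2(centred[:, 1], centred[:, 0])  # phase angle on the loop
    order = np.argsort(phase)
    return decoder.predict(z[order], verbose=0)       # one densely sampled cycle
```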


VIII. CONCLUSIONS

Present-day experimental and numerical-simulation techniques provide an unprecedented volume of extremely detailed data on many practically relevant flows. As a consequence, the number and the size of databases have experienced phenomenal growth, and this abundance of data poses serious challenges for data mining and post-processing, which require appropriate tools. This paper has presented some recent developments in machine learning that are beginning to contribute meaningfully to a solution of these challenges.
Deep learning methodologies in fluid dynamics form a rapidly
growing field of research, the Auto-Encoder (AE) technology being
an important segment of the ML area. AE architectures are used to
compress and decompress high-dimensional data. By reducing the
dimensionality of the data and conserving only the most essential
information, AE systems offer a new path to deriving a representa-
tion of the flows that is more compact and efficient than the native
data sets in the original domains. Unlike the principal component
analysis algorithm, the AE is capable of modeling complex non-
linear functions and consequently is more effective for unveiling
manifold structures in the data. The automated extraction of flow
features by unsupervised learning algorithms, such as the AE com-
bined with clustering, offers innovative methods for flow modeling
and control using low-dimensional dynamical models.
The main messages of this study are as follows:

1. A probabilistic low-dimensional dynamical model can be built by applying a clustering algorithm in the latent space, allowing for a probabilistic flow prediction at low cost, which can also provide insight into flow-control strategies to be implemented.
2. An accurate prediction of the flow evolution can be performed by using a second neural network in the latent space.
3. High-resolution spatio-temporal information can be retrieved, using a second neural network as a transfer function from sensors to the corresponding data representation in the latent space, and by accumulating snapshots.

By choosing a relatively simple test case, this study aims to show that ML algorithms do not necessarily have to be "black boxes." They can be used for extracting the physical principles and structures underlying the data to obtain interpretable models, as shown by other studies undertaken in a broad range of scientific fields (biomolecular dynamics,62 astronomy,28 economic and social sciences,19 biology,12 and quantum mechanics,58 among others). For more complex flows, the task of identifying interpretable and robust models is more challenging. This will require more sophisticated AE architectures, such as a variational autoencoder, and, more importantly, ML algorithms will have to be physics-informed, for example, by embedding the mass-conservation law (see Ref. 46) directly into the AE's architecture. AEs are more challenging to implement and use than other data-driven methods (such as POD and DMD). However, AEs combined with other ML algorithms hold great promise for mapping high-dimensional, strongly non-linear problems into low-dimensional (approximately) linear ones. A low-dimensional dynamical system can then be leveraged to predict the flow and to design control laws more effectively.

ACKNOWLEDGMENTS

The author would like to thank Michael Leschziner for pragmatic and fruitful discussions and is deeply grateful for his support
in allowing him to conduct this work under favorable working conditions. The author is also grateful for the insightful comments offered by Laurent Cordier, Nicholas Hutchins, and the anonymous peer reviewers. The generosity and expertise of one and all have improved this study in innumerable ways.

DATA AVAILABILITY

Data and codes that support the findings of this study are available on Github: github.com/LionelAgo/Vortex_AE. All codes used to obtain the results in this paper were written using the COLAB platform with version 14 of the Tensorflow library.1

REFERENCES

1. M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng, "Tensorflow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (USENIX Association, Savannah, GA, 2016), pp. 265–283.
2. S. Bagheri, L. Brandt, and D. S. Henningson, "Input-output analysis, model reduction and control of the flat-plate boundary layer," J. Fluid Mech. 620, 263–298 (2009).
3. P. Baldi and K. Hornik, "Neural networks and principal component analysis: Learning from examples without local minima," Neural Networks 2(1), 53–58 (1989).
4. N. Benard, J. Pons-Prats, J. Periaux, G. Bugeda, P. Braud, J. P. Bonnet, and E. Moreau, "Turbulent separated shear flow control by surface plasma actuator: Experimental optimization by genetic algorithm approach," Exp. Fluids 57(2), 22 (2016).
5. H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biol. Cybernetics 59(4-5), 291–294 (1988).
6. M. P. Brenner, J. D. Eldredge, and J. B. Freund, "Perspective on machine learning for advancing fluid mechanics," Phys. Rev. Fluids 4(10), 100501 (2019).
7. S. L. Brunton, B. R. Noack, and P. Koumoutsakos, "Machine learning for fluid mechanics," Annu. Rev. Fluid Mech. 52, 477 (2019).
8. S. L. Brunton, J. L. Proctor, and J. N. Kutz, "Discovering governing equations from data by sparse identification of nonlinear dynamical systems," Proc. Natl. Acad. Sci. U. S. A. 113(15), 3932–3937 (2016).
9. A. E. Bryson, Applied Optimal Control: Optimization, Estimation and Control (CRC Press, 1975).
10. S. Cai, S. Zhou, C. Xu, and Q. Gao, "Dense motion estimation of particle images via a convolutional neural network," Exp. Fluids 60(4), 73 (2019).
11. K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton, "Data-driven discovery of coordinates and governing equations," Proc. Natl. Acad. Sci. U. S. A. 116(45), 22445–22451 (2019).
12. D. Chicco, P. Sadowski, and P. Baldi, "Deep autoencoder neural networks for gene ontology annotation predictions," in Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM, 2014), pp. 533–540.
13. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," arXiv:1511.07289 (2015).
14. M. Couplet, C. Basdevant, and P. Sagaut, "Calibrated reduced-order POD-Galerkin system for fluid flow modelling," J. Comput. Phys. 207(1), 192–220 (2005).
15. P. A. Davidson, Turbulence: An Introduction for Scientists and Engineers (Oxford University Press, 2003).
16. J. Derber and A. Rosati, "A global oceanic data assimilation system," J. Phys. Oceanogr. 19(9), 1333–1347 (1989).
17. T. Duriez, S. L. Brunton, and B. R. Noack, Machine Learning Control-Taming Nonlinear Dynamics and Turbulence (Springer, 2017).
18. B. F. Farrell and P. J. Ioannou, "Stochastic forcing of the linearized Navier-Stokes equations," Phys. Fluids A 5(11), 2600–2609 (1993).
19. I. Foster, R. Ghani, R. S. Jarmin, F. Kreuter, and J. Lane, Big Data and Social Science: A Practical Guide to Methods and Tools (CRC Press, 2016).
20. P. Garnier, J. Viquerat, J. Rabault, A. Larcher, A. Kuhnle, and E. Hachem, "A review on deep reinforcement learning for fluid mechanics," arXiv:1908.04127 (2019).
21. N. Gautier, J.-L. Aider, T. Duriez, B. R. Noack, M. Segond, and M. Abel, "Closed-loop separation control using machine learning," J. Fluid Mech. 770, 442–457 (2015).
22. X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS, 2010), pp. 249–256.
23. F. J. Gonzalez and M. Balajewicz, "Deep convolutional recurrent autoencoders for learning low-dimensional feature dynamics of fluid systems," arXiv:1808.01346 (2018).
24. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
25. P. Hall and S. Sherwin, "Streamwise vortices in shear flows: Harbingers of transition and the skeleton of coherent structures," J. Fluid Mech. 661, 178–205 (2010).
26. R. Hecht-Nielsen, "Kolmogorov's mapping neural network existence theorem," in Proceedings of the International Conference on Neural Networks (IEEE Press, New York, 1987), Vol. 3, pp. 11–14.
27. S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput. 9(8), 1735–1780 (1997).
28. S. Jamal and J. S. Bloom, "On neural architectures for astronomical time-series classification," arXiv:2003.08618 (2020).
29. E. Kaiser, B. R. Noack, L. Cordier, A. Spohn, M. Segond, M. Abel, G. Daviller, J. Östh, S. Krajnović, and R. K. Niven, "Cluster-based reduced-order modelling of a mixing layer," J. Fluid Mech. 754, 365–414 (2014).
30. R. King, O. Hennigh, A. Mohan, and M. Chertkov, "From deep to physics-informed learning of turbulence: Diagnostics," arXiv:1810.07785 (2018).
31. A. N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition," in Doklady Akademii Nauk (Russian Academy of Sciences, 1957), Vol. 114, pp. 953–956.
32. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Commun. ACM 60(6), 84–90 (2017).
33. J. N. Kutz, "Deep learning in fluid dynamics," J. Fluid Mech. 814, 1–4 (2017).
34. Y. Le Cun, "Learning process in an asymmetric threshold network," in Disordered Systems and Biological Organization (Springer, 1986), pp. 233–240.
35. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature 521(7553), 436–444 (2015).
36. Y. LeCun et al., "Generalization and network design strategies," Connectionism in Perspective (Citeseer, 1989), Vol. 19.
37. Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, "A theoretical framework for back-propagation," in Proceedings of the 1988 Connectionist Models Summer School (Morgan Kaufmann, CMU, Pittsburgh, PA, 1988), Vol. 1, pp. 21–28.
38. Y. Lee, H. Yang, and Z. Yin, "PIV-DCNN: Cascaded deep convolutional neural networks for particle image velocimetry," Exp. Fluids 58(12), 171 (2017).
39. R. Li, B. R. Noack, L. Cordier, J. Borée, and F. Harambat, "Drag reduction of a car model by linear genetic programming control," Exp. Fluids 58(8), 103 (2017).
40. R. Li, B. R. Noack, L. Cordier, J. Borée, E. Kaiser, and F. Harambat, "Linear genetic programming control for strongly nonlinear dynamics with frequency crosstalk," Arch. Mech. 70(6), 505–534 (2018).
41. M. Luhar, A. S. Sharma, and B. J. McKeon, "Opposition control within the resolvent analysis framework," J. Fluid Mech. 749, 597–626 (2014).
42. J. Lumley, "The structure of inhomogeneous turbulent flows," in Army Turbulence Radio Wave Propagation, edited by A. M. Yaglom and V. I. Tatarsky (Nauka, 1967), pp. 166–178.
43. B. Lusch, J. N. Kutz, and S. L. Brunton, "Deep learning for universal linear embeddings of nonlinear dynamics," Nat. Commun. 9(1), 4950 (2018).
44. R. Maulik, B. Lusch, and P. Balaprakash, "Reduced-order modeling of advection-dominated systems with recurrent neural networks and convolutional autoencoders," arXiv:2002.00470 (2020).
45. M. Milano and P. Koumoutsakos, "Neural network modeling for near wall turbulent flow," J. Comput. Phys. 182(1), 1–26 (2002).
46. A. T. Mohan, N. Lubbers, D. Livescu, and M. Chertkov, "Embedding hard physical constraints in neural network coarse-graining of 3d turbulence," arXiv:2002.00021 (2020).
47. A. G. Nair, C.-A. Yeh, E. Kaiser, B. R. Noack, S. L. Brunton, and K. Taira, "Cluster-based feedback control of turbulent post-stall separated flows," J. Fluid Mech. 875, 345–375 (2019).
48. V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning, ICML'10 (Omnipress, Madison, WI, USA, 2010), pp. 807–814.
49. See https://www.oreilly.com/content/a-look-at-deep-learning-for-science/ for information about deep learning technologies applied to science.
50. See https://palabos.unige.ch/lattice-boltzmann/lattice-boltzmann-sample-codes-various-other-programming-languages for information about the code.
51. S. E. Otto and C. W. Rowley, "Linearly recurrent autoencoder networks for learning dynamics," SIAM J. Appl. Dyn. Syst. 18(1), 558–593 (2019).
52. V. Parezanović, J.-C. Laurentie, C. Fourment, J. Delville, J.-P. Bonnet, A. Spohn, T. Duriez, L. Cordier, B. R. Noack, M. Abel et al., "Mixing layer manipulation experiment," Flow, Turbul. Combust. 94(1), 155–173 (2015).
53. K. Pearson, "LIII. On lines and planes of closest fit to systems of points in space," London, Edinburgh Dublin Philos. Mag. J. Sci. 2(11), 559–572 (1901).
54. E. Plaut, "From principal subspaces to principal components with linear autoencoders," arXiv:1804.10253 (2018).
55. S. B. Pope, Turbulent Flows (IOP Publishing, 2001).
56. J. Rabault, J. Kolaas, and A. Jensen, "Performing particle image velocimetry using artificial neural networks: A proof-of-concept," Meas. Sci. Technol. 28(12), 125301 (2017).
57. J. Rabault, U. Reglade, N. Cerardi, M. Kuchta, and A. Jensen, "Deep reinforcement learning achieves flow control of the 2d Karman vortex street," arXiv:1808.10754 (2018).
58. A. Rocchetto, E. Grant, S. Strelchuk, G. Carleo, and S. Severini, "Learning hard quantum distributions with variational autoencoders," npj Quantum Inf. 4(1), 1–7 (2018).
59. F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain," Psychol. Rev. 65(6), 386 (1958).
60. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature 323(6088), 533–536 (1986).
61. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Dev. 3(3), 210–229 (1959).
62. H. Sidky, W. Chen, and A. L. Ferguson, "Machine learning for collective variable discovery and enhanced sampling in biomolecular simulation," Mol. Phys. 118, e1737742 (2020).
63. R. Sutton and A. Barto, Reinforcement Learning: An Introduction (MIT Press, 2011).
64. H. Tennekes and J. L. Lumley, A First Course in Turbulence (MIT Press, 1972).
65. N. G. Van Kampen, "Determinism and predictability," Synthese 89(2), 273–281 (1991).
66. C. Wehmeyer and F. Noé, "Time-lagged autoencoders: Deep learning of slow collective variables for molecular kinetics," J. Chem. Phys. 148(24), 241703 (2018).
67. P. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. dissertation (Harvard University, 1974).
68. X. Zou, I. M. Navon, and J. Sela, "Control of gravitational oscillations in variational data assimilation," Mon. Weather Rev. 121(1), 272–289 (1993).
