Genki Yagawa
Atsuya Oishi
Computational
Mechanics
with Deep
Learning
An Introduction
Lecture Notes on Numerical Methods
in Engineering and Sciences
Series Editor
Eugenio Oñate, Jordi Girona, 1, Edifici C1 - UPC, Universitat Politècnica de Catalunya, Barcelona, Spain
Editorial Board
Charbel Farhat, Department of Mechanical Engineering, Stanford University,
Stanford, CA, USA
C. A. Felippa, Department of Aerospace Engineering Science, University of
Colorado, College of Engineering & Applied Science, Boulder, CO, USA
Antonio Huerta, Universitat Politècnica de Catalunya, Barcelona, Spain
Thomas J. R. Hughes, Institute for Computational Engineering, University of Texas
at Austin, Austin, TX, USA
Sergio Idelsohn, CIMNE - UPC, Barcelona, Spain
Pierre Ladevèze, Ecole Normale Supérieure de Cachan, Cachan Cedex, France
Wing Kam Liu, Evanston, IL, USA
Xavier Oliver, Campus Nord UPC, International Center of Numerical Methods,
Barcelona, Spain
Manolis Papadrakakis, National Technical University of Athens, Athens, Greece
Jacques Périaux, CIMNE - UPC, Barcelona, Spain
Bernhard Schrefler, Mechanical Sciences, CISM - International Centre for
Mechanical Sciences, Padua, Italy
Genki Yagawa, School of Engineering, University of Tokyo, Tokyo, Japan
Mingwu Yuan, Beijing, China
Francisco Chinesta, Ecole Centrale de Nantes, Nantes Cedex 3, France
This series publishes text books on topics of general interest in the field of
computational engineering sciences.
The books will focus on subjects in which numerical methods play a fundamental
role for solving problems in engineering and applied sciences. Advances in finite
element, finite volume, finite differences, discrete and particle methods and their
applications to classical single discipline fields and new multidisciplinary domains
are examples of the topics covered by the series.
The main intended audience is the first year graduate student. Some books define
the current state of a field to a highly specialised readership; others are accessible to
final year undergraduates, but essentially the emphasis is on accessibility and clarity.
The books will also be useful for practising engineers and scientists interested in
state of the art information on the theory and application of numerical methods.
Genki Yagawa · Atsuya Oishi
Computational Mechanics
with Deep Learning
An Introduction
Genki Yagawa
Professor Emeritus, University of Tokyo and Toyo University
Tokyo, Japan

Atsuya Oishi
Graduate School of Technology, Industrial and Social Sciences, Tokushima University
Tokushima, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Computational Mechanics
Deep Learning
On the other hand, the advances of computers have brought about significant developments in machine learning, which aims at classifying and making decisions by finding inherent rules and trends in large amounts of data algorithmically rather than through human impressions and intuition.
Feedforward neural networks are one of the most popular machine learning algorithms. They have the ability to approximate arbitrary continuous functions and have been applied to various fields since the development of the error back propagation learning in 1986. Since the beginning of the 21st century, it has become possible to use many hidden layers, an approach called deep learning. Their areas of application have been further expanded due to the performance improvement achieved by using more hidden layers.
Although the development of computational mechanics including the FEM has made it possible to analyze various complex phenomena, there still remain many problems that are difficult to deal with. Specifically, numerical solution methods such as the FEM are built on mathematical equations (partial differential equations), so they are useful for finding solutions to partial differential equations under given boundary and initial conditions. However, this is not the case when estimating boundary and initial conditions from the solutions. In fact, the latter situation is often encountered in the design phase of artifacts.
In addition, as deep learning and neural networks can discover mapping relations
between data without explicit mathematical formulas, it is possible to find inverse
mappings only by swapping the input and output. For this reason, deep learning and
neural networks have been accepted in the field of computational mechanics as an
important method to deal with the weak points of conventional numerical methods
such as the FEM.
At first, they were mainly applied to limited areas such as the estimation of constitutive laws of nonlinear materials and non-destructive evaluation, but with the recent
development of deep learning, their applicability has been expanded dramatically. In
other words, a fusion has started between deep learning and computational mechanics
beyond the conventional framework of computational mechanics.
Readership
The authors’ previous book titled Computational Mechanics with Neural Networks, published by Springer in 2021, covers most of the applications of neural networks
and deep learning in computational mechanics from its early days to the present
together with applications of other machine learning methods. Its concise descrip-
tions of individual applications make it suitable for researchers and engineers to get
an overview of this field.
On the other hand, the present book, Computational Mechanics with Deep
Learning: An Introduction, is intended to select carefully some recent applications of
deep learning and to discuss each application in detail, but in an easy-to-understand
manner. Sample programs are included for the readers to try out in practice. This
book is therefore useful not only for researchers and engineers, but also for a wide
range of readers who are interested in this field.
stiffness matrix calculation program, and Sect. 9.2 those in the field of deep learning,
such as the feedforward neural network, both of which are given with background
mathematical formulas.
Chapter 10 presents programs for the application of deep learning to the elemental
integration discussed in Chap. 4. With these programs and those presented in Chap. 9,
the readers of the present book could easily try “Computational Mechanics with Deep
Learning” by themselves.
Contents
Part I Fundamentals
1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Deep Learning: New Way for Problems Unsolvable
by Conventional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Progress of Deep Learning: From McCulloch–Pitts Model
to Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 New Techniques for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.1 Numerical Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.2 Adversarial Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.3.3 Dataset Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.4 Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.5 Batch Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.6 Generative Adversarial Networks . . . . . . . . . . . . . . . . . . . 31
1.3.7 Variational Autoencoder . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3.8 Automatic Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 39
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2 Mathematical Background for Deep Learning . . . . . . . . . . . . . . . . . . . 49
2.1 Feedforward Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3 Training Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.1 Momentum Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.2 AdaGrad and RMSProp . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.3.3 Adam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.4 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.1 What Is Regularization? . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4.2 Weight Decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.4.3 Physics-Informed Network . . . . . . . . . . . . . . . . . . . . . . . . . 72
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Part I
Fundamentals
Chapter 1
Overview
Problem 1 Assume a square plate, its bottom fixed at both ends, and loaded partially at the top (Fig. 1.1a). Let us find the displacements $(u_1, v_1), \ldots, (u_4, v_4)$ at the four points at the top (Fig. 1.1b).
The first solution method, which will be the simplest, is to actually apply a load
to the plate and measure the displacements, which may give the most reliable results
if it is easy to set up the experimental conditions and measure the physical quantity
of interest. This method can be called an experiment-based solution method.
The second method that comes to mind is to calculate the displacements by the
finite element analysis [32]. This problem can be solved by the two-dimensional
finite element stress analysis based on the following three kinds of equations.
Equations of balance of forces in an analysis region:
$$\begin{cases} \dfrac{\partial \sigma_x}{\partial x} + \dfrac{\partial \tau_{xy}}{\partial y} = 0 \\[2mm] \dfrac{\partial \tau_{xy}}{\partial x} + \dfrac{\partial \sigma_y}{\partial y} = 0 \end{cases} \quad \text{in } \Omega \tag{1.1.1}$$
Equation (1.1.1) is solved under the conditions Eqs. (1.1.2) and (1.1.3). Equa-
tion (1.1.2), which describes equilibrium at the load boundary, is called the Neumann
boundary condition, and Eq. (1.1.3), which describes the fixed displacements, the
Dirichlet boundary condition.
Based on the finite element method, Eqs. (1.1.1), (1.1.2) and (1.1.3) are formulated as a set of linear equations as follows [72]:

$$[K]\{U\} = \{F\} \tag{1.1.4}$$
where $[K]$ on the left-hand side is called the coefficient matrix or the global stiffness matrix, $\{U\}$ the vector of nodal displacements, and $\{F\}$ the right-hand side vector
calculated from the nodal equivalent load. The nodal displacements of all the nodes
in the domain to be solved are obtained by solving the simultaneous linear equations,
Eq. (1.1.4). For each of the four points specified in the problem, the displacements
of the point can be directly obtained as the nodal displacements if the point is a node,
or by interpolating the displacements of the surrounding nodes if it is not a node.
This solution method, which is based on the numerical solution of partial differential
equations, is called a computational method based on differential equation or, simply,
equation-based numerical solution method.
Then, we consider the following problem.
Problem 2 Assume the same square plate, its bottom fixed at both ends and loaded at one side of the top as in Problem 1 (Fig. 1.1a). But, as shown in Fig. 1.2, there is a hole inside the plate. Find the displacements $(u_1, v_1), \ldots, (u_4, v_4)$ at the four points at the top (Fig. 1.1b).
The experiment for this problem may be more difficult than for the previous case. Especially if the domain is not a plate but a cube with an embedded void, it will be very time consuming to prepare for the experiment.
On the other hand, the equation-based numerical solution method can solve
Problem 2 without any difficulty by using a mesh divided according to the given
shape. This versatility of the equation-based numerical solution methods such as
the finite element method is a great advantage over the experiment-based solution
methods.
Supported by this advantage, it has become possible for numerical methods to
deal with almost all kinds of applied mechanics problems. Nowadays, the methods
are taken as the first choice for solving various problems.
However, it is clear that even the equation-based numerical solution method is
not a panacea if we consider the following problem.
Problem 3 Assume a square plate, its bottom fixed at both ends, loaded at one side of the top (Fig. 1.1a), and with a round hole inside the plate (Fig. 1.2). The displacements at the four points at the top are measured as $(u_1, v_1), \ldots, (u_4, v_4)$. Then, find the shape and position of the hole.
Solving the direct problem with the equation-based numerical method is equivalent to finding this kind of mapping.
On the other hand, the mapping relation for solving Problem 3 is expressed as
$$h : \left\{ \begin{array}{l} \text{Dirichlet boundary condition} \\ \text{Neumann boundary condition} \\ \{u_1, v_1, \ldots, u_4, v_4\}^T \end{array} \right\} \to \text{Hole parameters} \tag{1.1.6}$$
Solving the inverse problem is equivalent to finding the above kind of mapping. In this case, it is to find the mapping from the displacements, which would usually be the results obtained by solving the governing equations, to the hole parameters (shape and position of the hole), which are conditions usually given as input when solving the governing equations [41].
It is clear that the inverse problem is much more difficult to handle than the direct problem, where the solution can be achieved directly through the routine operation of solving equations. It is noted that an inverse problem such as Problem 3 is a type of problem that we encounter often when we design an artifact, asking “How can we satisfy this condition?” This means that solving inverse problems efficiently is one of the most important issues in applied mechanics.
Now, omitting the parameters used in Eq. (1.1.5), we have
$$g : \text{HoleParams} \to \left\{ u_1, v_1, \ldots, u_4, v_4 \right\}^T \quad \text{or} \quad \left\{ u_1, v_1, \ldots, u_4, v_4 \right\}^T = g(\text{HoleParams}) \tag{1.1.7}$$
Then, let us find the $H()$, among all the admissible candidates, that minimizes

$$L = \sum_{i=1}^{N} \sum_{j=1}^{n} \left( p_j(i) - p_j^H(i) \right)^2 \tag{1.1.11}$$
Problem 4 Assume a square plate with its bottom fixed at both ends, loaded at one side of the top (Fig. 1.1a), and the displacements at the four points on the top being measured as $(u_1, v_1), \ldots, (u_4, v_4)$. Then, find the shape and position of the hole.
Then, among the broad range of admissible G()s, we find the G() that minimizes
the equation as follows:
$$\sum_{i=1}^{N} \sum_{j=1}^{4} \left\{ \left( u_j(i) - u_j^G(i) \right)^2 + \left( v_j(i) - v_j^G(i) \right)^2 \right\} \to \min \tag{1.1.14}$$
In this section, the development of deep learning and its predecessor, feedforward
neural networks, is studied.
First, let us review a feedforward neural network, the predecessor of deep learning,
which is a network consisting of layers of units with connections between units in
adjacent layers. A unit performs multiple-input, single-output nonlinear transforma-
tion, similarly to a biological neuron (Fig. 1.5). In a feedforward neural network with
n layers, the first layer is called the input layer, the second to (n − 1)th layers the
intermediate or hidden layers, and the nth layer the output layer. Figure 1.6 shows
the structure of a feedforward neural network. The signal input to the input layer
is sequentially passed through the hidden layers and becomes the output signal at
the output layer. Here, the input signal undergoes a nonlinear transformation in each
layer. A feedforward neural network is considered “deep” if it has five or more nonlinear transformation layers [43].
A brief chronology of feedforward neural networks and deep learning is shown
as follows:
1943 McCulloch–Pitts model [45]
1958 Perceptron [56]
1967 Stochastic gradient descent [1]
1969 Perceptrons [47]
The McCulloch–Pitts model computes the output $O$ of a neuron from the inputs $I_i$, weights $w_i$, and bias $\theta$ as

$$O = f(u) = f\left( \sum_{i=1}^{n} w_i I_i + \theta \right) \tag{1.2.1}$$
In this model, the output of the neuron is binary (0 or 1), and the Heaviside function
is used as the activation function as
$$O = f(u) = \begin{cases} 1 & (u \ge 0) \\ 0 & (u < 0) \end{cases} \tag{1.2.2}$$
where w0 corresponds to θ in Eq. (1.2.1). Then, the classification rule with the
perceptron is written as
$$\begin{cases} x_i \in C_1 & \left( f\left(w^T x_i\right) \ge 0 \right) \\ x_i \in C_2 & \left( f\left(w^T x_i\right) < 0 \right) \end{cases} \tag{1.2.4}$$
$$w^{(k+1)} = w^{(k)} + \alpha x_i \quad \left( f\left(w^{(k)T} x_i\right) < 0 \right) \tag{1.2.6}$$
If $f\left(w^{(k)T} x_i\right) < 0$ holds, we have

$$f\left(w^{(k+1)T} x_i\right) = f\left(\left(w^{(k)} + \alpha x_i\right)^T x_i\right) = f\left(w^{(k)T} x_i + \alpha |x_i|^2\right) \tag{1.2.7}$$

Equation (1.2.7) suggests that the weights are updated so that the value $f\left(w^{(k+1)T} x_i\right)$ approaches positive.
Iteratively applying this learning rule to all the input data, weights $w$ that can correctly classify all input data are determined. This learning rule was proven to converge in a finite number of iterations, a result called the perceptron convergence theorem [57]. The perceptron attracted a great deal of attention, and the first boom of neural networks occurred with it.
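As an illustration, the following is a minimal Python sketch of the perceptron learning rule (the function name and toy data are hypothetical; class $C_1$ is encoded as +1 and $C_2$ as −1 so that both update directions collapse into one line):

```python
import numpy as np

def train_perceptron(X, labels, alpha=0.1, max_epochs=100):
    """Perceptron learning rule. X includes a leading column of ones
    so that w[0] plays the role of the bias (theta in Eq. (1.2.1))."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        updated = False
        for x, c in zip(X, labels):     # c = +1 for C1, -1 for C2
            if c * (w @ x) <= 0:        # misclassified pattern
                w += alpha * c * x      # update in the spirit of Eq. (1.2.6)
                updated = True
        if not updated:                 # converged: all patterns classified
            break
    return w

# Linearly separable toy data (logical AND); first column is the constant 1
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
labels = np.array([-1, -1, -1, 1])
print(train_perceptron(X, labels))
```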
In 1969, however, the limitations of the perceptron were theoretically demonstrated [47]: it was proven that a simple single-layer perceptron can be applied only to linearly separable problems (Fig. 1.9), which cast doubt on its applicability to practical classification problems. The hope for the perceptron dropped drastically, and the first neural network boom calmed down.
The weakness of the perceptron, that it was only effective for linearly separable problems, was overcome by making it multilayered, which in turn created a demand for a suitable learning algorithm.
Fig. 1.9 Linearly separable and inseparable data: (a) linearly separable, (b) linearly inseparable
In 1986, the back propagation algorithm was introduced as a new learning algorithm for multilayer feedforward neural networks such as the one shown in Fig. 1.6 [58]. It is based on the steepest descent method: the connection weights between units are modified in the direction that decreases the error, defined as the square of the difference between the output of the output layer units and the corresponding teacher data as follows:
$$E = \frac{1}{2} \sum_{p=1}^{n_P} \sum_{j=1}^{n_L} \left( {}^p O_j^L - {}^p T_j \right)^2 \tag{1.2.8}$$

where
${}^p O_j^L$ : the output of the $j$th unit in the $L$th layer (output layer) for the $p$th training pattern,
${}^p T_j$ : the teacher signal corresponding to the output of the $j$th unit in the output layer for the $p$th training pattern,
$n_P$ : the total number of training patterns,
$n_L$ : the total number of output units.
Let $w_{ji}^{(k)}$ be the connection weight between the $i$th unit of the $k$th layer and the $j$th unit of the $(k+1)$th layer in Fig. 1.6. The back propagation algorithm successively modifies $w_{ji}^{(k)}$ as follows:

$$w_{ji}^{(k)} \leftarrow w_{ji}^{(k)} - \alpha \frac{\partial E}{\partial w_{ji}^{(k)}} \tag{1.2.9}$$
The sigmoid function, commonly employed as the activation function, is given by

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1.2.10}$$
Figure 1.10 shows the Heaviside function and the sigmoid function. It is seen that
the latter is a smoothed version of the former. (see Sect. 2.1 for details of the back
propagation algorithm.)
In 1989, it was shown that feedforward neural networks can approximate arbitrary continuous functions [19, 29]. However, this theoretical proof is a kind of existence theorem, and provides little answer to important practical questions such as how large a neural network should be (number of layers, number of units in each layer, etc.), which training parameters should be used, and how many training cycles are required for convergence. Accordingly, such meta-parameters are usually determined by trial and error.
With the advent of the back propagation algorithm in 1986, multilayer feedfor-
ward neural networks were put to practical use, and the application range of neural
networks was greatly expanded, resulting in the second neural network boom. It should be noted that, almost twenty years before the advent of the back propagation algorithm, the prototype of the algorithm had been proposed [1], but its importance was not widely recognized at that time.
After a while, the second neural network boom that had started with the advent
of the back propagation algorithm gradually calmed down. This was due to the fact
Fig. 1.11 History of the fastest supercomputers (Cray-1, Cray-2, SX-3, ASCI Red, Earth Simulator, Roadrunner): computation speed in GFLOPS versus year, 1970–2020
that when the scale of a feedforward neural network was increased to improve its
function and performance, the learning process became too slow or often did not
proceed at all. There were two main reasons for this: one was the speed of the computer, and the other the vanishing gradient problem.
Let us consider first the speed of computers. Figure 1.11 shows the history of the fastest supercomputers, where the vertical axis is the computation speed, defined by the number of floating-point operations per second (FLOPS: Floating-point Operations Per Second). The unit used here is Giga FLOPS ($10^9$ FLOPS).
It is seen from the figure that in 1986, when the back propagation algorithm was introduced, the speed of supercomputers was about 2 GFLOPS; it was 220 GFLOPS in 1996, 280 TFLOPS (TeraFLOPS: $10^{12}$ FLOPS) in 2006, and 415 PFLOPS (PetaFLOPS: $10^{15}$ FLOPS) in 2021.
A simple calculation suggests that the training time of a feedforward neural
network, which takes only one minute on a current computer (415 PFLOPS), took
1482 min (about one day) on a computer (280 TFLOPS) in 2006, 1,886,364 min
(about three and a half years) on a computer (220 GFLOPS) in 1996, and
207,500,000 min (about 400 years) on a computer (2 GFLOPS) in 1986. In reality, this calculation is not necessarily accurate because of the effects of parallel processing and other factors, but it still shows the speed of progress in computing, in other words the slowness of older computers, suggesting that it was necessary to wait for the progress of computers in order to apply the back propagation algorithm to relatively large neural networks.
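The scaling quoted above is simple arithmetic; the short sketch below (using the machine speeds as given in the text) just divides the 2021 speed by each earlier one:

```python
# Minutes on older hardware per one minute of work on the 2021 machine,
# using the supercomputer speeds (in GFLOPS) quoted in the text.
speeds_gflops = {1986: 2.0, 1996: 220.0, 2006: 280e3, 2021: 415e6}
today = speeds_gflops[2021]
for year, s in sorted(speeds_gflops.items()):
    minutes = today / s
    print(f"{year}: {minutes:>15,.0f} min (~{minutes / (60 * 24 * 365):.2f} years)")
```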
As discussed above, the calculation speed of computers has been a big issue for the back propagation algorithm. In addition, another barrier to the application of large-scale multilayer feedforward neural networks to practical problems is the vanishing gradient problem. This problem arises in multilayer neural networks,
where learning does not proceed in layers far away from the output layer, preventing
performance improvement by increasing the number of hidden layers. The cause of
the vanishing gradient is that the amount of correction by the back propagation algorithm,

$$\Delta w_{ji}^{(k)} = -\alpha \frac{\partial E}{\partial w_{ji}^{(k)}} \tag{1.2.11}$$
becomes small in the deeper layers (layers close to the input layer) due to the
small derivative of the sigmoid function that was most commonly employed as the
activation function. (see Sect. 2.1 for details.)
Because of these issues, feedforward neural networks, while having the back
propagation learning algorithm and the versatility of being able to simulate arbitrary
nonlinear continuous functions, were “applied only to problems that a relatively
small network could handle.”
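The mechanism is easy to observe numerically: the derivative of the sigmoid of Eq. (1.2.10) is at most 0.25, so the factors multiplied along the chain rule shrink geometrically with the number of layers. A minimal sketch (illustrative only, with zero pre-activations assumed for simplicity):

```python
import numpy as np

def dsigmoid(x):
    # Derivative of the sigmoid f(x) = 1 / (1 + exp(-x)); its maximum is 0.25 at x = 0
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

grad = 1.0
for layer in range(1, 11):
    grad *= dsigmoid(0.0)   # one factor f'(u) per layer traversed backward
    print(f"after {layer:2d} layers: gradient factor = {grad:.3e}")
```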
The serious situation described above changed in 2006, when methods to avoid the vanishing gradient problem by layer-by-layer pretraining [4, 28] were proposed, making it possible to train multilayer feedforward neural networks without this issue.
Here, we discuss how the autoencoder is used to pretrain multilayer feedforward neural networks. The structure of an autoencoder is shown in Fig. 1.12: it is a feedforward neural network with one hidden layer, where the number of units in the input layer is the same as the number of units in the output layer. The autoencoder
is trained to output the same data as the input data by the error back propagation
learning using the input data as the teacher data. After the training is completed,
the autoencoder simply outputs the input data, which seems to be a meaningless
operation, but in fact it corresponds to the conversion of the input data into a different
representation format in the hidden layer. For example, if the number of hidden layer
units is less than the number of input layer units, a compressed representation of the
input data is obtained.
(1) First, the autoencoder A is trained using the input data of the original five-layer
neural network. After the training is completed, the connection weights between
the input and the hidden layers of the autoencoder A are set to the initial values
of the connection weights between the first and second layers of the original
five-layer neural network.
(2) Then, the autoencoder B is trained, where the output of the hidden layer of
the autoencoder A is used as the input data for training. After the training is
completed, the connection weights between the input and the hidden layers of
the autoencoder B are set to the initial values of the connection weights between
the second and third layers of the original five-layer neural network.
Fig. 1.13 Pretraining of a five-layer feedforward neural network with autoencoders A, B, and C
(3) Third, the autoencoder C is trained, where the output of the hidden layer of
the autoencoder B is used as the input data for training. After the training is
completed, the connection weights between the input and the hidden layers of
the autoencoder C are set to the initial values of the connection weights between
the third and fourth layers of the original five-layer neural network.
(4) Finally, after initializing the connection weights between each layer with the
values obtained by the autoencoders in (1), (2) and (3) above, the error back
propagation learning of the five-layer feedforward neural network is performed
using the original input and the teacher data.
Thus, one can solve the vanishing gradient problem by setting the initial values of
the connection weights between layers starting from those closest to the input layer
with autoencoders.
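Steps (1)–(4) above can be sketched in Keras-style Python as follows; the layer sizes, epochs, and data are assumed placeholders rather than the settings of an actual application:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

sizes = [784, 256, 64, 256, 784]            # assumed five-layer network
x = tf.random.uniform((1000, sizes[0]))     # stand-in for the original input data

pretrained, data = [], x
for n_in, n_hid in zip(sizes[:-1], sizes[1:-1]):    # autoencoders A, B, C
    ae = models.Sequential([
        layers.Input((n_in,)),
        layers.Dense(n_hid, activation="sigmoid", name="enc"),
        layers.Dense(n_in, activation="sigmoid"),
    ])
    ae.compile(optimizer="adam", loss="mse")
    ae.fit(data, data, epochs=5, verbose=0)          # input used as teacher data
    pretrained.append(ae.get_layer("enc").get_weights())
    data = ae.get_layer("enc")(data).numpy()         # hidden output feeds the next AE

# Step (4): initialize the five-layer network with the pretrained weights,
# then fine-tune by ordinary back propagation with the true teacher data.
net = models.Sequential([layers.Input((sizes[0],))] +
                        [layers.Dense(n, activation="sigmoid") for n in sizes[1:]])
for dense, w in zip(net.layers[:3], pretrained):
    dense.set_weights(w)
```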
Another factor that made deeply multilayered feedforward neural networks possible is the significant improvement in computer performance, which means that learning can now be completed in a practical computing time. As a result, the restrictions on the construction of feedforward neural networks have been relaxed, and the scale of a neural network can be increased according to the complexity of the practical problem. In addition, in 2007, CUDA [39], a language for using graphics processing units (GPUs) for numerical computation, was introduced; GPUs have since become widely used as accelerators for training and inference of feedforward neural networks, further improving computer performance.
With the development of the pretraining method and the significant improve-
ment of computer performance above, the third neural network boom has started
with the emergence of a multilayer large-scale neural network, the so-called deep
learning. The necessity of pretraining is, however, decreasing due to improvements
of activation functions, training methods and computer performance.
The success of deep learning is also owing to the development of convolutional neural networks, in which the units of each layer are arranged in a two-dimensional grid. In other words, whereas in a conventional feedforward neural network all units in adjacent layers are connected to each other, in a convolutional neural network a unit in a layer is connected to only some units in the preceding layer. Figure 1.14 shows
the structure and function of a convolutional neural network. The input to the $(k, l)$th unit in the $p$th layer, $U_{k,l}^p$, is given using the outputs $O_{i,j}^{p-1}$ of units in the $(p-1)$th layer as follows:

$$U_{k,l}^{p} = \sum_{s=0}^{S-1} \sum_{t=0}^{T-1} h_{s,t}^{p-1} \cdot O_{k+s,l+t}^{p-1} + \theta_{k,l}^{p} \tag{1.2.12}$$

where $\theta_{k,l}^p$ is the bias of the $(k, l)$th unit in the $p$th layer, and $h_{s,t}^{p-1}$ the weight at $(s, t)$ in the $(p-1)$th layer, which, unlike the weights in a fully connected feedforward neural network, is identical between units in the same layer. $S$ and $T$ define the range of contributions to the input; Fig. 1.14 shows the case of $S = T = 3$.
The weights $h_{s,t}$ can be expressed in matrix form as in Fig. 1.15 for the case of $S = T = 3$. The operation of Eq. (1.2.12) with $h_{s,t}$ is the same as a filter operation in image processing [21]. Figure 1.16 shows examples of filters used in image processing: Fig. 1.16a is the Laplacian mask used for image sharpening, and Figs. 1.16b and c are both for edge detection, where the direction of the edge to be detected differs between them. Note that the convolution operation represented by Eq. (1.2.12) is similar to feature extraction in image processing, and when the input data is an image, it can be interpreted as an operation to extract the features of the input image. For details on the calculation in the convolutional layer, see Sect. 2.2.
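To make the correspondence with image filtering concrete, the following NumPy sketch applies Eq. (1.2.12) directly, using a common 3 × 3 Laplacian mask (the specific mask values and the test image are assumptions for illustration):

```python
import numpy as np

def conv2d(image, h, theta=0.0):
    # Direct implementation of Eq. (1.2.12): U[k,l] = sum_{s,t} h[s,t] * O[k+s,l+t] + theta
    S, T = h.shape
    K, L = image.shape[0] - S + 1, image.shape[1] - T + 1
    U = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            U[k, l] = np.sum(h * image[k:k + S, l:l + T]) + theta
    return U

laplacian = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)   # a common Laplacian mask
img = np.zeros((6, 6))
img[2:4, 2:4] = 1.0                              # a small bright square
print(conv2d(img, laplacian))                    # strong response at the edges
```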
From a historical point of view, the introduction of locality such as convolutional layers into feedforward neural networks had already been done in the Neocognitron [18], which was inspired by the hierarchical structure of visual information processing [31]. Figure 1.17 shows the structure of the Neocognitron. The prototype of the current convolutional layer was proposed in 1989 [42] and is known to be very useful when images are employed as input. In addition to images, convolutional
Fig. 1.17 Neocognitron. Reprinted from [18] with permission from Springer
neural networks have become widely used for various multidimensional data such
as voice or speech.
The ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [59] is an
image recognition contest using ImageNet, a large image dataset of over 10 million
images. In 2012, deep learning showed dominant performance in ILSVRC [40].
Since then, deep learning has been the best performer, and the winning systems of
the contest since 2012 are given as follows:
2012 AlexNet [40]
2013 ZFNet [69]
2014 GoogLeNet [66]
2015 ResNet [26]
2016 CUImage [70]
2017 SENet [30]
In this section, some of the new and increasingly important techniques in deep learning are discussed.
First, let us study the numerical accuracy required for deep learning.
It is well known that the basic numbers employed in computers are binary, and there are several formats for floating-point real numbers, among which we choose one depending on the necessary precision level [67]. A floating-point real number is represented by a series of binary digits, which consist of a sign part to indicate positive or negative, an exponent part for the order, and a mantissa part for the significant digits, with the total length (number of bits) varying according to the precision of the number. The IEEE754 standard specifies three types of real numbers: a double-precision real number (FP64) with about 16 significant decimal digits, a single-precision real number (FP32) with about 7 decimal digits, and a half-precision real number (FP16) with about 3 decimal digits, which occupy 64, 32, and 16 bits of memory, respectively. In the field of computational mechanics, FP64 is usually used, and some problems even need quadruple-precision real numbers.
In contrast to the above, it has been shown that deep learning can achieve sufficient accuracy even when using real numbers with relatively low precision [12–14, 25]. Although training in deep learning usually requires higher numerical precision than inference with trained neural networks due to the calculation of derivatives, it has been shown that FP32 and FP16 are sufficient even for training.
For this reason, new low-precision floating-point real number formats are also
being used for deep learning, including BFloat16 (BF16) and Tensor Float 32 (TF32).
The former, proposed by Google, has more exponent bits than FP16, the same number as FP32. Since the number of bits in the mantissa part is reduced, the number of significant digits is also reduced, but the range of numbers that it can express is almost the same as that of FP32. The latter, proposed by Nvidia, has the same number of digits in the exponent part as FP32, and the same number of digits in the mantissa part as FP16. The TF32 format has 19 bits in total, a special format whose length is not a power of 2. The major floating-point number formats are summarized in Fig. 1.18.
List 1.3.1 Test code for pseudo low precision (PLP)

#include <stdio.h>

/* The lower nsh bits of the float mantissa are cleared via integer
   right/left shifts to emulate reduced precision. */
union fi {
    float f;
    int i;
};

int main(void)
{
    union fi d1, d2, d3;
    float f1 = 2.718282f, f2 = 3.141593f;  /* e and pi, cf. List 1.3.2 */
    int nsh;
    for (nsh = 1; nsh <= 22; nsh++) {
        d1.f = f1;
        d1.i = d1.i >> nsh;
        d1.i = d1.i << nsh;
        d2.f = f2;
        d2.i = d2.i >> nsh;
        d2.i = d2.i << nsh;
        d3.f = d1.f * d2.f;
        d3.i = d3.i >> nsh;
        d3.i = d3.i << nsh;
        printf("%2d %f %f %f\n", nsh, d1.f, d2.f, d3.f);
    }
    return 0;
}
List 1.3.2 Results of the PLP test code (CentOS 7.9, gcc 4.8.5)
1 2.718282 3.141593 8.539734
2 2.718282 3.141592 8.539730
3 2.718281 3.141592 8.539726
4 2.718281 3.141590 8.539719
5 2.718277 3.141586 8.539673
6 2.718277 3.141586 8.539673
7 2.718262 3.141571 8.539551
8 2.718262 3.141541 8.539307
9 2.718262 3.141479 8.539062
10 2.718262 3.141357 8.538086
11 2.718262 3.141113 8.537109
12 2.717773 3.140625 8.535156
13 2.716797 3.140625 8.531250
14 2.714844 3.140625 8.515625
15 2.710938 3.140625 8.500000
16 2.703125 3.140625 8.437500
17 2.687500 3.125000 8.375000
18 2.687500 3.125000 8.250000
19 2.625000 3.125000 8.000000
20 2.500000 3.000000 7.500000
21 2.500000 3.000000 7.000000
22 2.000000 3.000000 6.000000
Deep learning has shown very good performance in image recognition and is said to surpass human discrimination ability in some areas. However, it has been reported that deep learning can misidentify images that are easily identified by humans. Goodfellow et al. [23] superimposed a small noise on an image that should be judged to be a panda, and showed that, although the superimposed image looks almost identical to the original image to the human eye and is easily identified as a panda, the convolutional neural network GoogLeNet [66] judges it to be a gibbon. This kind of input data is called an adversarial example.
Consider the variation $\Delta u_j$ of the input $u_j$ to a unit with weight vector $w_j$ when a noise vector $\Delta x$ is added to the input $x$:

$$\Delta u_j = w_j^T \Delta x \tag{1.3.2}$$
Equation (1.3.2) shows that $\Delta u_j$ is the inner product of $w_j$ and $\Delta x$, and therefore the variation $\Delta u_j$ takes its maximum value when $\Delta x = k w_j$; that is, among various noises of similar magnitude, the variation of the input to a unit, and hence of the output of the unit, becomes largest for a noise vector $\Delta x$ with the specific direction parallel to $w_j$.
For a well-trained multilayer feedforward neural network, we can also make the output fluctuate greatly with small fluctuations of the input as follows. When the error function of the neural network is represented as $E = E(x)$, the input noise vector as $\Delta x = (\Delta x_1, \ldots, \Delta x_n)^T$, and a small positive constant as $\varepsilon$, then adding the noise vector $\Delta x$ generated by

$$\Delta x_i = \varepsilon \cdot \operatorname{sgn}\left( \frac{\partial E}{\partial x_i} \right) \tag{1.3.3}$$
to the input vector, we can induce a significant difference between the output of the neural network and the teacher data. Here, $\operatorname{sgn}(x)$ is defined as follows:

$$\operatorname{sgn}(x) = \begin{cases} 1 & (x \ge 0) \\ -1 & (x < 0) \end{cases} \tag{1.3.4}$$
where $\alpha$ is a positive constant, and $\Delta x_{adv}$ a noise vector generated by Eq. (1.3.3).
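Equation (1.3.3) can be realized with a few lines using automatic differentiation; the following TensorFlow sketch is an assumed minimal implementation (the model, the teacher data t, and the squared-error function are placeholders):

```python
import tensorflow as tf

def adversarial_noise(model, x, t, eps=0.01):
    # Noise vector of Eq. (1.3.3): eps * sgn(dE/dx), with E(x) the error function
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        E = tf.reduce_sum(tf.square(model(x) - t))   # assumed error E(x)
    grad = tape.gradient(E, x)
    return eps * tf.sign(grad)

# x_adv = x + adversarial_noise(model, x, t) then tends to be misjudged
```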
1.3.3 Dataset Augmentation

In neural networks and deep learning, the number of training patterns is one of the most important issues. If it is small, overtraining [27] is likely to occur. To avoid this, it is necessary to have as many training patterns as possible. In many situations, however, it is not easy to collect a sufficient number of training patterns, as seen in the case of medical imaging (X-ray, MRI, etc.).
For this reason, the original training patterns (images) are processed to make new training patterns, which is called dataset augmentation. As an example, in the deep learning library Keras, the ImageDataGenerator function provides images that have been processed in various ways, such as rotation, translation, inversion, and shear deformation, to increase the number of training patterns. Table 1.1 shows the main parameters of ImageDataGenerator and their effects.
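For instance, a typical call (the parameter values here are arbitrary examples, each corresponding to a transformation listed in Table 1.1) looks like:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,       # random rotation (degrees)
    width_shift_range=0.1,   # horizontal translation (fraction of width)
    height_shift_range=0.1,  # vertical translation (fraction of height)
    shear_range=0.2,         # shear deformation
    horizontal_flip=True,    # random inversion
)
# datagen.flow(x_train, y_train, batch_size=32) then yields augmented batches
```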
Even when the input data are audio data, the data augmentation described above can be performed and has proved to be effective [34, 53].
Data augmentation as for images is difficult to apply in some cases, such as a fully connected feedforward neural network. For such cases, another data augmentation method, the superimposition of noise, has been studied [60], which is based on

$$x_i^{\text{input}} = (1 + r_i)\, x_i^{\text{original}}, \quad r_i \in [-\varepsilon, \varepsilon] \tag{1.3.6}$$
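In NumPy, Eq. (1.3.6) amounts to a one-line operation (ε assumed small, e.g., 0.01):

```python
import numpy as np

def augment_with_noise(x, eps=0.01):
    # Eq. (1.3.6): each component is scaled by (1 + r_i), r_i uniform in [-eps, eps]
    r = np.random.uniform(-eps, eps, size=x.shape)
    return (1.0 + r) * x
```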
1.3.4 Dropout
1.3.5 Batch Normalization

For data pairs that are to be used as input and teacher data for neural networks to learn a mapping, it is common to transform the data into a certain range of values, or to process the data to align the mean and variance. Batch normalization dynamically performs such transformations in the hidden layers as well.
As mentioned above, transformation operations on the input data of a neural network are usually performed. Let the number of input data be $n$, and the $p$th input data $x_p = \left( x_1^p, x_2^p, \ldots, x_d^p \right)$. The maximum value, minimum value, mean, and standard deviation of each component are, respectively, calculated as follows:

$$x_i^{\max} = \max_p \left\{ x_i^p \right\} \tag{1.3.8}$$

$$x_i^{\min} = \min_p \left\{ x_i^p \right\} \tag{1.3.9}$$

$$\mu_i = \frac{1}{n} \sum_{p=1}^{n} x_i^p \tag{1.3.10}$$

$$\sigma_i = \sqrt{ \frac{1}{n} \sum_{p=1}^{n} \left( x_i^p - \mu_i \right)^2 } \tag{1.3.11}$$
The most commonly employed transformation operations are the 0–1 transformation and the standardization. Assuming that the input data to the neural network after transformation are $\tilde{x}_p = \left( \tilde{x}_1^p, \tilde{x}_2^p, \ldots, \tilde{x}_d^p \right)$, the 0–1 transformation of the input data is given by

$$\tilde{x}_i^p = \frac{x_i^p - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \tag{1.3.12}$$

and the standardization by

$$\tilde{x}_i^p = \frac{x_i^p - \mu_i}{\sigma_i} \tag{1.3.13}$$
These transformations can mitigate the negative effects of large differences in numerical range between individual parameters.
The batch normalization [33] performs the same operations on the inputs of each layer as on the input data. When the input value of the $i$th unit of the $l$th layer for the $p$th learning pattern is $x_{l,i}^p$ and the output is $y_{l,i}^p$, the input–output relationship is expressed by

$$y_{l,i}^p = f\left( x_{l,i}^p \right) = f\left( \sum_j w_{i,j}^l y_{l-1,j}^p + \theta_{l,i} \right) \tag{1.3.14}$$

where $\theta_{l,i}$ is the bias of the $i$th unit in the $l$th layer, and $w_{i,j}^l$ the connection weight between the $i$th unit in the $l$th layer and the $j$th unit in the $(l-1)$th layer.
The batch normalization is used to standardize the input values of each unit in a mini-batch of size $m$. That is, if the input values of the unit for the training patterns in a mini-batch are $\left\{ x_{l,i}^{k+1}, x_{l,i}^{k+2}, \ldots, x_{l,i}^{k+m-1}, x_{l,i}^{k+m} \right\}$, then we employ as input values the transformations $\left\{ \tilde{x}_{l,i}^{k+1}, \tilde{x}_{l,i}^{k+2}, \ldots, \tilde{x}_{l,i}^{k+m-1}, \tilde{x}_{l,i}^{k+m} \right\}$ given as

$$\tilde{x}_{l,i}^p = \gamma\, \frac{x_{l,i}^p - \mu_{l,i}}{\sigma_{l,i}} + \beta, \quad (k+1 \le p \le k+m) \tag{1.3.15}$$
Here, both $\gamma$ and $\beta$ are parameters that are updated by learning, and $\mu_{l,i}$ and $\sigma_{l,i}$ are, respectively, calculated by

$$\mu_{l,i} = \frac{1}{m} \sum_{p=k+1}^{k+m} x_{l,i}^p \tag{1.3.16}$$

$$\sigma_{l,i} = \sqrt{ \frac{1}{m} \sum_{p=k+1}^{k+m} \left( x_{l,i}^p - \mu_{l,i} \right)^2 + \epsilon } \tag{1.3.17}$$
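For a single unit, Eqs. (1.3.15)–(1.3.17) translate into the following NumPy sketch (γ and β initialized to 1 and 0 here; in an actual network they are updated by learning):

```python
import numpy as np

def batch_norm(x_batch, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x_batch.mean()                                   # Eq. (1.3.16)
    sigma = np.sqrt(((x_batch - mu) ** 2).mean() + eps)   # Eq. (1.3.17)
    return gamma * (x_batch - mu) / sigma + beta          # Eq. (1.3.15)

print(batch_norm(np.array([10.0, 12.0, 9.0, 13.0])))      # standardized mini-batch
```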
1.3.6 Generative Adversarial Networks

Generative adversarial networks (GANs) [24], one of the most innovative techniques developed for deep learning, consist of two neural networks: the generator and the discriminator.
The former is a neural network that generates data satisfying certain conditions, and the latter is one that judges whether the input data are true data or not. The generator
takes arbitrary data (for example, arbitrary data generated by random numbers) as
input and is trained to output data that satisfies certain conditions. The discriminator
is trained to correctly discriminate between data output by the generator (called fake
data) and data prepared in advance that truly satisfies certain conditions (called real
data). In the early stages of learning, the fake data output by the generator can be
easily detected as “fake” by the discriminator, but as the training of the generator
progresses, it may output fake data that cannot be detected even by the discriminator.
The goal of GAN is to build a generator that outputs real data that satisfies certain
conditions, or those that cannot be detected as “fake data” by the discriminator. The
training process of GAN can be summarized as follows:
(1) Two neural networks, the generator and the discriminator, are prepared as shown
in Fig. 1.21. The number of output units of the generator should be the same as
the number of input units of the discriminator, and the number of output units
of the discriminator is 1 because the discriminator only determines whether the input data are real or fake.
(2) Prepare a large number of true data that satisfies certain conditions, called real
data.
(3) The generator takes a lot of arbitrary data generated by random numbers as
input to collect a lot of output data of the generator called fake data (Fig. 1.22).
(4) Training of the discriminator is performed using the real data prepared in (2) and
the fake data collected in (3) as input data. As shown in Fig. 1.23, the teacher
data should be real (e.g., 1) for real data and fake (e.g., 0) for fake data. In this
way, the discriminator is trained to correctly discriminate between real and fake
data.
(5) After the training of the discriminator, the generator is trained by connecting
the generator and the discriminator in series, as shown in Fig. 1.24. The input
data to the connected network are those of the generator, and the teacher data is
“real.” The back propagation algorithm is used to train the connected network,
where all the parameters (e.g., connection weights) of the discriminator part are
fixed, and only the parameters of the generator part are updated. In this way, the
generator is trained to output data that is judged to be real by the discriminator.
(6) Return to (3) after the training of the generator is completed.
Repeating the training of the discriminator and the generator alternately, the
trained generator finally becomes able to output data that is indistinguishable from
the real data by the discriminator. The GAN can use convolutional neural networks
for the generator and the discriminator.
The training process of GAN is written as a min–max problem as follows [24]:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_d(x)}\left[ \log D(x) \right] + \mathbb{E}_{z \sim p_z(z)}\left[ \log\left(1 - D(G(z))\right) \right] \tag{1.3.18}$$
where V (D, G) is the objective function, D and G the discriminator and the gener-
ator, respectively, D(x) is the output of the discriminator for input data x, G(z) the
output of the generator for input data z, pd (x) the probability distribution of x, and
p z (z) the probability distribution of z. The training process (4) above is understood
to be the max operation of the left-hand side of Eq. (1.3.18), that is, the maximization
of the right side by updating the discriminator, while the training process (5) above
is the min operation on the left-hand side of Eq. (1.3.18), meaning the minimization
of the second term on the right-hand side by updating the generator.
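The alternating steps (3)–(5) can be sketched in Keras-style Python as follows; the network sizes, data, and training settings are assumed placeholders, and the discriminator freeze relies on the classic Keras idiom that trainability is captured when each model is compiled:

```python
import numpy as np
from tensorflow.keras import layers, models

latent_dim, data_dim = 16, 64                     # assumed sizes
G = models.Sequential([layers.Input((latent_dim,)),
                       layers.Dense(32, activation="relu"),
                       layers.Dense(data_dim)])
D = models.Sequential([layers.Input((data_dim,)),
                       layers.Dense(32, activation="relu"),
                       layers.Dense(1, activation="sigmoid")])
D.compile(optimizer="adam", loss="binary_crossentropy")

D.trainable = False                               # frozen in the connected network (Fig. 1.24)
gan = models.Sequential([G, D])
gan.compile(optimizer="adam", loss="binary_crossentropy")

real_data = np.random.normal(1.0, 0.1, (1000, data_dim))    # stand-in real data
for step in range(100):
    z = np.random.normal(size=(64, latent_dim))
    fake = G.predict(z, verbose=0)                           # step (3): fake data
    idx = np.random.randint(0, len(real_data), 64)
    D.train_on_batch(real_data[idx], np.ones((64, 1)))       # step (4): teacher "real"
    D.train_on_batch(fake, np.zeros((64, 1)))                #           teacher "fake"
    z = np.random.normal(size=(64, latent_dim))
    gan.train_on_batch(z, np.ones((64, 1)))                  # step (5): G through frozen D
```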
GANs based on convolutional neural networks are used for various problems
including generation of images. For example, it can generate an image of a dog from
a noisy image with random numbers as input. However, it is known to be difficult to
control what kind of dog image the generator creates.
Conditional generative adversarial network (CGAN) [48] is a modified version
of GAN that can control the images generated by the generator, where both the
generator and the discriminator accept the same input data as in GAN as well as data
about the attributes of the data called label data. The training process of CGAN is
summarized as follows:
(1) Two neural networks, the generator and the discriminator, are prepared as shown
in Fig. 1.25. Both the generator and the discriminator use information about the
attributes of the data (Label data) as input data in addition to the standard GAN
input data. Thus, the number of input units of the discriminator is the sum of
two numbers: one is the number of output units of the generator and the other
the number of label data. The number of output units of the discriminator is 1
because the discriminator determines whether the input data is real or fake only.
(2) Prepare a large number of true data that satisfies certain conditions, called real
data, which are accompanied by label data indicating their attributes.
(3) The generator takes a lot of arbitrary data generated by random numbers and
their attribute information (label data) as input to collect a lot of output data
(fake data) (Fig. 1.26).
(4) Training of the discriminator is performed using the real data prepared in (2)
and the fake data collected in (3) as input data. The corresponding label data
are also used as input. As shown in Fig. 1.27, the teacher data are set real (e.g.,
1) for real data and fake (e.g., 0) for fake data, respectively. In this way, the
discriminator is trained to correctly discriminate between real and fake data.
(5) After the discriminator is trained, the training of the generator is performed by
connecting the generator and the discriminator in series, as shown in Fig. 1.28.
The input data to the connected network are the input data of the generator
and its label data with the teacher data being “real.” The back propagation
algorithm is used to train the connected network, where all the parameters (e.g.,
connection weights) of the discriminator part are fixed, and only the parameters
of the generator part are updated. In this way, the generator is trained to output
data judged to be real by the discriminator.
(6) Return to (3) after the training of the generator is completed.
The training process of CGAN is written as a min–max problem conditioned on the label data as follows [48]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_d(x)}\left[ \log D(x|y) \right] + \mathbb{E}_{z \sim p_z(z)}\left[ \log\left(1 - D(G(z|y))\right) \right] \tag{1.3.19}$$
where y is the attribute information (Label data). Equation (1.3.19) can be regarded
as the modified version of Eq. (1.3.18) conditioned with respect to y.
GANs have attracted much attention, especially for their effectiveness in image
generation and speech synthesis, and various improved GANs have been proposed.
Research on them is still active: for example, DCGAN [55] using a convolutional
neural network, InfoGAN [9] with an improved loss function, LSGAN [44] with a loss
function based on the least square error, CycleGAN [71] with doubled generators and
discriminators, WGAN [2] with a loss function based on the Wasserstein distance,
ProgressiveGAN [35] with hierarchically high resolution, and StyleGAN [36] with
an improved generator.
1.3.7 Variational Autoencoder

It is known that the variational autoencoder performs a similar function to the generative adversarial network (GAN) described in Sect. 1.3.6.
Figure 1.29 shows the basic schematic diagram of the autoencoder. Let the number of training data be $N$, the $k$th training data (input) $x_k = \left( x_1^k, x_2^k, \ldots, x_n^k \right)^T$, the encoder output $y_k = \left( y_1^k, y_2^k, \ldots, y_m^k \right)^T$ $(m < n)$, and the decoder output $\tilde{x}_k = \left( \tilde{x}_1^k, \tilde{x}_2^k, \ldots, \tilde{x}_n^k \right)^T$. Then, the objective function $E$ to be minimized in the training process of the autoencoder is given by

$$E = \frac{1}{2} \sum_{k=1}^{N} \left\| \tilde{x}^k - x^k \right\|^2 \tag{1.3.20}$$
Here, it is assumed that the input data are used as the teacher data as well. Hence, the output $y_k$ of the encoder can be considered a compressed representation of the input data $x_k$.
Unlike conventional autoencoders, the variational autoencoders [37, 38] learn probability distributions. Figure 1.30 shows a schematic diagram of the operation of a variational autoencoder. The encoder is assumed to represent the probability distribution of Eq. (1.3.21). Here, $z = (z_1, z_2, \ldots, z_m)^T$ is called a latent variable and is usually of much lower dimension ($m \ll n$) than the input $x = (x_1, x_2, \ldots, x_n)^T$. Note that $N\left(z|\mu, \sigma^2\right)$ is a multidimensional normal distribution with mean $\mu = (\mu_1, \mu_2, \ldots, \mu_m)^T$ and variance $\sigma^2$ (standard deviation $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_m)^T$).

$$q_\phi(z|x) = N\left( z \,\middle|\, \mu, \sigma^2 \right) \tag{1.3.21}$$
The encoder outputs the mean $\mu$ and the standard deviation $\sigma$ (square root of the variance) as the parameters of the probability distribution for the input $x$.
Though $z$ is to be sampled from $N\left(z|\mu, \sigma^2\right)$, in the variational autoencoder it is practically determined as

$$z = \mu + \varepsilon \sigma \tag{1.3.22}$$

to enable the error back propagation learning,
where ε is a number sampled from N (ε|0, 1). Equation (1.3.22) is called the
reparameterization trick.
On the other hand, the decoder is assumed to represent the probability distribution

$$p_\theta(x|z) \tag{1.3.23}$$

The objective function $E$ to be minimized in the training is given by

$$E = E_{KL} + E_{Recon} \tag{1.3.24}$$
where $E_{KL}$ is calculated from the Kullback–Leibler (KL) divergence [5], a measure of the distance between the two probability distributions $q_\phi(z|x)$ and $p_\theta(z)$, given by

$$E_{KL} = -\frac{1}{2} \sum_{j=1}^{m} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right) \tag{1.3.25}$$
and $E_{Recon}$ is the reconstruction error given by

$$E_{Recon} = -\frac{1}{L} \sum_{k=1}^{L} \log p_\theta\left( x \middle| z^k \right) \tag{1.3.26}$$

where $L$ is the number of $\varepsilon$ (i.e., the number of $z$) sampled for a single input $x$. As the output changes with the value of $\varepsilon$, averaging is naturally performed. For example, when the input and output are image data (each pixel represented by a real number in (0, 1)), we often use the following equation:

$$E_{Recon} = -\frac{1}{L} \sum_{k=1}^{L} \sum_{i=1}^{n} \left\{ x_i \log y_i^k + (1 - x_i) \log\left( 1 - y_i^k \right) \right\} \tag{1.3.27}$$
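A NumPy sketch of the reparameterization trick of Eq. (1.3.22) and of the loss terms of Eqs. (1.3.25) and (1.3.27) (the encoder outputs μ, σ and the decoded samples are assumed given):

```python
import numpy as np

def sample_z(mu, sigma):
    # Reparameterization trick, Eq. (1.3.22): z = mu + eps * sigma, eps ~ N(0, 1)
    return mu + np.random.normal(size=mu.shape) * sigma

def kl_term(mu, sigma):
    # Eq. (1.3.25)
    return -0.5 * np.sum(1.0 + np.log(sigma ** 2) - mu ** 2 - sigma ** 2)

def recon_term(x, ys):
    # Eq. (1.3.27): binary cross-entropy averaged over the L decoded samples ys[k]
    return -sum(np.sum(x * np.log(y) + (1 - x) * np.log(1 - y)) for y in ys) / len(ys)
```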
1.3.8 Automatic Differentiation

As written in Sect. 1.2, differentiation is frequently used in the error back propagation learning (see Sect. 2.1 for details). Libraries for deep learning (such as TensorFlow) are usually equipped with an automatic differentiation function for calculating derivatives, which is useful for general purposes as well.
As is well known, the derivative of a function $f(x)$ is defined as

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \tag{1.3.28}$$

and the partial derivatives of a two-variable function $f(x, y)$ as

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h} \tag{1.3.29}$$

$$\frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y+h) - f(x, y)}{h} \tag{1.3.30}$$
On the other hand, several methods for differentiation are available on a computer,
including numerical differentiation, symbolic differentiation and automatic differ-
entiation. Let us clarify the difference between them.
$$\exists \delta > 0 : \quad \frac{f(x+h) - f(x)}{h} = 0 \quad \text{for all } |h| < \delta \tag{1.3.32}$$

This is because $f(x + h)$ and $f(x)$ are interpreted as the same value in a computer for extremely small values of $h$, since the precision of the numerical representation in a computer is limited to finite digits. Thus, $h$ cannot be made sufficiently small, making it difficult to obtain accurate derivative values with numerical differentiation.
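This pitfall is easy to reproduce: differentiating $f(x) = x^2$ at $x = 1$ (exact derivative 2) with ever smaller $h$,

```python
def f(x):
    return x * x

x = 1.0
for h in [1e-4, 1e-8, 1e-12, 1e-16]:
    print(f"h = {h:.0e}: (f(x+h) - f(x)) / h = {(f(x + h) - f(x)) / h}")
# At h = 1e-16, x + h rounds to x in double precision, so the result is 0.0
```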
On the other hand, in symbolic differentiation, derivatives are obtained in the form of equations, which results in accurate evaluation of derivative values. There have been some software systems that can perform symbolic differentiation, such as REDUCE and Maxima, which have been developed since the 1960s. In addition, commercial software such as Mathematica, Maple, and Derive is equipped with formula manipulation systems including symbolic differentiation. In deep learning, the partial differentiation of the output of a neural network with respect to a connection weight is often encountered, but its mathematical expression can be complicated (see Sect. 2.1), and the benefit of obtaining the expression in explicit form is small. The error back propagation learning does not require explicit expressions of derivatives; it is sufficient to know the values of the derivatives. For this reason, obtaining the derivative as a mathematical expression is considered excessive for deep learning.
The last method, automatic differentiation, provides accurate derivative values, unlike numerical differentiation. Although automatic differentiation cannot provide rigorous mathematical expressions in explicit form as symbolic differentiation does, the method is useful for deep learning because it calculates the exact derivative value based on the chain rule of differentiation, with all arithmetic operations represented by computational graphs.
As a simple example, Fig. 1.31 shows a computational graph for z = x + y. The
nodes in a computational graph represent numerical values, and the lines connecting
the nodes represent operations (such as arithmetic operations) on the numerical
values.
Figure 1.32 shows the structure of a three-layer feedforward neural network, where the output of the unit in the output layer is represented as $O_1$, the outputs of the units in the hidden layer as $H_1$, $H_2$, and $H_3$, the outputs of the units in the input layer (i.e., the inputs) as $I_1$ and $I_2$, the connection weights between the unit in the output layer and the units in the hidden layer as $c_{11}, c_{12}, c_{13}$, and the connection weights between the units in the hidden layer and the units in the input layer as $b_{11}, \ldots, b_{32}$. Let $f()$ be the activation function of the units in the output layer and also in the hidden layer. For simplicity, we assume that the bias values of all units are zero. Then, we have
$$H_1 = f\left( \sum_{i=1}^{2} b_{1i} I_i \right) = f(b_{11} I_1 + b_{12} I_2) \tag{1.3.33}$$

$$H_2 = f\left( \sum_{i=1}^{2} b_{2i} I_i \right) = f(b_{21} I_1 + b_{22} I_2) \tag{1.3.34}$$

$$H_3 = f\left( \sum_{i=1}^{2} b_{3i} I_i \right) = f(b_{31} I_1 + b_{32} I_2) \tag{1.3.35}$$

$$O_1 = f\left( \sum_{j=1}^{3} c_{1j} H_j \right) = f(c_{11} H_1 + c_{12} H_2 + c_{13} H_3) \tag{1.3.36}$$

Introducing the intermediate variables $v_i$, the above computation is rewritten as

$$v_{-1} = I_1, \quad v_0 = I_2 \tag{1.3.37}$$

$$v_1 = b_{11} v_{-1} + b_{12} v_0, \quad v_2 = b_{21} v_{-1} + b_{22} v_0, \quad v_3 = b_{31} v_{-1} + b_{32} v_0 \tag{1.3.38}$$

$$v_4 = f(v_1), \quad v_5 = f(v_2), \quad v_6 = f(v_3) \tag{1.3.39}$$

$$v_7 = c_{11} v_4 + c_{12} v_5 + c_{13} v_6 \tag{1.3.40}$$

$$v_8 = f(v_7) = O_1 \tag{1.3.41}$$
Fig. 1.33 Computational graph of the three-layer feedforward neural network of Fig. 1.32 ($v_{-1} = I_1$, $v_0 = I_2$ through $v_8 = O_1$)
Let us calculate the derivative of the output with respect to the input on this computational graph using automatic differentiation:

$$\frac{\partial O_1}{\partial I_1} \tag{1.3.42}$$

Two methods of automatic differentiation are available: the forward and the reverse modes. First, let us try the forward mode, in which the derivative of $v_i$ with respect to $I_1$ is calculated sequentially as

$$\dot{v}_i = \frac{\partial v_i}{\partial I_1} = \frac{\partial v_i}{\partial v_{-1}} \tag{1.3.43}$$
$$\dot{v}_8 = \frac{\partial v_8}{\partial v_{-1}} = \frac{\partial v_8}{\partial I_1} = \frac{\partial O_1}{\partial I_1} \tag{1.3.44}$$

$$\dot{v}_{-1} = \frac{\partial v_{-1}}{\partial v_{-1}} = 1 \tag{1.3.45}$$

$$\dot{v}_0 = \frac{\partial v_0}{\partial v_{-1}} = 0 \tag{1.3.46}$$

$$\dot{v}_1 = \frac{\partial v_1}{\partial v_{-1}} = \frac{\partial (b_{11} v_{-1} + b_{12} v_0)}{\partial v_{-1}} = b_{11} \dot{v}_{-1} + b_{12} \dot{v}_0 = b_{11} \tag{1.3.47}$$

$$\dot{v}_2 = \frac{\partial v_2}{\partial v_{-1}} = \frac{\partial (b_{21} v_{-1} + b_{22} v_0)}{\partial v_{-1}} = b_{21} \dot{v}_{-1} + b_{22} \dot{v}_0 = b_{21} \tag{1.3.48}$$

$$\dot{v}_3 = \frac{\partial v_3}{\partial v_{-1}} = \frac{\partial (b_{31} v_{-1} + b_{32} v_0)}{\partial v_{-1}} = b_{31} \dot{v}_{-1} + b_{32} \dot{v}_0 = b_{31} \tag{1.3.49}$$

$$\dot{v}_4 = \frac{\partial v_4}{\partial v_{-1}} = \frac{\partial f(v_1)}{\partial v_{-1}} = \frac{\partial f(v_1)}{\partial v_1} \frac{\partial v_1}{\partial v_{-1}} = f'(v_1)\, \dot{v}_1 \tag{1.3.50}$$

$$\dot{v}_5 = \frac{\partial v_5}{\partial v_{-1}} = \frac{\partial f(v_2)}{\partial v_{-1}} = \frac{\partial f(v_2)}{\partial v_2} \frac{\partial v_2}{\partial v_{-1}} = f'(v_2)\, \dot{v}_2 \tag{1.3.51}$$

$$\dot{v}_6 = \frac{\partial v_6}{\partial v_{-1}} = \frac{\partial f(v_3)}{\partial v_{-1}} = \frac{\partial f(v_3)}{\partial v_3} \frac{\partial v_3}{\partial v_{-1}} = f'(v_3)\, \dot{v}_3 \tag{1.3.52}$$

$$\dot{v}_7 = \frac{\partial v_7}{\partial v_{-1}} = \frac{\partial (c_{11} v_4 + c_{12} v_5 + c_{13} v_6)}{\partial v_{-1}} = c_{11} \dot{v}_4 + c_{12} \dot{v}_5 + c_{13} \dot{v}_6 \tag{1.3.53}$$

$$\dot{v}_8 = \frac{\partial v_8}{\partial v_{-1}} = \frac{\partial f(v_7)}{\partial v_7} \frac{\partial v_7}{\partial v_{-1}} = f'(v_7)\, \dot{v}_7 \tag{1.3.54}$$
Note that the calculations in the forward mode are performed sequentially from the input side, and all the values required at each step are known, since they have already been calculated at the preceding steps or are easily calculated at the current step.
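The forward mode can be implemented compactly with dual numbers, each carrying a value $v$ and its derivative $\dot{v}$ with respect to $I_1$; a minimal sketch for the network of Fig. 1.32 (the sigmoid activation and the weight values here are assumed examples):

```python
import math

class Dual:
    """A value v paired with its derivative dv with respect to I1 (forward mode)."""
    def __init__(self, v, dv=0.0):
        self.v, self.dv = v, dv
    def __add__(self, other):
        return Dual(self.v + other.v, self.dv + other.dv)
    def __rmul__(self, c):                  # constant * Dual, e.g. b11 * v_m1
        return Dual(c * self.v, c * self.dv)

def f(x):
    # Sigmoid and its chain rule, as in Eqs. (1.3.50)-(1.3.52)
    s = 1.0 / (1.0 + math.exp(-x.v))
    return Dual(s, s * (1.0 - s) * x.dv)

b = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]    # example weights b11..b32 (assumed)
c = [0.7, 0.8, 0.9]                         # example weights c11..c13 (assumed)

v_m1, v_0 = Dual(1.0, 1.0), Dual(0.5, 0.0)  # v_-1 = I1 (seed derivative 1), v0 = I2
h = [f(b[j][0] * v_m1 + b[j][1] * v_0) for j in range(3)]   # v4, v5, v6
o1 = f(c[0] * h[0] + c[1] * h[1] + c[2] * h[2])             # v7, then v8 = O1
print(o1.v, o1.dv)                          # O1 and the derivative dO1/dI1
```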
Now, let us try the reverse mode, where the derivative of $O_1$ with respect to $v_i$ is calculated sequentially as shown in Fig. 1.33:

$$\bar{v}_i = \frac{\partial O_1}{\partial v_i} \tag{1.3.55}$$

$$\bar{v}_{-1} = \frac{\partial O_1}{\partial v_{-1}} = \frac{\partial O_1}{\partial I_1} \tag{1.3.56}$$

$$\bar{v}_8 = \frac{\partial O_1}{\partial v_8} = \frac{\partial O_1}{\partial O_1} = 1 \tag{1.3.57}$$

$$\bar{v}_7 = \frac{\partial O_1}{\partial v_7} = \frac{\partial v_8}{\partial v_7} = \frac{\partial f(v_7)}{\partial v_7} = f'(v_7) \tag{1.3.58}$$

$$\bar{v}_6 = \frac{\partial O_1}{\partial v_6} = \frac{\partial v_8}{\partial v_6} = \frac{\partial v_8}{\partial v_7} \frac{\partial v_7}{\partial v_6} = \bar{v}_7\, \frac{\partial (c_{11} v_4 + c_{12} v_5 + c_{13} v_6)}{\partial v_6} = \bar{v}_7\, c_{13} \tag{1.3.59}$$

$$\bar{v}_5 = \frac{\partial O_1}{\partial v_5} = \frac{\partial v_8}{\partial v_5} = \frac{\partial v_8}{\partial v_7} \frac{\partial v_7}{\partial v_5} = \bar{v}_7\, \frac{\partial (c_{11} v_4 + c_{12} v_5 + c_{13} v_6)}{\partial v_5} = \bar{v}_7\, c_{12} \tag{1.3.60}$$

$$\bar{v}_4 = \frac{\partial O_1}{\partial v_4} = \frac{\partial v_8}{\partial v_4} = \frac{\partial v_8}{\partial v_7} \frac{\partial v_7}{\partial v_4} = \bar{v}_7\, \frac{\partial (c_{11} v_4 + c_{12} v_5 + c_{13} v_6)}{\partial v_4} = \bar{v}_7\, c_{11} \tag{1.3.61}$$

$$\bar{v}_3 = \frac{\partial O_1}{\partial v_3} = \frac{\partial O_1}{\partial v_6} \frac{\partial v_6}{\partial v_3} = \bar{v}_6\, \frac{\partial f(v_3)}{\partial v_3} = \bar{v}_6\, f'(v_3) \tag{1.3.62}$$

$$\bar{v}_2 = \frac{\partial O_1}{\partial v_2} = \frac{\partial O_1}{\partial v_5} \frac{\partial v_5}{\partial v_2} = \bar{v}_5\, \frac{\partial f(v_2)}{\partial v_2} = \bar{v}_5\, f'(v_2) \tag{1.3.63}$$

$$\bar{v}_1 = \frac{\partial O_1}{\partial v_1} = \frac{\partial O_1}{\partial v_4} \frac{\partial v_4}{\partial v_1} = \bar{v}_4\, \frac{\partial f(v_1)}{\partial v_1} = \bar{v}_4\, f'(v_1) \tag{1.3.64}$$

$$\bar{v}_0 = \frac{\partial O_1}{\partial v_0} = \frac{\partial O_1}{\partial v_1} \frac{\partial v_1}{\partial v_0} + \frac{\partial O_1}{\partial v_2} \frac{\partial v_2}{\partial v_0} + \frac{\partial O_1}{\partial v_3} \frac{\partial v_3}{\partial v_0} = \bar{v}_1\, \frac{\partial v_1}{\partial v_0} + \bar{v}_2\, \frac{\partial v_2}{\partial v_0} + \bar{v}_3\, \frac{\partial v_3}{\partial v_0} = \bar{v}_1 b_{12} + \bar{v}_2 b_{22} + \bar{v}_3 b_{32}$$
Note that the calculations in the reverse mode are performed sequentially from the
output side, and all the values required for each calculation are known when needed.
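A corresponding sketch of the reverse mode (again ours, for illustration) first runs the forward pass storing the $v_i$, then sweeps the adjoints $\bar v_i$ from the output back to the inputs. Note that one backward sweep yields $\partial O_1/\partial I_1$ and $\partial O_1/\partial I_2$ simultaneously, which is why the reverse mode suits training, where derivatives with respect to many parameters are needed.

```python
import numpy as np

def f(x):  return np.tanh(x)
def df(x): return 1.0 - np.tanh(x) ** 2

def reverse_mode_grads(I, b, c):
    """Adjoint sweep as in Eqs. (1.3.55)-(1.3.66); returns dO1/d(I1, I2)."""
    # Forward pass, storing the intermediate values v_i.
    v_in = np.array(I)        # v_{-1}, v_0
    v123 = b @ v_in           # v_1, v_2, v_3
    v456 = f(v123)            # v_4, v_5, v_6
    v7   = c @ v456
    # Backward pass: adjoints vbar_i = dO1/dv_i.
    vbar8   = 1.0                     # Eq. (1.3.57)
    vbar7   = df(v7) * vbar8          # Eq. (1.3.58)
    vbar456 = c * vbar7               # Eqs. (1.3.59)-(1.3.61)
    vbar123 = df(v123) * vbar456      # Eqs. (1.3.62)-(1.3.64)
    vbar_in = b.T @ vbar123           # Eqs. (1.3.65)-(1.3.66)
    return vbar_in                    # (dO1/dI1, dO1/dI2)

rng = np.random.default_rng(0)
b, c = rng.normal(size=(3, 2)), rng.normal(size=3)
print(reverse_mode_grads([0.5, -0.3], b, c))
```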
Since the error back propagation algorithm, which is the standard training method
for neural networks and deep learning, often uses partial differentiation with respect
to parameters, many deep learning libraries have an automatic differentiation func-
tion. In particular, the automatic differentiation method in the reverse mode is
closely related to the error back propagation algorithm [3]. It is noted that auto-
matic differentiation plays an important role in physics-informed neural networks
(Sect. 2.4.3).
References
1. Amari, S.: A theory of adaptive pattern classifiers. IEEE Trans. Electron. Comput. EC-16,
299–307 (1967)
2. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN, arXiv: 1701.07875, (2017)
3. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in
machine learning: a survey. J. Mach. Learn. Res. 18, 5595–5637 (2018)
4. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep
networks. Proceedings of NIPS, (2006)
5. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer (2006)
6. Biswas, A., Chandrakasan, A.P.: Conv-RAM: An energy-efficient SRAM with embedded
convolution computation for low-power CNN-based machine learning applications, 2018 IEEE
International Solid - State Circuits Conference - (ISSCC), 2018, pp. 488–490, https://doi.org/
10.1109/ISSCC.2018.8310397
7. Breiman, L.: Random forests, Machine Learning. 45(1), 5–32 (2001)
8. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A.,
Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.,
Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E.,
Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever,
I., Amodei, D.: Language models are few-shot learners. arXiv: 2005.14165, (2020)
9. Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., Abbeel, P.: InfoGAN: Inter-
pretable representation learning by information maximizing generative adversarial nets. arXiv:
1606.03657, (2016)
10. Ciresan, D., Meier, U., Masci, J., Schmidhuber, J.: Multi-column deep neural network for traffic
sign classification. Neural Netw. 32, 333–338 (2012)
11. Conneau, A., Baevski, A., Collobert, R., Mohamed, A., Auli, M.: Unsupervised cross-lingual
representation learning for speech recognition. arXiv: 2006.13979, (2020)
12. Courbariaux, M., Bengio, Y., David, J.P.: Binaryconnect: Training deep neural networks with
binary weights during propagations. Adv. Neural Inf. Process. Sys. 28, 3105–3113 (2015)
13. Courbariaux, M., David, J.P., Bengio, Y.: Low precision storage for deep learning, arXiv:
1412.7024, (2014)
14. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks:
Training deep neural networks with weights and activations constrained to +1 or −1, arXiv:
1602.02830, (2016)
15. Deng, C., Liao, S., Xie, Y., Parhi, K.K., Qian, X., Yuan, B.: PermDNN: efficient compressed
DNN architecture with permuted diagonal matrices, Proceedings of the 51st Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO-51), 2018, pp. 189–202, https://doi.
org/10.1109/MICRO.2018.00024
16. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional
transformers for language understanding, arXiv: 1810.04805, (2018)
17. Elman, J.L., Zipser, D.: Learning the hidden structure of speech. Journal of the Acoustical
Society of America 83, 1615–1626 (1988)
18. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of
pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980)
19. Funahashi, K.: On the approximate realization of continuous mappings by neural networks.
Neural Netw. 2, 183–192 (1989)
20. Gong, J., Shen, H., Zhang, G., Liu, X., Li, S., Jin, G., Maheshwari, N., Fomenko, E., Segal,
E.: Highly efficient 8-bit low precision inference of convolutional neural networks with Intel-
Caffe, In Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on
Co-designing Pareto-efficient Deep Learning (ReQuEST ‘18). Association for Computing
Machinery, New York, NY, USA, Article 2, 1. https://doi.org/10.1145/3229762.3229763
21. Gonzalez, R.C., Woods, R.E.: Digital Image Processing (Second Edition). Prentice-Hall (2002)
22. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
23. Goodfellow, I.J., Shlens, J. Szegedy, C.: Explaining and harnessing adversarial examples. arXiv:
1412.6572, (2014)
24. Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio,
Y.: Generative adversarial networks. arXiv: 1406.2661, (2014)
25. Gupta, S., Agrawal, A., Gopalakrishnan, K., Narayanan, P.: Deep learning with limited numer-
ical precision. Proceedings of the 32nd International Conference on Machine Learning, Lille,
France, 2015.
26. He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016,
pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
27. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall (1999)
28. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural
Comput. 18, 1527–1544 (2006)
29. Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal
approximators. Neural Netw. 2, 359–366 (1989)
30. Hu, J., Shen, L., Sun, G.: Squeeze-and-Excitation Networks. 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 7132–7141,
https://doi.org/10.1109/CVPR.2018.00745.
31. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture
in cat’s visual cortex. J. Physiol. 160, 106–154 (1962)
32. Hughes, T.J.R.: The Finite Element Method : Linear Static and Dynamic Finite Element
Analysis. Dover (2000)
33. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing
internal covariate shift. In International conference on machine learning (pp. 448–456). PMLR,
2015.
34. Jaitly, N., Hinton, G.: Vocal Tract Length Perturbation (VTLP) improves speech recognition.
in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.
35. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality,
stability, and variation. arXiv: 1710.10196, (2017)
36. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial
networks. arXiv: 1812.04948, (2018)
37. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv: 1312.6114, (2013)
38. Kingma, D.P., Welling, M.: An Introduction to Variational Autoencoders. Found. Trends Mach.
Learn. 12(4), 307–392 (2019). https://doi.org/10.1561/2200000056
39. Kirk, D.B., Hwu, W.W.: Programming Massively Parallel Processors: A Hands-on Approach.
Morgan Kaufmann (2010)
40. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In NIPS’ 2012, (2012)
41. Kubo, S.: Inverse problems related to the mechanics and fracture of solids and structures. JSME
Int. J. 31(2), 157–166 (1988)
42. LeCun, Y.: Generalization and network design strategies. Technical Report CRG-TR-89-4,
Department of Computer Science, University of Toronto (1989)
43. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015)
44. Mao, X., Li, Q., Xie, H., Lau, R.Y.K., Wang, Z., Smolley, S.P.: Least squares generative
adversarial networks. arXiv: 1611.04076, (2016)
45. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biophys. 5, 115–133 (1943)
46. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-
Verlag (1992)
47. Minsky, M.L., Papert, S.A.: Perceptrons. MIT Press (1969)
48. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv: 1411.1784, (2014)
49. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press (2012)
50. Oishi, A., Yagawa, G.: Computational mechanics enhanced by deep learning. Comput. Methods
Appl. Mech. Eng. 327, 327–351 (2017)
51. Oishi, A., Yamada, K., Yoshimura, S., Yagawa, G.: Quantitative nondestructive evaluation with
ultrasonic method using neural networks and computational mechanics. Comput. Mech. 15,
521–533 (1995)
52. Oishi, A., Yamada, K., Yoshimura, S., Yagawa, G., Nagai, S., Matsuda, Y.: Neural network-
based inverse analysis for defect identification with laser ultrasonics. Res. Nondestruct. Eval.
13(2), 79–95 (2001)
53. Park, D.S., Chan, W., Zhang, Y., Chiu, C., Zoph, B., Cubuk, E.D., Le, Q.V.: SpecAugment: A
simple data augmentation method for automatic speech recognition. Proceedings of Interspeech
2019, pp. 2613–2617, https://doi.org/10.21437/Interspeech.2019-2680
54. Ping, W., Peng, K., Gibiansky, A., Arik, S.O., Kannan, A., Narang, S., Raiman, J., Miller, J.:
Deep voice 3: Scaling text-to-speech with convolutional sequence learning. arXiv: 1710.07654,
(2018)
55. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolu-
tional generative adversarial networks. arXiv: 1511.06434, (2015)
56. Rosenblatt, F.: The perceptron: A probabilistic model for information storage and organization
in the brain. Psychol. Rev. 65, 386–408 (1958)
57. Rosenblatt, F.: On the convergence of reinforcement procedures in simple perceptrons. Cornell
Aeronautical Laboratory Report, VG-1196-G-4, (1960)
58. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating
errors. Nature, 323, 533–536 (1986)
59. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A.,
Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recognition
Challenge. Int. J. Comput. Vision 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-
0816-y
60. Sietsma, J., Dow, R.: Creating artificial neural networks that generalize. Neural Netw. 4, 67–79
(1991)
61. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser,
J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalch-
brenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.:
Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489
(2016)
62. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T.,
Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre,L., van den Driessche, G.,
Graepel, T., Hassabis, D.: Mastering the game of Go without human knowledge. Nature 550,
354–359 (2017)
63. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. ICLR 2015, arXiv: 1409.1556, (2015)
64. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple
way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
65. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. Adv.
Neural Inf. Process. Sys. 27, 3104–3112 (2014).
66. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2015, pp. 1–9, https://doi.org/10.1109/CVPR.2015.7298594
67. Ueberhuber, C.W.: Numerical Computation 1: Methods, Software, and Analysis. Springer
(1997)
68. Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., Fergus, R.: Regularization of neural networks
using DropConnect. Proceedings of the 30th International Conference on Machine Learning,
in PMLR 28(3), 2013, pp. 1058–1066
69. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet D.,
Pajdla T., Schiele B., Tuytelaars T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture
Notes in Computer Science, vol 8689. Springer, Cham. https://doi.org/10.1007/978-3-319-
10590-1_53
70. Zeng, X., Ouyang, W., Yan, J., Li, H., Xiao, T., Wang, K., Liu, Y., Zhou, Y., Yang, B., Wang,
Z., Zhou, H., Wanget, X.: Crafting GBD-Net for object detection. IEEE Trans. Pattern Anal.
Mach. Intell. 40(09), 2109–2123 (2018)
71. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-
consistent adversarial networks. arXiv: 1703.10593, (2017)
72. Zienkiewicz, O.C., Morgan, K.: Finite Elements and Approximation. Dover (2006)
Chapter 2
Mathematical Background for Deep
Learning
Abstract This chapter deals with the operation of neural networks and deep learning
in detail using mathematical formulas. Section 2.1 explains the feedforward neural
network including the error back propagation algorithm, Sect. 2.2 the convolutional
neural networks, which have become the mainstream of deep learning in recent years,
and Sect. 2.3 compares various methods for accelerating the training process. Finally,
Sect. 2.4 describes regularization methods to suppress overtraining for improving
performance of the trained neural networks.
2.1 Feedforward Neural Network

Regarding a fully connected feedforward neural network, which is the most basic neural network, both the forward and the error back propagation processes are discussed here using mathematical expressions.
In general, a feedforward neural network has a hierarchical structure of units with
nonlinear transformation functions [4], and the input–output relationship of the j-th
unit of the l-th layer is expressed as
$$O_j^l = f\left(U_j^l\right) \quad (2.1.1)$$

where

$O_j^l$: the output value of the activation function of the j-th unit in the l-th layer,
$U_j^l$: the input value to the activation function of the j-th unit in the l-th layer,
$f()$: the activation function.

Note that $U_j^l$ is expressed using the input from the units in the previous layer as follows:

$$U_j^l = \sum_{i=1}^{n_{l-1}} w_{ji}^{l-1} \cdot O_i^{l-1} + \theta_j^l \quad (2.1.2)$$

where

$n_{l-1}$: the number of units in the (l−1)-th layer,
$w_{ji}^{l-1}$: the connection weight between the i-th unit in the (l−1)-th layer and the j-th unit in the l-th layer,
$O_i^{l-1}$: the output of the i-th unit in the (l−1)-th layer,
$\theta_j^l$: the bias of the j-th unit in the l-th layer.
As for the activation functions f (), the following functions are often employed.
$$f(x) = \frac{1}{1 + e^{-x}} \qquad \text{Sigmoid function} \quad (2.1.3)$$

$$f(x) = \tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}} \qquad \text{Hyperbolic tangent function} \quad (2.1.4)$$

$$f(x) = \begin{cases} x & (x \ge 0) \\ 0 & (x < 0) \end{cases} \qquad \text{Rectified linear unit (ReLU) function} \quad (2.1.5)$$
The first-order derivative of the sigmoid function is written as

$$\frac{df}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x)\left(1 - f(x)\right) \quad (2.1.6)$$

[Figs. 2.1 and 2.2: plots of the three activation functions and of their first-order derivatives over the range −4 ≤ x ≤ 4.]

Regarding the hyperbolic tangent function and the ReLU function, we respectively have

$$\frac{df}{dx} = \left(\frac{e^x - e^{-x}}{e^x + e^{-x}}\right)' = 1 - \left(\frac{e^x - e^{-x}}{e^x + e^{-x}}\right)^2 = 1 - \left(f(x)\right)^2 \quad (2.1.7)$$

$$\frac{df}{dx} = \begin{cases} 1 & (x \ge 0) \\ 0 & (x < 0) \end{cases} \quad (2.1.8)$$

The first-order derivatives of these three functions are shown in Fig. 2.2; the derivative of the sigmoid function takes values of at most 0.25, which is smaller than the derivatives of the others.
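To make Eqs. (2.1.1)-(2.1.5) concrete, the forward propagation can be coded layer by layer, treating the weights $w_{ji}^{l-1}$ of each layer as a matrix so that Eq. (2.1.2) becomes a matrix–vector product. The sketch below is illustrative and not taken from the book.

```python
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))   # Eq. (2.1.3)
def relu(x):    return np.maximum(x, 0.0)          # Eq. (2.1.5)

def forward(O0, weights, biases, act=np.tanh):
    """Forward propagation: U^l = W^{l-1} O^{l-1} + theta^l, O^l = f(U^l).

    weights[l] has shape (n_l, n_{l-1}); biases[l] has shape (n_l,).
    Here the same activation is used in every layer; in practice the
    output layer often uses a linear activation instead (Eq. (2.1.10)).
    """
    O = O0
    for W, theta in zip(weights, biases):
        U = W @ O + theta          # Eq. (2.1.2)
        O = act(U)                 # Eq. (2.1.1)
    return O

rng = np.random.default_rng(1)
sizes = [2, 3, 3, 1]               # a four-layer network
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) for m in sizes[1:]]
print(forward(np.array([0.5, -0.3]), weights, biases))
```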
Note that the following function is usually used as the activation function of the input units,

$$f(x) = x \quad (2.1.9)$$

and that a linear function is often used as the activation function of the output units, as

$$f(x) = ax + b \quad (2.1.10)$$

In the training of a feedforward neural network, the error between the outputs of the network and the teacher signals over all the training patterns is defined as

$$E = \frac{1}{2}\sum_{p=1}^{n_P}\sum_{j=1}^{n_L}\left({}^pO_j^L - {}^pT_j\right)^2 \quad (2.1.11)$$
where

${}^pO_j^L$: the output of the j-th unit in the L-th layer (output layer) for the p-th training pattern,
${}^pT_j$: the teacher signal corresponding to the output of the j-th unit in the output layer for the p-th training pattern,
$n_L$: the number of units in the L-th layer (output layer),
$n_P$: the total number of training patterns.
In stochastic gradient descent methods, the error for each pattern is often used instead
of Eq. (2.1.11) as
$$E = \frac{1}{2}\sum_{j=1}^{n_L}\left({}^pO_j^L - {}^pT_j\right)^2 \quad (2.1.12)$$
From Eqs. (2.1.11) and (2.1.12), this error is regarded as a function of the
connection weights and the biases. Then, it is written as
$$E = E(w, \theta) \quad (2.1.13)$$
where w is a vector of all the connection weights and θ a vector of all the biases.
Finding w and θ that minimize the error is called training or learning of a feedforward
neural network. The error back propagation algorithm based on the steepest descent
method is widely used for training, where the connection weights and biases are
iteratively updated based on the gradient of the error as
$$\Delta w_{ji}^{l-1} = -\frac{\partial E(w,\theta)}{\partial w_{ji}^{l-1}} \quad (2.1.14)$$

$$w_{ji}^{l-1} = w_{ji}^{l-1} + \alpha \cdot \Delta w_{ji}^{l-1} \quad (2.1.15)$$

$$\Delta\theta_j^l = -\frac{\partial E(w,\theta)}{\partial \theta_j^l} \quad (2.1.16)$$

$$\theta_j^l = \theta_j^l + \beta \cdot \Delta\theta_j^l \quad (2.1.17)$$
where

$\Delta w_{ji}^{l-1}$: the amount of update of the connection weight between the j-th unit in the l-th layer and the i-th unit in the (l−1)-th layer,
$\Delta\theta_j^l$: the amount of update of the bias at the j-th unit in the l-th layer,
$\alpha$: the learning coefficient for the update of the connection weights,
$\beta$: the learning coefficient for the update of the biases.
Consider, as a concrete example, a four-layer feedforward neural network, for which the error for a single training pattern is written, dropping the pattern index p, as

$$E = \frac{1}{2}\sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)^2 \quad (2.1.18)$$
Let us start with the calculation of the amount of update of the connection weights. First, $\partial E/\partial w_{ab}^3$ is calculated as follows:
$$\frac{\partial E}{\partial w_{ab}^3} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial w_{ab}^3}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial w_{ab}^3} = \left(O_a^4 - T_a\right)\frac{\partial O_a^4}{\partial w_{ab}^3} \quad (2.1.19)$$

$$\frac{\partial O_a^4}{\partial w_{ab}^3} = \frac{\partial f\left(U_a^4\right)}{\partial w_{ab}^3} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\frac{\partial U_a^4}{\partial w_{ab}^3} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ai}^3 \cdot O_i^3 + \theta_a^4\right)}{\partial w_{ab}^3} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\sum_{i=1}^{n_3}\frac{\partial w_{ai}^3}{\partial w_{ab}^3}\,O_i^3 = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\,O_b^3 \quad (2.1.20)$$

Combining Eqs. (2.1.19) and (2.1.20), we have

$$\frac{\partial E}{\partial w_{ab}^3} = \left(O_a^4 - T_a\right)\frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\,O_b^3 \quad (2.1.21)$$
Similarly, $\partial E/\partial w_{cd}^2$ is calculated as follows:

$$\frac{\partial E}{\partial w_{cd}^2} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial w_{cd}^2}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial w_{cd}^2} \quad (2.1.22)$$

$$\frac{\partial O_j^4}{\partial w_{cd}^2} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\frac{\partial U_j^4}{\partial w_{cd}^2} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ji}^3 \cdot O_i^3 + \theta_j^4\right)}{\partial w_{cd}^2} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial O_i^3}{\partial w_{cd}^2} \quad (2.1.23)$$

$$\frac{\partial O_i^3}{\partial w_{cd}^2} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\frac{\partial U_i^3}{\partial w_{cd}^2} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\frac{\partial\left(\sum_{k=1}^{n_2} w_{ik}^2 \cdot O_k^2 + \theta_i^3\right)}{\partial w_{cd}^2} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\sum_{k=1}^{n_2}\frac{\partial w_{ik}^2}{\partial w_{cd}^2}\,O_k^2 = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\frac{\partial w_{id}^2}{\partial w_{cd}^2}\,O_d^2 \quad (2.1.24)$$

Substituting Eqs. (2.1.23) and (2.1.24) into Eq. (2.1.22), we have

$$\frac{\partial E}{\partial w_{cd}^2} = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,w_{jc}^3\,\frac{\partial f\left(U_c^3\right)}{\partial U_c^3}\,O_d^2 \quad (2.1.25)$$
Further, $\partial E/\partial w_{eg}^1$ is calculated as

$$\frac{\partial E}{\partial w_{eg}^1} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial w_{eg}^1}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial w_{eg}^1} \quad (2.1.26)$$

$$\frac{\partial O_j^4}{\partial w_{eg}^1} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ji}^3 \cdot O_i^3 + \theta_j^4\right)}{\partial w_{eg}^1} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial O_i^3}{\partial w_{eg}^1} \quad (2.1.27)$$

$$\frac{\partial O_i^3}{\partial w_{eg}^1} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\frac{\partial\left(\sum_{k=1}^{n_2} w_{ik}^2 \cdot O_k^2 + \theta_i^3\right)}{\partial w_{eg}^1} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\sum_{k=1}^{n_2} w_{ik}^2\,\frac{\partial O_k^2}{\partial w_{eg}^1} \quad (2.1.28)$$

$$\frac{\partial O_k^2}{\partial w_{eg}^1} = \frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\frac{\partial\left(\sum_{l=1}^{n_1} w_{kl}^1 \cdot O_l^1 + \theta_k^2\right)}{\partial w_{eg}^1} = \frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\sum_{l=1}^{n_1}\frac{\partial w_{kl}^1}{\partial w_{eg}^1}\,O_l^1 = \frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\frac{\partial w_{kg}^1}{\partial w_{eg}^1}\,O_g^1 \quad (2.1.29)$$

Substituting Eqs. (2.1.27), (2.1.28), and (2.1.29) into Eq. (2.1.26), we achieve

$$\frac{\partial E}{\partial w_{eg}^1} = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\sum_{k=1}^{n_2} w_{ik}^2\,\frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\frac{\partial w_{kg}^1}{\partial w_{eg}^1}\,O_g^1 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\,w_{ie}^2\,\frac{\partial f\left(U_e^2\right)}{\partial U_e^2}\,O_g^1 \quad (2.1.30)$$
From these calculations, the amounts of update of the connection weights are written as follows:

$$\Delta w_{ji}^3 = -\frac{\partial E}{\partial w_{ji}^3} = -\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,O_i^3 \quad (2.1.31)$$

$$\Delta w_{ji}^2 = -\frac{\partial E}{\partial w_{ji}^2} = -\sum_{k=1}^{n_4}\left(O_k^4 - T_k\right)\frac{\partial f\left(U_k^4\right)}{\partial U_k^4}\,w_{kj}^3\,\frac{\partial f\left(U_j^3\right)}{\partial U_j^3}\,O_i^2 \quad (2.1.32)$$

$$\Delta w_{ji}^1 = -\frac{\partial E}{\partial w_{ji}^1} = -\sum_{k=1}^{n_4}\left(O_k^4 - T_k\right)\frac{\partial f\left(U_k^4\right)}{\partial U_k^4}\sum_{l=1}^{n_3} w_{kl}^3\,\frac{\partial f\left(U_l^3\right)}{\partial U_l^3}\,w_{lj}^2\,\frac{\partial f\left(U_j^2\right)}{\partial U_j^2}\,O_i^1 \quad (2.1.33)$$
Next, let us calculate the amount of update of the biases. First, $\partial E/\partial\theta_a^4$ is calculated as follows:

$$\frac{\partial E}{\partial\theta_a^4} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial\theta_a^4}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial\theta_a^4} = \left(O_a^4 - T_a\right)\frac{\partial O_a^4}{\partial\theta_a^4} \quad (2.1.34)$$

$$\frac{\partial O_a^4}{\partial\theta_a^4} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\frac{\partial U_a^4}{\partial\theta_a^4} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ai}^3 \cdot O_i^3 + \theta_a^4\right)}{\partial\theta_a^4} = \frac{\partial f\left(U_a^4\right)}{\partial U_a^4} \quad (2.1.35)$$

Combining Eqs. (2.1.34) and (2.1.35), we have

$$\frac{\partial E}{\partial\theta_a^4} = \left(O_a^4 - T_a\right)\frac{\partial f\left(U_a^4\right)}{\partial U_a^4} \quad (2.1.36)$$
Similarly, $\partial E/\partial\theta_b^3$ is calculated as

$$\frac{\partial E}{\partial\theta_b^3} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial\theta_b^3}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial\theta_b^3} \quad (2.1.37)$$

$$\frac{\partial O_j^4}{\partial\theta_b^3} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ji}^3 \cdot O_i^3 + \theta_j^4\right)}{\partial\theta_b^3} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial O_i^3}{\partial\theta_b^3} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,w_{jb}^3\,\frac{\partial O_b^3}{\partial\theta_b^3} \quad (2.1.38)$$

$$\frac{\partial O_b^3}{\partial\theta_b^3} = \frac{\partial f\left(U_b^3\right)}{\partial U_b^3}\frac{\partial\left(\sum_{k=1}^{n_2} w_{bk}^2 \cdot O_k^2 + \theta_b^3\right)}{\partial\theta_b^3} = \frac{\partial f\left(U_b^3\right)}{\partial U_b^3} \quad (2.1.39)$$

Combining the above, we obtain

$$\frac{\partial E}{\partial\theta_b^3} = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,w_{jb}^3\,\frac{\partial f\left(U_b^3\right)}{\partial U_b^3} \quad (2.1.40)$$
Further, $\partial E/\partial\theta_c^2$ is calculated as follows:

$$\frac{\partial E}{\partial\theta_c^2} = \frac{1}{2}\sum_{j=1}^{n_4}\frac{\partial}{\partial\theta_c^2}\left(O_j^4 - T_j\right)^2 = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial O_j^4}{\partial\theta_c^2} \quad (2.1.41)$$

$$\frac{\partial O_j^4}{\partial\theta_c^2} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\frac{\partial\left(\sum_{i=1}^{n_3} w_{ji}^3 \cdot O_i^3 + \theta_j^4\right)}{\partial\theta_c^2} = \frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial O_i^3}{\partial\theta_c^2} \quad (2.1.42)$$

$$\frac{\partial O_i^3}{\partial\theta_c^2} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\frac{\partial\left(\sum_{k=1}^{n_2} w_{ik}^2 \cdot O_k^2 + \theta_i^3\right)}{\partial\theta_c^2} = \frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\sum_{k=1}^{n_2} w_{ik}^2\,\frac{\partial O_k^2}{\partial\theta_c^2} \quad (2.1.43)$$

$$\frac{\partial O_k^2}{\partial\theta_c^2} = \frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\frac{\partial\left(\sum_{l=1}^{n_1} w_{kl}^1 \cdot O_l^1 + \theta_k^2\right)}{\partial\theta_c^2} = \frac{\partial f\left(U_k^2\right)}{\partial U_k^2}\frac{\partial\theta_k^2}{\partial\theta_c^2} \quad (2.1.44)$$

Substituting Eqs. (2.1.42), (2.1.43), and (2.1.44) into Eq. (2.1.41), we obtain

$$\frac{\partial E}{\partial\theta_c^2} = \sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{i=1}^{n_3} w_{ji}^3\,\frac{\partial f\left(U_i^3\right)}{\partial U_i^3}\,w_{ic}^2\,\frac{\partial f\left(U_c^2\right)}{\partial U_c^2} \quad (2.1.45)$$
After these calculations, the amounts of update of the biases are given as follows:

$$\Delta\theta_i^4 = -\frac{\partial E}{\partial\theta_i^4} = -\left(O_i^4 - T_i\right)\frac{\partial f\left(U_i^4\right)}{\partial U_i^4} \quad (2.1.46)$$

$$\Delta\theta_i^3 = -\frac{\partial E}{\partial\theta_i^3} = -\sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,w_{ji}^3\,\frac{\partial f\left(U_i^3\right)}{\partial U_i^3} \quad (2.1.47)$$

$$\Delta\theta_i^2 = -\frac{\partial E}{\partial\theta_i^2} = -\sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{k=1}^{n_3} w_{jk}^3\,\frac{\partial f\left(U_k^3\right)}{\partial U_k^3}\,w_{ki}^2\,\frac{\partial f\left(U_i^2\right)}{\partial U_i^2} \quad (2.1.48)$$
2.2 Convolutional Neural Network

In this section, both the forward propagation and the error back propagation of the convolutional layer in the convolutional neural networks are studied in detail using mathematical expressions.
Now, the convolutional layer [2, 8], in practice, consists of three kinds of layers:
a convolutional layer, an activation function layer, and a pooling layer.
A convolutional layer is often employed for the image data, which are regarded
as a two-dimensional array, meaning that the units in a convolutional layer of a
convolutional neural network are two-dimensionally arranged. Figure 2.4 shows its
schematic illustration, which takes two-dimensional data of the size M×N as input
and outputs two-dimensional data of the size M×N with a filter of the size S×T.
Connections in the convolutional layer are defined as
$$U_{mn}^p = \sum_{s=0}^{S-1}\sum_{t=0}^{T-1} h_{st}^{p-1}\cdot O_{m+s,n+t}^{p-1} + \theta_{mn}^p \quad (2.2.1)$$
[Fig. 2.4: Schematic of a convolutional layer: two-dimensional input of size M×N and a filter of size S×T.]
where $O_{m+s,n+t}^{p-1}$ is the output of the (m+s, n+t)-th unit in the (p−1)-th layer, where units are arranged in a two-dimensional manner, $h_{st}^{p-1}$ the (s, t)-th component of the filter of S×T size for the (p−1)-th layer, $\theta_{mn}^p$ the bias of the (m, n)-th unit in the p-th layer, and $U_{mn}^p$ the input value to the activation function of the (m, n)-th unit in the p-th layer, where units are also arranged in a two-dimensional manner.
For example, when S = T = 3, the summation part of the right-hand side of Eq. (2.2.1) is the sum of all the components of the matrix

$$\begin{pmatrix} h_{0,0}^{p-1}O_{m+0,n+0}^{p-1} & h_{1,0}^{p-1}O_{m+1,n+0}^{p-1} & h_{2,0}^{p-1}O_{m+2,n+0}^{p-1} \\ h_{0,1}^{p-1}O_{m+0,n+1}^{p-1} & h_{1,1}^{p-1}O_{m+1,n+1}^{p-1} & h_{2,1}^{p-1}O_{m+2,n+1}^{p-1} \\ h_{0,2}^{p-1}O_{m+0,n+2}^{p-1} & h_{1,2}^{p-1}O_{m+1,n+2}^{p-1} & h_{2,2}^{p-1}O_{m+2,n+2}^{p-1} \end{pmatrix} = \begin{pmatrix} O_{m+0,n+0}^{p-1} & O_{m+1,n+0}^{p-1} & O_{m+2,n+0}^{p-1} \\ O_{m+0,n+1}^{p-1} & O_{m+1,n+1}^{p-1} & O_{m+2,n+1}^{p-1} \\ O_{m+0,n+2}^{p-1} & O_{m+1,n+2}^{p-1} & O_{m+2,n+2}^{p-1} \end{pmatrix} \odot \begin{pmatrix} h_{0,0}^{p-1} & h_{1,0}^{p-1} & h_{2,0}^{p-1} \\ h_{0,1}^{p-1} & h_{1,1}^{p-1} & h_{2,1}^{p-1} \\ h_{0,2}^{p-1} & h_{1,2}^{p-1} & h_{2,2}^{p-1} \end{pmatrix} \quad (2.2.2)$$

where $\odot$ denotes the product of the corresponding components of the two matrices, called the Hadamard product.
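A direct transcription of Eq. (2.2.1) is shown below as an illustrative sketch (not the book's code); for simplicity, the output is restricted to the 'valid' region of size (M−S+1)×(N−T+1), leaving the treatment of the boundaries aside.

```python
import numpy as np

def conv_forward(O_prev, h, theta):
    """U^p_{mn} = sum_{s,t} h_{st} O^{p-1}_{m+s,n+t} + theta  (Eq. 2.2.1).

    For simplicity only the 'valid' region of size (M-S+1, N-T+1)
    is computed.
    """
    M, N = O_prev.shape
    S, T = h.shape
    U = np.empty((M - S + 1, N - T + 1))
    for m in range(M - S + 1):
        for n in range(N - T + 1):
            # Hadamard product of the window and the filter, then the sum
            # of all components, as in Eq. (2.2.2).
            U[m, n] = np.sum(O_prev[m:m + S, n:n + T] * h) + theta
    return U

O_prev = np.arange(25.0).reshape(5, 5)
h = np.ones((3, 3)) / 9.0          # a 3x3 averaging filter
print(conv_forward(O_prev, h, theta=0.0))
```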
In Eq. (2.2.1), $h_{st}^{p-1}$ and $\theta_{mn}^p$ are parameters, which are to be updated by the error back propagation algorithm. The update rule for each parameter is written as follows:

$$h_{st}^{p-1} \leftarrow h_{st}^{p-1} + \alpha_1\,\Delta h_{st}^{p-1} = h_{st}^{p-1} - \alpha_1\frac{\partial E}{\partial h_{st}^{p-1}} \quad (2.2.3)$$

$$\theta_{mn}^p \leftarrow \theta_{mn}^p + \alpha_2\,\Delta\theta_{mn}^p = \theta_{mn}^p - \alpha_2\frac{\partial E}{\partial\theta_{mn}^p} \quad (2.2.4)$$
where α1 and α2 are the learning coefficients. The derivative in Eq. (2.2.3) is
calculated as follows:
$$\frac{\partial E}{\partial h_{st}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial U_{mn}^p}{\partial h_{st}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial\left(\sum_{s'=0}^{S-1}\sum_{t'=0}^{T-1} h_{s't'}^{p-1}\cdot O_{m+s',n+t'}^{p-1} + \theta_{mn}^p\right)}{\partial h_{st}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot O_{m+s,n+t}^{p-1} \quad (2.2.5)$$
$$\frac{\partial E}{\partial\theta_{mn}^p} = \sum_{\bar m=0}^{M-1}\sum_{\bar n=0}^{N-1}\frac{\partial E}{\partial U_{\bar m\bar n}^p}\cdot\frac{\partial U_{\bar m\bar n}^p}{\partial\theta_{mn}^p} = \sum_{\bar m=0}^{M-1}\sum_{\bar n=0}^{N-1}\frac{\partial E}{\partial U_{\bar m\bar n}^p}\cdot\delta_{\bar m m}\,\delta_{\bar n n} = \frac{\partial E}{\partial U_{mn}^p} \quad (2.2.6)$$
If a common bias value is used within the same layer, i.e., $\theta_{mn}^p \equiv \theta^p$, as is often done in practice, then Eq. (2.2.6) turns into
$$\frac{\partial E}{\partial\theta^p} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p} \quad (2.2.8)$$
As in the case of the fully connected feedforward neural network (Sect. 2.1), the derivative $\partial E/\partial U_{mn}^p$ that appears in the parameter update equations is given by the error back propagation calculation, where we need the values $\partial E/\partial O_{mn}^{p-1}$ of each layer:

$$\frac{\partial E}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial U_{mn}^p}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\left(\sum_{s=0}^{S-1}\sum_{t=0}^{T-1} h_{st}^{p-1}\cdot\frac{\partial O_{m+s,n+t}^{p-1}}{\partial O_{\bar m\bar n}^{p-1}}\right) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\left(\sum_{s=0}^{S-1}\sum_{t=0}^{T-1} h_{st}^{p-1}\cdot\delta_{\bar m,m+s}\,\delta_{\bar n,n+t}\right) \quad (2.2.9)$$
For example, when S = T = 3, the derivative $\partial E/\partial O_{7,8}^{p-1}$ of the above equation is the sum of the nine components of the following matrix:

$$\begin{pmatrix} \dfrac{\partial E}{\partial U_{5,6}^p}\cdot h_{2,2}^{p-1} & \dfrac{\partial E}{\partial U_{6,6}^p}\cdot h_{1,2}^{p-1} & \dfrac{\partial E}{\partial U_{7,6}^p}\cdot h_{0,2}^{p-1} \\ \dfrac{\partial E}{\partial U_{5,7}^p}\cdot h_{2,1}^{p-1} & \dfrac{\partial E}{\partial U_{6,7}^p}\cdot h_{1,1}^{p-1} & \dfrac{\partial E}{\partial U_{7,7}^p}\cdot h_{0,1}^{p-1} \\ \dfrac{\partial E}{\partial U_{5,8}^p}\cdot h_{2,0}^{p-1} & \dfrac{\partial E}{\partial U_{6,8}^p}\cdot h_{1,0}^{p-1} & \dfrac{\partial E}{\partial U_{7,8}^p}\cdot h_{0,0}^{p-1} \end{pmatrix} \quad (2.2.10)$$
Equation (2.2.10) shows that the back propagation calculation can be done by
convolution with the inverted filter matrix.
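This property can be checked numerically with a sketch like the following (illustrative; zero-padding is assumed outside the output array), which computes $\partial E/\partial O^{p-1}$ from $\partial E/\partial U^p$ by convolving with the 180°-rotated filter.

```python
import numpy as np

def conv_backward_input(dE_dU, h, in_shape):
    """dE/dO^{p-1} from dE/dU^p via Eq. (2.2.9): a 'full' convolution
    of dE/dU with the filter rotated by 180 degrees (illustrative sketch)."""
    S, T = h.shape
    h_rot = h[::-1, ::-1]                       # inverted filter matrix
    # Zero-pad dE/dU so that every unit of the previous layer is covered.
    padded = np.pad(dE_dU, ((S - 1, S - 1), (T - 1, T - 1)))
    dE_dO = np.zeros(in_shape)
    for m in range(in_shape[0]):
        for n in range(in_shape[1]):
            dE_dO[m, n] = np.sum(padded[m:m + S, n:n + T] * h_rot)
    return dE_dO
```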
In the activation function layer, the activation function is applied, taking the output of the convolutional layer as input. In the case of the ReLU function, which is most commonly used with convolutional layers, the output of the activation layer is calculated as

$$U_{mn}^p = \mathrm{ReLU}\left(O_{mn}^{p-1}\right) \quad (2.2.11)$$
where there are no parameters to be trained in the activation function layer. The derivative for the error back propagation is calculated as follows:

$$\frac{\partial E}{\partial O_{mn}^{p-1}} = \sum_{\bar m=0}^{M-1}\sum_{\bar n=0}^{N-1}\frac{\partial E}{\partial U_{\bar m\bar n}^p}\cdot\frac{\partial U_{\bar m\bar n}^p}{\partial O_{mn}^{p-1}} = \frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial\,\mathrm{ReLU}\left(O_{mn}^{p-1}\right)}{\partial O_{mn}^{p-1}} = \begin{cases}\dfrac{\partial E}{\partial U_{mn}^p} & \left(O_{mn}^{p-1} > 0\right) \\ 0 & \left(O_{mn}^{p-1} \le 0\right)\end{cases} \quad (2.2.12)$$
Employing a pooling layer just after a convolutional layer, the connections in the pooling layer are defined as follows:

$$U_{mn}^p = \left(\frac{1}{S\times T}\sum_{(i,j)\in D_{mn}}\left(O_{ij}^{p-1}\right)^g\right)^{\frac{1}{g}} \quad (2.2.13)$$
where $O_{ij}^{p-1}$ is the output of the (i, j)-th unit arranged in a two-dimensional manner in the (p−1)-th layer, $D_{mn}$ the pooling window of the (m, n)-th unit in the p-th layer, and (i, j) the index of the unit within the pooling window $D_{mn}$ of S×T size. The values of S and T are often set equal to the filter size of the convolutional layer.
Setting g in Eq. (2.2.13) to 1.0 results in the average pooling as follows:

$$U_{mn}^p = \frac{1}{S\times T}\sum_{(i,j)\in D_{mn}} O_{ij}^{p-1} \quad (2.2.14)$$

while letting $g \to \infty$ gives the max pooling as

$$U_{mn}^p = \max_{(i,j)\in D_{mn}}\left\{O_{ij}^{p-1}\right\} \quad (2.2.15)$$
There are no parameters to be tuned in the pooling layer. The computation of the derivative for the error back propagation is performed in the case of the average pooling as

$$\frac{\partial E}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial U_{mn}^p}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\left(\frac{1}{S\times T}\sum_{(i,j)\in D_{mn}}\frac{\partial O_{ij}^{p-1}}{\partial O_{\bar m\bar n}^{p-1}}\right) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\left(\frac{1}{S\times T}\sum_{(i,j)\in D_{mn}}\delta_{\bar m i}\,\delta_{\bar n j}\right) \quad (2.2.16)$$
Similarly, the computation of the derivative for the error back propagation is done in the case of the max pooling as

$$\frac{\partial E}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial U_{mn}^p}{\partial O_{\bar m\bar n}^{p-1}} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\frac{\partial E}{\partial U_{mn}^p}\cdot\frac{\partial}{\partial O_{\bar m\bar n}^{p-1}}\left(\max_{(i,j)\in D_{mn}}\left\{O_{ij}^{p-1}\right\}\right) \quad (2.2.17)$$
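In code, the backward pass of pooling simply routes $\partial E/\partial U_{mn}^p$ back into the window: uniformly divided by S×T for the average pooling of Eq. (2.2.16), and only to the position of the maximum for the max pooling of Eq. (2.2.17). A minimal sketch with non-overlapping windows (an assumption on our part; the book does not fix the stride) follows.

```python
import numpy as np

def avg_pool_forward(O_prev, S, T):
    """Average pooling, Eq. (2.2.14), with non-overlapping S x T windows."""
    M, N = O_prev.shape
    return O_prev.reshape(M // S, S, N // T, T).mean(axis=(1, 3))

def avg_pool_backward(dE_dU, S, T):
    """Eq. (2.2.16): each unit in a window receives dE/dU / (S*T)."""
    return np.kron(dE_dU, np.ones((S, T))) / (S * T)

def max_pool_backward(O_prev, dE_dU, S, T):
    """Eq. (2.2.17): only the maximum of each window receives the gradient."""
    M, N = O_prev.shape
    dE_dO = np.zeros_like(O_prev)
    for m in range(M // S):
        for n in range(N // T):
            win = O_prev[m*S:(m+1)*S, n*T:(n+1)*T]
            i, j = np.unravel_index(np.argmax(win), win.shape)
            dE_dO[m*S + i, n*T + j] = dE_dU[m, n]
    return dE_dO
```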
In a normalization layer, the output of the previous layer is normalized using its local mean and variance as

$$U_{mn}^p = \frac{O_{mn}^{p-1} - \overline{O}_{mn}^{p-1}}{\sqrt{c + \sigma_{mn}^2}} \quad (2.2.18)$$
where

$$\overline{O}_{mn}^{p-1} = \frac{1}{S\times T}\sum_{(i,j)\in D_{mn}} O_{i,j}^{p-1} \quad (2.2.19)$$

$$\sigma_{mn}^2 = \frac{1}{S\times T}\sum_{(i,j)\in D_{mn}}\left(O_{i,j}^{p-1} - \overline{O}_{mn}^{p-1}\right)^2 \quad (2.2.20)$$
Here, $O_{mn}^{p-1}$ is the output of the (m, n)-th unit in the (p−1)-th layer, where the units are arranged in a two-dimensional manner, $D_{mn}$ the normalization window of the (m, n)-th unit in the p-th layer, (i, j) the index of the unit within the normalization window $D_{mn}$ of S×T size, and c a small constant to avoid division by zero.
2.3 Training Acceleration

In Sect. 2.3, we discuss several methods for accelerating the error back propagation learning. The error back propagation algorithm based on the stochastic gradient descent (SGD) method is often very time-consuming, and various attempts have been made to accelerate it.

2.3.1 Momentum Method

The momentum method [10] is one of the standard acceleration methods, as it is relatively easy to implement and is effective in many cases.
Let the connection weight at the t-th update be $w_{ji}^{(t)}$; the amount of update of the connection weight $\Delta w_{ji}^{(t)}$ in the standard back propagation algorithm is written as follows:

$$\Delta w_{ji}^{(t)} = -\frac{\partial E(w,\theta)}{\partial w_{ji}^{(t)}} \quad (2.3.1)$$

$$w_{ji}^{(t+1)} = w_{ji}^{(t)} + \alpha\cdot\Delta w_{ji}^{(t)} \quad (2.3.2)$$

In the momentum method, the amount of update is augmented with the previous amount of update as

$$\Delta^M w_{ji}^{(t)} = -\frac{\partial E(w,\theta)}{\partial w_{ji}^{(t)}} + \gamma\,\Delta^M w_{ji}^{(t-1)} = \Delta w_{ji}^{(t)} + \gamma\,\Delta^M w_{ji}^{(t-1)} \quad (2.3.3)$$

$$w_{ji}^{(t+1)} = w_{ji}^{(t)} + \alpha\cdot\Delta^M w_{ji}^{(t)} \quad (2.3.4)$$
Here, γ is a positive constant. In this method, the update at the current step is corrected with the past updates multiplied by powers of γ as follows:

$$\begin{aligned} w_{ji}^{(t+1)} &= w_{ji}^{(t)} + \alpha\cdot\Delta^M w_{ji}^{(t)} \\ &= w_{ji}^{(t)} + \alpha\cdot\left(\Delta w_{ji}^{(t)} + \gamma\,\Delta^M w_{ji}^{(t-1)}\right) \\ &= w_{ji}^{(t)} + \alpha\cdot\left(\Delta w_{ji}^{(t)} + \gamma\left(\Delta w_{ji}^{(t-1)} + \gamma\,\Delta^M w_{ji}^{(t-2)}\right)\right) \\ &= w_{ji}^{(t)} + \alpha\cdot\left(\Delta w_{ji}^{(t)} + \gamma\left(\Delta w_{ji}^{(t-1)} + \gamma\left(\Delta w_{ji}^{(t-2)} + \gamma\,\Delta^M w_{ji}^{(t-3)}\right)\right)\right) \\ &= w_{ji}^{(t)} + \alpha\cdot\left(\Delta w_{ji}^{(t)} + \gamma\,\Delta w_{ji}^{(t-1)} + \gamma^2\Delta w_{ji}^{(t-2)} + \gamma^3\Delta w_{ji}^{(t-3)} + \cdots\right) \end{aligned} \quad (2.3.5)$$
The momentum method has the effect of accelerating the update by increasing the
amount of update when the current update is in the same direction as the previous
update and suppressing the vibration by decreasing the amount of update when the
direction of update is opposite.
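Expressed as code, the momentum update of Eqs. (2.3.3)-(2.3.4) only requires one extra array per parameter; the following is a minimal illustrative sketch, with grad denoting $\partial E/\partial w$ so that $\Delta w = -\mathrm{grad}$.

```python
import numpy as np

class MomentumSGD:
    """Weight update of Eqs. (2.3.3)-(2.3.4)."""
    def __init__(self, alpha=0.01, gamma=0.9):
        self.alpha, self.gamma, self.dM = alpha, gamma, None

    def step(self, w, grad):
        dw = -grad                              # Eq. (2.3.1)
        if self.dM is None:
            self.dM = np.zeros_like(w)
        self.dM = dw + self.gamma * self.dM     # Eq. (2.3.3)
        return w + self.alpha * self.dM         # Eq. (2.3.4)
```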
2.3.2 AdaGrad and RMSProp

In the AdaGrad method [1], each connection weight has its own learning coefficient $\alpha_{ji}(t)$, and the update is performed as

$$w_{ji}^{(t+1)} = w_{ji}^{(t)} + \alpha_{ji}(t)\cdot\Delta w_{ji}^{(t)} \quad (2.3.7)$$

$$\alpha_{ji}(t) = \frac{\gamma}{\epsilon + \sqrt{S_{ji}(t)}} \quad (2.3.8)$$

$$S_{ji}(t) = \sum_{\tau=1}^{t}\left(\Delta w_{ji}^{(\tau)}\right)^2 \quad (2.3.9)$$

Here, γ is a constant and ε a small constant to avoid division by zero and to improve numerical stability.
In this method, the learning coefficients become smaller for parameters that have
been updated largely or more frequently, while they remain relatively large for other
parameters.
On the other hand, the RMSProp method [11] can be regarded as an improved version of the AdaGrad method. In the RMSProp method, $S_{ji}(t)$ in Eq. (2.3.9) of the AdaGrad method is modified to the recursive form given as

$$\begin{cases} S_{ji}(1) = \rho\left(\Delta w_{ji}^{(1)}\right)^2 & (t = 1) \\ S_{ji}(t) = \rho\left(\Delta w_{ji}^{(t)}\right)^2 + (1-\rho)\,S_{ji}(t-1) & (t \ge 2) \end{cases} \quad (2.3.10)$$

which is expanded as

$$S_{ji}(t) = \rho\left(\left(\Delta w_{ji}^{(t)}\right)^2 + (1-\rho)\left(\Delta w_{ji}^{(t-1)}\right)^2 + (1-\rho)^2\left(\Delta w_{ji}^{(t-2)}\right)^2 + \cdots\right) \quad (2.3.11)$$

It is seen from Eq. (2.3.11) that the AdaGrad method treats the updates of all steps equally, while the RMSProp method emphasizes the recent updates. For many problems, the RMSProp method is known to be more effective than the AdaGrad method.
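The two methods differ only in how the accumulator $S_{ji}(t)$ is maintained, which the following illustrative sketch makes explicit (using the form of the learning coefficient as reconstructed in Eq. (2.3.8); the value of ρ is our choice).

```python
import numpy as np

def adagrad_step(w, grad, S, gamma=0.01, eps=1e-8):
    """Eqs. (2.3.7)-(2.3.9): S accumulates all squared updates."""
    dw = -grad
    S = S + dw ** 2                          # Eq. (2.3.9)
    return w + gamma / (eps + np.sqrt(S)) * dw, S

def rmsprop_step(w, grad, S, gamma=0.01, rho=0.1, eps=1e-8):
    """Eq. (2.3.10): S is an exponential moving average instead."""
    dw = -grad
    S = rho * dw ** 2 + (1.0 - rho) * S      # Eq. (2.3.10)
    return w + gamma / (eps + np.sqrt(S)) * dw, S
```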
2.3.3 Adam
Finally, the Adam method [6], which is probably the most commonly used acceler-
ation method today, is discussed. Here again, a comparison with other methods is
given to make it easier to understand its features.
$$\Delta w_{ji}^{(t)} = -\frac{\partial E(w,\theta)}{\partial w_{ji}^{(t)}} \quad (2.3.12)$$

$$w_{ji}^{(t+1)} = w_{ji}^{(t)} + \alpha_{ji}(t)\cdot\Delta^{Adam} w_{ji}^{(t)} \quad (2.3.13)$$

where the amount of update $\Delta^{Adam} w_{ji}^{(t)}$ is defined with the first moment $m^{(t)}$ of the update as

$$m^{(0)} = 0 \quad (2.3.14)$$

$$m^{(t)} = \beta_1\, m^{(t-1)} + (1-\beta_1)\,\Delta w_{ji}^{(t)} \quad (2.3.15)$$

$$\Delta^{Adam} w_{ji}^{(t)} = \frac{m^{(t)}}{1-\beta_1^t} \quad (2.3.16)$$

which is expanded as

$$\Delta^{Adam} w_{ji}^{(t)} = \frac{1-\beta_1}{1-\beta_1^t}\left(\Delta w_{ji}^{(t)} + \beta_1\,\Delta w_{ji}^{(t-1)} + \beta_1^2\,\Delta w_{ji}^{(t-2)} + \beta_1^3\,\Delta w_{ji}^{(t-3)} + \cdots\right) \quad (2.3.17)$$

This equation is similar to Eq. (2.3.5) of the momentum method (Sect. 2.3.1), suggesting that the Adam method is also an improved version of the momentum method.

On the other hand, the learning coefficient $\alpha_{ji}(t)$ in Eq. (2.3.13) is given by

$$v^{(0)} = 0 \quad (2.3.18)$$

$$v^{(t)} = \beta_2\, v^{(t-1)} + (1-\beta_2)\left(\Delta w_{ji}^{(t)}\right)^2 \quad (2.3.19)$$

$$\hat v^{(t)} = \frac{v^{(t)}}{1-\beta_2^t} \quad (2.3.20)$$

$$\alpha_{ji}(t) = \frac{\gamma}{\varepsilon + \sqrt{\hat v^{(t)}}} \quad (2.3.21)$$
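Putting Eqs. (2.3.12)-(2.3.21) together gives the Adam step. The sketch below is illustrative; the default values β1 = 0.9 and β2 = 0.999 follow the original paper [6].

```python
import numpy as np

class Adam:
    """Weight update of Eqs. (2.3.12)-(2.3.21)."""
    def __init__(self, gamma=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.g, self.b1, self.b2, self.eps = gamma, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, w, grad):
        dw = -grad                                           # Eq. (2.3.12)
        if self.m is None:
            self.m, self.v = np.zeros_like(w), np.zeros_like(w)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * dw       # Eq. (2.3.15)
        self.v = self.b2 * self.v + (1 - self.b2) * dw ** 2  # Eq. (2.3.19)
        m_hat = self.m / (1 - self.b1 ** self.t)             # Eq. (2.3.16)
        v_hat = self.v / (1 - self.b2 ** self.t)             # Eq. (2.3.20)
        alpha = self.g / (self.eps + np.sqrt(v_hat))         # Eq. (2.3.21)
        return w + alpha * m_hat                             # Eq. (2.3.13)
```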
2.4 Regularization
We study here some methods to stabilize learning and prevent overtraining, which
are usually called regularization methods. Section 2.4.1 explains the meaning of
regularization in the context of inverse problems, and Sects. 2.4.2 and 2.4.3 describe
representative regularization methods for the error back propagation algorithm.
2.4.1 Regularization for Inverse Problems

A problem in which the cause is input and its result is obtained as output is called a direct problem, while a problem in which the cause is to be inferred from the result is called an inverse problem [7]. Let us consider a collision between two cars. The
direct problem is to estimate the deformation and damage using relative positions
of cars, directions of travel of them, speed at the time of collision, and so on as
inputs, while the inverse problem is to estimate the positional relationship and speed
at the time of collision from the deformation and damage after the collision of the
cars. Non-destructive evaluation such as defect identification is also a typical inverse
problem. It is known that inverse problems are much more difficult to solve than direct problems.
A typical inverse problem is defined as a problem of estimating the underlying function $y_i = f(x_i)$ from n observed data, $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$. To solve this problem, one usually tries to find the function $f_{opt}$ among various $f()$ that minimizes the sum of the squared error defined as

$$E_S(f) = \frac{1}{2}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2 \quad (2.4.1)$$
It is well known, however, that the search for the function $f_{opt}$ often fails. Figure 2.5 shows such a case, where $f_A$ in Fig. 2.5a reproduces the sample points well, while $f_B$ in Fig. 2.5b has errors at each sample point. As for the sum of squared errors defined in Eq. (2.4.1), it is clear that the error is bigger in $f_B$ than in $f_A$, or

$$E(f_A) < E(f_B) \quad (2.4.2)$$

[Fig. 2.5: a A function $f_A$ that passes through the sample points. b A smoother function $f_B$ with errors at the sample points.]

Nevertheless, the smoother function $f_B$ is often preferable as an estimate of the underlying function. In the regularization method [12], a regularization term $E_R(f)$ is therefore added to the error to be minimized as

$$E_T(f) = E_S(f) + \lambda E_R(f) \quad (2.4.3)$$

where λ is a weighting parameter. As the regularization term,

$$E_R(f) = \frac{1}{2}\left\| D f \right\|^2 \quad (2.4.4)$$

is often used, where D is a differential operator and $\|\cdot\|^2$ the squared norm. As an example, we assume

$$E_R(f) = \frac{1}{2}\left\|\frac{df}{dx}\right\|^2 \quad (2.4.5)$$

Then,

$$E_T(f_A) > E_T(f_B) \quad (2.4.6)$$

may hold for an appropriate value of λ, and $f_B$ can be selected as the $f_{opt}$.
The same may happen in the error back propagation learning in neural networks to minimize the error defined as

$$E = \frac{1}{2}\sum_{j=1}^{n_L}\left({}^pO_j^L - {}^pT_j\right)^2 \quad (2.4.7)$$
When trying to minimize Eq. (2.4.7) with the small number of training patterns, it
is possible to overfit the training patterns, resulting in the reduction of squared error
for training patterns, but the increase of error for verification patterns. This is called
overtraining.
By adding the regularization term $E_R$ to Eq. (2.4.7) as

$$E = \frac{1}{2}\sum_{j=1}^{n_L}\left({}^pO_j^L - {}^pT_j\right)^2 + \lambda E_R \quad (2.4.8)$$
it is shown that overtraining is suppressed and the estimation accuracy for patterns
that are not used for training is improved; in other words, the generalization capability
is improved.
2.4.2 Weight Decay

The regularization method that takes the sum of the squares of all the connection weights as the regularization term is one of the most famous regularization methods for neural networks, where the error function to be minimized is given as follows:

$$E_T = E_S + \lambda E_R = \frac{1}{2}\sum_{j=1}^{n_L}\left({}^pO_j^L - {}^pT_j\right)^2 + \frac{\lambda}{2}\sum_l\sum_j\sum_i\left(w_{ji}^l\right)^2 \quad (2.4.9)$$
where $w_{ji}^l$ is the connection weight between the j-th unit in the (l+1)-th layer and the i-th unit in the l-th layer. The amount of update of $w_{ab}^c$ in the error back propagation learning is written as

$$\Delta^T w_{ab}^c = -\frac{\partial E_T}{\partial w_{ab}^c} = -\frac{\partial E_S}{\partial w_{ab}^c} - \lambda\frac{\partial E_R}{\partial w_{ab}^c} = \Delta^S w_{ab}^c - \lambda\, w_{ab}^c \quad (2.4.10)$$

Here, $\Delta^T w_{ab}^c$ is the amount of update of $w_{ab}^c$ including the regularization term, and $\Delta^S w_{ab}^c$ is that when there is no regularization term. If the learning coefficient is set to α, the updated weight is written as

$$w_{ab}^c \leftarrow w_{ab}^c + \alpha\,\Delta^T w_{ab}^c = (1-\alpha\lambda)\, w_{ab}^c + \alpha\,\Delta^S w_{ab}^c \quad (2.4.11)$$
where the second term of the right-hand side of Eq. (2.4.11) is the same as the
correction when there is no regularization term. The first term of the right-hand side
of Eq. (2.4.11) is the term due to regularization. Since 0 < 1 − αλ < 1 for most
cases, it usually has the effect of reducing the absolute value of the connection weight
in every training epoch. Therefore, the regularization of Eq. (2.4.9) is called Weight
Decay.
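In code, Eq. (2.4.11) amounts to shrinking each weight by the factor (1 − αλ) before adding the usual correction, e.g. (an illustrative one-liner):

```python
def weight_decay_step(w, delta_S, alpha=0.01, lam=1e-4):
    """Eq. (2.4.11): shrink the weights, then apply the unregularized update."""
    return (1.0 - alpha * lam) * w + alpha * delta_S
```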
2.4.3 Physics-Informed Neural Network

The physics-informed neural network [9] can be regarded as employing a kind of regularization based on physical laws. Consider a partial differential equation of the general form

$$\frac{\partial u(x,t)}{\partial t} + N[u] = 0,\quad x\in\Omega,\ t\in[0,T] \quad (2.4.12)$$

where u(x, t) is the unknown, $N[]$ the nonlinear differential operator, and Ω a subset of $\mathbb{R}^D$.
Assuming a data-driven solution of u(x, t) by the neural network, we usually need to minimize the error function $E_D$ as follows:

$$E_D = \frac{1}{n_u}\sum_{i=1}^{n_u}\left| u\left(x_u^i, t_u^i\right) - u^i \right|^2 \quad (2.4.13)$$
where $\left(x_u^i, t_u^i, u^i\right)$ is the training data of u(x, t), including the initial and boundary training data, and $n_u$ the number of training data. Minimization of $E_D$ is a method that has been widely used to obtain an approximate solution using neural networks.
In the physics-informed neural network, a new loss term E P is added to E D ,
where E P is constructed based on the physical laws (partial differential equations or
governing equations) that u(x, t) should satisfy. In case of Eq. (2.4.12), the new loss
function is given as follows:
$$E = E_D + \lambda E_P = E_D + \frac{\lambda}{n_f}\sum_{j=1}^{n_f}\left|\frac{\partial u\left(x_f^j, t_f^j\right)}{\partial t} + N\left[u\left(x_f^j, t_f^j\right)\right]\right|^2 \quad (2.4.14)$$

where $\left(x_f^j, t_f^j\right)$ is the collocation point, $n_f$ the number of collocation points, and λ a weight to balance the two loss terms.
To calculate $E_P$, we need the derivative values of the output $u\left(x_u^i, t_u^i\right)$ of a feedforward neural network with respect to its input data. These derivatives can be obtained by automatic differentiation (see Sect. 1.3.8). SciANN [3], a package for physics-informed neural networks based on a deep learning library with automatic differentiation, is also available.
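With a deep learning library that provides automatic differentiation, $E_P$ of Eq. (2.4.14) is obtained by differentiating the network output with respect to its inputs at the collocation points. The sketch below is illustrative only: it uses PyTorch rather than SciANN, and assumes Burgers' equation, $N[u] = u\,\partial u/\partial x - \nu\,\partial^2 u/\partial x^2$, as a concrete example of the operator $N[]$.

```python
import torch

net = torch.nn.Sequential(          # u(x, t) approximated by a small MLP
    torch.nn.Linear(2, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1))

def physics_loss(x, t, nu=0.01):
    """E_P of Eq. (2.4.14) at collocation points, here for Burgers' equation
    N[u] = u u_x - nu u_xx (an assumed example of the operator N[])."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = net(torch.stack([x, t], dim=1)).squeeze(-1)
    grad = lambda y, z: torch.autograd.grad(
        y, z, grad_outputs=torch.ones_like(y), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    residual = u_t + u * u_x - nu * u_xx     # du/dt + N[u]
    return (residual ** 2).mean()

x_f = torch.rand(100)        # collocation points (x_f^j, t_f^j)
t_f = torch.rand(100)
print(physics_loss(x_f, t_f))
```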
References
1. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic
optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011).
2. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
3. Haghighat, E., Juanes, R.: SciANN: A Keras/TensorFlow wrapper for scientific computations
and physics-informed deep learning using artificial neural networks. Comput. Methods Appl.
Mech. Eng. 373, 113552 (2021), https://doi.org/10.1016/j.cma.2020.113552
4. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall (1999)
5. Karniadakis, G.E., Kevrekidis, I.G., Lu, L., Perdikaris, P., Wang, S., Yang, L.: Physics-informed
machine learning. Nature Rev. Phys. 3, 422–440 (2021). https://doi.org/10.1038/s42254-021-
00314-5
6. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. in the 3rd International
Conference for Learning Representations (ICLR), San Diego, 2015, arXiv:1412.6980
7. Kubo, S.: Inverse problems related to the mechanics and fracture of solids and structures. JSME
Int. J. 31(2), 157–166 (1988)
8. LeCun, Y.: Generalization and network design strategies. Technical Report CRG-TR-89-4,
Department of Computer Science, University of Toronto (1989)
9. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations. J. Comput. Phys. 378, 686–707 (2019)
10. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating
errors. Nature 323, 533–536 (1986)
11. Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running average of its
recent magnitude. COURSERA: Neural networks for machine learning 4(2), 26–31 (2012)
12. Tikhonov, A.N., Arsenin, V.Y.: Solution of Ill-posed Problems. John Wiley & Sons (1977)
Chapter 3
Computational Mechanics with Deep
Learning
Abstract The present chapter overviews recent research trends of deep learning
related to computational mechanics. In Sect. 3.1, we see the growing interest in deep
learning in recent years based on the trend of the number of published papers on
this topic, discussing how deep learning is applied to various fields in computational
mechanics. In Sect. 3.2, we review the research trends from the list of papers on
computational mechanics with deep learning published since 2018.
3.1 Overview
Various papers on feedforward neural networks and deep learning have been reported
in the field of computational mechanics, including material constitutive equations
[2, 3, 7], elemental integration of the finite element method [9], acceleration and
accuracy improvement of the finite element method [6, 11], contact analysis [5, 10],
non-destructive testing [12–14], and structural identification [1, 18]. These studies
in the field of computational mechanics are overviewed in [11, 15, 16].
The book published in 2021 [17] has categorized the above studies as follows:
• Constitutive Models
• Numerical Quadrature
• Identifications of Analysis Parameters
• Solvers and Solution Methods
• Structural Identification
• Structural Optimization.
Here, the category of constitutive models includes modeling of nonlinear and
history-dependent materials, that of numerical quadrature includes optimization of
elemental integration, and that of identification of analysis parameters includes opti-
mization of time increment in dynamic analysis. And the category of solvers and
solution methods includes applications of neural networks to no-reflection bound-
aries and domain decomposition as well as contact search, while the category of
structural identification includes defect identification and structural identification.

[Fig. 3.1: Number of articles published per year whose titles contain "neural network" or "deep learning".]
Note that the total numbers of papers with the title words "neural network" or "deep learning" have increased rapidly since 2017; the growth of "deep learning" is especially remarkable.
As mentioned above, deep learning and neural networks have been applied to
various fields of computational mechanics, all of which attempt to reproduce the
input–output relationship or causal relationship in some process of computational
mechanics on a neural network.
Let us consider non-destructive testing (defect identification). The response at the observation points can be calculated by performing a numerical simulation for a solid with some defect using the finite element method, where the input and output are set as follows:

input (x): location and size of a defect
output (y): response at observation points.

In ordinary numerical simulations, the input (x) is the cause and the output (y) is the result, where the problem of finding the result (y) for the cause (x) is called a direct problem. On the other hand, the problem of finding the cause (x) for the result (y) is called an inverse problem. These relations are, respectively, written as follows:

$$y = f(x) \quad (3.1.1)$$

$$x = f^{-1}(y) \quad (3.1.2)$$
[Fig. 3.2: The three phases of the application of deep learning: the Data Preparation Phase (input–output patterns are collected), the Training Phase (neural networks are trained using the collected patterns), and the Application Phase (trained neural networks are used in target applications).]

(1) Data Preparation Phase: A large number of input and output data pairs are collected, often efficiently through computational mechanics simulations.
(2) Training Phase: Training of the neural network with deep learning is performed
to acquire mapping relations using the input/output data collected above. When
the trained neural network is to be used as a tool for solving a direct problem,
it is suggested to set its input and output data as follows:
Input data for deep learning: $x^p = \left(x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p\right)$
Output data for deep learning: $y^p = \left(y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p\right)$

$$\text{Direct problem } f: \mathbb{R}^n \to \mathbb{R}^m,\ \text{i.e.,}\ y^p = f\left(x^p\right) \quad (3.1.3)$$

On the other hand, when the trained neural network is to be used as a tool for solving an inverse problem, it is suggested to set its input/output data as follows:

Input data for deep learning: $y^p = \left(y_1^p, y_2^p, \ldots, y_{m-1}^p, y_m^p\right)$
Output data for deep learning: $x^p = \left(x_1^p, x_2^p, \ldots, x_{n-1}^p, x_n^p\right)$

$$\text{Inverse problem } f^{-1}: \mathbb{R}^m \to \mathbb{R}^n,\ \text{i.e.,}\ x^p = f^{-1}\left(y^p\right) \quad (3.1.4)$$
(3) Application Phase: By inputting new data to the trained neural network, the
estimated data are output based on the mapping relation constructed in the
above Training Phase. Even when input data are new or independent from those
employed in the Data Preparation Phase, the trained neural network outputs
appropriate data based on its generalization capability.
In the Data Preparation Phase, it is efficient to collect a large amount of input and output data through computational mechanics simulations. Although experimental data may also be used, they are not always suitable, as deep learning requires a large amount of data. In general, insufficient training patterns often degrade the mapping relation, resulting in an inaccurate learned mapping or even failure to learn the mapping relation.
Though both the Data Preparation and the Training Phases need a huge amount
of computation, these are independently performed before the Application Phase.
For example, the computation time for the Training Phase can often be significantly
reduced by using GPUs and other accelerators suitable for deep learning.
It is important to note that more computer time is required for inference as the
number of hidden units and layers is increased to improve the estimation accuracy,
and special arithmetic units for inference are available to solve this issue.
When a neural network already trained for some target problem is applied to another problem, it is often effective to retrain the network with new input and output data. This is because a neural network trained once for a target problem can often be retrained much more quickly for another similar problem, which is called domain adaptation or transfer learning [4].
3.2 Recent Papers on Computational Mechanics with Deep Learning

In this section, we survey the latest papers related to deep learning in the field of computational mechanics to explore the research trends in this area.
Compiled at the end of this chapter is a list of almost 140 papers [19–157]
related to neural networks and deep learning published in five journals (IJNME,
C&S, CMAME, FEAD, and CM: see Sect. 3.1) since 2018.
Table 3.1 summarizes seven papers of generative networks, such as the generative
adversarial networks and the variational autoencoder among the 140 papers above.
The first column of the table shows the year of publication, the second column the
journal title, the third column the neural network structure mainly used (C: convo-
lutional neural network, F: Fully connected feedforward neural network, and R:
Recurrent neural network), and the fourth column the title of the paper.
Table 3.2 summarizes papers on convolutional neural networks. It is interesting
to see that Tables 3.1 and 3.2 show that convolutional neural networks are used in
about 20% of the 140 papers listed above.
Table 3.3 summarizes 15 papers related to physics-informed networks, which
have been increasingly applied to computational mechanics in recent years, mainly
using fully connected feedforward neural networks.
In summary, a variety of new technologies that have emerged in recent years in
the field of deep learning have been rapidly adopted to the field of computational
mechanics, and the scope of deep learning applied to computational mechanics is
expanding further.
References
1. Facchini, L., Betti, M., Biagini, P.: Neural network based modal identification of structural
systems through output-only measurement. Comput. Struct. 138, 183–194 (2014)
2. Furukawa, T., Yagawa, G.: Implicit constitutive modelling for viscoplasticity using neural
networks. Int. J. Numer. Meth. Eng. 43, 195–219 (1998)
3. Ghaboussi, J., Pecknold, D.A., Zhang, M., Haj-Ali, R.: Autoprogressive training of neural
network constitutive models. Int. J. Numer. Meth. Eng. 42, 105–126 (1998)
4. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
5. Hattori, G., Serpa, A.L.: Contact stiffness estimation in ANSYS using simplified models and
artificial neural networks. Finite Elem. Anal. Des. 97, 43–53 (2015)
6. Kim, J.H., Kim, Y.H.: A predictor-corrector method for structural nonlinear analysis. Comput.
Methods Appl. Mech. Eng. 191, 959–974 (2001)
7. Lefik, M., Schrefler, B.A.: Artificial neural network as an incremental non-linear constitutive
model for a finite element code. Comput. Methods Appl. Mech. Eng. 192, 3265–3283 (2003)
28. Bhatnagar, S., Afshar, Y., Pan, S., Duraisamy, K., Kaushik, S.: Prediction of aerodynamic
flow fields using convolutional neural networks. Comput. Mech. 64, 525–545 (2019). https://
doi.org/10.1007/s00466-019-01740-0
29. Bhattacharjee, S., Matouš, K.: A nonlinear data-driven reduced order model for computational
homogenization with physics/pattern-guided sampling. Comput. Methods Appl. Mech. Eng.
359, 112657 (2020). https://doi.org/10.1016/j.cma.2019.112657
30. Chen, G.: Recurrent neural networks (RNNs) learn the constitutive law of viscoelasticity.
Comput. Mech. 67, 1009–1019 (2021). https://doi.org/10.1007/s00466-021-01981-y
31. Chen, G., Li, T., Chen, Q., Ren, S., Wang, C., Li, S.: Application of deep learning neural
network to identify collision load conditions based on permanent plastic deformation of shell
structures. Comput. Mech. 64, 435–449 (2019). https://doi.org/10.1007/s00466-019-01706-2
32. Cheng, M., Fang, F., Pain, C.C., Navon, I.M.: Data-driven modelling of nonlinear spatio-
temporal fluid flows using a deep convolutional generative adversarial network. Comput.
Methods Appl. Mech. Eng. 365, 113000 (2020). https://doi.org/10.1016/j.cma.2020.113000
33. Cheng, M., Fang, F., Pain, C.C., Navon, I.M.: An advanced hybrid deep adversarial autoen-
coder for parameterized nonlinear fluid flow modelling. Comput. Methods Appl. Mech. Eng.
372, 113375 (2020). https://doi.org/10.1016/j.cma.2020.113375
34. Chi, H., Zhang, Y., Tang, T.L.E., Mirabella, L., Dalloro, L., Song, L., Paulino, G.H.: Universal
machine learning for topology optimization. Comput. Methods Appl. Mech. Eng. 375, 112739
(2021). https://doi.org/10.1016/j.cma.2019.112739
35. Chung, E.T., Efendiev, Y., Leung, W.T., Vasilyeva, M.: Nonlocal multicontinua with repre-
sentative volume elements. Bridging separable and non-separable scales. Comput. Methods
Appl. Mech. Eng. 377, 113687 (2021). https://doi.org/10.1016/j.cma.2021.113687
36. Chung, I., Im, S., Cho, M.: A neural network constitutive model for hyperelasticity based on
molecular dynamics simulations. Int. J. Numer. Methods Eng. 122, 5–24 (2021). https://doi.
org/10.1002/nme.6459
37. Dehghani, H., Zilian, A.: Poroelastic model parameter identification using artificial neural
networks: on the effects of heterogeneous porosity and solid matrix Poisson ratio. Comput.
Mech. 66, 625–649 (2020). https://doi.org/10.1007/s00466-020-01868-4
38. Dehghani, H., Zilian, A.: ANN-aided incremental multiscale-remodelling-based finite strain
poroelasticity. Comput. Mech. 68, 131–154 (2021). https://doi.org/10.1007/s00466-021-020
23-3
39. Deng, H., To, A.C.: Topology optimization based on deep representation learning (DRL) for
compliance and stress-constrained design. Comput. Mech. 66, 449–469 (2020). https://doi.
org/10.1007/s00466-020-01859-5
40. Deng, H., To, A.C.: Reverse shape compensation via a gradient-based moving particle opti-
mization method. Comput. Methods Appl. Mech. Eng. 377, 113658 (2021). https://doi.org/
10.1016/j.cma.2020.113658
41. Dong, H., Nie, Y., Cui, J., Kou, W., Zou, M., Han, J., Guan, X., Yang, Z.: A wavelet-based
learning approach assisted multiscale analysis for estimating the effective thermal conduc-
tivities of particulate composites. Comput. Methods Appl. Mech. Eng. 374, 113591 (2021).
https://doi.org/10.1016/j.cma.2020.113591
42. Duan, W., Ma, X., Huang, L., Liu, Y., Duan, S.: Phase-resolved wave prediction model for
long-crest waves based on machine learning. Comput. Methods Appl. Mech. Eng. 372, 113350
(2020). https://doi.org/10.1016/j.cma.2020.113350
43. Feng, J., Teng, Q., Li, B., He, X., Chen, H., Li, Y.: An end-to-end three-dimensional recon-
struction framework of porous media from a single two-dimensional image based on deep
learning. Comput. Methods Appl. Mech. Eng., 368, 113043 (2020). https://doi.org/10.1016/
j.cma.2020.113043
44. Feng, S.Z., Han, X., Ma, Z.J., Królczyk, G., Li, Z.X.: Data-driven algorithm for real-time
fatigue life prediction of structures with stochastic parameters. Comput. Methods Appl. Mech.
Eng. 372, 113373 (2020). https://doi.org/10.1016/j.cma.2020.113373
45. Fernández, M., Jamshidian, M., Böhlke, T., Kersting, K., Weeger, O.: Anisotropic hypere-
lastic constitutive models for finite deformations combining material theory and data-driven
approaches with application to cubic lattice metamaterials. Comput. Mech. 67, 653–677
(2021). https://doi.org/10.1007/s00466-020-01954-7
46. Finol, D., Lu, Y., Mahadevan, V., Srivastava, A.: Deep convolutional neural networks for
eigenvalue problems in mechanics. Int. J. Numer. Methods Eng. 118, 258–275 (2019). https://
doi.org/10.1002/nme.6012
47. Freno, B.A., Carlberg, K.T.: Machine-learning error models for approximate solutions to
parameterized systems of nonlinear equations. Comput. Methods Appl. Mech. Eng., 348,
250-296 (2019). https://doi.org/10.1016/j.cma.2019.01.024
48. Fu, J., Cui, S., Cen, S., Li, C.: Statistical characterization and reconstruction of heterogeneous
microstructures using deep neural network. Comput. Methods Appl. Mech. Eng. 373, 113516
(2021). https://doi.org/10.1016/j.cma.2020.113516
49. Fuchs, A., Heider, Y., Wang, K., Sun, W., Kaliske, M.: DNN2: A hyper-parameter rein-
forcement learning game for self-design of neural network based elasto-plastic constitutive
descriptions. Comput. Struct. 249, 106505 (2021). https://doi.org/10.1016/j.compstruc.2021.
106505
50. Gatti, F., Clouteau, D.: Towards blending Physics-Based numerical simulations and seismic
databases using Generative Adversarial Network. Comput. Methods Appl. Mech. Eng., 372,
113421 (2020). https://doi.org/10.1016/j.cma.2020.113421
51. Ghavamian, F., Simone, A.: Accelerating multiscale finite element simulations of history-
dependent materials using a recurrent neural network. Comput. Methods Appl. Mech. Eng.
357, 112594 (2019). https://doi.org/10.1016/j.cma.2019.112594
52. Haghighat, E., Juanes, R.: SciANN: A Keras/TensorFlow wrapper for scientific computations
and physics-informed deep learning using artificial neural networks. Comput. Methods Appl.
Mech. Eng. 373, 113552 (2021). https://doi.org/10.1016/j.cma.2020.113552
53. Haghighat, E., Raissi, M., Moure, A., Gomez, H., Juanes, R.: A physics-informed deep
learning framework for inversion and surrogate modeling in solid mechanics. Comput.
Methods Appl. Mech. Eng. 379, 113741 (2021). https://doi.org/10.1016/j.cma.2021.113741
54. Hamdia, K.M., Ghasemi, H., Bazi, Y., AlHichri, H., Alajlan, N., Rabczuk, T.: A novel deep
learning based method for the computational material design of flexoelectric nanostructures
with topology optimization. Finite Elem. Anal. Des. 165, 21–30 (2019). https://doi.org/10.
1016/j.finel.2019.07.001
55. Han, S., Choi, H.-S., Choi, J., Choi, J.H., Kim, J.-G.: A DNN-based data-driven modeling
employing coarse sample data for real-time flexible multibody dynamics simulations. Comput.
Methods Appl. Mech. Eng. 373, 113480 (2021). https://doi.org/10.1016/j.cma.2020.113480
56. Han, Z., De, R.S.: A deep learning-based hybrid approach for the solution of multiphysics
problems in electrosurgery. Comput. Methods Appl. Mech. Eng. 357, 112603 (2019). https://
doi.org/10.1016/j.cma.2019.112603
57. Heider, Y., Wang, K., Sun, W.: SO(3)-invariance of informed-graph-based deep neural network
for anisotropic elastoplastic materials. Comput. Methods Appl. Mech. Eng. 363, 112875
(2020). https://doi.org/10.1016/j.cma.2020.112875
58. Hernandez, Q., Badías, A., González, D., Chinesta, F., Cueto, E.: Deep learning of
thermodynamics-aware reduced-order models from data. Comput. Methods Appl. Mech. Eng.
379, 113763 (2021). https://doi.org/10.1016/j.cma.2021.113763
59. Hou, T.Y., Lam, K.C., Zhang, P. Zhang, S.: Solving Bayesian inverse problems from the
perspective of deep generative networks. Comput. Mech. 64, 395–408 (2019). https://doi.org/
10.1007/s00466-019-01739-7
60. Huang, D., Fuhg, J.N., Weißenfels, C., Wriggers, P.: A machine learning based plasticity
model using proper orthogonal decomposition. Comput. Methods Appl. Mech. Eng. 365,
113008 (2020). https://doi.org/10.1016/j.cma.2020.113008
61. Im, S., Kim, H., Kim, W., Cho, M.: Neural network constitutive model for crystal structures.
Comput. Mech. 67, 185–206 (2021). https://doi.org/10.1007/s00466-020-01927-w
62. Jagtap, A.D., Kharazmi, E., Karniadakis, G.E.: Conservative physics-informed neural
networks on discrete domains for conservation laws: Applications to forward and inverse
problems. Comput. Methods Appl. Mech. Eng. 365, 113028 (2020). https://doi.org/10.1016/
j.cma.2020.113028
63. Jokar, M., Semperlotti, F.: Finite element network analysis: A machine learning based compu-
tational framework for the simulation of physical systems. Comput. Struct. 247, 106484
(2021). https://doi.org/10.1016/j.compstruc.2021.106484
64. Jung, J., Yoon, K., Lee, P.-S.: Deep learned finite elements. Comput. Methods Appl. Mech.
Eng. 372, 113401 (2020). https://doi.org/10.1016/j.cma.2020.113401
65. Kalogeris, I., Papadopoulos, V.: Diffusion maps-aided Neural Networks for the solution of
parametrized PDEs. Comput. Methods Appl. Mech. Eng. 376, 113568 (2021). https://doi.org/
10.1016/j.cma.2020.113568
66. Kharazmi, E., Zhang, Z., Karniadakis, G.E.M.: hp-VPINNs: Variational physics-informed
neural networks with domain decomposition. Comput. Methods Appl. Mech. Eng. 374,
113547 (2021). https://doi.org/10.1016/j.cma.2020.113547
67. Kiani, J., Camp, C., Pezeshk, S.: On the application of machine learning techniques to derive
seismic fragility curves. Comput. Struct. 218, 108–122 (2019). https://doi.org/10.1016/j.com
pstruc.2019.03.004
68. Kiani, J., Camp, C., Pezeshk, S., Khoshnevis, N.: Application of pool-based active learning
in reducing the number of required response history analyses. Comput. Struct. 241, 106355
(2020). https://doi.org/10.1016/j.compstruc.2020.106355
69. Kim, D.H., Zohdi, T.I., Singh, R.P.: Modeling, simulation and machine learning for rapid
process control of multiphase flowing foods. Comput. Methods Appl. Mech. Eng. 371, 113286
(2020). https://doi.org/10.1016/j.cma.2020.113286
70. Kissas, G., Yang, Y., Hwuang, E., Witschey, W.R., Detre, J.D., Perdikaris, P.: Machine learning
in cardiovascular flows modeling: Predicting arterial blood pressure from non-invasive 4D
flow MRI data using physics-informed neural networks. Comput. Methods Appl. Mech. Eng.
358, 112623 (2020). https://doi.org/10.1016/j.cma.2019.112623
71. Kneifl, J., Grunert, D., Fehr, J.: A non-intrusive nonlinear model reduction method for struc-
tural dynamical problems based on machine learning. Int. J. Numer. Methods Eng. 122,
4774–4786 (2021). https://doi.org/10.1002/nme.6712
72. Koeppe, A., Bamer, F., Markert, B.: An intelligent nonlinear meta element for elastoplastic
continua: deep learning using a new Time-distributed Residual U-Net architecture. Comput.
Methods Appl. Mech. Eng. 366, 113088 (2020). https://doi.org/10.1016/j.cma.2020.113088
73. Le, V., Caracoglia, L.: A neural network surrogate model for the performance assessment
of a vertical structure subjected to non-stationary, tornadic wind loads. Comput. Struct. 231,
106208 (2020). https://doi.org/10.1016/j.compstruc.2020.106208
74. Lejeune, E., Linder, C.: Interpreting stochastic agent-based models of cell death. Comput.
Methods Appl. Mech. Eng. 360, 112700 (2020). https://doi.org/10.1016/j.cma.2019.112700
75. Li, H., Kafka, O.L., Gao, J., Yu, C., Nie, Y., Zhang, L., Tajdari, M., Tang, S., Guo, X., Li, G.,
Tang, S., Cheng, G., Liu, W.K.: Clustering discretization methods for generation of material
performance databases in machine learning and design optimization. Comput. Mech. 64,
281–305 (2019). https://doi.org/10.1007/s00466-019-01716-0
76. Li, T., Pan, Y., Tong, K., Ventura, C.E., de Silva, C.W.: A multi-scale attention neural network
for sensor location selection and nonlinear structural seismic response prediction. Comput.
Struct. 248, 106507 (2021). https://doi.org/10.1016/j.compstruc.2021.106507
77. Li, X., Liu, Z., Cui, S., Luo, C., Li, C., Zhuang, Z.: Predicting the effective mechanical property
of heterogeneous materials by image based modeling and deep learning. Comput. Methods
Appl. Mech. Eng. 347, 735–753 (2019). https://doi.org/10.1016/j.cma.2019.01.005
78. Li, X., Ning, S., Liu, Z., Yan, Z., Luo, C., Zhuang, Z.: Designing phononic crystal with
anticipated band gap through a deep learning based data-driven method. Comput. Methods
Appl. Mech. Eng. 361, 112737 (2020). https://doi.org/10.1016/j.cma.2019.112737
79. Liu, M., Liang, L., Sun, W.: Estimation of in vivo constitutive parameters of the aortic wall
using a machine learning approach. Comput. Methods Appl. Mech. Eng. 347, 201–217 (2019).
https://doi.org/10.1016/j.cma.2018.12.030
80. Liu, M., Liang, L., Sun, W.: A generic physics-informed neural network-based constitutive
model for soft biological tissues. Comput. Methods Appl. Mech. Eng. 372, 113402 (2020).
https://doi.org/10.1016/j.cma.2020.113402
81. Liu, Z.: Deep material network with cohesive layers: Multi-stage training and interfacial
failure analysis. Comput. Methods Appl. Mech. Eng. 363, 112913 (2020). https://doi.org/10.
1016/j.cma.2020.112913
82. Liu, Z., Wu, C.T., Koishi, M.: A deep material network for multiscale topology learning and
accelerated nonlinear modeling of heterogeneous materials. Comput. Methods Appl. Mech.
Eng. 345, 1138–1168 (2019). https://doi.org/10.1016/j.cma.2018.09.020
83. Liu, Z., Wu, C.T., Koishi, M.: Transfer learning of deep material network for seamless struc-
ture–property predictions. Comput. Mech. 64, 451–465 (2019). https://doi.org/10.1007/s00
466-019-01704-4
84. Logarzo, H.J., Capuano, G., Rimoli, J.J.: Smart constitutive laws: Inelastic homogenization
through machine learning. Comput. Methods Appl. Mech. Eng. 373, 113482 (2021). https://
doi.org/10.1016/j.cma.2020.113482
85. Lu, X., Giovanis, D.G., Yvonnet, J., Papadopoulos, V., Detrez, F., Bai, J.: A data-driven compu-
tational homogenization method based on neural networks for the nonlinear anisotropic elec-
trical response of graphene/polymer nanocomposites. Comput. Mech. 64, 307–321 (2019).
https://doi.org/10.1007/s00466-018-1643-0
86. Lye, K.O., Mishra, S., Ray, D., Chandrashekar, P.: Iterative surrogate model optimization
(ISMO): An active learning algorithm for PDE constrained optimization with deep neural
networks. Comput. Methods Appl. Mech. Eng. 374, 113575 (2021). https://doi.org/10.1016/
j.cma.2020.113575
87. Mack, J., Arcucci, R., Molina-Solana, M., Guo, Y.-K.: Attention-based convolutional autoen-
coders for 3D-Variational data assimilation. Comput. Methods Appl. Mech. Eng. 372, 113291
(2020). https://doi.org/10.1016/j.cma.2020.113291
88. Mao, Z., Jagtap, A.D., Karniadakis, G.E.: Physics-informed neural networks for high-speed
flows. Comput. Methods Appl. Mech. Eng. 360, 112789 (2020). https://doi.org/10.1016/j.
cma.2019.112789
89. Meister, F., Passerini, T., Mihalef, V., Tuysuzoglu, A., Maier, A., Mansi, T.: Deep learning
acceleration of Total Lagrangian Explicit Dynamics for soft tissue mechanics. Comput.
Methods Appl. Mech. Eng. 358, 112628 (2020). https://doi.org/10.1016/j.cma.2019.112628
90. Meng, X., Li, Z., Zhang, D., Karniadakis, G.E.: PPINN: Parareal physics-informed neural
network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 370, 113250 (2020).
https://doi.org/10.1016/j.cma.2020.113250
91. Nguyen, T.N., Lee, S., Nguyen-Xuan, H., Lee, J.: A novel analysis-prediction approach for
geometrically nonlinear problems using group method of data handling. Comput. Methods
Appl. Mech. Eng. 354, 506–526 (2019). https://doi.org/10.1016/j.cma.2019.05.052
92. Nguyen-Thanh, V.M., Nguyen, L.T.K., Rabczuk, T., Zhuang, X.: A surrogate model for
computational homogenization of elastostatics at finite strain using high-dimensional model
representation-based neural network. Int. J. Numer. Methods Eng. 121, 4811–4842 (2020).
https://doi.org/10.1002/nme.6493
93. Oh, S., Jiang, C.-H., Jiang, C., Marcus, P.S.: Finding the optimal shape of the leading-and-
trailing car of a high-speed train using design-by-morphing. Comput. Mech. 62, 23–45 (2018).
https://doi.org/10.1007/s00466-017-1482-4
94. Pan, L., Novák, L., Lehký, D., Novák, D., Cao, M.: Neural network ensemble-based sensi-
tivity analysis in structural engineering: Comparison of selected methods and the influence
of statistical correlation. Comput. Struct. 242, 106376 (2021). https://doi.org/10.1016/j.com
pstruc.2020.106376
95. Papanikolaou, S.: Microstructural inelastic fingerprints and data-rich predictions of plasticity
and damage in solids. Comput. Mech. 66, 141–154 (2020). https://doi.org/10.1007/s00466-
020-01845-x
96. Parish, E.J., Carlberg, K.T.: Time-series machine-learning error models for approximate solu-
tions to parameterized dynamical systems. Comput. Methods Appl. Mech. Eng. 365, 112990
(2020). https://doi.org/10.1016/j.cma.2020.112990
97. Patel, D., Tibrewala, R., Vega, A., Dong, L., Hugenberg, N., Oberai, A.A.: Circumventing the
solution of inverse problems in mechanics through deep learning: Application to elasticity
imaging. Comput. Methods Appl. Mech. Eng. 353, 448–466 (2019). https://doi.org/10.1016/
j.cma.2019.04.045
98. Patel, R.G., Trask, N.A., Wood, M.A., Cyr, E.C.: A physics-informed operator regression
framework for extracting data-driven continuum models. Comput. Methods Appl. Mech.
Eng. 373, 113500 (2021). https://doi.org/10.1016/j.cma.2020.113500
99. Petrolo, M., Carrera, E.: Selection of element-wise shell kinematics using neural networks.
Comput. Struct. 244, 106425 (2021). https://doi.org/10.1016/j.compstruc.2020.106425
100. Phillips, T.R.F., Heaney, C.E., Smith, P.N., Pain, C.C.: An autoencoder-based reduced-order
model for eigenvalue problems with application to neutron diffusion. Int. J. Numer. Methods
Eng. 122, 3780–3811 (2021). https://doi.org/10.1002/nme.6681
101. Pled, F., Desceliers, C., Zhang, T.: A robust solution of a statistical inverse problem in multi-
scale computational mechanics using an artificial neural network. Comput. Methods Appl.
Mech. Eng. 373, 113540 (2021). https://doi.org/10.1016/j.cma.2020.113540
102. Ranade, R., Hill, C., Pathak, J.: DiscretizationNet: A machine-learning based solver for
Navier–Stokes equations using finite volume discretization. Comput. Methods Appl. Mech.
Eng. 378, 113722 (2021). https://doi.org/10.1016/j.cma.2021.113722
103. Regazzoni, F., Dedè, L., Quarteroni, A.: Machine learning of multiscale active force generation
models for the efficient simulation of cardiac electromechanics. Comput. Methods Appl.
Mech. Eng. 370, 113268 (2020). https://doi.org/10.1016/j.cma.2020.113268
104. Ren, K., Chew, Y., Zhang, Y.F., Fuh, J.Y.H., Bi, G.J.: Thermal field prediction for laser
scanning paths in laser aided additive manufacturing by physics-based machine learning.
Comput. Methods Appl. Mech. Eng. 362, 112734 (2020). https://doi.org/10.1016/j.cma.2019.
112734
105. Rizzo, F., Caracoglia, L.: Artificial Neural Network model to predict the flutter velocity of
suspension bridges. Comput. Struct. 233, 106236 (2020). https://doi.org/10.1016/j.compst
ruc.2020.106236
106. Saha, S., Gan, Z., Cheng, L., Gao, J., Kafka, O.L., Xie, X., Li, H., Tajdari, M., Kim, H.A.,
Liu, W.K.: Hierarchical Deep Learning Neural Network (HiDeNN): An artificial intelligence
(AI) framework for computational science and engineering. Comput. Methods Appl. Mech.
Eng. 373, 113452 (2021). https://doi.org/10.1016/j.cma.2020.113452
107. Samaniego, E., Anitescu, C., Goswami, S., Nguyen-Thanh, V.M., Guo, H., Hamdia, K.,
Zhuang, X., Rabczuk, T.: An energy approach to the solution of partial differential equations in
computational mechanics via machine learning: Concepts, implementation and applications.
Comput. Methods Appl. Mech. Eng. 362, 112790 (2020). https://doi.org/10.1016/j.cma.2019.
112790
108. Shahriari, M., Pardo, D., Rivera, J.A., Torres-Verdin, C., Picon, A., Ser, J.D., Ossandon, S.,
Calo, V.M.: Error control and loss functions for the deep learning inversion of borehole
resistivity measurements. Int. J. Numer. Methods Eng. 122, 1629–1657 (2021). https://doi.
org/10.1002/nme.6593
109. Sheikholeslami, M., Gerdroodbary, M.B., Moradi, R., Shafee, A., Li, Z.: Application of Neural
Network for estimation of heat transfer treatment of Al2O3-H2O nanofluid through a channel.
Comput. Methods Appl. Mech. Eng. 344, 1–12 (2019). https://doi.org/10.1016/j.cma.2018.
09.025
110. Shishegaran, A., Varaee, H., Rabczuk, T., Shishegaran, G.: High correlated variables creator
machine: Prediction of the compressive strength of concrete. Comput. Struct. 247, 106479
(2021). https://doi.org/10.1016/j.compstruc.2021.106479
111. Stoffel, M., Gulakala, R., Bamer, F., Markert, B.: Artificial neural networks in structural
dynamics: A new modular radial basis function approach vs. convolutional and feedforward
topologies. Comput. Methods Appl. Mech. Eng. 364, 112989 (2020). https://doi.org/10.1016/
j.cma.2020.112989
112. Sun, L., Gao, H., Pan, S., Wang, J.-H.: Surrogate modeling for fluid flows based on physics-
constrained deep learning without simulation data. Comput. Methods Appl. Mech. Eng. 361,
112732 (2020). https://doi.org/10.1016/j.cma.2019.112732
113. Tajdari, M., Pawar, A., Li, H., Tajdari, F., Maqsood, A., Cleary, E., Saha, S., Zhang, Y.J.,
Sarwark, J.F., Liu, W.K.: Image-based modelling for Adolescent Idiopathic Scoliosis: Mech-
anistic machine learning analysis and prediction. Comput. Methods Appl. Mech. Eng. 374,
113590 (2021). https://doi.org/10.1016/j.cma.2020.113590
114. Tamaddon-Jahromi, H.R., Chakshu, N.K., Sazonov, I., Evans, L.M., Thomas, H., Nithiarasu,
P.: Data-driven inverse modelling through neural network (deep learning) and computational
heat transfer. Comput. Methods Appl. Mech. Eng. 369, 113217 (2020). https://doi.org/10.
1016/j.cma.2020.113217
115. Tang, M., Liu, Y., Durlofsky, L.J.: Deep-learning-based surrogate flow modeling and geolog-
ical parameterization for data assimilation in 3D subsurface flow. Comput. Methods Appl.
Mech. Eng. 376, 113636 (2021). https://doi.org/10.1016/j.cma.2020.113636
116. Teichert, G.H., Garikipati, K.: Machine learning materials physics: Surrogate optimization
and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field
dynamics. Comput. Methods Appl. Mech. Eng. 344, 666–693 (2019). https://doi.org/10.1016/
j.cma.2018.10.025
117. Teichert, G.H., Natarajan, A.R., Van der Ven, A., Garikipati, K.: Machine learning materials
physics: Integrable deep neural networks enable scale bridging by learning free energy func-
tions. Comput. Methods Appl. Mech. Eng. 353, 201–216 (2019). https://doi.org/10.1016/j.
cma.2019.05.019
118. Teichert, G.H., Natarajan, A.R., Van der Ven, A., Garikipati, K.: Scale bridging materials
physics: Active learning workflows and integrable deep neural networks for free energy func-
tion representations in alloys. Comput. Methods Appl. Mech. Eng. 371, 113281 (2020). https://
doi.org/10.1016/j.cma.2020.113281
119. Tian, J., Qi, C., Sun, Y., Yaseen, Z.M.: Surrogate permeability modelling of low-permeable
rocks using convolutional neural networks. Comput. Methods Appl. Mech. Eng. 366, 113103
(2020). https://doi.org/10.1016/j.cma.2020.113103
120. Viana, F.A.C., Nascimento, R.G., Dourado, A., Yucesan, Y.A.: Estimating model inadequacy
in ordinary differential equations with physics-informed neural networks. Comput. Struct.
245, 106458 (2021). https://doi.org/10.1016/j.compstruc.2020.106458
121. Vlassis, N.N., Ma, R., Sun, W.: Geometric deep learning for computational mechanics Part I:
anisotropic hyperelasticity. Comput. Methods Appl. Mech. Eng. 371, 113299 (2020). https://
doi.org/10.1016/j.cma.2020.113299
122. Vlassis, N.N., Sun, W.: Sobolev training of thermodynamic-informed neural networks for
interpretable elasto-plasticity models with level set hardening. Comput. Methods Appl. Mech.
Eng. 377, 113695 (2021). https://doi.org/10.1016/j.cma.2021.113695
123. Wang, C., Xu, L.-Y., Fan, J.-S.: A general deep learning framework for history-dependent
response prediction based on UA-Seq2Seq model. Comput. Methods Appl. Mech. Eng. 372,
113357 (2020). https://doi.org/10.1016/j.cma.2020.113357
124. Wang, K., Sun, W.: A multiscale multi-permeability poroplasticity model linked by recur-
sive homogenizations and deep learning. Comput. Methods Appl. Mech. Eng. 334, 337–380
(2018). https://doi.org/10.1016/j.cma.2018.01.036
125. Wang, K., Sun, W.: Meta-modeling game for deriving theory-consistent, microstructure-based
traction–separation laws via deep reinforcement learning. Comput. Methods Appl. Mech. Eng.
346, 216–241 (2019). https://doi.org/10.1016/j.cma.2018.11.026
126. Wang, K., Sun, W.: An updated Lagrangian LBM–DEM–FEM coupling model for dual-
permeability fissured porous media with embedded discontinuities. Comput. Methods Appl.
Mech. Eng. 344, 276–305 (2019). https://doi.org/10.1016/j.cma.2018.09.034
127. Wang, K., Sun, W., Du, Q.: A cooperative game for automated learning of elasto-plasticity
knowledge graphs and models with AI-guided experimentation. Comput. Mech. 64, 467–499
(2019). https://doi.org/10.1007/s00466-019-01723-1
128. Wang, K., Sun, W., Du, Q.: A non-cooperative meta-modeling game for automated third-party
calibrating, validating and falsifying constitutive laws with parallelized adversarial attacks.
Comput. Methods Appl. Mech. Eng. 373, 113514 (2021). https://doi.org/10.1016/j.cma.2020.
113514
129. Wang, L., Chan, Y.-C., Ahmed, F., Liu, Z., Zhu, P., Chen, W.: Deep generative modeling
for mechanistic-based learning and design of metamaterial systems. Comput. Methods Appl.
Mech. Eng. 372, 113377 (2020). https://doi.org/10.1016/j.cma.2020.113377
130. Wang, L., Chen, Z., Yang, G., Sun, Q., Ge, J.: An interval uncertain optimization method
using back-propagation neural network differentiation. Comput. Methods Appl. Mech. Eng.
366, 113065 (2020). https://doi.org/10.1016/j.cma.2020.113065
131. Wang, L., Liu, Y., Gu, K., Wu, T.: A radial basis function artificial neural network (RBF
ANN) based method for uncertain distributed force reconstruction considering signal noises
and material dispersion. Comput. Methods Appl. Mech. Eng. 364, 112954 (2020). https://doi.
org/10.1016/j.cma.2020.112954
132. Wang, N., Chang, H., Zhang, D.: Efficient uncertainty quantification for dynamic subsurface
flow with surrogate by Theory-guided Neural Network. Comput. Methods Appl. Mech. Eng.
373, 113492 (2021). https://doi.org/10.1016/j.cma.2020.113492
133. Wang, Q., Zhang, G., Sun, C., Wu, N.: High efficient load paths analysis with U* index
generated by deep learning. Comput. Methods Appl. Mech. Eng. 344, 499–511 (2019). https://
doi.org/10.1016/j.cma.2018.10.012
134. Wei, S., Jin, X., Li, H.: General solutions for nonlinear differential equations: a rule-based
self-learning approach using deep reinforcement learning. Comput. Mech. 64, 1361–1374
(2019). https://doi.org/10.1007/s00466-019-01715-1
135. Wessels, H., Weißenfels, C., Wriggers, P.: The neural particle method – An updated Lagrangian
physics informed neural network for computational fluid dynamics. Comput. Methods Appl.
Mech. Eng. 368, 113127 (2020). https://doi.org/10.1016/j.cma.2020.113127
136. White, D.A., Arrighi, W.J., Kudo, J., Watts, S.E.: Multiscale topology optimization using
neural network surrogate models. Comput. Methods Appl. Mech. Eng. 346, 1118–1135
(2019). https://doi.org/10.1016/j.cma.2018.09.007
137. Wu, L., Nguyen, V.D., Kilingar, N.G., Noels, L.: A recurrent neural network-accelerated
multi-scale model for elasto-plastic heterogeneous materials subjected to random cyclic and
non-proportional loading paths. Comput. Methods Appl. Mech. Eng. 369, 113234 (2020).
https://doi.org/10.1016/j.cma.2020.113234
138. Wu, L., Zulueta, K., Major, Z., Arriaga, A., Noels, L.: Bayesian inference of non-linear
multiscale model parameters accelerated by a Deep Neural Network. Comput. Methods Appl.
Mech. Eng. 360, 112693 (2020). https://doi.org/10.1016/j.cma.2019.112693
139. Wu, P., Sun, J., Chang, X., Zhang, W., Arcucci, R., Guo, Y., Pain, C.C.: Data-driven reduced
order model with temporal convolutional neural network. Comput. Methods Appl. Mech.
Eng. 360, 112766 (2020). https://doi.org/10.1016/j.cma.2019.112766
140. Xiao, S., Deierling, P., Attarian, S., El Tuhami, A.: Machine learning in multiscale modeling
of spatially tailored materials with microstructure uncertainties. Comput. Struct. 249, 106511
(2021). https://doi.org/10.1016/j.compstruc.2021.106511
141. Xu, J., Duraisamy, K.: Multi-level convolutional autoencoder networks for parametric predic-
tion of spatio-temporal dynamics. Comput. Methods Appl. Mech. Eng. 372, 113379 (2020).
https://doi.org/10.1016/j.cma.2020.113379
142. Xu, W., Jiao, Y., Fish, J.: An atomistically-informed multiplicative hyper-elasto-plasticity-
damage model for high-pressure induced densification of silica glass. Comput. Mech. 66,
155–187 (2020). https://doi.org/10.1007/s00466-020-01846-w
143. Yamaguchi, T., Okuda, H.: Zooming method for FEA using a neural network. Comput. Struct.
247, 106480 (2021). https://doi.org/10.1016/j.compstruc.2021.106480
144. Yang, H., Guo, X., Tang, S., Liu, W.K.: Derivation of heterogeneous material laws via data-
driven principal component expansions. Comput. Mech. 64, 365–379 (2019). https://doi.org/
10.1007/s00466-019-01728-w
145. Yang, Y., Perdikaris, P.: Conditional deep surrogate models for stochastic, high-dimensional,
and multi-fidelity systems. Comput. Mech. 64, 417–434 (2019). https://doi.org/10.1007/s00
466-019-01718-y
146. Yao, H., Gao, Y., Liu, Y.: FEA-Net: A physics-guided data-driven model for efficient mechan-
ical response prediction. Comput. Methods Appl. Mech. Eng. 363, 112892 (2020). https://
doi.org/10.1016/j.cma.2020.112892
147. Yin, M., Zheng, X., Humphrey, J.D., Karniadakis, G.E.: Non-invasive inference of thrombus
material properties with physics-informed neural networks. Comput. Methods Appl. Mech.
Eng. 375, 113603 (2021). https://doi.org/10.1016/j.cma.2020.113603
148. Zargaran, A., Janoske, U.: Development of an algorithm for reconstruction of droplet history
based on deposition pattern using computational fluid dynamics and convolutional neural
network. Comput. Methods Appl. Mech. Eng. 372, 113442 (2020). https://doi.org/10.1016/j.
cma.2020.113442
149. Zhang, L., Cheng, L., Li, H., Gao, J., Yu, C., Domel, R., Yang, Y., Tang, S., Liu, W.K.:
Hierarchical deep-learning neural networks: finite elements and beyond. Comput. Mech. 67,
207–230 (2021). https://doi.org/10.1007/s00466-020-01928-9
150. Zhang, P., Yin, Z.-Y.: A novel deep learning-based modelling strategy from image of particles
to mechanical properties for granular materials with CNN and BiLSTM. Comput. Methods
Appl. Mech. Eng. 382, 113858 (2021). https://doi.org/10.1016/j.cma.2021.113858
151. Zhang, R., Chen, Z., Chen, S., Zheng, J., Büyüköztürk, O., Sun, H.: Deep long short-term
memory networks for nonlinear structural seismic response prediction. Comput. Struct. 220,
55–68 (2019). https://doi.org/10.1016/j.compstruc.2019.05.006
152. Zhang, R., Liu, Y., Sun, H.: Physics-informed multi-LSTM networks for metamodeling of
nonlinear structures. Comput. Methods Appl. Mech. Eng. 369, 113226 (2020). https://doi.
org/10.1016/j.cma.2020.113226
153. Zhang, T., Li, Y., Li, Y., Sun, S., Gao, X.: A self-adaptive deep learning algorithm for accel-
erating multi-component flash calculation. Comput. Methods Appl. Mech. Eng. 369, 113207
(2020). https://doi.org/10.1016/j.cma.2020.113207
154. Zhang, X., Garikipati, K.: Machine learning materials physics: Multi-resolution neural
networks learn the free energy and nonlinear elastic response of evolving microstructures.
Comput. Methods Appl. Mech. Eng. 372, 113362 (2020). https://doi.org/10.1016/j.cma.2020.
113362
155. Zhang, X., Xie, F., Ji, T., Zhu, Z., Zheng, Y.: Multi-fidelity deep neural network surrogate
model for aerodynamic shape optimization. Comput. Methods Appl. Mech. Eng. 373, 113485
(2021). https://doi.org/10.1016/j.cma.2020.113485
156. Zhang, Y., Wen, Z., Pei, H., Wang, J., Li, Z., Yue, Z.: Equivalent method of evaluating mechan-
ical properties of perforated Ni-based single crystal plates using artificial neural networks.
Comput. Methods Appl. Mech. Eng. 360, 112725 (2020). https://doi.org/10.1016/j.cma.2019.
112725
157. Zhu, Q., Liu, Z., Yan, J.: Machine learning for metal additive manufacturing: predicting
temperature and melt pool fluid dynamics using physics-informed neural networks. Comput.
Mech. 67, 619–635 (2021). https://doi.org/10.1007/s00466-020-01952-9
Part II
Case Study
Chapter 4
Numerical Quadrature with Deep
Learning
Abstract It is well known that the element stiffness matrix of a distorted element
calculated using numerical quadrature has a relatively large error. In this chapter,
a method to improve the efficiency of the element integration without degrading
accuracy is studied by employing deep learning.
4.1 Summary of Numerical Quadrature

Numerical quadrature is often used for the element integration of the finite element method [1, 2], where the integral is approximated by the sum of the products of the values of the integrand and the corresponding weights at several points (coordinates) called integration points. In the Gauss–Legendre quadrature, one of the most popular numerical quadratures in the finite element method, the integral of a function $f(x)$ over the range $[-1, 1]$ is approximated as

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} f(x_i) H_i \qquad (4.1.1)$$
where $x_i$ is the coordinate of the $i$-th integration point, $H_i$ the weight at the integration point $x_i$, and $n$ the number of integration points.
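As a quick illustration of Eq. (4.1.1), the following sketch approximates an integral over $[-1, 1]$. It is an illustrative addition, assuming NumPy is available; numpy.polynomial.legendre.leggauss returns the coordinates $x_i$ and weights $H_i$ discussed below.

```python
# A minimal sketch of Eq. (4.1.1), assuming NumPy.
import numpy as np

def gauss_legendre_integrate(f, n):
    """Approximate the integral of f over [-1, 1] with n points."""
    x, H = np.polynomial.legendre.leggauss(n)  # coordinates x_i, weights H_i
    return np.sum(f(x) * H)

print(gauss_legendre_integrate(np.exp, 4))  # ~2.35040, close to e - 1/e
```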
The coordinates $x_i$ of the integration points are determined from the Legendre polynomial of order $n$, defined as

$$P_n(x) = \frac{1}{2^n\, n!} \frac{d^n}{dx^n}\left(x^2 - 1\right)^n \qquad (4.1.2)$$
For example, the first-, second-, and third-order Legendre polynomials are, respectively, written as

$$P_1(x) = x \qquad (4.1.3)$$
$$P_2(x) = \frac{1}{2}\left(3x^2 - 1\right) \qquad (4.1.4)$$
$$P_3(x) = \frac{1}{2}\left(5x^3 - 3x\right) \qquad (4.1.5)$$
The Legendre polynomial $P_n$ has the special property that the integral over $[-1.0, 1.0]$ of the product of $P_n$ and any polynomial $Q_{n-1}(x)$ of order $(n-1)$ is zero:

$$\int_{-1}^{1} Q_{n-1}(x) P_n(x)\,dx = 0 \qquad (4.1.6)$$
For example, the integral of the product of the cubic Legendre polynomial $P_3(x)$ and an arbitrary quadratic polynomial $ax^2 + bx + c$ is zero, as shown by

$$\int_{-1}^{1}\left(ax^2+bx+c\right)P_3(x)\,dx = \int_{-1}^{1}\left(ax^2+bx+c\right)\frac{1}{2}\left(5x^3-3x\right)dx$$
$$= \int_{0}^{1}\left(5bx^4-3bx^2\right)dx = \left[bx^5-bx^3\right]_0^1 = 0 \qquad (4.1.7)$$
It is also known that the equation

$$P_n(x) = 0 \qquad (4.1.8)$$

has $n$ distinct real solutions $\{x_1, x_2, \cdots, x_{n-1}, x_n\}$ $(x_i < x_{i+1})$ in the range $(-1.0, 1.0)$.
For example, the solutions of Eq. (4.1.8) for the cases $n = 2$ and $n = 3$ are, respectively, written as

$$x_1 = -\sqrt{\frac{1}{3}},\quad x_2 = \sqrt{\frac{1}{3}} \quad (n = 2) \qquad (4.1.9)$$
$$x_1 = -\sqrt{\frac{3}{5}},\quad x_2 = 0,\quad x_3 = \sqrt{\frac{3}{5}} \quad (n = 3) \qquad (4.1.10)$$
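These roots can be checked numerically; the following sketch, an illustrative addition assuming NumPy, recovers Eqs. (4.1.9) and (4.1.10):

```python
# Roots of the Legendre polynomials P_2 and P_3, assuming NumPy.
import numpy as np
from numpy.polynomial import legendre

P2 = legendre.Legendre([0, 0, 1])     # coefficients in the Legendre basis
P3 = legendre.Legendre([0, 0, 0, 1])
print(P2.roots())  # [-0.57735, 0.57735], i.e. -/+sqrt(1/3), Eq. (4.1.9)
print(P3.roots())  # [-0.77460, 0.0, 0.77460], i.e. -/+sqrt(3/5) and 0, Eq. (4.1.10)
```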
The Lagrange polynomial of order $(n-1)$ defined on these $n$ points,

$$L_i^{n-1}(x) = \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j} \qquad (4.1.11)$$

has the following properties:

$$L_i^{n-1}(x_j) = \begin{cases} 1 & (i = j) \\ 0 & (i \ne j) \end{cases} = \delta_{ij} \quad \text{(Kronecker delta)} \qquad (4.1.12)$$

A function $y = f(x)$ with values $y_i = f(x_i)$ can be interpolated as

$$y = f(x) \approx \sum_{i=1}^{n} y_i \cdot L_i^{n-1}(x) \qquad (4.1.13)$$

and this interpolation reproduces the function values at the points $x_j$:

$$y = f(x_j) = \sum_{i=1}^{n} y_i \cdot L_i^{n-1}(x_j) = \sum_{i=1}^{n} y_i \cdot \delta_{ij} = y_j \qquad (4.1.14)$$
Here, we study the formulation of the Gauss–Legendre quadrature using the Legendre
and Lagrange polynomials, both described above.
As discussed above, for the $n$-th order Legendre polynomial $P_n(x)$, $n$ distinct real values $\{x_1, x_2, \cdots, x_{n-1}, x_n\}$ $(x_i < x_{i+1})$ are obtained as the solutions of $P_n(x) = 0$, Eq. (4.1.8). Using these $n$ values, we can construct $n$ Lagrange polynomials $L_i^{n-1}(x)$ of order $(n-1)$, based on Eq. (4.1.11).
For an arbitrary integrand $f(x)$, we let $y_i = f(x_i)$ $(i = 1, \cdots, n)$ be the values of the function at the above $n$ solutions $\{x_1, x_2, \cdots, x_{n-1}, x_n\}$ and define a polynomial $Q_{2n-1}(x)$ of order $(2n-1)$ as follows:

$$Q_{2n-1}(x) = R_{n-1}(x) \cdot P_n(x) + \sum_{i=1}^{n} y_i \cdot L_i^{n-1}(x) \qquad (4.1.15)$$

where $R_{n-1}(x)$ is an arbitrary polynomial of order $(n-1)$, $P_n(x)$ the $n$-th order Legendre polynomial, and $L_i^{n-1}(x)$ the $(n-1)$-th order Lagrange polynomial defined with the $n$ solutions $\{x_1, x_2, \cdots, x_{n-1}, x_n\}$ of $P_n(x) = 0$.
Then, we can see that $Q_{2n-1}(x)$ is equal to $f(x)$ at the solutions $\{x_1, x_2, \cdots, x_{n-1}, x_n\}$ of the $n$-th order Legendre polynomial $P_n$:

$$Q_{2n-1}(x_j) = R_{n-1}(x_j)\cdot P_n(x_j) + \sum_{i=1}^{n} y_i \cdot L_i^{n-1}(x_j) = R_{n-1}(x_j)\cdot 0 + \sum_{i=1}^{n} y_i \cdot \delta_{ij} = y_j = f(x_j) \qquad (4.1.16)$$
This means that the $(2n-1)$-th order polynomial $Q_{2n-1}(x)$ can be regarded as an approximation of $f(x)$, and thus the integral of $f(x)$ can be approximated by the integral of $Q_{2n-1}(x)$ as

$$\int_{-1}^{1} f(x)\,dx \approx \int_{-1}^{1} Q_{2n-1}(x)\,dx = \int_{-1}^{1} R_{n-1}(x) P_n(x)\,dx + \sum_{i=1}^{n} y_i \int_{-1}^{1} L_i^{n-1}(x)\,dx \qquad (4.1.17)$$
Here, the first term of the right-hand side of this equation is zero due to the property
of Eq. (4.1.6), and we obtain
$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} y_i \int_{-1}^{1} L_i^{n-1}(x)\,dx \qquad (4.1.18)$$
The above equation shows that the definite integral on the left-hand side is approximated by the sum on the right-hand side. If the weight $H_i$ is defined as

$$H_i = \int_{-1}^{1} L_i^{n-1}(x)\,dx \qquad (4.1.19)$$
we finally obtain

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} y_i H_i = \sum_{i=1}^{n} f(x_i) H_i \qquad (4.1.20)$$
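The construction of the weights in Eq. (4.1.19) can be verified directly. The sketch below, an illustrative addition assuming NumPy, builds each Lagrange polynomial on the roots of $P_n$, integrates it over $[-1, 1]$, and compares the result with the weights returned by leggauss:

```python
# Weights H_i as integrals of Lagrange polynomials, Eq. (4.1.19).
import numpy as np

n = 3
x, H = np.polynomial.legendre.leggauss(n)  # roots of P_3 and standard weights
for i in range(n):
    y = np.zeros(n)
    y[i] = 1.0                      # L_i is 1 at x_i and 0 at the other roots
    coef = np.polyfit(x, y, n - 1)  # coefficients of L_i^{n-1}
    F = np.polyint(coef)            # antiderivative of L_i^{n-1}
    H_i = np.polyval(F, 1.0) - np.polyval(F, -1.0)
    print(H_i, H[i])                # both give 5/9, 8/9, 5/9 for n = 3
```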
For example, the weights for $n = 2$ and $n = 3$ are, respectively,

$$H_1 = H_2 = 1 \quad (n = 2) \qquad (4.1.21)$$
$$H_1 = \frac{5}{9},\quad H_2 = \frac{8}{9},\quad H_3 = \frac{5}{9} \quad (n = 3) \qquad (4.1.22)$$
As an example, consider the integral of $e^x$ over $[-1, 1]$, whose exact value is

$$\int_{-1}^{1} e^x\,dx = \left[e^x\right]_{-1}^{1} = e - \frac{1}{e} = 2.3504023872876028\cdots \qquad (4.1.23)$$

With the Gauss–Legendre quadrature, it is approximated as

$$\int_{-1}^{1} e^x\,dx \approx \sum_{i=1}^{n} e^{x_i} \cdot H_i = e^{x_1} \cdot H_1 + e^{x_2} \cdot H_2 + \cdots + e^{x_n} \cdot H_n \qquad (4.1.24)$$
where $n$ is the number of integration points.

Fig. 4.1 Accuracy of Gauss–Legendre quadrature (horizontal axis: number of quadrature points, 1–8; vertical axis: error, $10^{-16}$ to $10^{-2}$ on a logarithmic scale)

The error of the approximate value
obtained by the Gauss–Legendre quadrature is shown in Fig. 4.1. The horizontal axis
shows the number of integration points and the vertical axis the absolute value of
the difference between the true and the approximate values obtained by the Gauss–
Legendre quadrature. The calculations are performed using double-precision real
numbers. It is shown that the accuracy is very high when using more than eight
integration points. As can be seen from this example, the accuracy is improved by
increasing the number of integration points. Table 4.1 shows the coordinates and
weights of the integration points up to 10 integration points.
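The convergence behavior of Fig. 4.1 can be reproduced with a few lines of code, an illustrative addition assuming NumPy:

```python
# Error of the n-point Gauss-Legendre quadrature for exp(x) on [-1, 1].
import numpy as np

exact = np.e - 1.0 / np.e  # Eq. (4.1.23)
for n in range(1, 9):
    x, H = np.polynomial.legendre.leggauss(n)
    print(n, abs(np.sum(np.exp(x) * H) - exact))  # error drops rapidly with n
```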
The Gauss–Legendre quadratures in two and three dimensions are defined as natural extensions of the one-dimensional formula, respectively, as follows.

Two-dimensional Gauss–Legendre quadrature:

$$\int_{-1}^{1}\int_{-1}^{1} f(x, y)\,dx\,dy \approx \sum_{i=1}^{n}\sum_{j=1}^{m} f\left(x_i, y_j\right) \cdot H_{ij} \qquad (4.1.25)$$
Three-dimensional Gauss–Legendre quadrature:

$$\int_{-1}^{1}\int_{-1}^{1}\int_{-1}^{1} f(x, y, z)\,dx\,dy\,dz \approx \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l} f\left(x_i, y_j, z_k\right) \cdot H_{ijk} \qquad (4.1.26)$$
However, it is seen that there is room for improvement in some cases. Let us
consider the following integral.
$$\int_{-1}^{1} x^6\,dx = \left[\frac{x^7}{7}\right]_{-1}^{1} = \frac{1}{7} - \frac{-1}{7} = \frac{2}{7} = 0.2857142857\cdots \qquad (4.1.27)$$
Calculating this integral using the Gauss–Legendre quadrature with two integration points, we have a significant error as

$$\int_{-1}^{1} x^6\,dx \approx 1.0 \times \left(-\sqrt{\frac{1}{3}}\right)^6 + 1.0 \times \left(\sqrt{\frac{1}{3}}\right)^6 = \frac{2}{27} = 0.074074074\cdots \qquad (4.1.28)$$
Calculating the integral using the Gauss–Legendre quadrature with three integration points, we still have some error as

$$\int_{-1}^{1} x^6\,dx \approx \frac{5}{9} \times \left(-\sqrt{\frac{3}{5}}\right)^6 + \frac{8}{9} \times 0^6 + \frac{5}{9} \times \left(\sqrt{\frac{3}{5}}\right)^6 = \frac{6}{25} = 0.24 \qquad (4.1.29)$$
Note that the correct value of the integral is recovered if the weights are changed from 1.0 to $\frac{27}{7}$ in Eq. (4.1.28), or if the weights in Eq. (4.1.29) are scaled by the factor $\frac{25}{21}$, the ratio of the exact value $\frac{2}{7}$ to the computed value $\frac{6}{25}$.
Instead, changing the coordinates of the integration points can also reduce the error. For example, by changing the coordinates from $\pm\sqrt{\frac{1}{3}}$ to $\pm\sqrt[6]{\frac{1}{7}}$ in the case of two integration points above, we get the correct value as

$$\int_{-1}^{1} x^6\,dx = 1.0 \times \left(-\sqrt[6]{\frac{1}{7}}\right)^6 + 1.0 \times \left(\sqrt[6]{\frac{1}{7}}\right)^6 = \frac{2}{7} = 0.2857142857\cdots \qquad (4.1.30)$$
The standard coordinates and weights of the Gauss–Legendre quadrature are determined only by the number of integration points, and the same coordinates and weights are used for any integrand. On the other hand, the best coordinates and weights for improving the accuracy of the integral naturally differ depending on the integrand.
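The $x^6$ example above is easy to verify numerically; the sketch below is an illustrative addition, assuming NumPy:

```python
# Standard versus shifted integration points for x**6, Eqs. (4.1.28)-(4.1.30).
import numpy as np

f = lambda x: x**6
x_std, H_std = np.polynomial.legendre.leggauss(2)  # +/-sqrt(1/3), weights 1.0
print(np.sum(f(x_std) * H_std))                    # 0.07407... = 2/27
x_mod = np.array([-(1 / 7) ** (1 / 6), (1 / 7) ** (1 / 6)])
print(np.sum(f(x_mod) * H_std))                    # 0.28571... = 2/7, exact
```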
4.2 Summary of Stiffness Matrix for Finite Element Method
The isoparametric element is popular among the finite element community, in which
the displacements {u} and the coordinates {x} at any point in an element are approx-
imated using the displacements and the coordinates of the nodes and the shape func-
tions Ni (ξ, η, ζ ) [1, 2]. For the three-dimensional case, {u} and {x} in an element
are, respectively, expressed as
$$\{u\} = \begin{Bmatrix} u \\ v \\ w \end{Bmatrix} = \begin{Bmatrix} u(\xi,\eta,\zeta) \\ v(\xi,\eta,\zeta) \\ w(\xi,\eta,\zeta) \end{Bmatrix} = \sum_{i=1}^{n} N_i(\xi,\eta,\zeta) \cdot \begin{Bmatrix} U_i \\ V_i \\ W_i \end{Bmatrix} \qquad (4.2.1)$$

$$\{x\} = \begin{Bmatrix} x \\ y \\ z \end{Bmatrix} = \begin{Bmatrix} x(\xi,\eta,\zeta) \\ y(\xi,\eta,\zeta) \\ z(\xi,\eta,\zeta) \end{Bmatrix} = \sum_{i=1}^{n} N_i(\xi,\eta,\zeta) \cdot \begin{Bmatrix} X_i \\ Y_i \\ Z_i \end{Bmatrix} \qquad (4.2.2)$$
where $n$ is the total number of nodes of the element, and $(U_i, V_i, W_i)^T$ and $(X_i, Y_i, Z_i)^T$ are the displacements and the coordinates of the $i$-th node of the element, respectively.
Next, we define a vector $\{U\}$ of all the nodal displacements in an element and a matrix $[N]$ of shape functions as

$$\{U\} = \left(U_1, V_1, W_1, \cdots, U_n, V_n, W_n\right)^T \qquad (4.2.3)$$

$$[N] = \begin{bmatrix} N_1 & 0 & 0 & & N_n & 0 & 0 \\ 0 & N_1 & 0 & \cdots & 0 & N_n & 0 \\ 0 & 0 & N_1 & & 0 & 0 & N_n \end{bmatrix} \qquad (4.2.4)$$
Then, the displacement $\{u\}$ at a given point in the element is expressed by the nodal displacement vector $\{U\}$ as follows:

$$\{u\} = \begin{Bmatrix} \sum N_i U_i \\ \sum N_i V_i \\ \sum N_i W_i \end{Bmatrix} = \begin{bmatrix} N_1 & 0 & 0 & & N_n & 0 & 0 \\ 0 & N_1 & 0 & \cdots & 0 & N_n & 0 \\ 0 & 0 & N_1 & & 0 & 0 & N_n \end{bmatrix} \begin{Bmatrix} U_1 \\ V_1 \\ W_1 \\ \vdots \\ U_n \\ V_n \\ W_n \end{Bmatrix} = [N]\{U\} \qquad (4.2.5)$$
The strain $\{\varepsilon\}$ and stress $\{\sigma\}$ at a given point in an element are also written using the nodal displacement vector $\{U\}$ as

$$\{\varepsilon\} = \begin{Bmatrix} \varepsilon_x \\ \varepsilon_y \\ \varepsilon_z \\ \gamma_{xy} \\ \gamma_{yz} \\ \gamma_{zx} \end{Bmatrix} = \begin{Bmatrix} \frac{\partial u}{\partial x} \\ \frac{\partial v}{\partial y} \\ \frac{\partial w}{\partial z} \\ \frac{\partial u}{\partial y} + \frac{\partial v}{\partial x} \\ \frac{\partial v}{\partial z} + \frac{\partial w}{\partial y} \\ \frac{\partial u}{\partial z} + \frac{\partial w}{\partial x} \end{Bmatrix} = \begin{bmatrix} \frac{\partial}{\partial x} & 0 & 0 \\ 0 & \frac{\partial}{\partial y} & 0 \\ 0 & 0 & \frac{\partial}{\partial z} \\ \frac{\partial}{\partial y} & \frac{\partial}{\partial x} & 0 \\ 0 & \frac{\partial}{\partial z} & \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z} & 0 & \frac{\partial}{\partial x} \end{bmatrix} \begin{Bmatrix} u \\ v \\ w \end{Bmatrix} = [L]\{u\} = [L][N]\{U\} \qquad (4.2.6)$$
$$\{\sigma\} = \begin{Bmatrix} \sigma_x \\ \sigma_y \\ \sigma_z \\ \tau_{xy} \\ \tau_{yz} \\ \tau_{zx} \end{Bmatrix} = [D] \begin{Bmatrix} \varepsilon_x \\ \varepsilon_y \\ \varepsilon_z \\ \gamma_{xy} \\ \gamma_{yz} \\ \gamma_{zx} \end{Bmatrix} = [D]\{\varepsilon\} = [D][L][N]\{U\} \qquad (4.2.7)$$
where $[D]$ is the stress–strain matrix. The product $[L][N]$ is often denoted as $[B]$, referred to as the strain–displacement matrix. For a three-dimensional isotropic elastic body, $[D]$ is given as
$$[D] = \frac{E(1-\nu)}{(1+\nu)(1-2\nu)} \begin{bmatrix} 1 & \frac{\nu}{1-\nu} & \frac{\nu}{1-\nu} & 0 & 0 & 0 \\ \frac{\nu}{1-\nu} & 1 & \frac{\nu}{1-\nu} & 0 & 0 & 0 \\ \frac{\nu}{1-\nu} & \frac{\nu}{1-\nu} & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} & 0 & 0 \\ 0 & 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1-2\nu}{2(1-\nu)} \end{bmatrix} \qquad (4.2.8)$$

where $E$ is Young's modulus and $\nu$ Poisson's ratio.
The governing equation of the whole structure to be solved is written as

$$[K]\left\{U^G\right\} = \{F\} \qquad (4.2.9)$$

where $\{U^G\}$ is the vector of displacements of all the nodes in the structure, $[K]$ the global stiffness matrix, and $\{F\}$ the load vector. The global stiffness matrix is constructed by assembling all the element stiffness matrices of the whole structure as

$$[K] = \sum_{e=1}^{n_e} \left[k^e\right] \qquad (4.2.10)$$
Since the size of the global stiffness matrix $[K]$ differs from that of the element stiffness matrix $[k^e]$, the summation in Eq. (4.2.10) is performed in such a way that each component of the element stiffness matrix is added to the corresponding position of the global stiffness matrix, rather than as a simple summation.
The element stiffness matrix is given using the stress–strain matrix $[D]$ and the strain–displacement matrix $[B]$ as

$$\left[k^e\right] = \int_{v^e} [B]^T [D] [B]\,dv \qquad (4.2.11)$$

where $v^e$ means that the entire element is taken as the integration domain.
The element integration is performed by transforming the coordinates from the real space ($xyz$ space) to the parameter space ($\xi\eta\zeta$ space) and then using the Gauss–Legendre quadrature. Figure 4.2 shows the coordinate transformation in the two-dimensional case, while that in the three-dimensional case results in the integration over the $[-1,1] \times [-1,1] \times [-1,1]$ region in the $\xi\eta\zeta$ space as follows:

$$\left[k^e\right] = \iiint_{v^e} [B]^T [D] [B]\,dx\,dy\,dz = \int_{-1}^{1}\int_{-1}^{1}\int_{-1}^{1} [B]^T [D] [B] \cdot |J| \cdot d\xi\,d\eta\,d\zeta \qquad (4.2.12)$$

where $|J|$ is the determinant of the Jacobian matrix of the coordinate transformation.
Fig. 4.2 Coordinate transformation for numerical quadrature in the two-dimensional space (the quadrilateral in the $xy$ space is mapped to the $[-1,1] \times [-1,1]$ region in the $\xi\eta$ space)
Thus, the element stiffness matrix calculated using the Gauss–Legendre quadrature is written as

$$\left[k^e\right] \approx \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l} \left([B]^T [D] [B] \cdot |J|\right)\Big|_{\xi=\xi_i,\ \eta=\eta_j,\ \zeta=\zeta_k} \cdot H_{i,j,k} \qquad (4.2.15)$$
For the eight-noded hexahedral element, for example, the shape functions are written as

$$N_1(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1-\eta)(1-\zeta) \qquad (4.2.17)$$
$$N_2(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1-\eta)(1-\zeta) \qquad (4.2.18)$$
$$N_3(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1+\eta)(1-\zeta) \qquad (4.2.19)$$
$$N_4(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1+\eta)(1-\zeta) \qquad (4.2.20)$$
$$N_5(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1-\eta)(1+\zeta) \qquad (4.2.21)$$
$$N_6(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1-\eta)(1+\zeta) \qquad (4.2.22)$$
$$N_7(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1+\eta)(1+\zeta) \qquad (4.2.23)$$
$$N_8(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1+\eta)(1+\zeta) \qquad (4.2.24)$$
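A compact way to evaluate Eqs. (4.2.17)–(4.2.24) is to encode the sign pattern of each node; the following sketch is an illustrative addition, assuming NumPy and the node ordering implied by the equations:

```python
# Trilinear shape functions N_1..N_8 of Eqs. (4.2.17)-(4.2.24).
import numpy as np

SIGNS = np.array([(-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),
                  (-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)])

def shape_functions(xi, eta, zeta):
    """Values of N_1..N_8 at a point (xi, eta, zeta) in [-1, 1]^3."""
    return np.array([(1 + s[0] * xi) * (1 + s[1] * eta) * (1 + s[2] * zeta) / 8.0
                     for s in SIGNS])

N = shape_functions(0.0, 0.0, 0.0)
print(N, N.sum())  # each 1/8 at the element centre; they sum to 1 everywhere
```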
4.3 Accuracy Dependency of Stiffness Matrix on Numerical Quadrature

When an element stiffness matrix (Sect. 4.2) of the finite element method is calculated using numerical quadrature (Sect. 4.1), its accuracy usually depends on the shape of the element. In this section, we discuss how to quantitatively evaluate the error and give some clues to improving the accuracy.

Using the Gauss–Legendre quadrature, the element stiffness matrix $[k]$ in the finite element method is represented by the sum of the contributions of the integration points as

$$[k] = \int_{v^e} [B]^T [D] [B]\,dv \approx \sum_{g=1}^{N_G} \left[F\left(\xi_g, \eta_g, \zeta_g\right)\right] w_g \qquad (4.3.1)$$
where $[B]$ is the strain–displacement matrix, $[D]$ the stress–strain matrix, $v^e$ the element domain, $N_G$ the total number of integration points, $(\xi_g, \eta_g, \zeta_g)$ the coordinates of the $g$-th integration point, $[F(\xi_g, \eta_g, \zeta_g)]$ the element stiffness matrix calculated only with the contribution of the $g$-th integration point, and $w_g$ the weight of the $g$-th integration point. Note that, in the standard Gauss–Legendre quadrature, the coordinates and weights of the integration points are independent of the integrand: standard values, which depend only on the number of integration points, are used in common for any integrand.
Since the Gauss–Legendre quadrature uses a polynomial approximation of the integrand, some integration error is inevitable. As is clear from Eq. (4.3.1), the computational complexity is proportional to the number of integration points, so a moderate number of points is used in practice. It is also known that the shape of the element has a great influence on the accuracy of the element integration; a square-shaped element in two dimensions and a cube-shaped element in three dimensions can be integrated with very high accuracy even with a small number of integration points, whereas the accuracy decreases rapidly as the distortion of the element grows.
Consider an eight-noded hexahedral element of cubic shape with $(0,0,0)$–$(1,1,1)$ as the diagonal, as shown in Fig. 4.4, with the basis functions given in Eqs. (4.2.17)–(4.2.24). Let us study the accuracy of the element integration when distortion is introduced into the shape of the element by changing the positions of nodes other than node $P_0$ in the figure. In Fig. 4.5, element A has a cubic shape, element B some degree of distortion, and element C a further degree of distortion. The coordinates of the nodes of elements B and C are shown in Table 4.2. The results are shown in Fig. 4.6, where the horizontal axis is the number of integration points per axis and the vertical axis the error index (Error) of the numerical quadrature of the element stiffness matrix, defined by

$$\text{Error} = \frac{\sum_{ij}\left|\left[k^g\right]_{ij} - \left[k^{exact}\right]_{ij}\right|}{\max_{ij}\left|\left[k^{exact}\right]_{ij}\right|} \qquad (4.3.2)$$
where $[k^g]$ is the element stiffness matrix calculated with $g$ integration points, $[k^{exact}]$ the exact element stiffness matrix, and $[\ ]_{ij}$ denotes the component at the $i$-th row and $j$-th column of the matrix. Since it is difficult to obtain the exact element stiffness matrix, the matrix calculated with 30 integration points per axis, i.e., 27,000 integration points per element, is regarded as the exact one.
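The error index of Eq. (4.3.2) amounts to a few lines of code; the sketch below is an illustrative addition, assuming NumPy, with k_g and k_ref standing for the two stiffness matrices being compared:

```python
# Error index of Eq. (4.3.2), assuming NumPy arrays of equal shape.
import numpy as np

def error_index(k_g, k_ref):
    """Sum of absolute component differences between the matrix computed
    with g integration points (k_g) and the reference matrix (k_ref),
    scaled by the largest component of the reference matrix."""
    return np.sum(np.abs(k_g - k_ref)) / np.max(np.abs(k_ref))
```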
[Fig. 4.4: reference eight-noded hexahedral element of cubic shape, with node $P_0$ at (0,0,0) and nodes $P_1$ and $P_2$ along the adjacent edges]
It can be seen from the figure that the almost converged element stiffness matrix of
the element A with perfect cubic shape is obtained with only two integration points
per axis, while the convergence speed slows down as the shape distortion grows, and
the number of integration points required to reach a prescribed accuracy increases.
[Fig. 4.6: Error of Eq. (4.3.2) ($10^{-13}$ to $10^{-5}$, logarithmic scale) versus number of quadrature points per axis (0–30) for elements A, B, and C]
For comparing different quadrature parameter sets on the same element, a second error index is defined as the squared norm

$$L = \left\|\left[k^{N_G}\right] - \left[k^{exact}\right]\right\| \qquad (4.3.3)$$

where $N_G$ is the total number of integration points per element, $[k^{N_G}]$ the element stiffness matrix calculated with $N_G$ integration points, $[k^{exact}]$ the true element stiffness matrix, which is usually substituted by that calculated with a large number of integration points (e.g., 30 points per axis, or 27,000 points per element), and $\|\cdot\|$ denotes the sum of the squares of the matrix components. Equation (4.3.2) is defined as the ratio of the difference to the maximum component of the matrix so that the index is comparable between matrices of elements with different shapes, while Eq. (4.3.3) is a simple norm intended for comparison between matrices of the same element with different quadrature parameters.
Let the standard coordinates and weight of the $g$-th integration point in the Gauss–Legendre quadrature be denoted as $(\xi_g^0, \eta_g^0, \zeta_g^0)$ and $w_g^0$, respectively; then $(\xi_g, \eta_g, \zeta_g)$ and $w_g$ can be expressed as

$$\xi_g = \xi_g^0 + \Delta\xi_g \quad (g = 1, \cdots, N_G) \qquad (4.3.4)$$
$$\eta_g = \eta_g^0 + \Delta\eta_g \quad (g = 1, \cdots, N_G) \qquad (4.3.5)$$
$$\zeta_g = \zeta_g^0 + \Delta\zeta_g \quad (g = 1, \cdots, N_G) \qquad (4.3.6)$$
$$w_g = w_g^0\left(1 + \Delta w_g\right) \quad (g = 1, \cdots, N_G) \qquad (4.3.7)$$
Using the notations above, the sum of squared errors in Eq. (4.3.3) becomes a function of the parameter changes:

$$L(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w}) = \left\|\left[k^{N_G}(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})\right] - \left[k^{exact}\right]\right\| \qquad (4.3.8)$$

where, for example, $\Delta\boldsymbol{\xi} = \left(\Delta\xi_1, \Delta\xi_2, \cdots, \Delta\xi_{N_G}\right)$.
Defining the fitness as the ratio of the $L$ value with the standard quadrature parameters ($= L(\mathbf{0}, \mathbf{0}, \mathbf{0}, \mathbf{0})$) to that with the modified parameters, i.e.,

$$\text{Fitness}(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w}) = \frac{L(\mathbf{0}, \mathbf{0}, \mathbf{0}, \mathbf{0})}{L(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})} \qquad (4.3.9)$$

we obtain the optimal quadrature parameters as the $(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})$ that maximize the fitness.
Here, the effect of (Δξ , Δη, Δζ , Δw) on the fitness is studied for the case where
the element stiffness matrix of an eight-noded hexahedral element is calculated using
the numerical quadrature with eight integration points (two points in each axis).
Using the element of cubic shape with $(0, 0, 0)$–$(1, 1, 1)$ as the diagonal as a reference (see Fig. 4.4), we generate distorted elements by changing the position of the nodal point $P_6$.
Figure 4.7 shows the change in the fitness when the coordinate $\xi$ of each integration point is shifted for the element with the coordinates of $P_6$ being (1, 1, 2). Here, the vertical axis is the fitness; when it exceeds 1.0, the element stiffness matrix calculated with the modified quadrature parameters is closer to the true matrix than that calculated with the standard quadrature parameters. The horizontal axis is the amount of change in the coordinate value of the integration point. For each of the eight integration points, the change in fitness when the coordinate $\xi$ of the integration point is moved is shown. For example, 001 in the figure denotes the integration point whose standard coordinates are $\left(-1/\sqrt{3}, -1/\sqrt{3}, 1/\sqrt{3}\right)$, and 010 the one whose standard coordinates are $\left(-1/\sqrt{3}, 1/\sqrt{3}, -1/\sqrt{3}\right)$.
Results for integration points whose standard $\zeta$-coordinate is $1/\sqrt{3}$ are shown with markers. Note that some lines overlap due to symmetry. Figures 4.8 and 4.9 show the change in the fitness when the $\eta$ and $\zeta$ coordinates of each integration point are moved, respectively. The latter figure shows that the fitness is improved by moving the integration points with $\zeta$-coordinate $1/\sqrt{3}$ in the plus (+) direction and the integration points with $\zeta$-coordinate $-1/\sqrt{3}$ in the minus (−) direction.
It is also seen that the degree of improvement in the fitness is different for each
integration point. Figure 4.10 shows the change in the fitness when the weights of
each integration point are changed for the same element, depicting that the fitness is
improved by increasing the weight at any integration point.
Figure 4.11 shows the change in the fitness when the coordinate ξ of each inte-
gration point is shifted with the position of P6 being (2,1,1), while Figs. 4.12 and
4.13 show those when the coordinates η and ζ of each integration point are shifted,
respectively. The change in the fitness when the weight of an integration point is
changed is also shown in Fig. 4.14.
These results show that a fitness greater than 1.0 can be obtained by changing any of the quadrature parameters, that a more accurate numerical quadrature of an element stiffness matrix can be achieved by optimizing the quadrature parameters for each element, that the change in the fitness near the optimum is gradual, and that changing the coordinates of the integration points results in a better fitness than changing their weights.

[Fig. 4.7: fitness (0.0–1.0 and above) versus $\Delta\xi$ (−0.02 to 0.02) for each of the eight integration points, labeled 000–111; element with $P_6$ at (1,1,2)]
[Fig. 4.8: fitness versus $\Delta\eta$ for each of the eight integration points]
[Fig. 4.9: fitness versus $\Delta\zeta$ for each of the eight integration points]
[Fig. 4.10: fitness versus $\Delta w$ for each of the eight integration points]
[Fig. 4.11: fitness versus $\Delta\xi$ for each of the eight integration points; element with $P_6$ at (2,1,1)]
[Fig. 4.12: fitness versus $\Delta\eta$ for each of the eight integration points]
[Fig. 4.13: fitness versus $\Delta\zeta$ for each of the eight integration points]
[Fig. 4.14: fitness versus $\Delta w$ for each of the eight integration points]
4.4 Search for Optimal Quadrature Parameters

In the previous section, it is shown that the accuracy of the numerical quadrature can be improved by changing the quadrature parameters and that the degree of improvement is quantitatively evaluated by the fitness defined in Eq. (4.3.9). In this section, defining the optimal quadrature parameters as the $(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})$ that maximize the fitness for a given number of integration points, we discuss a method for obtaining the optimal quadrature parameters for each element.
The quadrature parameters are classified into two categories: the coordinates of
the integration points Δξ , Δη, and Δζ , and the weights of the integration points Δw.
With regard to the increase in the computational load when performing the numerical quadrature of an element stiffness matrix with optimal quadrature parameters, changing the weights of the integration points $\Delta\boldsymbol{w}$ requires only a few modifications in the program for the element stiffness matrix and causes little increase in the computational load, whereas changing the coordinates of the integration points $\Delta\boldsymbol{\xi}$, $\Delta\boldsymbol{\eta}$, and $\Delta\boldsymbol{\zeta}$ requires additional changes in the parts of the program related to the basis functions, which often increases the computational load significantly. In addition, the number of coordinate values $\Delta\boldsymbol{\xi}$, $\Delta\boldsymbol{\eta}$, and $\Delta\boldsymbol{\zeta}$ to be tuned is three times as large as the number of weights $\Delta\boldsymbol{w}$. On the other hand, as we have seen in Sect. 4.3, optimization of the coordinates of the integration points may provide a higher degree of improvement in the accuracy of the element integration than optimization of the weights alone.
In any case, an efficient method for searching for the optimal parameters is required. In [17], the weights $\Delta\boldsymbol{w}$ are optimized by a random search. Alternatively, the set of optimal parameters $(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})$, maximizing the fitness defined in Eq. (4.3.9) or, equivalently, minimizing the error defined in Eq. (4.3.8), can be obtained efficiently by various evolutionary computation algorithms, which have the additional advantage that it is easy to impose constraints on the individual parameters to be tuned while using Eq. (4.3.9) as the target function to be maximized.
Here, we study an efficient search method for the optimal quadrature parameters using evolutionary algorithms, which are optimization algorithms inspired by the evolution and behavior of living things: the genetic algorithm (GA) [18, 19], which imitates biological evolution; the artificial bee colony algorithm (ABC) [20], which imitates the foraging behavior of honeybees; particle swarm optimization (PSO) [21], which mimics the swarming behavior of birds and fish; the firefly algorithm (FA) [22], which mimics the courtship behavior of fireflies; and the bat algorithm (BA) [23], which mimics the echolocation behavior of bats. These are often called swarm intelligence [24]. Evolutionary computation algorithms have been
applied to a variety of engineering problems [25–27].
In this section, PSO is employed among others to search for the optimal quadrature
parameters.
When the number of parameters to be optimized is $N_p$, the $i$-th individual $\boldsymbol{x}_i^n$ (or particle) and its velocity $\boldsymbol{v}_i^n$ at the $n$-th generation (or $n$-th iteration) in PSO are represented by one-dimensional arrays, respectively, as follows:
$$\boldsymbol{x}_i^n = \left(x_{i,1}^n, x_{i,2}^n, x_{i,3}^n, \cdots, x_{i,N_p-2}^n, x_{i,N_p-1}^n, x_{i,N_p}^n\right) \qquad (4.4.1)$$
$$\boldsymbol{v}_i^n = \left(v_{i,1}^n, v_{i,2}^n, v_{i,3}^n, \cdots, v_{i,N_p-2}^n, v_{i,N_p-1}^n, v_{i,N_p}^n\right) \qquad (4.4.2)$$
where $x_{i,j}^n$ is the $j$-th component of the coordinates indicating the position in $N_p$-dimensional space of the $i$-th individual of the $n$-th generation, and $v_{i,j}^n$ is its velocity. The coordinates of each individual correspond to a set of quadrature parameters $(\Delta\boldsymbol{\xi}, \Delta\boldsymbol{\eta}, \Delta\boldsymbol{\zeta}, \Delta\boldsymbol{w})$, which allows the fitness value of each individual to be calculated from Eq. (4.3.9).
The update equations for $\boldsymbol{v}_i^n$ and $\boldsymbol{x}_i^n$ are, respectively, written as follows:

$$\boldsymbol{v}_i^n = \alpha \boldsymbol{v}_i^{n-1} + \beta \left(\boldsymbol{g}^n - \boldsymbol{x}_i^n\right) \times rnd + \gamma \left(\boldsymbol{p}_i^n - \boldsymbol{x}_i^n\right) \times rnd \qquad (4.4.3)$$
$$\boldsymbol{x}_i^{n+1} = \boldsymbol{x}_i^n + \boldsymbol{v}_i^n \qquad (4.4.4)$$
where $\boldsymbol{g}^n$ is the best individual in the population at the $n$-th generation; $\boldsymbol{p}_i^n$ the best position of the $i$-th individual up to the $n$-th generation; $\alpha$, $\beta$, and $\gamma$ constants; and $rnd$ a random number in the range [0.0, 1.0].
The flowchart of PSO is shown in Fig. 4.15. After the initial population is gener-
ated, the best individual of each generation and the best of each individual up to the
current generation are determined, and each individual is repeatedly updated along
the directions toward these best individuals. PSO uses the direction toward the best individuals as a search gradient, and once a good individual is found, it is expected to converge well in real-valued searches such as the present one.
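A compact sketch of PSO following Eqs. (4.4.3) and (4.4.4) is given below. It is an illustrative addition, assuming NumPy; the values of $\alpha$, $\beta$, $\gamma$ and the search bound are illustrative assumptions, not the book's settings.

```python
# Minimal PSO maximizing a fitness function, Eqs. (4.4.3)-(4.4.4).
import numpy as np

def pso(fitness, n_dim, n_particles=100, n_steps=1000,
        alpha=0.7, beta=1.5, gamma=1.5, bound=0.1):
    rng = np.random.default_rng(0)
    x = rng.uniform(-bound, bound, (n_particles, n_dim))  # positions x_i^n
    v = np.zeros_like(x)                                  # velocities v_i^n
    p = x.copy()                                          # personal bests p_i^n
    p_fit = np.array([fitness(xi) for xi in x])
    g = p[p_fit.argmax()].copy()                          # global best g^n
    for _ in range(n_steps):
        r1 = rng.random((n_particles, 1))                 # rnd in Eq. (4.4.3)
        r2 = rng.random((n_particles, 1))
        v = alpha * v + beta * (g - x) * r1 + gamma * (p - x) * r2
        x = np.clip(x + v, -bound, bound)                 # Eq. (4.4.4), bounded
        f = np.array([fitness(xi) for xi in x])
        better = f > p_fit
        p[better], p_fit[better] = x[better], f[better]
        g = p[p_fit.argmax()].copy()
    return g, p_fit.max()
```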
Here, PSO is used to search for the optimal quadrature parameters for an eight-
noded hexahedral element. The number of integration points is set to eight (two
for each axis). Using the element of cubic shape shown in Fig. 4.4, 100 hexahe-
dral elements are generated by randomly shifting all nodal coordinates in the range
of [−0.2, 0.2] from the reference position. And, for these elements generated, the
optimal quadrature parameters, which maximize Eq. (4.3.9), are searched by PSO,
where the number of individuals is set to 1,000 and that of generations (steps) 10,000,
the maximum change in the coordinates of integration points within ±0.1, and the
change in the weights at integration points within ±10% of the standard weights.
The quadrature parameters to be searched are the 24 coordinate values and 8 weights of the eight integration points; the length of the array representing an individual is therefore set to 32.

[Fig. 4.15: flowchart of PSO: generate initial particles, evaluate, check the stop criterion, update the particles, and repeat until the criterion is satisfied]
The structure of the array of quadrature parameters for each individual is given as

$$\boldsymbol{x}_i^n = \left((\Delta\xi_1)_i^n, (\Delta\eta_1)_i^n, (\Delta\zeta_1)_i^n, \cdots, (\Delta\xi_8)_i^n, (\Delta\eta_8)_i^n, (\Delta\zeta_8)_i^n, (\Delta w_1)_i^n, \cdots, (\Delta w_8)_i^n\right) \qquad (4.4.5)$$
[Fig. 4.16: best, average, and worst fitness values (0.0–4.0) over the generated elements]
[Fig. 4.17: fitness obtained by optimizing both coordinates and weights (C&W), 0–15]
[Fig. 4.18: fitness obtained by optimizing the coordinates (C) only, 0–10]
From both figures above, it can be inferred that an element for which a high fitness value is obtained by one of the three optimization methods (i.e., an element whose fitness value can be greatly improved) will also attain a high fitness value by either of the other methods, indicating a strong dependency on the element shape.
The distribution of the best fitness obtained when all parameters are optimized, i.e., the best in Fig. 4.16, is shown in Fig. 4.19, indicating that the fitness varies significantly element by element. The shape of the element with the largest fitness is
shown in Fig. 4.20a and that with the smallest fitness in Fig. 4.20b.
It is concluded that PSO is effective in finding the optimal quadrature parameters
for each element, with which a more accurate element stiffness matrix is obtained
than with the standard Gauss–Legendre quadrature parameters.
[Fig. 4.19: histogram of the best fitness values; number of elements (0–15) versus fitness (1–15)]
4.5 Search for Optimal Number of Quadrature Points

In Sect. 4.4, we discussed how to improve the accuracy of the element integration by optimizing the quadrature parameters with the number of integration points fixed. On the other hand, the accuracy of the element integration can also be improved by increasing the number of integration points. In the present section, we study the optimal number of integration points needed to achieve a predetermined accuracy and the effect of the element shape on this optimal number.

As shown in Fig. 4.6, the convergence rate of the element integration depends on the shape of the element. In the case of an eight-noded hexahedral element of cubic shape, an accurate element stiffness matrix is obtained even with two integration
points per axis (i.e., eight integration points in total), whereas for an element of irregular shape, a large number of integration points is required to obtain an accurate element stiffness matrix. Note that good accuracy here means that the difference between the element stiffness matrix under consideration and that obtained with a very large number of integration points, i.e., the error of Eq. (4.3.2), is small.
Here, the optimal number of integration points for an element is defined as
the minimum number of integration points per axis for which the error defined in
Eq. (4.3.2) is less than a predefined value (threshold). In the case of Fig. 4.6 with the threshold set to $10^{-7}$, the optimal number of integration points for element A is 2, that for element B is 5, and that for element C is 8, where the same number of integration points per axis is assumed for all the axes.
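In code, this search is a simple loop. The sketch below is an illustrative addition, assuming NumPy; stiffness(element, n) is a hypothetical helper, not from the book, that integrates the element stiffness matrix with n points per axis:

```python
# Minimal search for the optimal number of integration points per axis.
import numpy as np

def optimal_points(element, stiffness, threshold=1e-7, n_ref=30, n_max=15):
    k_ref = stiffness(element, n_ref)  # reference ("exact") stiffness matrix
    for n in range(2, n_max + 1):
        k_n = stiffness(element, n)
        err = np.sum(np.abs(k_n - k_ref)) / np.max(np.abs(k_ref))  # Eq. (4.3.2)
        if err < threshold:
            return n
    return n_max  # no n below the threshold within the search range
```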
In large-scale finite element analysis, the number of elements is huge and their shapes vary. Suppose the same number of integration points is used for every element in the domain; then the accuracy of the calculated element stiffness matrices may vary element by element, as discussed above. Therefore, it is reasonable to perform the numerical quadrature of the element stiffness matrix using the optimal number of integration points for each element.
Let the eight-noded hexahedral element of cubic shape (Fig. 4.4) be a reference. We generate elements of various shapes from this reference by translating all the nodes within a range of ±0.2 along each axis, except that the nodes $P_0$ and $P_1$ are fixed at (0,0,0) and (1,0,0), respectively, and the z-coordinate of node $P_3$ is fixed to 0 [17]. Using this method, a total of 100,000 elements are generated, the optimal number of integration points for each element is calculated for three different threshold values, and the numbers of elements classified by the optimal number of integration points are tabulated (Table 4.3). For example, when the threshold is set to $10^{-7}$, 58,467 out of 100,000 elements are found to have an optimal number of integration points of 5 per axis.
Based on the discussions above, two methods for optimizing the element integration using deep learning are discussed here: Sect. 4.6.1 describes the estimation of the optimal quadrature parameters and Sect. 4.6.2 that of the optimal number of integration points.
In this section, we study how to develop rules that derive the optimal number of integration points from the element shape parameters, where deep learning is employed to construct the rules behind the correspondence.

The method for obtaining the optimal number of integration points in the element integration by deep learning is summarized in the following three phases, where the parameters describing the element shape (nodal coordinates, etc.) are denoted as {e-parameters}, and the numbers of integration points used in the element integration along the three axes by $n$, $m$, and $l$, respectively. Here, it is assumed that $n = m = l$.
(1) Data Preparation Phase: Setting a threshold, calculate the optimal number of integration points n_opt for a large number of elements with various shapes (see Sect. 4.5). This yields a large number of data pairs ({e-parameters}, n_opt).
(2) Training Phase: Deep learning is performed on the data pairs obtained above, where input and teacher data are, respectively, set as follows:
Input data: {e-parameters}
Teacher data: n_opt
Once the training is done, the input and output of the trained neural network are, respectively, given as follows:
Input: {e-parameters}
Output: estimated n_opt
(3) Application Phase: For a new element, its {e-parameters} are input to the trained neural network, which outputs an estimate of the optimal number of integration points n_opt.
First, training patterns are generated by random sampling from the data created in
Sect. 4.5. Among the eight nodes in an element, the node P0 is fixed to (0,0,0), the
node P1 to (1,0,0), and the z-coordinate of the node P3 to 0. Thus, 17 coordinate
values of the remaining nodes are used as {e-parameters} to define the element shape. The threshold value of Error [Eq. (4.3.2)] in the element integration is set to 10⁻⁷. As can be seen from Table 4.3, the optimal number of integration points is distributed from 4 to 11 over the 100,000 elements. It is noted that the number of elements whose optimal number of integration points is greater than 8 is less than 50, which is very small.
Here, we define the problem to classify elements into the following five categories.
Category 1: n_opt = 4 (542 elements belong to this category.)
Category 2: n_opt = 5 (58,467 elements belong to this category.)
Category 3: n_opt = 6 (37,852 elements belong to this category.)
Category 4: n_opt = 7 (2,917 elements belong to this category.)
Category 5: n_opt ≥ 8 (222 elements belong to this category.)
In this section, we construct a feedforward neural network that estimates the optimal number of integration points n_opt from the element shape. The input and output data (teacher data) of the neural network are set as follows:
Input data: 17 nodal coordinates of the 8 nodes in an element
Teacher data: n_opt, the optimal number of integration points for the element.
Regarding the structure of the feedforward neural network used here, the numbers
of units in the input and output layers are automatically determined by the number
of input data and that of teacher data, respectively. In the present case, the number of units in the input layer is 17, and that in the output layer is set to 5, as one-hot encoding is used for the teacher data. In other words, there are as many outputs as categories (5 in this case), and only the unit corresponding to the correct category outputs 1, while the other units output 0.
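As a small illustration of this encoding, and of the decoding used later when the estimated category is judged from the outputs, hypothetical helper functions in C might look as follows (the names are ours, not from the book):

/* Build a one-hot teacher vector: t[j] = 1 for the correct category. */
void one_hot(int category, int ncat, double *t)
{
    for (int j = 0; j < ncat; j++)
        t[j] = (j == category) ? 1.0 : 0.0;
}

/* Decode: the estimated category is the unit with the maximum output. */
int argmax(const double *o, int ncat)
{
    int best = 0;
    for (int j = 1; j < ncat; j++)
        if (o[j] > o[best]) best = j;
    return best;
}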
On the other hand, the number of hidden layers and that of units in each hidden
layer are often determined by trial and error from various combinations. Here, the structure of the feedforward neural network is chosen from the following candidate combinations:
The numbers of hidden layers: 1, 2, 3, 4, 5, and 6
The numbers of units per hidden layer: 20, 40, 60, and 80
Note, in the case of two or more hidden layers, the numbers of units in each hidden
layer are assumed to be the same.
50,000 training patterns are randomly selected from the 100,000 patterns collected
in the Data Preparation Phase; then, out of them, five different numbers of training
patterns (5000, 10,000, 20,000, 30,000, and 50,000) are employed to check the effect
of the number of training patterns.
In addition, 10,000 patterns are randomly selected from the remaining 50,000
patterns to be used as patterns for verifying the generalization ability of the neural
network after training.
As described above, we choose the best training condition from 120 combinations: 6 numbers of hidden layers × 4 numbers of units per hidden layer × 5 numbers of training patterns. All other conditions such as the learning coefficients are common, and the number of training epochs is set to 10,000.
For comparison of the neural networks trained with various conditions, the average estimation error over the patterns for verification of the generalization ability is employed, which is defined as

Error_NN = (1/N_V) Σ_{p=1}^{N_V} Σ_{j=1}^{5} |O_j^p − T_j^p|   (4.7.1)

where O_j^p is the output of the j-th unit of the output layer for the p-th input pattern, T_j^p the corresponding teacher data, and N_V the number of patterns for verification of the generalization capability (10,000 in this case).
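The measure of Eq. (4.7.1) is straightforward to compute once the outputs and one-hot teacher vectors of the verification patterns are available; a minimal C sketch, with an array layout assumed purely for illustration:

#include <math.h>

/* Average estimation error of Eq. (4.7.1): mean over NV verification
   patterns of the L1 distance between the 5 outputs O[p][j] and the
   one-hot teacher values T[p][j]. */
double error_nn(int NV, const double O[][5], const double T[][5])
{
    double sum = 0.0;
    for (int p = 0; p < NV; p++)
        for (int j = 0; j < 5; j++)
            sum += fabs(O[p][j] - T[p][j]);
    return sum / NV;
}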
It is known that the results of neural network training are affected by the initial values of the connection weights. Therefore, in order to reduce the influence of the initial values of the connection weights, five training sessions with different initial values are performed under the same training condition, and the best result among them is taken to represent that training condition.
Figures 4.21, 4.22, 4.23 and 4.24 show the training results, where the horizontal axis is the number of hidden layers, and the vertical axis is the average estimation error Error_NN defined in Eq. (4.7.1). Figure 4.21 shows the results with 20 units per hidden layer, where the label TP05U20 depicts the results with 5000 training patterns and TP50U20 those with 50,000 training patterns. Similarly, Figs. 4.22, 4.23, and 4.24 show the results when the number of units per hidden layer is 40, 60, and 80, respectively.
All the results show that the accuracy improves as the number of training data
is increased. In most cases, in addition, the accuracy is improved by increasing the
number of hidden layers and units per hidden layer.
Based on the results of Sect. 4.7.2, we consider here the estimation of the optimal number of integration points using a neural network trained with 50,000 training patterns; among the networks trained with 50,000 patterns, the one with 5 hidden layers and 40 units per hidden layer is employed. Some neural networks, including that with 6 hidden layers and 80 units per hidden layer, show smaller average estimation errors than the selected one, but the difference is small, so the selection has been made in consideration of reducing the amount of computation during estimation.
Fig. 4.21 Average estimation error Error_NN vs. number of hidden layers (20 units per hidden layer; TP05U20–TP50U20)
Fig. 4.22 Average estimation error Error_NN vs. number of hidden layers (40 units per hidden layer; TP05U40–TP50U40)
To study the generalization capability of the employed neural network, Table 4.4 shows the results of the estimation of the optimal number of integration points for the verification patterns (10,000 patterns). Since one-hot encoding is used for the output, the category corresponding to the output unit that outputs the maximum value for each input pattern is judged as the estimated category (the optimal number of integration points). The table shows that the percentage of correct classification is more than 91%, and that most of the misclassifications fall into adjacent categories.

Fig. 4.23 Average estimation error Error_NN vs. number of hidden layers (60 units per hidden layer; TP05U60–TP50U60)
Fig. 4.24 Average estimation error Error_NN vs. number of hidden layers (80 units per hidden layer; TP05U80–TP50U80)
Thus, the optimal number of integration points, which has been shown to be estimable from the element geometry, can be used to reduce the computational load of the element integration process.
The same element data and the teacher data as in Sect. 4.7 are employed
in the Data Preparation Phase. As the input data for the neural network,
seven shape features: AlgebraicShapeMetric, MaxEdgeLength, MinEdgeLength,
MaxEdgeAngle, MinEdgeAngle, MaxFaceAngle, and MinFaceAngle are employed
here. Computer codes for calculating these features are described in Sect. 9.1.2, and
each of them is detailed as follows:
AlgebraicShapeMetric: A quantity defined based on the condition number of a
hexahedral element [28], which takes values in the range [0.0,1.0] and is 1.0 for a
perfect cubic shape.
MaxEdgeLength and MinEdgeLength: The maximum and minimum edge lengths
of a hexahedral element, respectively. Note that, obviously, MaxEdgeLength ≥ 1.0
and MinEdgeLength ≤ 1.0 hold for the elements to be considered here.
MaxEdgeAngle and MinEdgeAngle: The maximum and minimum angles
between the three edges starting from each vertex of a hexahedral element,
respectively.
MaxFaceAngle and MinFaceAngle: The maximum and minimum angles between
the two faces that share an edge of a hexahedral element, respectively.
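Computer codes for all seven features are given in Sect. 9.1.2. As an independent illustration only, a sketch of the two edge-length features might look as follows; the edge table assumes a common hexahedral node ordering and is not taken from the book:

#include <math.h>

/* The 12 edges of an eight-noded hexahedron (assumed node ordering). */
static const int edge[12][2] = {
    {0,1},{1,2},{2,3},{3,0},{4,5},{5,6},{6,7},{7,4},
    {0,4},{1,5},{2,6},{3,7}
};

/* Compute MaxEdgeLength and MinEdgeLength from nodal coordinates. */
void edge_lengths(const double x[8][3], double *max_len, double *min_len)
{
    *max_len = 0.0;
    *min_len = 1.0e30;
    for (int e = 0; e < 12; e++) {
        double d = 0.0;
        for (int k = 0; k < 3; k++) {
            double t = x[edge[e][1]][k] - x[edge[e][0]][k];
            d += t * t;
        }
        d = sqrt(d);
        if (d > *max_len) *max_len = d;
        if (d < *min_len) *min_len = d;
    }
}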
The optimal number of integration points n_opt is defined as the minimum number of integration points per axis with which Error in Eq. (4.3.2) is less than the threshold; i.e., n_opt is defined as the number of integration points satisfying the following equation:

Error(n_opt − 1) > threshold ≥ Error(n_opt)   (4.8.1)

where Error(k) is the error when integrating an element with k integration points per axis.
The difference between the error and the threshold when integrating with n_opt integration points depends on the element. In order to evaluate the relationship between the shape features and the convergence of the elemental integration, we introduce n_r^opt as a more detailed indicator of the convergence of the elemental integration, defined as the (non-integer, hypothetical real-valued) number of integration points at which Error in Eq. (4.3.2) is exactly equal to the threshold, as given by the following equations:

Error(n_r^opt) = threshold   (4.8.2)

Error(n_opt − 1) ≥ Error(n_r^opt) = threshold ≥ Error(n_opt)   (4.8.3)
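The text leaves open how the real-valued n_r^opt is actually evaluated. One plausible sketch, assuming log-linear interpolation of the error between n_opt − 1 and n_opt (an assumption of ours, motivated by the roughly exponential decay of the error with n), is:

#include <math.h>

/* Real-valued n_r^opt of Eq. (4.8.2), interpolated between n_opt - 1
   and n_opt so that log10(Error) equals log10(threshold).
   err_prev = Error(n_opt - 1) > threshold; err_opt = Error(n_opt). */
double n_r_opt(int n_opt, double err_prev, double err_opt, double threshold)
{
    double t = (log10(err_prev) - log10(threshold))
             / (log10(err_prev) - log10(err_opt));
    return (n_opt - 1) + t;   /* fractional step toward n_opt */
}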
The relationship between each shape feature and the optimal number of integration points (real value) n_r^opt is shown in Figs. 4.25, 4.26, 4.27, 4.28, 4.29, 4.30 and 4.31. Figure 4.25 shows the relationship between the shape feature AlgebraicShapeMetric and the optimal number of integration points (real value) n_r^opt, where the horizontal axis is the AlgebraicShapeMetric value and the vertical axis is n_r^opt, displayed for 5000 randomly selected elements. It is clear from the figure that the smaller the AlgebraicShapeMetric value, the larger n_r^opt tends to be.

Figures 4.26 and 4.27 show the relationship between MinEdgeLength and n_r^opt, and that between MaxEdgeLength and n_r^opt, respectively. It can be seen from the figures that the degree of correlation is relatively small.

Figures 4.28 and 4.29 show the relationship between MinEdgeAngle and n_r^opt, and that between MaxEdgeAngle and n_r^opt, respectively, while Figs. 4.30 and 4.31 depict that between MinFaceAngle and n_r^opt, and that between MaxFaceAngle and n_r^opt, respectively. It can be seen from the figures that the degree of correlation is strong.
As described above, all of the shape features are correlated with the convergence of the elemental integration; it therefore seems reasonable to construct a neural network that estimates the optimal number of integration points n_opt using these shape features as input.
Fig. 4.25 Relation between AlgebraicShapeMetric and optimal number of quadrature points
Fig. 4.26 Relation between MinEdgeLength and optimal number of quadrature points
Here, a feedforward neural network that estimates the optimal number of integration points n_opt from element shape parameters is constructed. The input data and output data (teacher data) of the neural network are, respectively, set as follows:
Input data: the seven shape features of an element
Teacher data: n_opt, the optimal number of integration points for the element.
Fig. 4.27 Relation between MaxEdgeLength and optimal number of quadrature points
Fig. 4.28 Relation between MinEdgeAngle and optimal number of quadrature points
Fig. 4.29 Relation between MaxEdgeAngle and optimal number of quadrature points
Fig. 4.30 Relation between MinFaceAngle and optimal number of quadrature points
Fig. 4.31 Relation between MaxFaceAngle and optimal number of quadrature points
We show here the results of the estimation of the optimal number of integration points for the generalization capability verification patterns (10,000 patterns) in Table 4.5. Since one-hot encoding is used for the output, the category corresponding to the output unit that outputs the maximum value for each input pattern is determined to be the predicted category (the optimal number of integration points). It can be seen from Table 4.5 that the rate of correct classification is more than 74%, and that most of the misclassifications fall into adjacent categories.
As shown above, it is also possible to estimate the optimal number of integration points for an element from the shape features of the element, but the accuracy is inferior to the estimation using the nodal coordinate values. However, it may be possible to achieve higher accuracy by a better selection of the set of shape features or by using the nodal coordinate values in combination with the shape features.
References
13. Nagy, A. P., Benson, D. J.: On the numerical integration of trimmed isogeometric elements.
Comput. Methods Appl. Mech. Eng. 284, 165–185 (2015)
14. Rajendran, S.: A technique to develop mesh-distortion immune finite elements. Comput.
Methods Appl. Mech. Eng. 199, 1044–1063 (2010)
15. Schillinger, D., Hossain, S. J., Hughes, T. J. R.: Reduced Bezier element quadrature rules for
quadratic and cubic splines in isogeometric analysis. Comput. Methods Appl. Mech. Eng. 277,
1–45 (2014)
16. Sevilla, R., Fernandez-Mendez, S.: Numerical integration over 2D NURBS-shaped domains
with applications to NURBS-enhanced FEM. Finite Elem. Anal. Des. 47, 1209–1220 (2011)
17. Oishi, A., Yagawa, G.: Computational mechanics enhanced by deep learning. Comput. Methods
Appl. Mech. Eng. 327, 327–351 (2017)
18. Goldberg, D. E.: Genetic Algorithms in Search, Optimization & Machine Learning. Addison-
Wesley (1989)
19. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-
Verlag (1992)
20. Karaboga, D., Basturk, B.: A powerful and efficient algorithm for numerical function
optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459–471 (2007)
21. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: Proceedings of IEEE International
Conference on Neural Networks, IV, pp. 1942–1948 (1995)
22. Yang, X. S.: Nature-Inspired Metaheuristic Algorithms. Luniver Press, Frome, UK (2008)
23. Yang, X. S.: A new metaheuristic bat-inspired algorithm. Studies in Computational Intelligence
284, 65–74, Springer (2010)
24. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural to Artificial
Systems. Oxford University Press (1999)
25. Botello, S., Marroquin, J. L., Oñate, E., Van Horebeek, J.: Solving structural optimization
problems with genetic algorithms and simulated annealing. Int. J. Numer. Methods Eng. 45(5),
1069–1084 (1999)
26. Parpinelli, R. S., Teodoro, F. R., Lopes, H. S.: A comparison of swarm intelligence algorithms
for structural engineering optimization. Int. J. Numer. Methods Eng. 91, 666–684 (2012)
27. Vieira, I. N., Pires de Lima, B. S. L., Jacob, B. P.: Bio-inspired algorithms for the optimization
of offshore oil production systems. Int. J. Numer. Methods Eng. 91, 1023–1044 (2012)
28. Knupp, P. M.: A method for hexahedral mesh shape optimization. Int. J. Numer. Methods Eng.
58, 319–332 (2003)
Chapter 5
Improvement of Finite Element Solutions
with Deep Learning
Abstract The accuracy of the FEM solution is known to be improved when dividing
the analysis domain into smaller elements, while the computation time increases
explosively. In this chapter, we discuss a method to improve the accuracy of the
FEM solution with a small number of elements using error information and deep
learning.
In the finite element method (FEM), the static problem of a structure is reduced to the following simultaneous linear equations:

[K]{U^G} = {F}   (5.1.1)

where {U^G} is the displacement vector of all the nodes in the domain, [K] the global stiffness matrix, and {F} the load vector. The size of {U^G} is the product of the total number of nodes and the number of degrees of freedom per node. The global stiffness matrix [K] is constructed by assembling all the element stiffness matrices in the analysis domain as shown in

[K] = Σ_{e=1}^{n_e} [k^e]   (5.1.2)

where [k^e] is the element stiffness matrix of the e-th element and n_e the total number of elements in the domain.
The displacements of each node are obtained by solving Eq. (5.1.1), from which
the displacements, strains, and stresses at an arbitrary location in the analysis domain
are calculated. If the shape function of an element is fixed, the accuracy of the
calculated physical quantities (such as displacements, strains, and stresses) depends
on the element size, meaning that a lot of small elements should be used to obtain a
high accuracy.
A simple example of the domain of a two-dimensional stress analysis is shown
in Fig. 5.1, where the bottom surface is fixed and the load is applied to the half of
the top surface, the elements are four-noded linear elements, and the stress anal-
ysis is performed using seven different element divisions with different numbers of
elements: 16 (4 × 4), 64 (8 × 8), 256 (16 × 16), 1024 (32 × 32), 4096 (64 × 64),
16,384 (128 × 128), and 65,536 (256 × 256). Figure 5.2 shows a typical example of
step-by-step element division, where all the elements are divided equally into four at
each step and the material is assumed to be an isotropic elastic one. Figure 5.3 shows
where the stress values are evaluated. Figures 5.4 and 5.5 depict the stress values at
the point A and the point B in Fig. 5.3, respectively, where the horizontal axes are the
total numbers of elements in the meshes and the vertical axes the calculated stress
values. It can be seen from the figures that as the numbers of elements increase or
the element sizes decrease, there exists a tendency to converge to a certain value.
In other words, the accuracy of analysis results can be improved by dividing the
analysis domain into as many elements as possible.
There have been some theoretical studies on the accuracy of the finite element method [4]. For example, for a one-dimensional problem spanning the interval [a, b], the accuracy of the finite element solution is evaluated as follows [17]:

‖u − u_h‖_m = √( ∫_a^b Σ_{i=0}^{m} (dⁱu/dxⁱ − dⁱu_h/dxⁱ)² dx ) ≤ c h^{k+1−m}   (5.1.3)

where u is the exact solution, h the element size, u_h the finite element solution, c a constant, k the degree of the basis function (polynomial), and 2m the order of the differential equation to be solved. Note that, in the case of ordinary linear stress analysis, m = 1. It can be seen from the above equation that the smaller the element size and the higher the order of the basis functions, the closer the finite element solution is to the exact one.
As discussed in the previous section, the accuracy of the FEM solution is improved by reducing the element size. In this section, the increase in the amount of computation that accompanies the reduction of the element size is studied.
As the element size is reduced, the total number of elements as well as the compu-
tation time increases. From the viewpoint of computational load, the main processes
of the finite element method consist of
• Construction process of the global stiffness matrix [K ].
• Solving process of the set of linear equations with [K ] as the coefficient matrix
(Eq. (5.1.1)).
As for the former, the computational load required is proportional to the total number of elements in the domain. Suppose that a two-dimensional rectangular region is divided evenly into quadrilateral elements as shown in Fig. 5.2. If the length of one side of an element is halved, the total number of elements, and with it the computational load required to construct the global stiffness matrix, is quadrupled. If a hexahedron in the three-dimensional case is divided evenly into smaller hexahedral elements, halving the length of one side of an element increases the total number of elements by a factor of eight, and the computational load also increases by a factor of eight.
On the other hand, the computation time required to solve a set of linear equations
with [K ] as the coefficient matrix (Eq. (5.1.1)) also increases as the number of
elements does. Of the two classes of methods for solving a set of linear equations, the direct method and the iterative method, the former reduces the number of unknowns sequentially to obtain the solution, while the latter successively updates an arbitrary initial solution vector so that it approaches the correct solution vector [9].
Let us consider the amount of computation required to solve simultaneous linear
equations as follows:
Fig. 5.2 Step-by-step element division of the analysis domain (16, 64, 256, … elements)
Fig. 5.3 Locations of the stress evaluation points A and B (B at (1.95, 1.95))
Fig. 5.4 Stress values at point A vs. number of elements
Fig. 5.5 Stress values at point B vs. number of elements
The Gaussian elimination method is known to be the most basic direct method for solving a set of linear equations; a pseudo-code for it is given as List 5.2.1.
List 5.2.1 Pseudo-code for Gaussian elimination

/* Forward elimination */
for (i = 1; i <= n-1; i++) {
    for (j = i+1; j <= n; j++) {
        aa = A[j][i] / A[i][i];
        b[j] = b[j] - aa * b[i];
        for (k = i+1; k <= n; k++) {
            A[j][k] = A[j][k] - aa * A[i][k];
        }
    }
}
/* Backward substitution */
b[n] = b[n] / A[n][n];
for (i = n-1; i >= 1; i--) {
    for (j = i+1; j <= n; j++) {
        b[i] = b[i] - A[i][j] * b[j];
    }
    b[i] = b[i] / A[i][i];
}
The number of multiply–add operations in the forward elimination of List 5.2.1 is counted as

Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{k=i+1}^{n} 2 = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2(n − i) = Σ_{i=1}^{n−1} 2(n − i)² = (1/3)(n − 1)n(2n − 1)   (5.2.2)

and that in the backward substitution as

Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2 = Σ_{i=1}^{n−1} 2(n − i) = (n − 1)n   (5.2.3)

Since the former count is dominant, the computational load of the entire solution process increases in proportion to the cube of the number of unknowns.
Note here that, if a computing process consists of multiple subprocesses, each requiring a computational load proportional to a different power of the number of unknowns, the computational load of the subprocess with the highest power becomes dominant as the number of unknowns increases.
Next, the computational load of the Gauss–Seidel method, one of the basic iterative solution methods, is discussed. In this method, the coefficient matrix [A] is decomposed into two matrices as

[A] = [N] − [P]   (5.2.4)

where [N] is the lower triangular matrix consisting of the diagonal and sub-diagonal components of [A], and [P] is the matrix whose strictly upper triangular components are the negatives of the corresponding components of [A] and whose other components are zero. This decomposition leads to the iteration

[N]{x}^(k+1) = [P]{x}^(k) + {b}

In the Gauss–Seidel method, we start with an appropriate initial vector {x}^(0) and then improve it successively by the above equation so that it converges to the correct solution vector.
Let us consider the computational load per iteration of this equation. It is noted here that the computational load of the matrix–vector product on the right-hand side is proportional to the square of the number of unknowns. Since solving the simultaneous linear equations with the triangular matrix [N] as the coefficient matrix corresponds to a substitution process analogous to the backward substitution in the Gaussian elimination method, its computational load is also proportional to the square of the number of unknowns.
Therefore, the computational load per iteration of the Gauss–Seidel method increases
in proportion to the square of the number of unknowns, and if the number of iterations
is of the same order as the number of unknowns, the overall computational load of
the method is proportional to the cube of the number of unknowns.
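To make the iteration concrete, the following is a minimal sketch of one Gauss–Seidel sweep for a dense system, equivalent to solving the triangular system above by substitution while using updated components immediately; the dense row-major storage is for illustration only:

/* One Gauss-Seidel sweep for A x = b (A dense, row-major, size n).
   x is updated in place; new values are used as soon as computed.
   Cost per sweep: O(n^2), consistent with the discussion above. */
void gauss_seidel_sweep(int n, const double *A, const double *b, double *x)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < n; j++)
            if (j != i) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
}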
Thus, the computational load of each process of the finite element method is given
as
• Construction process of the global stiffness matrix [K]: O(n), and
• Solving process of the set of linear equations with [K] as the coefficient matrix (Eq. (5.1.1)): O(n³),
where n denotes the problem size. Consider, as an example, the mesh shown in Fig. 5.6a, which has 16 nodes, each carrying the two displacement degrees of freedom (U_i, V_i). The global displacement vector is then written as

{U^G} = (U_1, V_1, U_2, V_2, …, U_16, V_16)^T   (5.2.7)
Fig. 5.6 Sparse global stiffness matrix obtained in the finite element method
The global stiffness matrix is usually symmetric, and nonzero components are
indicated by blue circles only in the upper triangular part above the diagonal of the
matrix as shown in Fig. 5.6b.
As shown in Fig. 5.7, the global stiffness matrix is usually sparse, where the
nonzero components are located in a banded region near the diagonal. The width B
of this band shown in the figure is called the maximum half-bandwidth.
For a sparse matrix with a banded structure such as the global stiffness matrix shown above, the scope of the for-loops in the Gaussian elimination method (List 5.2.1) can be narrowed, and the computational load in this case can be reduced to about O(nB²). Although the maximum half-bandwidth B increases with the number of unknowns n, the computational load can be greatly reduced since B ≪ n in most cases.
It is known that the nodal numbering of the FEM is arbitrary, and the maximum
half-bandwidth B of the global stiffness matrix depends on the nodal numbering [5,
11]. For example, Fig. 5.8a shows the same mesh as in Fig. 5.6a, but with different
node numbering, and Fig. 5.8b shows the global stiffness matrix for the mesh given
in Fig. 5.8a. It is noted that the maximum half-bandwidth of the matrix in Fig. 5.8b
is slightly larger than that in Fig. 5.6b. Since the maximum half-bandwidth varies
depending on the nodal numbering, affecting the computational load required for
solving corresponding simultaneous linear equations, the optimal nodal numbering
methods have been studied including the Cuthill–McKee (CM) method [7], the
reverse Cuthill–McKee (RCM) method [6, 14], and other methods [8, 12, 18].
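The dependence of B on the nodal numbering can be checked directly from the element connectivity: within any element, the band must span the difference between its largest and smallest node numbers. A minimal sketch (node-based half-bandwidth; multiply by the number of degrees of freedom per node if required):

/* Maximum half-bandwidth implied by a numbering: the largest spread
   of node numbers within a single element. nnpe = nodes per element;
   conn[e*nnpe + a] is the a-th node number of element e. */
int half_bandwidth(int nelem, int nnpe, const int *conn)
{
    int B = 0;
    for (int e = 0; e < nelem; e++) {
        int lo = conn[e * nnpe], hi = lo;
        for (int a = 1; a < nnpe; a++) {
            int nd = conn[e * nnpe + a];
            if (nd < lo) lo = nd;
            if (nd > hi) hi = nd;
        }
        if (hi - lo + 1 > B) B = hi - lo + 1;
    }
    return B;
}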
It has been shown above that the computational load can be reduced by exploiting the banded structure of the global stiffness matrix, whose nonzero components lie only in the band near the diagonal.
Fig. 5.8 The same mesh as in Fig. 5.6a with a different nodal numbering (a) and the corresponding global stiffness matrix (b)
In addition, the memory usage can be reduced by storing only the components in the band. As shown in Fig. 5.7, not only nonzero components but also zero components exist in the banded region. Since the zero components do not affect the results, further reduction of memory and computation time can be achieved by
storing only the nonzero components in memory. In this case, the reduction rate
increases with the size of problem or the number of nodes in the domain, meaning
that this storage method is particularly effective in large-scale analysis.
Various compact storage methods are available for sparse matrices including CRS
(Compressed Row Storage), CCS (Compressed Column Storage), and JDS (Jagged
Diagonal Storage) [19]. They store the nonzero components in a one-dimensional
real array and the position information (row, column) of each nonzero component in
a few integer arrays.
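As a small illustration of the CRS layout, the sketch below computes a matrix–vector product, the core kernel of iterative solvers such as the Gauss–Seidel method above; the array names val, col, and rowptr follow common convention rather than any code in this book:

/* y = A x with A in CRS format:
   val[]     nonzero values, stored row by row
   col[]     column index of each nonzero
   rowptr[]  start of each row in val/col, length n + 1 */
void crs_spmv(int n, const double *val, const int *col,
              const int *rowptr, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
            s += val[k] * x[col[k]];   /* indirect access to x */
        y[i] = s;
    }
}

The indirect access x[col[k]] is precisely the non-contiguous memory access discussed next.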
Although such a storage method can dramatically reduce the amount of memory
for large-scale analysis and the computation load as well, it should be noted that a
degradation in efficiency may occur due to non-contiguous memory access during
computation.
In this section, two methods for estimating the error in the solution obtained by the
FEM are studied.
Various methods have been used to estimate the error of FEM results from the correct or exact solution; these are known as a posteriori error estimation methods [1, 10, 20, 21].
In the FEM for solid mechanics, the displacement method is usually used, where
the displacements are unknown variables to be solved. In this method, the displace-
ments are continuous at the boundaries between elements, but the strains and stresses,
which are the first-order derivatives of the displacements, are discontinuous. From
the physical point of view, this is unacceptable and this inconvenience is considered
to be caused by the insufficient continuity of the basis functions.
In this regard, let us review the continuity of functions. Here, we introduce the
differentiability class C n , a measure of the continuity (smoothness) of a function,
meaning that a function belonging to C n is continuous up to the n-th derivative.
Let us consider, respectively, the continuities of the following functions:

f_1(x) = −x (x < 0);  x (x ≥ 0)   (5.3.1)

f_2(x) = −(1/2)x² (x < 0);  (1/2)x² (x ≥ 0)   (5.3.2)

f_3(x) = −(1/6)x³ (x < 0);  (1/6)x³ (x ≥ 0)   (5.3.3)
Figure 5.9 depicts the function f_1(x) and its first-order derivative, Fig. 5.10 the function f_2(x) and its first- and second-order derivatives, and Fig. 5.11 the function f_3(x) and its first-, second-, and third-order derivatives. It is clear from these graphs that f_1(x) belongs to C⁰, f_2(x) to C¹, and f_3(x) to C² at x = 0, respectively. Note that for x ≠ 0, all three functions belong to C^∞.
Since the basis functions, defined on an element-by-element basis, usually have only C⁰ continuity at element boundaries, the stresses, which are first-order derivatives of the basis functions, are discontinuous at element boundaries. Figure 5.12 shows the schematic diagrams in one dimension.
Fig. 5.9 The function f_1(x) and its first-order derivative
Fig. 5.10 The function f_2(x) and its first- and second-order derivatives
Fig. 5.11 The function f_3(x) and its first-, second-, and third-order derivatives
Fig. 5.12 Displacement u (continuous) and stress σ (discontinuous) across boundaries of linear and quadratic elements
Let the discontinuous stresses above be smoothed so that they become continuous at the boundaries between elements, as the displacements are. For this purpose, we consider the stresses at an arbitrary position P(ξ, η) of a four-node quadrilateral element e_0 as shown in Fig. 5.13, which are given by
{σ(ξ, η)} = (σ_x(ξ, η), σ_y(ξ, η), τ_xy(ξ, η))^T = [D][L][N(ξ, η)]{U}   (5.3.4)

where

{U} = (U_1, V_1, …, U_4, V_4)^T,   (5.3.5)

[N(ξ, η)] = [ N_1(ξ, η)   0          ···   N_4(ξ, η)   0
              0           N_1(ξ, η)  ···   0           N_4(ξ, η) ],   (5.3.6)

[L] = [ ∂/∂x   0
        0      ∂/∂y
        ∂/∂y   ∂/∂x ]   (5.3.7)

Here, (U_i, V_i) is the displacement at the i-th node and N_i(ξ, η) is the i-th basis function. [D] is the stress–strain matrix, which is given as follows [17]:

[D] = E/(1 − ν²) [ 1   ν   0
                   ν   1   0
                   0   0   (1 − ν)/2 ]   (plane stress)   (5.3.8)

[D] = E(1 − ν)/((1 + ν)(1 − 2ν)) [ 1           ν/(1 − ν)   0
                                   ν/(1 − ν)   1           0
                                   0           0           (1 − 2ν)/(2(1 − ν)) ]   (plane strain)   (5.3.9)
The nodal point P_1 is shared by the elements e_4, e_6, and e_7, and the stresses {σ_P1^e4}, {σ_P1^e6}, and {σ_P1^e7} at the nodal point P_1 are, respectively, calculated for each element as

{σ_P1^e4} = {σ^e4(1, −1)},  {σ_P1^e6} = {σ^e6(1, 1)},  {σ_P1^e7} = {σ^e7(−1, 1)}   (5.3.11)

By taking the average of these stresses, the smoothed stress {σ_P1^S} at the node P_1 can be determined as follows:

{σ_P1^S} = ({σ_P1^e0} + {σ_P1^e4} + {σ_P1^e6} + {σ_P1^e7})/4   (5.3.12)
In the same manner, {σ_P2^S}, {σ_P3^S}, and {σ_P4^S} are, respectively, obtained for the nodes P_2, P_3, and P_4, and then the smoothed stress at any position P(ξ, η) of the element e_0 is defined as

{σ^S(ξ, η)} = Σ_{i=1}^{4} N_i(ξ, η){σ_Pi^S}   (5.3.13)
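A minimal sketch of the nodal averaging of Eq. (5.3.12) for a mesh of four-node quadrilaterals follows; the data layout (connectivity conn and per-element nodal stresses sig_elem) is an assumption for illustration:

#include <stdlib.h>

/* Smoothed nodal stresses by simple averaging: accumulate each
   element's stresses at its four nodes into the shared nodes and
   divide by the number of adjacent elements (Eq. (5.3.12)).
   conn[e][a]: global node of local node a; 3 stress components. */
void smooth_nodal_stress(int nelem, int nnode, const int conn[][4],
                         const double sig_elem[][4][3],
                         double sig_node[][3])
{
    int *count = calloc(nnode, sizeof(int));
    for (int i = 0; i < nnode; i++)
        for (int c = 0; c < 3; c++)
            sig_node[i][c] = 0.0;
    for (int e = 0; e < nelem; e++)
        for (int a = 0; a < 4; a++) {
            int i = conn[e][a];
            count[i]++;
            for (int c = 0; c < 3; c++)
                sig_node[i][c] += sig_elem[e][a][c];
        }
    for (int i = 0; i < nnode; i++)
        if (count[i])
            for (int c = 0; c < 3; c++)
                sig_node[i][c] /= count[i];
    free(count);
}

The smoothed field of Eq. (5.3.13) is then obtained by interpolating sig_node with the element shape functions.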
The smoothed stress is considered to be closer to the true one than the original discontinuous one, and an a posteriori error estimation method based on this idea (called the ZZ method) has been proposed [24]. The error of the FEM solution is usually defined as the difference between the true stresses {σ^TRUE(ξ, η)} and the stresses {σ^FEM(ξ, η)} obtained by the FEM analysis, while it is almost impossible to obtain the true stresses {σ^TRUE(ξ, η)}. On the other hand, as {σ^S(ξ, η)} is closer to the true stresses than {σ^FEM(ξ, η)}, the difference between these two sets of stresses can represent the error. The ZZ method, an a posteriori error estimation method based on this observation, is widely used as an error estimation method for the FEM because it is simple and requires few modifications to analysis codes.
As described in Sect. 5.1, the accuracy of the FEM solution is usually improved
by decreasing the element size. This is because the approximation accuracy of the
solution by the basis functions is improved, and the same effect can be obtained by
increasing the order of the basis functions. We discuss here the behavior of error in
the FEM solution when reducing the element size.
In the finite difference method (see Sect. 7.2), a method for improving the solution based on the relationship between the grid spacing and the accuracy of the solution has been studied [23]. If the approximation accuracy of the finite difference method with lattice spacing Δx is O(Δx²), and the solutions with two different lattice spacings Δx_1 and Δx_2 (Δx_1 > Δx_2) are φ_l^Δx1 and φ_l^Δx2, respectively, then, with φ_l^TRUE being the true solution, we can write the following relationships:

φ_l^TRUE − φ_l^Δx1 ≈ c(Δx_1)²,  φ_l^TRUE − φ_l^Δx2 ≈ c(Δx_2)²

It is known that, although not strictly correct, a value even more accurate than φ_l^Δx2 is achieved by assuming that the above relationships hold as equalities and eliminating the constant c:

φ_l^TRUE = ((Δx_1)² φ_l^Δx2 − (Δx_2)² φ_l^Δx1) / ((Δx_1)² − (Δx_2)²)

A similar idea has been applied to the FEM, assuming a relationship of the form

σ^FEM = σ^TRUE + a N^(−δ)

where σ^FEM is the stress value obtained by the FEM, σ^TRUE the true stress value, a and δ are coefficients, and N is the nodal density. By estimating the coefficients from several analyses with different nodal densities, it has been possible to estimate σ^TRUE.
This method is successfully applied to two- and three-dimensional crack analyses.
The ZZ method described in the previous section is a method for estimating
the error of a finite element solution by taking the difference between the original
solution and the better solution obtained by smoothing. A better solution can also
be obtained by reducing the element size, so it is possible to estimate the error of
a finite element solution by taking the difference between an analysis with a given
mesh and an analysis with a finer mesh. The use of meshes with multiple levels of
fineness will provide more detailed error information, which can be used to improve
the solution [22].
In this section, we study the details of the method for improving the finite element
solutions using deep learning based on the error estimation (Sect. 5.3) [16].
In order to obtain an accurate solution with a small number of elements and nodes
by using a posteriori error estimation, a method called the adaptive finite element
method has been studied [2, 3, 15], which consists of three steps: an analysis with the
initial mesh is performed, then a posteriori error estimation is followed, and finally,
any part of analysis domain with relatively large error is remeshed to improve the
accuracy. This process is repeated until the error criterion is satisfied.
The adaptive finite element method is classified into three types: the h-adaptive
method, where the mesh is locally refined; the p-adaptive method, where the order of
the basis functions is locally increased; and the r-adaptive method, where the nodes
are locally relocated. Each method may be used alone or in combination.
The most commonly used method among these three types is the h-adaptive
method, where (1) analysis and a posteriori error estimation are performed on a mesh,
(2) the element subdivision is performed to refine the mesh for regions where the
error exceeds the criterion, and (1) and (2) are repeated until the error becomes suffi-
ciently small everywhere in the domain. Since the subdivision is locally performed,
the increase in the total number of nodes, which directly affects the analysis time,
may be suppressed. However, the repeated subdivision of mesh, even if only partially,
increases the total number of nodes, and the FEM analysis is repeated. Then, the total
computational load is not necessarily small.
In the adaptive FEM, the accuracy of the solution is improved by firstly refining
the mesh based on the error information and then performing analysis using the
refined mesh, where the error is employed to improve the solution not directly but
indirectly.
In contrast, here, a method that directly uses the error information to improve
the solution is presented, where “directly” means “without remeshing.” Specifically,
deep learning with a feedforward neural network is used to estimate the stresses
equivalent to those obtained with a sufficiently fine mesh directly from the stresses
obtained with a coarse mesh and its error information. For this purpose, the feedfor-
ward neural network is trained to output the nearly exact stresses at any point in the
analysis domain. The input data used are the stresses obtained with a coarse mesh
at the point of interest and its surrounding points as well as their error information.
This method is summarized in the following three phases.
Data Preparation Phase: FEM analyses with a coarse mesh are performed under various analysis conditions (analysis domains, load conditions, fixation conditions, etc.), and error information is then obtained by applying an a posteriori error estimation method to each of the above results. In addition, for each analysis condition, the FEM analysis with a very fine mesh is performed to obtain a solution close to the true solution. Finally, a large number of data pairs are collected: each data pair consists of the solution with a coarse mesh, its error information, and the corresponding solution with a fine mesh.
Training Phase: A feedforward neural network is constructed by deep learning
using the data pairs collected in the Data Preparation Phase above as training patterns,
where input and teacher data for the neural network are set as follows:
• Input data: solution with a coarse mesh and its error information.
• Teacher data: solution with a fine mesh.
Application Phase: The FEM solution and its error information with a coarse
mesh for a problem to be solved are input to the trained neural network constructed
in the Training Phase; then a corresponding accurate solution that would be obtained
with a fine mesh is output from the neural network.
It is noted that, in this method, the input data for a feedforward neural network
include only the stress state and error information in the vicinity of the location
at which accurate stress is to be estimated, not including the analysis geometry or
boundary conditions, which may make the trained neural network applicable easily
to various analysis conditions.
In addition, since the input data include only the values obtained with a coarse
mesh, and also the inference by the trained neural network is fast, this method makes
it possible to estimate the accurate values of stresses at a specific point much faster
than the conventional FEM analysis with a fine mesh.
It may be a demerit of the present method that we can estimate the stresses not for
the whole region but only for a target point, although the values are accurate. However,
in most cases, it is sufficient to obtain the stress values at a specific important point or
area in the analysis domain. Then, the present method is considered to be a powerful
tool in such situations as optimal design, where repeated analyses are required.
The present method is categorized into Method-A and Method-B depending on
the techniques to get error information [16].
Figure 5.14a, b show, respectively, the flowchart of the standard finite element
analysis and that of Method-A. The latter is a method to get some error information
from differences between stresses obtained by the finite element analysis with a
coarse mesh and smoothed stresses and then estimate accurate stresses by deep
learning from the analysis results and the error information obtained with the coarse
mesh above.
On the other hand, Fig. 5.15a, b, respectively, show the flowchart of the standard
adaptive finite element analysis and that of Method-B. The latter is a method that
gets some error information from the difference of two sets of stresses: the stresses
obtained from the analysis with the initial mesh in standard adaptive FEM analysis
and those with a refined mesh generated at the first step of the adaptive remeshing.
While the adaptive FEM analysis repeats both adaptive remeshings and analyses with
the refined meshes, Method-B does not have any loop. In other words, Method-B is
a method for obtaining an accurate solution based on two relatively coarse meshes,
an initial mesh and its refined mesh, using deep learning.
In this section, we study a numerical example of the method using smoothing stress
(Method-A), which is one of the methods for improving the solution of FEM analysis
using some error information and deep learning given in Sect. 5.4.
Here, we test Method-A (see Sect. 5.4) about its basic performance in a two-
dimensional stress analysis using four-node quadrilateral elements [16], where a
feedforward neural network is trained using stresses at a target point obtained by the
FEM with a fine mesh as teacher data, and stresses and some error information at
the point and its surrounding points with a coarse mesh as input data. The neural
network is trained to output accurate stresses at the target point when stresses and
the error information around the point obtained with a coarse mesh are input.
As mentioned above, not only the stresses at the target point where highly accurate
stresses are to be predicted, but also the stresses and some error information in its
neighborhood are used as auxiliary information of input data. For this purpose, as
shown in Fig. 5.16, stress evaluation points are arranged as a grid around the target
point, where such auxiliary information as the error information is generated.
Among the many options for the arrangement of the points around the target point shown in Fig. 5.16, we employ here the one with four neighborhood points shown in Fig. 5.16a. The stresses σ_x^C, σ_y^C, and τ_xy^C and their smoothed counterparts σ_x^S, σ_y^S, and τ_xy^S are obtained with the FEM using a coarse mesh, and the stresses σ_x^F, σ_y^F, and τ_xy^F with a fine mesh. From the stresses calculated with a coarse mesh at the target point (P_T) and the four points (P_N1, P_N2, P_N3, P_N4) around it, the input data (a) and (b) for the neural network are generated as follows:
Input data (a): Based on the differences between the stresses and the smoothed stresses obtained with a coarse mesh at the target point and the four points around it, a total of 15 (3 stress components × 5 points) values are generated as shown in Table 5.1, which are considered to represent the distribution of errors in the vicinity of the target point.
Input data (b): Based on the differences between the stresses at the target point and those at each of the four points around it, a total of 12 (3 stress components × 4 points) values are generated as shown in Table 5.2, which are considered to represent the local variation of stress in the vicinity of the target point.
Note that, in the tables, P_T σ_x^S means σ_x^S at the point P_T, etc., and all the input data are calculated from the results of the finite element analysis using a coarse mesh.
The differences between the stresses at the target point (P_T) obtained using a coarse mesh and those obtained using a fine mesh, a total of three values (3 stress components × 1 point), are generated as the teacher data for the neural network. This
indicates that the trained neural network is not expected to have an extrapolation
capability to predict stresses with a fine mesh, but to have an interpolation capability
based on the mapping between the stresses with a coarse mesh and those with a fine
mesh.
Once the configuration of the input data and teacher data for the neural network has been determined as described above, a large number of training patterns, each representing a different stress state around a target point, are to be generated and collected.
Here, we take as a platform a two-dimensional stress analysis of a square domain with various boundary conditions, from which a large number of training patterns, each representing a different stress state around a target point, are generated. The analysis domain is a square (side length 4 [m]), and the material is assumed to be steel. The entire domain is evenly divided into four-node quadrilateral elements, and two types of meshes with different levels of fineness are used: a coarse mesh with 16 elements (4 × 4) and a fine mesh with 65,536 elements (256 × 256).
As for boundary conditions, the bottom surface is fixed, and an equally distributed load of 1 [N/m] is applied to two edges selected from the ten edges (numbered from 1 to 10 in Fig. 5.17, each of which is an edge of an element) of the coarse mesh with 16 elements. The direction of the load θ is set to one of the following twelve values (Fig. 5.18):
{0, π/12, 2π/12, 3π/12, 4π/12, 5π/12, 6π/12, 7π/12, 8π/12, 9π/12, 10π/12, 11π/12}. Figure 5.19 shows some samples of the boundary conditions.

There are 45 (= (10 choose 2)) choices of the two loaded edges and 144 (= 12 × 12) choices of the load directions, so the total number of choices of load boundary conditions is 6480 (= 45 × 144). Note that the notation (n choose r) means the number of ways of choosing r objects out of n objects, ignoring the order of choosing them.
For each of these 6480 boundary conditions, a two-dimensional linear stress analysis is performed using the coarse mesh (16 elements), and the stresses (σ_x^C, σ_y^C, τ_xy^C) at 1600 (40 × 40) stress evaluation points evenly distributed in a grid (grid spacing 0.1) within the domain are calculated. In addition, the smoothed stresses (σ_x^S, σ_y^S, τ_xy^S) at the stress evaluation points are calculated.
The smoothed stresses are calculated by the following procedure. First, the
smoothed stress at each node is calculated by simply averaging the stresses at the
node obtained for each element. Then, from the smoothed stresses at the node, the
smoothed stresses at the stress evaluation points in an element are obtained using the
shape functions of the element. (See Sect. 5.3.1).
Similarly, a two-dimensional linear stress analysis is performed using the fine mesh (65,536 elements) under each boundary condition, and (σ_x^F, σ_y^F, τ_xy^F) at the stress evaluation points are calculated.
As a result, the stresses (σ_x^C, σ_y^C, τ_xy^C) and their smoothed counterparts (σ_x^S, σ_y^S, τ_xy^S) with the coarse mesh and the stresses (σ_x^F, σ_y^F, τ_xy^F) with the fine mesh are obtained at the 1600 stress evaluation points for each boundary condition.
It is noted that 1444 points excluding the outermost points out of the 1600 stress
evaluation points can be used as the target points in Fig. 5.16a. Since 1444 target
points are employed for each of the 6480 boundary conditions, a total of 9,357,120
(6480 × 1444) training patterns are collected.
Training and verification patterns used to train the feedforward neural network are
chosen at random from 9.36 million patterns collected in the previous section.
Here, four sets of training patterns are tested, each consisting of 300,000, 200,000,
100,000, and 50,000 patterns, while 100,000 patterns are selected for the verification.
The feedforward neural networks tested have 27 units in the input layer and 3
units in the output layer with the number of hidden layers ranged from 1 to 6, and
the number of units in each hidden layer is selected to be 20, 50, or 80. The number
of training epochs is set to 10,000.
The results of all the training conditions are shown in Fig. 5.20, where the hori-
zontal axis is the number of hidden layers, U20, U50, and U80 mean that the number
of units per hidden layer is 20, 50, and 80, respectively, and TP050, TP100, TP200,
and TP300 mean that the number of training patterns is 50,000, 100,000, 200,000, and
300,000, respectively. The vertical axis is the average error for 100,000 verification
patterns, defined as

Error = (1/N_TP) Σ_{i=1}^{N_TP} Σ_{j=1}^{N_OU} |O_j^i − T_j^i|   (5.5.1)

where N_TP is the number of patterns for verification, N_OU the number of units in the output layer of the neural network, O_j^i the output value of the j-th output unit for the i-th verification pattern, and T_j^i the value of the corresponding teacher signal.
This figure suggests the following:
(a) When the number of units in the hidden layers is small, increasing the number of hidden layers may not reduce the error.
(b) When the number of units in the hidden layer is large, the error is reduced with
increasing the number of hidden layers.
(c) For any training condition (number of hidden layers, number of units per hidden
layer), increasing the number of training patterns reduces the error.
This figure also shows that the best result is given with 5 hidden layers, 80 units
per hidden layer, and 300,000 training patterns.
When the number of training epochs is extended to 30,000 for the neural network
trained above, the error for the verification pattern has decreased from 0.04586 at
10,000 training epochs to 0.04294 at 30,000 epochs.
The trained neural network constructed in Sect. 5.5.2 can be applied to various two-
dimensional stress analysis problems. Here, its performance is evaluated in detail for
the 100,000 verification patterns used in the Training Phase.
First, let us discuss the accuracy of the feedforward neural network with five
hidden layers and eighty units per hidden layer, which has been trained with 30,000
epochs and achieved the smallest error.
Figure 5.21 shows the distribution of the estimation error of σ_x for the 100,000 verification patterns, where the vertical axis is the number of patterns and the horizontal axis the error in the estimated stress. This error is defined as the difference between the stress σ_x^NN estimated by the trained neural network and the stress σ_x^F obtained with the fine mesh, i.e., σ_x^F − σ_x^NN. Note that, for example, if the difference is in the range [0.01, 0.03], the median value 0.02 is taken as the representative value; patterns with an error of −0.98 or less are assigned the representative value −0.98, and those with an error of 0.99 or greater the representative value 1.00.
For comparison, the distribution of σ_x^F − σ_x^C, the difference between the stresses obtained with the fine mesh and those with the coarse mesh, is shown as a dotted line.

Figure 5.21 shows that the stress σ_x^NN estimated by the trained neural network is closer to the stress σ_x^F obtained by the finite element analysis with the fine mesh than the stress σ_x^C obtained with the coarse mesh is. Figures 5.22 and 5.23 show similar results for σ_y and τ_xy, respectively. These figures show that the stresses estimated by the trained neural network are close to the accurate stresses obtained with the fine mesh.
Next, we study some examples in estimating stress distribution using the trained
neural network. Figures 5.24, 5.25 and 5.26 show the stress distributions along the
horizontal line (y = 3.85) near the top surface of the analysis domain shown in
Fig. 5.17 under some loading condition.
Figure 5.24 shows the results of σ_x^NN (denoted as σ_x(DL) in the figure) estimated by the trained neural network for 38 stress evaluation points on the corresponding line. For comparison, the stresses σ_x^F (σ_x(fine)) calculated with the fine mesh, the stresses σ_x^C (σ_x(coarse)) calculated with the coarse mesh, and the smoothed stresses σ_x^S (σ_x(smoothing)) are also shown in the figure, where the first represents the teacher data and the latter two the input data for the neural network. It is concluded that the results estimated by deep learning reproduce well the highly accurate stress σ_x^F calculated with the fine mesh, even though the estimation is based only on stresses obtained with the coarse mesh. Similarly, Figs. 5.25 and 5.26 show the estimated results of σ_y and τ_xy by the trained neural network, respectively. In both cases, as in the case of σ_x, the highly accurate stresses are reproduced by the present method.
For more details, refer to the paper [16], which also provides an example of Method-B (Fig. 5.15).
Fig. 5.14 Flowcharts of the standard finite element analysis (a) and Method-A (b)
Fig. 5.15 Flowcharts of the standard adaptive finite element analysis (a) and Method-B (b)
Fig. 5.16 Arrangements of the target point and its neighboring stress evaluation points
Fig. 5.17 Analysis domain with the ten loaded edges (numbered 1–10) of the coarse mesh
Fig. 5.20 Average error vs. number of hidden layers for all training conditions
Fig. 5.21 Error distribution: σ_x
Fig. 5.22 Error distribution: σ_y
Fig. 5.23 Error distribution: τ_xy
Fig. 5.24 Stress distribution along the line y = 3.85: σ_x
Fig. 5.25 Stress distribution along the line y = 3.85: σ_y
Fig. 5.26 Stress distribution along the line y = 3.85: τ_xy
References
1. Ainsworth, M., Oden, J.T.: A posteriori error estimation in finite element analysis. Comput. Methods Appl. Mech. Eng. 142, 1–88 (1997)
2. Babuska, I., Rheinboldt, W.C.: Error estimates for adaptive finite element computations. SIAM J. Numer. Anal. 15, 736–754 (1978)
3. Babuska, I., Vogelius, M.: Feedback and adaptive finite element solution of one-dimensional boundary value problems. Numer. Math. 44, 75–102 (1984)
4. Brenner, S.C., Scott, L.R.: The Mathematical Theory of Finite Element Methods. Springer (1994)
5. Carey, G.F.: Computational Grids: Generation, Adaptation, and Solution Strategies. Taylor & Francis (1994)
6. Cuthill, E.: Several strategies for reducing the bandwidth of matrices. In: Rose, D.J., Willoughby, R.A. (eds) Sparse Matrices and their Applications. The IBM Research Symposia Series, Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-8675-3_14
7. Cuthill, E., McKee, J.: Reducing the bandwidth of sparse symmetric matrices. In: ACM '69: Proceedings of the 1969 24th National Conference, pp. 157–172 (1969)
8. Gibbs, N.E., Poole, W.G. Jr., Stockmeyer, P.K.: An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM J. Numer. Anal. 13, 236–250 (1976)
9. Golub, G.H., Van Loan, C.F.: Matrix Computations (Third Edition). The Johns Hopkins University Press (1996)
10. Grätsch, T., Bathe, K.J.: A posteriori error estimation techniques in practical finite element analysis. Comput. Struct. 83, 235–265 (2005)
11. Jennings, A., McKeown, J.J.: Matrix Computations (Second Edition). John Wiley & Sons (1992)
12. King, I.P.: An automatic reordering scheme for simultaneous equations derived from network systems. Int. J. Numer. Methods Eng. 2, 523–533 (1970)
13. Knuth, D.E.: Big omicron and big omega and big theta. SIGACT News 8(2), 18–24 (1976). https://doi.org/10.1145/1008328.1008329
14. Liu, W.-H., Sherman, A.H.: Comparative analysis of the Cuthill–McKee and reverse Cuthill–McKee ordering algorithms for sparse matrices. SIAM J. Numer. Anal. 13, 198–213 (1976)
15. Murotani, K., Yagawa, G., Choi, J.B.: Adaptive finite elements using hierarchical mesh and its application to crack propagation analysis. Comput. Methods Appl. Mech. Eng. 253, 1–14 (2013)
16. Oishi, A., Yagawa, G.: Finite elements using neural networks and a posteriori error. Arch. Comput. Methods Eng. 28, 3433–3456 (2021). https://doi.org/10.1007/s11831-020-09507-0
17. Reddy, J.N.: An Introduction to the Finite Element Method (Second Edition). McGraw-Hill (1993)
18. Sloan, S.W.: An algorithm for profile and wavefront reduction of sparse matrices. Int. J. Numer. Methods Eng. 23, 239–251 (1986)
19. Ueberhuber, C.W.: Numerical Computation 2. Springer (1997)
20. Verfürth, R.: A Review of A Posteriori Error Estimation and Adaptive Mesh Refinement Techniques. Wiley-Teubner (1996)
21. Verfürth, R.: A Posteriori Error Estimation Techniques for Finite Element Methods. Oxford University Press, Oxford (2013)
22. Yagawa, G., Ichimiya, M., Ando, Y.: Analysis method for stress intensity factors based on the discretization error in the finite element method. Trans. JSME 44(379), 743–755 (1978). (in Japanese)
23. Zienkiewicz, O.C., Morgan, K.: Finite Elements & Approximation. Dover (2006)
24. Zienkiewicz, O.C., Zhu, J.Z.: A simple error estimator and adaptive procedure for practical engineering analysis. Int. J. Numer. Methods Eng. 24, 337–357 (1987)
Chapter 6
Contact Mechanics with Deep Learning
6.1 Basics of Contact Mechanics

It is well known that collision and contact analysis deals with contact phenomena between multiple objects or between multiple locations of a single object [7–9, 21, 22]. In this section, the basics of contact analysis and contact search in the finite element method are studied.
Considering the dynamic effect, the matrix equation of the FEM is written as
follows:
where the damping is not considered, [M] is the mass matrix, [K ] the global stiffness
matrix, {F} the load vector, and {Ü } and {U } the acceleration and displacement
vectors, respectively. Discretizing Eq. (6.1.1) in time, we have
( )
1 2 1
[M]{U }n+1 = {F}n − [K ] − [M] {U }n − [M]{U }n−1
(Δt) 2
(Δt) 2
(Δt)2
(6.1.2)
In the explicit method, the time increment must satisfy the CFL (Courant–Friedrichs–Lewy) condition:

$$\Delta t \le \frac{l}{c} \qquad (6.1.3)$$
where l is the element length and c the velocity of the stress wave. Note that, if a small element size is employed to achieve accurate results, a very small value of Δt has to be adopted to satisfy the CFL condition, which may increase the computational load.
In the dynamic explicit contact analysis, a small penetration into the other object is allowed, and a repulsive force proportional to the penetration depth is applied as the contact force. It is therefore essential to perform a contact search to accurately identify the location and contact state of the collision or contact point, where the contact state includes the penetration depth into the other object.
The procedure for the contact analysis based on the explicit dynamics is
summarized as follows:
1. Start the calculation for the first step (n = 1).
2. Identify the location of a contact point and the contact state by contact search.
3. Calculate the appropriate contact force {FC }n at the contact point.
4. Calculate {U }n+1 based on Eq. (6.1.2) with {FC }n added to the right-hand side
of the equation.
5. n → n + 1
6. Return to 2.
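To make the flow of the procedure concrete, the following is a minimal Python sketch of steps 1–6, assuming a lumped (diagonal) mass matrix so that Eq. (6.1.2) can be solved without factorization; the function names and the interface of `contact_force`, which stands in for steps 2–3, are hypothetical.

```python
import numpy as np

def explicit_contact_analysis(M, K, F, u0, dt, n_steps, contact_force):
    """Minimal sketch of steps 1-6 above, assuming a lumped (diagonal)
    mass matrix M stored as a vector so Eq. (6.1.2) needs no solver.

    F(n) returns the load vector {F}_n; contact_force(u) stands for the
    contact search and contact force evaluation of steps 2-3."""
    u_prev = u0.copy()                           # {U}_{n-1}
    u_now = u0.copy()                            # {U}_n
    for n in range(n_steps):
        fc = contact_force(u_now)                # steps 2-3
        # right-hand side of Eq. (6.1.2), with {F_C}_n added (step 4)
        rhs = F(n) + fc - K @ u_now + (2.0 / dt**2) * M * u_now \
              - (1.0 / dt**2) * M * u_prev
        u_next = dt**2 * rhs / M                 # {U}_{n+1}
        u_prev, u_now = u_now, u_next            # step 5, then loop (step 6)
    return u_now
```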
Contact search is usually performed in two stages: the global search and the
subsequent local search. In the former, a sub-region with a high possibility of contact
is searched from the entire region, while, in the latter, the positions of contact points
and contact states are identified at the region picked up in the global search. In the
former search, a hierarchical bounding box is used to improve the efficiency of the
search [3, 11]. Nevertheless, it is still difficult to improve the efficiency of the global
search, especially in distributed-memory parallel processing environments, because
the global contact search must cover multiple objects usually distributed among
processors [12]. As for the latter search, on the other hand, its computational load and its stability, owing to the iterative solution process, may be issues.
For example, the node-segment type contact search algorithm, a typical method for dynamic contact analysis in the FEM, consists of global and local search processes. In the former process, for a node on one of the facing contact surfaces, the segment (a face of an element) on the other contact surface at the shortest distance from the node is searched for; then, in the latter process, the exact location of the contact point is identified for the pair of the node and the segment selected in the former process [2, 5].
Let's consider the local contact search in the node-segment algorithm when a contact surface consists of rectangular segments as shown in Fig. 6.1. This is to find the local coordinates ξ_c and η_c of the foot H(x_H, y_H, z_H) of the perpendicular line from the node P_S(x_S, y_S, z_S) on one of the facing contact surfaces to the segment on the opposite contact surface selected in the global search. Since H(x_H, y_H, z_H) lies on the segment, it can be written using the local coordinates as

$$H(\xi,\eta) = \begin{pmatrix} x_H \\ y_H \\ z_H \end{pmatrix} = \sum_{i=1}^{4} N_i(\xi,\eta)\,P_i = \sum_{i=1}^{4} N_i(\xi,\eta)\begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} \qquad (6.1.4)$$
where N_i(ξ, η) are the first-order basis functions of a four-node quadrilateral element of the finite element method, represented as

$$N_1(\xi,\eta) = \frac{1}{4}(1-\xi)(1-\eta) \qquad (6.1.5)$$
$$N_2(\xi,\eta) = \frac{1}{4}(1+\xi)(1-\eta) \qquad (6.1.6)$$
$$N_3(\xi,\eta) = \frac{1}{4}(1+\xi)(1+\eta) \qquad (6.1.7)$$
$$N_4(\xi,\eta) = \frac{1}{4}(1-\xi)(1+\eta) \qquad (6.1.8)$$
Since H(x_H, y_H, z_H) is the closest point to P_S(x_S, y_S, z_S) on the segment, we have

$$\frac{\partial H(\xi,\eta)}{\partial \xi} \cdot \bigl(H(\xi,\eta) - P_S\bigr) = 0 \qquad (6.1.10)$$
$$\frac{\partial H(\xi,\eta)}{\partial \eta} \cdot \bigl(H(\xi,\eta) - P_S\bigr) = 0 \qquad (6.1.11)$$
Solving these two equations using Newton’s method, the local coordinates of the
contact point ξc and ηc are calculated, with which the signed distance g between
PS (xS , yS , z S ) and H (ξc , ηc ) is calculated as
$$g = \overrightarrow{HP_S} \cdot \vec{n} \qquad (6.1.12)$$

where $\vec{n}$ is the outward unit normal vector at H(ξ_c, η_c). If g is less than or equal
to 0, the contact is judged to have occurred with the penetration depth |g|, and a
contact force proportional to |g| is added to the contact point H (ξc , ηc ), as well as
to PS (xS , yS , z S ) as a reaction force.
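As an illustration, the following is a minimal Python sketch of this local contact search for a 4-node segment, solving Eqs. (6.1.10) and (6.1.11) by a Newton-type iteration (here a Gauss–Newton variant that drops the curvature terms of the exact Jacobian); the names and tolerances are choices of the sketch, not of any original code.

```python
import numpy as np

def local_contact_search(ps, X, tol=1e-10, max_iter=20):
    """Foot of the perpendicular from node ps onto a 4-node segment,
    i.e. Eqs. (6.1.10)-(6.1.11), by a Gauss-Newton iteration.

    ps : (3,) node coordinates, X : (4, 3) segment corner coordinates.
    Returns the local coordinates (xi, eta) of the contact point."""
    def N(xi, eta):                              # Eqs. (6.1.5)-(6.1.8)
        return 0.25 * np.array([(1 - xi) * (1 - eta), (1 + xi) * (1 - eta),
                                (1 + xi) * (1 + eta), (1 - xi) * (1 + eta)])

    def dN(xi, eta):                             # derivatives of the basis functions
        dxi = 0.25 * np.array([-(1 - eta), (1 - eta), (1 + eta), -(1 + eta)])
        deta = 0.25 * np.array([-(1 - xi), -(1 + xi), (1 + xi), (1 - xi)])
        return dxi, deta

    xi = eta = 0.0
    for _ in range(max_iter):
        H = N(xi, eta) @ X                       # point on the segment, Eq. (6.1.4)
        dxi, deta = dN(xi, eta)
        a1, a2 = dxi @ X, deta @ X               # tangent vectors
        r = H - ps
        g = np.array([a1 @ r, a2 @ r])           # residuals of Eqs. (6.1.10)-(6.1.11)
        if np.linalg.norm(g) < tol:
            break
        J = np.array([[a1 @ a1, a1 @ a2],
                      [a2 @ a1, a2 @ a2]])       # curvature terms dropped
        xi, eta = np.array([xi, eta]) - np.linalg.solve(J, g)
    return xi, eta
```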
It is noted that collision and contact analysis using the FEM has some problems caused by the inaccurate geometric definition of the contact surfaces (Fig. 6.2). In other words, the majority of the basis functions of the FEM have only C⁰ continuity at the element boundary (see Sect. 5.3.1), so the repulsive contact force applied when contact is detected changes its direction discontinuously at the element boundary, which deteriorates the stability and convergence of the analysis [10, 16, 20].
The surface irregularities caused by the basis functions could be eliminated by
using smooth basis functions. For example, if NURBS [15, 17] is used as the basis
function of the analysis to represent a smooth surface of bodies, the contact analysis
could be performed with a smooth contact surface as shown in Fig. 6.3. For this reason
and other benefits, the isogeometric analysis using NURBS as the basis function
of analysis [4, 6] and the NURBS-Enhanced FEM (NEFEM) [18, 19] have been
developed with the advantage that the smooth shapes defined in CAD are assured
during the contact analysis.
6.2 NURBS Basis Functions

As studied in the previous section, the B-spline and the NURBS basis functions can be used to represent smooth surfaces. We discuss here these smooth basis functions in some detail.
The NURBS basis functions used in computer-aided design (CAD) systems to represent shapes of objects are derived from the one-dimensional B-spline basis functions. The B-spline basis functions of the pth order N_{i,p}(ξ) are defined as [15, 17]
$$N_{i,0}(\xi) = \begin{cases} 1 & (\xi_i \le \xi < \xi_{i+1}) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.1)$$

$$N_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i} N_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}} N_{i+1,p-1}(\xi) \qquad (6.2.2)$$

where a knot vector Ξ = {ξ₁, ξ₂, ξ₃, …, ξ_{n+p}, ξ_{n+p+1}} is a sequence of monotonically non-decreasing real numbers, and the rule 0/0 = 0 is applied to the fractional parts of Eq. (6.2.2).
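The recursion of Eqs. (6.2.1) and (6.2.2), including the 0/0 = 0 rule, translates directly into code. A minimal Python sketch (with 0-indexed knots, so knots[0] corresponds to ξ₁ in the text):

```python
def bspline_basis(i, p, xi, knots):
    """N_{i,p}(xi) by the recursion of Eqs. (6.2.1)-(6.2.2).

    knots is 0-indexed (knots[0] corresponds to xi_1 in the text), and
    the rule 0/0 = 0 is realized by skipping zero-width spans."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = ((xi - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, xi, knots))
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - xi) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, xi, knots))
    return left + right
```

For the knot vector {0, 0, 0, 1, 2, 3, 3, 3} of the example below, `bspline_basis(2, 2, 0.5, [0, 0, 0, 1, 2, 3, 3, 3])` returns 0.125, in agreement with N₃,₂(0.5) = ½ · 0.5² from Eq. (6.2.5c) below.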
As an example, let’s take the process of constructing five second-order B-spline
basis functions from the knot vector {0, 0, 0, 1, 2, 3, 3, 3}. First, from Eq. (6.2.1),
B-spline functions of the 0-th order from N1,0 to N7,0 are, respectively, given as
follows:
$$N_{1,0}(\xi) = \begin{cases} 1 & (0 \le \xi < 0) \\ 0 & (\text{otherwise}) \end{cases} = 0 \qquad (6.2.3a)$$
$$N_{2,0}(\xi) = \begin{cases} 1 & (0 \le \xi < 0) \\ 0 & (\text{otherwise}) \end{cases} = 0 \qquad (6.2.3b)$$
$$N_{3,0}(\xi) = \begin{cases} 1 & (0 \le \xi < 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.3c)$$
$$N_{4,0}(\xi) = \begin{cases} 1 & (1 \le \xi < 2) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.3d)$$
$$N_{5,0}(\xi) = \begin{cases} 1 & (2 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.3e)$$
$$N_{6,0}(\xi) = \begin{cases} 1 & (3 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} = 0 \qquad (6.2.3f)$$
$$N_{7,0}(\xi) = \begin{cases} 1 & (3 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} = 0 \qquad (6.2.3g)$$
Next, B-spline functions of the first order are, respectively, constructed from the
B-spline functions of the 0th order by Eq. (6.2.2) as
$$N_{1,1}(\xi) = \frac{\xi-\xi_1}{\xi_2-\xi_1}N_{1,0}(\xi) + \frac{\xi_3-\xi}{\xi_3-\xi_2}N_{2,0}(\xi) = 0 \quad (\text{by the rule } 0/0 = 0) \qquad (6.2.4a)$$

$$N_{2,1}(\xi) = \frac{\xi-\xi_2}{\xi_3-\xi_2}N_{2,0}(\xi) + \frac{\xi_4-\xi}{\xi_4-\xi_3}N_{3,0}(\xi) = \begin{cases} 1-\xi & (0 \le \xi < 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.4b)$$

$$N_{3,1}(\xi) = \frac{\xi-\xi_3}{\xi_4-\xi_3}N_{3,0}(\xi) + \frac{\xi_5-\xi}{\xi_5-\xi_4}N_{4,0}(\xi) = \begin{cases} \xi & (0 \le \xi < 1) \\ 2-\xi & (1 \le \xi < 2) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.4c)$$

$$N_{4,1}(\xi) = \frac{\xi-\xi_4}{\xi_5-\xi_4}N_{4,0}(\xi) + \frac{\xi_6-\xi}{\xi_6-\xi_5}N_{5,0}(\xi) = \begin{cases} \xi-1 & (1 \le \xi < 2) \\ 3-\xi & (2 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.4d)$$

$$N_{5,1}(\xi) = \frac{\xi-\xi_5}{\xi_6-\xi_5}N_{5,0}(\xi) + \frac{\xi_7-\xi}{\xi_7-\xi_6}N_{6,0}(\xi) = \begin{cases} \xi-2 & (2 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.4e)$$

$$N_{6,1}(\xi) = \frac{\xi-\xi_6}{\xi_7-\xi_6}N_{6,0}(\xi) + \frac{\xi_8-\xi}{\xi_8-\xi_7}N_{7,0}(\xi) = 0 \quad (\text{by the rule } 0/0 = 0) \qquad (6.2.4f)$$
Finally, B-spline functions of the second order are, respectively, given from the
B-spline functions of the first order by Eq. (6.2.2) as follows:
$$N_{1,2}(\xi) = \frac{\xi-\xi_1}{\xi_3-\xi_1}N_{1,1}(\xi) + \frac{\xi_4-\xi}{\xi_4-\xi_2}N_{2,1}(\xi) = \begin{cases} (1-\xi)^2 & (0 \le \xi < 1) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.5a)$$

$$N_{2,2}(\xi) = \frac{\xi-\xi_2}{\xi_4-\xi_2}N_{2,1}(\xi) + \frac{\xi_5-\xi}{\xi_5-\xi_3}N_{3,1}(\xi) = \begin{cases} \xi(1-\xi) + \frac{1}{2}(2-\xi)\xi & (0 \le \xi < 1) \\ \frac{1}{2}(2-\xi)^2 & (1 \le \xi < 2) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.5b)$$

$$N_{3,2}(\xi) = \frac{\xi-\xi_3}{\xi_5-\xi_3}N_{3,1}(\xi) + \frac{\xi_6-\xi}{\xi_6-\xi_4}N_{4,1}(\xi) = \begin{cases} \frac{1}{2}\xi^2 & (0 \le \xi < 1) \\ \frac{1}{2}\xi(2-\xi) + \frac{1}{2}(3-\xi)(\xi-1) & (1 \le \xi < 2) \\ \frac{1}{2}(3-\xi)^2 & (2 \le \xi < 3) \end{cases} \qquad (6.2.5c)$$

$$N_{4,2}(\xi) = \frac{\xi-\xi_4}{\xi_6-\xi_4}N_{4,1}(\xi) + \frac{\xi_7-\xi}{\xi_7-\xi_5}N_{5,1}(\xi) = \begin{cases} \frac{1}{2}(\xi-1)^2 & (1 \le \xi < 2) \\ \frac{1}{2}(\xi-1)(3-\xi) + (3-\xi)(\xi-2) & (2 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.5d)$$

$$N_{5,2}(\xi) = \frac{\xi-\xi_5}{\xi_7-\xi_5}N_{5,1}(\xi) + \frac{\xi_8-\xi}{\xi_8-\xi_6}N_{6,1}(\xi) = \begin{cases} (\xi-2)^2 & (2 \le \xi < 3) \\ 0 & (\text{otherwise}) \end{cases} \qquad (6.2.5e)$$
Let’s take N3,2 (ξ ) as an example. As shown in Eq. (6.2.5c), the function is defined
by different expressions for each interval. If the equations for each interval are
denoted by f 1 (ξ ), f 2 (ξ ), and f 3 (ξ ), respectively, we have
$$N_{3,2}(\xi) = \begin{cases} f_1(\xi) & (0 \le \xi < 1) \\ f_2(\xi) & (1 \le \xi < 2) \\ f_3(\xi) & (2 \le \xi < 3) \end{cases} = \begin{cases} \frac{1}{2}\xi^2 & (0 \le \xi < 1) \\ \frac{1}{2}\xi(2-\xi) + \frac{1}{2}(3-\xi)(\xi-1) & (1 \le \xi < 2) \\ \frac{1}{2}(3-\xi)^2 & (2 \le \xi < 3) \end{cases} \qquad (6.2.6)$$

$$f_1(1) = f_2(1) = \frac{1}{2}, \quad f_2(2) = f_3(2) = \frac{1}{2} \qquad (6.2.7)$$

$$\left.\frac{df_1}{d\xi}\right|_{\xi=1} = \left.\frac{df_2}{d\xi}\right|_{\xi=1} = 1, \quad \left.\frac{df_2}{d\xi}\right|_{\xi=2} = \left.\frac{df_3}{d\xi}\right|_{\xi=2} = -1 \qquad (6.2.8)$$

$$\left.\frac{d^2 f_1}{d\xi^2}\right|_{\xi=1} \ne \left.\frac{d^2 f_2}{d\xi^2}\right|_{\xi=1}, \quad \left.\frac{d^2 f_2}{d\xi^2}\right|_{\xi=2} \ne \left.\frac{d^2 f_3}{d\xi^2}\right|_{\xi=2} \qquad (6.2.9)$$

That is, the function value and the first derivative are continuous at the interval boundaries ξ = 1 and ξ = 2, while the second derivative is not: N_{3,2}(ξ) is C¹ continuous there.
Figure 6.4a, b, respectively, show six B-spline basis functions of the first order constructed from the knot vector {0, 0, 1, 2, 3, 4, 5, 5}, and seven B-spline basis functions of the second order constructed from {0, 0, 0, 1, 2, 3, 4, 5, 5, 5}.
A knot vector is allowed to repeat the same knot values. The standard knot vector in CAD is an "open knot vector", whose first and last knot values are repeated p + 1 times (i.e., the order of the basis function plus one), and the pth order B-spline basis functions are C^{p−k} continuous at k-times repeated knot values.
Let's discuss some graphs. Figure 6.5a shows eight cubic B-spline basis functions constructed from the open knot vector {0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5}.
The B-spline basis functions satisfy the partition of unity:

$$\sum_{i=1}^{n} N_{i,p}(\xi) = 1 \quad (\text{for arbitrary } \xi) \qquad (6.2.11)$$
The two-dimensional B-spline basis functions N^{p,q}_{i,j}(ξ, η) and the three-dimensional B-spline basis functions N^{p,q,r}_{i,j,k}(ξ, η, ζ) are, respectively, defined as the product of the one-dimensional B-spline basis functions in each axis as follows:

$$N^{p,q}_{i,j}(\xi,\eta) = N_{i,p}(\xi)\cdot M_{j,q}(\eta) \qquad (6.2.12)$$
$$N^{p,q,r}_{i,j,k}(\xi,\eta,\zeta) = N_{i,p}(\xi)\cdot M_{j,q}(\eta)\cdot L_{k,r}(\zeta) \qquad (6.2.13)$$
The one-dimensional NURBS basis functions R_{i,p}(ξ) are defined from the B-spline basis functions as

$$R_{i,p}(\xi) = \frac{N_{i,p}(\xi)\cdot w_i}{\sum_{\hat i=1}^{n} N_{\hat i,p}(\xi)\cdot w_{\hat i}} \qquad (6.2.14)$$
where the new parameter w = {w1 , w2 , . . . , wn }(wi > 0) is called the weight. It is
clear from Eq. (6.2.14) that the NURBS basis functions coincide with the B-spline
basis functions when all the weights are equal. In other words, the B-spline basis
functions are included in the NURBS basis functions, and the NURBS basis functions
have the same properties as the B-spline basis functions as follows:
$$\sum_{i=1}^{n} R_{i,p}(\xi) = 1 \quad (\text{for arbitrary } \xi) \qquad (6.2.16)$$
Figure 6.6a–e shows the seven quadratic NURBS basis functions constructed from the knot vector {0, 0, 0, 1, 2, 3, 4, 5, 5, 5}, where Fig. 6.6a shows the basis functions with w = {1, 1, 1, 1, 1/5, 1, 1}, Fig. 6.6b those with w = {1, 1, 1, 1, 1/2, 1, 1}, Fig. 6.6c those with w = {1, 1, 1, 1, 1, 1, 1}, Fig. 6.6d those with w = {1, 1, 1, 1, 2, 1, 1}, and Fig. 6.6e those with w = {1, 1, 1, 1, 5, 1, 1}, respectively. It can be seen from these graphs that, by changing the value of w₅, not only N_{5,2}(ξ) but also N_{3,2}(ξ), N_{4,2}(ξ), and N_{6,2}(ξ) are affected due to the nature of the partition of unity (Eq. (6.2.16)), while N_{1,2}(ξ) and N_{2,2}(ξ) are not affected at all.
The two-dimensional NURBS basis functions R^{p,q}_{i,j}(ξ, η) and the three-dimensional NURBS basis functions R^{p,q,r}_{i,j,k}(ξ, η, ζ) are also defined using the one-dimensional B-spline basis functions N_{i,p}(ξ), M_{j,q}(η), L_{k,r}(ζ) and weights in each axis, respectively, as

$$R^{p,q}_{i,j}(\xi,\eta) = \frac{N_{i,p}(\xi)\,M_{j,q}(\eta)\,w_{i,j}}{\sum_{\hat i=1}^{n}\sum_{\hat j=1}^{m} N_{\hat i,p}(\xi)\,M_{\hat j,q}(\eta)\,w_{\hat i,\hat j}} \qquad (6.2.17)$$

$$R^{p,q,r}_{i,j,k}(\xi,\eta,\zeta) = \frac{N_{i,p}(\xi)\,M_{j,q}(\eta)\,L_{k,r}(\zeta)\,w_{i,j,k}}{\sum_{\hat i=1}^{n}\sum_{\hat j=1}^{m}\sum_{\hat k=1}^{l} N_{\hat i,p}(\xi)\,M_{\hat j,q}(\eta)\,L_{\hat k,r}(\zeta)\,w_{\hat i,\hat j,\hat k}} \qquad (6.2.18)$$
6.3 NURBS Objects Based on NURBS Basis Functions

Using the NURBS basis functions described in Sect. 6.2, the present section deals with
with the methods to represent three-dimensional object shapes with smooth surfaces,
showing how to edit the basis functions while preserving the shape and how to split
the shape.
Using the NURBS basis functions and the control points Bi (or Bi, j , Bi, j,k ), a
curve C(ξ ), a surface S(ξ, η), and a solid V (ξ, η, ζ ) in the three-dimensional space
are, respectively, defined as follows:
$$C(\xi) = \begin{pmatrix} x(\xi) \\ y(\xi) \\ z(\xi) \end{pmatrix} = \sum_{i}^{n} R^{p}_{i}(\xi)\cdot\begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} = \sum_{i}^{n} R^{p}_{i}(\xi)\cdot B_i \qquad (6.3.1)$$
$$S(\xi,\eta) = \begin{pmatrix} x(\xi,\eta) \\ y(\xi,\eta) \\ z(\xi,\eta) \end{pmatrix} = \sum_{i,j}^{n,m} R^{p,q}_{i,j}(\xi,\eta)\cdot\begin{pmatrix} X_{i,j} \\ Y_{i,j} \\ Z_{i,j} \end{pmatrix} = \sum_{i,j}^{n,m} R^{p,q}_{i,j}(\xi,\eta)\cdot B_{i,j} \qquad (6.3.2)$$

$$V(\xi,\eta,\zeta) = \begin{pmatrix} x(\xi,\eta,\zeta) \\ y(\xi,\eta,\zeta) \\ z(\xi,\eta,\zeta) \end{pmatrix} = \sum_{i,j,k}^{n,m,l} R^{p,q,r}_{i,j,k}(\xi,\eta,\zeta)\cdot\begin{pmatrix} X_{i,j,k} \\ Y_{i,j,k} \\ Z_{i,j,k} \end{pmatrix} = \sum_{i,j,k}^{n,m,l} R^{p,q,r}_{i,j,k}(\xi,\eta,\zeta)\cdot B_{i,j,k} \qquad (6.3.3)$$
As an example, consider a surface S₁(ξ, η) defined by cubic B-spline basis functions and 8 × 8 control points P_{i,j}:

$$S_1(\xi,\eta) = \begin{pmatrix} x(\xi,\eta) \\ y(\xi,\eta) \\ z(\xi,\eta) \end{pmatrix} = \sum_{i=1}^{8}\sum_{j=1}^{8} N_{i,3}(\xi)\cdot M_{j,3}(\eta)\cdot P_{i,j} \qquad (6.3.4)$$
Figure 6.9a, b, respectively, show the locations of the control points and the generated surface (object). When the NURBS (B-spline) basis functions are used to generate objects such as lines, surfaces, and solids, the control points may be located outside the object. It can be derived from Eq. (6.3.2) that a control point B_{α,β} is a part of the object only if R^{p,q}_{α,β}(ξ₀, η₀) is 1 for some ξ₀, η₀. This is because all the other basis functions are 0 from Eqs. (6.2.15) and (6.2.16) in this case, which results in S(ξ₀, η₀) = B_{α,β} from Eq. (6.3.2).
However, as can be seen from Fig. 6.5a, the maximum values of the majority of the basis functions are less than 1, so the control points are considered to lie apart from the object. Note here that one of the basis functions becomes 1 at the knot values at both ends of the open knot vector, indicating that each of the four end points at the corners of the quadrilateral shape in Fig. 6.9c corresponds to one of the control points, respectively.
Figure 6.9d shows the sets of points (called knot lines) where the parameter value equals a knot value. We can see from the figure that, due to the nonlinearity shown in Fig. 6.8, the spacing between the knot lines is wider for the knot values closer to the ends of the knot vector, even though the control points are almost equally spaced. As discussed below, an object defined by the NURBS (B-spline) basis functions can be divided without overlap along a knot line by adding control points.
Let's study how to divide an object defined by the NURBS (B-spline) basis functions. Firstly, it is possible to add control points to the object defined by the NURBS (B-spline) basis functions without changing its shape. This is done by inserting new knot values into the knot vector for each axis as follows.
Assuming a knot vector Ξ = {ξ₁, ξ₂, ξ₃, …, ξ_{n+p−1}, ξ_{n+p}, ξ_{n+p+1}} for generating the pth order basis functions in the ξ-axis and the sequence of n control points {B₁, B₂, …, B_{n−1}, B_n} in the direction of the ξ-axis, let m new knot values be added to the knot vector to obtain Ξ̄ = {ξ̄₁, ξ̄₂, ξ̄₃, …, ξ̄_{n+m+p−1}, ξ̄_{n+m+p}, ξ̄_{n+m+p+1}}. Then, a new sequence of n + m control points {B̄₁, B̄₂, …, B̄_{n+m−1}, B̄_{n+m}} is calculated using
$$\begin{Bmatrix} \bar B_1 \\ \vdots \\ \bar B_{n+m} \end{Bmatrix} = T \begin{Bmatrix} B_1 \\ \vdots \\ B_n \end{Bmatrix} = \begin{bmatrix} T^p_{1,1} & \cdots & T^p_{1,n} \\ \vdots & \ddots & \vdots \\ T^p_{n+m,1} & \cdots & T^p_{n+m,n} \end{bmatrix} \begin{Bmatrix} B_1 \\ \vdots \\ B_n \end{Bmatrix} \qquad (6.3.5)$$
A single interval between adjacent knot values is called a knot span. Among the curves and surfaces defined in Eqs. (6.3.1) and (6.3.2), the curve element corresponding to a single knot span and the surface element corresponding to the direct product of two single knot spans are called a curve segment and a surface segment, respectively.
As for the curve segment, a pth order curve segment corresponding to a single knot span [ξ_i, ξ_{i+1}] is defined by a knot vector {ξ_{i−p}, ξ_{i−p+1}, …, ξ_{i−1}, ξ_i, ξ_{i+1}, ξ_{i+2}, …, ξ_{i+p}, ξ_{i+p+1}} with 2(p + 1) components and p + 1 control points {B_{i−p}, B_{i−p+1}, …, B_{i−1}, B_i}. When the knot vector of a segment has a structure of

$$\{\underbrace{a, \ldots, a}_{p+1}, \underbrace{b, \ldots, b}_{p+1}\} \quad \text{or} \quad \{c, \underbrace{a, \ldots, a}_{p}, \underbrace{b, \ldots, b}_{p}, d\},$$

the segment is called a Bezier segment.
Consider dividing the surface shown in Fig. 6.9d into 25 surface segments along the knot lines. First, to divide the surface into segments in the ξ direction, knot values are added to the original knot vector {0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 5} to construct a new knot vector {0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5}. The basis functions constructed from this new knot vector are shown in Fig. 6.10.
Because of the repeated knot values, the basis functions are C⁰ continuous at the repeated knot values except for both ends, and the knot spans on both sides of a repeated knot value share only one control point at that knot value. This means that both the knot spans and the curve segments defined on them are separable at the knot value without overlapping. As an example, for the sequence of control points corresponding to η = 0, a new sequence of control points is generated from the original one based on Eq. (6.3.5) as follows:
$$\begin{Bmatrix} \bar P_{1,1} \\ \vdots \\ \bar P_{16,1} \end{Bmatrix} = \begin{bmatrix} T^3_{1,1} & \cdots & T^3_{1,8} \\ \vdots & \ddots & \vdots \\ T^3_{16,1} & \cdots & T^3_{16,8} \end{bmatrix} \begin{Bmatrix} P_{1,1} \\ \vdots \\ P_{8,1} \end{Bmatrix} \qquad (6.3.8)$$
In the case of dividing the new control points along the knot line into five segments,
the control points and knot vectors belonging to each segment are shown in Table
6.2. In the case of dividing the surface into 25 surface segments as shown in Fig. 6.11,
those of the two surface segments A and B in the figure are shown in Table 6.3 as
examples. It can be seen that all the segments shown in Tables 6.2 and 6.3 are Bezier
segments.
For Bezier segments, it is possible to elevate or reduce the orders of the basis functions. The procedure for elevating the order of a one-dimensional Bezier (curve) segment is explained as follows. When a pth order segment consisting of p + 1 control points {B₁, B₂, …, B_p, B_{p+1}} is to be elevated in its degree, the new p + 2 control points {B̄₁, B̄₂, …, B̄_{p+1}, B̄_{p+2}} of the (p + 1)-th order segment are generated as

$$\bar B_i = \frac{i-1}{p+1} B_{i-1} + \frac{p+2-i}{p+1} B_i \quad (i = 1, \ldots, p+2) \qquad (6.3.9)$$
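Equation (6.3.9) maps directly to code. A minimal Python sketch (0-based arrays, so B[0] corresponds to B₁):

```python
import numpy as np

def elevate_bezier_degree(B):
    """Degree elevation of a Bezier curve segment by Eq. (6.3.9).

    B : (p + 1, dim) control points of a p-th order segment (B[0] is B_1).
    Returns the p + 2 control points of the equivalent (p + 1)-th order
    segment; the curve itself is unchanged."""
    p = len(B) - 1
    Bnew = np.zeros((p + 2, B.shape[1]))
    for i in range(1, p + 3):                    # i = 1, ..., p + 2 as in Eq. (6.3.9)
        left = (i - 1) / (p + 1) * B[i - 2] if i > 1 else 0.0
        right = (p + 2 - i) / (p + 1) * B[i - 1] if i < p + 2 else 0.0
        Bnew[i - 1] = left + right
    return Bnew
```

Note that the cases i = 1 and i = p + 2 of Eq. (6.3.9) reproduce the first and last control points unchanged, so the end points of the segment are preserved.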
On the other hand, the order reduction procedure for a one-dimensional Bezier (curve) segment is given as follows. When a pth order segment consisting of p + 1 control points {B₁, B₂, …, B_p, B_{p+1}} is to be reduced in its degree, the new p control points {B̄₁, B̄₂, …, B̄_{p−1}, B̄_p} of the (p − 1)-th order segment are generated according to whether p is even or odd, as follows:
In the case of p even:

$$\bar B_i = \begin{cases} B_i & (i = 1) \\ \dfrac{B_i - \alpha_i \bar B_{i-1}}{1 - \alpha_i} & (i = 2, \ldots, r) \\ \dfrac{B_{i+1} - (1 - \alpha_{i+1}) \bar B_{i+1}}{\alpha_{i+1}} & (i = r + 1, \ldots, p - 1) \\ B_{p+1} & (i = p) \end{cases} \qquad (6.3.10)$$

where

$$r = \frac{p-2}{2}, \quad \alpha_i = \frac{i-1}{p} \qquad (6.3.11)$$
In the case of p odd, the new control points are generated by the corresponding formula (Eq. (6.3.12)) with

$$r = \frac{p-1}{2}, \quad \alpha_i = \frac{i-1}{p} \qquad (6.3.13)$$
6.4 Local Contact Search for Surface-to-Surface Contact

In the present section, a contact search method between smooth contact surfaces is studied based on the conventional contact search method.
It is well-recognized that the isogeometric analysis using NURBS as the basis
functions can be applied to the dynamic contact analysis.
First, consider the dynamic analysis based on the isogeometric analysis. In the dynamic explicit method of the finite element method, the first-order basis functions are usually used, because higher-order basis functions often require a smaller value of the time step Δt, increasing the computational load.
On the other hand, in the isogeometric analysis using the NURBS (B-spline) basis
functions, the constraint on the time step Δt is not severe even if the higher-order
basis functions are used. Figure 6.13 shows the maximum time step, with which the
one-dimensional wave propagation analysis under the condition of constant nodal
(control point) spacing can be stably performed. The horizontal axis is the order of the
basis function and the vertical axis the maximum time step width for stable analysis,
shown as the ratio to that in the finite element analysis with the basis functions of the
first order. It can be seen from the figure that in the case of the finite element analysis,
Δt becomes smaller as the order of the basis functions increases, while in the case
of using the B-spline basis functions, the constraint on Δt is conversely relaxed as
the order increases.
As shown in Fig. 6.14, this tendency is even more pronounced when the knot
values are repeated several times (see Sects. 6.2 and 6.3). The horizontal axis is the
knot multiplicity and the vertical axis the maximum time step Δt for stable analysis,
which is again a ratio to that for the case of the finite element analysis with the basis
functions of the first order. As can be seen from the figure, the use of the NURBS basis
functions relaxes the constraint on the time step in the explicit dynamic analysis.
Next, consider the contact analysis, especially the contact search, with the isogeometric analysis. The contact search differs greatly between the ordinary finite element method and the isogeometric analysis using NURBS as the basis functions. As shown in Sect. 6.1, in the former, the local contact search between contact surfaces is based on the calculation of distances between nodes on one of the contact surfaces and segments on the other.
The reason why the contact search can be reduced to a search between a point and a surface (point-to-surface type) is that the nodes are always on the contact surface, and the shape of a segment is simple when the linear basis functions are used (see Fig. 6.12a).
On the other hand, when the contact surface is a NURBS surface, the local
contact search between the contact surfaces is performed between the Bezier surface
segments generated by dividing the contact surfaces. Unlike the point-to-surface type
contact search in the finite element method where a node is used as a representative
point of a contact surface, a Bezier surface segment does not have obvious control
points that represent the shape. This is because the control points constituting a
Bezier surface segment are not necessarily located on the segment. For this reason,
a difficulty arises in using some control points for contact detection.
As an example, Fig. 6.15 shows two Bezier surface segments facing each other
and their control points. In the figure, the control points of the lower segment are
shown in red, and those of the upper segment in blue. The two segments are not in
contact, but their control points intersect each other, which indicates the difficulty of
using some control points for contact detection in this case.
Note that, even in the contact search between two Bezier surface segments, the point-to-surface type contact search can be employed to judge the contact state between them. Specifically, a lot of points are set on one of the contact segments as shown in Fig. 6.16, the point-to-surface type contact search according to Eqs. (6.1.10) and (6.1.11) is performed between each of these points and the other segment, and then the contact state at each point can be determined based on the signed distance in Eq. (6.1.12).
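A minimal Python sketch of this sampling-based check for quadratic rational Bezier segments is shown below; it returns only the unsigned closest distance, omitting the outward normal and hence the sign of Eq. (6.1.12), and the function names and the use of scipy's bounded minimizer are choices of the sketch.

```python
import numpy as np
from scipy.optimize import minimize

def bezier_point(P, w, xi, eta):
    """Point on a quadratic rational Bezier segment, (xi, eta) in [0, 1]^2.
    P : (3, 3, 3) control points, w : (3, 3) weights."""
    b = lambda t: np.array([(1.0 - t) ** 2, 2.0 * t * (1.0 - t), t ** 2])
    W = np.outer(b(xi), b(eta)) * w              # weighted basis values
    return (W[..., None] * P).sum(axis=(0, 1)) / W.sum()

def surface_to_surface_search(Pm, wm, Ps, ws, n=11):
    """Project an n x n grid of slave-segment points onto the master
    segment and return the smallest distance found (unsigned; the
    normal and hence the sign of Eq. (6.1.12) are omitted here)."""
    best = np.inf
    for xi in np.linspace(0.0, 1.0, n):
        for eta in np.linspace(0.0, 1.0, n):
            q = bezier_point(Ps, ws, xi, eta)
            res = minimize(
                lambda t: np.sum((bezier_point(Pm, wm, t[0], t[1]) - q) ** 2),
                x0=np.array([0.5, 0.5]), bounds=[(0.0, 1.0), (0.0, 1.0)])
            best = min(best, float(np.sqrt(res.fun)))
    return best
```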
6.5 Local Contact Search with Deep Learning

In this section, to reduce the computational load of the surface-to-surface local search, a fast and stable local contact search method using feedforward neural networks and deep learning is presented.
Now, let’s look again at the contact between segments. Figure 6.18a shows two
Bezier surface segments in contact, and Fig. 6.18b a rotated version of the segment
pair of Fig. 6.18a. The contact conditions (local coordinates of the contact points,
penetration depth, etc.) in Fig. 6.18a, b are identical. Thus, the contact state is
invariant to translation and rotation, indicating that the contact state between Bezier
surface segments is determined only by the shape of both segments and their relative
arrangement.
Therefore, the local contact search is regarded as a process to obtain a mapping
from the shape and relative arrangement of the two segments to the contact state
between them. By constructing this mapping on a feedforward neural network,
the surface-to-surface local contact search can be performed without iterative
computation [11, 13, 14].
Thus, the surface-to-surface local contact search using a feedforward neural network based on deep learning consists of the following three phases [11].
(1) Data Preparation Phase: A number of pairs of Bezier surface segments with various shapes and relative arrangements are set, and, for each pair, the contact state between the two segments is calculated using the method described in Sect. 6.4. In this way, a large number of data pairs, called training patterns, of shapes and relative arrangements of segments and the corresponding contact states are collected.
Fig. 6.18 Two different pairs of contacting segments with the same state of contact
(2) Training Phase: The patterns collected in the Data Preparation Phase are used to train a feedforward neural network through deep learning with the following conditions:
Input data: shape and relative arrangements of segments,
Teacher data: contact states between the segments.
(3) Application Phase: The feedforward neural network trained in the Training Phase is incorporated into the contact analysis code. Given the shapes and relative arrangements of two Bezier segments, the trained neural network promptly outputs the contact state between them. Thus, the fast surface-to-surface local contact search is performed.
Here, the constraints on the input data are discussed. The shape of a Bezier surface
segment is defined by the sum of the products of the basis functions and the control
points as shown in Sect. 6.3. The basis function of the Bezier surface segment is
common among segments of the same order, so the shape of a surface segment is
effectively determined only by the arrangement of the control points.
Now, consider the relative arrangement of two Bezier surface segments. A Bezier surface segment of the pth order consists of (p + 1)² control points P_{i,j} (i, j = 1, …, p + 1). The relative arrangement of a segment pair can be normalized by the following operations:
(1) Translate a segment and place the control point P1,1 at the origin. (Translation)
(2) Rotate the segment around the z-axis and place the control point Pp+1,1 on the
xz-plane. (Rotation)
(3) Rotate the segment around the y-axis and place the control point Pp+1,1 on the
x-axis. (Rotation)
(4) Rotate the segment around the x-axis and place the control point P1, p+1 on the
xy-plane. (Rotation)
Though the operations above are designed based on only one of the segments of a pair, they are simultaneously applied to both segments of the pair, constraining several degrees of freedom of the control points of one segment while the relative arrangement of the two segments remains unchanged. Thus, the total number of shape parameters of the two segments is reduced, and patterns with the same relative arrangement are consolidated into one, which enables efficient learning in the Training Phase, as sketched below.
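A sketch of operations (1)–(4) for quadratic segments (3 × 3 control grids) is given below, assuming control points are stored as arrays P[i, j] = (x, y, z); the helper names are hypothetical.

```python
import numpy as np

def rot(axis, angle):
    """Rotation matrix about a coordinate axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array({"x": [[1, 0, 0], [0, c, -s], [0, s, c]],
                     "y": [[c, 0, s], [0, 1, 0], [-s, 0, c]],
                     "z": [[c, -s, 0], [s, c, 0], [0, 0, 1]]}[axis])

def normalize_pair(master, slave):
    """Operations (1)-(4) above; master, slave are (3, 3, 3) arrays with
    P[i, j] = (x, y, z). The same rigid motion is applied to both
    segments, so their relative arrangement is kept unchanged."""
    shape = master.shape
    m, s = master.reshape(-1, 3).copy(), slave.reshape(-1, 3).copy()
    t = m.reshape(shape)[0, 0].copy()            # (1) P_{1,1} to the origin
    m -= t; s -= t
    q = m.reshape(shape)[-1, 0]                  # (2) P_{p+1,1} into the xz-plane
    R = rot("z", -np.arctan2(q[1], q[0]))
    m, s = m @ R.T, s @ R.T
    q = m.reshape(shape)[-1, 0]                  # (3) P_{p+1,1} onto the x-axis
    R = rot("y", np.arctan2(q[2], q[0]))
    m, s = m @ R.T, s @ R.T
    q = m.reshape(shape)[0, -1]                  # (4) P_{1,p+1} into the xy-plane
    R = rot("x", -np.arctan2(q[2], q[1]))
    m, s = m @ R.T, s @ R.T
    return m.reshape(shape), s.reshape(shape)
```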
Note that the surface-to-surface local contact search can be applied to Bezier
surface segments of various orders. It can be used in such various combinations as
the contact search between a Bezier surface segment based on the quadratic basis
functions and that based on the cubic basis functions, and that between a segment
of the fourth order and that of the fifth order. In addition, it can be employed for the
cases where the order of Bezier surface segments differs in each axis.
The number of shape parameters of Bezier surface segments varies depending on
the order of the basis functions. Then, when using feedforward neural networks for
local contact search, a neural network has to be constructed for each combination of
the two Bezier surface segments with different orders. However, this is not necessarily
efficient.
We could mitigate this inefficiency by making use of the properties of the Bezier
segments. As shown in Sect. 6.3, the order elevation or reduction can be applied to a
Bezier surface segment. If a Bezier surface segment of arbitrary order is approximated
by that of a predetermined order, then we have only to construct a feedforward neural
network for the pairs of approximated segments of the prescribed order. Thus, the
local contact search between any pair of segments of various orders can be performed
only by a single feedforward neural network. Here, a Bezier surface segment of
arbitrary order is approximated with that of the second order.
However, as shown in Sect. 6.3, a higher-order segment has a higher ability to
represent complex shape (see Fig. 6.12a–c), so the approximation accuracy can be
a problem when, for example, approximating a Bezier surface segment of the fifth
order with that of the second order. In this problem, subdivision of a segment could
be adopted: the complexity of the shape of the smaller segments generated by adding
new knot values and repartitioning the original segments is significantly reduced
[11].
The above process makes it possible to perform the surface-to-surface contact
search with a single feedforward neural network by performing subdivision until
it can be approximated with sufficient accuracy by Bezier surface segments of the
second order.
We show here a numerical example of the local contact search method using deep
learning described in Sect. 6.5.
Let’s discuss an application of the surface-to-surface local contact search using deep
learning to segment pairs whose basis functions are second-order NURBS in both
axes. Here, a feedforward neural network is trained to estimate the contact state
between segments using patterns generated from a large number of segment pairs
(both are Bezier surface segments) with various configurations.
Firstly, a lot of segment pairs are generated. After placing the nine control points
that constitute one of the pair of Bezier surface segments of the second order called
the master segment in the grid reference position (Table 6.5), all the coordinates are
modified by adding uniform random numbers with x-, y-, and z-coordinates of P1,1 ,
the y- and z-coordinates of P_{3,1}, and the z-coordinate of P_{1,3} being fixed. The range of the modification is set to [−0.3, 0.3] for the x-, y-, and z-coordinates, and the weight of each control point is set in the range [0.5, 2.0] using a uniform random number. Thus, a lot of second-order Bezier surface segments of various shapes are generated.
The other segment of the segment pair called the slave segment is also gener-
ated through the same procedure above, then random rotation and translation are
performed on the slave segment. Specifically, for a slave segment, we perform the
rotation around the z-axis in the range of (−π, π ), that around the x-axis in the range
of (−π/4, π/4), then that around the y-axis in the range of (−π/4, π/4), and then
the translation in the range of [−2.0, 2.0] in x- and y-directions and in the range of
[0.0, 1.0] in z-direction. As a result, a large number of segment pairs, i.e., a lot of
pairs of a master segment and the corresponding slave segment with various shapes
and relative positions are created.
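Under these assumptions, generating one training pair could look as follows; this sketch draws the random ranges quoted above but omits, for brevity, re-fixing the six master coordinates.

```python
import numpy as np
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)

def random_master(reference):
    """Perturb a 3 x 3 reference control grid by uniform noise in
    [-0.3, 0.3] and draw weights from [0.5, 2.0]; re-fixing the six
    master coordinates mentioned above is omitted for brevity."""
    pts = reference + rng.uniform(-0.3, 0.3, size=reference.shape)
    return pts, rng.uniform(0.5, 2.0, size=(3, 3))

def random_slave(reference):
    """Slave segment: random shape, then rotations about z, x, y and a
    random translation, using the ranges quoted in the text."""
    pts, w = random_master(reference)
    for axis, amp in (("z", np.pi), ("x", np.pi / 4), ("y", np.pi / 4)):
        r = Rotation.from_euler(axis, rng.uniform(-amp, amp))
        pts = r.apply(pts.reshape(-1, 3)).reshape(pts.shape)
    pts = pts + np.array([rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0),
                          rng.uniform(0.0, 1.0)])
    return pts, w
```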
Out of 72 (= 18 × 4) parameters for the coordinates and weights of the total 18
control points that make up the two Bezier surface segments of the second order, the
66 parameters excluding the six fixed coordinates of the master segment represent
the shape and relative arrangement of each segment pair.
Secondly, for each of a large number of segment pairs generated above, the contact
state is calculated using the method described in Sect. 6.4. The number of sampling
points on the slave segment side is set to 121 (= 11 × 11) located in an equally
spaced grid pattern in each direction of (ξ, η). The contact state data to be estimated
can be selected arbitrarily. Here, the followings are selected as examples of contact
state data.
In this manner, we can obtain a large number of data pairs (patterns) consisting of
shapes and relative arrangement of two segments and contact states between them.
Here, 50,000 patterns representing segment pairs in contact and 50,000 patterns representing those not in contact are generated.
From the 50,000 patterns representing segments in contact with each other, 25,000 patterns are selected at random for training, and the remaining 25,000 patterns are used for verification of the generalization capability.
Three feedforward neural networks, one for each of the contact state quantities, are constructed using the training patterns; the neural networks are trained to estimate the coordinates of the contact points (ξ_S^C, η_S^C) and (ξ_M^C, η_M^C), the average penetration depth D_contact, and the area ratio S_contact, respectively. The number of units in the input layer is 66 for all the above neural networks, and the number of units in the output layer is 4, 1, and 1, respectively.
Based on the results of training with neural networks of various sizes, the neural network for predicting the coordinates of the contact points has been set to have 5 hidden layers and 40 units per hidden layer. In the same way, that for estimating D_contact has 3 hidden layers and 40 units per hidden layer, and that for estimating S_contact has 2 hidden layers and 40 units per hidden layer.
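A sketch of the largest of the three networks (66 inputs, 5 hidden layers of 40 units, 4 outputs) in Keras is shown below; the ReLU activation, the Adam optimizer, and the mean-squared-error loss are assumptions of this sketch, since the text does not specify them.

```python
import tensorflow as tf

def build_contact_point_net():
    """Contact-point network: 66 inputs, 5 hidden layers of 40 units,
    4 outputs (xi_S, eta_S, xi_M, eta_M). Activation, optimizer, and
    loss are assumptions, not specified in the text."""
    layers = [tf.keras.layers.Dense(40, activation="relu", input_shape=(66,))]
    layers += [tf.keras.layers.Dense(40, activation="relu") for _ in range(4)]
    layers += [tf.keras.layers.Dense(4)]         # linear output layer
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")
    return model

# model.fit(x_train, y_train, ...) would then use the 25,000 training patterns.
```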
The estimation accuracy of the trained neural network for estimating the coordinates of the contact point is shown in Fig. 6.20, where Fig. 6.20a shows the distribution of the estimation error on the master side (ξ_M^C, η_M^C) and Fig. 6.20b that on the slave side (ξ_S^C, η_S^C). The estimation errors on the slave side are a little larger than those on the master side, but both are estimated with good accuracy.
The estimation accuracies of the neural networks for the mean penetration depth
and the area ratio are shown in Figs. 6.21 and 6.22, respectively. Both figures show
the distribution of errors in the standardized data range where the maximum value
is 1 and the minimum value 0. Although the estimation accuracy of these values
is lower than that of the contact point coordinates, it can be said that the trained neural networks estimate them well.
As described above, it is shown to be possible to predict various contact states in detail in the surface-to-surface contact search by using deep learning.
References
5. Hallquist, J.O., Goudreau, G.L., Benson, D.J.: Sliding interfaces with contact-impact in large-
scale Lagrangian computations. Comput. Methods Appl. Mech. Eng. 51, 107–137 (1985)
6. Hughes, T.J.R., Cottrell, J.A., Bazilevs, Y.: Isogeometric Analysis: CAD, finite elements,
NURBS, exact geometry, and mesh refinement. Comput. Methods Appl. Mech. Eng. 194,
4135–4195 (2005)
7. Konyukhov, A., Izi, R.: Introduction to Computational Contact Mechanics: A Geometric
Approach. Wiley (2015)
8. Konyukhov, A., Schweizerhof, K.: Computational Contact Mechanics: Geometrically Exact
Theory for Arbitrary Shaped Bodies. Springer (2012)
9. Laursen, T.A.: Computational Contact and Impact Mechanics: Fundamentals of modeling
interfacial phenomena in nonlinear finite element analysis. Springer (2002)
10. Liu, W.N., Meschke, G., Mang, H.A.: A note on the algorithmic stabilization of 2d contact
analyses. Computational Methods in Contact Mechanics IV (edited by Gaul, L. and Brebbia,
C.A.), Wessex Institute, 1999, pp. 231–240.
11. Oishi, A., Yagawa, G.: A surface-to-surface contact search method enhanced by deep learning.
Comput. Mech. 65, 1125–1147 (2020)
12. Oishi, A., Yamada, K., Yoshimura, S., Yagawa, G.: Domain decomposition based parallel
contact algorithm and its implementation to explicit finite element analysis. JSME Int. J. 45A(2),
123–130 (2002)
13. Oishi, A., Yoshimura, S.: A new local contact search method using a multi-layer neural network.
Comput. Model. Eng. Sci. 21(2), 93–103 (2007)
14. Oishi, A., Yoshimura, S.: Genetic approaches to iteration-free local contact search. Comput.
Model. Eng. Sci. 28(2), 127–146 (2008)
15. Piegl, L., Tiller, W.: The NURBS Book 2nd ed. Springer (2000)
16. Puso, M.A., Laursen, T.A.: A 3D contact smoothing method using Gregory patches. Int. J.
Numer. Methods Eng. 54, 1161–1194 (2002)
17. Rogers, D.F.: An Introduction to NURBS with Historical Perspective. Academic Press (2001)
18. Sevilla, R., Fernandez-Mendez, S., Huerta, A.: NURBS-enhanced finite element method
(NEFEM). Int. J. Numer. Methods Eng. 76, 56–83 (2008)
19. Sevilla, R., Fernandez-Mendez, S., Huerta, A.: 3D NURBS-enhanced finite element method
(NEFEM). Int. J. Numer. Methods Eng. 88, 103–125 (2011)
20. Wang, F., Cheng, J., Yao, Z.: FFS contact searching algorithm for dynamic finite element
analysis. Int. J. Numer. Methods Eng. 52, 655–672 (2001)
21. Wriggers, P.: Computational contact mechanics. John Wiley & Sons (2002)
22. Zhong, Z.H.: Finite Element Procedures for Contact-Impact Problems. Oxford University Press (1993)
Chapter 7
Flow Simulation with Deep Learning
Abstract In the previous chapters, we have studied various topics related to the
application of deep learning to solid mechanics. In this chapter, we will discuss the
application of deep learning to fluid dynamics problems. Section 7.1 describes the
basic equations of fluid dynamics, Sect. 7.2 the basics of the finite difference method,
one of the most popular methods for solving fluid dynamics problems, Sect. 7.3 a
practical example of a two-dimensional fluid dynamics simulation, Sect. 7.4 the
formulation of the application of deep learning to fluid dynamics problems, Sect. 7.5
recurrent neural networks that are suitable for the time-dependent problems covered
in this chapter, and finally, Sect. 7.6 a real application of deep learning to the fluid
dynamics simulation.
7.1 Equations for Flow Simulation

First, let's derive the basic equations of fluid mechanics (dynamics), which consist of the following three conservation laws:
the law of the conservation of mass,
the law of the conservation of momentum, and
the law of the conservation of energy,
together with the constitutive equation, which describes the properties of specific fluids.
Assume both the velocity v(m/s) and mass density ρ(kg/m3 ) of the fluid to be
functions of position and time as follows:
$$v = \begin{pmatrix} u(x,y,z,t) \\ v(x,y,z,t) \\ w(x,y,z,t) \end{pmatrix}, \quad \rho = \rho(x,y,z,t) \qquad (7.1.1)$$
Consider a small rectangular parallelepiped of dimensions dx × dy × dz fixed in space (Fig. 7.1). The rate of change of the mass inside it is equal to the sum of the masses entering and leaving it (the law of the conservation of mass):
$$\begin{aligned}
\frac{\partial}{\partial t}(\rho\,dxdydz) ={}& \left\{\rho u\,dydz - \left(\rho u + \frac{\partial(\rho u)}{\partial x}dx\right)dydz\right\} \\
&+ \left\{\rho v\,dzdx - \left(\rho v + \frac{\partial(\rho v)}{\partial y}dy\right)dzdx\right\} \\
&+ \left\{\rho w\,dxdy - \left(\rho w + \frac{\partial(\rho w)}{\partial z}dz\right)dxdy\right\}
\end{aligned} \qquad (7.1.2)$$
Rearranging this equation, we have the equation of continuity:

$$\frac{\partial \rho}{\partial t} + \frac{\partial(\rho u)}{\partial x} + \frac{\partial(\rho v)}{\partial y} + \frac{\partial(\rho w)}{\partial z} = 0 \qquad (7.1.3)$$

Similarly, from the law of the conservation of momentum, the equation of motion in the x-axis direction is obtained as

$$\rho\frac{Du}{Dt} = \frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{yx}}{\partial y} + \frac{\partial \sigma_{zx}}{\partial z} + \rho F_x \qquad (7.1.5)$$
The equations of motion in the y- and z-axis directions can be obtained similarly as

$$\rho\frac{Dv}{Dt} = \frac{\partial \sigma_{xy}}{\partial x} + \frac{\partial \sigma_{yy}}{\partial y} + \frac{\partial \sigma_{zy}}{\partial z} + \rho F_y \qquad (7.1.6)$$
$$\rho\frac{Dw}{Dt} = \frac{\partial \sigma_{xz}}{\partial x} + \frac{\partial \sigma_{yz}}{\partial y} + \frac{\partial \sigma_{zz}}{\partial z} + \rho F_z \qquad (7.1.7)$$
where Du/Dt, Dv/Dt, and Dw/Dt are called the material derivatives, which take into account the movement of matter, and are defined by

$$\begin{aligned}
\frac{Du}{Dt} &= \frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + v\frac{\partial u}{\partial y} + w\frac{\partial u}{\partial z} \\
\frac{Dv}{Dt} &= \frac{\partial v}{\partial t} + u\frac{\partial v}{\partial x} + v\frac{\partial v}{\partial y} + w\frac{\partial v}{\partial z} \\
\frac{Dw}{Dt} &= \frac{\partial w}{\partial t} + u\frac{\partial w}{\partial x} + v\frac{\partial w}{\partial y} + w\frac{\partial w}{\partial z}
\end{aligned} \qquad (7.1.8)$$
Note that Eqs. (7.1.5)–(7.1.7) are also called Euler’s equations of motion.
The constitutive equation of the Newtonian fluid can be written as

$$\begin{aligned}
\sigma_{xx} &= -p + \lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right) + 2\mu\frac{\partial u}{\partial x} \\
\sigma_{yy} &= -p + \lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right) + 2\mu\frac{\partial v}{\partial y} \\
\sigma_{zz} &= -p + \lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right) + 2\mu\frac{\partial w}{\partial z} \\
\sigma_{xy} &= \sigma_{yx} = \mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right) \\
\sigma_{yz} &= \sigma_{zy} = \mu\left(\frac{\partial v}{\partial z} + \frac{\partial w}{\partial y}\right) \\
\sigma_{zx} &= \sigma_{xz} = \mu\left(\frac{\partial w}{\partial x} + \frac{\partial u}{\partial z}\right)
\end{aligned} \qquad (7.1.9)$$
where p is the pressure (N/m²), μ the viscosity coefficient (N·s/m²), and λ the second viscosity coefficient (N·s/m²). Note that a fluid taking Eq. (7.1.9) as its constitutive equation is called a Newtonian fluid.
Substituting Eq. (7.1.9) into Eqs. (7.1.5) to (7.1.7), the Navier–Stokes equations are obtained as follows:

$$\begin{aligned}
\rho\frac{Du}{Dt} ={}& -\frac{\partial p}{\partial x} + \frac{\partial}{\partial x}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial x}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial x}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial u}{\partial z}\right) + \rho F_x
\end{aligned} \qquad (7.1.10)$$

$$\begin{aligned}
\rho\frac{Dv}{Dt} ={}& -\frac{\partial p}{\partial y} + \frac{\partial}{\partial y}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial y}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial v}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial v}{\partial z}\right) + \rho F_y
\end{aligned} \qquad (7.1.11)$$

$$\begin{aligned}
\rho\frac{Dw}{Dt} ={}& -\frac{\partial p}{\partial z} + \frac{\partial}{\partial z}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial z}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial z}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial z}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial w}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial w}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial z}\right) + \rho F_z
\end{aligned} \qquad (7.1.12)$$
When the fluid is incompressible and the viscosity coefficient is constant, the Navier–Stokes equations are simplified as follows:

$$\begin{aligned}
\rho\frac{Du}{Dt} &= -\frac{\partial p}{\partial x} + \mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) + \rho F_x \\
\rho\frac{Dv}{Dt} &= -\frac{\partial p}{\partial y} + \mu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} + \frac{\partial^2 v}{\partial z^2}\right) + \rho F_y \\
\rho\frac{Dw}{Dt} &= -\frac{\partial p}{\partial z} + \mu\left(\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} + \frac{\partial^2 w}{\partial z^2}\right) + \rho F_z
\end{aligned} \qquad (7.1.14)$$
Finally, let’s discuss the law of conservation of energy. The energy E of a fluid
per unit mass is given as the sum of kinetic energy, internal energy e, and potential
energy Ω as follows:
$$E = \frac{1}{2}\left(u^2 + v^2 + w^2\right) + e + \Omega \qquad (7.1.15)$$
The energy in the small parallelepiped shown in Fig. 7.1 is E × ρ dxdydz, and its rate of change with time is given by

$$\frac{\partial(\rho E)}{\partial t}dxdydz = \dot Q\,dxdydz + \dot W - \dot E - \dot q \qquad (7.1.16)$$

where Q is the amount of heat generated inside or directly flowing in from outside the fluid, q the amount of heat flowing out to the fluid around the small parallelepiped, E the energy flowing out to the surroundings due to convection, W the work done by the surroundings due to pressure or viscous forces, and the overdot means the variation per unit time.
Now, let's look at each term of Eq. (7.1.16) in detail. First, Q̇ should be dealt with individually after the specific heat source is determined, and for now, we assume

$$\dot Q = \frac{\partial Q}{\partial t} \qquad (7.1.17)$$
Next, Ẇ is calculated from the work done by the stress on each surface of the infinitesimal parallelepiped. Let us calculate the work done by the stress in the x-direction (see Fig. 7.4). On the AEHD surface, the stress in the x-direction is σ_xx, while the fluid velocity in this direction is u, meaning that the work done to the parallelepiped is −uσ_xx dydz. The negative sign is due to the fact that the direction of the stress and the direction of the flow (displacement) are opposite. Since the work on the opposite BCGF surface is (uσ_xx + ∂(uσ_xx)/∂x · dx) dydz, the sum of the works on these two surfaces is calculated as follows:

$$-u\sigma_{xx}\,dydz + \left(u\sigma_{xx} + \frac{\partial(u\sigma_{xx})}{\partial x}dx\right)dydz = \frac{\partial(u\sigma_{xx})}{\partial x}dxdydz \qquad (7.1.18)$$
The works in the x-direction on the surfaces ABFE and CDHG, and those on the surfaces ADCB and EFGH, are also calculated in the same manner; thus, all the works in the x-direction are obtained.
Since the works in the y- and z-directions are calculated in the same way, we have

$$\begin{aligned}
\dot W ={}& \left\{\frac{\partial(u\sigma_{xx})}{\partial x} + \frac{\partial(u\sigma_{xy})}{\partial y} + \frac{\partial(u\sigma_{xz})}{\partial z}\right\}dxdydz \\
&+ \left\{\frac{\partial(v\sigma_{yx})}{\partial x} + \frac{\partial(v\sigma_{yy})}{\partial y} + \frac{\partial(v\sigma_{yz})}{\partial z}\right\}dxdydz \\
&+ \left\{\frac{\partial(w\sigma_{zx})}{\partial x} + \frac{\partial(w\sigma_{zy})}{\partial y} + \frac{\partial(w\sigma_{zz})}{\partial z}\right\}dxdydz
\end{aligned} \qquad (7.1.19)$$
Third, as for Ė, the balance of energy inflow from each surface is calculated using Fig. 7.5, which shows the energy inflow and outflow per unit area for each surface. Since E is the energy per unit mass, the energy inflow from the surface AEGD is E × ρu dydz, etc. Thus, the total balance of energy flow for the infinitesimal parallelepiped is obtained as follows:

$$\dot E = \left\{\frac{\partial(\rho u E)}{\partial x} + \frac{\partial(\rho v E)}{\partial y} + \frac{\partial(\rho w E)}{\partial z}\right\}dxdydz \qquad (7.1.20)$$
Finally, as for the term due to heat conduction q̇, the amount of heat flowing in and out per unit area of each surface is calculated with the temperature T and the heat conduction coefficient κ according to Fourier's law, as shown in Fig. 7.6. Then, the balance for the whole infinitesimal parallelepiped is calculated as

$$\dot q = -\left\{\frac{\partial}{\partial x}\left(\kappa\frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(\kappa\frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(\kappa\frac{\partial T}{\partial z}\right)\right\}dxdydz \qquad (7.1.21)$$
Substituting Eqs. (7.1.17), (7.1.19), (7.1.20), and (7.1.21) into Eq. (7.1.16) and rearranging them using Euler's equations of motion Eqs. (7.1.5)–(7.1.7), the constitutive equation Eq. (7.1.9), and the continuity equation Eqs. (7.1.3) and (7.1.15), the energy equation is obtained as follows:

$$\rho\frac{De}{Dt} = \frac{\partial Q}{\partial t} + \frac{\partial}{\partial x}\left(\kappa\frac{\partial T}{\partial x}\right) + \frac{\partial}{\partial y}\left(\kappa\frac{\partial T}{\partial y}\right) + \frac{\partial}{\partial z}\left(\kappa\frac{\partial T}{\partial z}\right) - p\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right) + \phi \qquad (7.1.22)$$
From the above, the basic equations for fluid dynamics can be summarized as follows.
The equation of continuity derived from the law of the conservation of mass:

$$\frac{\partial \rho}{\partial t} + \frac{\partial(\rho u)}{\partial x} + \frac{\partial(\rho v)}{\partial y} + \frac{\partial(\rho w)}{\partial z} = 0 \qquad (7.1.24)$$

The equations of motion (the Navier–Stokes equations) derived from the law of the conservation of momentum:
$$\begin{aligned}
\rho\frac{Du}{Dt} ={}& -\frac{\partial p}{\partial x} + \frac{\partial}{\partial x}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial x}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial x}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial u}{\partial z}\right) + \rho F_x
\end{aligned} \qquad (7.1.25)$$

$$\begin{aligned}
\rho\frac{Dv}{Dt} ={}& -\frac{\partial p}{\partial y} + \frac{\partial}{\partial y}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial y}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial y}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial v}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial v}{\partial z}\right) + \rho F_y
\end{aligned} \qquad (7.1.26)$$

$$\begin{aligned}
\rho\frac{Dw}{Dt} ={}& -\frac{\partial p}{\partial z} + \frac{\partial}{\partial z}\left\{\lambda\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\right\} \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial u}{\partial z}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial v}{\partial z}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial z}\right) \\
&+ \frac{\partial}{\partial x}\left(\mu\frac{\partial w}{\partial x}\right) + \frac{\partial}{\partial y}\left(\mu\frac{\partial w}{\partial y}\right) + \frac{\partial}{\partial z}\left(\mu\frac{\partial w}{\partial z}\right) + \rho F_z
\end{aligned} \qquad (7.1.27)$$
Equation (7.1.29) is also called the heat conduction equation. Using ϕ calculated
from the velocity field obtained by solving the continuity equation (Eq. (7.1.24)) and
the Navier–Stokes equations (Eqs. (7.1.25)–(7.1.27)), the temperature field can be
obtained from Eq. (7.1.29).
7.2 Finite Difference Approximation

Equation (7.2.2) has a drawback in that it does not provide any information about the approximation accuracy, while the well-known Taylor expansion can be used as an error-estimable difference approximation for the derivatives [2, 15], which is given as follows:
$$\varphi(a + \Delta x) = \varphi(a) + \Delta x \left.\frac{d\varphi}{dx}\right|_{x=a} + \frac{1}{2}(\Delta x)^2 \left.\frac{d^2\varphi}{dx^2}\right|_{x=a} + \frac{1}{6}(\Delta x)^3 \left.\frac{d^3\varphi}{dx^3}\right|_{x=a} + \cdots = \sum_{n=0}^{\infty}\frac{1}{n!}(\Delta x)^n \left.\frac{d^n\varphi}{dx^n}\right|_{x=a} \qquad (7.2.3)$$
where

$$0! = 1, \quad \frac{d^0\varphi}{dx^0} = \varphi(x) \qquad (7.2.4)$$
Using these notations, the Taylor expansions (Eqs. (7.2.3) and (7.2.5)) can be compactly written, respectively, as

$$\varphi_{l+1} = \sum_{n=0}^{M-1}\frac{1}{n!}\varphi_l^{(n)} h^n + \frac{1}{M!}\varphi_{l+\theta}^{(M)} h^M \quad (0 \le \theta \le 1) \qquad (7.2.8)$$
By rearranging Eq. (7.2.7) with respect to φ_l^{(1)}, a difference approximation for the first-order derivative is obtained as

$$\varphi_l^{(1)} = \frac{\varphi_{l+1} - \varphi_l}{h} - \frac{1}{2}\varphi_l^{(2)} h - \frac{1}{6}\varphi_l^{(3)} h^2 - \frac{1}{24}\varphi_l^{(4)} h^3 - \frac{1}{120}\varphi_l^{(5)} h^4 - \cdots = \frac{\varphi_{l+1} - \varphi_l}{h} + O(h) \qquad (7.2.9)$$
where O(h) is the term for the approximation error, which means that, as the grid
spacing h decreases, the approximation error decreases in proportion to the grid
spacing.
On the other hand, φ_{l−1} = φ(x_l − Δx) can be expressed by the Taylor expansion as

$$\varphi_{l-1} = \varphi_l + \varphi_l^{(1)}(-h) + \frac{1}{2}\varphi_l^{(2)}(-h)^2 + \frac{1}{6}\varphi_l^{(3)}(-h)^3 + \frac{1}{24}\varphi_l^{(4)}(-h)^4 + \cdots = \varphi_l - \varphi_l^{(1)} h + \frac{1}{2}\varphi_l^{(2)} h^2 - \frac{1}{6}\varphi_l^{(3)} h^3 + \frac{1}{24}\varphi_l^{(4)} h^4 - \frac{1}{120}\varphi_l^{(5)} h^5 + \cdots \qquad (7.2.10)$$
$$\varphi_{l+1} - \varphi_{l-1} = 2\varphi_l^{(1)} h + \frac{2}{6}\varphi_l^{(3)} h^3 + \frac{2}{120}\varphi_l^{(5)} h^5 + \cdots \qquad (7.2.13)$$

$$\varphi_{l+1} + \varphi_{l-1} = 2\varphi_l + \varphi_l^{(2)} h^2 + \frac{2}{24}\varphi_l^{(4)} h^4 + \cdots \qquad (7.2.15)$$
By replacing Δx with 2Δx in the Taylor expansion Eq. (7.2.3), we have

$$\varphi_{l+2} = \varphi_l + \varphi_l^{(1)}(2h) + \frac{1}{2}\varphi_l^{(2)}(2h)^2 + \frac{1}{6}\varphi_l^{(3)}(2h)^3 + \frac{1}{24}\varphi_l^{(4)}(2h)^4 + \cdots = \varphi_l + 2\varphi_l^{(1)} h + 4\cdot\frac{1}{2}\varphi_l^{(2)} h^2 + 8\cdot\frac{1}{6}\varphi_l^{(3)} h^3 + 16\cdot\frac{1}{24}\varphi_l^{(4)} h^4 + \cdots \qquad (7.2.17)$$
Similarly, by replacing Δx with −2Δx in the Taylor expansion Eq. (7.2.3), we have

$$\varphi_{l-2} = \varphi_l + \varphi_l^{(1)}(-2h) + \frac{1}{2}\varphi_l^{(2)}(-2h)^2 + \frac{1}{6}\varphi_l^{(3)}(-2h)^3 + \frac{1}{24}\varphi_l^{(4)}(-2h)^4 + \cdots = \varphi_l - 2\varphi_l^{(1)} h + 4\cdot\frac{1}{2}\varphi_l^{(2)} h^2 - 8\cdot\frac{1}{6}\varphi_l^{(3)} h^3 + 16\cdot\frac{1}{24}\varphi_l^{(4)} h^4 - \cdots \qquad (7.2.18)$$
By setting the coefficient of φl(1) to 1 and the coefficients of φl(2) , φl(3) and φl(4) to
0 in Eq. (7.2.19), a finite difference approximation for φl(1) is obtained by solving the
following simultaneous linear equations.
$$\begin{cases} 2a + b - c - 2d = 1 \\ 4a + b + c + 4d = 0 \\ 8a + b - c - 8d = 0 \\ 16a + b + c + 16d = 0 \end{cases} \qquad (7.2.20)$$

The solution is

$$a = -\frac{1}{12}, \quad b = \frac{2}{3}, \quad c = -\frac{2}{3}, \quad d = \frac{1}{12} \qquad (7.2.21)$$
Substituting these values into Eq. (7.2.19), another finite difference approximation of the first-order derivative is obtained as

$$\varphi_l^{(1)} = \frac{-\varphi_{l+2} + 8\varphi_{l+1} - 8\varphi_{l-1} + \varphi_{l-2}}{12h} + 4\cdot\frac{1}{120}\varphi_l^{(5)} h^4 + \cdots = \frac{-\varphi_{l+2} + 8\varphi_{l+1} - 8\varphi_{l-1} + \varphi_{l-2}}{12h} + O(h^4) \qquad (7.2.22)$$
This approximation formula has the fourth-order accuracy, more accurate than
the central difference method of the second-order accuracy Eq. (7.2.14).
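The simultaneous equations (7.2.20) and their relatives are convenient to solve numerically. The following short Python script solves Eq. (7.2.20) and checks the resulting formula (7.2.22) on φ = sin x:

```python
import numpy as np

# Coefficient matrix and right-hand side of Eq. (7.2.20)
A = np.array([[2.0, 1.0, -1.0, -2.0],
              [4.0, 1.0, 1.0, 4.0],
              [8.0, 1.0, -1.0, -8.0],
              [16.0, 1.0, 1.0, 16.0]])
a, b, c, d = np.linalg.solve(A, np.array([1.0, 0.0, 0.0, 0.0]))
print(a, b, c, d)                # -1/12, 2/3, -2/3, 1/12, as in Eq. (7.2.21)

# Check the resulting formula (7.2.22) on phi = sin(x) at x = 1
h, x = 0.01, 1.0
approx = (a * np.sin(x + 2 * h) + b * np.sin(x + h)
          + c * np.sin(x - h) + d * np.sin(x - 2 * h)) / h
print(abs(approx - np.cos(x)))   # residual error of order h^4
```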
In the same way, another difference approximation of φl(2) can be obtained by
setting the coefficient of φl(2) to 2 and the coefficients of φl(1) , φl(3) and φl(4) to 0 in
Eq. (7.2.19). Thus, the following simultaneous linear equations are to be solved.
$$\begin{cases} 2a + b - c - 2d = 0 \\ 4a + b + c + 4d = 2 \\ 8a + b - c - 8d = 0 \\ 16a + b + c + 16d = 0 \end{cases} \qquad (7.2.23)$$

The solution is

$$a = -\frac{1}{12}, \quad b = \frac{4}{3}, \quad c = \frac{4}{3}, \quad d = -\frac{1}{12} \qquad (7.2.24)$$
Substituting these values into Eq. (7.2.19), another finite difference approximation of the second-order derivative is obtained as follows:

$$\varphi_l^{(2)} = \frac{-\varphi_{l+2} + 16\varphi_{l+1} - 30\varphi_l + 16\varphi_{l-1} - \varphi_{l-2}}{12h^2} + 8\cdot\frac{1}{720}\varphi_l^{(6)} h^4 + \cdots = \frac{-\varphi_{l+2} + 16\varphi_{l+1} - 30\varphi_l + 16\varphi_{l-1} - \varphi_{l-2}}{12h^2} + O(h^4) \qquad (7.2.25)$$
This approximation formula has the fourth-order accuracy, which is more accu-
rate than the second-order accurate approximation of the second-order derivative
Eq. (7.2.16). Thus, by expanding the range of sampling function values used in the
finite difference approximation, it is possible to create formulae of derivatives of
various orders with different accuracy.
In addition, it is possible to set the range of function values to be sampled asym-
metrically with respect to the evaluation point of the derivative. For example, let’s
set a = 0 in Eq. (7.2.19), which results in using asymmetric four points in the
neighborhood of the evaluation point, φl+1 , φl , φl−1 and φl−2 . Thus, the following
simultaneous linear equations are to be solved to set the coefficient of φl(1) to 1 and
the coefficients of φl(2) , φl(3) and φl(4) to 0 in Eq. (7.2.19).
$$\begin{cases} b - c - 2d = 1 \\ b + c + 4d = 0 \\ b - c - 8d = 0 \end{cases} \qquad (7.2.26)$$

The solution is

$$a = 0, \quad b = \frac{1}{3}, \quad c = -1, \quad d = \frac{1}{6} \qquad (7.2.27)$$
Substituting these values into Eq. (7.2.19), another finite difference approximation of the first-order derivative is obtained as follows:

$$\varphi_l^{(1)} = \frac{2\varphi_{l+1} + 3\varphi_l - 6\varphi_{l-1} + \varphi_{l-2}}{6h} - 2\cdot\frac{1}{24}\varphi_l^{(4)} h^3 + \cdots = \frac{2\varphi_{l+1} + 3\varphi_l - 6\varphi_{l-1} + \varphi_{l-2}}{6h} + O(h^3) \qquad (7.2.28)$$
The principal difference approximations for the first-order derivative are summarized as follows.
The forward difference approximation with accuracy O(h):

$$\varphi_l^{(1)} = \frac{\varphi_{l+1} - \varphi_l}{h} + O(h) \qquad (7.2.29)$$

The backward difference approximation with accuracy O(h):

$$\varphi_l^{(1)} = \frac{\varphi_l - \varphi_{l-1}}{h} + O(h) \qquad (7.2.30)$$

The central difference approximation with accuracy O(h²):

$$\varphi_l^{(1)} = \frac{\varphi_{l+1} - \varphi_{l-1}}{2h} + O(h^2) \qquad (7.2.31)$$

The difference approximation with accuracy O(h³) (Eq. (7.2.28)):

$$\varphi_l^{(1)} = \frac{2\varphi_{l+1} + 3\varphi_l - 6\varphi_{l-1} + \varphi_{l-2}}{6h} + O(h^3)$$

The difference approximation with accuracy O(h⁴) (Eq. (7.2.22)):

$$\varphi_l^{(1)} = \frac{-\varphi_{l+2} + 8\varphi_{l+1} - 8\varphi_{l-1} + \varphi_{l-2}}{12h} + O(h^4)$$
A two-dimensional grid is shown in Fig. 7.9. The partial derivatives in each axial
direction can be calculated in the same way as in the one-dimensional case. For
example, the difference approximations for the first-order partial derivatives by the
central difference are obtained as follows:
$$\left.\frac{\partial\varphi}{\partial x}\right|_{i,j} = \frac{\varphi_{i+1,j} - \varphi_{i-1,j}}{2\Delta x} + O(\Delta x^2) \qquad (7.2.36)$$
$$\left.\frac{\partial\varphi}{\partial y}\right|_{i,j} = \frac{\varphi_{i,j+1} - \varphi_{i,j-1}}{2\Delta y} + O(\Delta y^2) \qquad (7.2.37)$$
Similarly, Fig. 7.10 shows a three-dimensional grid. The partial derivatives in each
axial direction can be calculated in the same way as in the one-dimensional case. For
example, the difference approximations for the first-order partial derivatives by the
central difference are obtained as follows:
$$\left.\frac{\partial\varphi}{\partial x}\right|_{i,j,k} = \frac{\varphi_{i+1,j,k} - \varphi_{i-1,j,k}}{2\Delta x} + O(\Delta x^2) \qquad (7.2.38)$$
$$\left.\frac{\partial\varphi}{\partial y}\right|_{i,j,k} = \frac{\varphi_{i,j+1,k} - \varphi_{i,j-1,k}}{2\Delta y} + O(\Delta y^2) \qquad (7.2.39)$$
$$\left.\frac{\partial\varphi}{\partial z}\right|_{i,j,k} = \frac{\varphi_{i,j,k+1} - \varphi_{i,j,k-1}}{2\Delta z} + O(\Delta z^2) \qquad (7.2.40)$$
In this section, the basic equations of fluid dynamics derived in Sect. 7.1 are discretized using the finite difference approximations studied in Sect. 7.2, and a method for obtaining the solution is presented together with a test result.
For an incompressible fluid, the equation of continuity and the Navier–Stokes equations are written as

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0 \qquad (7.3.1)$$

$$\begin{aligned}
\rho\frac{Du}{Dt} &= -\frac{\partial p}{\partial x} + \mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) + \rho F_x \\
\rho\frac{Dv}{Dt} &= -\frac{\partial p}{\partial y} + \mu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} + \frac{\partial^2 v}{\partial z^2}\right) + \rho F_y \\
\rho\frac{Dw}{Dt} &= -\frac{\partial p}{\partial z} + \mu\left(\frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} + \frac{\partial^2 w}{\partial z^2}\right) + \rho F_z
\end{aligned} \qquad (7.3.2)$$
Using the representative length L (m) and the representative speed U (m/s), we define the non-dimensional quantities as follows:

$$\tilde x = \frac{x}{L}, \quad \tilde y = \frac{y}{L}, \quad \tilde z = \frac{z}{L} \qquad (7.3.3)$$
$$\tilde u = \frac{u}{U}, \quad \tilde v = \frac{v}{U}, \quad \tilde w = \frac{w}{U} \qquad (7.3.4)$$
$$\tilde t = \frac{U}{L}\,t \qquad (7.3.5)$$
Using these non-dimensional values, Eqs. (7.3.1) and (7.3.2) are converted to the non-dimensional equations as

$$\frac{\partial \tilde u}{\partial \tilde x} + \frac{\partial \tilde v}{\partial \tilde y} + \frac{\partial \tilde w}{\partial \tilde z} = 0 \qquad (7.3.6)$$

$$\begin{aligned}
\frac{D\tilde u}{D\tilde t} &= -\frac{\partial \tilde p}{\partial \tilde x} + \frac{1}{Re}\left(\frac{\partial^2 \tilde u}{\partial \tilde x^2} + \frac{\partial^2 \tilde u}{\partial \tilde y^2} + \frac{\partial^2 \tilde u}{\partial \tilde z^2}\right) + \tilde F_x \\
\frac{D\tilde v}{D\tilde t} &= -\frac{\partial \tilde p}{\partial \tilde y} + \frac{1}{Re}\left(\frac{\partial^2 \tilde v}{\partial \tilde x^2} + \frac{\partial^2 \tilde v}{\partial \tilde y^2} + \frac{\partial^2 \tilde v}{\partial \tilde z^2}\right) + \tilde F_y \\
\frac{D\tilde w}{D\tilde t} &= -\frac{\partial \tilde p}{\partial \tilde z} + \frac{1}{Re}\left(\frac{\partial^2 \tilde w}{\partial \tilde x^2} + \frac{\partial^2 \tilde w}{\partial \tilde y^2} + \frac{\partial^2 \tilde w}{\partial \tilde z^2}\right) + \tilde F_z
\end{aligned} \qquad (7.3.7)$$
where p̃, F̃_i, and Re are also non-dimensional values defined, respectively, as

$$\tilde p = \frac{p}{\rho U^2} \qquad (7.3.8)$$
$$\tilde F_i = \frac{L}{U^2} F_i \quad (i = x, y, z) \qquad (7.3.9)$$
$$Re = \frac{\rho U L}{\mu} \qquad (7.3.10)$$
The flow field of an incompressible fluid can be obtained by solving Eqs. (7.3.6)
and (7.3.7). For simplicity, the external force term is assumed to be absent in the
following discussion.
Let ṽⁿ and p̃ⁿ be the velocity and pressure at the nth time step, respectively. Then, by discretizing Eq. (7.3.7) with respect to time, the explicit equation that represents ṽ^{n+1} is obtained as

$$\tilde v^{n+1} = L_1\left(\tilde v^n, \tilde p^{n+1}\right) \qquad (7.3.11)$$

Taking the divergence of Eq. (7.3.11) and employing Eq. (7.3.6), Poisson's equation for the pressure p̃ is obtained as follows:

$$\nabla\cdot\nabla \tilde p^{n+1} = L_2\left(\tilde v^n\right) \qquad (7.3.12)$$
In the pseudo-compressibility method, instead of solving Eq. (7.3.12), the continuity equation is augmented with a pseudo-time derivative of the pressure using a pseudo-compressibility parameter β:

$$\frac{1}{\beta}\frac{\partial \tilde p}{\partial \tau} + \frac{\partial \tilde u}{\partial \tilde x} + \frac{\partial \tilde v}{\partial \tilde y} + \frac{\partial \tilde w}{\partial \tilde z} = 0 \qquad (7.3.13)$$
Based on the discussion in the previous sections, this section presents an example of the analysis of a two-dimensional flow field using the finite difference method, whose results will be used as the training data for deep learning in Sect. 7.6 [10].
The basic equations in the two-dimensional space are given as follows:
∂ũ/∂x̃ + ∂ṽ/∂ỹ = 0   (7.3.14)
Dũ/Dt̃ = −∂p̃/∂x̃ + (1/Re)(∂²ũ/∂x̃² + ∂²ũ/∂ỹ²)
Dṽ/Dt̃ = −∂p̃/∂ỹ + (1/Re)(∂²ṽ/∂x̃² + ∂²ṽ/∂ỹ²)   (7.3.15)
The analysis domain is shown in Fig. 7.11, and the specifications for the analysis
are summarized in Table 7.1.
As for the boundary conditions, the inlet boundary has a uniform flow in the x-direction, and the pressure and velocity at the outlet boundary are extrapolated from nearby values. For the side boundaries, a constant pressure condition is imposed, and the velocity is extrapolated from nearby values. These boundary conditions are not necessarily accurate, but they have proved accurate enough for generating training data for deep learning.
The finite difference method is employed as the numerical solution method, together with the pseudo-compressibility method. For the discretization in the spatial domain, the Monotonic Upstream-centered Scheme for Conservation Laws (MUSCL) approximation [8], a kind of upwind difference method, is used to achieve third-order accuracy. Note that the simulation took about 1.5 s per time step (CPU: Intel Core i7, 2.5 GHz).
We will show an example of the visualization of the computed velocity field in what follows. The vorticity ω is the rotation of the velocity field; for the velocity v = (u, v, w)^T in the three-dimensional space, it is defined as
ω = ∇ × v = ( ∂w/∂y − ∂v/∂z,  ∂u/∂z − ∂w/∂x,  ∂v/∂x − ∂u/∂y )^T   (7.3.16)
Fig. 7.12 shows the time variation of the vorticity around and behind the circular cylinder, where the four images in the figure are, respectively, vorticity images at time steps 500, 1000, 1500, and 2000 from top to bottom. Note that these images only show the region near and behind the cylinder, not the entire analysis domain. According to the visualization of the vorticity in the entire analysis domain, it is confirmed that the flow generates the Kármán vortex street, and the vortex shedding is repeated at regular intervals. It is also confirmed that twin vortices are formed by the 500th time step, the twin vortices lose their symmetry near the 1000th time step, the vortex detachment occurs near the 1500th time step, and the vortex street is fully released by the 3000th time step.
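Producing such vorticity images from the computed velocity field is straightforward. The following is a small sketch of our own (not the book's code); the function name vorticity_image, the grid spacing h, and the use of the PIL library are assumptions.

import numpy as np
from PIL import Image

# Compute the z-component of vorticity, omega = dv/dx - du/dy, on a uniform
# grid and save it as an 8-bit grayscale image like those in Fig. 7.12.
def vorticity_image(u, v, h, path="vorticity.png"):
    ddx = lambda f: (np.roll(f, -1, 0) - np.roll(f, 1, 0)) / (2 * h)
    ddy = lambda f: (np.roll(f, -1, 1) - np.roll(f, 1, 1)) / (2 * h)
    omega = ddx(v) - ddy(u)
    # normalize symmetrically around zero and quantize to 256 gray levels
    scale = np.abs(omega).max() or 1.0
    gray = np.uint8(255 * (omega / scale + 1.0) / 2.0)
    Image.fromarray(gray.T).save(path)
    return omega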
In the previous sections, the basic equations and numerical solution methods of fluid dynamics have been reviewed with an example of the analysis of a two-dimensional unsteady flow. As the analysis of unsteady flow is a time-dependent problem, it is known to be computationally demanding, requiring a large number of time steps. In recent years, the scale of computation has become larger, and analyses of such complex phenomena as coupled problems and multiphysics have often been performed, accelerating the increase of computation time.
In numerical fluid dynamics analysis, the solution for the next time step is calculated using the results (solutions) of the past time steps. If the solution of the next time step or further ahead can be predicted without performing the numerical analysis itself, it may lead to a significant reduction of the computational load.
Here, a method employing deep learning to reduce the computational load in the fluid analysis is discussed [10], where the prediction method consists of three phases and makes use of sequences of past results such as
Results_DL(t_pastK), …, Results_DL(t_past2), Results_DL(t_past1), Results_DL(t_now).
In the case of implicit analysis, the prediction result can be used as the initial value of the iterative solution method.
The behavior of a unit in an ordinary feedforward neural network is written as
^pO_j^l = f( ^pU_j^l ) = f( Σ_{i=1}^{n_{l−1}} w_{ji}^{l−1} · ^pO_i^{l−1} + θ_j^l )   (7.5.1)
where
^pO_j^l: output value of the activation function of the j-th unit in the l-th layer for the p-th pattern,
^pU_j^l: input value to the activation function of the j-th unit in the l-th layer for the p-th pattern.
On the other hand, the behavior of a unit in a recurrent neural network [7] is written as
^p_R O_j^l = f( ^p_R U_j^l ) = f( Σ_{i=1}^{n_{l−1}} w_{ji}^{l−1} · ^pO_i^{l−1} + Σ_{j'=1}^{n_l} W_{jj'}^l · ^{p−1}_R O_{j'}^l + θ_j^l )   (7.5.2)
i=1 j ' =1
where W lj j ' is the newly added connection weight between the j-th and j ' -th units
p p
of the l-th layer, and R at the bottom left of R O lj and R U lj indicates that they are
quantities related to units of recurrent type.
Let's consider the function of the newly added second term on the right-hand side of Eq. (7.5.2). Note that the superscript on the left shoulder of _R O_{j'}^l is p − 1, which indicates that ^{p−1}_R O_{j'}^l is the output of the j'-th unit of the l-th layer for the previous training pattern. In other words, the second term on the right-hand side indicates that the outputs of all the units in the l-th layer for the previous training pattern are taken into account when performing the calculation for the current training pattern. Although the term contains only the values for the (p − 1)-th training pattern, ^{p−1}_R O_{j'}^l, it should be noted that ^p_R U_j^l and ^p_R O_j^l are affected by all the previous training patterns, because ^{p−1}_R O_{j'}^l is affected by ^{p−2}_R O_j^l, ^{p−2}_R O_j^l by ^{p−3}_R O_j^l, and so on, in a recursive manner as described in Eq. (7.5.2).
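The recursive dependence of Eq. (7.5.2) is easy to see in code. The following is a minimal sketch of our own (not from the book); the sizes n_in, n_rec and the weight names w, W, theta are assumptions.

import numpy as np

# Recurrent forward pass of Eq. (7.5.2): the layer output for pattern p
# depends on the same layer's output for pattern p-1.
rng = np.random.default_rng(0)
n_in, n_rec, n_patterns = 4, 6, 10
w = rng.normal(scale=0.5, size=(n_rec, n_in))       # feedforward weights w_ji
W = rng.normal(scale=0.5, size=(n_rec, n_rec))      # recurrent weights W_jj'
theta = np.zeros(n_rec)
f = np.tanh

O_prev = np.zeros(n_rec)            # (p-1)-th recurrent output, zero for p = 1
for p in range(n_patterns):
    x = rng.normal(size=n_in)       # input pattern p (stand-in data)
    U = w @ x + W @ O_prev + theta  # cf. Eq. (7.5.2)
    O_prev = f(U)                   # becomes the "previous" output for p + 1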
Then, what will be the update rule of the connection weights in the recurrent neural network? Let's compare the derivative values between the output of the normal unit ^pO_j^l and that of the recurrent unit ^p_R O_j^l. From Eq. (7.5.1), the derivative of the output ^pO_j^l of the standard unit with respect to w_{αβ}^{l−1} is given by
∂ ^pO_j^l / ∂w_{αβ}^{l−1} = ( ∂f( ^pU_j^l ) / ∂ ^pU_j^l ) · ( ∂ ^pU_j^l / ∂w_{αβ}^{l−1} ) = f'( ^pU_j^l ) · Σ_{i=1}^{n_{l−1}} ( ∂w_{ji}^{l−1} / ∂w_{αβ}^{l−1} ) · ^pO_i^{l−1}   (7.5.3)
On the other hand, that of the recurrent unit output ^p_R O_j^l with respect to w_{αβ}^{l−1} can be obtained from Eq. (7.5.2) as
∂ ^p_R O_j^l / ∂w_{αβ}^{l−1} = ( ∂f( ^p_R U_j^l ) / ∂ ^p_R U_j^l ) · ( ∂ ^p_R U_j^l / ∂w_{αβ}^{l−1} )
= f'( ^p_R U_j^l ) · ( Σ_{i=1}^{n_{l−1}} ( ∂w_{ji}^{l−1} / ∂w_{αβ}^{l−1} ) · ^pO_i^{l−1} + Σ_{j'=1}^{n_l} W_{jj'}^l · ( ∂ ^{p−1}_R O_{j'}^l / ∂w_{αβ}^{l−1} ) )   (7.5.4)
The last term in Eq. (7.5.4) is the derivative of the output of the recurrent unit for the previous pattern. Thus, recurrent neural networks cannot use, as is, the standard error backpropagation algorithm of ordinary feedforward neural networks.
As learning methods for recurrent neural networks, backpropagation through time
(BPTT) [13] and real-time recurrent learning (RTRL) [14] have been developed. The
former is known to be simpler and computationally faster.
Consider a three-layer neural network as shown in Fig. 7.13 (left). Only the middle layer is a recurrent layer. A simplified diagram of the network is shown in Fig. 7.13 (right), and the computation for N training patterns is schematically shown in Fig. 7.14. In practice, the N patterns cannot be processed simultaneously: the calculations are done in order from the first training pattern to the N-th training pattern, because the output of the hidden layer for the previous training pattern is used in the calculation for the current training pattern. Taking this into account, Fig. 7.14 can be rewritten as Fig. 7.15, which can be regarded as one large network, and the BPTT method is designed to be applied to this combined network.
Fig. 7.15 Another schematic view of three-layer recurrent neural network for consecutive patterns
In the BPTT method, the error function E to be minimized is defined over all the N consecutive training patterns as
E = (1/2) Σ_{p=1}^{N} Σ_{k=1}^{n₃} ( ^pO_k³ − ^pT_k )²   (7.5.5)
where ^pT_k is the teacher signal of the k-th output unit for the p-th pattern.
First, let's calculate the update of the connection weight between the hidden layer and the output layer, which is given as
∂E/∂w_{ab}² = (1/2) Σ_{p=1}^{N} Σ_{k=1}^{n₃} ∂( ^pO_k³ − ^pT_k )² / ∂w_{ab}²
= Σ_{p=1}^{N} Σ_{k=1}^{n₃} ( ^pO_k³ − ^pT_k ) · ( ∂ ^pO_k³ / ∂w_{ab}² )
= Σ_{p=1}^{N} ( ^pO_a³ − ^pT_a ) · ( ∂ ^pO_a³ / ∂w_{ab}² )   (7.5.6)
where
∂ ^pO_a³ / ∂w_{ab}² = ( ∂f( ^pU_a³ ) / ∂ ^pU_a³ ) · ( ∂ ^pU_a³ / ∂w_{ab}² )
= ( ∂f( ^pU_a³ ) / ∂ ^pU_a³ ) · ∂( Σ_{j=1}^{n₂} w_{aj}² · ^p_R O_j² + θ_a³ ) / ∂w_{ab}²
= ( ∂f( ^pU_a³ ) / ∂ ^pU_a³ ) · ^p_R O_b²   (7.5.7)
Combining Eqs. (7.5.6) and (7.5.7), we obtain
∂E/∂w_{ab}² = Σ_{p=1}^{N} ( ^pO_a³ − ^pT_a ) · ( ∂f( ^pU_a³ ) / ∂ ^pU_a³ ) · ^p_R O_b²   (7.5.8)
From Eq. (7.5.8), it can be seen that the update of the connection weight between the hidden layer and the output layer, ∂E/∂w_{ab}², is calculated by accumulating the value of each term on the right-hand side over the training patterns.
Next, let us calculate the update of the connection weight between the input and hidden layers,
∂E/∂w_{cd}¹ = (1/2) Σ_{p=1}^{N} Σ_{k=1}^{n₃} ∂( ^pO_k³ − ^pT_k )² / ∂w_{cd}¹ = Σ_{p=1}^{N} Σ_{k=1}^{n₃} ( ^pO_k³ − ^pT_k ) · ( ∂ ^pO_k³ / ∂w_{cd}¹ )   (7.5.9)
where
∂ ^pO_k³ / ∂w_{cd}¹ = ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · ( ∂ ^pU_k³ / ∂w_{cd}¹ )
= ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · ∂( Σ_{j=1}^{n₂} w_{kj}² · ^p_R O_j² + θ_k³ ) / ∂w_{cd}¹
= ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · Σ_{j=1}^{n₂} w_{kj}² · ( ∂ ^p_R O_j² / ∂w_{cd}¹ )   (7.5.10)
and
∂ ^p_R O_j² / ∂w_{cd}¹ = ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ( ∂ ^p_R U_j² / ∂w_{cd}¹ )
= ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ∂( Σ_{i=1}^{n₁} w_{ji}¹ · ^pO_i¹ + Σ_{j'=1}^{n₂} W_{jj'}² · ^{p−1}_R O_{j'}² + θ_j² ) / ∂w_{cd}¹
= ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ( ( ∂w_{jd}¹ / ∂w_{cd}¹ ) · ^pO_d¹ + Σ_{j'=1}^{n₂} W_{jj'}² · ( ∂ ^{p−1}_R O_{j'}² / ∂w_{cd}¹ ) )   (7.5.11)
Equation (7.5.11) determines the derivative of the output of the j-th unit in the hidden layer for the p-th learning pattern using that for the (p − 1)-th learning pattern. Using the value above, Eqs. (7.5.10) and (7.5.9) are used in turn to determine the value of ∂E/∂w_{cd}¹.
Finally, let us calculate the update of the connection weight ∂E/∂W_{ab}² for the feedback in the hidden layer as
∂E/∂W_{ab}² = (1/2) Σ_{p=1}^{N} Σ_{k=1}^{n₃} ∂( ^pO_k³ − ^pT_k )² / ∂W_{ab}² = Σ_{p=1}^{N} Σ_{k=1}^{n₃} ( ^pO_k³ − ^pT_k ) · ( ∂ ^pO_k³ / ∂W_{ab}² )   (7.5.12)
where
∂ ^pO_k³ / ∂W_{ab}² = ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · ( ∂ ^pU_k³ / ∂W_{ab}² )
= ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · ∂( Σ_{j=1}^{n₂} w_{kj}² · ^p_R O_j² + θ_k³ ) / ∂W_{ab}²
= ( ∂f( ^pU_k³ ) / ∂ ^pU_k³ ) · Σ_{j=1}^{n₂} w_{kj}² · ( ∂ ^p_R O_j² / ∂W_{ab}² )   (7.5.13)
and
∂ ^p_R O_j² / ∂W_{ab}² = ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ( ∂ ^p_R U_j² / ∂W_{ab}² )
= ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ∂( Σ_{i=1}^{n₁} w_{ji}¹ · ^pO_i¹ + Σ_{j'=1}^{n₂} W_{jj'}² · ^{p−1}_R O_{j'}² + θ_j² ) / ∂W_{ab}²
= ( ∂f( ^p_R U_j² ) / ∂ ^p_R U_j² ) · ( ( ∂W_{jb}² / ∂W_{ab}² ) · ^{p−1}_R O_b² + Σ_{j'=1}^{n₂} W_{jj'}² · ( ∂ ^{p−1}_R O_{j'}² / ∂W_{ab}² ) )   (7.5.14)
From Eq. (7.5.14), the derivative of the output of a unit in the hidden layer with respect to W_{ab}² for the p-th learning pattern can be calculated using the output values of the units in the hidden layer and their derivative values for the (p − 1)-th training pattern. Using the values obtained above, the left-hand side values of Eqs. (7.5.13) and (7.5.12) are calculated in order, and finally ∂E/∂W_{ab}² is obtained.
The update values (derivative values) for the bias values θ_k³ and θ_j² are calculated in the same manner.
As described above, in BPTT, the amount of update of each parameter can be calculated by computing the values sequentially according to the order of the training patterns, as sketched below.
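The following is a compact sketch of our own (not the book's code) of BPTT for the three-layer network of Fig. 7.13, assuming linear output units and tanh hidden units for brevity; all array names are ours.

import numpy as np

# BPTT over the unrolled network of Fig. 7.15, cf. Eqs. (7.5.5)-(7.5.14).
rng = np.random.default_rng(1)
n1, n2, n3, N = 3, 5, 2, 8                    # layer sizes, number of patterns
X = rng.normal(size=(N, n1))                  # training inputs
T = rng.normal(size=(N, n3))                  # teacher signals
w1 = rng.normal(scale=0.3, size=(n2, n1))     # input -> hidden weights
W2 = rng.normal(scale=0.3, size=(n2, n2))     # hidden feedback weights
w2 = rng.normal(scale=0.3, size=(n3, n2))     # hidden -> output weights

# forward pass, storing the hidden states for all patterns
H = np.zeros((N + 1, n2))                     # H[p] = hidden output for pattern p
for p in range(N):
    H[p + 1] = np.tanh(w1 @ X[p] + W2 @ H[p])
Y = H[1:] @ w2.T                              # outputs for all patterns

# backward pass: accumulate gradients through the unrolled network
gw1, gW2, gw2 = np.zeros_like(w1), np.zeros_like(W2), np.zeros_like(w2)
dh = np.zeros(n2)                             # gradient flowing into the hidden state
for p in reversed(range(N)):
    dy = Y[p] - T[p]                          # cf. Eq. (7.5.5)
    gw2 += np.outer(dy, H[p + 1])
    dh = (w2.T @ dy + dh) * (1.0 - H[p + 1] ** 2)   # through tanh
    gw1 += np.outer(dh, X[p])
    gW2 += np.outer(dh, H[p])
    dh = W2.T @ dh                            # pass to the previous pattern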
In this section, the basic items of long short-term memory (LSTM) [9], an advanced version of recurrent neural networks [1, 7], are discussed.
The error backpropagation calculation for recurrent neural networks involves sequentially calculating the gradients in the order of the training patterns, and it is known that the gradient vanishing problem may arise, similarly to the calculation across many layers in feedforward neural networks.
As a result, quantities related to training patterns far apart from each other cannot affect the update of the connection weights, resulting in a network that makes predictions by referring only to relatively recent information, which has been considered a problem for operations based on long-term memory.
Fig. 7.16 Schematic diagram of LSTM memory cell (cell state with sum and multiplication nodes, inputs A–D, and input, forget, and output gates)
The long short-term memory (LSTM) network was developed to fix the weaknesses of recurrent neural networks above. The behavior of the unit in LSTM is shown in Fig. 7.16 [6]. The LSTM unit in Fig. 7.16 is an extension of the original unit with the forget gate [4] and peephole connections [3].
The LSTM unit in Fig. 7.16 takes the input data x^t at the current time step (training pattern) and the output data y^{t−1} of the LSTM unit at the previous time step (training pattern) as input, performs four different recurrent operations on them (input A, input B, input C, and input D), and outputs y_j^t at the current time step. For each of these inputs, the same operations as in the usual recurrent unit are performed, and the output is calculated based on the results of these operations. f_A(), f_B(), f_C(), f_D(), and f_E() are the activation functions: the sigmoid function is used for f_B(), f_C(), and f_D(), while the tanh function is used for f_A() and f_E(). Another feature of the LSTM unit is that it has a state variable s, which is given the function of controlling long-term memory. The LSTM unit shown in Fig. 7.16 has three gates: the input gate controls the transmission strength of new input information, the output gate the output strength of the memory cell, and the forget gate the transmission strength of past information.
Let us now look at the operations in the LSTM unit [6]. First, the following operations are performed on input A as in the recurrent unit,
u_j^{A,t} = Σ_{i=1}^{n₁} w_{ji}^A x_i^t + Σ_{j'=1}^{n₂} W_{jj'}^A y_{j'}^{t−1} + θ_j^A   (7.5.15)
g_j^{A,t} = f_A( u_j^{A,t} )   (7.5.16)
Next, the following operations are performed on input B as in the recurrent unit, with a peephole connection from the state variable s_j^{t−1}:
u_j^{B,t} = Σ_{i=1}^{n₁} w_{ji}^B x_i^t + Σ_{j'=1}^{n₂} W_{jj'}^B y_{j'}^{t−1} + θ_j^B + p_B · s_j^{t−1}   (7.5.17)
g_j^{B,t} = f_B( u_j^{B,t} )   (7.5.18)
Similarly, for input C,
u_j^{C,t} = Σ_{i=1}^{n₁} w_{ji}^C x_i^t + Σ_{j'=1}^{n₂} W_{jj'}^C y_{j'}^{t−1} + θ_j^C + p_C · s_j^{t−1}   (7.5.19)
g_j^{C,t} = f_C( u_j^{C,t} )   (7.5.20)
At the input gate, the product of g_j^{A,t} obtained by Eq. (7.5.16) and g_j^{B,t} by Eq. (7.5.18) is calculated. At the forget gate, the product of g_j^{C,t} obtained by Eq. (7.5.20) and the state variable s_j^{t−1} is calculated. The state variable is updated by summing the results of the input gate and forget gate operations as follows:
s_j^t = g_j^{C,t} · s_j^{t−1} + g_j^{B,t} · g_j^{A,t}   (7.5.21)
The updated state variable is passed through the activation function f_E as
g_j^{E,t} = f_E( s_j^t )   (7.5.22)
Next, the following operations are performed on input D as in the recurrent unit, with a peephole connection from the updated state variable s_j^t:
u_j^{D,t} = Σ_{i=1}^{n₁} w_{ji}^D x_i^t + Σ_{j'=1}^{n₂} W_{jj'}^D y_{j'}^{t−1} + θ_j^D + p_D · s_j^t   (7.5.23)
g_j^{D,t} = f_D( u_j^{D,t} )   (7.5.24)
Finally, at the output gate, the output of the LSTM unit at the current time step is obtained as
y_j^t = g_j^{E,t} · g_j^{D,t}   (7.5.25)
In this section, the application of deep learning to the analysis of the flow field around a two-dimensional circular cylinder (Sect. 7.3.3) is described in detail [10], where convolutional LSTM networks (Sect. 7.5) are employed. The network predicts the vorticity visualization image at a future time step using the previous visualization images obtained from the fluid analysis results as input.
From the analysis results described in Sect. 7.3.3, visualization images of vorticity and pressure are created every 100 time steps between the 100th and the 7400th time steps, yielding 74 images each for vorticity and pressure. The images are cropped to focus on the flow near and behind the cylinder. Each image is an 8-bit grayscale image (256 gray levels) of 200 pixels width by 100 pixels height. When used as input data, each pixel value is normalized to a real number in the range of 0–1. Here, the visualized image of vorticity at the nth step is denoted as VI_CFD(n), the image of pressure as PI_CFD(n), and the pair of both images as VPI_CFD(n).
Let's predict the image of vorticity at the next time point from the images at the last four time points.
In the case of predicting the next vorticity image from vorticity images at past time points, the following 70 training patterns can be obtained from the 74 images (the first four images in each pattern are the input data and the last one the teacher data):
1 {(VI_CFD(100), VI_CFD(200), VI_CFD(300), VI_CFD(400)), VI_CFD(500)}
2 {(VI_CFD(200), VI_CFD(300), VI_CFD(400), VI_CFD(500)), VI_CFD(600)}
…
70 {(VI_CFD(7000), VI_CFD(7100), VI_CFD(7200), VI_CFD(7300)), VI_CFD(7400)}
Similarly, in the case of predicting the next vorticity and pressure images from the vorticity and pressure images at past time points, the images at 74 different times provide the 70 training patterns shown as
1 {(VPI_CFD(100), VPI_CFD(200), VPI_CFD(300), VPI_CFD(400)), VPI_CFD(500)}
2 {(VPI_CFD(200), VPI_CFD(300), VPI_CFD(400), VPI_CFD(500)), VPI_CFD(600)}
…
70 {(VPI_CFD(7000), VPI_CFD(7100), VPI_CFD(7200), VPI_CFD(7300)), VPI_CFD(7400)}
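Building such sliding-window patterns is a one-liner in practice. The following is a small sketch of our own, with a stand-in array in place of the actual normalized images; the names images, window, X, Y are ours.

import numpy as np

# 70 sliding-window training patterns from 74 images: 4 inputs + 1 teacher each.
images = np.random.rand(74, 100, 200).astype(np.float32)  # stand-in for VI_CFD
window = 4
X = np.stack([images[i:i + window] for i in range(len(images) - window)])
Y = images[window:]          # teacher image 100 steps after the last input
print(X.shape, Y.shape)      # (70, 4, 100, 200) (70, 100, 200)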
Deep learning using the 70 training patterns created in Sect. 7.6.1 is performed here. To build a predictor for the vorticity image, the input and teacher data are set as follows:
Input data: VI_CFD(n − 300), VI_CFD(n − 200), VI_CFD(n − 100), VI_CFD(n)
Teacher data: VI_CFD(n + 100)
The trained neural network outputs the predicted image of vorticity, VI_DL(n + 100), for the above input data. The number of epochs is set to 100,000, and the mini-batch size to 10. Training took several dozen hours on a computer equipped with an Intel Core i7-7700K CPU and an NVIDIA TITAN V GPU.
Similarly, to build a predictor for both the vorticity and pressure images, the input and teacher data are set as follows:
Input data: VPI_CFD(n − 300), VPI_CFD(n − 200), VPI_CFD(n − 100), VPI_CFD(n)
Teacher data: VPI_CFD(n + 100)
The trained neural network outputs the predicted image of vorticity, VI_DL(n + 100), and also that of pressure, PI_DL(n + 100), for the above input data. The number of epochs is set to 100,000, and the mini-batch size to 10.
Regarding the neural network employed for deep learning, the convolutional LSTM network used to build the predictor for vorticity images has the following structure:
The first layer, i.e., the input layer, is a convolutional layer. The input data for this layer are four 200 × 100 pixel vorticity images at four time steps. Each pixel of the images is a single-precision real number between 0 and 1. The input layer converts the input data into 20 × 200 × 100 data by a convolution operation using 20 filters (filter size 3 × 3) and sends the data to the next layer.
In the second to fourth layers, which are convolutional LSTM layers, the 20 × 200 × 100 input data are converted to 20 × 200 × 100 new data by a two-dimensional convolution operation using 20 filters (filter size 3 × 3), which are sent to the next layer.
In the fifth layer, which is also a convolutional LSTM layer, the 20 × 200 × 100 input data are converted to 3 × 200 × 100 new data by a two-dimensional convolution operation using three filters (filter size 3 × 3), which are sent to the next layer.
In the sixth layer, a three-dimensional convolutional layer, the 3 × 200 × 100 input data are converted to 1 × 200 × 100 new data by a three-dimensional convolution operation using a single filter (filter size 3 × 3 × 3), which form the output. As the output image consists of real numbers, they are converted to integer values between 0 and 255, resulting in a normal 8-bit (256 shades) image.
Note that the convolutional LSTM layer used here is a convolutional neural
network with LSTM [11].
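A network of this kind can be assembled with standard deep learning libraries. The following is a hedged sketch of our own written with the Keras API (assumed available), approximating the structure described above: the filter counts follow the text, while the exact layer plumbing (TimeDistributed wrapper, sigmoid output, taking the last time slice as the predicted image) is our assumption, not the authors' implementation.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4, 100, 200, 1)),                     # 4 images, 100 x 200, grayscale
    layers.TimeDistributed(                                   # first layer: plain convolution
        layers.Conv2D(20, (3, 3), padding="same", activation="relu")),
    layers.ConvLSTM2D(20, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(20, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(20, (3, 3), padding="same", return_sequences=True),
    layers.ConvLSTM2D(3, (3, 3), padding="same", return_sequences=True),
    layers.Conv3D(1, (3, 3, 3), padding="same", activation="sigmoid"),  # sixth layer
    layers.Lambda(lambda t: t[:, -1]),                        # last slice = predicted image
])
model.compile(optimizer="adam", loss="mse")
model.summary()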
Here, the neural network constructed in Sect. 7.6.2 is used to predict the flow field
around a two-dimensional circular cylinder.
Figure 7.17 shows the predicted results for data included in the training patterns, depicting the vorticity images between the 3200th and 3500th time steps. In the figure, CFD denotes the image based on the analysis results obtained from the computational fluid dynamics simulation, which is regarded as the correct image here. The others are, respectively, the images predicted by deep learning when the images based on the analysis results are used as the input (DL with VI_CFD) and those when previously predicted images are used as the input (DL with VI_DL). The images in the figure are produced by the trained neural network with the following data (images) as input:
DL with VI_CFD:
t = 3200: (VI_CFD(2800), VI_CFD(2900), VI_CFD(3000), VI_CFD(3100))
t = 3300: (VI_CFD(2900), VI_CFD(3000), VI_CFD(3100), VI_CFD(3200))
t = 3400: (VI_CFD(3000), VI_CFD(3100), VI_CFD(3200), VI_CFD(3300))
t = 3500: (VI_CFD(3100), VI_CFD(3200), VI_CFD(3300), VI_CFD(3400))
DL with VI_DL:
t = 3200: (VI_CFD(2800), VI_CFD(2900), VI_CFD(3000), VI_CFD(3100))
Fig. 7.17 Predicted images of vorticities for time steps included in training patterns (rows: CFD (correct), DL with VI_CFD, DL with VI_DL)
t = 3300: (VI_CFD(2900), VI_CFD(3000), VI_CFD(3100), VI_DL(3200))
t = 3400: (VI_CFD(3000), VI_CFD(3100), VI_CFD(3200), VI_DL(3300))
t = 3500: (VI_CFD(3100), VI_CFD(3200), VI_CFD(3300), VI_DL(3400))
The results in Fig. 7.17 show that deep learning is generally able to make good predictions for this case: the Structural Similarity Index Measure (SSIM) [12], a similarity index between images, is about 70% (the SSIM is 100% in the case of an exact match). When the prediction results of deep learning are also used as input (DL with VI_DL), the accuracy is lower than when only CFD results are used as input (DL with VI_CFD), but it could be improved by data augmentation to suppress overtraining.
Figure 7.18 shows the prediction results of the vorticity image for data not included in the training patterns, where the vorticity images at the 20200th, 20300th, 20400th, and 20500th time steps are given. Since the vorticity images at these time steps are not included in the training patterns, the generalization capability for unknown input data can be verified. Note that the images used as input to the trained neural network are selected in the same manner as those used for Fig. 7.17, shown as follows:
DL with VI_CFD:
t = 20200: (VI_CFD(19800), VI_CFD(19900), VI_CFD(20000), VI_CFD(20100))
t = 20300: (VI_CFD(19900), VI_CFD(20000), VI_CFD(20100), VI_CFD(20200))
t = 20400: (VI_CFD(20000), VI_CFD(20100), VI_CFD(20200), VI_CFD(20300))
t = 20500: (VI_CFD(20100), VI_CFD(20200), VI_CFD(20300), VI_CFD(20400))
DL with VI_DL:
t = 20200: (VI_CFD(19800), VI_CFD(19900), VI_CFD(20000), VI_CFD(20100))
t = 20300: (VI_CFD(19900), VI_CFD(20000), VI_CFD(20100), VI_DL(20200))
Fig. 7.18 Predicted images of vorticities for time steps not included in training patterns (rows: CFD (correct), DL with VI_CFD, DL with VI_DL)
t = 20400: (VI_CFD(20000), VI_CFD(20100), VI_DL(20200), VI_DL(20300))
t = 20500: (VI_CFD(20100), VI_DL(20200), VI_DL(20300), VI_DL(20400))
The results in Fig. 7.18 show that deep learning is generally able to make good predictions also for this case. Compared to Fig. 7.17, the prediction accuracy for input data not included in the training patterns is poorer than that for input data included in them. Suppressing overtraining, for example by augmenting the training patterns during training, is considered effective here.
Next, Fig. 7.19 shows the prediction results of the vorticity image when both the vorticity and the pressure images are predicted simultaneously, where the time steps of the predicted images are from the 3200th to the 3400th steps and the input data for deep learning are images calculated by the computational fluid dynamics analysis. For comparison, the images predicted only from the vorticity images (DL with VI_CFD) are also shown. It is noted that the results with both vorticity and pressure images as input (DL with VPI_CFD) are comparable to those with only vorticity images as input (DL with VI_CFD).
Finally, Fig. 7.20 shows the prediction results of the pressure image when both the vorticity and the pressure images are predicted simultaneously, where the time steps of the predicted images are from the 3200th to the 3400th steps and the input data for deep learning are images calculated by the computational fluid dynamics analysis. According to the prediction result of the pressure image by deep learning (DL with VPI_CFD), the location of the low-pressure area (black in the figure) is almost consistent with that by the CFD, but there is more noise than in the vorticity image.
As described above, it can be seen that deep learning using the convolutional LSTM network can predict well the results of computational fluid dynamics analysis. The prediction by deep learning takes less than one second to obtain a solution 100 steps ahead, whereas CFD must compute every intermediate time step by time-consuming simulation. The prediction accuracy of deep learning could be further improved by optimizing the convolutional LSTM network structure and increasing the number of training patterns to prevent overtraining and improve the generalization capability.
Fig. 7.19 Predicted images of vorticities using images of vorticities and pressures (rows: CFD (correct), DL with VI_CFD, DL with VPI_CFD)
Fig. 7.20 Predicted images of pressures using images of vorticities and pressures (rows: CFD (correct), DL with VPI_CFD)
Acknowledgements We would like to express our gratitude to Dr. Masato Masuda, Prof. Yasushi
Nakabayashi, and Prof. Yoshiaki Tamura for providing the data for this chapter. Figures in
Sections 7.3 and 7.6 are based on the provided data. We are also grateful to Prof. Yoshiaki Tamura for
his kind advice on the description of Section 7.3 and to Dr. Masato Masuda for that of Sections 7.5
and 7.6.
References
1. Elman, J.L.: Finding structure in time. Cogn. Sci. 14, 179–211 (1990)
2. Ferziger, J.H., Peric, M.: Computational Methods for Fluid Dynamics, 2nd edn. Springer (1999)
3. Gers, F.A., Schmidhuber, J.: Recurrent nets that time and count. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), Vol. 3, pp. 189–194 (2000). https://doi.org/10.1109/IJCNN.2000.861302
4. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000). https://doi.org/10.1162/089976600300015015
5. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
6. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017). https://doi.org/10.1109/TNNLS.2016.2582924
7. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall (1999)
8. Hirsch, C.: Numerical Computation of Internal and External Flows: The Fundamentals of Computational Fluid Dynamics, 2nd edn. Butterworth-Heinemann (2007)
9. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
10. Masuda, M., Nakabayashi, Y., Tamura, Y.: Prediction of computational fluid dynamics results using convolutional LSTM. Transactions of JSCES 2020, 20201006 (2020) (in Japanese)
11. Shi, X., Chen, Z., Wang, H., Yeung, D.-Y., Wong, W.-K., Woo, W.-C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15), Vol. 1, pp. 802–810 (2015)
12. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
13. Williams, R.J., Peng, J.: An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput. 2, 490–501 (1990)
14. Williams, R.J., Zipser, D.: A learning algorithm for continually running fully recurrent neural networks. Neural Comput. 1, 270–280 (1989)
15. Zienkiewicz, O.C., Morgan, K.: Finite Elements and Approximation. Dover (2006)
Chapter 8
Further Applications with Deep Learning
Abstract In this chapter, some additional applications of deep learning in the field
of computational mechanics are discussed: a method of improving the accuracy of
element stiffness matrices (Sect. 8.1), finite element analysis using convolutional
operations (Sect. 8.2), fluid analysis using variational autoencoders (Sect. 8.3), a
zooming method using feedforward neural networks (Sect. 8.4), and an application
of physics-informed neural networks to solid mechanics (Sect. 8.5).
In the finite element method, methods for improving the accuracy of solutions can be classified into the following two main categories:
A. Methods reducing the size of elements, using a larger number of elements
B. Methods increasing the order of the basis functions without reducing the element size
Method A improves the accuracy of solutions by reducing the element size with low-order basis functions, which reduces the variation of physical quantities (such as displacements) within an element. On the other hand, Method B improves the accuracy of solutions by taking advantage of the high approximation capability of higher-order basis functions.
The improved approximation accuracy of the basis functions in an element naturally results in a more accurate element stiffness matrix, and finally a better finite element solution. In other words, the quality of the element stiffness matrices and the global stiffness matrix directly affects the accuracy of the finite element solutions. In Chap. 4, we discussed a method to improve the quality of the element stiffness matrix by optimizing the numerical integration parameters using deep learning. Here, another method is reviewed, where the strain–displacement matrix involved in the element integration is improved by deep learning [9].
where (Ui , Vi )T and (X i , Yi )T are the displacements and coordinates of the i-th node
belonging to the element, respectively.
The basis functions in Eqs. (8.1.1) and (8.1.2) are given by the following equations.
N₁(ξ, η) = (1/4)(1 − ξ)(1 − η)(−ξ − η − 1)   (8.1.3)
N₂(ξ, η) = (1/4)(1 + ξ)(1 − η)(ξ − η − 1)   (8.1.4)
Fig. 8.1 Eight-node quadratic quadrilateral element in the (ξ, η) reference coordinates: corner nodes 1–4 at (±1, ±1) and midside nodes 5–8 at (0, −1), (1, 0), (0, 1), (−1, 0)
N₃(ξ, η) = (1/4)(1 + ξ)(1 + η)(ξ + η − 1)   (8.1.5)
N₄(ξ, η) = (1/4)(1 − ξ)(1 + η)(−ξ + η − 1)   (8.1.6)
N₅(ξ, η) = (1/2)(1 − ξ²)(1 − η)   (8.1.7)
N₆(ξ, η) = (1/2)(1 + ξ)(1 − η²)   (8.1.8)
N₇(ξ, η) = (1/2)(1 − ξ²)(1 + η)   (8.1.9)
N₈(ξ, η) = (1/2)(1 − ξ)(1 − η²)   (8.1.10)
2
These basis functions apparently satisfy the fundamental equations in the finite
element approximation as shown below.
Σ
8
Ni (ξ, η) = 1 (for arbitrary ξ, η) (8.1.11)
i=1
{
( ) 0 (i /= j)
Ni X j , Y j = (8.1.12)
1 (i = j)
The strain {ε} at any point in an element can be expressed using the nodal displacement vector {U} as follows:
{ε} = ( ε_x, ε_y, γ_xy )^T = ( ∂u/∂x, ∂v/∂y, ∂u/∂y + ∂v/∂x )^T = [L]{u} = [L][N]{U} = [B]{U}   (8.1.16)
where [L] = [ ∂/∂x, 0 ; 0, ∂/∂y ; ∂/∂y, ∂/∂x ] is the differential operator matrix acting on {u} = (u, v)^T.
where [B] is the strain–displacement matrix, whose components are given by
[B] = [L][N] = [ ∂N₁/∂x ⋯ ∂N₈/∂x, 0 ⋯ 0 ; 0 ⋯ 0, ∂N₁/∂y ⋯ ∂N₈/∂y ; ∂N₁/∂y ⋯ ∂N₈/∂y, ∂N₁/∂x ⋯ ∂N₈/∂x ]   (8.1.17)
with [N] = [ N₁ ⋯ N₈, 0 ⋯ 0 ; 0 ⋯ 0, N₁ ⋯ N₈ ].
The stress {σ} at any point in an element can also be expressed using the nodal displacement vector {U} as follows:
{σ} = ( σ_x, σ_y, τ_xy )^T = [D]{ε} = [D][L][N]{U} = [D][B]{U}   (8.1.18)
where [D] is the stress–strain matrix, which is defined by the Young’s modulus E
and the Poisson’s ratio ν for a two-dimensional isotropic elastic body. Note that [D]
is different between the plane stress and plane strain approximations.
In the case of the plane stress approximation:
[D] = E/(1 − ν²) · [ 1, ν, 0 ; ν, 1, 0 ; 0, 0, (1 − ν)/2 ]   (8.1.19)
In the case of the plane strain approximation:
[D] = E/((1 + ν)(1 − 2ν)) · [ 1 − ν, ν, 0 ; ν, 1 − ν, 0 ; 0, 0, (1 − 2ν)/2 ]   (8.1.20)
Using the matrices described above, the element stiffness matrix is obtained as
[k^e] = ∫_{v^e} [B]^T [D][B] dv   (8.1.21)
where v^e denotes that the integral is performed over the entire domain of the element.
Similar to what was discussed for linear elements in Chap. 4, the degradation of the accuracy of the element stiffness matrix due to the distortion of the element shape also occurs for quadratic quadrilateral elements. Let's consider the strain in the quadratic quadrilateral element shown in Fig. 8.2a due to the nodal displacements {U} = (U₁, …, U₈, V₁, …, V₈)^T. The strain at an arbitrary position in the element can be expressed by Eq. (8.1.16), but the result obtained may contain some error if the element geometry is distorted.
Then, we discuss how we can obtain an accurate strain. One way is to divide the element into many smaller elements as shown in Fig. 8.2b. For example, if the original quadratic quadrilateral element is divided into 50 × 50 linear quadrilateral elements, the strains in each element can be obtained with high accuracy. The sizes of the elements in this case are considered sufficiently small that linear elements suffice, and the mesh of Fig. 8.2b is called the reference model of the element of Fig. 8.2a.
Each nodal point on the periphery of the reference model is given displacements interpolated from those of the nodal points of the original quadratic quadrilateral element using the quadratic basis functions, and the strain values at arbitrary locations within the element can then be accurately calculated. Note that the strains calculated by the reference model depend on the Poisson's ratio, but not on the Young's modulus.
It has been shown so far that the distortion of the element shape affects the accuracy of the element stiffness matrix even for quadratic quadrilateral elements, and that the reference model can be used to calculate the correct strain field. Using the reference model, a method for obtaining a highly accurate strain–displacement matrix [B] by deep learning is discussed in what follows. In preparation, the properties of the strain–displacement matrix [B] are studied in detail.
As shown in Eq. (8.1.17), the strain–displacement matrix [B] of a quadratic quadrilateral element has 3 rows and 16 columns, which can be written as
[B] = [ b_{1,1}, b_{1,2}, …, b_{1,15}, b_{1,16} ; b_{2,1}, b_{2,2}, …, b_{2,15}, b_{2,16} ; b_{3,1}, b_{3,2}, …, b_{3,15}, b_{3,16} ]   (8.1.22)
Here, let the displacements in the element be those of a rigid body translation, i.e., (U₁, V₁)^T = ⋯ = (U₈, V₈)^T = (Ū, V̄)^T. In this case, the strain values in the element must be zero. Therefore, we have the following equation:
{ε} = (0, 0, 0)^T = [B]{U}
= ( b_{1,1} + b_{1,2} + ⋯ + b_{1,8},  b_{2,1} + b_{2,2} + ⋯ + b_{2,8},  b_{3,1} + b_{3,2} + ⋯ + b_{3,8} )^T · Ū
+ ( b_{1,9} + b_{1,10} + ⋯ + b_{1,16},  b_{2,9} + b_{2,10} + ⋯ + b_{2,16},  b_{3,9} + b_{3,10} + ⋯ + b_{3,16} )^T · V̄   (8.1.23)
Since Ū and V̄ are independent of each other and Eq. (8.1.23) holds for any (Ū, V̄)^T, we have
Σ_{c=1}^{8} b_{r,c} = 0  (r = 1, 2, 3)   (8.1.24)
Σ_{c=9}^{16} b_{r,c} = 0  (r = 1, 2, 3)   (8.1.25)
This suggests that the components of the strain–displacement matrix [B] are not independent of each other; for example, b_{1,8} can be obtained from the other components as b_{1,8} = −b_{1,1} − b_{1,2} − b_{1,3} − b_{1,4} − b_{1,5} − b_{1,6} − b_{1,7}, leading to the result that the number of independent components of the strain–displacement matrix [B] is reduced to 3 × 14.
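The reduction to 3 × 14 independent components is easily verified numerically. The following is a small sketch of our own (not the book's code); the function name full_B and the column ordering are our assumptions.

import numpy as np

# Recover the full 3 x 16 [B] from its 3 x 14 independent components using the
# rigid-body conditions (8.1.24)-(8.1.25): each row sums to zero over columns
# 1-8 and over columns 9-16.
def full_B(B14):
    """B14: array of shape (3, 14) holding columns 1-7 and 9-15 of [B]."""
    Bu, Bv = B14[:, :7], B14[:, 7:]
    b8 = -Bu.sum(axis=1, keepdims=True)    # column 8 from Eq. (8.1.24)
    b16 = -Bv.sum(axis=1, keepdims=True)   # column 16 from Eq. (8.1.25)
    return np.hstack([Bu, b8, Bv, b16])

B = full_B(np.random.rand(3, 14))
rigid = np.concatenate([np.full(8, 1.23), np.full(8, -0.7)])  # translation (U, V)
print(np.allclose(B @ rigid, 0.0))        # rigid-body motion produces zero strain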
The Poisson's ratio used for the i-th element setting is stored as
Poisson(i) = { ν^i }   (8.1.26)
and the nodal displacements given to the element as
Disp(i) = { U₁^i, …, U₈^i, V₁^i, …, V₈^i }   (8.1.27)
Next, a set of the exact values of strains at the integration points of the original element, Strain(i), is calculated by the finite element analysis using the reference model created for each element, which is a finite element model prepared by dividing the original element into 50 × 50 first-order quadrilateral elements. The number of integration points is set to three per axis, or nine per element. Strain(i), which consists of three values per integration point, is defined as
Strain(i) = { ¹ε̂_x^i, ¹ε̂_y^i, ¹γ̂_xy^i, ²ε̂_x^i, ²ε̂_y^i, ²γ̂_xy^i, …, ⁹ε̂_x^i, ⁹ε̂_y^i, ⁹γ̂_xy^i }   (8.1.28)
where the superscripts on the left shoulders of the components are the numbers of the integration points, and the superscripts on the right shoulders those of the element settings.
Thus, a set of data {Shape(i), Poisson(i), Disp(i), Strain(i)} is obtained for each element. We generate 300,000 sets of data as training patterns, and an additional 30,000 sets of data for validation of the trained network.
Here, the sets of data collected above are used to construct a feedforward neural network. As the input data to the feedforward neural network, the element shape parameters and the Poisson's ratio are employed, i.e.,
Input data: Shape(i), Poisson(i)
On the other hand, as the teacher data of the feedforward neural network, the strain–displacement matrices [^gB] at the integration points (9 points in total) are used, i.e.,
Teacher data: [^gB] (g = 1, …, 9)
Since each [^gB] has 3 × 14 independent components, the number of parameters in one teacher data is 378. Thus, the feedforward neural network to be constructed should have 5 units in the input layer and 378 units in the output layer.
The error function to be minimized is defined as follows:
E = 1/(9N) Σ_{i=1}^{N} Σ_{g=1}^{9} w_g · ( | (^gε_x^i − ^gε̂_x^i) / ^gε̂_x^i | + | (^gε_y^i − ^gε̂_y^i) / ^gε̂_y^i | + | (^gγ_xy^i − ^gγ̂_xy^i) / ^gγ̂_xy^i | )   (8.1.29)
where N is the number of training patterns (300,000 in this case), w_g the weight of the numerical integration at the g-th integration point, given as the product of the weights in each axis direction, and ^gε̂_x^i, ^gε̂_y^i and ^gγ̂_xy^i the accurate strain values calculated by the reference model and stored in Strain(i).
^gε_x^i, ^gε_y^i and ^gγ_xy^i are the strain values calculated from the [^gB] output by the neural network and Disp(i) = { U₁^i, …, U₈^i, V₁^i, …, V₈^i }, obtained by
( ^gε_x^i, ^gε_y^i, ^gγ_xy^i )^T = [^gB] ( U₁^i, …, U₈^i, V₁^i, …, V₈^i )^T   (8.1.30)
Now, the feedforward neural network is trained by the error backpropagation algorithm to output [^gB] such that the correct strain values are obtained at the nine integration points. Then, the numerical integration is performed with the [^gB] output by the trained neural network to obtain an accurate element stiffness matrix.
The feedforward neural network used here has five hidden layers and 378 units per hidden layer. In each hidden layer, batch normalization (see Sect. 1.3.5) is employed, and ELU (Exponential Linear Unit) is used as the activation function, which is given by
f(x) = x (x ≥ 0);  f(x) = a(eˣ − 1) (x < 0)   (8.1.31)
A sketch of such a network is given below.
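The following is a hedged sketch of our own of the regression network described above, written with the Keras API (assumed available). The layer sizes follow the text; the optimizer and the stand-in loss are our assumptions (the book minimizes the strain-based error of Eq. (8.1.29), not a plain component-wise loss).

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(5,))            # Shape(i) and Poisson(i)
x = inputs
for _ in range(5):                          # five hidden layers of 378 units
    x = layers.Dense(378)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("elu")(x)         # ELU activation, Eq. (8.1.31)
outputs = layers.Dense(378)(x)              # 9 points x (3 x 14) components of [gB]

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")  # stand-in loss, not Eq. (8.1.29)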
The trained neural network above is used to calculate the stiffness matrix of a new element with high accuracy. The procedure is summarized as follows:
1. Standard arrangement: Convert a new element to the standard arrangement (see Sect. 8.1.2.1).
2. Calculation of [^g_OUT B]: Estimate [^gB] using the trained neural network; the output is denoted as [^g_OUT B].
3. Calculation of [^g_DL B]: Since [^g_OUT B] is the matrix for the element converted to the standard configuration, it is re-converted to the matrix for the original element, [^g_DL B].
4. Calculation of [^g_DL B̄]: Improve [^g_DL B] by the B-bar method [7, 19] to obtain [^g_DL B̄].
5. Calculation of [_DL K]: Calculate the element stiffness matrix [_DL K] using [^g_DL B̄].
Let's look at each of these steps in order.
Standard arrangement:
The target element is converted to the standard arrangement as shown in Fig. 8.4. First, the four corner nodes of the element are numbered from 1 to 4 in a counterclockwise manner, where the nodes at both ends of the longest edge are numbered 1 and 2, respectively. Then, the element is translated so that the first node is located at the origin.
Next, with α being the angle measured counterclockwise between the longest edge of the element and the positive direction of the x-axis, the element is rotated clockwise by α around the origin; thus, the longest edge is placed along the x-axis. Finally, the element size is proportionally scaled so that the length of the longest edge becomes 1, i.e., the second node is located at (1, 0). When the original coordinates of the i-th node are X_i = (X_i, Y_i)^T, the transformed coordinates _input X_i are obtained by
_input X_i = (1/l_max) [R(α)]^T (X_i − X₁)   (8.1.32)
where l_max is the length of the longest edge, and [R(α)] the matrix representing the rotation by α counterclockwise around the origin, which is given as
[R(α)] = [ cos α, −sin α ; sin α, cos α ]   (8.1.33)
Calculation of [^g_OUT B]:
Inputting _input X₃, _input X₄ and the Poisson's ratio to the trained neural network, the matrices [^g_OUT B] at the 9 integration points are obtained.
Calculation of [^g_DL B]:
Since [^g_OUT B] is the matrix for the element in the standard arrangement, it should be converted to the matrix for the original element, [^g_DL B], using
[^g_DL B] = (1/l_max) [T] [^g_OUT B] [Q]^T   (8.1.34)
where [T] is the transformation matrix of the strain and [Q] the rotation matrix of the displacements (16 rows by 16 columns), which are, respectively, given by
[T] = [ cos²α, sin²α, −sin α cos α ; sin²α, cos²α, sin α cos α ; 2 sin α cos α, −2 sin α cos α, cos²α − sin²α ]   (8.1.35)
and
[Q] = [ cos α [I], −sin α [I] ; sin α [I], cos α [I] ]   (8.1.36)
where [I] is the 8 × 8 identity matrix.
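The geometric part of this procedure, Eqs. (8.1.32)–(8.1.36), can be sketched as follows. This is our own illustration, not the authors' code; the function names to_standard and back_transform_B are ours.

import numpy as np

def rotation(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])                      # Eq. (8.1.33)

def to_standard(Xc):
    """Xc: (4, 2) corner coordinates, node 1 -> node 2 along the longest edge."""
    e = Xc[1] - Xc[0]
    l_max, alpha = np.hypot(*e), np.arctan2(e[1], e[0])
    Xs = (Xc - Xc[0]) @ rotation(alpha) / l_max             # Eq. (8.1.32)
    return Xs, l_max, alpha

def back_transform_B(B_out, l_max, alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    T = np.array([[c * c, s * s, -s * c],
                  [s * s, c * c,  s * c],
                  [2 * s * c, -2 * s * c, c * c - s * s]])  # Eq. (8.1.35)
    I8 = np.eye(8)
    Q = np.block([[c * I8, -s * I8], [s * I8, c * I8]])     # Eq. (8.1.36)
    return T @ B_out @ Q.T / l_max                          # Eq. (8.1.34)

Xs, l_max, alpha = to_standard(np.array([[0., 0.], [2., 0.5], [1.8, 2.], [0.2, 1.5]]))
B_dl = back_transform_B(np.random.rand(3, 16), l_max, alpha)  # stand-in network output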
Calculation of [^g_DL B̄]:
The matrix [^g_DL B] is improved by the B-bar method as
[^g_DL B̄] = [^g_DL B] + [^g_DL B′]   (8.1.37)
where [^g_DL B′] is the correction matrix of the B-bar method [7, 19] given by Eq. (8.1.38).
Calculation of [_DL K]:
The element stiffness matrix is finally obtained by the numerical integration
[_DL K] = Σ_{g=1}^{9} w_g [^g_DL B̄]^T [D] [^g_DL B̄] | ^gJ |   (8.1.39)
where | ^gJ | is the determinant of the Jacobian matrix at the g-th integration point.
Let's discuss an application of this method [9]. The problem is shown in Fig. 8.5. Two element divisions are tested: the equal division with square elements shown in Fig. 8.6a and the division with distorted elements shown in Fig. 8.6b. The results are shown in Fig. 8.7, where the horizontal axis is the logarithm of the element size and the vertical axis the logarithm of the error. From the results for quadratic elements, it can be seen that the solution obtained by this method (DL8 in the figure) is more accurate than those of the normal element (Q8) and the modified element (P183) used in ANSYS.
Fig. 8.6 Meshes used for block problem: a Regular meshes and b distorted meshes. Reprinted
from [9] with permission from Elsevier
8.2 FEA-Net
Convolutional neural networks are one of the most important key technologies for
deep learning. In this section, FEA-net [18] is studied, where the main operations in
the finite element method are represented by convolution operations and the analysis
is performed with multiple convolution layers.
First, we show how the matrix–vector product of the global stiffness matrix and the
displacement vector in the finite element method can be expressed by convolution
operations.
Consider a two-dimensional stress analysis of an object that is evenly divided
into quadrilateral elements of the first order as shown in Fig. 8.8. The finite element
method for this problem is given as
[K]{U} = {f}   (8.2.1)
where [K] is the global stiffness matrix and {U} the vector of the displacements of all the nodes.
Let the element nodal displacement vector of the e-th first-order quadrilateral element be represented as
{U^e} = ( U₁^e, V₁^e, U₂^e, V₂^e, U₃^e, V₃^e, U₄^e, V₄^e )^T   (8.2.2)
Figure 8.9 shows a part of Fig. 8.8, where the four elements sharing a node and their element node numbers are depicted. Similarly, Fig. 8.10 shows the two-dimensional location index of the nodes. From these two figures, the relations between U_{i,j}, the displacement in the x-direction at the node with location index (i, j), and U_i^e, that of the corresponding node of each element, are obtained. Similar relations hold for the displacements in the y-direction, V_{i,j} and V_i^e.
Since the global stiffness matrix is represented by the sum of all the element stiffness matrices [k^e], the matrix–vector product [K]{U} in Eq. (8.2.1) can be calculated by evaluating [k^e]{U^e} element by element and summing the contributions from each element. Here, we write the matrix–vector product [K]{U} as
[K] ( …, U_{i,j}, V_{i,j}, … )^T = ( …, g_{i,j}^U, g_{i,j}^V, … )^T   (8.2.5)
Calculating each term contributing to g_{i,j}^U element by element and then rearranging based on the nodal relations above, the following equations are obtained.
^3g_u^a = k_51^a U₁^a + k_52^a V₁^a + k_53^a U₂^a + k_54^a V₂^a + k_55^a U₃^a + k_56^a V₃^a + k_57^a U₄^a + k_58^a V₄^a
= k_51^a U_{i+1,j−1} + k_52^a V_{i+1,j−1} + k_53^a U_{i+1,j} + k_54^a V_{i+1,j} + k_55^a U_{i,j} + k_56^a V_{i,j} + k_57^a U_{i,j−1} + k_58^a V_{i,j−1}   (8.2.8)
^4g_u^b = k_71^b U₁^b + k_72^b V₁^b + k_73^b U₂^b + k_74^b V₂^b + k_75^b U₃^b + k_76^b V₃^b + k_77^b U₄^b + k_78^b V₄^b
= k_71^b U_{i+1,j} + k_72^b V_{i+1,j} + k_73^b U_{i+1,j+1} + k_74^b V_{i+1,j+1} + k_75^b U_{i,j+1} + k_76^b V_{i,j+1} + k_77^b U_{i,j} + k_78^b V_{i,j}   (8.2.9)
^2g_u^c = k_31^c U₁^c + k_32^c V₁^c + k_33^c U₂^c + k_34^c V₂^c + k_35^c U₃^c + k_36^c V₃^c + k_37^c U₄^c + k_38^c V₄^c
= k_31^c U_{i,j−1} + k_32^c V_{i,j−1} + k_33^c U_{i,j} + k_34^c V_{i,j} + k_35^c U_{i−1,j} + k_36^c V_{i−1,j} + k_37^c U_{i−1,j−1} + k_38^c V_{i−1,j−1}   (8.2.10)
^1g_u^d = k_11^d U₁^d + k_12^d V₁^d + k_13^d U₂^d + k_14^d V₂^d + k_15^d U₃^d + k_16^d V₃^d + k_17^d U₄^d + k_18^d V₄^d
= k_11^d U_{i,j} + k_12^d V_{i,j} + k_13^d U_{i,j+1} + k_14^d V_{i,j+1} + k_15^d U_{i−1,j+1} + k_16^d V_{i−1,j+1} + k_17^d U_{i−1,j} + k_18^d V_{i−1,j}   (8.2.11)
Summing up Eqs. (8.2.8) to (8.2.11) and rearranging by U_{i,j} and V_{i,j}, g_{i,j}^U can be expressed as
g_{i,j}^U = k_37^c U_{i−1,j−1} + (k_35^c + k_17^d) U_{i−1,j} + k_15^d U_{i−1,j+1}
+ (k_57^a + k_31^c) U_{i,j−1} + (k_55^a + k_77^b + k_33^c + k_11^d) U_{i,j} + (k_75^b + k_13^d) U_{i,j+1}
+ k_51^a U_{i+1,j−1} + (k_53^a + k_71^b) U_{i+1,j} + k_73^b U_{i+1,j+1}
+ k_38^c V_{i−1,j−1} + (k_36^c + k_18^d) V_{i−1,j} + k_16^d V_{i−1,j+1}
+ (k_58^a + k_32^c) V_{i,j−1} + (k_56^a + k_78^b + k_34^c + k_12^d) V_{i,j} + (k_76^b + k_14^d) V_{i,j+1}
+ k_52^a V_{i+1,j−1} + (k_54^a + k_72^b) V_{i+1,j} + k_74^b V_{i+1,j+1}   (8.2.12)
This is written compactly with 3 × 3 filters as
g_{i,j}^U = [W_U^U] ∗ [ U_{i−1,j−1}, U_{i−1,j}, U_{i−1,j+1} ; U_{i,j−1}, U_{i,j}, U_{i,j+1} ; U_{i+1,j−1}, U_{i+1,j}, U_{i+1,j+1} ] + [W_V^U] ∗ [ V_{i−1,j−1}, V_{i−1,j}, V_{i−1,j+1} ; V_{i,j−1}, V_{i,j}, V_{i,j+1} ; V_{i+1,j−1}, V_{i+1,j}, V_{i+1,j+1} ]   (8.2.13)
where ∗ denotes the sum of the element-wise products of the filter and the 3 × 3 block of nodal values.
where [W_U^U] and [W_V^U] are, respectively, given as follows:
[W_U^U] = [ k_37, k_35 + k_17, k_15 ; k_57 + k_31, k_55 + k_77 + k_33 + k_11, k_75 + k_13 ; k_51, k_53 + k_71, k_73 ]   (8.2.14)
[W_V^U] = [ k_38, k_36 + k_18, k_16 ; k_58 + k_32, k_56 + k_78 + k_34 + k_12, k_76 + k_14 ; k_52, k_54 + k_72, k_74 ]   (8.2.15)
Note that the element stiffness matrix of each element is the same for an equally
divided mesh, so the superscripts of the element numbers are omitted in Eqs. (8.2.14)
and (8.2.15).
Similarly, g_{i,j}^V is expressed by the following equation:
g_{i,j}^V = [W_U^V] ∗ [ U_{i−1,j−1}, U_{i−1,j}, U_{i−1,j+1} ; U_{i,j−1}, U_{i,j}, U_{i,j+1} ; U_{i+1,j−1}, U_{i+1,j}, U_{i+1,j+1} ] + [W_V^V] ∗ [ V_{i−1,j−1}, V_{i−1,j}, V_{i−1,j+1} ; V_{i,j−1}, V_{i,j}, V_{i,j+1} ; V_{i+1,j−1}, V_{i+1,j}, V_{i+1,j+1} ]   (8.2.17)
where [W_U^V] and [W_V^V] are, respectively, given as
[W_U^V] = [ k_47, k_45 + k_27, k_25 ; k_67 + k_41, k_65 + k_87 + k_43 + k_21, k_85 + k_23 ; k_61, k_63 + k_81, k_83 ]   (8.2.18)
[W_V^V] = [ k_48, k_46 + k_28, k_26 ; k_68 + k_42, k_66 + k_88 + k_44 + k_22, k_86 + k_24 ; k_62, k_64 + k_82, k_84 ]   (8.2.19)
As described above, the matrix–vector product of the global stiffness matrix and the nodal displacement vector, [K]{U}, which appears in the finite element analysis, can be calculated by a convolution operation using 3 × 3 matrices such as [W_U^U] as filters. In the original article [18], this convolution operation is called FEA-Convolution. A minimal sketch is given below.
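The following is our own illustration of FEA-Convolution (not the authors' code), using stand-in filters; the function name fea_convolution and the use of scipy are assumptions.

import numpy as np
from scipy.signal import correlate2d

# Evaluate g^U = [W_U^U]*U + [W_V^U]*V on a regular grid with 3 x 3 filters,
# cf. Eqs. (8.2.13)-(8.2.15). correlate2d slides the filter without flipping,
# matching the sum of element-wise products; zero padding corresponds to zero
# displacement outside the mesh.
def fea_convolution(U, V, WUU, WVU):
    return (correlate2d(U, WUU, mode="same", boundary="fill")
            + correlate2d(V, WVU, mode="same", boundary="fill"))

n = 16
U, V = np.random.rand(n, n), np.random.rand(n, n)
WUU, WVU = np.random.rand(3, 3), np.random.rand(3, 3)  # stand-in filters
gU = fea_convolution(U, V, WUU, WVU)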
As shown in Sect. 8.2.1, the stress analysis by the finite element method is equivalent to solving the following simultaneous linear equations:
[K]{U} = {f}   (8.2.20)
Among the iterative methods for such equations, the Jacobi method is considered one of the most basic ones [8], where the coefficient matrix [K] is divided into the diagonal-only matrix [K_D] and the off-diagonal-only matrix [K_ND] as
[K] = [K_D] + [K_ND]   (8.2.22)
so that one Jacobi iteration updates the solution as {U}^{(m+1)} = [K_D]^{−1}( {f} − [K_ND]{U}^{(m)} ), where the product [K_ND]{U}^{(m)} can be evaluated by FEA-Convolution; a sketch follows.
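The following is a sketch of our own of the Jacobi iteration with the off-diagonal product evaluated by FEA-Convolution, the idea behind stacking thousands of identical layers in FEA-Net. The filters and the diagonal entries kd_u, kd_v are stand-ins, not values from the article.

import numpy as np
from scipy.signal import correlate2d

def jacobi_step(U, V, fU, fV, W, kd_u, kd_v):
    conv = lambda A, w: correlate2d(A, w, mode="same")
    # off-diagonal part of [K]{U}: the filter centers (diagonal terms) are zero
    gU = conv(U, W["UU"]) + conv(V, W["UV"])
    gV = conv(U, W["VU"]) + conv(V, W["VV"])
    return (fU - gU) / kd_u, (fV - gV) / kd_v

n = 16
W = {k: np.random.rand(3, 3) * 0.01 for k in ("UU", "UV", "VU", "VV")}
for k in ("UU", "VV"):
    W[k][1, 1] = 0.0                     # center entries belong to the diagonal
U, V = np.zeros((n, n)), np.zeros((n, n))
fU, fV = np.random.rand(n, n), np.random.rand(n, n)
for _ in range(100):                     # each loop corresponds to one FEA-Net layer
    U, V = jacobi_step(U, V, fU, fV, W, kd_u=1.0, kd_v=1.0)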
Fig. 8.12 Training and test data for FEA-Net. Reprinted from [18] with permission from Elsevier
Training Phase:
Learning in FEA-Net is not performed on the whole network shown in Fig. 8.11, but is defined as the optimization of the filters in the convolution layers that are the building blocks of the FEA-Net. For this reason, a network with one convolution layer, which can be called a FEA-Conv network, is trained using the displacement image as the input data and the load image as the teacher data. After the training of the FEA-Conv network, the FEA-Net is constructed using the obtained filters. The number of layers of the FEA-Net used in this example is 5000.
For comparison, a general CNN consisting of seven convolution layers is also trained, where a load image is used as the input data and two displacement images as the teacher data.
Application Phase:
Figure 8.13 shows the estimated results for the validation data using the trained FEA-Net and, for comparison, those using the trained CNN. The results from the ordinary finite element analysis are also shown as the reference response. It can be seen that FEA-Net estimates the displacement field more accurately than the ordinary CNN.
8.3 DiscretizationNet
In Chap. 7, a method to predict an unsteady flow field using deep learning has been
studied. In this section, another example applying deep learning to a flow field is
taken, which is DiscretizationNet [13], an application of a new deep learning model
called the generative model (Sect. 1.3.7) to the fluid analysis.
The continuity equation and the Navier–Stokes equation (dimensionless) for the
steady-state flow field are known to be, respectively, written as follows:
Fig. 8.13 Prediction of displacement field using FEA-Net. Reprinted from [18] with permission
from Elsevier
∇ · v = 0   (8.3.1)
(v · ∇)v + ∇p − (1/Re) ∇²v = 0   (8.3.2)
where v and p are the non-dimensionalized velocity vector and pressure, respectively,
and Re is the Reynolds number.
The deep learning model used in this section is named DiscretizationNet [13],
the conceptual diagram of which is shown in Fig. 8.14. This is an application of the
conditional variational autoencoder [10, 15], which is one of the generative models.
Training processes in DiscretizationNet are summarized as follows:
(T1) The information on the shape of the analysis domain is denoted by h, which is expressed by the level set method and takes the value 0 inside the shape and 1 outside. The geometry autoencoder shown in Fig. 8.15 outputs ĥ when h is input, and the compressed information η_h of h is obtained when the autoencoder is trained to reproduce h, i.e., ĥ = h.
(T2) The information on the boundary conditions is denoted by b. The boundary autoencoder shown in Fig. 8.16 outputs b̂ when b is input, and the compressed information η_b of b is obtained when the autoencoder is trained to reproduce b, i.e., b̂ = b. The boundary autoencoder is unnecessary for boundary conditions that do not change in time and space.
If a boundary condition is set at each side of a quadrilateral region as shown in Fig. 8.17, and the boundary condition does not change in space and time,
Fig. 8.14 Schematic diagram of DiscretizationNet for Navier–Stokes solution. Reprinted from [13]
with permission from Elsevier
Fig. 8.15 Geometry autoencoder (encoder–decoder structure)
Fig. 8.16 Boundary autoencoder (encoder–decoder structure)
Fig. 8.17 Quadrilateral analysis domain with a Dirichlet boundary condition set on each side
then ηb can be simply defined to be ηb = {1, 1, 2, 1, 0.3, 1.2, 0.0, 3.0, 40} for
example, where the last value of 40 is the Reynolds number.
(T3) The velocity components and pressure u, v, w and p are initialized with random numbers.
(T4) u, v, w and p are input to the CNN encoder, and the compressed information η is obtained.
(T5) η_h, η_b and η are input to the CNN decoder, and û, v̂, ŵ and p̂ are obtained as output.
(T6) Using û, v̂, ŵ, p̂, h and b, the residual L_train of Eqs. (8.3.1) and (8.3.2) is calculated; if û, v̂, ŵ and p̂ are correct, the residual is zero. L_train is expressed in terms of the discretized residuals of the continuity and Navier–Stokes equations.
Fig. 8.18 Schematic diagram of DiscretizationNet for new geometry and boundary conditions.
Reprinted from [13] with permission from Elsevier
(A6) If L_Appli is small enough, the process finishes with the solutions û^NEW, v̂^NEW, ŵ^NEW and p̂^NEW obtained as the output of the CNN decoder of the trained DiscretizationNet for η_h^NEW, η_b^NEW and η̂^NEW as input. If not, go to (A7).
(A7) Return to (A3) with η̂^NEW → η^NEW.
Note that η_h^NEW, η_b^NEW and the parameters of the CNN encoder and decoder of DiscretizationNet are fixed in the process from (A3) to (A7). Note also that the iterations from (A3) to (A7) converge within about 10 iterations for a well-trained DiscretizationNet [13].
DiscretizationNet is numerically tested here. The analysis target is shown in Fig. 8.19, where the flow field from left to right is analyzed, and the circular cylinder serves as an obstacle. The boundary conditions are the flow velocity at the inlet and the pressure at the outlet. A similar flow field was treated in Chap. 7, where the analysis focused on unsteady flow phenomena at a high Reynolds number; this analysis, in contrast, focuses on the steady analysis of laminar flow phenomena at low Reynolds numbers. The number of elements in the figure is that of the finite volume method [2, 6] employed for comparison.
Data Preparation Phase:
Analyses are performed for comparison for five different velocity inlets (0.2, 0.4,
0.6, 0.8, and 1.0) and three different Reynolds numbers (10, 20, and 40).
Training Phase:
A single DiscretizationNet is constructed by the training (T1) to (T9) above for 15
combinations of velocity inlets (0.2, 0.4, 0.6, 0.8, and 1.0) and Reynolds numbers
(10, 20, and 40).
The DiscretizationNet constructed here consists of three convolution layers for
both CNN encoder and CNN decoder, and 64 filters are used in each convolution
layer. The number of training epochs is set to 30,000, and a GPU (NVIDIA Tesla
V100 SXM2) is used for training.
The trained DiscretizationNet is considered to give good results comparable
to those of the comparison analysis (using ANSYS Fluent R19.3) for all 15
conditions.
Fig. 8.19 Analysis domain: 320 × 128 elements, with a circular cylinder as an obstacle, a velocity inlet on the left, and a pressure outlet on the right
Application Phase:
The trained DiscretizationNet is applied to new boundary conditions not included in the training data, where the velocity inlet is set to 0.5 and the Reynolds number
is selected from 10, 20, and 40. For each of these new boundary conditions, the
inference process from (A1) to (A7) above is performed to obtain the flow field.
The results are shown in Fig. 8.20, where the left column is the results of
the velocity field estimation by DiscretizationNet, the middle column those calcu-
lated using ANSYS Fluent, and the right column the error in the estimation by
DiscretizationNet (the difference from the results by ANSYS). It is concluded that
the estimation by DiscretizationNet is very accurate.
Fig. 8.20 Velocity magnitude at different Reynolds numbers. (A-1) DiscretizationNet, (A-2)
ANSYS Fluent, (A-3) Difference between (A-1) and (A-2). Reprinted from [13] with permission
from Elsevier
The schematic diagram of a zooming method is shown in Fig. 8.21. The original analysis geometry has fillets at the corners, where stresses would concentrate, and a mesh that accurately reproduces the fillet part with small elements is called the global fine model. Since the computational load of this global fine model is very high, the original analysis geometry is divided into two parts to reduce the computational load: a global coarse model and a local fine model.
In the global coarse model, the analysis domain is divided into large elements with no regard to the fillet area, and a simplified material model is used. On the other hand, in the local fine model, only the fillet and its vicinity are divided into fine elements and a detailed material model is used. First, the global coarse model is analyzed, and then, using its results (displacements) as boundary conditions, the local fine model is analyzed to obtain the solution.
There remains a problem to be considered when using the results (displacements)
of the global coarse model analysis as the boundary conditions for the local fine
model analysis as discussed below. Figure 8.22 shows a region around the local fine
model. The displacements at each node of the global coarse model are obtained from
the analysis of the global coarse model. In the zooming method, the displacements
are used for the boundary conditions (displacements) at the nodes on the periphery
of the local fine model.
If the nodes of the local fine model are at the periphery or inside of the global
coarse model, the displacements of the nodes of the local fine model can be obtained
by interpolating using the basis functions of the global coarse model. However, as
shown in Fig. 8.22, some nodes of the local fine model exist outside the global coarse
model, where the displacements at these nodes cannot be obtained by interpolation
using the basis functions of the global coarse model.
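To make the interpolation step concrete, the following is a minimal C sketch (a hypothetical helper, not code from [17]) that interpolates nodal displacements inside an 8-node (trilinear) hexahedral element; evaluating it with |s|, |t| or |u| greater than 1 corresponds to the extrapolation for outside nodes discussed next:

/* Hypothetical sketch: trilinear interpolation of nodal displacements.
   (s,t,u): local coordinates of the point, each in [-1,1];
   ud[8][3]: displacements (U,V,W) at the 8 element nodes;
   d[3]:     interpolated displacements (output).                     */
void interp_disp(double s, double t, double u,
                 double ud[8][3], double d[3])
{
  static const double sg[8][3] = {      /* corner signs of the nodes */
    {-1,-1,-1},{ 1,-1,-1},{ 1, 1,-1},{-1, 1,-1},
    {-1,-1, 1},{ 1,-1, 1},{ 1, 1, 1},{-1, 1, 1} };
  int i, j;
  for (j = 0; j < 3; j++) d[j] = 0.0;
  for (i = 0; i < 8; i++) {
    /* trilinear basis function N_i(s,t,u) */
    double Ni = 0.125*(1.0 + sg[i][0]*s)
                     *(1.0 + sg[i][1]*t)
                     *(1.0 + sg[i][2]*u);
    for (j = 0; j < 3; j++) d[j] += Ni*ud[i][j];
  }
}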
For each node of the local fine model outside the global coarse model, its nearest
element of the global coarse model is searched and the displacements of the node are
obtained by extrapolation using the displacements of nodes of the element with the
basis functions of the element. However, this method is known to degrade the accuracy of the extrapolated displacements used in the local fine model.
To solve this problem, a method for estimating nodal displacements of local fine
model nodes outside the global coarse model using a feedforward neural network is
proposed [17].
The zooming method using feedforward neural networks above consists of the
following three phases.
Data Preparation Phase:
Finite element analysis using the global coarse model is performed for a given
analysis condition. From each analysis result on the i-th node of the global coarse model that is within the region of the local fine model, a pair of the node coordinates $(X_i^G, Y_i^G, Z_i^G)$ and the nodal displacements $(U_i^G, V_i^G, W_i^G)$, i.e. $\{(X_i^G, Y_i^G, Z_i^G), (U_i^G, V_i^G, W_i^G)\}$, is obtained. In this manner, a lot of data pairs are collected.
Training Phase:
A feedforward neural network is trained using the data pairs collected in the Data
Preparation Phase, where input and teacher data for the neural network are set as
follows:
Input data: $(X_i^G, Y_i^G, Z_i^G)$
Teacher data: $(U_i^G, V_i^G, W_i^G)$
Application Phase:
The coordinates $(X_i^L, Y_i^L, Z_i^L)$ of a node in the local fine model are input to the trained feedforward neural network, and the displacements $(U_i^L, V_i^L, W_i^L)$ at the node are obtained as the output; then the local fine model is analyzed using these as the boundary conditions.
Fig. 8.23 Analysis domain. Reprinted from [17] with permission from Elsevier
pairs $\{(X_i^G, Y_i^G, Z_i^G), (U_i^G, V_i^G, W_i^G)\}$ are obtained. Of these, 70% are used as the training data and the rest as the data for verification of the generalization capability.
Training Phase:
The training data collected in the Data Preparation Phase are used to train the feed-
forward neural network. The structure of the feedforward neural network employed
is given as follows:
Input layer: 3 units for $(X_i^G, Y_i^G, Z_i^G)$.
Hidden layers: Several structures are tested: 1, 2, 3, 4, or 5 layers, with 10, 50, or 100 units per hidden layer.
Output layer: 3 units for $(U_i^G, V_i^G, W_i^G)$.
Figure 8.24 shows the training results. The horizontal axis is the number of hidden layers, and the vertical axis the error for the validation data. From the figure, it is seen that the error is the lowest when using three hidden layers with the ReLU activation function.
Application Phase:
The trained feedforward neural network with three hidden layers and ReLU as
the activation function is used to determine the boundary condition (displacement)
of the local fine model. The performance of this method is evaluated by the value
of the von Mises stress obtained from the analysis of the local fine model. The von
Mises stress is an index often used in the strength analysis and given as follows [1,
14]:
Fig. 8.24 Effects of network hyperparameters. Reprinted from [17] with permission from Elsevier
$$\sigma_{\text{Mises}} = \sqrt{\frac{1}{2}\left\{\left(\sigma_{xx}-\sigma_{yy}\right)^2 + \left(\sigma_{yy}-\sigma_{zz}\right)^2 + \left(\sigma_{zz}-\sigma_{xx}\right)^2 + 6\left(\tau_{xy}^2+\tau_{yz}^2+\tau_{zx}^2\right)\right\}} \qquad (8.4.1)$$
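Equation (8.4.1) is straightforward to evaluate in postprocessing code; the following is a minimal C sketch (a hypothetical helper, not code from [17]):

#include <math.h>

/* von Mises stress from the six stress components, Eq. (8.4.1) */
double mises(double sxx, double syy, double szz,
             double txy, double tyz, double tzx)
{
  double d1 = sxx - syy, d2 = syy - szz, d3 = szz - sxx;
  return sqrt(0.5*(d1*d1 + d2*d2 + d3*d3
                   + 6.0*(txy*txy + tyz*tyz + tzx*tzx)));
}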
First, the von Mises stress values are calculated by the zooming method using the
feedforward neural network with the procedure as follows:
(A1) Input the coordinates $(X_i^L, Y_i^L, Z_i^L)$ of the periphery nodes of the local fine model to the trained neural network, and let $(U_i^L, V_i^L, W_i^L)$, obtained as the output of the neural network, be the boundary condition (fixed displacement) of the local fine model.
(A2) Perform the analysis of the local fine model and calculate the von Mises stress
at each node of the local fine model.
(A3) Find the maximum value among the calculated von Mises stresses and set it as $\sigma_{\text{Mises}}^{A}$.
For comparison, the maximum value of the von Mises stresses is calculated also
by another procedure as follows:
(B1) For the coordinates $(X_i^L, Y_i^L, Z_i^L)$ of the outer periphery nodes of the local fine model, identify the nearest element of the global coarse model, extrapolate $(U_i^L, V_i^L, W_i^L)$ from the displacements at the nodes of the element using the shape (basis) functions of the global coarse model, and set it as the boundary condition (fixed displacement) of the local fine model.
(B2) Perform the analysis of the local fine model and calculate the von Mises stress
at each node of the local fine model.
(B3) Find the maximum value among the calculated von Mises stresses and set it as $\sigma_{\text{Mises}}^{B}$.
In addition, the maximum value of the von Mises stresses is calculated by another
procedure as follows:
(C1) Create a global fine model dividing the entire region including the fillet into
elements with the same fineness as the local fine model.
(C2) Perform the analysis of the global fine model above and calculate the von
Mises stress at each node in the same region as the local fine model.
(C3) Find the maximum value among the calculated von Mises stresses and set it as $\sigma_{\text{Mises}}^{C}$. This value is considered to be close to the correct one.
The performance of the present method is evaluated from the comparison among the three von Mises stress values $\sigma_{\text{Mises}}^{A}$, $\sigma_{\text{Mises}}^{B}$ and $\sigma_{\text{Mises}}^{C}$.
The results for various radii of curvature of the fillet are shown in Fig. 8.25. The horizontal axis is the radius of curvature of the fillet, and the vertical axis is the maximum von Mises stress value. The results clearly show that $\sigma_{\text{Mises}}^{C} \approx \sigma_{\text{Mises}}^{A} < \sigma_{\text{Mises}}^{B}$, indicating that the zooming method using the feedforward neural network gives more accurate boundary conditions than the conventional extrapolation.
Fig. 8.25 Performance of zooming method versus radius of fillet. Reprinted from [17] with permission from Elsevier
8.5 Physics-Informed Neural Network
Strain–displacement equation:
$$\begin{pmatrix} \varepsilon_x \\ \varepsilon_y \\ \gamma_{xy} \end{pmatrix} = \begin{bmatrix} \frac{\partial}{\partial x} & 0 \\ 0 & \frac{\partial}{\partial y} \\ \frac{\partial}{\partial y} & \frac{\partial}{\partial x} \end{bmatrix}\begin{pmatrix} u \\ v \end{pmatrix} \qquad (8.5.4)$$
where λ and μ are the Lamé constants, which have the following relationship with the Young's modulus E and the Poisson's ratio ν:
$$E = \frac{\mu(3\lambda+2\mu)}{\lambda+\mu},\qquad \nu = \frac{\lambda}{2(\lambda+\mu)} \qquad (8.5.6)$$
Now, consider the analysis domain and boundary conditions as shown in Fig. 8.26
[4]. The external load is given as follows:
$$T_y(x, y) = \begin{cases} (\lambda + 2\mu)\,Q\sin(\pi x) & (\text{on } y = 1) \\ 0 & (\text{elsewhere}) \end{cases} \qquad (8.5.8)$$
$$f_y(x, y) = \lambda\left\{-3\sin(\pi x)\,Qy^2 + 2\pi^2\sin(2\pi x)\cos(\pi y)\right\} + \mu\left\{-6\sin(\pi x)\,Qy^2 + 2\pi^2\sin(2\pi x)\cos(\pi y) + \frac{1}{4}\pi^2\sin(\pi x)\,Qy^4\right\} \qquad (8.5.10)$$
$$v(x, y) = \frac{1}{4}\sin(\pi x)\,Qy^4 \qquad (8.5.12)$$
Figure 8.27 illustrates the solution for λ = 1.0, μ = 0.5 and Q = 4.0.
Prepare a large number of sampling points in the analysis domain, and calculate the coordinates $x^*$ and $y^*$, the displacements $u^*$ and $v^*$, and the stresses $\sigma_x^*$, $\sigma_y^*$ and $\tau_{xy}^*$ at each sampling point. For the data obtained in this way, a physics-informed neural network is constructed as follows [4]:
Input data: Coordinates of a sampling point inside the region, $x^*$ and $y^*$.
Teacher data: Displacements $u^*$ and $v^*$ and stresses $\sigma_x^*$, $\sigma_y^*$ and $\tau_{xy}^*$ at the sampling point.
Fig. 8.27 Solution for parameter values of λ = 1.0, μ = 0.5, Q = 4.0. Reprinted from [4] with
permission from Elsevier
With $L_1$ the error of a normal neural network, $L_2$ the error specific to the physics-informed neural network, related to the equilibrium equations, and $L_3$ the error related to the constitutive equations, the error function L is defined as
$$L = L_1 + L_2 + L_3 \qquad (8.5.13)$$
where
$$L_1 = \left|u^{NN} - u^*\right| + \left|v^{NN} - v^*\right| + \left|\sigma_x^{NN} - \sigma_x^*\right| + \left|\sigma_y^{NN} - \sigma_y^*\right| + \left|\tau_{xy}^{NN} - \tau_{xy}^*\right| \qquad (8.5.14)$$
$$L_2 = \left|\frac{\partial \sigma_x^{NN}}{\partial x} + \frac{\partial \tau_{xy}^{NN}}{\partial y} + f_x^*\right| + \left|\frac{\partial \tau_{xy}^{NN}}{\partial x} + \frac{\partial \sigma_y^{NN}}{\partial y} + f_y^*\right| \qquad (8.5.15)$$
$$L_3 = \left|(\lambda + 2\mu)\varepsilon_x^{NN} + \lambda\varepsilon_y^{NN} - \sigma_x^{NN}\right| + \left|(\lambda + 2\mu)\varepsilon_y^{NN} + \lambda\varepsilon_x^{NN} - \sigma_y^{NN}\right| + \left|\mu\gamma_{xy}^{NN} - \tau_{xy}^{NN}\right| \qquad (8.5.16)$$
Note that $(\,)^{NN}$ means the output of the neural network, while $\varepsilon_x^{NN}$, $\varepsilon_y^{NN}$ and $\gamma_{xy}^{NN}$ are not the output of the neural network but quantities calculated from the displacements $u^{NN}$ and $v^{NN}$ based on Eq. (8.5.4) as
$$\varepsilon_x^{NN} = \frac{\partial u^{NN}}{\partial x},\qquad \varepsilon_y^{NN} = \frac{\partial v^{NN}}{\partial y},\qquad \gamma_{xy}^{NN} = \frac{\partial u^{NN}}{\partial y} + \frac{\partial v^{NN}}{\partial x} \qquad (8.5.17)$$
The derivative values appearing in Eqs. (8.5.15) and (8.5.17) are those of the
output of the neural network with respect to the input value, which are obtained by the
automatic differentiation implemented in the deep learning library (see Sect. 1.3.8).
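For illustration, the following is a minimal C sketch of assembling the error function of Eqs. (8.5.13)–(8.5.16) at a single sampling point. It assumes (hypothetically) that the network outputs and their spatial derivatives have already been evaluated there, e.g., by the automatic differentiation mentioned above; all names are illustrative, not from [4]:

#include <math.h>

/* Error L = L1 + L2 + L3 of Eqs. (8.5.13)-(8.5.16) at one sampling
   point. NN-suffixed arguments are network outputs and their
   derivatives (assumed precomputed); starred (..s) ones are data.  */
double pinn_loss(double uNN, double vNN,
                 double sxNN, double syNN, double txyNN,
                 double dsx_dx, double dtxy_dy,  /* d(sig_x)/dx, d(tau_xy)/dy */
                 double dtxy_dx, double dsy_dy,  /* d(tau_xy)/dx, d(sig_y)/dy */
                 double du_dx, double dv_dy, double du_dy, double dv_dx,
                 double us, double vs, double sxs, double sys, double txys,
                 double fxs, double fys, double lambda, double mu)
{
  /* strains from displacements, Eq. (8.5.17) */
  double exNN  = du_dx;
  double eyNN  = dv_dy;
  double gxyNN = du_dy + dv_dx;

  double L1 = fabs(uNN - us) + fabs(vNN - vs) + fabs(sxNN - sxs)
            + fabs(syNN - sys) + fabs(txyNN - txys);           /* (8.5.14) */
  double L2 = fabs(dsx_dx + dtxy_dy + fxs)
            + fabs(dtxy_dx + dsy_dy + fys);                    /* (8.5.15) */
  double L3 = fabs((lambda + 2.0*mu)*exNN + lambda*eyNN - sxNN)
            + fabs((lambda + 2.0*mu)*eyNN + lambda*exNN - syNN)
            + fabs(mu*gxyNN - txyNN);                          /* (8.5.16) */
  return L1 + L2 + L3;                                         /* (8.5.13) */
}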
If highly accurate values of $f_x^*$ and $f_y^*$ in Eq. (8.5.15) are not available at the sampling points, they are alternatively obtained based on Eq. (8.5.1) as follows:
$$f_x^* = -\frac{\partial \sigma_x^*}{\partial x} - \frac{\partial \tau_{xy}^*}{\partial y},\qquad f_y^* = -\frac{\partial \tau_{xy}^*}{\partial x} - \frac{\partial \sigma_y^*}{\partial y} \qquad (8.5.18)$$
The derivatives on the right-hand sides of Eq. (8.5.18) can be calculated by applying the central difference approximation (see Sect. 7.2) to $\sigma_x^*$, $\sigma_y^*$ and $\tau_{xy}^*$.
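A minimal sketch of this central difference evaluation, assuming (hypothetically) that the stresses are sampled on a uniform grid with spacing h (array and function names are illustrative):

/* f_x* and f_y* at interior grid point (i,j) by central differences,
   Eq. (8.5.18); sx, sy, txy hold stresses on a uniform grid. */
void body_force(double **sx, double **sy, double **txy,
                int i, int j, double h, double *fx, double *fy)
{
  double dsx_dx  = (sx[i+1][j]  - sx[i-1][j]) /(2.0*h);
  double dtxy_dy = (txy[i][j+1] - txy[i][j-1])/(2.0*h);
  double dtxy_dx = (txy[i+1][j] - txy[i-1][j])/(2.0*h);
  double dsy_dy  = (sy[i][j+1]  - sy[i][j-1]) /(2.0*h);
  *fx = -dsx_dx  - dtxy_dy;
  *fy = -dtxy_dx - dsy_dy;
}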
Fig. 8.28 Physics-informed neural network for parameter identification. Reprinted from [4] with
permission from Elsevier
Fig. 8.29 Surrogate modeling using physics-informed neural network. Reprinted from [4] with
permission from Elsevier
In addition, several combinations of the number of hidden layers and the number of units per hidden layer are tested, showing that the combination of 5 hidden layers and 50 units per hidden layer gives relatively good results. Note that tanh() is used as the activation function here.
Application Phase:
As for the identification of the material constants λ and μ, the correct values of
λ = 1.0 and μ = 0.5 are obtained with very few epochs.
A comparison is also made employing a finite element solution instead of an
analytical solution. Dividing the analysis domain into 40 × 40 elements, solutions
using Lagrange elements of the first to fourth order are tested, and good results are
obtained for Lagrange elements of the second and higher orders.
Transfer learning has also been evaluated, and it has been confirmed that a neural network that has completed training for λ = 1.0 and μ = 0.5 converges (i.e., comes to estimate the correct μ value) faster than a network trained from scratch when trained on patterns with different μ values.
Based on the above results, another physics-informed neural network has been constructed. This network takes the coordinates $x^*$ and $y^*$ and the material parameter μ as input, and outputs the displacements $u^{NN}$ and $v^{NN}$ and the stresses $\sigma_x^{NN}$, $\sigma_y^{NN}$, and $\tau_{xy}^{NN}$. The error function is the same as that in Eq. (8.5.13); λ = 1.0 is treated as a constant. Patterns created for four different conditions, μ = 1/4, 2/3, 3/2, and 4, are used for training.
The estimation accuracy of the trained neural network is evaluated for inputs of various μ values including untrained ones. The results are shown in Fig. 8.29. Naturally, the accuracy of estimation is high for μ = 1/4, 2/3, 3/2, and 4, which are included in the training patterns, but it is seen that acceptable estimation is also possible for other μ values not used in training, indicating that the physics-informed neural network can be used as a kind of surrogate model.
References
1. Akin, J.E.: Finite Elements for Analysis and Design. Academic Press (1994).
2. Ferziger, J.H., Peric, M.: Computational Methods for Fluid Dynamics (Second Edition).
Springer (1999).
3. Golub, G.H., Van Loan, C.F.: Matrix Computations (Third Edition). The Johns Hopkins
University Press (1996).
4. Haghighat, E., Raissi, M., Moure, A., Gomez, H., Juanes, R.: A physics-informed deep learning
framework for inversion and surrogate modeling in solid mechanics. Comput. Methods Appl.
Mech. Eng. 379, 113741 (2021). https://doi.org/10.1016/j.cma.2021.113741.
5. Hirai, I., Wang, B.P., Pilkey, W.D.: An efficient zooming method for finite element analysis.
Int. J. Numer. Methods Eng. 20, 1671–1683 (1984). https://doi.org/10.1002/nme.1620200910.
6. Hirsch, C.: Numerical Computation of Internal and External Flows: The Fundamentals of Computational Fluid Dynamics (Second Edition). Butterworth-Heinemann (2007).
7. Hughes, T.J.R.: The Finite Element Method: Linear Static and Dynamic Finite Element
Analysis. Dover (2000).
8. Jennings, A., McKeown, J.J.: Matrix Computations (Second Edition). John Wiley & Sons
(1992).
9. Jung, J., Yoon, K., Lee, P.-S.: Deep learned finite elements. Comput. Methods Appl. Mech.
Eng. 372, 113401 (2020). https://doi.org/10.1016/j.cma.2020.113401.
10. Kingma, D.P., Rezende, D.J., Mohamed, S., Welling, M.: Semi-supervised learning with deep
generative models. Adv. Neural Inf. Process. Sys. 27, 3581–3589 (2014).
11. Mao, K.M., Sun, C.T.: A refined global-local finite element analysis method. Int. J. Numer.
Methods Eng. 32, 29–43 (1991). https://doi.org/10.1002/nme.1620320103.
12. Raissi, M., Perdikaris, P., Karniadakis, G.E.: Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations. J. Comput. Phys. 378, 686–707 (2019).
13. Ranade, R., Hill, C., Pathak, J.: DiscretizationNet: A machine-learning based solver for Navier–
Stokes equations using finite volume discretization. Comput. Methods Appl. Mech. Eng. 378,
113722 (2021). https://doi.org/10.1016/j.cma.2021.113722.
14. Simo, J.C., Hughes, T.J.R.: Computational Inelasticity. Springer (1998).
15. Sohn, K., Lee, H., Yan, X.: Learning Structured Output Representation using Deep Conditional
Generative Models. Adv. Neural Inf. Process. Sys. 28, 3483–3491 (2015).
16. Yamaguchi, T., Okuda, H.: Prediction of stress concentration at fillets using a neural network
for efficient finite element analysis. Mech. Eng. Lett. 6, 20–00318 (2020). https://doi.org/10.
1299/mel.20-00318.
17. Yamaguchi, T., Okuda, H.: Zooming method for FEA using a neural network. Comput. Struct.
247, 106480 (2021). https://doi.org/10.1016/j.compstruc.2021.106480.
18. Yao, H., Gao, Y., Liu, Y.: FEA-Net: A physics-guided data-driven model for efficient mechan-
ical response prediction. Comput. Methods Appl. Mech. Eng. 363, 112892 (2020). https://doi.
org/10.1016/j.cma.2020.112892.
19. Zienkiewicz, O.C., Taylor, R.L.: The Finite Element Method (Fifth Edition) Volume 1: The Basis. Butterworth-Heinemann (2000).
Part III
Computational Procedures
Chapter 9
Bases for Computer Programming
The finite element method, a major numerical analysis method, is composed of the
three processes as follows:
Preprocessing process to divide the analysis target into elements,
Main processing process to compute the solution by solving the governing equations, and
Postprocessing process to evaluate and visualize the analysis results.
9.1 Computer Programming for Data Preparation Phase
The element stiffness matrix $\left[k^e\right]$ of an element is given by
$$\left[k^e\right] = \int_{V^e} [B]^T [D][B]\, dV \qquad (9.1.1)$$
where [D] is the stress–strain matrix and [B] the strain–displacement matrix. A
homogeneous and isotropic elastic material is assumed here.
This integral is usually calculated by the Gauss–Legendre quadrature as follows (see Chap. 4 for detail):
$$\left[k^e\right] \approx \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{l}\left([B]^T[D][B]\cdot|[J]|\right)\Big|_{\substack{\xi=\xi_i \\ \eta=\eta_j \\ \zeta=\zeta_k}}\cdot H_{i,j,k} \qquad (9.1.2)$$
where n, m, and l are the numbers of integration points in each axis direction, and $(\xi_i, \eta_j, \zeta_k)$ and $H_{i,j,k}$ are, respectively, the coordinates and the weights at the integration points. In the programs in this section, n = m = l is assumed for simplicity.
In the following, the entire source code of the function esm3D08() that calcu-
lates the element stiffness matrix of an 8-node isoparametric hexahedral element
of the first order is shown, while Table 9.1 summarizes main variables and arrays
employed.
/* esm3D08.c */
void esm3D08(
int *elem,
double **node,
double *mate,
double **esm,
int ngauss,
double *gc,
double *gw,
int nfpn)
{
int i,j,k,ii,jj,kk,counter,necm=6,nnpe=8,kdim=24;
double e,v,ee,det,coord[8][3],J[3][3],invJ[3][3],
       D[6][6],s,t,u,ra,rs,sa,ss,ua,us,N[8][7],B[6][24],
       DB[6][24];
double dtmp,ra2,rs2,ssus,ssua,saus,saua;
The above is the header part of the function. This function, taking element data
elem[], nodal data node[][], and material data mate[] as input data, calculates
Eq. (9.1.2) by using the Gauss–Legendre quadrature with ngauss integration points
per axis, and outputs the element stiffness matrix as esm[][].
The node data node[][] represents a two-dimensional array that stores the coor-
dinates of all the nodes in the entire analysis domain. For example, node[20][0]
contains the x-coordinate of the node with the global node number 20, and
node[35][2] the z-coordinate of the node with the global node number 35. The
element data elem[] contains the global numbers of the eight nodes that define an
element. Note that two kinds of node numbers are usually used in the finite element
method: the global node number, which is defined as the sequential number attached
to all the nodes in a whole analysis domain, and the element node number, which is
defined only in an element.
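For example (with hypothetical numbers), if an element is defined by

int elem[8] = {12, 5, 9, 31, 22, 40, 17, 8};  /* hypothetical global node numbers */

then the node with element node number 3 has global node number elem[3] = 31, and its coordinates are node[31][0], node[31][1], and node[31][2].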
Let’s look at Fig. 9.1, where the left hand side of the figure is an element in
real space, and the number of each node is that of the global one. In the Gauss–
Legendre quadrature, the element is mapped to the local coordinate space as shown
in the right-hand side of the figure, where the integration process is performed. The
number attached to each node from 0 to 7 in the local coordinate space is that of the
element one, and the present program is based on this numbering system.
Note that the element data elem[] in esm3D08() must satisfy the rule that
nodes in an element are arranged in accordance with the ordering of the element node
number. For example, allowable and not allowable orderings of the global node numbers for the element shown in Fig. 9.1 are illustrated in the figure.
[Fig. 9.1: an 8-node hexahedral element in real space with global node numbers (left) and its mapping to the local coordinate space (right), with element node numbers 0–7 at the corners from (−1, −1, −1) to (1, 1, 1); allowable and not allowable node orderings are shown]
for(ii=0;ii<kdim;ii++){
for(jj=0;jj<kdim;jj++) esm[ii][jj] = 0.0;
}
for(i=0;i<nnpe;i++){
ii = elem[i];
for(j=0;j<nfpn;j++) coord[i][j] = node[ii][j];
}
for(i=0;i<necm;i++)
for(j=0;j<kdim;j++) B[i][j] = 0.0;
In the code above, using node[][] and elem[], only the coordinates of the
8 nodes belonging to the element are stored in coord[][], while esm[][] and
B[][] are cleared to zero.
e = mate[0];
v = mate[1];
for(i=0;i<necm;i++)
for(j=0;j<necm;j++) D[i][j] = 0.0;
ee = e*(1.0 - v)/(1.0 + v)/(1.0 - 2.0*v);
D[0][0] = ee;
D[1][1] = ee;
D[2][2] = ee;
D[3][3] = ee*(1.0 - 2.0*v)/2.0/(1.0 - v);
D[4][4] = D[3][3];
D[5][5] = D[3][3];
D[0][1] = ee*v/(1.0 - v);
D[0][2] = D[0][1];
D[1][2] = D[0][1];
D[1][0] = D[0][1];
D[2][0] = D[0][2];
D[2][1] = D[1][2];
for(i=0;i<ngauss;i++){
s = gc[i];
for(j=0;j<ngauss;j++){
t = gc[j];
for(k=0;k<ngauss;k++){
u = gc[k];
This is the starting part of the triple-nested loop that calculates the contribution of each integration point in turn in the Gauss–Legendre quadrature. The variables s, t, and u correspond to the local coordinates ξ, η, and ζ, respectively. As for gc[] and gw[], for example, when ngauss is 2, gc[0] and gc[1] are $-1/\sqrt{3}$ and $1/\sqrt{3}$, respectively, and gw[0] and gw[1] are both 1.0. (See Chap. 4 for detail.)
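For instance, the caller might set up these arrays as follows (a sketch, separate from the listing of esm3D08(); the standard two- and three-point Gauss–Legendre values are shown):

/* two-point rule (ngauss = 2): +-1/sqrt(3), weights 1 */
double gc2[2] = {-0.5773502691896258, 0.5773502691896258};
double gw2[2] = { 1.0, 1.0 };

/* three-point rule (ngauss = 3): +-sqrt(3/5) and 0, weights 5/9, 8/9 */
double gc3[3] = {-0.7745966692414834, 0.0, 0.7745966692414834};
double gw3[3] = { 5.0/9.0, 8.0/9.0, 5.0/9.0 };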
ra = (1.0 + s)*0.5;
rs = (1.0 - s)*0.5;
sa = (1.0 + t)*0.5;
ss = (1.0 - t)*0.5;
ua = (1.0 + u)*0.5;
us = (1.0 - u)*0.5;
ssus = ss*us;
saus = sa*us;
ssua = ss*ua;
saua = sa*ua;
N[0][6] = rs*ssus;
N[1][6] = ra*ssus;
N[2][6] = ra*saus;
N[3][6] = rs*saus;
N[4][6] = rs*ssua;
N[5][6] = ra*ssua;
N[6][6] = ra*saua;
N[7][6] = rs*saua;
N[0][0] = -0.5*ssus;
N[1][0] = 0.5*ssus;
N[2][0] = 0.5*saus;
N[3][0] = -0.5*saus;
N[4][0] = -0.5*ssua;
N[5][0] = 0.5*ssua;
N[6][0] = 0.5*saua;
N[7][0] = -0.5*saua;
rs2 = 0.5*rs;
ra2 = 0.5*ra;
N[0][1] = -rs2*us;
N[1][1] = -ra2*us;
N[2][1] = ra2*us;
N[3][1] = rs2*us;
N[4][1] = -rs2*ua;
N[5][1] = -ra2*ua;
N[6][1] = ra2*ua;
N[7][1] = rs2*ua;
N[0][2] = -rs2*ss;
N[1][2] = -ra2*ss;
N[2][2] = -ra2*sa;
N[3][2] = -rs2*sa;
N[4][2] = rs2*ss;
N[5][2] = ra2*ss;
N[6][2] = ra2*sa;
N[7][2] = rs2*sa;
In the code above, the values of the basis functions at the integration point are
calculated, where N[i][6] denotes the value of the basis function Ni (ξ, η, ζ ),
which is defined as follows:
$$N[0][6] = N_0(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1-\eta)(1-\zeta) \qquad (9.1.4)$$
$$N[1][6] = N_1(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1-\eta)(1-\zeta) \qquad (9.1.5)$$
$$N[2][6] = N_2(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1+\eta)(1-\zeta) \qquad (9.1.6)$$
$$N[3][6] = N_3(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1+\eta)(1-\zeta) \qquad (9.1.7)$$
$$N[4][6] = N_4(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1-\eta)(1+\zeta) \qquad (9.1.8)$$
$$N[5][6] = N_5(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1-\eta)(1+\zeta) \qquad (9.1.9)$$
$$N[6][6] = N_6(\xi,\eta,\zeta) = \frac{1}{8}(1+\xi)(1+\eta)(1+\zeta) \qquad (9.1.10)$$
$$N[7][6] = N_7(\xi,\eta,\zeta) = \frac{1}{8}(1-\xi)(1+\eta)(1+\zeta) \qquad (9.1.11)$$
N[i][0] in the code denotes the value of the partial derivative of the basis
function Ni (ξ, η, ζ ) with respect to ξ , which is defined as
$$N[0][0] = \frac{\partial N_0(\xi,\eta,\zeta)}{\partial \xi} = -\frac{1}{8}(1-\eta)(1-\zeta) \qquad (9.1.12)$$
$$N[1][0] = \frac{\partial N_1(\xi,\eta,\zeta)}{\partial \xi} = \frac{1}{8}(1-\eta)(1-\zeta) \qquad (9.1.13)$$
$$N[2][0] = \frac{\partial N_2(\xi,\eta,\zeta)}{\partial \xi} = \frac{1}{8}(1+\eta)(1-\zeta) \qquad (9.1.14)$$
$$N[3][0] = \frac{\partial N_3(\xi,\eta,\zeta)}{\partial \xi} = -\frac{1}{8}(1+\eta)(1-\zeta) \qquad (9.1.15)$$
$$N[4][0] = \frac{\partial N_4(\xi,\eta,\zeta)}{\partial \xi} = -\frac{1}{8}(1-\eta)(1+\zeta) \qquad (9.1.16)$$
$$N[5][0] = \frac{\partial N_5(\xi,\eta,\zeta)}{\partial \xi} = \frac{1}{8}(1-\eta)(1+\zeta) \qquad (9.1.17)$$
$$N[6][0] = \frac{\partial N_6(\xi,\eta,\zeta)}{\partial \xi} = \frac{1}{8}(1+\eta)(1+\zeta) \qquad (9.1.18)$$
$$N[7][0] = \frac{\partial N_7(\xi,\eta,\zeta)}{\partial \xi} = -\frac{1}{8}(1+\eta)(1+\zeta) \qquad (9.1.19)$$
N[i][1] in the code denotes the value of the partial derivative of the basis function $N_i(\xi,\eta,\zeta)$ with respect to η, which is defined as
$$N[0][1] = \frac{\partial N_0(\xi,\eta,\zeta)}{\partial \eta} = -\frac{1}{8}(1-\xi)(1-\zeta) \qquad (9.1.20)$$
$$N[1][1] = \frac{\partial N_1(\xi,\eta,\zeta)}{\partial \eta} = -\frac{1}{8}(1+\xi)(1-\zeta) \qquad (9.1.21)$$
$$N[2][1] = \frac{\partial N_2(\xi,\eta,\zeta)}{\partial \eta} = \frac{1}{8}(1+\xi)(1-\zeta) \qquad (9.1.22)$$
$$N[3][1] = \frac{\partial N_3(\xi,\eta,\zeta)}{\partial \eta} = \frac{1}{8}(1-\xi)(1-\zeta) \qquad (9.1.23)$$
$$N[4][1] = \frac{\partial N_4(\xi,\eta,\zeta)}{\partial \eta} = -\frac{1}{8}(1-\xi)(1+\zeta) \qquad (9.1.24)$$
$$N[5][1] = \frac{\partial N_5(\xi,\eta,\zeta)}{\partial \eta} = -\frac{1}{8}(1+\xi)(1+\zeta) \qquad (9.1.25)$$
$$N[6][1] = \frac{\partial N_6(\xi,\eta,\zeta)}{\partial \eta} = \frac{1}{8}(1+\xi)(1+\zeta) \qquad (9.1.26)$$
$$N[7][1] = \frac{\partial N_7(\xi,\eta,\zeta)}{\partial \eta} = \frac{1}{8}(1-\xi)(1+\zeta) \qquad (9.1.27)$$
And N[i][2] in the code denotes the value of the partial derivative of the basis
function Ni (ξ, η, ζ ) with respect to ζ , which is defined as
$$N[0][2] = \frac{\partial N_0(\xi,\eta,\zeta)}{\partial \zeta} = -\frac{1}{8}(1-\xi)(1-\eta) \qquad (9.1.28)$$
$$N[1][2] = \frac{\partial N_1(\xi,\eta,\zeta)}{\partial \zeta} = -\frac{1}{8}(1+\xi)(1-\eta) \qquad (9.1.29)$$
$$N[2][2] = \frac{\partial N_2(\xi,\eta,\zeta)}{\partial \zeta} = -\frac{1}{8}(1+\xi)(1+\eta) \qquad (9.1.30)$$
$$N[3][2] = \frac{\partial N_3(\xi,\eta,\zeta)}{\partial \zeta} = -\frac{1}{8}(1-\xi)(1+\eta) \qquad (9.1.31)$$
$$N[4][2] = \frac{\partial N_4(\xi,\eta,\zeta)}{\partial \zeta} = \frac{1}{8}(1-\xi)(1-\eta) \qquad (9.1.32)$$
$$N[5][2] = \frac{\partial N_5(\xi,\eta,\zeta)}{\partial \zeta} = \frac{1}{8}(1+\xi)(1-\eta) \qquad (9.1.33)$$
$$N[6][2] = \frac{\partial N_6(\xi,\eta,\zeta)}{\partial \zeta} = \frac{1}{8}(1+\xi)(1+\eta) \qquad (9.1.34)$$
$$N[7][2] = \frac{\partial N_7(\xi,\eta,\zeta)}{\partial \zeta} = \frac{1}{8}(1-\xi)(1+\eta) \qquad (9.1.35)$$
for(ii=0;ii<nfpn;ii++){
for(jj=0;jj<nfpn;jj++){
J[ii][jj] = 0.0;
for(kk=0;kk<nnpe;kk++){
J[ii][jj] += N[kk][ii]*coord[kk][jj];
}
}
}
In the code above, based on the basic equation for isoparametric elements,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x(\xi,\eta,\zeta) \\ y(\xi,\eta,\zeta) \\ z(\xi,\eta,\zeta) \end{pmatrix} = \sum_{i=1}^{n} N_i(\xi,\eta,\zeta)\cdot\begin{pmatrix} X_i \\ Y_i \\ Z_i \end{pmatrix} \qquad (9.1.36)$$
the Jacobian matrix $[J]$ is calculated.
det = J[0][0]*J[1][1]*J[2][2]
+ J[0][1]*J[1][2]*J[2][0]
+ J[0][2]*J[1][0]*J[2][1]
- J[0][0]*J[1][2]*J[2][1]
- J[0][1]*J[1][0]*J[2][2]
- J[0][2]*J[1][1]*J[2][0] ;
invJ[0][0] = (J[1][1]*J[2][2] - J[1][2]*J[2][1])/det;
invJ[0][1] = (J[0][2]*J[2][1] - J[0][1]*J[2][2])/det;
invJ[0][2] = (J[0][1]*J[1][2] - J[1][1]*J[0][2])/det;
invJ[1][0] = (J[1][2]*J[2][0] - J[1][0]*J[2][2])/det;
invJ[1][1] = (J[0][0]*J[2][2] - J[0][2]*J[2][0])/det;
invJ[1][2] = (J[1][0]*J[0][2] - J[0][0]*J[1][2])/det;
invJ[2][0] = (J[1][0]*J[2][1] - J[1][1]*J[2][0])/det;
invJ[2][1] = (J[0][1]*J[2][0] - J[0][0]*J[2][1])/det;
invJ[2][2] = (J[0][0]*J[1][1] - J[0][1]*J[1][0])/det;
The determinant and inverse of the Jacobian matrix $[J]$ are calculated in the code above. The determinant is computed directly for two-by-two matrices, and by the recursive cofactor expansion for three-by-three and larger matrices; the formula for a three-by-three matrix is
$$\begin{vmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{vmatrix} = a_{00}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} - a_{01}\begin{vmatrix} a_{10} & a_{12} \\ a_{20} & a_{22} \end{vmatrix} + a_{02}\begin{vmatrix} a_{10} & a_{11} \\ a_{20} & a_{21} \end{vmatrix}$$
$$= a_{00}(a_{11}a_{22} - a_{12}a_{21}) - a_{01}(a_{10}a_{22} - a_{12}a_{20}) + a_{02}(a_{10}a_{21} - a_{11}a_{20}) \qquad (9.1.38)$$
The inverse matrix can be obtained using the formulas of linear algebra. Let the n-th order matrix [A] be given as
$$[A] = \begin{bmatrix} a_{00} & \cdots & a_{0,n-1} \\ \vdots & \ddots & \vdots \\ a_{n-1,0} & \cdots & a_{n-1,n-1} \end{bmatrix} \qquad (9.1.39)$$
Then its inverse is
$$[A]^{-1} = \frac{1}{|[A]|}\left[\tilde{A}\right] \qquad (9.1.40)$$
where $|[A]|$ is the determinant of the matrix [A] and $\left[\tilde{A}\right]$ is its adjugate matrix given by
$$\left[\tilde{A}\right] = \begin{bmatrix} \tilde{a}_{00} & \cdots & \tilde{a}_{n-1,0} \\ \vdots & \ddots & \vdots \\ \tilde{a}_{0,n-1} & \cdots & \tilde{a}_{n-1,n-1} \end{bmatrix} \qquad (9.1.41)$$
with
$$\tilde{a}_{ij} = (-1)^{i+j}\left|A^{[ij]}\right| \qquad (9.1.42)$$
where $A^{[ij]}$ is the matrix obtained by removing the i-th row and the j-th column from [A]:
$$A^{[ij]} = \begin{bmatrix} a_{00} & \cdots & a_{0,j-1} & a_{0,j+1} & \cdots & a_{0,n-1} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{i-1,0} & \cdots & a_{i-1,j-1} & a_{i-1,j+1} & \cdots & a_{i-1,n-1} \\ a_{i+1,0} & \cdots & a_{i+1,j-1} & a_{i+1,j+1} & \cdots & a_{i+1,n-1} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & \cdots & a_{n-1,j-1} & a_{n-1,j+1} & \cdots & a_{n-1,n-1} \end{bmatrix} \qquad (9.1.43)$$
For a three-by-three matrix, the adjugate matrix $\left[\tilde{A}\right]$ is given by
$$\left[\tilde{A}\right] = \begin{bmatrix} \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22}\end{vmatrix} & -\begin{vmatrix} a_{01} & a_{02} \\ a_{21} & a_{22}\end{vmatrix} & \begin{vmatrix} a_{01} & a_{02} \\ a_{11} & a_{12}\end{vmatrix} \\ -\begin{vmatrix} a_{10} & a_{12} \\ a_{20} & a_{22}\end{vmatrix} & \begin{vmatrix} a_{00} & a_{02} \\ a_{20} & a_{22}\end{vmatrix} & -\begin{vmatrix} a_{00} & a_{02} \\ a_{10} & a_{12}\end{vmatrix} \\ \begin{vmatrix} a_{10} & a_{11} \\ a_{20} & a_{21}\end{vmatrix} & -\begin{vmatrix} a_{00} & a_{01} \\ a_{20} & a_{21}\end{vmatrix} & \begin{vmatrix} a_{00} & a_{01} \\ a_{10} & a_{11}\end{vmatrix} \end{bmatrix} \qquad (9.1.45)$$
for(ii=0;ii<nnpe;ii++){
N[ii][3] = 0.0;
N[ii][4] = 0.0;
N[ii][5] = 0.0;
for(jj=0;jj<3;jj++){
N[ii][3] += invJ[0][jj]*N[ii][jj];
N[ii][4] += invJ[1][jj]*N[ii][jj];
N[ii][5] += invJ[2][jj]*N[ii][jj];
}
}
In the code above, the first-order derivatives of the basis functions with respect
to x, y, and z are calculated. By the chain rule of differentiation, the relationship
between the first-order derivatives of the basis functions with respect to x, y, and z
and those with respect to ξ, η, and ζ is written as follows:
$$\begin{pmatrix} \frac{\partial N_i}{\partial\xi} \\ \frac{\partial N_i}{\partial\eta} \\ \frac{\partial N_i}{\partial\zeta} \end{pmatrix} = \begin{pmatrix} \frac{\partial N_i}{\partial x}\frac{\partial x}{\partial\xi} + \frac{\partial N_i}{\partial y}\frac{\partial y}{\partial\xi} + \frac{\partial N_i}{\partial z}\frac{\partial z}{\partial\xi} \\ \frac{\partial N_i}{\partial x}\frac{\partial x}{\partial\eta} + \frac{\partial N_i}{\partial y}\frac{\partial y}{\partial\eta} + \frac{\partial N_i}{\partial z}\frac{\partial z}{\partial\eta} \\ \frac{\partial N_i}{\partial x}\frac{\partial x}{\partial\zeta} + \frac{\partial N_i}{\partial y}\frac{\partial y}{\partial\zeta} + \frac{\partial N_i}{\partial z}\frac{\partial z}{\partial\zeta} \end{pmatrix} = \begin{bmatrix} \frac{\partial x}{\partial\xi} & \frac{\partial y}{\partial\xi} & \frac{\partial z}{\partial\xi} \\ \frac{\partial x}{\partial\eta} & \frac{\partial y}{\partial\eta} & \frac{\partial z}{\partial\eta} \\ \frac{\partial x}{\partial\zeta} & \frac{\partial y}{\partial\zeta} & \frac{\partial z}{\partial\zeta} \end{bmatrix}\begin{pmatrix} \frac{\partial N_i}{\partial x} \\ \frac{\partial N_i}{\partial y} \\ \frac{\partial N_i}{\partial z} \end{pmatrix} \qquad (9.1.46)$$
Therefore, by using the inverse of the Jacobian matrix, the first-order derivatives
of the basis functions with respect to x, y, and z are calculated as
$$\begin{pmatrix} \frac{\partial N_i}{\partial x} \\ \frac{\partial N_i}{\partial y} \\ \frac{\partial N_i}{\partial z} \end{pmatrix} = [J]^{-1}\begin{pmatrix} \frac{\partial N_i}{\partial\xi} \\ \frac{\partial N_i}{\partial\eta} \\ \frac{\partial N_i}{\partial\zeta} \end{pmatrix} \qquad (9.1.47)$$
for(ii=0;ii<nnpe;ii++){
jj = ii*nfpn;
B[0][jj] = N[ii][3];
B[1][1+jj] = N[ii][4];
B[2][2+jj] = N[ii][5];
B[3][jj] = N[ii][4];
B[3][1+jj] = N[ii][3];
B[4][1+jj] = N[ii][5];
B[4][2+jj] = N[ii][4];
B[5][jj] = N[ii][5];
B[5][2+jj] = N[ii][3];
}
and
$$[L] = \begin{bmatrix} \frac{\partial}{\partial x} & 0 & 0 \\ 0 & \frac{\partial}{\partial y} & 0 \\ 0 & 0 & \frac{\partial}{\partial z} \\ \frac{\partial}{\partial y} & \frac{\partial}{\partial x} & 0 \\ 0 & \frac{\partial}{\partial z} & \frac{\partial}{\partial y} \\ \frac{\partial}{\partial z} & 0 & \frac{\partial}{\partial x} \end{bmatrix} \qquad (9.1.49)$$
for(ii=0;ii<necm;ii++){
for(jj=0;jj<kdim;jj++){
DB[ii][jj] = 0.0;
for(kk=0;kk<necm;kk++){
DB[ii][jj] += D[ii][kk]*B[kk][jj];
}
}
}
dtmp = gw[i]*gw[j]*gw[k]*det;
for(ii=0;ii<kdim;ii++){
for(jj=0;jj<kdim;jj++){
for(kk=0;kk<necm;kk++){
esm[ii][jj]
+= B[kk][ii]*DB[kk][jj]*dtmp;
}
}
}
}
}
}
}
/* End of esm3D08.c */
In the code above, using [D], [B], and |[J]| obtained so far, the element stiffness matrix is calculated based on the following equation:
$$\left[k^e\right] \approx \sum_{i=0}^{ng-1}\sum_{j=0}^{ng-1}\sum_{k=0}^{ng-1}\left([B]^T[D][B]\cdot|[J]|\right)\Big|_{\substack{\xi=\xi_i \\ \eta=\eta_j \\ \zeta=\zeta_k}}\cdot H_{i,j,k} \qquad (9.1.51)$$
Note that the variable dtmp in the code is the product of $H_{i,j,k}$ and $|[J]|$.
In the innermost part of the triple-nested loop for the integration points, contri-
bution to the element stiffness matrix of each integration point is calculated, all of
which are finally summed to obtain the element stiffness matrix.
The element stiffness matrix of each element is calculated by esm3D08()
above, and the global stiffness matrix is constructed by adding the element stiff-
ness matrices of all the elements. The element stiffness matrix is arranged assuming
the displacement vector as follows:
$$\left(U_0\ V_0\ W_0\ U_1\ V_1\ W_1\ \cdots\cdots\ U_7\ V_7\ W_7\right)^T \qquad (9.1.53)$$
where $\left(U_i\ V_i\ W_i\right)^T$ is the displacement vector for the element node number i. On the other hand, the global stiffness matrix is arranged according to the displacement vectors of all the nodes in the entire domain. Let the total number of nodes be N; the global displacement vector is then expressed as follows:
$$\left(U_0\ V_0\ W_0\ U_1\ V_1\ W_1\ \cdots\cdots\ U_{N-1}\ V_{N-1}\ W_{N-1}\right)^T \qquad (9.1.54)$$
where $\left(U_j\ V_j\ W_j\right)^T$ is the displacement vector for the global node number j.
Thus, the construction of the global stiffness matrix is performed by adding each
element stiffness matrix to the appropriate position in the global stiffness matrix
according to the correspondence between the global node number and the element
one.
A program code for constructing a global stiffness matrix based on the element
stiffness matrices is shown as follows:
for(i=0;i<N*nfpn;i++)
for(j=0;j<N*nfpn;j++) gsm[i][j] = 0.0 ;
for(iel=0;iel<nelem;iel++){
esm3D08(elem[iel],node,mate[iel],esm,ngauss,gc,gw,nfpn);
for(i=0;i<nnpe;i++){
idof = elem[iel][i]*nfpn;
for(j=0;j<nnpe;j++){
jdof = elem[iel][j]*nfpn;
for(ia=0;ia<nfpn;ia++){
for(ja=0;ja<nfpn;ja++){
gsm[idof+ia][jdof+ja]
+= esm[i*nfpn+ia][j*nfpn+ja] ;
}
}
}
}
}
In the code above, it is assumed that the element data of the iel-th element is stored in elem[iel][] and its material data in mate[iel][]. The two-dimensional array gsm[][], which is to store the global stiffness matrix, is cleared to zero at first, and each time an element stiffness matrix is calculated, its components are added to the appropriate positions in the global stiffness matrix.
As we have seen in Chap. 4, the shape of an element affects the accuracy of the
elemental integration, and the error in the element stiffness matrix directly affects
the accuracy of the analysis results. For this reason, various parameters have been
used as criteria for judging the quality of the element shape during mesh genera-
tion to improve the initial mesh [3]. In this subsection, some programs to calculate
parameters representing the element geometry are discussed.
To begin with, the algebraic shape metric (AlgebraicShapeMetric) of an element
has been proposed as a measure of the element shape [9, 10]. The procedure to
calculate the AlgebraicShapeMetric of a hexahedral element of the first order shown
in Fig. 9.2 is given here. Assume that each node of a hexahedral element is numbered
as shown in the figure (the element node number). The coordinates of each node are
given in Table 9.2. Then, for the l-th node of an element, the matrix Al is defined as
$$A_l = \begin{bmatrix} x_i - x_l & x_j - x_l & x_k - x_l \\ y_i - y_l & y_j - y_l & y_k - y_l \\ z_i - z_l & z_j - z_l & z_k - z_l \end{bmatrix} \qquad (9.1.55)$$
where i, j, and k are given for the l-th node according to Table 9.3, and $(x_i, y_i, z_i)$ are the coordinate values of the i-th node.
Each column of $A_l$ above is a vector pointing from the l-th node to an adjacent node. Similarly, for the element of the cube shape (see Table 9.4 for the nodal coordinates), the matrix $W_l$ is defined as
$$W_l = \begin{bmatrix} x_i^c - x_l^c & x_j^c - x_l^c & x_k^c - x_l^c \\ y_i^c - y_l^c & y_j^c - y_l^c & y_k^c - y_l^c \\ z_i^c - z_l^c & z_j^c - z_l^c & z_k^c - z_l^c \end{bmatrix} \qquad (9.1.56)$$
where i, j, and k are also given by Table 9.3, and $(x_i^c, y_i^c, z_i^c)$ are the coordinate values of the i-th node of the cubic element.
The matrix Tl is defined as the product of the matrix Al and the inverse of the
matrix Wl .
$$T_l = A_l W_l^{-1} \qquad (9.1.57)$$
Using the matrix $T_l$ and its inverse, $\kappa_l$, called the condition number, is defined as
$$\kappa_l = \|T_l\|\,\|T_l^{-1}\| \qquad (9.1.58)$$
where $\|A\|$ is the Frobenius norm of the matrix A, defined to be the square root of the sum of squares of all components of the matrix A as
$$\|A\| = \sqrt{\sum_{i,j} a_{ij}^2} \qquad (9.1.59)$$
Using the condition numbers of all eight nodes, the AlgebraicShapeMetric f of the element is defined as
$$f = \frac{8}{\displaystyle\sum_{l=1}^{8}\left(\frac{\kappa_l}{3}\right)^2} \qquad (9.1.60)$$
Note that the AlgebraicShapeMetric f of an element takes the value in the range
0.0 < f ≤ 1.0, and f = 1.0 for the case of cubic shape.
Let’s take a look at ElementShapeMetric . c, a program code for calculating
the AlgebraicShapeMetric of an element. Main variables and arrays used in the code
are listed in Table 9.5.
/* ElementShapeMetric.c */
/*---------------------------------------------------*/
void inv_mat3(
double invA[][3],
double A[][3])
{
double det;
det = A[0][0]*A[1][1]*A[2][2] + A[0][1]*A[1][2]*A[2][0]
+ A[0][2]*A[1][0]*A[2][1] - A[0][0]*A[1][2]*A[2][1]
- A[0][1]*A[1][0]*A[2][2] - A[0][2]*A[1][1]*A[2][0] ;
invA[0][0] = (A[1][1]*A[2][2] - A[1][2]*A[2][1])/det;
invA[0][1] = (A[0][2]*A[2][1] - A[0][1]*A[2][2])/det;
invA[0][2] = (A[0][1]*A[1][2] - A[1][1]*A[0][2])/det;
invA[1][0] = (A[1][2]*A[2][0] - A[1][0]*A[2][2])/det;
invA[1][1] = (A[0][0]*A[2][2] - A[0][2]*A[2][0])/det;
invA[1][2] = (A[1][0]*A[0][2] - A[0][0]*A[1][2])/det;
invA[2][0] = (A[1][0]*A[2][1] - A[1][1]*A[2][0])/det;
invA[2][1] = (A[0][1]*A[2][0] - A[0][0]*A[2][1])/det;
invA[2][2] = (A[0][0]*A[1][1] - A[0][1]*A[1][0])/det;
}
/*---------------------------------------------------*/
void matMulti3(
double C[][3],
double A[][3],
double B[][3])
{
int i,j,k;
for(i=0;i<3;i++){
for(j=0;j<3;j++){
C[i][j] = 0.0 ;
for(k=0;k<3;k++) C[i][j] += A[i][k]*B[k][j] ;
}
}
}
/*---------------------------------------------------*/
double f_norm3(
double A[][3])
{
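  /* Note: this returns the *squared* Frobenius norm, i.e., Eq. (9.1.59)
     without the square root; shape_metric() below consistently works
     with squared norms. */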
int i,j;
double dsum;
dsum = 0.0 ;
for(i=0;i<3;i++)
for(j=0;j<3;j++)
dsum += A[i][j]*A[i][j] ;
return dsum ;
}
/*---------------------------------------------------*/
double shape_metric(
int *elem,
double **node,
int nnpe,
int nfpn)
{
int i,j,k,ia,ib,ic,id,inode;
int idx[8][3]
= {{1,3,4},{2,0,5},{3,1,6},{0,2,7},
{7,5,0},{4,6,1},{5,7,2},{6,4,3}};
double kp[8],W[3][3],A[3][3],T[3][3],invW[3][3],
invT[3][3],cond;
double cube[8][3]
= {{0.0, 0.0, 0.0},{1.0, 0.0, 0.0},
{1.0, 1.0, 0.0},{0.0, 1.0, 0.0},
{0.0, 0.0, 1.0},{1.0, 0.0, 1.0},
{1.0, 1.0, 1.0},{0.0, 1.0, 1.0}} ;
for(inode=0;inode<nnpe;inode++){
ia = elem[inode] ;
ic = inode ;
for(j=0;j<nfpn;j++){
ib = elem[idx[inode][j]] ;
id = idx[inode][j] ;
for(i=0;i<nfpn;i++)
A[i][j] = node[ib][i] - node[ia][i] ;
for(i=0;i<nfpn;i++)
W[i][j] = cube[id][i] - cube[ic][i] ;
}
inv_mat3(invW,W) ;
matMulti3(T,A,invW) ;
inv_mat3(invT,T) ;
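    /* kp = ||T||^2 * ||T^-1||^2, the square of the condition number
       kappa_l in Eq. (9.1.58); cf. Eq. (9.1.60) */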
kp[inode] = f_norm3(T)*f_norm3(invT) ;
}
for(inode=0,cond=0.0;inode<nnpe;inode++)
cond += kp[inode]/9 ;
return 8.0/cond ;
}
/*---------------------------------------------------*/
In the code above, inv_mat3() is a function that calculates the inverse matrix
invA of a 3-by-3 matrix A given. See Sect. 9.1.1 for details.
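A typical call is sketched below (assuming elem and node have been set up as in the finite element programs earlier in this section; iel is a hypothetical element index):

/* AlgebraicShapeMetric of the iel-th element: 8 nodes per element,
   3 coordinates per node; f = 1.0 for a perfect cube and decreases
   toward 0 as the element degenerates */
double f = shape_metric(elem[iel], node, 8, 3);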
The function matMulti3 calculates the product of two 3-by-3 matrices, each component of which is given by
$$c_{ij} = \sum_{k=0}^{2} a_{ik}\cdot b_{kj} \qquad (9.1.61)$$
Similarly, according to Eq. (9.1.56), the matrices Wl of all the nodes are defined
as follows:
$$W_0 = \begin{bmatrix} x_1^c-x_0^c & x_3^c-x_0^c & x_4^c-x_0^c \\ y_1^c-y_0^c & y_3^c-y_0^c & y_4^c-y_0^c \\ z_1^c-z_0^c & z_3^c-z_0^c & z_4^c-z_0^c \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \qquad (9.1.70)$$
and, in the same way,
$$W_1 = \begin{bmatrix} 0&-1&0 \\ 1&0&0 \\ 0&0&1 \end{bmatrix} \ (9.1.71),\quad W_2 = \begin{bmatrix} -1&0&0 \\ 0&-1&0 \\ 0&0&1 \end{bmatrix} \ (9.1.72),\quad W_3 = \begin{bmatrix} 0&1&0 \\ -1&0&0 \\ 0&0&1 \end{bmatrix} \ (9.1.73),$$
$$W_4 = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&-1 \end{bmatrix} \ (9.1.74),\quad W_5 = \begin{bmatrix} -1&0&0 \\ 0&1&0 \\ 0&0&-1 \end{bmatrix} \ (9.1.75),\quad W_6 = \begin{bmatrix} 0&-1&0 \\ -1&0&0 \\ 0&0&-1 \end{bmatrix} \ (9.1.76),\quad W_7 = \begin{bmatrix} 1&0&0 \\ 0&-1&0 \\ 0&0&-1 \end{bmatrix} \ (9.1.77)$$
$$T_3 = A_3 W_3^{-1} = \begin{bmatrix} 0&1&0 \\ -1&0&0 \\ 0&0&1 \end{bmatrix}\begin{bmatrix} 0&-1&0 \\ 1&0&0 \\ 0&0&1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \qquad (9.1.89)$$
$$T_4 = A_4 W_4^{-1} = \begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 1-z&1-z&-z \end{bmatrix}\begin{bmatrix} 0&1&0 \\ 1&0&0 \\ 0&0&-1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 1-z&1-z&z \end{bmatrix} \qquad (9.1.90)$$
$$T_5 = A_5 W_5^{-1} = \begin{bmatrix} -1&0&0 \\ 0&1&0 \\ z-1&0&-1 \end{bmatrix}\begin{bmatrix} -1&0&0 \\ 0&1&0 \\ 0&0&-1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 1-z&0&1 \end{bmatrix} \qquad (9.1.91)$$
$$T_6 = A_6 W_6^{-1} = \begin{bmatrix} 0&-1&0 \\ -1&0&0 \\ 0&0&-1 \end{bmatrix}\begin{bmatrix} 0&-1&0 \\ -1&0&0 \\ 0&0&-1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{bmatrix} \qquad (9.1.92)$$
$$T_7 = A_7 W_7^{-1} = \begin{bmatrix} 1&0&0 \\ 0&-1&0 \\ 0&z-1&-1 \end{bmatrix}\begin{bmatrix} 1&0&0 \\ 0&-1&0 \\ 0&0&-1 \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&1-z&1 \end{bmatrix} \qquad (9.1.93)$$
According to Eq. (9.1.58), the condition numbers $\kappa_l$ for all the nodes are calculated, respectively, as follows:
$$\kappa_0 = \|T_0\|\cdot\|T_0^{-1}\| = \sqrt{2+z^2}\cdot\sqrt{2+\frac{1}{z^2}} \qquad (9.1.102)$$
$$\kappa_1 = \|T_1\|\cdot\|T_1^{-1}\| = \sqrt{3}\cdot\sqrt{3} \qquad (9.1.103)$$
$$\kappa_2 = \|T_2\|\cdot\|T_2^{-1}\| = \sqrt{3}\cdot\sqrt{3} \qquad (9.1.104)$$
$$\kappa_3 = \|T_3\|\cdot\|T_3^{-1}\| = \sqrt{3}\cdot\sqrt{3} \qquad (9.1.105)$$
$$\kappa_4 = \|T_4\|\cdot\|T_4^{-1}\| = \sqrt{2+2(1-z)^2+z^2}\cdot\sqrt{2+2\left(\frac{z-1}{z}\right)^2+\frac{1}{z^2}} \qquad (9.1.106)$$
$$\kappa_5 = \|T_5\|\cdot\|T_5^{-1}\| = \sqrt{3+(1-z)^2}\cdot\sqrt{3+(z-1)^2} \qquad (9.1.107)$$
$$\kappa_6 = \|T_6\|\cdot\|T_6^{-1}\| = \sqrt{3}\cdot\sqrt{3} \qquad (9.1.108)$$
$$\kappa_7 = \|T_7\|\cdot\|T_7^{-1}\| = \sqrt{3+(1-z)^2}\cdot\sqrt{3+(z-1)^2} \qquad (9.1.109)$$
/* ElementShape.c */
/*---------------------------------------------------*/
void check_shape(
double *s_data,
int *elem,
double **node,
int nfpn)
{
int i,j,k,i0,i1,i2,n,ne,nnpe=8;
double dl[12],da[48],de[24],e[12][3],d1,d2,
dx0[3],dx1[3],dx2[3],
nv[8][3][3],v1[3],v2[3],v3[3],v4[3],ndata[8][3],
dl_min,dl_max,da_min,da_max,de_max,de_min;
int idx[8][3] =
{{0,3,8},{1,0,9},{2,1,10},{3,2,11},
{7,4,8},{4,5,9},{5,6,10},{6,7,11}};
int sg[8][3] = {{ 1,-1, 1},{ 1,-1, 1},{ 1,-1, 1},{ 1,-1, 1},
{-1, 1,-1},{-1, 1,-1},{-1, 1,-1},{-1, 1,-1} };
int id[12][6]
= { {0,2,1, 1,0,1}, {1,2,1, 2,0,1},
}
for(i=0;i<=3;i++){
for(j=0;j<nfpn;j++)
e[i+8][j] = ndata[i+4][j] - ndata[i][j] ;
for(j=0,d1=0.0;j<nfpn;j++) d1 += e[i+8][j]*e[i+8][j] ;
dl[i+8] = sqrt(d1) ;
for(j=0;j<nfpn;j++) e[i+8][j] /= dl[i+8] ;
}
/*------------ angle1 ----------------*/
for(n=0;n<nnpe;n++){
for(i=0;i<nfpn;i++) dx0[i] = sg[n][0]*e[idx[n][0]][i] ;
for(i=0;i<nfpn;i++) dx1[i] = sg[n][1]*e[idx[n][1]][i] ;
for(i=0;i<nfpn;i++) dx2[i] = sg[n][2]*e[idx[n][2]][i] ;
de[n*3+0] = dx0[0]*dx1[0]+dx0[1]*dx1[1]+dx0[2]*dx1[2] ;
de[n*3+1] = dx1[0]*dx2[0]+dx1[1]*dx2[1]+dx1[2]*dx2[2] ;
de[n*3+2] = dx2[0]*dx0[0]+dx2[1]*dx0[1]+dx2[2]*dx0[2] ;
nv[n][0][0] = dx2[1]*dx1[2] - dx2[2]*dx1[1] ;
nv[n][0][1] = dx2[2]*dx1[0] - dx2[0]*dx1[2] ;
nv[n][0][2] = dx2[0]*dx1[1] - dx2[1]*dx1[0] ;
nv[n][1][0] = dx1[1]*dx0[2] - dx1[2]*dx0[1] ;
nv[n][1][1] = dx1[2]*dx0[0] - dx1[0]*dx0[2] ;
nv[n][1][2] = dx1[0]*dx0[1] - dx1[1]*dx0[0] ;
nv[n][2][0] = dx0[1]*dx2[2] - dx0[2]*dx2[1] ;
nv[n][2][1] = dx0[2]*dx2[0] - dx0[0]*dx2[2] ;
nv[n][2][2] = dx0[0]*dx2[1] - dx0[1]*dx2[0] ;
for(j=0;j<3;j++){
for(i=0,d1=0.0;i<nfpn;i++) d1 += nv[n][j][i]*nv[n][j][i] ;
d2 = 1.0/sqrt(d1) ;
for(i=0;i<nfpn;i++) nv[n][j][i] *= d2 ;
}
}
/*------------ angle2 ----------------*/
for(ne=0;ne<12;ne++){
for(i=0;i<nfpn;i++) v1[i] = nv[id[ne][0]][id[ne][1]][i] ;
for(i=0;i<nfpn;i++) v2[i] = nv[id[ne][0]][id[ne][2]][i] ;
for(i=0;i<nfpn;i++) v3[i] = nv[id[ne][3]][id[ne][4]][i] ;
for(i=0;i<nfpn;i++) v4[i] = nv[id[ne][3]][id[ne][5]][i] ;
da[ne*4+0] = v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2] ;
da[ne*4+1] = v3[0]*v4[0]+v3[1]*v4[1]+v3[2]*v4[2] ;
da[ne*4+2] = v1[0]*v4[0]+v1[1]*v4[1]+v1[2]*v4[2] ;
da[ne*4+3] = v2[0]*v3[0]+v2[1]*v3[1]+v2[2]*v3[2] ;
}
/*------------minmax-----------------------*/
dl_max = -1.0 ;
dl_min = 1.0e30 ;
for(i=0;i<12;i++){
if(dl[i] > dl_max) dl_max = dl[i] ;
if(dl[i] < dl_min) dl_min = dl[i] ;
}
de_max = -1.0e30 ;
de_min = 1.0e30 ;
for(i=0;i<24;i++){
if(de[i] > de_max) de_max = de[i] ;
if(de[i] < de_min) de_min = de[i] ;
}
da_max = -1.0e30 ;
da_min = 1.0e30 ;
for(i=0;i<48;i++){
if(da[i] > da_max) da_max = da[i] ;
if(da[i] < da_min) da_min = da[i] ;
}
s_data[0] = dl_min ;
s_data[1] = dl_max ;
s_data[2] = acos(da_min)*180/3.1415926 ;
s_data[3] = acos(da_max)*180/3.1415926 ;
s_data[4] = acos(de_min)*180/3.1415926 ;
s_data[5] = acos(de_max)*180/3.1415926 ;
}
/*---------------------------------------------------*/
Fig. 9.6 Outward normal vectors of three faces that share a node
The array idx[i][j] contains the correspondence between the unit vectors at each node and those along each edge, and the array sg[i][j] the orientation of each unit vector.
For example, at node 4, idx[4]={7,4,8} and sg[4]={-1,1,-1} define the unit vectors at the node as dx0 = -e[7], dx1 = e[4], and dx2 = -e[8]. Once the unit vectors at each node are defined, the angles between the three unit vectors at the node are calculated based on the cosine of the angle between two vectors. Thus, a total of 24 angles between edges are calculated.
In addition, using the three unit vectors at each node, the outward unit normal vectors nv[i][j] of the three faces that share the node (Fig. 9.6) are obtained from the outer product of the unit vectors in the face.
Next, in the angle2 part, the angle between the faces of the hexahedron is calcu-
lated. Each face of a hexahedral element is not necessarily flat, i.e., it is not guaranteed
that the four nodes of a face lie on the same plane. For this reason, it is difficult to
show the angle between the faces with a single value; therefore, four angle values
are calculated between two faces of an element instead.
Figure 9.7 shows how to express the angle between two faces that share an edge
(red) in terms of the four angles formed by triangles composed of nodes on each face
(it is obvious that the three points of a triangle are on the same plane). Since each
triangle has two sides as element edges, its normal vector is one of the nv[i][j] calculated in the angle1 part. The array id[i][j] contains for each edge the corresponding normal vectors of the triangles shown in Fig. 9.7.
For example, for the fourth edge, id[4]={4,1,0, 5,1,2} indicates that
the normal vectors nv[4][1], nv[4][0] at the fourth node and nv[5][1],
nv[5][2] at the fifth node are those used to calculate the angle between the two
faces that share the fourth edge, as shown in Fig. 9.6. Thus, a total of 48 angles are
calculated for all the 12 edges.
Finally, in the minmax part, the maximum and minimum values are calculated for
each parameter. Then, each value related to some angle is converted from the cosine
value to an angle value to make it easier to understand intuitively.
A method for applying deep learning to the contact search between surfaces defined
by NURBS is discussed in Chap. 6. NURBS, which has been used for defining shapes
in CAD, is also used as the basis function in the isogeometric analysis [1, 6]. In this
subsection, a program code for computing NURBS basis functions is given.
NURBS basis functions are created from B-spline basis functions. Let’s take a
look at bspline.c, a program that calculates the B-spline basis functions. The
main variables and arrays used are listed in Table 9.8.
/* bspline.c */
#define NMAX_KV 100
#define KV_EPS 1.0e-5
#define NURBS_EPS 1.0e-20
/*---------------------------------------------------*/
void Bspline00(
double *N,
double *dN,
double xi,
int ni,
int p,
int nkv,
double *KV)
{
int i,j,k;
double d,e,d_d,d_e,temp[NMAX_KV],dtemp[NMAX_KV],dw;
for(i=ni-p;i<=ni+p;i++){
temp[i] = 0.0 ;
dtemp[i] = 0.0 ;
}
for(i=ni-p;i<=ni;i++){
if((xi>=KV[i]) && (xi<KV[i+1])){
temp[i] = 1.0;
}else{
temp[i] = 0.0;
}
dtemp[i] = 0.0 ;
}
for(k=1;k<=p;k++){
for(i=ni-k;i<=ni;i++){
if(fabs(temp[i]) > NURBS_EPS){
dw = KV[i+k] - KV[i] ;
if(fabs(dw) > KV_EPS){
d = ((xi - KV[i])*temp[i])/dw ;
d_d = k*temp[i]/dw ;
}else{
d = 0.0 ;
d_d = 0.0 ;
}
}else{
d = 0.0;
d_d = 0.0 ;
}
if(fabs(temp[i+1]) > NURBS_EPS){
dw = KV[i+k+1] - KV[i+1] ;
if(fabs(dw) > KV_EPS){
e = ((KV[i+k+1] - xi)*temp[i+1])/dw ;
d_e = k*temp[i+1]/dw ;
}else{
e = 0.0 ;
d_e = 0.0 ;
}
}else{
e = 0.0;
d_e = 0.0 ;
}
temp[i] = d + e;
dtemp[i] = d_d - d_e;
}
}
for(i=0;i<=p;i++){
N[i] = temp[ni-p+i] ;
dN[i] = dtemp[ni-p+i] ;
}
}
/*---------------------------------------------------*/
void Bspline1D(
double *N,
double *dN,
int n,
double xi,
int p,
int nkv,
double *KV)
{
int i,j,k,nb;
double d1,d2,d3;
for(i=0;i<n;i++){
N[i] = 0.0 ;
dN[i] = 0.0 ;
}
for(i=0;i<nkv;i++)
if((xi >= KV[i]) && (xi < KV[i+1])) break ;
Bspline00(N+(i-p),dN+(i-p),xi,i,p,nkv,KV) ;
}
/*---------------------------------------------------*/
void Bspline3D(
double ***B,
double ***dB_xi,
double ***dB_et,
double ***dB_ze,
double *N,
double *dN,
int n,
double *M,
double *dM,
int m,
double *L,
double *dL,
int l)
{
int i,j,k;
double d1,d2,d3,dd1,dd2,dd3;
for(i=0;i<n;i++){
d1 = N[i] ;
dd1 = dN[i] ;
for(j=0;j<m;j++){
d2 = M[j] ;
dd2 = dM[j] ;
for(k=0;k<l;k++){
d3 = L[k] ;
dd3 = dL[k] ;
B[i][j][k] = d1*d2*d3;
dB_xi[i][j][k] = dd1*d2*d3;
dB_et[i][j][k] = d1*dd2*d3;
dB_ze[i][j][k] = d1*d2*dd3;
}
}
}
}
/*---------------------------------------------------*/
For a knot vector $\Xi = \left\{\xi_1, \xi_2, \xi_3, \cdots, \xi_{n+p}, \xi_{n+p+1}\right\}$, a monotonically non-decreasing sequence of real numbers, the n one-dimensional p-th order B-spline basis functions are defined by the recursion
$$N_{i,0}(\xi) = \begin{cases} 1 & \left(\xi_i \le \xi < \xi_{i+1}\right) \\ 0 & \left(\text{otherwise}\right) \end{cases} \qquad (9.1.110)$$
$$N_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i}\, N_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}}\, N_{i+1,p-1}(\xi) \qquad (9.1.111)$$
Here, the rule 0/0 = 0 is applied to the fractional parts of Eq. (9.1.111). (See Chap. 6 for an example of calculation using Eqs. (9.1.110) and (9.1.111).) Note that an open knot vector with $\xi_1 = \xi_2 = \cdots = \xi_p = \xi_{p+1}$ and $\xi_{n+1} = \xi_{n+2} = \cdots = \xi_{n+p} = \xi_{n+p+1}$ is usually employed for CAD and the isogeometric analysis. Differentiating Eq. (9.1.111), we have Eq. (9.1.112), showing that the derivative of the B-spline basis function of the p-th order can be obtained from the basis functions of the (p−1)-th order and their derivatives. In other words, as well as the B-spline basis function of the p-th order, its derivative can be calculated recursively by Eq. (9.1.112).
For the first-order and the higher-order derivatives, we have, respectively, the
following formulas [12].
$$\frac{dN_{i,p}(\xi)}{d\xi} = \frac{p}{\xi_{i+p}-\xi_i}\, N_{i,p-1}(\xi) - \frac{p}{\xi_{i+p+1}-\xi_{i+1}}\, N_{i+1,p-1}(\xi) \qquad (9.1.113)$$
$$\frac{d^k N_{i,p}(\xi)}{d\xi^k} = \frac{p!}{(p-k)!}\sum_{j=0}^{k} a_{k,j}\, N_{i+j,p-k}(\xi) \qquad (9.1.115)$$
where
$$a_{0,0} = 1,\qquad a_{k,0} = \frac{a_{k-1,0}}{\xi_{i+p-k+1}-\xi_i},\qquad a_{k,j} = \frac{a_{k-1,j}-a_{k-1,j-1}}{\xi_{i+p+j-k+1}-\xi_{i+j}} \ (0<j<k),\qquad a_{k,k} = \frac{-a_{k-1,k-1}}{\xi_{i+p+1}-\xi_{i+k}} \qquad (9.1.116)$$
The three-dimensional B-spline basis functions and their partial derivatives are
computed by calling Bspline3D after computing the three one-dimensional B-
spline basis functions Ni, p (ξ ), M j,q (η), and L k,r (ζ ) by calling Bspline1D three
times.
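As a usage sketch (the knot vector and sizes are illustrative, and nkv is assumed here to be the number of knot spans, i.e., one less than the number of knots), the following evaluates quadratic B-spline basis functions and their first-order derivatives at a point ξ:

/* n = 4 quadratic (p = 2) B-spline basis functions on an open knot
   vector with 7 knots (6 knot spans); N[] and dN[] receive the values
   and first-order derivatives of all n basis functions at xi */
double KV[7] = {0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0};
double N[4], dN[4];
double xi = 0.3;
Bspline1D(N, dN, 4, xi, 2, 6, KV);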
Next, let’s take a look at nurbs.c below, which calculates the NURBS basis
functions. Main variables and arrays are summarized in Table 9.9. Many of these are
the same as in bspline.c (see Table 9.8).
/* nurbs.c */
/*---------------------------------------------------*/
void Nurbs1D(
double *R,
double *dR,
int n,
double *N,
double *dN,
double xi,
int p,
int nkv,
double *KV,
double *w)
{
int i;
double dsum,dsum2,ddsum;
Bspline1D( N, dN, n, xi, p, nkv, KV) ;
for(i=0,dsum=0.0;i<n;i++) dsum += N[i]*w[i] ;
dsum2 = 1.0/dsum/dsum ;
for(i=0;i<n;i++) R[i] = N[i]*w[i]/dsum ;
for(i=0,ddsum=0.0;i<n;i++) ddsum += dN[i]*w[i] ;
for(i=0;i<n;i++)
dR[i] = (dN[i]*w[i]*dsum - N[i]*w[i]*ddsum)*dsum2 ;
}
/*---------------------------------------------------*/
void Nurbs3D(
double ***R3,
double ***dR3_xi,
double ***dR3_et,
double ***dR3_ze,
double *N,
double *dN,
int n,
double *M,
double *dM,
int m,
double *L,
double *dL,
int l,
double *w1,
double *w2,
double *w3)
{
int i,j,k;
double d1,d2,d3,dd1,dd2,dd3,dsum,dsum2,ddsum1,ddsum2,
ddsum3;
for(i=0,dsum=0.0,ddsum1=0.0,ddsum2=0.0,ddsum3=0.0;
i<n;i++){
d1 = N[i]*w1[i] ;
dd1 = dN[i]*w1[i] ;
for(j=0;j<m;j++){
d2 = M[j]*w2[j] ;
dd2 = dM[j]*w2[j] ;
for(k=0;k<l;k++){
d3 = L[k]*w3[k] ;
dd3 = dL[k]*w3[k] ;
dsum += d1*d2*d3 ;
ddsum1 += dd1*d2*d3;
ddsum2 += d1*dd2*d3;
ddsum3 += d1*d2*dd3;
}
}
}
dsum2 = 1.0/dsum/dsum ;
for(i=0;i<n;i++){
d1 = N[i]*w1[i] ;
dd1 = dN[i]*w1[i] ;
for(j=0;j<m;j++){
d2 = M[j]*w2[j] ;
dd2 = dM[j]*w2[j] ;
for(k=0;k<l;k++){
d3 = L[k]*w3[k] ;
dd3 = dL[k]*w3[k] ;
R3[i][j][k] = d1*d2*d3/dsum ;
dR3_xi[i][j][k] = (dd1*dsum - d1*ddsum1)*dsum2;
dR3_et[i][j][k] = (dd2*dsum - d2*ddsum2)*dsum2;
dR3_ze[i][j][k] = (dd3*dsum - d3*ddsum3)*dsum2;
}
}
}
}
/*---------------------------------------------------*/
Using the one-dimensional B-spline basis functions and the weight vector, the
one-dimensional NURBS basis function is defined by (see also Sect. 6.2)
$$R_{i,p}(\xi) = \frac{N_{i,p}(\xi)\cdot w_i}{\displaystyle\sum_{\hat{\imath}=1}^{n} N_{\hat{\imath},p}(\xi)\cdot w_{\hat{\imath}}} \qquad (9.1.121)$$
In the code nurbs.c above, the function Nurbs1D takes as input a knot vector
and a weight vector. First, the B-spline basis functions and their first-order derivatives
are computed from the knot vector by calling the Bspline1D function, and then
the NURBS basis functions and their first-order derivatives are computed using Eqs.
(9.1.121) and (9.1.122).
The two- and three-dimensional NURBS basis functions are obtained as simple extensions of Eq. (9.1.121). The latter is given by
$$R_{i,j,k}^{p,q,r}(\xi,\eta,\zeta) = \frac{N_{i,p}(\xi)\cdot M_{j,q}(\eta)\cdot L_{k,r}(\zeta)\cdot w_{i,j,k}}{\displaystyle\sum_{\hat{\imath}=1}^{n}\sum_{\hat{\jmath}=1}^{m}\sum_{\hat{k}=1}^{l} N_{\hat{\imath},p}(\xi)\cdot M_{\hat{\jmath},q}(\eta)\cdot L_{\hat{k},r}(\zeta)\cdot w_{\hat{\imath},\hat{\jmath},\hat{k}}}$$
where $w_{i,j,k} = w_i\cdot w_j\cdot w_k$, the product of the weights in each axis direction, and the first-order derivatives (partial derivatives) of the three-dimensional NURBS basis function with respect to ξ, η, and ζ are, respectively, written as follows:
$$\frac{\partial R_{i,j,k}^{p,q,r}(\xi,\eta,\zeta)}{\partial\xi} = \frac{1}{\left(\displaystyle\sum_{\hat{\imath}=1}^{n}\sum_{\hat{\jmath}=1}^{m}\sum_{\hat{k}=1}^{l} N_{\hat{\imath},p}(\xi)\, M_{\hat{\jmath},q}(\eta)\, L_{\hat{k},r}(\zeta)\, w_{\hat{\imath},\hat{\jmath},\hat{k}}\right)^2}\times\left\{\frac{\partial N_{i,p}(\xi)}{\partial\xi}\, M_{j,q}(\eta)\, L_{k,r}(\zeta)\, w_{i,j,k}\left(\sum_{\hat{\imath}=1}^{n}\sum_{\hat{\jmath}=1}^{m}\sum_{\hat{k}=1}^{l} N_{\hat{\imath},p}(\xi)\, M_{\hat{\jmath},q}(\eta)\, L_{\hat{k},r}(\zeta)\, w_{\hat{\imath},\hat{\jmath},\hat{k}}\right) - N_{i,p}(\xi)\, M_{j,q}(\eta)\, L_{k,r}(\zeta)\, w_{i,j,k}\left(\sum_{\hat{\imath}=1}^{n}\sum_{\hat{\jmath}=1}^{m}\sum_{\hat{k}=1}^{l} \frac{\partial N_{\hat{\imath},p}(\xi)}{\partial\xi}\, M_{\hat{\jmath},q}(\eta)\, L_{\hat{k},r}(\zeta)\, w_{\hat{\imath},\hat{\jmath},\hat{k}}\right)\right\}$$
and the expressions for the derivatives with respect to η and ζ are analogous.
In Chap. 2, the behavior of neural networks has been explained in detail by using
mathematical formulas, which will be helpful for the readers to understand the
program given here.
Now, DLneuro.c, a simple program for a fully connected feedforward neural
network in C language, is studied. The program employs the SGD (Stochastic
Gradient Descent), which performs the error back propagation for each training
pattern, and the momentum method (see Sect. 2.3.1) to accelerate the training. It also
performs data augmentation by adding noise to the input data during training. Main
variables and arrays used in DLneuro.c are listed in Table 9.10. Note that, for the
sake of brevity, various additional processing such as error handling routines for
wrong usage have been omitted from DLneuro.c.
DLneuro.c assumes an input data file (text file) of the following form:
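For instance, a hypothetical file for the setting described below (the numbers are purely illustrative) might begin:

0 2.134e-01 5.513e-01 9.806e-01 3.342e-01 7.109e-01 2.511e-01 5.027e-01 7.483e-01
1 8.115e-01 7.320e-02 4.246e-01 6.671e-01 1.958e-01 1.042e-01 9.013e-01 4.095e-01
2 4.437e-01 2.816e-02 6.352e-01 5.081e-02 8.764e-01 3.509e-01 6.023e-01 1.547e-01
...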
The input data above are for the case of 5 input data (parameters) and 3 output
(teacher) data. The total number of patterns, including training patterns and veri-
fication patterns, is 1000. Each row corresponds to a pattern: the first column is a
sequential number, columns 2–6 the input data, and columns 7–9 the teacher data.
The input and the teacher data are both assumed to be single-precision real values.
First, let’s take a look at DLcommon.c below, which contains commonly used
functions such as those for file input and activation functions.
/* DLcommon.c */
/*---------------------------------------------------*/
void s_shuffle(
int *a,
int n)
{
int i,ic,tsize,itemp;
for(i=0,tsize=n;i<n-1;i++,tsize--){
ic = floor(drand48()*(tsize-1) + 0.1) ;
itemp = a[tsize-1] ;
a[tsize-1] = a[ic] ;
a[ic] = itemp ;
}
}
/*---------------------------------------------------*/
void a0f(
float *fv,
float *fvd,
float x)
{
float dd;
dd = (1.0f+(float)tanh(x/2.0f))/2.0f;
*fv = dd;
*fvd = dd*(1.0 - dd) ;
}
/*---------------------------------------------------*/
void a1f(
float *fv,
float *fvd,
float x)
{
float dd;
dd = (1.0f+(float)tanh(x/2.0f))/2.0f;
*fv = dd;
*fvd = dd*(1.0 - dd) ;
}
/*---------------------------------------------------*/
void read_file(
char *name,
float **o,
float **t,
int nIU,
int nOU,
int npattern)
{
int i,j,k;
FILE *fp;
fp = fopen( name, "r" ) ;
for(i=0;i<npattern;i++){
fscanf(fp,"%d",&k);
for(j=0;j<nIU;j++) fscanf(fp,"%e",o[i]+j);
for(j=0;j<nOU;j++) fscanf(fp,"%e",t[i]+j);
}
fclose( fp );
}
/*---------------------------------------------------*/
void initialize(
float ***w,
float **bias,
int nIU,
int *nHU,
int nOU,
int nHL)
{
int i,j,k;
for(i=0;i<=nHL;i++)
for(j=0;j<nHU[i+1];j++)
for(k=0;k<nHU[i];k++) w[i][j][k] = rnd() ;
for(j=1;j<=nHL+1;j++)
for(i=0;i<nHU[j];i++) bias[j][i] = rnd();
}
/*---------------------------------------------------*/
void store_weight(
float ***w,
float **bias,
float ***w_min,
float **bias_min,
int nIU,
int *nHU,
int nOU,
int nHL)
{
int i,j,k;
for(i=0;i<=nHL;i++)
for(j=0;j<nHU[i+1];j++)
for(k=0;k<nHU[i];k++) w_min[i][j][k] = w[i][j][k] ;
for(j=1;j<=nHL+1;j++)
for(i=0;i<nHU[j];i++) bias_min[j][i] = bias[j][i] ;
}
/*---------------------------------------------------*/
void show_results(
float ***w,
float **bias,
float ***w_m,
float **bias_m,
int nIU,
int *nHU,
int nOU,
int nHL)
{
int i,j,k,iL;
for(iL=0;iL<=nHL;iL++){
for(i=0;i<nHU[iL];i++){
printf("%5d",i);
for(j=0;j<nHU[iL+1];j++)
printf(" %e",w_m[iL][j][i]);
printf("\n");
}
}
for(iL=1;iL<=nHL+1;iL++){
for(j=0;j<nHU[iL];j++) printf("%e ",bias_m[iL][j]);
printf("\n");
}
for(iL=0;iL<=nHL;iL++){
for(i=0;i<nHU[iL];i++){
printf("%5d",i);
for(j=0;j<nHU[iL+1];j++)
printf(" %e",w[iL][j][i]);
printf("\n");
}
}
for(iL=1;iL<=nHL+1;iL++){
for(j=0;j<nHU[iL];j++) printf("%e ",bias[iL][j]);
printf("\n");
}
}
/*---------------------------------------------------*/
void clear_dweight(
float ***dw,
float **dbias,
int nIU,
int *nHU,
int nOU,
int nHL)
{
int i,j,k;
for(i=0;i<=nHL;i++)
for(j=0;j<nHU[i+1];j++)
for(k=0;k<nHU[i];k++) dw[i][j][k] = 0.0 ;
for(j=1;j<=nHL+1;j++)
for(i=0;i<nHU[j];i++) dbias[j][i] = 0.0 ;
}
/*---------------------------------------------------*/
In the code above, the function s_shuffle takes as input an array a[] of
integers from 0 to n−1. When this function is executed, the original array a[] is
randomly reordered. In other words, it is a function to generate permutations and is
used to shuffle the training patterns each epoch during training of a neural network.
The function a0f is the activation function used in the hidden layers, and the
function a1f is that used in the output layer. Each of them outputs both the function
value and the first-order derivative of the activation function for a given input x.
Here, both functions are assumed to be sigmoid functions.
The function read_file is used to read the input data, such as the example
shown above. Given the total number of patterns (npattern), the number of input
data (parameters) (nIU), the number of teacher data (nOU), and the input data file
name (name[]), this function reads the data from the file and stores them in arrays.
The function initialize is a function that initializes all the connection weights
and biases with random numbers. In this case, they are initialized with uniform
random numbers.
The function store_weight copies the connection weights and biases to other arrays. It is used to store the connection weights and biases that minimize the error for the verification patterns.
The function show_results outputs the connection weights and biases after
the training of the neural network has been completed.
The function clear_dweight clears the amount of updates of the connection
weights and biases to zero.
Next, let’s take a look at DLebp.c, which contains functions for the forward and
backward propagation, the core part of DLneuro.c.
/* DLebp.c */
/*---------------------------------------------------*/
void propagation(
int p,
float **zIU,
float **zHU,
float **zdHU,
float *zOU,
float *zdOU,
float ***w,
float **bias,
int nIU,
int *nHU,
int nOU,
int nHL)
{
int i,j,iL;
float net;
for(i=0;i<nHU[1];i++){
for(net=0,j=0;j<nIU;j++)
net += w[0][i][j] * zIU[p][j];
net += bias[1][i];
a0f(zHU[1]+i, zdHU[1]+i, net) ;
}
for(iL=2;iL<=nHL;iL++){
for(i=0;i<nHU[iL];i++){
for(net=0,j=0;j<nHU[iL-1];j++)
net += w[iL-1][i][j] * zHU[iL-1][j];
net += bias[iL][i];
a0f(zHU[iL]+i, zdHU[iL]+i, net) ;
}
}
for(i=0;i<nOU;i++){
for(net=0,j=0;j<nHU[nHL];j++)
net += w[nHL][i][j] * zHU[nHL][j];
net += bias[nHL+1][i];
a1f(zOU+i, zdOU+i, net) ;
}
}
/*---------------------------------------------------*/
void back_propagation(
int p,
float **t,
float **zIU,
float **zHU,
float **zdHU,
float *zOU,
float *zdOU,
float ***w,
float **bias,
float ***dw,
float **dbias,
float **d,
int nIU,
int *nHU,
int nOU,
int nHL,
float Alpha,
float Beta)
{
int i,j,iL;
float sum;
for(i=0;i<nOU;i++)
d[nHL+1][i] = (t[p][i] - zOU[i]) * zdOU[i] ;
for(i=0;i<nHU[nHL];i++){
for(sum=0.0f,j=0;j<nOU;j++)
sum += d[nHL+1][j] * w[nHL][j][i];
d[nHL][i] = zdHU[nHL][i] * sum;
}
for(iL=nHL-1;iL>=1;iL--){
for(j=0;j<nHU[iL];j++){
for(sum=0.0f,i=0;i<nHU[iL+1];i++)
sum += d[iL+1][i] * w[iL][i][j];
d[iL][j] = zdHU[iL][j] * sum;
}
}
for(iL=nHL+1;iL>=1;iL--){
for(i=0;i<nHU[iL];i++){
dbias[iL][i] = Beta*d[iL][i] + Mom1*dbias[iL][i];
bias[iL][i] += dbias[iL][i];
}
}
for(iL=nHL;iL>=1;iL--){
for(j=0;j<nHU[iL+1];j++){
for(i=0;i<nHU[iL];i++){
dw[iL][j][i] = Mom2*dw[iL][j][i];
dw[iL][j][i] += Alpha*d[iL+1][j]*zHU[iL][i] ;
w[iL][j][i] += dw[iL][j][i];
}
}
}
for(j=0;j<nHU[1];j++){
for(i=0;i<nIU;i++){
dw[0][j][i] = Alpha*d[1][j]*zIU[p][i] + Mom2*dw[0][j][i];
w[0][j][i] += dw[0][j][i];
}
}
}
/*---------------------------------------------------*/
$$U_j^l = \sum_{i=1}^{n_{l-1}} w_{ji}^{l-1} \cdot O_i^{l-1} + \theta_j^l \tag{9.2.2}$$
where $f(\cdot)$ is the activation function, $U_j^l$ the input to the j-th unit of the l-th layer, $O_j^l$ its output, $w_{ji}^{l-1}$ the connection weight between the j-th unit in the l-th layer and the i-th unit in the (l−1)-th layer, and $\theta_j^l$ the bias of the j-th unit of the l-th layer. The function propagation calculates sequentially from the input layer to the output layer by using Eqs. (9.2.1) and (9.2.2).
The function back_propagation is used for the error back propagation
learning. Here, the squared error is employed as the error function. That is, the error
to be minimized is defined to be the sum of the squares of the difference between the
output of the neural network and the teacher data as follows:
$$E = \frac{1}{2}\sum_{j=1}^{n_L}\left({}^{p}O_j^L - {}^{p}T_j\right)^2 \tag{9.2.3}$$
where ${}^{p}O_j^L$ is the output of the j-th unit of the output layer for the p-th training pattern, ${}^{p}T_j$ the corresponding teacher data, and $n_L$ the number of units in the output layer. In the error back propagation learning, the connection weights are updated based on the following equations.
$$\Delta w_{ji}^{l-1} = -\frac{\partial E(w, \theta)}{\partial w_{ji}^{l-1}} \tag{9.2.4}$$
$$w_{ji}^{l-1} = w_{ji}^{l-1} + \alpha \cdot \Delta w_{ji}^{l-1} \tag{9.2.5}$$
Except for the term for the momentum method, the code is consistent with Eq.
(9.2.6).
Next, the amount of update of the connection weight between the second and the
first hidden layers is given by
$$\alpha\,\Delta w_{ji}^2 = -\alpha\frac{\partial E}{\partial w_{ji}^2} = \alpha\sum_{k=1}^{n_4}\left(T_k - O_k\right)\frac{\partial f\left(U_k^4\right)}{\partial U_k^4}\,w_{kj}^3\,\frac{\partial f\left(U_j^3\right)}{\partial U_j^3}\,O_i^2 \tag{9.2.7}$$
Except for the term for the momentum method, the code is consistent with Eq.
(9.2.7).
Further, the amount of update of the connection weight between the first hidden
layer and the input layer is given by
$$\alpha\,\Delta w_{ji}^1 = -\alpha\frac{\partial E}{\partial w_{ji}^1} = \alpha\sum_{k=1}^{n_4}\left(T_k - O_k\right)\frac{\partial f\left(U_k^4\right)}{\partial U_k^4}\sum_{l=1}^{n_3} w_{kl}^3\,\frac{\partial f\left(U_l^3\right)}{\partial U_l^3}\,w_{lj}^2\,\frac{\partial f\left(U_j^2\right)}{\partial U_j^2}\,O_i^1 \tag{9.2.8}$$
Except for the term for the momentum method, the code is consistent with Eq.
(9.2.8).
The amounts of update of the biases are summarized as follows:
$$\beta\,\Delta\theta_i^4 = -\beta\frac{\partial E}{\partial\theta_i^4} = -\beta\left(O_i - T_i\right)\frac{\partial f\left(U_i^4\right)}{\partial U_i^4} \tag{9.2.9}$$
$$\beta\,\Delta\theta_i^3 = -\beta\frac{\partial E}{\partial\theta_i^3} = -\beta\sum_{j=1}^{n_4}\left(O_j - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\,w_{ji}^3\,\frac{\partial f\left(U_i^3\right)}{\partial U_i^3} \tag{9.2.10}$$
$$\beta\,\Delta\theta_i^2 = -\beta\frac{\partial E}{\partial\theta_i^2} = -\beta\sum_{j=1}^{n_4}\left(O_j^4 - T_j\right)\frac{\partial f\left(U_j^4\right)}{\partial U_j^4}\sum_{k=1}^{n_3} w_{jk}^3\,\frac{\partial f\left(U_k^3\right)}{\partial U_k^3}\,w_{ki}^2\,\frac{\partial f\left(U_i^2\right)}{\partial U_i^2} \tag{9.2.11}$$
/* DLneuro.c */
#include "nrutil.c"
#include <math.h>
#define rnd() (drand48() * (Wmax - Wmin) + Wmin)
#define noise() ((drand48()-0.5f)*2.0f*NoiseLevel)
#define FNAMELENGTH 100
#define NHU_V 1
#define NHU_C 0
float Mom1=0.05f;
float Mom2=0.05f;
float Wmin = -0.10f ;
float Wmax = 0.10f ;
#include "DLcommon.c"
#include "DLebp.c"
/*---------------------------------------------------*/
int main(void)
{
int i,j,k,iteration_min,i1,j1,rseed,o_freq,MaxPattern,
MaxEpochs,lp_no,tp_no,nIU,nOU,*nHU,nHL,nHU0,nhflag,
*idx1;
float *zOU,**zOU_min,**zIU,**zIUor,**zHU,***w,**bias,
***dw,**dbias,***w_min,**bias_min,**dtemp,**zdHU,*zdOU,
ef1,ef2,ef2_min=1e6,NoiseLevel, Alpha,Beta,**t ;
char fname1[FNAMELENGTH];
FILE *fp;
/*---------------------------------------------------*/
scanf("%d %d %d %d %d %d %d %d %d %s %d %e %e %e",
&MaxPattern,&lp_no,&nIU,&nHU0,&nOU,&nHL,&nhflag,
&MaxEpochs,&o_freq,fname1,&rseed,&Alpha,&Beta,
&NoiseLevel);
tp_no = MaxPattern - lp_no ;
/*---------------------------------------------------*/
nHU = ivector(0,nHL+1);
if(nhflag == NHU_V){
for(i=1;i<=nHL;i++) scanf("%d",nHU+i);
}else{
for(i=1;i<=nHL;i++) nHU[i] = nHU0 ;
}
nHU[0] = nIU ;
nHU[nHL+1] = nOU ;
/*---------------------------------------------------*/
t = matrix(0,MaxPattern-1,0,nOU-1) ;
zIU = matrix(0,MaxPattern-1,0,nIU-1) ;
zIUor = matrix(0,MaxPattern-1,0,nIU-1) ;
zHU = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<nHL+2;i++) zHU[i] = vector(0,nHU[i]-1);
zdHU = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<nHL+2;i++) zdHU[i] = vector(0,nHU[i]-1);
zOU = vector(0,nOU-1) ;
zdOU = vector(0,nOU-1) ;
zOU_min = matrix(0,MaxPattern-1,0,nOU-1) ;
w = (float ***)malloc((nHL+1)*sizeof(float **));
for(i=0;i<=nHL;i++)
w[i] = matrix(0,nHU[i+1]-1,0,nHU[i]-1) ;
w_min = (float ***)malloc((nHL+1)*sizeof(float **));
for(i=0;i<=nHL;i++)
w_min[i] = matrix(0,nHU[i+1]-1,0,nHU[i]-1) ;
dw = (float ***)malloc((nHL+1)*sizeof(float **));
for(i=0;i<=nHL;i++)
dw[i] = matrix(0,nHU[i+1]-1,0,nHU[i]-1) ;
bias = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<=nHL+1;i++) bias[i] = vector(0,nHU[i]-1) ;
bias_min = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<=nHL+1;i++) bias_min[i] = vector(0,nHU[i]-1) ;
dbias = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<=nHL+1;i++) dbias[i] = vector(0,nHU[i]-1) ;
dtemp = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<nHL+2;i++) dtemp[i] = vector(0,nHU[i]-1) ;
[−Noise Level, Noise Level]. Although drand48() is used here to generate the
random numbers in DLneuro.c, it could be replaced by a better routine.
The structure of the main function is summarized as follows:
(1) Loading of meta-parameters such as number of training patterns, number of
hidden layers, etc.
(2) Allocation of arrays based on the meta-parameters.
(3) Loading of training patterns from a file by read_file function.
(4) Initialization of the connection weights and biases by initialize function.
(5) Start of training (Training continues until MaxEpochs is met.)
(6) End of training (MaxEpochs is met.)
(7) Output of training results, e.g., all the connection weights and biases by
show_results function.
The operations per epoch in the training loop are summarized as follows:
(E1) Addition of noise to input data.
(E2) Change of the order of presentation of training patterns by the s_shuffle function.
(E3) The following operations are performed per training pattern.
(E3-1) Propagation (propagation)
(E3-2) Back propagation (back_propagation)
(E3-3) The following operations are performed every o_freq epochs.
(E3-3-1) Calculate the sum of errors for training patterns.
(E3-3-2) Calculate the sum of errors for verification patterns.
(E3-3-3) Output the average error.
(E3-3-4) If the error for verification patterns becomes minimum,
(E3-3-4-1) Copy all the connection weights and biases to separate arrays.
which is stored in the contiguous area of the memory in this order, i.e., components
of the first row come first and those of the second follow.
This storage order is called RowMajor, a standard for matrices in C. In the case
above, the component at the i-th row and the j-th column of the two-dimensional
matrix A is accessed as A[i*3 + j] of the corresponding one-dimensional array
in the RowMajor order by explicitly specifying the number of columns, 3.
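The IDX macro used later in DLneuroBLAS.c (defined there as ((ld) * (i) + (j))) encodes exactly this rule. A minimal, self-contained sketch:
#include <stdio.h>

#define IDX(i, j, ld) ((ld) * (i) + (j)) /* RowMajor: ld = number of columns */

int main(void)
{
    /* A 2-by-3 matrix stored row by row in a one-dimensional array */
    float A[6] = {11, 12, 13, 21, 22, 23};
    printf("%g\n", A[IDX(1, 2, 3)]); /* row 1, column 2 -> prints 23 */
    return 0;
}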
Note that, in the FORTRAN language, the matrix in Eq. (9.2.12) is stored in
memory as the order shown by
which is called ColumnMajor, where components of the first, the second and the
third columns are, respectively, stored in this order.
BLAS subprograms are classified into three levels as
Level 1: Subprograms (functions in C) on vectors.
the computational intensity is 0.5 because n additions are performed using 2n pieces
of data.
In the case of the scalar product of vectors as
$$\begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{9.2.14}$$
the computational intensity is about 1.0, because 2n operations (n multiplications and n−1 additions) are performed using 2n pieces of data. For the matrix–vector product, one of the operations in the level 2 BLAS, the computational intensity is about 2.0, because 2n² operations are performed using n² + n pieces of data.
For the case of matrix–matrix product, which is one of the operations in the level
3 BLAS and shown as
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \end{bmatrix} \tag{9.2.16}$$
for(i=0;i<n;i++){
y[incy*i] += a*x[incx*i] ;
}
Note that the dimension of vector x must be at least 1 + (n-1)*incx and that
of vector y at least 1 + (n-1)*incy. There is cblas_daxpy() that performs
the same operation as the above but for double-precision real numbers.
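As a concrete illustration, the following minimal, self-contained sketch (the array sizes and values are arbitrary) computes y = 2.0·x + y with unit strides:
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    /* y = 2.0 * x + y, with unit strides (incx = incy = 1) */
    cblas_saxpy(4, 2.0f, x, 1, y, 1);
    for (int i = 0; i < 4; i++) printf("%g ", y[i]); /* 12 24 36 48 */
    printf("\n");
    return 0;
}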
Next, we describe the level 2 function cblas_sgemv(). This function performs the operation of multiplying the product of an m-by-n matrix A and a
vector x by alpha, and adding it to a vector y multiplied by beta. The components
of matrix A, vectors x and y, and constants alpha and beta are single-precision
real numbers. The arguments of this function are shown as
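For reference, the declaration as commonly found in cblas.h is reproduced below (exact type qualifiers and integer types vary slightly between BLAS implementations):
void cblas_sgemv(const enum CBLAS_ORDER order,
                 const enum CBLAS_TRANSPOSE transA,
                 const int m, const int n,
                 const float alpha, const float *A, const int lda,
                 const float *x, const int incx,
                 const float beta, float *y, const int incy);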
Therefore, CblasRowMajor should be specified for the case of the row priority,
and CblasColMajor for the case of the column priority. transA is a parameter
that specifies whether or not to transpose the matrix A, which is defined in cblas.h
as follows:
cblas_sgemv(CblasRowMajor, CblasNoTrans, m, n,
alpha, A, n,
x, incx,
beta, y, incy)
for(i=0;i<m;i++){
for(j=0,d=0.0f;j<n;j++) d += A[i*n+j]*x[incx*j] ;
y[incy*i] = alpha*d + beta*y[incy*i] ;
}
Note that the dimension of vector x must be at least 1 + (n-1)*incx, and that of vector y must be at least 1 + (m-1)*incy (for the non-transposed case). There is also cblas_dgemv() that performs the same operation but for double-precision real numbers.
Next, we describe the level 3 function cblas_sgemm(), which performs the operation of multiplying by alpha the product of an m-by-k matrix A and a k-by-n matrix B (resulting in an m-by-n matrix), and adding it to an m-by-n matrix C multiplied by beta. The components of matrices A, B, and C, and the constants alpha and beta are single-precision real numbers. The arguments of this function are shown as
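The declaration as commonly found in cblas.h is reproduced below (again, exact type qualifiers vary slightly between implementations):
void cblas_sgemm(const enum CBLAS_ORDER order,
                 const enum CBLAS_TRANSPOSE transA,
                 const enum CBLAS_TRANSPOSE transB,
                 const int m, const int n, const int k,
                 const float alpha, const float *A, const int lda,
                 const float *B, const int ldb,
                 const float beta, float *C, const int ldc);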
The operation of this function in the case above can be written without using
BLAS as
for(i=0;i<m;i++){
for(j=0;j<n;j++){
for(ii=0,d=0.0f;ii<k;ii++) d += A[i*k+ii]*B[ii*n+j] ;
C[i*n+j] = alpha*d + beta*C[i*n+j] ;
}
}
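As a self-contained illustration (a minimal sketch; the matrix sizes and values are arbitrary), the following computes the product of a 2-by-3 matrix A and a 3-by-2 matrix B in the RowMajor order:
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    /* C = 1.0 * A * B + 0.0 * C,  A: 2x3, B: 3x2, C: 2x2 (RowMajor) */
    float A[6] = {1, 2, 3, 4, 5, 6};
    float B[6] = {1, 0, 0, 1, 1, 1};
    float C[4] = {0, 0, 0, 0};
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3, 1.0f, A, 3, B, 2, 0.0f, C, 2);
    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]); /* 4 5 / 10 11 */
    return 0;
}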
$${}^{p}U_j^l = \sum_{i=1}^{n_{l-1}} w_{ji}^{l-1} \cdot {}^{p}O_i^{l-1} + \theta_j^l \tag{9.2.17}$$
where ${}^{p}U_j^l$ is the input to the j-th unit in the l-th layer, and ${}^{p}O_i^{l-1}$ the output of the i-th unit in the (l−1)-th layer. Here, the left-shoulder superscript p denotes that these input and output are those for the p-th learning pattern in a mini-batch. $w_{ji}^{l-1}$ is the connection weight between the j-th unit in the l-th layer and the i-th unit in the (l−1)-th layer, and $\theta_j^l$ the bias value of the j-th unit in the l-th layer.
Using the scalar product of vectors, Eq. (9.2.17) is written for the first unit in the
l-th layer as
$${}^{p}U_1^l = \begin{pmatrix} w_{11}^{l-1} & \cdots & w_{1,n_{l-1}}^{l-1} \end{pmatrix}\begin{pmatrix} {}^{p}O_1^{l-1} \\ \vdots \\ {}^{p}O_{n_{l-1}}^{l-1} \end{pmatrix} + \theta_1^l \tag{9.2.18}$$
Similarly, Eq. (9.2.17) is written for the second unit in the l-th layer as
$${}^{p}U_2^l = \begin{pmatrix} w_{21}^{l-1} & \cdots & w_{2,n_{l-1}}^{l-1} \end{pmatrix}\begin{pmatrix} {}^{p}O_1^{l-1} \\ \vdots \\ {}^{p}O_{n_{l-1}}^{l-1} \end{pmatrix} + \theta_2^l \tag{9.2.19}$$
where $\left[w^{l-1}\right]$ is the $n_l$-by-$n_{l-1}$ matrix with the connection weights between the (l−1)-th layer and the l-th layer as components.
For the first training pattern in a mini-batch, Eq. (9.2.20) is written as
$$\begin{pmatrix} {}^{1}U_1^l \\ {}^{1}U_2^l \\ \vdots \\ {}^{1}U_{n_l}^l \end{pmatrix} = \left[w^{l-1}\right]\begin{pmatrix} {}^{1}O_1^{l-1} \\ {}^{1}O_2^{l-1} \\ \vdots \\ {}^{1}O_{n_{l-1}}^{l-1} \end{pmatrix} + \begin{pmatrix} \theta_1^l \\ \theta_2^l \\ \vdots \\ \theta_{n_l}^l \end{pmatrix} \tag{9.2.21}$$
Similarly, for the second training pattern in a mini-batch, Eq. (9.2.20) is written
as follows:
$$\begin{pmatrix} {}^{2}U_1^l \\ {}^{2}U_2^l \\ \vdots \\ {}^{2}U_{n_l}^l \end{pmatrix} = \left[w^{l-1}\right]\begin{pmatrix} {}^{2}O_1^{l-1} \\ {}^{2}O_2^{l-1} \\ \vdots \\ {}^{2}O_{n_{l-1}}^{l-1} \end{pmatrix} + \begin{pmatrix} \theta_1^l \\ \theta_2^l \\ \vdots \\ \theta_{n_l}^l \end{pmatrix} \tag{9.2.22}$$
$$
\begin{aligned}
{}^{p}\Delta w_{ji}^1 &= -\frac{\partial\,{}^{p}E}{\partial w_{ji}^1} \\
&= -\sum_{k=1}^{n_4}\left({}^{p}O_k^4 - {}^{p}T_k\right)\frac{\partial f\left({}^{p}U_k^4\right)}{\partial\,{}^{p}U_k^4}\sum_{l=1}^{n_3} w_{kl}^3\,\frac{\partial f\left({}^{p}U_l^3\right)}{\partial\,{}^{p}U_l^3}\,w_{lj}^2\,\frac{\partial f\left({}^{p}U_j^2\right)}{\partial\,{}^{p}U_j^2}\,{}^{p}O_i^1 \\
&= \sum_{k=1}^{n_4}{}^{p}\delta_k^4\sum_{l=1}^{n_3} w_{kl}^3\,\frac{\partial f\left({}^{p}U_l^3\right)}{\partial\,{}^{p}U_l^3}\,w_{lj}^2\,\frac{\partial f\left({}^{p}U_j^2\right)}{\partial\,{}^{p}U_j^2}\,{}^{p}O_i^1 \\
&= \sum_{l=1}^{n_3}{}^{p}\delta_l^3\,w_{lj}^2\,\frac{\partial f\left({}^{p}U_j^2\right)}{\partial\,{}^{p}U_j^2}\,{}^{p}O_i^1 = {}^{p}\delta_j^2 \cdot {}^{p}O_i^1
\end{aligned} \tag{9.2.27}
$$
$$\begin{pmatrix} {}^{p}\delta_1^4 \\ {}^{p}\delta_2^4 \\ \vdots \\ {}^{p}\delta_{n_4}^4 \end{pmatrix} \tag{9.2.28}$$
${}^{p}\delta_j^4$ for all the training patterns in a mini-batch can be given by a matrix $\left[\delta^4\right]$ of $n_4$ rows and mb columns as
$$\begin{bmatrix} {}^{1}\delta_1^4 & {}^{2}\delta_1^4 & \cdots & {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 & {}^{2}\delta_2^4 & \cdots & {}^{mb}\delta_2^4 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_4}^4 & {}^{2}\delta_{n_4}^4 & \cdots & {}^{mb}\delta_{n_4}^4 \end{bmatrix} = \left[\delta^4\right] \tag{9.2.29}$$
where ʘ denotes the Hadamard product of two vectors, a binary operation that takes two vectors of the same dimension and produces another vector of the same dimension whose i-th component is the product of the i-th components of the original two vectors. Thus, ${}^{p}\delta_j^3$ for all the training patterns in a mini-batch can be written as
$$
\begin{bmatrix} {}^{1}\delta_1^3 & {}^{2}\delta_1^3 & \cdots & {}^{mb}\delta_1^3 \\ {}^{1}\delta_2^3 & {}^{2}\delta_2^3 & \cdots & {}^{mb}\delta_2^3 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_3}^3 & {}^{2}\delta_{n_3}^3 & \cdots & {}^{mb}\delta_{n_3}^3 \end{bmatrix}
= \left[\begin{bmatrix} w_{11}^3 & w_{21}^3 & \cdots & w_{n_4 1}^3 \\ w_{12}^3 & w_{22}^3 & \cdots & w_{n_4 2}^3 \\ \vdots & \vdots & \cdots & \vdots \\ w_{1 n_3}^3 & w_{2 n_3}^3 & \cdots & w_{n_4 n_3}^3 \end{bmatrix}\begin{bmatrix} {}^{1}\delta_1^4 & {}^{2}\delta_1^4 & \cdots & {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 & {}^{2}\delta_2^4 & \cdots & {}^{mb}\delta_2^4 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_4}^4 & {}^{2}\delta_{n_4}^4 & \cdots & {}^{mb}\delta_{n_4}^4 \end{bmatrix}\right]
\odot \begin{bmatrix} \frac{\partial f({}^{1}U_1^3)}{\partial\,{}^{1}U_1^3} & \frac{\partial f({}^{2}U_1^3)}{\partial\,{}^{2}U_1^3} & \cdots & \frac{\partial f({}^{mb}U_1^3)}{\partial\,{}^{mb}U_1^3} \\ \frac{\partial f({}^{1}U_2^3)}{\partial\,{}^{1}U_2^3} & \frac{\partial f({}^{2}U_2^3)}{\partial\,{}^{2}U_2^3} & \cdots & \frac{\partial f({}^{mb}U_2^3)}{\partial\,{}^{mb}U_2^3} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial f({}^{1}U_{n_3}^3)}{\partial\,{}^{1}U_{n_3}^3} & \frac{\partial f({}^{2}U_{n_3}^3)}{\partial\,{}^{2}U_{n_3}^3} & \cdots & \frac{\partial f({}^{mb}U_{n_3}^3)}{\partial\,{}^{mb}U_{n_3}^3} \end{bmatrix} \tag{9.2.32}
$$
where $\left[\delta^3\right]$ is a matrix of $n_3$ rows and mb columns, and $\left[w^3\right]$ that of $n_4$ rows and $n_3$ columns ($\left[w^3\right]^T$ is $n_3$ rows and $n_4$ columns). Note that here ʘ denotes the Hadamard product of two matrices, a binary operation that takes two matrices of the same dimensions and produces another matrix of the same dimensions whose (i, j)-th component is the product of the (i, j)-th components of the original two matrices.
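In code, the Hadamard product reduces to an elementwise loop; this is exactly what back_propagationBLAS below does after each cblas_sgemm call (the lines dtemp[i][ia] *= zdHU[i+1][ia]). A minimal sketch:
/* Hadamard (elementwise) product of two n-by-m RowMajor matrices:
   a[i][j] *= b[i][j] for all components.                          */
void hadamard_inplace(float *a, const float *b, int n, int m)
{
    for (int i = 0; i < n * m; i++) a[i] *= b[i];
}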
In the same manner, from Eq. (9.2.27), we obtain
$$
\begin{bmatrix} {}^{1}\delta_1^2 & {}^{2}\delta_1^2 & \cdots & {}^{mb}\delta_1^2 \\ {}^{1}\delta_2^2 & {}^{2}\delta_2^2 & \cdots & {}^{mb}\delta_2^2 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_2}^2 & {}^{2}\delta_{n_2}^2 & \cdots & {}^{mb}\delta_{n_2}^2 \end{bmatrix}
= \left[\begin{bmatrix} w_{11}^2 & w_{21}^2 & \cdots & w_{n_3 1}^2 \\ w_{12}^2 & w_{22}^2 & \cdots & w_{n_3 2}^2 \\ \vdots & \vdots & \cdots & \vdots \\ w_{1 n_2}^2 & w_{2 n_2}^2 & \cdots & w_{n_3 n_2}^2 \end{bmatrix}\begin{bmatrix} {}^{1}\delta_1^3 & {}^{2}\delta_1^3 & \cdots & {}^{mb}\delta_1^3 \\ {}^{1}\delta_2^3 & {}^{2}\delta_2^3 & \cdots & {}^{mb}\delta_2^3 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_3}^3 & {}^{2}\delta_{n_3}^3 & \cdots & {}^{mb}\delta_{n_3}^3 \end{bmatrix}\right]
\odot \begin{bmatrix} \frac{\partial f({}^{1}U_1^2)}{\partial\,{}^{1}U_1^2} & \frac{\partial f({}^{2}U_1^2)}{\partial\,{}^{2}U_1^2} & \cdots & \frac{\partial f({}^{mb}U_1^2)}{\partial\,{}^{mb}U_1^2} \\ \frac{\partial f({}^{1}U_2^2)}{\partial\,{}^{1}U_2^2} & \frac{\partial f({}^{2}U_2^2)}{\partial\,{}^{2}U_2^2} & \cdots & \frac{\partial f({}^{mb}U_2^2)}{\partial\,{}^{mb}U_2^2} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial f({}^{1}U_{n_2}^2)}{\partial\,{}^{1}U_{n_2}^2} & \frac{\partial f({}^{2}U_{n_2}^2)}{\partial\,{}^{2}U_{n_2}^2} & \cdots & \frac{\partial f({}^{mb}U_{n_2}^2)}{\partial\,{}^{mb}U_{n_2}^2} \end{bmatrix} \tag{9.2.34}
$$
where $\left[\delta^2\right]$ is a matrix of $n_2$ rows and mb columns, and $\left[w^2\right]$ that of $n_3$ rows and $n_2$ columns ($\left[w^2\right]^T$ is $n_2$ rows and $n_3$ columns).
The amounts of updates of connection weights are obtained from Eqs. (9.2.25),
(9.2.26), and (9.2.27), which are summarized as follows:
$$\Delta w_{ji}^3 = \sum_{p=1}^{mb}{}^{p}\delta_j^4 \cdot {}^{p}O_i^3 \tag{9.2.36}$$
$$\Delta w_{ji}^2 = \sum_{p=1}^{mb}{}^{p}\delta_j^3 \cdot {}^{p}O_i^2 \tag{9.2.37}$$
$$\Delta w_{ji}^1 = \sum_{p=1}^{mb}{}^{p}\delta_j^2 \cdot {}^{p}O_i^1 \tag{9.2.38}$$
From Eq. (9.2.36), we obtain for all the connection weights between the output
layer and the second hidden layer the equation as
$$
\begin{bmatrix} \Delta w_{11}^3 & \Delta w_{12}^3 & \cdots & \Delta w_{1 n_3}^3 \\ \Delta w_{21}^3 & \Delta w_{22}^3 & \cdots & \Delta w_{2 n_3}^3 \\ \vdots & \vdots & \cdots & \vdots \\ \Delta w_{n_4 1}^3 & \Delta w_{n_4 2}^3 & \cdots & \Delta w_{n_4 n_3}^3 \end{bmatrix}
= \begin{bmatrix} {}^{1}\delta_1^4 & {}^{2}\delta_1^4 & \cdots & {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 & {}^{2}\delta_2^4 & \cdots & {}^{mb}\delta_2^4 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_4}^4 & {}^{2}\delta_{n_4}^4 & \cdots & {}^{mb}\delta_{n_4}^4 \end{bmatrix}\begin{bmatrix} {}^{1}O_1^3 & {}^{1}O_2^3 & \cdots & {}^{1}O_{n_3}^3 \\ {}^{2}O_1^3 & {}^{2}O_2^3 & \cdots & {}^{2}O_{n_3}^3 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{mb}O_1^3 & {}^{mb}O_2^3 & \cdots & {}^{mb}O_{n_3}^3 \end{bmatrix}
= \begin{bmatrix} {}^{1}\delta_1^4 & {}^{2}\delta_1^4 & \cdots & {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 & {}^{2}\delta_2^4 & \cdots & {}^{mb}\delta_2^4 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_4}^4 & {}^{2}\delta_{n_4}^4 & \cdots & {}^{mb}\delta_{n_4}^4 \end{bmatrix}\begin{bmatrix} {}^{1}O_1^3 & {}^{2}O_1^3 & \cdots & {}^{mb}O_1^3 \\ {}^{1}O_2^3 & {}^{2}O_2^3 & \cdots & {}^{mb}O_2^3 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}O_{n_3}^3 & {}^{2}O_{n_3}^3 & \cdots & {}^{mb}O_{n_3}^3 \end{bmatrix}^T \tag{9.2.39}
$$
Similarly, for the connection weights between the first hidden layer and the input layer, we obtain
$$\left[\Delta w^1\right] = \left[\delta^2\right]\left[O^1\right]^T \tag{9.2.42}$$
where $\left[\Delta w^2\right]$ is a matrix of $n_3$ rows and $n_2$ columns (the same size as that of $\left[w^2\right]$), $\left[\Delta w^1\right]$ that of $n_2$ rows and $n_1$ columns (the same size as that of $\left[w^1\right]$), $\left[O^2\right]$ that of $n_2$ rows and mb columns, and $\left[O^1\right]$ that of $n_1$ rows and mb columns.
Here, we note that the update rule for the bias in the p-th training pattern in a
mini-batch can be written as follows (see Eqs. (2.1.46), (2.1.47), and (2.1.48) in
Sect. 2.1):
$${}^{p}\Delta\theta_i^4 = -\frac{\partial\,{}^{p}E}{\partial\theta_i^4} = -\left({}^{p}O_i^4 - {}^{p}T_i\right)\frac{\partial f\left({}^{p}U_i^4\right)}{\partial\,{}^{p}U_i^4} = {}^{p}\delta_i^4 \tag{9.2.43}$$
$${}^{p}\Delta\theta_i^3 = -\frac{\partial\,{}^{p}E}{\partial\theta_i^3} = -\sum_{j=1}^{n_4}\left({}^{p}O_j^4 - {}^{p}T_j\right)\frac{\partial f\left({}^{p}U_j^4\right)}{\partial\,{}^{p}U_j^4}\,w_{ji}^3\,\frac{\partial f\left({}^{p}U_i^3\right)}{\partial\,{}^{p}U_i^3} = \sum_{j=1}^{n_4}{}^{p}\delta_j^4 \cdot w_{ji}^3\,\frac{\partial f\left({}^{p}U_i^3\right)}{\partial\,{}^{p}U_i^3} = {}^{p}\delta_i^3 \tag{9.2.44}$$
$$
\begin{aligned}
{}^{p}\Delta\theta_i^2 &= -\frac{\partial\,{}^{p}E}{\partial\theta_i^2} \\
&= -\sum_{j=1}^{n_4}\left({}^{p}O_j^4 - {}^{p}T_j\right)\frac{\partial f\left({}^{p}U_j^4\right)}{\partial\,{}^{p}U_j^4}\sum_{k=1}^{n_3} w_{jk}^3\,\frac{\partial f\left({}^{p}U_k^3\right)}{\partial\,{}^{p}U_k^3}\,w_{ki}^2\,\frac{\partial f\left({}^{p}U_i^2\right)}{\partial\,{}^{p}U_i^2} \\
&= \sum_{j=1}^{n_4}{}^{p}\delta_j^4\sum_{k=1}^{n_3} w_{jk}^3\,\frac{\partial f\left({}^{p}U_k^3\right)}{\partial\,{}^{p}U_k^3}\,w_{ki}^2\,\frac{\partial f\left({}^{p}U_i^2\right)}{\partial\,{}^{p}U_i^2} \\
&= \sum_{k=1}^{n_3}{}^{p}\delta_k^3 \cdot w_{ki}^2\,\frac{\partial f\left({}^{p}U_i^2\right)}{\partial\,{}^{p}U_i^2} = {}^{p}\delta_i^2
\end{aligned} \tag{9.2.45}
$$
Therefore, the amount of update for the biases in the output layer in all the training
patterns in a mini-batch can be written as follows:
$$
\begin{pmatrix} \Delta\theta_1^4 \\ \Delta\theta_2^4 \\ \vdots \\ \Delta\theta_{n_4}^4 \end{pmatrix}
= \begin{pmatrix} {}^{1}\Delta\theta_1^4 + {}^{2}\Delta\theta_1^4 + \cdots + {}^{mb}\Delta\theta_1^4 \\ {}^{1}\Delta\theta_2^4 + {}^{2}\Delta\theta_2^4 + \cdots + {}^{mb}\Delta\theta_2^4 \\ \vdots \\ {}^{1}\Delta\theta_{n_4}^4 + {}^{2}\Delta\theta_{n_4}^4 + \cdots + {}^{mb}\Delta\theta_{n_4}^4 \end{pmatrix}
= \begin{pmatrix} {}^{1}\delta_1^4 + {}^{2}\delta_1^4 + \cdots + {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 + {}^{2}\delta_2^4 + \cdots + {}^{mb}\delta_2^4 \\ \vdots \\ {}^{1}\delta_{n_4}^4 + {}^{2}\delta_{n_4}^4 + \cdots + {}^{mb}\delta_{n_4}^4 \end{pmatrix}
= \begin{bmatrix} {}^{1}\delta_1^4 & {}^{2}\delta_1^4 & \cdots & {}^{mb}\delta_1^4 \\ {}^{1}\delta_2^4 & {}^{2}\delta_2^4 & \cdots & {}^{mb}\delta_2^4 \\ \vdots & \vdots & \cdots & \vdots \\ {}^{1}\delta_{n_4}^4 & {}^{2}\delta_{n_4}^4 & \cdots & {}^{mb}\delta_{n_4}^4 \end{bmatrix}\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}
= \left[\delta^4\right]\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{9.2.46}
$$
In the same way, from Eqs. (9.2.44) and (9.2.45), we have, respectively,
$$\begin{pmatrix} \Delta\theta_1^3 \\ \Delta\theta_2^3 \\ \vdots \\ \Delta\theta_{n_3}^3 \end{pmatrix} = \left[\delta^3\right]\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{9.2.47}$$
$$\begin{pmatrix} \Delta\theta_1^2 \\ \Delta\theta_2^2 \\ \vdots \\ \Delta\theta_{n_2}^2 \end{pmatrix} = \left[\delta^2\right]\begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \tag{9.2.48}$$
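Equations (9.2.46)–(9.2.48) explain why back_propagationBLAS below keeps a vector ones filled with 1.0: multiplying $[\delta]$ by this vector with cblas_sgemv sums the per-pattern deltas into the bias updates. A minimal sketch of the idea (the function name is hypothetical):
#include <cblas.h>

/* Sum the columns of an n-by-mb RowMajor matrix d into dbias,
   scaled by Beta: dbias[i] = Beta * sum over p of d[i*mb + p],
   as in Eqs. (9.2.46)-(9.2.48).                                */
void bias_update_rowsum(const float *d, float *dbias, float Beta,
                        int n, int mb, const float *ones)
{
    cblas_sgemv(CblasRowMajor, CblasNoTrans, n, mb,
                Beta, d, mb, ones, 1, 0.0f, dbias, 1);
}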
The program DLebpBLAS.c for the calculation of the forward and backward
propagation is shown as follows (Table 9.11 shows a list of main variables and arrays):
/* DLebpBLAS.c */
/*---------------------------------------------------*/
void propagationBLAS(
float **uHU,
float **zHU,
float **zdHU,
float **w,
float **bias,
int *nHU,
int nHL,
int bsz)
{
int i, j, k, ia;
for(i=0;i<nHL;i++) {
bias_onesBLAS(uHU[i], bias[i], nHU[i+1], bsz);
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
nHU[i+1], bsz, nHU[i], 1, w[i], nHU[i], zHU[i],
bsz, 1, uHU[i], bsz);
for(j=0;j<nHU[i+1];j++){
for(k=0;k<bsz;k++){
ia = IDX(j, k, bsz) ;
a0f(zHU[i+1]+ia, zdHU[i+1]+ia, uHU[i][ia]);
}
}
}
i = nHL;
bias_onesBLAS(uHU[i], bias[i], nHU[i+1], bsz);
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
nHU[i+1], bsz, nHU[i], 1, w[i], nHU[i], zHU[i], bsz,
1, uHU[i], bsz);
for(j=0;j<nHU[i+1];j++){
for(k=0;k<bsz;k++){
ia = IDX(j, k, bsz) ;
a1f(zHU[i+1]+ia, zdHU[i+1]+ia, uHU[i][ia]) ;
}
}
}
/*---------------------------------------------------*/
void back_propagationBLAS(
float **uHU,
float **zHU,
float **zdHU,
float *t,
float **w,
float **bias,
float **dw,
float **dbias,
float *ones,
float **dtemp,
int *nHU,
int nHL,
float Alpha,
float Beta,
int bsz,
int pbatch)
{
int i, j, k, ia;
for(i=0;i<nHU[nHL+1];i++){
for(j=0;j<bsz;j++){
ia = IDX(i, j, bsz) ;
dtemp[nHL][ia] = t[IDX(j+pbatch, i, nHU[nHL+1])]
- zHU[nHL+1][ia];
dtemp[nHL][ia] *= zdHU[nHL+1][ia];
}
}
for(i=nHL-1;i>=0;i--){
cblas_sgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
nHU[i+1], bsz, nHU[i+2], 1, w[i+1], nHU[i+1],
dtemp[i+1], bsz, 0, dtemp[i], bsz);
for(j=0;j<nHU[i+1];j++){
for(k=0;k<bsz;k++){
ia = IDX(j, k, bsz);
dtemp[i][ia] *= zdHU[i+1][ia];
}
}
}
for(i=0;i<nHL+1;i++){
cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
nHU[i+1],nHU[i], bsz, Alpha, dtemp[i], bsz,
zHU[i], bsz, Moment1, dw[i], nHU[i]);
cblas_sgemv(CblasRowMajor, CblasNoTrans, nHU[i+1],
bsz, Beta, dtemp[i], bsz, ones, 1, Moment1,
dbias[i], 1);
cblas_saxpy(nHU[i+1] * nHU[i], 1.0f, dw[i], 1, w[i], 1);
cblas_saxpy(nHU[i+1], 1.0f, dbias[i], 1, bias[i], 1);
}
}
/*---------------------------------------------------*/
/* DLneuroBLAS.c */
#include <cblas.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#define MaxH_UnitNo 1000
#define MaxO_UnitNo 1000
#define FNAMELENGTH 100
#define NHU_V 1
#define NHU_C 0
#define Moment1 0.05f
#define Wmin -0.30f
#define Wmax 0.30f
#define rnd() (drand48() * (Wmax - Wmin) + Wmin)
#define noise() ((drand48() - 0.5f) * 2.0f * nlev)
#define IDX(i, j, ld) ((ld) * (i) + (j))
#include "DLneuroBLAS_mem.c"
#include "DLcommonBLAS.c"
#include "DLebpBLAS.c"
/*---------------------------------------------------*/
int main(void){
int i, j, k, m, ia, ib, ic, iteration_min=0, rseed, o_freq,
bsz, MaxPattern,MaxEpochs, lp_no, tp_no, nIU, nOU, *nHU,
nHL, nHU0, nhflag, thread, pbatch;
char fname1[FNAMELENGTH];
float ef1, ef2, ef2_min = 1e6, nlev, Alpha, Beta, *t, *d, *oIU,
*oIUor, *ones, **uHU, **zHU,**zdHU, **w, **bias,
**dw, **dbias, **w_min, **bias_min, **dtemp;
//---(Part 1)-----------------------------------
scanf("%d", &thread);
openblas_set_num_threads(thread);
scanf("%d %d %d %d %d %d %d %d %d %d %d %s %d %e %e %e",
&MaxPattern,
&lp_no, &tp_no, &bsz, &nIU, &nHU0, &nOU, &nHL, &nhflag,
&MaxEpochs, &o_freq, fname1, &rseed, &Alpha, &Beta, &nlev);
srand48(rseed);
//---(Part 2)-----------------------------------
nHU = (int *)array1D(sizeof(int), nHL+2);
if(nhflag==NHU_V){
for(i=1;i<=nHL;i++) scanf("%d", nHU+i);
}else{
for(i=1;i<=nHL;i++) nHU[i] = nHU0;
}
nHU[0] = nIU;
nHU[nHL+1] = nOU;
//---(Part 3)-----------------------------------
t = (float *)array1D(sizeof(float), MaxPattern * nOU);
d = (float *)array1D(sizeof(float), nOU * bsz);
oIUor = (float *)array1D(sizeof(float), MaxPattern * nIU);
oIU = (float *)array1D(sizeof(float), lp_no * nIU);
uHU = array2D_u(nHL, nHU, bsz);
zHU = array2D_z(nHL, nHU, bsz);
zdHU = array2D_z(nHL, nHU, bsz);
w = array2D_w(nHL, nHU);
w_min = array2D_w(nHL, nHU);
dw = array2D_w(nHL, nHU);
bias = array2D_bias(nHL, nHU);
bias_min = array2D_bias(nHL, nHU);
dbias = array2D_bias(nHL, nHU);
dtemp = array2D_u(nHL, nHU, bsz);
ones = (float *)array1D(sizeof(float), bsz);
for(i=0;i<bsz;i++) ones[i] = 1.0;
//---(Part 4)-----------------------------------
read_fileBLAS(fname1,oIUor,t,nIU,nOU,lp_no,tp_no);
initializeBLAS(w, dw, bias, dbias, nHU, nHL);
//---(Part 5)-----------------------------------
for(i=0;i<=MaxEpochs;i++){
for(j=0;j<lp_no/bsz;j++){
pbatch = j * bsz;
batchcopy_noiseBLAS(nlev, &oIUor[0], &zHU[0][0],
pbatch, nIU, bsz);
propagationBLAS(uHU, zHU, zdHU, w, bias, nHU, nHL, bsz);
back_propagationBLAS(uHU, zHU, zdHU, t, w, bias,
dw, dbias, ones,
dtemp, nHU, nHL, Alpha, Beta, bsz, pbatch);
}
if((j=lp_no%bsz)>=1){
pbatch = lp_no - bsz;
batchcopy_noiseBLAS(nlev, &oIUor[0], &zHU[0][0],
ib = IDX(k+pbatch, m, nOU);
ic = IDX(m, k, bsz);
d[ia] = t[ib] - zHU[nHL+1][ic];
}
}
ef2 += cblas_sasum(nOU * bsz, &d[0], 1);
}
if ((j = tp_no % bsz) >= 1) {
pbatch = lp_no + tp_no - bsz;
batchcopyBLAS(&oIUor[0], &zHU[0][0],
pbatch, nIU, bsz);
propagationBLAS(uHU, zHU, zdHU, w, bias,
nHU, nHL, bsz);
for(k=0;k<bsz;k++){
for(m=0;m<nOU;m++){
ia = IDX(k, m, nOU);
ib = IDX(k+pbatch, m, nOU);
ic = IDX(m, k, bsz);
d[ia] = t[ib] - zHU[nHL+1][ic];
}
}
ef2 += cblas_sasum(nOU * j, &d[0], 1);
}
printf("%d th Error : %.5e %.5e\n",
i, ef1/lp_no, ef2/tp_no);
if((ef2/tp_no)<=ef2_min){
ef2_min = ef2/tp_no;
iteration_min = i;
store_weightBLAS(w, bias, w_min, bias_min, nHU, nHL);
}
}
}
//---(Part 6)-----------------------------------
show_resultsBLAS(w, bias, w_min, bias_min, nHU, nHL);
return 0;
}
The functions used to allocate memory space for arrays in Part 3 are defined in DLneuroBLAS_mem.c as follows:
/* DLneuroBLAS_mem.c */
/*---------------------------------------------------*/
void *array1D(size_t size, int row)
{
char *v;
v = (char *)calloc(row, size);
return v;
}
/*---------------------------------------------------*/
float **array2D(int row, int col)
{
float **a;
int i;
a = (float **)calloc(row, sizeof(float *));
a[0] = (float *)calloc(row * col, sizeof(float));
for(i=1;i<row;i++) a[i] = a[i-1] + col;
return a;
}
/*---------------------------------------------------*/
float ***array3D(int x, int y, int z)
{
float ***a;
int i, j;
a = (float ***)calloc(x, sizeof(float **));
a[0] = (float **)calloc(x * y, sizeof(float *));
a[0][0] = (float *)calloc(x * y * z, sizeof(float));
for(i=0;i<x;i++){
a[i] = a[0] + i * y;
for(j=0;j<y;j++) a[i][j] = a[0][0] + i*y*z + j*z;
}
return a;
}
/*---------------------------------------------------*/
float **array2D_bias(int nHL, const int *nHU)
{
float **a;
int i, y;
a = (float **)calloc(nHL + 1, sizeof(float *));
for(i=0,y=0;i<nHL+1;i++) y += nHU[i+1];
a[0] = (float *)calloc(y, sizeof(float));
for(i=0,y=0;i<nHL+1;i++){
a[i] = a[0] + y;
y += nHU[i+1];
}
return a;
}
/*---------------------------------------------------*/
float **array2D_w(int nHL, const int *nHU)
{
float **a;
int i, m;
a = (float **)calloc(nHL+1, sizeof(float *));
for(i=0,m=0;i< nHL+1;i++) m += nHU[i] * nHU[i+1];
a[0] = (float *)calloc(m, sizeof(float));
for(i=0,m=0;i<nHL+1;i++){
a[i] = a[0] + m;
m += nHU[i+1] * nHU[i];
}
return a;
}
/*---------------------------------------------------*/
float **array2D_u(int nHL, const int *nHU, int bsz)
{
float **a;
int i, y;
a = (float **)calloc(nHL+1, sizeof(float *));
for(i=1,y=0;i<=nHL+1;i++) y += nHU[i] * bsz;
a[0] = (float *)calloc(y, sizeof(float));
for(i=0,y=0;i<nHL+1;i++){
a[i] = a[0] + y;
y += nHU[i+1]*bsz;
}
return a;
}
/*---------------------------------------------------*/
float **array2D_z(int nHL, const int *nHU, int bsz)
{
float **a;
int i, y;
a = (float **)calloc(nHL + 2, sizeof(float *));
for(i=0,y=0;i<nHL+2;i++) y += nHU[i];
a[0] = (float *)calloc(y * bsz, sizeof(float));
for(i=0,y=0;i<nHL+2;i++){
a[i] = a[0] + y;
y += nHU[i]*bsz;
}
return a;
}
/*---------------------------------------------------*/
void free2D(float **a)
{
free(a[0]);
free(a);
}
/*---------------------------------------------------*/
void free3D(float ***a)
{
free(a[0][0]);
free(a[0]);
free(a);
}
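These allocators return arrays whose data occupy a single contiguous block (a[0] points to the whole block), which is what permits whole-array BLAS calls such as cblas_sscal(k, 0.0f, dw[0], 1) in initializeBLAS below. A minimal usage sketch (assuming the functions above are in scope):
#include <stdlib.h>
#include "DLneuroBLAS_mem.c" /* provides array2D() and free2D() */

int main(void)
{
    float **m = array2D(3, 4); /* 3-by-4 matrix, zero-initialized by calloc */
    m[1][2] = 5.0f;            /* conventional two-dimensional indexing ... */
    m[0][1 * 4 + 2] += 1.0f;   /* ... or flat indexing into the same block  */
    free2D(m);
    return 0;
}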
/* DLcommonBLAS.c */
/*---------------------------------------------------*/
void a0f(
float *a,
float *da,
float x)
{
float d;
d = (1.0f + tanhf(x))*0.5f ;
*a = d;
*da = d*(1.0f - d) ;
}
/*---------------------------------------------------*/
void a1f(
float *a,
float *da,
float x)
{
float d;
d = (1.0f + tanhf(x))*0.5f ;
*a = d;
*da = d*(1.0f - d) ;
}
/*---------------------------------------------------*/
void clear_deltaBLAS(
float **dtemp,
int nHL,
int *nHU,
int bsz)
{
int i, y;
for(i=1,y=0;i<=nHL+1;i++) y += nHU[i];
y *= bsz;
cblas_sscal(y, 0, dtemp[0], 1);
}
/*---------------------------------------------------*/
void bias_onesBLAS(
float *uHU,
float *bias,
int row,
int bsz)
{
int i, j;
for(i=0;i<row;i++)
for(j=0;j<bsz;j++) uHU[IDX(i, j, bsz)] = bias[i];
}
/*---------------------------------------------------*/
void batchcopy_noiseBLAS(
float nlev,
float *oIUor,
float *zHU,
int pbatch,
int nIU,
int bsz)
{
int i, j, ia, ib;
for(i=0;i<nIU;i++){
for(j=0;j<bsz;j++){
ia = IDX(i, j, bsz);
ib = IDX(j+pbatch, i, nIU);
zHU[ia] = (1.0f + noise()) * oIUor[ib];
}
}
}
/*---------------------------------------------------*/
void batchcopyBLAS(
float *oIUor,
float *zHU,
int pbatch,
int nIU,
int bsz)
{
int i, j;
for(i=0;i<nIU;i++)
for(j=0;j<bsz;j++)
zHU[IDX(i, j, bsz)] = oIUor[IDX(j+pbatch, i, nIU)];
}
/*---------------------------------------------------*/
void read_fileBLAS(
char *name,
float *o1,
float *t,
int nIU,
int nOU,
int lp_no,
int tp_no)
{
int i, j, k;
FILE *fp;
fp = fopen(name, "r") ;
for(i=0;i<lp_no+tp_no;i++){
fscanf(fp, "%d", &k);
for(j=0;j<nIU;j++)
fscanf(fp,"%e", &o1[IDX(i, j, nIU)]);
for(j=0;j<nOU;j++)
fscanf(fp,"%e", &t[IDX(i, j, nOU)]);
}
fclose(fp);
}
/*---------------------------------------------------*/
void initializeBLAS(
float **w,
float **dw,
float **bias,
float **dbias,
int *nHU,
int nHL)
{
int i, j, k;
for(i=0;i<nHL+1;i++)
for(j=0;j<nHU[i+1];j++)
for(k=0;k<nHU[i];k++)
w[i][IDX(j, k, nHU[i])] = rnd();
for(i=0;i<nHL+1;i++)
for(j=0;j<nHU[i+1];j++) bias[i][j] = rnd();
for(i=0,k=0;i<nHL+1;i++) k += nHU[i] * nHU[i+1];
cblas_sscal(k, 0.0f, dw[0], 1);
for(i=0,k=0;i<nHL+1;i++) k += nHU[i+1];
cblas_sscal(k, 0.0f, dbias[0], 1);
}
/*---------------------------------------------------*/
void store_weightBLAS(
float **w,
float **bias,
float **w_min,
float **bias_min,
int *nHU,
int nHL)
{
int i, k;
for(i=0,k=0;i<nHL+1;i++) k += nHU[i]*nHU[i+1];
cblas_scopy(k, w[0], 1, w_min[0], 1);
for(i=0,k=0;i<nHL+1;i++) k += nHU[i+1];
cblas_scopy(k, bias[0], 1, bias_min[0], 1);
}
/*---------------------------------------------------*/
void show_resultsBLAS(
float **w,
float **bias,
float **w_min,
float **bias_min,
int *nHU,
int nHL)
{
int i, j, iL;
for(iL=0;iL<=nHL;iL++){
for(i=0;i<nHU[iL];i++){
printf("%5d", i);
for(j=0;j<nHU[iL+1];j++)
printf(" %e", w_min[iL][IDX(j, i, nHU[iL])]);
printf("\n");
}
}
for(iL=0;iL<=nHL;iL++){
for(j=0;j<nHU[iL+1];j++)
printf("%e ", bias_min[iL][j]);
printf("\n");
}
for(iL=0;iL<=nHL;iL++){
for(i=0;i<nHU[iL];i++){
printf("%5d", i);
for(j=0;j<nHU[iL+1];j++)
printf(" %e", w[iL][IDX(j, i, nHU[iL])]);
printf("\n");
}
}
for(iL=0;iL<=nHL;iL++){
for(j=0;j<nHU[iL+1];j++) printf("%e ", bias[iL][j]);
printf("\n");
}
}
/*---------------------------------------------------*/
After compiling, run the program as follows. In this case, the result will be stored in the result.txt file.
$ echo "1 1000 800 200 20 5 20 3 2 0 5000 100 indata.dat 12345 0.1 0.1
0.001" | ./DLneuroBLAS.exe > result.txt
In the above example, the number of threads is set to 1, the mini-batch size to 20,
the number of hidden layers to 2, the number of units in each hidden layer to 20, the
number of training epochs to 5000, and so on.
In Sects. 9.2.1 and 9.2.2, programs for feedforward neural networks in C have been presented together with many mathematical formulas. The Python program of a feedforward neural network given here, on the other hand, shows that the same task can be programmed very concisely by making use of libraries.
Python is a relatively new programming language introduced in 1991 by Guido
van Rossum. Since many deep learning libraries are designed to be used with Python,
Python has become the indispensable language for deep learning.
There are a number of Python-based libraries for deep learning including
TensorFlow (https://www.tensorflow.org)
Keras (https://keras.io)
PyTorch (https://pytorch.org)
Chainer (https://chainer.org)
Here, we use Keras as a front-end to TensorFlow.
While the above libraries are specialized for deep learning, many other libraries
are also developed and are widely available for using Python not only for machine
learning, including deep learning, but also for general-purpose numerical computa-
tion. In the program of a feedforward neural network discussed here, we use two
libraries as follows:
NumPy (https://numpy.org).
pandas (https://pandas.pydata.org).
The former [4] is a library for the matrix and vector operations essential in scientific and engineering calculations and is used in most numerical programs in Python. The latter [11] is a library for operations commonly used in data analysis and supports a variety of data formats; the program discussed here uses it for loading input data files.
Now, let us discuss DLneuroPython.py, a Python program for a fully
connected feedforward neural network. This program has the following features
in common with DLneuro.c in Sect. 9.2.1.
Network structure: Fully connected feedforward type.
Number of hidden layers: Arbitrary.
Number of units in a hidden layer: Arbitrary.
Activation function for hidden layers: Sigmoid function.
Activation function for the output layer: Sigmoid function.
Error function: Squared error.
Minimization method: Stochastic gradient descent method.
Note that the activation and error functions can be replaced with others. In particular, for the Python program employed here, they are easily changed thanks to the support of the library.
The input data sample above is for the case of 5 input data (parameters) and 3
output (teacher) data. The total number of patterns, including training patterns and
verification patterns, is 1000. Each row corresponds to a pattern: the first column is a
sequential number, columns 2–6 are the input data, and columns 7–9 the teacher data.
Both the input and the teacher data are assumed to be single-precision real values.
Now, let’s study the details of the program. For convenience, the program is
divided into nine parts.
Part 1: This part imports the libraries to be used. As discussed above, NumPy is used for general-purpose array manipulation, pandas for loading data files, and Keras for deep learning. Note that Keras is used as a front-end for TensorFlow.
Part 2: This part defines the functions to read the input data file. pd.read_csv(), a pandas function, is used in the readfile0() function, where sep=' ' is specified because the input data file employs whitespace characters as field separators. All the rows and columns in the input file are read into the array arr using pd.read_csv(), and then the input data are stored in the array d and the teacher data in the array t using np.array() of the NumPy library. When reading the input file, the first rflag columns at the beginning of each line (e.g., sequential pattern numbers) are skipped. The readfile() function divides the input data array d and the teacher data array t read by the readfile0() function into d_train and t_train for training and d_test and t_test for verification, respectively, and stores them in those arrays.
Part 3: This part is the beginning of the definition section of the main() function.
In this program, various parameters are given as command line arguments at startup.
(See Table 9.13).
Part 4: Here, the numbers of units in hidden layers are set. When nhflag is 0,
each number of units in all the hidden layers is the same, nHU0. On the other hand,
when nhflag is set to 1, the numbers must be given by the command line arguments
at startup.
Provided that the number of hidden layers is three and that of units in all the
hidden layers 10, we specify the following,
$python DLneuroPython.py 800 200 5 10 3 3 0 100 sample.dat 13721 0.1
0.1 1 wm res
If we want to set the numbers of units in hidden layers to 20, 15, and 5, respectively,
we specify the following,
$python DLneuroPython.py 800 200 5 10 3 3 1 100 sample.dat 13721 0.1
0.1 1 wm res 20 15 5
Part 5: Using readfile() defined in Part 2, input data are divided and stored
in the input data array d_train and the teacher data array t_train for training,
and the input data array d_test and the teacher data array t_test for verification.
After that, the seed of the random number generator is set to initialize the weights.
Part 6: The configuration of a feedforward neural network is determined based
on the number of input parameters (number of units in the input layer) nIU, that of
units in the output layer nOU, that of hidden layers nHL, and that of units in each
hidden layer nHU[]. The activation function is specified as the sigmoid function
with activation = "sigmoid".
Other options for the activation function include the following functions.
'sigmoid' sigmoid function (see Eq. (2.1.3) in Sect. 2.1).
'tanh' tanh function (see Eq. (2.1.4) in Sect. 2.1).
'relu' ReLU function (see Eq. (2.1.5) in Sect. 2.1).
'linear' linear function (see Eq. (2.1.10) in Sect. 2.1).
While the sum of squared errors is selected with loss = 'mean_squared_error' as the error function, there are other choices for the error function as follows:
'mean_squared_error' mean squared error.
'mean_absolute_error' mean absolute error.
'categorical_crossentropy' crossentropy (for classification problems).
Regarding the optimization method, stochastic gradient descent (SGD) is specified by optimizer = sgd, but various high-performance optimization methods are also available, such as
SGD Stochastic Gradient Descent (Sect. 2.3.1).
RMSprop RMSProp [19] (Sect. 2.3.2).
Adagrad AdaGrad [2] (Sect. 2.3.2).
Adam Adam [8] (Sect. 2.3.3).
At the end of Part 6, the shape of the constructed neural network is output by ffnn.summary().
Part 7: The training of the neural network is completed with a single line, history = ffnn.fit(), which sets the checkpoint and stores the connection weights that minimize the verification error.
Part 8: The trained neural network is stored.
Part 9: This part specifies the condition for the main function to start running.
The full code of DLneuroPython.py is shown below.
# DLneuroPython.py
#----------(Part 1)----------
import sys
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras import models
from tensorflow.keras import callbacks
#----------(Part 2)----------
def readfile0(num, nIU, nOU, rflag, fname):
    arr = pd.read_csv(fname, header=None, sep=' ', nrows=num)
    d = np.array(arr.iloc[0:num, rflag:rflag+nIU]).astype('float')
    t = np.array(arr.iloc[0:num, rflag+nIU:rflag+nIU+nOU]).astype('float')
    return d, t
#---------------
def readfile(ntrain, ntest, nIU, nOU, rflag, fname, sep=' '):
    d, t = readfile0(ntrain+ntest, nIU, nOU, rflag, fname)
    d_train = d[0:ntrain]
    d_test = d[-ntest:]
    t_train = t[0:ntrain]
    t_test = t[-ntest:]
    return d_train, t_train, d_test, t_test
#----------(Part 3)----------
def main():
    argv = sys.argv
    argc = len(argv)
    argn = 0
    lp_no = int(argv[1])
    tp_no = int(argv[2])
    nIU = int(argv[3])
    nHU0 = int(argv[4])
    nOU = int(argv[5])
    nHL = int(argv[6])
    nhflag = int(argv[7])
    MaxEpochs = int(argv[8])
    i_fname = argv[9]
    rseed = int(argv[10])
    Alpha = float(argv[11])
    Moment = float(argv[12])
    rflag = int(argv[13])
    wmin_dir = argv[14]
    o_fname = argv[15]
    argn = 16
    #----------(Part 4)----------
    nHU = np.array(nIU)
    if nhflag == 0:
        for i in range(nHL):
            nHU = np.append(nHU, nHU0)
    else:
        for i in range(nHL):
            nHU = np.append(nHU, int(argv[argn]))
            argn += 1
    nHU = np.append(nHU, nOU)
    #----------(Part 5)----------
    d_train, t_train, d_test, t_test = readfile(lp_no, tp_no, nIU, \
        nOU, rflag, i_fname, sep=' ')
    np.random.seed(rseed)
    #----------(Part 6)----------
    ffnn = models.Sequential()
    ffnn.add(layers.Dense(units=nHU[1], activation="sigmoid", \
        input_shape=(nIU,)))
    if nHL > 1:
        for j in range(2, nHL+1):
            ffnn.add(layers.Dense(units=nHU[j], \
                activation="sigmoid"))
    ffnn.add(layers.Dense(units=nOU, activation="sigmoid"))
    sgd = optimizers.SGD(learning_rate=Alpha, momentum=Moment)
    ffnn.compile(loss='mean_squared_error', optimizer=sgd)
    ffnn.summary()
    #----------(Part 7)----------
    checkpoint_path = wmin_dir
    cp = callbacks.ModelCheckpoint(checkpoint_path, \
        monitor='val_loss', save_best_only=True, \
        save_weights_only=True, verbose=1)
    history = ffnn.fit(d_train, t_train, epochs=MaxEpochs, \
        callbacks=[cp], validation_data=(d_test, t_test))
    #----------(Part 8)----------
    ffnn.save(o_fname)
#----------(Part 9)----------
if __name__ == '__main__':
    main()
One of the major factors that have led to the rise of deep learning is the success of
the convolutional neural networks (CNNs), which are well suited for handling data
such as images and audio.
Using Python (especially TensorFlow + Keras), we present here a basic program of a convolutional neural network. We discuss the MNIST (Modified National Institute of Standards and Technology database) handwritten number identification problem.
#-----(Part 1)-----
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models
#-----(Part 2)-----
nClass = 10
iShape = (28, 28, 1)
(x_train,y_train),(x_test,y_test)= \
keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
y_train = keras.utils.to_categorical(y_train, nClass)
y_test = keras.utils.to_categorical(y_test, nClass)
In Part 2, data for training and verification are prepared. nClass is the number
of classification categories, set as nClass = 10 since the problem is to identify
handwritten numbers from 0 to 9. iShape means the format of the image. Since
the image size is 28 × 28 and the number of channels is 1 (grayscale), iShape is
set to (28, 28, 1). For RGB color image of the same size, it would be (28, 28, 3).
Keras has a dedicated function to load MNIST image data. Therefore, it
is easy to complete the data loading process by simply calling the function
keras.datasets.mnist.load_data(). With only this, x_train and
y_train will store 60,000 image data and labels (teacher data) respectively, and
x_test and y_test 10,000 image data and labels (teacher data), respectively.
#-----(Part 3)-----
cnn_mnist = models.Sequential()
cnn_mnist.add(layers.Conv2D(32, kernel_size=(3, 3),\
activation="relu",input_shape=iShape))
cnn_mnist.add(layers.MaxPooling2D(pool_size=(2, 2)))
cnn_mnist.add(layers.Conv2D(64, kernel_size=(3, 3), \
activation="relu"))
cnn_mnist.add(layers.MaxPooling2D(pool_size=(2, 2)))
cnn_mnist.add(layers.Flatten())
cnn_mnist.add(layers.Dropout(0.5))
cnn_mnist.add(layers.Dense(units=nClass, \
activation="softmax"))
cnn_mnist.summary()
cnn_mnist.compile(loss="categorical_crossentropy",\
optimizer="adam", metrics=["accuracy"])
#-----(Part 4)-----
minibatch_size = 100
epochs = 20
cnn_mnist.fit(x_train, y_train, batch_size=minibatch_size, \
epochs=epochs,\
validation_data=(x_test, y_test))
Model: "sequential"
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 26, 26, 32) 320
max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0
conv2d_1 (Conv2D) (None, 11, 11, 64) 18496
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0
flatten (Flatten) (None, 1600) 0
dropout (Dropout) (None, 1600) 0
dense (Dense) (None, 10) 16010
Total params: 34,826
Trainable params: 34,826
Non-trainable params: 0
The number of tunable parameters of a convolutional layer, listed in the Param # column, can be counted as a × (b × c × d + e), where a is the number of output channels, b the number of input channels, c and d the filter sizes, respectively, and e corresponds to the bias, which is usually 1.
In the above example, the numbers of parameters for the first conv2d layer, the second conv2d layer, and the last dense layer are calculated, respectively, as shown below.
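The counts can be reproduced from the Param # column of the summary above; for the dense layer, the count is (number of outputs) × (number of inputs + 1). The arithmetic below is reconstructed from those listed values:
$$32 \times (1 \times 3 \times 3 + 1) = 320$$
$$64 \times (32 \times 3 \times 3 + 1) = 18{,}496$$
$$10 \times (1600 + 1) = 16{,}010$$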
Thus, about 35,000 tunable parameters are to be learned in the CNN employed
for the above program.
Note that some well-known CNNs that have achieved top results in the ILSVRC
(ImageNet Large Scale Visual Recognition Challenge) [16] are available and easily
tested with Keras. Here, we show examples for VGG16 [17] and ResNet [5]. These
trained models can be loaded by the dedicated loading functions as follows:
model = tf.keras.applications.vgg16.VGG16(weights='imagenet')
model = tf.keras.applications.ResNet50(weights='imagenet')
Model: "vgg16"
Layer (type) Output Shape Param #
input_1 (InputLayer) [(None, 224, 224, 3)] 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
flatten (Flatten) (None, 25088) 0
fc1 (Dense) (None, 4096) 102764544
fc2 (Dense) (None, 4096) 16781312
predictions (Dense) (None, 1000) 4097000
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
Model: "resnet50"
Layer (type) Output Param # Connected to
Shape
input_2 (InputLayer) [(None, 0
224,
224, 3)
(continued)
376 9 Bases for Computer Programming
(continued)
conv1_pad (ZeroPadding2D) (None, 0 input_2[0][0]
230,
230, 3)
conv1_conv (Conv2D) (None, 9472 conv1_pad[0][0]
112,
112,
64)
conv1_bn (None, 256 conv1_conv[0][0]
(BatchNormalization) 112,
112,
64)
conv1_relu (Activation) (None, 0 conv1_bn[0][0]
112,
112,
64)
pool1_pad (ZeroPadding2D) (None, 0 conv1_relu[0][0]
114,
114,
64)
pool1_pool (MaxPooling2D) (None, 0 pool1_pad[0][0]
56, 56,
64)
conv2_block1_1_conv (None, 4160 pool1_pool[0][0]
(Conv2D) 56, 56,
64)
..(many lines are (None, 1050624
omitted)...... 7, 7,
conv5_block3_3_conv 2048)
(Conv2D)
conv5_block3_2_relu[0][0]
conv5_block3_3_bn (None, 8192
(BatchNormali 7, 7,
conv5_block3_3_conv[0][0] 2048)
conv5_block3_add (Add) (None, 0 conv5_block2_out[0][0]
conv5_block3_3_bn[0][0] 7, 7,
2048)
conv5_block3_out (None, 0 conv5_block3_add[0][0]
(Activation) 7, 7,
2048)
avg_pool (None, 0 conv5_block3_out[0][0]
(GlobalAveragePooling2 2048)
predictions (Dense) (None, 2049000 avg_pool[0][0]
1000)
Total params: 25,636,712
Trainable params: 25,583,592
Non-trainable params: 53,120
Here is a typical program for using image data prepared by the user as training data. The procedure for loading the training data is as follows:
(0) Prepare a file data.txt containing a list of image data and labels (teacher
data).
(1) Read image file names and labels (teacher data) from data.txt.
(2) Read image data using the filenames read in (1).
Below is a sample of data.txt.
1 img0001.jpg 5
2 img0002.jpg 2
3 img0003.jpg 7
...
1000 img1000.jpg 3
When using Keras, image data in various formats can be easily loaded with the
keras.preprocessing.image library.
For example, if the data.txt and all the image files listed in the file are in the
execution directory, the following program can read them. Here, the image size is
assumed to be 64 × 64. After loading image data, such operations as the conversion of
teacher data to one-hot encoding and the division of data into training and verification
data, which are explained in the previous subsubsection, should be performed.
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import load_img, img_to_array
#-------------------
num = 1000
ffname = 'data.txt'
xsize = 64
ysize = 64
#-------------------
ts = []
fname = []
with open(ffname, 'r') as f:
    for line in f:
        elements = line.split(' ')
        fname.append(elements[1])
        ts.append(elements[2].rstrip('\n'))
tsi = [int(s) for s in ts]
Y = np.array(tsi)
X = []
for i in range(num):
    img = img_to_array(load_img(fname[i], target_size=(xsize, ysize)))
    X.append(img)
X = np.array(X)  # gather the loaded images into a single NumPy array
References
1. Cottrell, J.A., Hughes, T.J.R., Bazilevs, Y.: Isogeometric Analysis. Wiley (2009)
2. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic
optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
3. Field, D.A.: Qualitative measures for initial meshes. Int. J. Numer. Methods Eng. 47, 887–906
(2000)
4. Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D.,
Wieser, E., Taylor, J., Berg, S., Smith, N.J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M.H.,
Brett, M., Haldane, A., del Río, J.F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard,
K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., Oliphant, T.E.: Array programming with
NumPy. Nature 585, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2.
5. He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition. 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016,
pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
6. Hughes, T.J.R., Cottrell, J.A., Bazilevs, Y.: Isogeometric Analysis: CAD, finite elements,
NURBS, exact geometry, and mesh refinement. Comput. Methods Appl. Mech. Eng. 194,
4135–4195 (2005)
7. Kernighan, B.W., Ritchie, D.M.: The C programming language (Second Edition). Prentice Hall
(1988)
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. in the 3rd International
Conference for Learning Representations (ICLR), San Diego, 2015, arXiv:1412.6980
9. Knupp, P.M.: A method for hexahedral mesh shape optimization. Int. J. Numer. Methods Eng.
58, 319–332 (2003)
10. Knupp, P.M.: Algebraic mesh quality metrics for unstructured initial meshes. Finite Elem.
Anal. Des. 39, 217–241 (2003)
11. McKinney, W.: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
(2nd edition). O’Reilly (2017)
12. Piegl, L., Tiller, W.: The NURBS Book 2nd ed. Springer (2000)
13. Plauger, P.J.: The standard C library. Prentice Hall (1992)
14. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C: The
Art of Scientific Computing (Second Edition). Cambridge University Press (1992). (http://num
erical.recipes)
15. Rogers, D.F.: An Introduction to NURBS with Historical Perspective. Academic Press (2001)
16. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A.,
Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet Large Scale Visual Recogni-
tion Challenge. Int. J. Comput. Vis. 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-
0816-y
17. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image
recognition. ICLR 2015, arXiv: 1409.1556, 2015.
18. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A
simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958
(2014)
19. Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running average of its
recent magnitude. COURSERA: Neural networks for machine learning 4(2), 26–31 (2012)
Chapter 10
Computer Programming
for a Representative Problem
The problem to be solved here is the same as the one in Sect. 4.8, i.e.,
Problem Estimate the optimal number of integration points for the elemental
integration of an element from its shape features.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 381
G. Yagawa and A. Oishi, Computational Mechanics with Deep Learning,
Lecture Notes on Numerical Methods in Engineering and Sciences,
https://doi.org/10.1007/978-3-031-11847-0_10
[Figure: overall flow of the solution. Training phase: preprocessing of input data, deep learning. Application phase: calculation of shape features, inference, postprocessing.]
First, a number of training patterns are generated for deep learning by the following procedure:
(1) Generate a large number of elements.
(2) Calculate some shape features of each element.
(3) Calculate the optimal number of integration points for each element.
In this manner, a large number of data pairs (shape features, optimal number of
integration points) are created.
/* elemgen.c */
#include <stdio.h>
#include <stdlib.h>
#define rnode drand48()*(cmax-cmin)+cmin
int main(void)
{
int i,j,k,nel;
long rseed;
double node0[8][3],node[8][3],cmin,cmax;
/*---Part 1--------*/
scanf("%le %le %d %ld",&cmin,&cmax,&nel,&rseed);
srand48(rseed);
node[0][0] = 0.0; node[0][1] = 0.0; node[0][2] = 0.0;
node[1][0] = 1.0; node[1][1] = 0.0; node[1][2] = 0.0;
node0[2][0] = 1.0; node0[2][1] = 1.0; node0[2][2] = 0.0;
node0[3][0] = 0.0; node0[3][1] = 1.0; node0[3][2] = 0.0;
node0[4][0] = 0.0; node0[4][1] = 0.0; node0[4][2] = 1.0;
node0[5][0] = 1.0; node0[5][1] = 0.0; node0[5][2] = 1.0;
node0[6][0] = 1.0; node0[6][1] = 1.0; node0[6][2] = 1.0;
node0[7][0] = 0.0; node0[7][1] = 1.0; node0[7][2] = 1.0;
/*---Part 2--------*/
printf("%d\n",nel);
for(i=0;i<nel;i++){
for(j=2;j<8;j++){
for(k=0;k<3;k++) node[j][k] = node0[j][k] + rnode ;
}
node[3][2] = 0.0;
for(j=0;j<8;j++){
printf("%d %d",i,j);
for(k=0;k<3;k++) printf(" %e",node[j][k]);
printf("\n");
}
}
return 0;
}
In Part 1 of elemgen.c, the nodal coordinates of the basic cubic element are set, and in Part 2, they are modified using random numbers and the results are output. This program is compiled by
$ cc -o elemgen.exe elemgen.c
and executed by
$ echo "-0.1 0.1 1000 12345" | ./elemgen.exe > elem_node.dat
In this case, the nodal coordinates of each element are stored in elem_node.dat.
Here, several shape features are calculated for each generated element. The selected features are
A The maximum and the minimum values of the lengths of edges
B The maximum and the minimum values of the angles between edges
C The maximum and the minimum values of the angles between faces
D AlgebraicShapeMetric.
The detail of each feature above is given in Sect. 9.1.2. Features A, B, and C are calculated using the program ElementShape.c (Sect. 9.1.2), and the feature D using ElementShapeMetric.c (Sect. 9.1.2). A sample program, elemshape.c, is shown as
/* elemshape.c */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "ElementShapeMetric.c" //Section 9.1.2
#include "ElementShape.c" //Section 9.1.2
int main(void)
{
int i,j,k,ia,ib,nel,nfpn=3,nnpe=8,elem[8]={0,1,2,3,4,5,6,7};
double eshape[7],**node;
scanf("%d",&nel);
printf("%d\n",nel);
node = (double **)malloc(nnpe*sizeof(double *));
for(i=0;i<nnpe;i++) node[i] = (double *)malloc(nfpn*sizeof(double));
for(i=0;i<nel;i++){
for(j=0;j<nnpe;j++){
scanf("%d %d",&ia,&ib);
for(k=0;k<nfpn;k++) scanf("%le",node[j]+k);
}
check_shape(eshape,elem,node,nfpn);
eshape[6] = shape_metric(elem,node,nnpe,nfpn);
printf("%d",i);
for(j=0;j<7;j++) printf(" %e",eshape[j]);
printf("\n");
}
return 0;
}
Then, the optimal number of integration points is determined by evaluating the convergence of the elemental integration for each generated element. The procedure is as follows:
(1) Read nodal coordinates of an element
(2) Calculate the element stiffness matrix esm0 with the number of integration
points per axis qmax being 30
(3) Set q = 2
(4) Calculate the element stiffness matrix esm with q integration points per axis
(5) Calculate and record the difference between esm0 and esm based on Error,
defined as Eq. (4.3.2) in Sect. 4.3
(6) Set q = q + 1
(7) If q = 30, go to (1) to evaluate the next element; if q < 30, go to (4).
To calculate an element stiffness matrix in (2) and (4), the function esm3D08() (Sect. 9.1.1) is used. A sample program for this process, elemconv.c, is shown as
/* elemconv.c */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "esm3D08.c" //Section 9.1.1
/*----------------------------------------*/
void gausslg(
double *weight,
double *pos,
int ngauss)
{
int i,j,k,ia,ib,ic;
double w,a,ww,v,p,q,r,d0,d1,d2,p1,p2,u,df;
ia = ngauss ;
ib = ngauss/2 ;
w = 1.0 * ngauss ;
a = 3.1415926535897932/(w+w) ;
ww = w*(w + 1.0)*0.5 ;
for(k=1;k<=ib;k++){
v = cos(a*(2*k-1)) ;
Loop1: ;
p = 1.0 ;
q=v;
for(j=2;j<=ngauss;j++){
r = ((2*j-1)*v*q - (j-1)*p)/j ;
p=q;
q=r;
}
u = (1.0 - v)*(1.0 + v) ;
d0 = (p - v*q)*w/u ;
d1 = (v*d0 - ww*q)/u ;
d2 = (q*d1/(d0*d0)+1.0)*q/d0 ;
v -= d2 ;
if(fabs(d2) >= 1.0e-16) goto Loop1 ;
df = d2*v/u ;
weight[k-1] = 2.0/(w*d0*(1.0 - df*w)*p*(1.0 - df*2.0)) ;
pos[k-1] = v ;
}
if(ib*2 < ngauss){
d0 = 1.0 ;
for(j=1;j<=ib;j++) d0 = (1.0 + 0.5/j)*d0 ;
weight[ib] = 2.0/(d0*d0) ;
pos[ib] = 0.0 ;
}
for(i=0;i<ib;i++){
weight[ngauss-1-i] = weight[i] ;
pos[ngauss-1-i] = pos[i] ;
pos[i] *= -1.0 ;
}
}
/*---------------------------------------------*/
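/* set_refdata(): returns the maximum entry of esm, used as the reference value for the Error measure */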
double set_refdata(
double **esm,
int edim)
{
int i,j ;
double d1 ;
d1 = esm[0][0] ;
for(i=0;i<edim;i++){
for(j=0;j<edim;j++){
if(esm[i][j] > d1) d1 = esm[i][j] ;
}
}
return d1 ;
}
/*---------------------------------------------*/
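/* check_esm(): Error measure (cf. Eq. (4.3.2)), the sum of |esm[i][j] - esm0[i][j]| divided by ref_value */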
double check_esm(
double **esm,
double **esm0,
double ref_value,
int edim)
{
int i,j,k ;
double sum ;
for(i=0,sum=0.0;i<edim;i++){
for(j=0;j<edim;j++) sum += fabs(esm[i][j] - esm0[i][j]) ;
}
return sum/ref_value ;
}
/*------------------------------------------------*/
int main()
{
int i,j,k,ia,ib,ig,nel,nfpn=3,elem[8]={0,1,2,3,4,5,6,7};
double **esm,**esm0,*gc,*gw,**node,mate[2]={2.0e11,0.3};
int max_ngp=30,nnpe=8,edim=24;
double ref_value,chk_data;
node = (double **)malloc(nnpe*sizeof(double *));
for(i=0;i<nnpe;i++) node[i] = (double *)malloc(nfpn*sizeof(double));
esm = (double **)malloc(edim*sizeof(double *));
for(i=0;i<edim;i++) esm[i] = (double *)malloc(edim*sizeof(double));
esm0 = (double **)malloc(edim*sizeof(double *));
for(i=0;i<edim;i++) esm0[i] = (double *)malloc(edim*sizeof(double));
gc = (double *)malloc(max_ngp*sizeof(double)) ;
gw = (double *)malloc(max_ngp*sizeof(double)) ;
scanf("%d",&nel);
printf("%d\n",nel);
for(i=0;i<nel;i++){
for(j=0;j<8;j++){
scanf("%d %d",&ia,&ib);
for(k=0;k<nfpn;k++) scanf("%le",node[j]+k);
}
ig = max_ngp ; /* reference matrix esm0 with 30 integration points per axis */
gausslg(gw,gc,ig) ;
esm3D08(elem,node,mate,esm0,ig,gc,gw,nfpn);
ref_value = set_refdata(esm0,edim) ;
for(ig=2;ig<max_ngp;ig++){
gausslg(gw,gc,ig) ;
esm3D08(elem,node,mate,esm,ig,gc,gw,nfpn);
chk_data = check_esm(esm,esm0,ref_value,edim) ;
printf("%d %d %e\n",i,ig,chk_data) ;
}
}
return 0;
}
This program is compiled and executed, for example, as
$ cc -o elemconv.exe elemconv.c -lm
$ ./elemconv.exe < elem_node.dat > elem_conv.dat
where elem_conv.dat (a file name chosen here for illustration) stores, for each
element, the value of Error for every number of integration points from 2 to 29.
These data are then processed by the following program, elemngp.c, which selects
for each element the smallest number of integration points whose Error falls below
the threshold th:
/* elemngp.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i,j,ia,ib,nel;
double er[30],th=1.0e-7;
scanf("%d",&nel);
printf("%d\n",nel);
for(i=0;i<nel;i++){
for(j=2;j<30;j++) scanf("%d %d %le",&ia,&ib,er+j);
/* the smallest number of integration points whose Error is below th */
for(j=2;j<30;j++) if(er[j]<th) break;
printf("%d %d\n",i,j);
}
return 0;
}
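This program is compiled and executed, for example, as
$ cc -o elemngp.exe elemngp.c
$ ./elemngp.exe < elem_conv.dat > elem_ngp.dat
with elem_conv.dat again being the illustrative name of the output of elemconv.exe.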
In this case, the results, i.e., the optimal numbers of integration points for each
element, are stored in elem_ngp.dat.
With the processes above, we have collected the following data for each of the nel elements:
Shape features of the element: elem_shape.dat
Optimal number of integration points of the element: elem_ngp.dat
10.3 Training Phase
Now, a feedforward neural network is trained using the shape features of the elements
collected in Sect. 10.2, elem_shape.dat, as the input data and the optimal
numbers of integration points of the elements, elem_ngp.dat, as the teacher
data, respectively. DLneuro.c, described in Sect. 9.2.1, is used to construct the
feedforward neural network for this problem.
As preprocessing of the training patterns for the neural network, the following data
conversion is performed. Since the sizes and ranges of the shape features differ
from parameter to parameter, we first transform all the parameters so that each of
them falls within the range [0.0, 1.0].
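That is, denoting the minimum and maximum of the j-th shape feature over all the
collected elements by smin[j] and smax[j], each value x is converted as
x' = (x - smin[j]) / (smax[j] - smin[j]),
which is exactly the transformation implemented in Parts 2 and 3 of shapeNN.c below.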
Next, for the teacher data (output data), the number of units in the output layer is
set equal to the number of categories, and one-hot encoding is used: only the unit
of the correct category outputs 1, while all the other units output 0. For example,
if the optimal numbers of integration points found in the data ranged from 2 to 6
(five categories), an element whose optimal number is 3 would get the teacher vector
(0, 1, 0, 0, 0). Thus, the procedure to create training patterns for the feedforward
neural network is written as follows:
(1) Conversion of the input data to [0.0, 1.0].
(2) Conversion of the teacher data to the one-hot encoding.
(3) Integration to training patterns (see Sect. 9.2.1 for the training pattern format
for DLneuro.c).
Here, a sample program for the 0–1 conversion, shapeNN.c, is given as follows:
/* shapeNN.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i,j,k,ia,ib,nel;
double **shape,smin[7],smax[7],swidth[7];
FILE *fp;
/*---Part 1-----------*/
fp = fopen("elem_shape.dat","r");
fscanf(fp, "%d",&nel);
shape = (double **)malloc(nel*sizeof(double *));
for(i=0;i<nel;i++) shape[i] = (double *)malloc(7*sizeof(double));
for(i=0;i<nel;i++){
fscanf(fp,"%d",&ia);
for(j=0;j<7;j++) fscanf(fp,"%le",shape[i]+j);
}
fclose(fp);
/*---Part 2------------*/
for(i=0;i<7;i++) smax[i] = -1.0e30;
for(i=0;i<7;i++) smin[i] = 1.0e30;
for(i=0;i<nel;i++){
for(j=0;j<7;j++){
if(shape[i][j] > smax[j]) smax[j] = shape[i][j] ;
if(shape[i][j] < smin[j]) smin[j] = shape[i][j] ;
}
}
for(j=0;j<7;j++) swidth[j] = smax[j] - smin[j] ;
/*---Part 3------------*/
for(i=0;i<nel;i++){
for(j=0;j<7;j++) shape[i][j] = (shape[i][j] - smin[j])/swidth[j] ;
}
/*---Part 4------------*/
printf("%d\n",nel);
for(i=0;i<nel;i++){
printf("%d",i);
for(j=0;j<7;j++) printf(" %e",shape[i][j]);
printf("\n");
}
for(j=0;j<7;j++) printf("%d %e %e\n",j,smin[j],smax[j]);
return 0;
}
In the code above, Part 1 is a data reading section, Part 2 calculates the maximum
and minimum values of each parameter, Part 3 converts each parameter to 0–1, and
Part 4 writes the converted data and the maximum and minimum values of each
parameter. Note that the maximum and minimum values of each parameter obtained
here are required later in the Application Phase.
This program is compiled as follows:
$ cc -o shapeNN.exe shapeNN.c
And it is executed as
$ ./shapeNN.exe > elem_shapeNN.dat
Next, a sample program that converts the optimal numbers of integration points to
one-hot teacher data, ngpNN.c, is given as follows:
/* ngpNN.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i,j,k,ia,ib,nel,*ngp,**t,gmin,gmax,gcat;
FILE *fp;
/*---Part 1-----------*/
fp = fopen("elem_ngp.dat","r");
fscanf(fp,"%d",&nel);
ngp = (int *)malloc(nel*sizeof(int));
for(i=0;i<nel;i++) fscanf(fp,"%d %d",&ia,ngp+i);
fclose(fp);
/*---Part 2------------*/
gmin = 1000; gmax = -1000 ;
for(i=0;i<nel;i++){
if(ngp[i] > gmax) gmax = ngp[i] ;
if(ngp[i] < gmin) gmin = ngp[i] ;
}
gcat = gmax - gmin + 1 ;
/*---Part 3------------*/
t = (int **)malloc(nel*sizeof(int *));
for(i=0;i<nel;i++) t[i] = (int *)malloc(gcat*sizeof(int));
for(i=0;i<nel;i++){
for(j=0;j<gcat;j++) t[i][j] = 0 ;
}
/* one-hot encoding: the unit for category (ngp[i]-gmin) outputs 1 */
for(i=0;i<nel;i++) t[i][ngp[i]-gmin] = 1 ;
/*---Part 4------------*/
printf("%d\n",nel);
for(i=0;i<nel;i++){
printf("%d",i);
for(j=0;j<gcat;j++) printf(" %d",t[i][j]);
printf("\n");
}
printf("%d %d\n",gmin,gmax);
return 0;
}
In the code above, Part 1 is the data loading part, Part 2 calculates the maximum
and minimum optimal number of integration points and determines the number of
categories (number of output units) gcat, Part 3 converts the data to teacher data
by one-hot encoding, and Part 4 writes the converted teacher data.
This program is compiled as follows:
$ cc -o ngpNN.exe ngpNN.c
And it is executed as
$ ./ngpNN.exe > elem_ngpNN.dat
Finally, a sample program that merges the converted input data and the teacher data
into training patterns, patternNN.c, is shown as follows:
/* patternNN.c */
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int i,j,k,ia,ib,nel,n_in,n_out;
float f1,f2,f3;
FILE *fp1,*fp2;
scanf("%d %d",&n_in,&n_out);
/*---Part 1-----------*/
fp1 = fopen("elem_shapeNN.dat","r");
fscanf(fp1,"%d",&nel);
fp2 = fopen("elem_ngpNN.dat","r");
fscanf(fp2,"%d",&nel);
/*---Part 2------------*/
for(i=0;i<nel;i++){
printf("%d",i);
fscanf(fp1,"%d",&ia);
for(j=0;j<n_in;j++){
fscanf(fp1,"%e",&f1);
printf(" %e",f1);
}
fscanf(fp2,"%d",&ia);
for(j=0;j<n_out;j++){
fscanf(fp2,"%d",&ib);
printf(" %d",ib);
}
printf("\n");
}
/*---Part 3------------*/
fclose(fp1);
fclose(fp2);
return 0;
}
where Part 1 opens both the input and teacher data files for reading, Part 2 reads and
writes a training pattern one by one, and Part 3 closes the files.
This program is compiled as follows:
$ cc -o patternNN.exe patternNN.c
And it is executed, for example, as
$ echo "7 5" | ./patternNN.exe > patternNN.dat
where 7 is the number of input shape features and 5 is an assumed value of the
number of output categories gcat determined by ngpNN.exe. In this case, the
complete training data are stored in patternNN.dat.
So far, the training patterns for the feedforward neural network have been prepared.
Now, let us start deep learning with DLneuro.c discussed in Sect. 9.2.1.
DLneuro.c is compiled as follows:
$ cc -O3 -o DLneuro.exe DLneuro.c -lm
DLneuro.exe is executed and its results are stored in result.dat as follows:
$ echo "1000 800 5 20 3 2 0 5000 100 patternNN.dat 12345 0.1 0.1 0.001" |
./DLneuro.exe > result.dat
In the above example, it is assumed that there are two hidden layers, the number
of units in each hidden layer is 20, and the number of training epochs is 5000. Among
the 1000 training patterns, 800 patterns are used for training and 200 patterns for
verification (training monitoring). (See Sect. 9.2.1 for details of DLneuro.c.)
Note that many trials are needed to find the conditions that give the best results
among various settings of meta-parameters, such as the number of hidden layers, the
number of units in each hidden layer, and the learning coefficients.
It is also known that the initial settings of the connection weights and biases affect
the results; therefore it is necessary to try multiple random sequences to initialize the
connection weights and biases.
It is concluded that the best feedforward neural network for a problem should
be determined by finding the best combination of network structure and training
conditions from the results of many calculations.
10.4 Application Phase
In the Application Phase, the optimal number of integration points of a new element
is estimated by forward propagation of its shape features through the network, using
the connection weights and biases of the neural network trained in the Training
Phase. The program DAneuro.c is shown as
/* DAneuro.c */
#include "nrutil.c"
#include <math.h>
#define FNAMELENGTH 100
#define NHU_V 1
#define NHU_C 0
#define Mom1 0.1
#define Mom2 0.1
#include "DAcommon.c"
#include "DLebp.c" //Section 9.2.1
/*----------------------------------------*/
int main(void)
{
int i,j,k,i1,j1,rseed,MaxPattern,nIU,nOU,*nHU,nHL,nHU0,nhflag;
float *zOU,**zIU,**zHU,***w,**bias,**zdHU,*zdOU;
char fname1[FNAMELENGTH],fname2[FNAMELENGTH];
FILE *fp;
/*------------------------------------*/
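/* input: MaxPattern nIU nHU0 nOU nHL nhflag data-file weight-file (cf. the echo example below) */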
scanf("%d %d %d %d %d %d %s %s",
&MaxPattern,&nIU,&nHU0,&nOU,&nHL,&nhflag,fname1,fname2);
/*----------------------------------*/
nHU = ivector(0,nHL+1);
if(nhflag == NHU_V){
for(i=1;i<=nHL;i++) scanf("%d",nHU+i);
}else{
for(i=1;i<=nHL;i++) nHU[i] = nHU0 ;
}
nHU[0] = nIU ;
nHU[nHL+1] = nOU ;
/*-----------------------------------------*/
zIU = matrix(0,MaxPattern-1,0,nIU-1) ;
zHU = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<nHL+2;i++) zHU[i] = vector(0,nHU[i]-1);
zdHU = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<nHL+2;i++) zdHU[i] = vector(0,nHU[i]-1);
zOU = vector(0,nOU-1) ;
zdOU = vector(0,nOU-1) ;
w = (float ***)malloc((nHL+1)*sizeof(float **));
for(i=0;i<=nHL;i++) w[i] = matrix(0,nHU[i+1]-1,0,nHU[i]-1) ;
bias = (float **)malloc((nHL+2)*sizeof(float *));
for(i=0;i<=nHL+1;i++) bias[i] = vector(0,nHU[i]-1) ;
/*------------------------------------*/
read_fileA(fname1,zIU,nIU,MaxPattern);
load_weight(fname2,w,bias,nIU,nHU,nOU,nHL);
/*----------------------------------*/
for(i=0;i<MaxPattern;i++){
propagation(i,zIU,zHU,zdHU,zOU,zdOU,w,bias,nIU,nHU,
nOU,nHL);
printf("%d",i);
for(j=0;j<nOU;j++) printf(" %e",zOU[j]);
printf("\n");
}
return 0 ;
}
The common routines used by DAneuro.c are collected in DAcommon.c, shown as
/* DAcommon.c */
/*--------------------------------------------*/
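/* a0f(), a1f(): sigmoid activation (1+tanh(x/2))/2 and its derivative f(1-f) */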
void a0f(
float *fv,
float *fvd,
float x)
{
float dd;
dd = (1.0f+(float)tanh(x/2.0f))/2.0f;
*fv = dd;
*fvd = dd*(1.0 - dd) ;
}
/*--------------------------------------------*/
void a1f(
float *fv,
float *fvd,
float x)
{
float dd;
dd = (1.0f+(float)tanh(x/2.0f))/2.0f;
*fv = dd;
*fvd = dd*(1.0 - dd) ;
}
/*-------------------------------------------*/
void read_fileA(
char *name,
float **o,
int nIU,
int npattern)
{
int i,j,k;
FILE *fp;
/* the body below is a reconstruction: the data file is assumed to hold a
   header line with the number of patterns, then one line per pattern (an
   index followed by nIU input values), as written by shapeNN.c */
fp = fopen(name,"r");
fscanf(fp,"%d",&k);
for(i=0;i<npattern;i++){
fscanf(fp,"%d",&k);
for(j=0;j<nIU;j++) fscanf(fp,"%e",o[i]+j);
}
fclose(fp);
}
/* load_weight() is presumably defined in the remainder of DAcommon.c (not
   shown here); propagation() is provided by DLebp.c (Sect. 9.2.1) */
DAneuro.c is compiled, for example, as
$ cc -O3 -o DAneuro.exe DAneuro.c -lm
DAneuro.exe is executed and its results are stored in rngp.dat as follows:
$ echo "10 7 80 5 3 0 NewElem.dat Weights.dat" | ./DAneuro.exe > rngp.dat
This example assumes that the neural network trained in the Training Phase has
three hidden layers and 80 units per hidden layer, that the connection weights and
biases are stored in Weights.dat, that there are ten new elements for which the
optimal number of integration points is to be estimated, and that the shape parameters
of the elements are stored in NewElem.dat.
Note that Weights.dat is a file that contains only the weights and biases from
the results of DLneuro.exe in the Training Phase.
The procedure for estimating the optimal number of integration points for a new
element in the Application Phase is summarized as follows:
(1) Calculate the shape features of the new element for which the optimal number
of integration points is to be estimated.
(2) If the shape features calculated in (1) are within the range of the maximum
and minimum values output by shapeNN.exe (Sect. 10.3), it is judged that
estimation is possible and we proceed to the next step.
(3) For the elements judged to be estimable in (2), convert the shape features
to the range of [0, 1] using the maximum and minimum values output by
shapeNN.exe (Sect. 10.3).
(4) The converted shape features calculated in (3) are input to DAneuro.exe to
estimate the optimal number of integration points.
DAneuro.exe in (4) above has already been detailed in this section. As for
(1), elemshape.c (Sect. 10.2.2) can be used to calculate the shape features. In
(2), if the calculated features of an element are within the range of the maximum
and minimum values output by shapeNN.exe (Sect. 10.3), the element can be
estimated by DAneuro.exe; otherwise, the element is not to be estimated.
In the postprocessing, the estimation of the optimal number of integration points
based on the output of DAneuro.exe is performed using a criterion such as
"the category with the largest output value corresponds to the optimal number of
integration points."
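As an illustration of this criterion, a minimal postprocessing sketch is given below
as a hypothetical program, postngp.c. It assumes that each line of rngp.dat
holds an element index followed by the nOU output values written by DAneuro.exe
above, and that the number of elements, the number of output units, and the value of
gmin output by ngpNN.exe are supplied on standard input; the name postngp.c
and this input convention are chosen here for illustration only.
/* postngp.c : argmax postprocessing of DAneuro.exe outputs (a sketch) */
#include <stdio.h>
int main(void)
{
int i,j,ia,nel,nOU,gmin,jbest;
double z,zbest;
FILE *fp;
scanf("%d %d %d",&nel,&nOU,&gmin); /* e.g., echo "10 5 2" */
fp = fopen("rngp.dat","r"); /* output of DAneuro.exe */
for(i=0;i<nel;i++){
fscanf(fp,"%d",&ia);
jbest = 0 ; zbest = -1.0 ;
for(j=0;j<nOU;j++){
fscanf(fp,"%le",&z);
if(z > zbest){ zbest = z ; jbest = j ; } /* unit with the largest output */
}
printf("%d %d\n",ia,jbest+gmin); /* category index -> number of integration points */
}
fclose(fp);
return 0;
}
It could be executed, for example, as
$ echo "10 5 2" | ./postngp.exe
where 2 is an assumed value of gmin.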
References
1. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C: The Art
of Scientific Computing, 2nd edn. Cambridge University Press (1992). http://numerical.recipes
2. Watanabe, T., Ohuchi, A.: Gauss-Legendre quadrature formula of very high order. J. Plasma
Fusion Res. (Kakuyuugoukenkyuu) 64(5), 397–407 (1990). https://doi.org/10.1585/jspf1958.64.397 (in Japanese)