Faculty of Transportation and Vehicle
Engineering
NUMERICAL METHODS
XY Kiadó
BME Közlekedésmérnöki és Járműmérnöki Kar
© Dr. Bicsák György, Sziroczák Dávid PhD CEng, Aaron Latty CEng, 2018
ISBN XXX-XXX-XXX-XXX-X
Contents
Contents............................................................................................................... 2
1 Introduction ................................................................................................. 5
2 Errors and Approximations ......................................................................... 7
2.1 Errors and Their Sources: .................................................................... 7
2.1.1 Binary and Hexadecimal Numbers .............................................. 9
2.1.2 Floating Point and Roundoff Errors ..................... 10
2.1.3 Absolute and Relative Errors ...................................................... 11
2.1.4 Error Limits .................................................................................. 13
2.1.5 Error Propagation........................................................................ 13
2.1.6 Subtracting Two Near-Zero Numbers ......................14
2.2 Iterative Methods .............................................................................. 16
2.2.1 Basic Iterative Method ............................................................... 18
2.2.2 Convergence Rate ...................................................................... 18
3 Single Variable Equations, System of Equations ..................................... 20
3.1 Solution of Single Variable Equations (24) ........................................ 21
3.1.1 Bisection method ....................................................................... 22
3.1.2 Newton-Raphson Method ......................................................... 24
3.1.3 Secant method ........................................................................... 27
3.1.4 Regula Falsi Method................................................................... 30
3.1.5 Fixed-point method.................................................................... 33
3.1.6 Finding multiple roots ................................................................ 34
3.2 Solution of System Equations ........................................................... 40
3.2.1 Gauss elimination method ..........................................................41
3.2.2 LU factorization methods .......................................................... 44
3.2.3 Iterative solutions ...................................................................... 45
3.2.4 Systems of nonlinear equations ................................................. 51
4 Curve Fitting Methods............................................................................... 56
4.1 Regression ......................................................................................... 56
4.1.1 Linear Regression ....................................................................... 58
4.1.2 Linearization of Non-linear Input Data...................................... 62
4.1.3 Polynomial Regression ............................................................... 64
4.2 Finite difference ................................................................................ 70
4.2.1 Factorial Polynomials ................................................................. 75
4.2.2 Anti-differentiation .................................................................... 76
4.3 Interpolation ...................................................................................... 79
4.3.1 Lagrange Interpolation .............................................................. 80
4.3.2 Newton Divided-Difference Interpolation ................................ 84
4.3.3 Spline Interpolation [3] .............................................................. 92
5 Numerical Derivation................................................................................107
5.1 2-Point Backward Difference .......................................................... 108
5.2 2-Point Forward Difference............................................................. 108
5.3 2-Point Central Difference ............................................................... 109
5.4 3-Point Backward Difference ......................................... 110
5.5 Numerical Difference Formulas ........................................................ 111
6 Numerical Integration .............................................................................. 116
6.1 Newton-Cotes Formulas .................................................................. 118
6.1.1 Rectangular Rule ....................................................................... 118
6.1.2 Trapezoidal Rule ........................................................................ 123
6.2 Quadrature Formulas ....................................................................... 125
6.2.1 1/3 Simpson Rule ........................................................................ 127
6.2.2 3/8 Simpson Rule .......................................................................129
6.3 Romberg Integration .......................................................................134
6.4 Gaussian Quadrature ........................................................................ 137
6.5 Improper Integrals ...........................................................................139
6.6 Numerical Integration of Bivariate Functions ................................ 140
7 Numerical Solution of Differential Equations – Initial Value Problems 144
7.1 One-Step Methods ...........................................................................145
7.1.1 Euler’s Method ..........................................................................145
7.1.2 Second Order Taylor Method .................................................. 148
7.2 Runge-Kutta Methods ......................................................150
7.1.3 Second Order Runge-Kutta Methods....................................... 151
7.1.4 Built in Functions in MATLAB.................................................... 153
8 Partial Differential Equations ................................................................... 157
8.1 Partial Differential Equations generally .......................................... 157
8.2 Elliptic Partial Differential Equations .............................................. 158
8.2.2 Neumann Problem ....................................................................167
8.2.3 Mixed problem ......................................................................... 169
8.2.4 Elliptic Partial Differential Equations in More Complex Regions
170
8.3 Parabolic Partial Differential Equations .......................................... 173
8.3.1 Finite-Difference Method ......................................................... 174
8.3.2 Crank-Nicolson Method ............................................................ 177
8.4 Hyperbolic Partial Differential Equations ....................................... 180
References ....................................................................................................... 185
1 Introduction
Although a wide range of mathematical knowledge is included in the
university curriculum, its practical use is often not immediately clear, or
only becomes clear during later stages of study. Furthermore, in
engineering practice there are many analytical solutions that can be very
complicated, and the effort involved in following them through might not be
reflected in the final outcome, as the level of accuracy or sophistication
they provide might not be required to solve the practical problem at hand.
Instead of looking for a "correct", mathematically perfectly accurate solution,
many times we can settle for a solution that, although not 100% accurate,
satisfies our goals and expectations. In this group of problems we also need
to include problems which cannot be reasonably defined in mathematical
terms, but which we can approximate very well using numerical methods such
as finite element, finite difference or finite volume methods. This is the
philosophy behind the development and usage of the various numerical
methods, and that is the topic of this set of lecture notes.
The aim of developing and using numerical methods instead of relying on
complicated and expensive analytical solution methods is to approximate the
accurate solution we desire using simpler, arithmetical equations, with the
required level of accuracy. Many numerical methods, when understood
properly, might seem very basic, primitive even, but it is essential that
we understand the underlying logic; that is, why we can use a given method to
solve our problems. Complicated mathematics can obscure the underlying
solution method of a problem, hence most examples in these notes are
kept simple, primarily to illustrate the method.
Naturally, even numerical methods don't make all of our problems disappear;
the more accurately we want to approximate the theoretical solution, the
more computation we need. This can be prohibitively expensive for very
large simulations, for example. It is worth noting that running simulations
is usually still a lot cheaper and more convenient, and in some sense provides
us with more knowledge about the problem, than setting up tests. In order to
make the understanding of more complicated engineering simulation tools (FEA,
CFD, etc.) easier in further studies, we will investigate the errors involved
in numerical procedures: what their causes are, what their effects on the
solution are, and how they propagate. The rest of the chapters are divided
into the following main topics:
Solution of equations and systems of equations
Approximation and interpolation
Numerical differentiation and integration
Numerical solution of differential equations
Numerical solution of partial differential equations
2 Errors and Approximations
Numerical methods are procedures in which a mathematical problem is not
necessarily solved in an exact way; rather, the solution is converged to
within a desired tolerance of the accurate solution in a finite number of
successive approximations. Although simpler calculations can be performed
even with a calculator, to use most numerical methods a computer is required.
Even so, the speed and cost of the solution is still better than trying to
find an analytical way.
Numerical methods themselves are usually described as strongly
mathematical algorithms: a clear set of instructions describing the
operations and functions and their precise order. When applied to some
given starting conditions (the description of the problem), the methods
will arrive at the desired solution (with the desired accuracy).
This is where the term "acceptable" comes into play. Since numerical
methods only approximate the exact solution, we need to define an
acceptance criterion for the goodness of the approximation. Also note
that the approximate nature also applies to the input data, as relying on
data with excessive noise in it can have a great effect on the accuracy of
the algorithm. This will be described in the following chapter in more
detail. The second problem to consider is that since we are only
approximating the solution itself, we need to be aware of the order of
errors in the solution process, and what effect they can have on the
results. To understand this, let us first investigate the various types of
error, their sources and the error propagation.
2. Measurement errors
Measurement errors were discussed in previous subjects.
When measuring, it is inevitable that errors and noise are
introduced into the measuring process and thus into the
solution. For this reason it is often said: "one measurement is
no measurement".
3. Expression/function errors
Expression errors are introduced to make the handling of
mathematical equations possible, usually by neglecting small
terms. For example, in the case of a Taylor or Fourier series we
would only use the first few terms of the series, or we would
calculate the volumetric thermal expansion coefficient from
the linear one.
4. Discretisation errors
The source of this error is that continuous functions, space,
etc. are mapped to a discrete representation. For example,
continuous functions are represented with discrete values, a
function might be differentiated numerically, and an integral
represented as a sum of products.
5. Roundoff and digital representation errors
Storing data in a computer inevitably distorts the numerical
values due to the digital representation of the numbers; for
example, numbers can be represented as integers or as single or
double precision floating point numbers, each of which has its
own precision and limitations. Roundoff errors are introduced,
for example, when irrational numbers such as π are represented:
it is impossible to store their infinite number of digits, so
the numerical representation must be truncated at some suitably
large number of digits.
6. Evaluation error
A converged mathematical model may produce solutions that seem
accurate (taking all the other mentioned sources of error into
account), yet due to the definition of boundary conditions,
model components and various other effects, the results are
still not accurate; the model still doesn't match reality.
This error, like the previous ones, might or might not be
acceptable; the main issue is that we might not be able to
perceive it without rigorous comparison to tests, and thus it
can introduce major problems when accurate results are desired.
The steps of model creation and the sources of errors between them are
shown in Figure 1.
Figure 1: Model creation and the error sources between the steps [1]
642 = [1·2^9 + 0·2^8 + 1·2^7 + 0·2^6 + 0·2^5 + 0·2^4 + 0·2^3 + 0·2^2 + 1·2^1 + 0·2^0]
So the binary form is: 642 = [1010000010]2
Also commonly used in programming is the hexadecimal (base-16) system.
Using similar logic:
642 = [2·16^2 + 8·16^1 + 2·16^0]16 = [282]16
Since we have run out of Arabic numerals to represent the 16 symbols, in
addition to the digits 0..9, the letters A..F will also appear, for example:
30 = [1·16^1 + 14·16^0]16 = [1E]16
What is the point of using the hexadecimal system, one might ask? It is
essentially a compaction of the binary system: since 2^4 = 16, for
every quartet of binary digits we can allocate one hexadecimal
symbol, for example: D = [1101]2 or 9 = [1001]2.
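These conversions are easy to verify by machine. The listings in these notes use MATLAB, but the following short Python sketch is included purely as a compact check of the expansions above (the variable names are arbitrary):

```python
# Check the positional expansions above with Python's built-in conversions.
n = 642
binary = format(n, 'b')        # digits of 642 in base 2
hexa = format(n, 'X')          # digits of 642 in base 16
print(binary, hexa)            # 1010000010 282

# Rebuild 642 from its binary digits: sum of d_i * 2^i
rebuilt = sum(int(d) * 2**i for i, d in enumerate(reversed(binary)))
print(rebuilt)                 # 642

# One hexadecimal symbol per quartet of binary digits:
print(format(30, 'X'))         # 1E
print(format(0b1101, 'X'))     # D
```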
A floating point number is generally stored in the form:
number = (−1)^S · M · p^k (2.1.)
where S is the sign, M is the mantissa (or significand), p is the base and
k is the exponent (together p^k is the characteristic, or power term). It
is also required that 1/p ≤ M < 1.
Different architectures can use different floating point representations.
The Institute of Electrical and Electronics Engineers (IEEE) defined the
IEEE 754 standard, which has been in use for the past 30 years. Under this
standard, a 32 bit (single precision) number is represented as: [2]
number = (−1)^S (1.M)(2^(k−127)) (2.2.)
The sign has 1 bit length (0 for positive, 1 for negative). It is followed
by the 8-bit exponent (the characteristic), which defines the place of the
binary separator in the number. The number is most of the time stored in a
normalised form. When the exponent field is k bits long, the bias is
e = 2^(k−1) − 1; in the case of single precision, e = 127.
The next field is the mantissa, a 23-bit value stored in normalized form.
Its first digit is always 1, so it is not stored in the format, only the
fractional part. The represented number can therefore be calculated as:
number = (−1)^S (1 + M)(2^(k−127)) (2.3.)
where S is the sign bit, M is the stored fractional part of the mantissa,
k is the characteristic and e is the bias. The number of bits reserved for
the mantissa defines the accuracy of the number representation, while the
size of the characteristic sets the magnitude of the representable
numbers. [2]
Example
Which binary floating point number is displayed using 32 bits in the following
format:
1 10000110 10100000000000000000000 ?
Solution:
Following the previous logic:
S = 1
k = [10000110]2 = 134
M = [0.101]2 = 0.625
so:
x = (−1)^1 · (1 + 0.625) · 2^(134−127) = −1.625 · 128 = −208
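The decoding above can be cross-checked against the machine's own single precision arithmetic. The listings in these notes use MATLAB; the following Python sketch is given only as a compact check, reinterpreting the same 32-bit pattern with the standard `struct` module and then decoding it by hand following (2.3.):

```python
import struct

# The 32-bit pattern: sign | exponent | mantissa fields from the example
bits = 0b1_10000110_10100000000000000000000
# Reinterpret the 4 bytes as an IEEE 754 single precision float
value = struct.unpack('>f', bits.to_bytes(4, 'big'))[0]
print(value)   # -208.0

# Decode by hand, following (2.3.)
S = bits >> 31                     # sign bit
k = (bits >> 23) & 0xFF            # characteristic: 134
M = (bits & 0x7FFFFF) / 2**23      # fractional mantissa: 0.625
print((-1)**S * (1 + M) * 2**(k - 127))   # -208.0
```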
If the approximate value of a quantity is denoted as x̃ and the accurate
value as x, then the two types of errors are defined as:
Absolute error:
δa = x − x̃ (2.4.)
Relative error:
δr = absolute error / real value = δa/x = (x − x̃)/x, x ≠ 0 (2.5.)
When the real value is 0, the relative error is not defined. Most of the time
relative errors have higher significance than absolute errors, so in most cases
we evaluate only relative errors.
Example
Let's assume two cases: first we measure electric current, where
x̃1 = 0.004 A, while the real value is x1 = 0.005 A. In the second case, we
measure electric potential (voltage), where the measured value is
x̃2 = 1315 V, while the real value is x2 = 1331 V. What are the absolute and
relative errors in the two measurements?
Solution:
Absolute errors:
δa1 = x1 − x̃1 = 0.005 A − 0.004 A = 0.001 A
δa2 = x2 − x̃2 = 1331 V − 1315 V = 16 V
Based on this, the 0.001 A sounds a lot lower than the 16 V, so is that the
better measurement? Let's investigate the two in terms of relative errors!
δr1 = (x1 − x̃1)/x1 = (0.005 A − 0.004 A)/0.005 A = 0.2
δr2 = (x2 − x̃2)/x2 = (1331 V − 1315 V)/1331 V = 0.012
We can see that for the current, the 0.001 A absolute error corresponds to
20% relative error, while when measuring voltage, the 16 V absolute error
results in only 1.2% relative error. So, it is not just the difference
that matters, but how large the difference is in the grand scheme of things.
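The same numbers can be reproduced in a couple of lines. The listings in these notes use MATLAB; here a Python sketch is used only as a compact check of the two definitions:

```python
# Absolute and relative errors for the two measurements above
x1, x1_meas = 0.005, 0.004     # current [A]: real and measured values
x2, x2_meas = 1331.0, 1315.0   # voltage [V]: real and measured values

da1 = x1 - x1_meas             # absolute errors, (2.4.)
da2 = x2 - x2_meas
dr1 = da1 / x1                 # relative errors, (2.5.)
dr2 = da2 / x2

print(da1, dr1)   # ≈ 0.001 A and ≈ 0.2, i.e. 20% relative error
print(da2, dr2)   # 16 V and ≈ 0.012, i.e. ≈ 1.2% relative error
```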
2.1.4 Error Limits
When looking at a single measurement error, the absolute error provides
us with some information. However, when we are looking at multiple
measurements (and we always should be!), it is beneficial to average the
measurements and look at the absolute value of the absolute error. The next
question is: what is the highest level of absolute error we can expect; what
are our error limits? While the bottom limit of the absolute error is clearly
0, the upper limit can be anything, even infinity. Due to this, we can define
an upper limit to the acceptable absolute error as:
|δa| = |x − x̃| ≤ α (2.6.)
Keep in mind that α is not an estimate of the error from the measurements,
rather an acceptance limit! A similar limit can be defined for the acceptable
relative error as:
|δr| = |x − x̃| / |x| ≤ β (2.7.)
The upper absolute error limit in the case of addition and subtraction is
the sum of the individual limits:
|δa| = |(x1 ± x2) − (x̃1 ± x̃2)| ≤ α1 + α2 (2.9.)
|δr| = |(x1·x2 − x̃1·x̃2)/(x1·x2)| = |(x1·x2 − [x1 − δa1][x2 − δa2])/(x1·x2)| =
= |(−δa1·δa2 + δa1·x2 + δa2·x1)/(x1·x2)| (2.11.)
In this last equation, we can use the approximation that −δa1·δa2 ≈ 0, as
this product is magnitudes smaller than the other terms. Using this
simplification we arrive at:
|δr| = |(δa1·x2 + δa2·x1)/(x1·x2)| = |δa1/x1 + δa2/x2| ≤
≤ |δa1/x1| + |δa2/x2| ≤ β1 + β2 (2.12.)
Which proves our point.
Similarly, without proof this time, the relative error propagation of division is:
|δr| = |(x1/x2 − x̃1/x̃2)/(x1/x2)| ≤ β1 + β2 (2.13.)
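A quick numeric spot check of the product and quotient bounds can be done in a few lines. The listings in these notes use MATLAB; the Python sketch below uses made-up values, chosen only to illustrate that the combined relative error indeed stays within β1 + β2:

```python
# Spot check of (2.12.) and (2.13.): the relative error of a product or a
# quotient is bounded by the sum of the individual relative error limits.
x1, x2 = 12.5, 0.84            # "true" values (made up for this check)
x1t, x2t = 12.4, 0.85          # approximate values

b1 = abs(x1 - x1t) / abs(x1)   # individual relative errors
b2 = abs(x2 - x2t) / abs(x2)

dr_mul = abs(x1*x2 - x1t*x2t) / abs(x1*x2)
dr_div = abs(x1/x2 - x1t/x2t) / abs(x1/x2)
print(dr_mul <= b1 + b2, dr_div <= b1 + b2)   # True True
```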
Example
Let's assume the following second order equation: x^2 + 50x + 4 = 0, with
roots at x1 = −0.080128411 and x2 = −49.91987159.
Solution:
It is known that second order equations have an explicit solution expression:
x1,2 = (−b ± √(b^2 − 4ac)) / (2a)
In this case, b^2 ≫ 4ac, so the term under the root: √(b^2 − 4ac) ≅ b. So when
calculating x1 we are subtracting nearly equal numbers. In the case of a 4 digit
roundoff, the root is calculated as:
FL(x1) = (−50.00 + √(50.00^2 − 4(1.000)(4.000))) / (2(1.000)) = (−50.00 + 49.88)/2.000 = −0.060
Likewise the second root:
FL(x2) = (−50.00 − √(50.00^2 − 4(1.000)(4.000))) / (2(1.000)) = (−50.00 − 49.88)/2.000 = −49.94
These values show the following relative errors:
|δr1| = |x1 − FL(x1)| / |x1| ≅ 0.2503 = 25.03%
|δr2| = |x2 − FL(x2)| / |x2| ≅ 0.0004 = 0.04%
So x1 carries a much higher relative error than x2, where two nearly equal
numbers are added rather than subtracted, so there is less of a problem. The
main issue though is the fact that the same equation for the same problem can
give 0.04% or 25% relative errors, even for a simple problem like this.
Let's try to approach the problem in a different way. In addition to the
explicit solution, we also know that an equation written as
ax^2 + bx + c = 0 satisfies x1·x2 = c/a. So from one root we can calculate
the other as:
FL(x1) = c / (a·FL(x2)) = 4.000 / ((1.000)(−49.94)) = −0.080
In which case the relative error is:
|δr1| = |x1 − FL(x1)| / |x1| ≅ 0.0016 = 0.16%
So we have achieved a drastic increase in accuracy (by 24.87 percentage
points) with a very simple (but clever!) change compared to the original formula.
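The cancellation effect is easy to reproduce. The listings in these notes use MATLAB; the Python sketch below models limited-precision arithmetic by simply rounding every intermediate result to 3 significant digits, so the intermediate values differ slightly from the 4-digit worked example above, but the effect, and the rescue via the product-of-roots trick, is the same:

```python
from math import sqrt

def fl(v, sig=3):
    # Round v to `sig` significant digits: a crude model of
    # limited-precision arithmetic (simplified vs. the example above)
    return float(f'{v:.{sig}g}')

a, b, c = 1.0, 50.0, 4.0
x1_exact = -0.080128411

root = fl(sqrt(fl(b*b) - fl(4*a*c)))   # nearly equal to b
x1_bad = fl((-b + root) / (2*a))       # subtracting nearly equal numbers
x2 = fl((-b - root) / (2*a))           # the well-conditioned root
x1_good = fl(c / (a * x2))             # product of roots: x1*x2 = c/a

rel_bad = abs(x1_exact - x1_bad) / abs(x1_exact)
rel_good = abs(x1_exact - x1_good) / abs(x1_exact)
print(rel_bad, rel_good)   # large relative error vs. a far smaller one
```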
a part of something bigger, in which case a poorly written algorithm with no
constraint on its iterations can crash a whole system!
Example: [3]
With 7 digits of precision: e^(−2) = 0.1353353. We limit the maximum
number of iterations at 20; if convergence is not achieved within that many
steps, the program stops with a failure message. If the difference between
the exact value and the approximate value is within the prescribed tolerance,
the program also stops; in this case the outputs are n and the function
approximation. If we try the algorithm in MATLAB or Excel VBA, we will see
that 14 iteration steps are enough to achieve convergence, so n = 13. In this
case the approximate value is e^(−2) ≅ 0.1353351, which corresponds to an
absolute error of δa = 0.2·10^(−6).
How the program runs is affected by the character of the function f(x)
and the initial value x0.
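The iteration above amounts to summing the Taylor series of e^x at x = −2 term by term. The listings in these notes use MATLAB; the Python sketch below uses the stopping rule "approximation within 10^−6 of the exact value" (the original program's rule may differ in detail) and lands on n = 13, in agreement with the example:

```python
from math import exp

# Sum the Taylor series of e^x at x = -2 term by term until the partial
# sum is within tolerance of the exact value.
x, tol = -2.0, 1e-6
exact = exp(x)

term, total, n = 1.0, 1.0, 0   # n = 0 term is 1
while abs(exact - total) >= tol:
    n += 1
    term *= x / n              # next term: (-2)^n / n!
    total += term

print(n, total, abs(exact - total))   # converges at n = 13
```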
Then we say that the iterative algorithm's rate of convergence is R. There
are two basic types of convergence: linear and quadratic. Convergence is
linear when R = 1, so:
lim(n→∞) |e_(n+1)| / |e_n| = K ≠ 0 (2.16.)
Convergence is quadratic when R = 2:
lim(n→∞) |e_(n+1)| / |e_n|^2 = K ≠ 0 (2.17.)
The convergence rate is not always an integer number; for example, the
secant method's convergence rate is (1 + √5)/2 ≅ 1.618. [3]
Example: [3]
Calculate the convergence rate of the following problem:
x_(n+1) = (1/2)·x_n, n = 0, 1, 2, …, x0 = 1
Solution:
Since x_n = (1/2)^n → 0 as n → ∞, the limit is obviously x = 0. In this
case, the absolute error in the nth step:
δa,n = x − x_n = 0 − (1/2)^n = −(1/2)^n
First let's investigate what happens if R = 1. In this case, using (2.15.):
lim(n→∞) |e_(n+1)| / |e_n|^R = lim(n→∞) |−(1/2)^(n+1)| / |−(1/2)^n| = 1/2 ≠ 0
So R = 1 satisfies the algorithm, and it converges linearly. Since R
satisfies the criterion set in (2.15.), there is no need to investigate
other values.
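The error ratios can also be checked numerically. The listings in these notes use MATLAB; the Python sketch below simply tabulates |e_(n+1)|/|e_n| for the iteration above and shows that every ratio equals 1/2, so K = 1/2 ≠ 0:

```python
# Error ratios for x_(n+1) = x_n / 2 with x0 = 1 and limit x = 0:
# linear convergence with K = 1/2.
x = 1.0
errors = []
for n in range(10):
    errors.append(abs(0.0 - x))   # e_n = x - x_n with x = 0
    x = x / 2

ratios = [errors[i+1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)   # every ratio is exactly 0.5 (powers of 2 are exact floats)
```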
3 Single Variable Equations, System of Equations
Solving single variable equations is perhaps the most common numerical
activity. Of interest is the value of a single variable, or the values of
multiple variables, that satisfy the desired equality (or equalities). If
the equation can be rearranged and brought to a closed form, then the
problem becomes trivial. However, not every variable can be expressed
conveniently, or it might not be possible at all. A typical example is the
equation governing flow in a rocket nozzle:
A/A* = ((γ+1)/2)^(−(γ+1)/(2(γ−1))) · (1 + ((γ−1)/2)·M^2)^((γ+1)/(2(γ−1))) / M (3.1.)
While the correct A value can be calculated if the Mach number M is known
across the nozzle, it is often not what we want to know, but the exact
opposite: if I have a nozzle with a given geometry A, what Mach numbers
would I get? In this case, expressing M from the above equation as a
function of A is not directly possible (or if it is, the solution is not
trivial at all!). However, in these cases we can rely on numerical methods
to approximate a value of M that is a good enough solution to satisfy the
problem stated above.
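As a foretaste of the root finding methods discussed below, the inversion can be sketched in a few lines. The listings in these notes use MATLAB; the Python sketch here uses illustrative values γ = 1.4 and A/A* = 2 (not taken from the text) and a plain bisection on the supersonic branch, where the area ratio grows monotonically with M:

```python
# Invert the area-Mach relation (3.1.) numerically: for gamma = 1.4 and a
# target A/A* = 2 (illustrative values), find the supersonic Mach number.
def area_ratio(M, g=1.4):
    e = (g + 1) / (2 * (g - 1))
    return ((g + 1) / 2)**(-e) * (1 + (g - 1) / 2 * M**2)**e / M

target = 2.0
lo, hi = 1.0, 5.0                 # bracket on the supersonic branch
for _ in range(60):               # bisection: halve the bracket each step
    mid = (lo + hi) / 2
    if area_ratio(mid) < target:
        lo = mid
    else:
        hi = mid
M = (lo + hi) / 2
print(M)   # roughly M ≈ 2.2 for this area ratio
```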
In all cases it is important to consider the algorithm from a practical point
of view. Solving for the roots of equations is normally part of a larger set
of methods or a program, but even if standalone, it is desirable that the
algorithm definitely comes to some conclusion. For this reason it is good
practice to define an iteration and/or time limit for the method, especially
if the continuity of the function cannot be guaranteed. As can be seen in
Figure 2, while the function satisfies the criterion that the function values
at the lower and upper bounds have different signs, there is no actual
solution to this equation in the shown interval. If unhandled, a computer
could go into an infinite loop, trying to find the non-existent solution.
In the following chapter, the solution of single variable equations will be
investigated, with some of the basic root finding methods demonstrated, and
briefly discussed. As the second part of the chapter, the solution of systems
of equations will be investigated, and the methodology discussed.
computer. So, at the very least, the accuracy of any numerical procedure
performed by the computer is limited by the machine's precision. Today most
good numerical software uses double precision variables, which represent
numbers encoded with 64 bits of memory and are accurate up to 15-17 decimal
digits. This precision is more than enough for all but the most demanding
applications; even so, calculating so many digits is a complete waste of time
and resources for most applications. For example, if we are calculating the
tons of steel to be purchased, why go down to the microgram level, when the
foundry would be able to measure only in kg at best?
Based on this, to find roots efficiently, it is always important to set the
required tolerance for the procedure. For most applications 3 or 4 significant
digits are enough, so a relative tolerance of 1E-3 or 1E-4 is acceptable. This
obviously requires some a priori understanding of the problem at hand, but
adaptivity could also be included in the root finding algorithm, so that the
tolerance is adapted to the problem. In this chapter most examples
will use 1E-4 absolute tolerance for convergence.
The method is shown graphically in Figure 3. The orange coloured bars
represent the calculation interval, which is halved at every subsequent
iteration. It can also be seen that a fairly high required precision can
result in many additional iterations, which might or might not be justified,
depending on the application.
Example
Solution:
%Script C2BisectionMethod.m
%Script to use the bisection method to solve single
%variable equations

%Limits of interval: a - lower; b - upper
a = 0.0;
b = 10.0;
%Tolerance
eps = 1e-3;
%Maximum limit on iterations
iMax = 1e12;
%Counter of current iteration number
iCounter = 1;

%Midpoint of the current interval
xMid = (a+b)/2;
%Halve the interval until the midpoint function value
%is within tolerance or the iteration limit is reached
while abs(f_at_x(xMid)) > eps && iCounter < iMax
    %Keep the half interval containing the sign change
    if f_at_x(a)*f_at_x(xMid) < 0
        b = xMid;
    else
        a = xMid;
    end
    xMid = (a+b)/2;
    iCounter = iCounter + 1;
end

%Function to be evaluated (in a script, local
%functions must come after the executable statements)
function fValue = f_at_x(x)
fValue = 0.8*x^3-4.0*x^2-12.0*x-21.0;
end
function, so there are implications on convergence depending on whether the
function is continuous and differentiable in the region near the root, on the
starting point of the method, and on the overall behaviour of the function.
The convergence of the method, near the solution at least, is quadratic,
which is considered quite efficient.
The Taylor series representation of the function f(x) near a given point x_i
is defined as:
f(x) = f(x_i) + f′(x_i)(x − x_i) + (f″(x_i)/2!)(x − x_i)^2 + … (3.3.)
The method implements the following procedure:
1. If in the i-th (latest) iteration the function value is within the
prescribed tolerance, so |f(x_i)| < ε, we have converged to the solution.
2. If not, then the next point where the function will be evaluated needs
to be calculated. If we neglect all terms beyond first order in the Taylor
series and look for the zero crossing of the tangent line, we can write
f′(x_i) = f(x_i)/Δx, where Δx = x_i − x_(i+1). Using this, the next point to
evaluate the function at is estimated as: x_(i+1) = x_i − f(x_i)/f′(x_i). So
the last term in the equation represents a small increment from the current
point x_i towards the next point x_(i+1), along the current slope of the
curve at x_i.
3. Repeat from step 1.
Note that the classic Newton-Raphson method does not unconditionally
guarantee a solution. For example, starting from a local extremum or
inflection point of a function, where f′(x_i) = 0, the division
f(x_i)/f′(x_i) cannot be evaluated. Also, the method can enter into infinite
patterns or divergent behaviour. Of critical importance in most cases, and
especially for complicated functions, is the starting point of the iterations.
It can be noted that more than the first derivative can be included in the
algorithm, which would speed up the method's convergence; however, the
limitations would still be applicable.
Example
central difference method. In this example a root of the following function is
required:
f(x) = 0.8x^3 − 4.0x^2 − 12.0x − 21.0
Solution:
%Script C2NewtonRaphson.m
%Script to use the Newton-Raphson method to solve
%single variable equations

%Initial point:
x0 = 10.0;
%Step size used to calculate local differential
xStep = 1e-5;
%Tolerance
eps = 1e-3;
%Maximum limit on iterations
iMax = 1e12;
%Counter of current iteration number
iCounter = 1;
%Initial point
xi = x0;
%Initial evaluation of the function
fxi = f_at_x(xi);

while abs(fxi) > eps && iCounter < iMax
    %Approximate the local derivative with a central difference
    diffX = (f_at_x(xi+xStep) - f_at_x(xi-xStep))/(2*xStep);
    %Work out the better approximation of the root
    xi = xi - fxi/diffX;
    %Update the function value and iteration counter
    fxi = f_at_x(xi);
    iCounter = iCounter+1;
end

%Function to be evaluated (in a script, local
%functions must come after the executable statements)
function fValue = f_at_x(x)
fValue = 0.8*x^3-4.0*x^2-12.0*x-21.0;
end
Figure 4: Convergence of the Newton-Raphson method (F(x) function value plotted against x)
The Secant method can be thought of as a variant of the Newton method. In
this algorithm, two initial estimates of the root are taken, and one of them
is then improved by moving along a secant line of the curve. The procedure is
iterated until convergence is found.
There are a number of benefits of using the Secant method over the Newton
method. The most important is that the method does not require the
computation of the local derivative; depending on the type of function used,
differentiating it might be computationally expensive, or might even be
impossible. Accordingly, the Secant method searches along the secant line,
as opposed to the tangent line used by the Newton-Raphson method.
Provided good starting points are selected, the Secant method is generally
more stable than the Newton-Raphson method, and has a lower chance of
diverging or showing wildly oscillating behaviour. Although more iterations
are generally needed than with Newton-Raphson, each iteration can be
significantly cheaper because the derivative is not evaluated.
The following algorithm describes the Secant method:
1. Assume two initial estimates for the root: the points x_(−1) and x_0.
2. If in the i-th (latest) iteration the function value is within the
prescribed tolerance, so |f(x_i)| < ε, we have converged to the solution.
3. If not, then calculate a better estimate of the root, using the
following formula:
x_(i+1) = x_i − f(x_i) / [(f(x_(i−1)) − f(x_i)) / (x_(i−1) − x_i)] (3.4.)
4. Repeat from step 2.
Example
The following MATLAB script implements the Secant method on the familiar
polynomial function:
f(x) = 0.8x^3 − 4.0x^2 − 12.0x − 21.0
Solution:
%Script C2Secant.m
%Script to use the Secant method to solve single
%variable equations

%Initial points:
xm1 = 10.0;
x0 = 9.5;
%Tolerance
eps = 1e-3;
%Maximum limit on iterations
iMax = 1e12;
%Counter of current iteration number
iCounter = 1;
%Initial points
xim1 = xm1;
xi = x0;
%Initial evaluations of the function
fxim1 = f_at_x(xim1);
fxi = f_at_x(xi);

while abs(fxi) > eps && iCounter < iMax
    %Improve the root estimate along the secant line, (3.4.)
    xip1 = xi - fxi*(xim1 - xi)/(fxim1 - fxi);
    %Shift the points and update the function values
    xim1 = xi;
    fxim1 = fxi;
    xi = xip1;
    fxi = f_at_x(xi);
    iCounter = iCounter + 1;
end

%Function to be evaluated (in a script, local
%functions must come after the executable statements)
function fValue = f_at_x(x)
fValue = 0.8*x^3-4.0*x^2-12.0*x-21.0;
end
Figure 5 shows the convergence of the Secant method from the starting point
pair of 10 and 9.5. It can be seen that in this case 5 iterations of the
method provided the root with the requested accuracy. For comparison, a less
fortunate pair, such as -4.5 and -5, would require 111 iterations.
Figure 5: Convergence of the Secant method (F(x) function value plotted against x)
Regula Falsi, or False Position, methods are a family of bracketing root
finding methods that can be thought of as an advanced version of the
Bisection method. In the case of the Bisection method the interval is always
halved, and thus very little information is actually used from the function's
behaviour. The Regula Falsi method relies on the calculation of the secant
line of the function, and improves the root estimate based on the zero
crossing of this secant line. This additional information results in more
educated guessing than in the bisection method, which increases the
convergence rate of the algorithm, although at the expense of additional
calculations. Similar to the bisection method, it requires an [a, b] interval
over which the function changes sign for it to work.
The Regula Falsi method can be described using the following algorithm:
1. Evaluate the function at the ends of the [a, b] interval. The two end points are named 𝑥₀ = 𝑎 and 𝑥₁ = 𝑏. The function values are 𝑓(𝑥₀) and 𝑓(𝑥₁).
2. The secant line between 𝑥ᵢ and 𝑥ᵢ₊₁ and their function values is evaluated, and its zero crossing point, 𝑥ᵢ₊₂, is calculated as:
𝑥ᵢ₊₂ = 𝑥ᵢ − 𝑓(𝑥ᵢ)(𝑥ᵢ₊₁ − 𝑥ᵢ) / (𝑓(𝑥ᵢ₊₁) − 𝑓(𝑥ᵢ)) = (𝑥ᵢ𝑓(𝑥ᵢ₊₁) − 𝑥ᵢ₊₁𝑓(𝑥ᵢ)) / (𝑓(𝑥ᵢ₊₁) − 𝑓(𝑥ᵢ)) (3.5.)
3. The new point 𝑥ᵢ₊₂ replaces whichever end point has a function value of the same sign as 𝑓(𝑥ᵢ₊₂), so the root stays bracketed.
4. Repeat from step 2 until the function value at the new point is within the prescribed tolerance.
Example
Solution:
%Script C2RegulaFalsi.m
%Script to use the Regula Falsi method to solve
%single variable equations

%Limits of interval:
%a: lower; b: upper
a = -5.0;
b = 10.0;
%Tolerance
eps = 1e-3;
%Maximum limit on iterations
iMax = 1e12;
%Counter of current iteration number
iCounter = 1;
%Function values at the interval ends
fa = f_at_x(a); fb = f_at_x(b);
%Iterate according to equation (3.5.)
xNew = a - fa*(b - a)/(fb - fa);
while abs(f_at_x(xNew)) >= eps && iCounter < iMax
    %Keep the sub-interval that still brackets the root
    if sign(f_at_x(xNew)) == sign(fa)
        a = xNew; fa = f_at_x(a);
    else
        b = xNew; fb = f_at_x(b);
    end
    xNew = a - fa*(b - a)/(fb - fa);
    iCounter = iCounter + 1;
end
%Root found:
xNew

%Function to be evaluated
function fValue = f_at_x(x)
fValue = 0.8*x^3-4.0*x^2-12.0*x-21.0;
end
Figure 6: Convergence of the Regula Falsi method (F(x) and the Regula Falsi iterates plotted against x)
𝑓(𝑥) = 𝑥 − 𝑔(𝑥) (3.7.)
has a zero at 𝑝. Thus, finding the root of 𝑓(𝑥) is substituted with iteratively
finding a fixed point of 𝑔(𝑥). Interestingly, the Newton method is a form of
Fixed-point root finding method, where 𝑔(𝑥) includes the first derivative of
𝑓(𝑥).
The method has a strict criterion for stability: the absolute value of the derivative of the 𝑔(𝑥) function needs to be under 1 for all 𝑥 values considered during the iteration, or as expressed in equations: |𝑔′(𝑥)| < 1. Depending on the
the sign and value of the 𝑔′(𝑥), a “spiral” or “staircase” like convergence (or
divergence) can be seen.
The method can be described with the following algorithm.
1. Rewrite the original 𝑓(𝑥) = 0 problem into 𝑥 = 𝑔(𝑥) in a way, where
a fixed point of 𝑔(𝑥) is also a solution of 𝑓(𝑥) = 0.
2. Assume 𝑥0 as the initial best estimation of the root of 𝑓(𝑥) (and the
fixed point of 𝑔(𝑥)).
3. Generate a better approximation of the fixed point: 𝑥𝑛+1 = 𝑔(𝑥𝑛 )
4. Iterate until desired convergence of the fixed point is achieved or
terminate if divergent behaviour is identified.
5. The acquired fixed point of 𝑔(𝑥) is the desired root of the original 𝑓(𝑥) function.
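The steps above can be sketched compactly in code. Below is a minimal illustration (in Python for brevity, although the chapter's examples are in MATLAB), assuming the rewritten form g(x) = cos(x), so the root being sought is that of f(x) = x − cos(x), and |g′(x)| = |sin x| < 1 near the fixed point:

```python
import math

def fixed_point(g, x0, eps=1e-10, i_max=1000):
    """Iterate x_{n+1} = g(x_n) until successive iterates agree within eps."""
    x = x0
    for _ in range(i_max):
        x_new = g(x)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within i_max iterations")

# Fixed point of g(x) = cos(x), i.e. the root of f(x) = x - cos(x)
p = fixed_point(math.cos, 1.0)
```

Since |sin x| stays below 1 around the fixed point, the iterates show the "staircase/spiral" contraction described above and the loop terminates well within the iteration limit.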
It can be seen that in addition to the initial estimation, the choice of 𝑔(𝑥) also
plays a significant role in whether the solution converges or not, and also in
the speed of convergence if it does. The choice, however, is not obvious, especially as information about 𝑓(𝑥) might not even be available. Generally
Fixed-point methods are simpler to investigate than the possible numerical
methods that are based on them, so by studying the behaviour of the Fixed-
point method, it is possible to improve the performance of other root finding
methods, or develop new ones.
So far it has been assumed, that the problem in question has a solution, and
only a single solution, thus one root. When considering real problems, this is
not necessarily true, a function can have any number, even infinite number of
roots, such as the 𝑓(𝑥) = sin(1⁄𝑥) type functions. Furthermore, a given root might not be just a simple root, but can have a multiplicity of 2, 3 or more: in the case of 𝑓(𝑥) = (𝑥 − 1)³, for example, there is a single root, +1, but with a multiplicity of 3.
Finding a multiple root numerically is more difficult than a simple one with methods such as the Newton or Secant methods. The effect of the multiplicity of the root is to reduce the convergence rate of the method. Generally, the Newton method will show a linear convergence rate near a root of multiplicity m as: 𝜆 = (𝑚 − 1)⁄𝑚, where 𝑚 ≥ 1. It can be seen, that for m > 2 (provided it is feasible to use) the Bisection method provides faster convergence than the Newton.
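This slowdown is easy to demonstrate numerically. The sketch below (Python, illustrative) applies the Newton step to f(x) = (x − 1)³, a root of multiplicity m = 3, and checks the ratio of successive errors against (m − 1)/m = 2/3:

```python
def newton_step(x):
    # Newton update for f(x) = (x - 1)**3, f'(x) = 3*(x - 1)**2
    f = (x - 1.0) ** 3
    df = 3.0 * (x - 1.0) ** 2
    return x - f / df

x = 2.0
errors = []
for _ in range(5):
    x = newton_step(x)
    errors.append(abs(x - 1.0))
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
# each ratio is close to 2/3: linear, not quadratic, convergence
```

Since the update simplifies to x − (x − 1)/3, each step shrinks the error by exactly the predicted factor 2/3.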
The numerical methods demonstrated here all rely on some form of iteration
and start with the assumption that the initial estimate is somewhere “near”
the desired root. Following this logic, in order to find multiple roots, their
approximate locations need to be identified and the iterations started “near”
the roots. A common way of identifying the root locations is to define a
starting point on the x axis, and moving along the positive or negative
direction. When the sign of the function changes (as long as the function is
continuous) it’s an indication that there is a root nearby. Once these sign
change regions are identified, an appropriate method can be chosen to
converge on the root with the desired accuracy. Note that with some of the tools, such as the Newton-Raphson, there is a chance of going wildly off the desired track, and exploring regions other than the desired root region. This can be an issue: although the approximate root location was identified, the actual root might be found somewhere else. The Bisection or the Regula Falsi
methods don’t suffer from this issue, as they have already bracketed the root,
and would stay within the bracket throughout the solving process.
Although the search algorithm sounds simple, a fully automatic method without a priori information on the functions and the roots would be difficult to create. A mathematical algorithm relying on incrementing the variable would have no information on the expected scale of the problem: is the user searching on the nano (10⁻⁹) or the giga (10⁺⁹) scale? Choosing the wrong scale would either result in steps that are too small and a very slow (to the limit of being unacceptable) algorithm, or in steps so large that roots would be skipped. In these examples we can have a “feel” for the size of the problem; however, if the problem behaves as a black box, and/or is too expensive to evaluate enough times to acquire information, the multiple root finding algorithm needs further refining to provide the required performance.
Example
The following example shows how a version of the multiple root finding
algorithm can be implemented on an example polynomial function. The
function is plotted in Figure 7. It can be seen that the function has 6 roots (counting multiplicity):
𝑥 = −4.0
𝑥 = −2.0
𝑥 = +1.4 (with a multiplicity of m = 2)
𝑥 = +2.1
𝑥 = +3.5
Solution:
The iteration will be started from 𝑥 = 0, and will look for roots on either side
of the starting position. Since we were able to graph the given function, and
have information about the roots and their rough position, we can use that
information to set step sizes and similar for an efficient algorithm. To zoom in
on the individual roots, we will use the Regula Falsi method as demonstrated
before.
The MATLAB implementation of the script is shown below:
%Script C2FindMultipleRoots.m
%Script to find multiple roots of a given equation
%(the search settings below are assumed values; the
%original listing's settings fell on an omitted page)

%Define starting position:
x0 = 0.0;
%Step size of the search and tolerance of the refinement
xStep = 0.25;
rTol = 1e-3;
%Maximum limit on search iterations
iMax = 100;
%Storage for the found roots and the iterations they
%were found in
vRoots = [];
itRootFound = [];
nRootsFound = 0;
%Left and right search positions
lPos = x0;
rPos = x0;
iCounter = 1;
while iCounter < iMax
    %Left side search direction
    if(sign(multiRootFun(lPos))~=sign(multiRootFun(lPos-xStep)))
        %Signs are not equal, so possible root position!
        vRoots = [vRoots;C2RegulaFalsiFun(lPos-xStep,lPos,rTol)];
        itRootFound = [itRootFound;iCounter];
        nRootsFound = nRootsFound+1;
    end
    %Increment the left position
    lPos = lPos - xStep;
    %Repeat for the right side search direction
    if(sign(multiRootFun(rPos))~=sign(multiRootFun(rPos+xStep)))
        %Signs are not equal, so possible root position!
        vRoots = [vRoots;C2RegulaFalsiFun(rPos,rPos+xStep,rTol)];
        itRootFound = [itRootFound;iCounter];
        nRootsFound = nRootsFound+1;
    end
    %Increment the right position
    rPos = rPos + xStep;
    %Increment counter and go to the next iteration step
    iCounter = iCounter+1;
end
%Write out results and the iteration values
vRoots
itRootFound
When this script is executed on the example function shown before, it will not
find all the roots, and will terminate due to reaching the iteration count limit.
Why is this happening? Investigating the function closer, it can be seen that in
the vicinity of the double root, the function doesn’t actually change signs. A
zoomed in view of the function is shown in Figure 8.
Due to the fact that the function doesn’t change sign near the root, the multi root finding algorithm wouldn’t be able to recognize it as a place to search, unless it stumbles upon the exact numerical value of the root (where the signum is zero, so different to before), which is quite unlikely. Furthermore, an appropriate root finding method needs to be found, as for example both the Bisection and Regula Falsi methods require the interval ends to have different signs in order to work. Arguably, if the location was identified as a potential position for a root, the Newton-Raphson iteration would very quickly converge to the root within the required numerical precision.
function, there is no guarantee, that these multiple root finding algorithms
would return all, or for that matter any of the desired roots.
2𝑥 − 5𝑧 = −4 (3.9.)
−2𝑥 + 3𝑦 + 2𝑧 = 5 (3.10.)
Which can be rewritten and expanded in the following form:
3𝑥 + 2𝑦 − 4𝑧 = +3
2𝑥 + 0𝑦 − 5𝑧 = −4 (3.11.)
−2𝑥 + 3𝑦 + 2𝑧 = +5
From here the equations can be rewritten as a matrix-vector multiplication:
[ 3  2 −4] [𝑥]   [ 3]
[ 2  0 −5] [𝑦] = [−4] (3.12.)
[−2  3  2] [𝑧]   [ 5]
By generalizing the equation, defining the 𝐴̿ coefficient matrix, the 𝑥̅ solution
vector and 𝑏̅ known coefficients, we can write:
𝐴̿𝑥̅ = 𝑏̅ (3.13.)
Which can be rearranged by multiplying with the inverse of 𝐴̿ from the left:
𝐴̿⁻¹𝐴̿𝑥̅ = 𝐼̿𝑥̅ = 𝑥̅ = 𝐴̿⁻¹𝑏̅ (3.14.)
Which is one of the most common problems in engineering or scientific calculations. So what we are looking for, in order to solve linear systems of equations, are methods to generate the inverse of the coefficient matrix, 𝐴̿⁻¹, or at least an acceptable approximation of it. The following methods will show different approaches to generating this matrix.
Note, that large scale matrix inversion is a problem on its own, which is of significant interest to the scientific community. Many engineering software packages, such as Finite Element Method and some Computational Fluid Dynamics codes, image processing and compression algorithms, and many more rely on some form of matrix inversion. Most of the basic methods shown here
become prohibitively expensive as the size of the coefficient matrix grows, as
the methods contain unnecessary and repeated steps. Nevertheless, they
illustrate the basic ideas of solving the linear systems of equations.
Also note, that many software packages provide some form of built-in matrix inversion algorithm. In MATLAB, the Y = inv(X) command generates the inverse of a square matrix X. (Note that the documentation suggests that the 𝑥 = 𝐴\𝑏
command or the linsolve method is used for solving linear systems of
equations instead!) In Microsoft Excel, you can use the MINVERSE function
on an array of cells to generate the inverse, and MMULT to multiply the
matrices and vectors together.
The Gauss elimination method is the most visual of all the methods, and for
small systems of equations it is a method that can be performed by hand if
required. It is also called the row-reduction method, as it operates directly on
the rows of the coefficient matrix.
There are two similar methods, the Gaussian and Gauss-Jordan elimination.
They both rely on 3 types of elementary row operations that are used to alter
the system of equations, but the outcome of the procedure is different: the
Gaussian elimination stops when an upper triangle matrix is produced out of
the 𝐴̿ coefficient matrix (or non-reduced row echelon form) while the Gauss-
Jordan procedure continues until the completely reduced row echelon form
is reached. The reduced form in the case of a consistent system of equations
is the diagonal unity matrix. Computationally it is often desirable not to
perform all the row operations, as algorithmically the gains of having the
reduced format are overshadowed by the additional computational expense.
In order to solve the 𝐴̿𝑥̅ = 𝑏̅ type equations, the system is written in the
following augmented matrix format:
[𝐴̿|𝑏̅] (3.15.)
Which in the case of our example system is:
[ 3  2 −4 |  3]
[ 2  0 −5 | −4] (3.16.)
[−2  3  2 |  5]
As the systems of equations are linear, we know that any scalar multiple or
addition/subtraction of the equations will also be valid linear equations.
Furthermore, as the order of the equations is arbitrary, there are three
operations, that can be used by the Gaussian elimination, which are the
following:
1. Swap any two rows in the matrix
2. Multiply any row with a scalar, non-zero number
3. Add a multiple of one row to any other row
Example
In the case of the example matrix (3.16.), the following steps can be taken to
solve:
1. Divide row 1 by 3:
[ 1  0.666 −1.333 |  1]
[ 2  0     −5     | −4]
[−2  3      2     |  5]
2. Add −2 times row 1 to row 2, and add 2 times row 1 to row 3:
[1  0.666 −1.333 |  1]
[0 −1.333 −2.333 | −6]
[0  4.333 −0.666 |  7]
3. Divide row 2 by −1.333:
[1  0.666 −1.333 |  1]
[0  1      1.75  |  4.5]
[0  4.333 −0.666 |  7]
4. Add −0.666 times row 2 to row 1, and add −4.333 times row 2 to row 3:
[1  0 −2.5  | −2]
[0  1  1.75 |  4.5]
[0  0 −8.25 | −12.5]
5. Divide row 3 by −8.25:
[1  0 −2.5  | −2]
[0  1  1.75 |  4.5]
[0  0  1    |  1.5152]
6. Add 2.5 times row 3 to row 1 and add −1.75 times row 3 to row 2:
[1  0  0 |  1.7879]
[0  1  0 |  1.8485]
[0  0  1 |  1.5152]
In this example, the solution is of the Gauss-Jordan form, and the solution
represents:
𝑥 = 1.7879
𝑦 = 1.8485
𝑧 = 1.5152
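As a quick sanity check, this solution can be substituted back into the original system; a short Python sketch (not part of the original MATLAB listings; the small residuals come from the rounded values):

```python
A = [[3.0, 2.0, -4.0],
     [2.0, 0.0, -5.0],
     [-2.0, 3.0, 2.0]]
b = [3.0, -4.0, 5.0]
x = [1.7879, 1.8485, 1.5152]  # rounded solution from the elimination

# residual of each equation: (A*x)_i - b_i
residuals = [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
# all three residuals are close to zero
```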
The fact that there is a consistent solution shows that the original system of equations was linearly independent.
In MATLAB, the following simple implementation shows the Gauss-Jordan elimination. Note that important features, such as checking for division by 0 and checking for consistency, are omitted, but these should be an important part of any real algorithm.
% Script C2_System_GaussElimination
% Script to show the Gauss-Jordan elimination

% Coefficient matrix and right hand side vector
A = [3 2 -4;2 0 -5;-2 3 2];
b = [3;-4;5];
% Build the augmented matrix
M = [A b];
% Iterate through the rows of the augmented matrix
for i=1:length(A)
    % Find the correct row and divide by the pivot element
    % in the diagonal
    M(i,:) = M(i,:)/M(i,i);
    %Iterate through number of rows:
    for j=1:length(A)
        %Only subtract if it's not the pivot row
        if i ~= j
            M(j,:) = M(j,:) - M(j,i)/M(i,i)*M(i,:);
        end
    end
end
% Print M to screen
M
For this reason, it is common to use a version of the LU factorization, where a
𝑃̿ permutation matrix is included to swap the rows around:
𝐴̿ = 𝑃̿𝐿̿𝑈̿ (3.19.)
This factorization exists if, and only if, 𝐴̿ is non-singular. Because of this property, and its efficiency, the LU factorization with row pivoting is the most common way systems of equations are solved by a computer. The LU factorization requires on the order of 2⁄3 𝑛³ operations to solve (where n is the row count of 𝐴̿). The additional computational expense of breaking down
the coefficient matrix into the two triangular matrices pays back during the
later part of the solution. Compare this to the Gauss elimination, where each
element in every row is going to be operated on multiple times during the
procedure.
Once the factorization is complete, the solution can be found as:
𝑥̅ = 𝐴̿⁻¹𝑏̅ = (𝑃̿𝐿̿𝑈̿)⁻¹𝑏̅ = 𝑈̿⁻¹𝐿̿⁻¹𝑃̿⁻¹𝑏̅ = 𝑈̿⁻¹𝐿̿⁻¹𝑃̿ᵀ𝑏̅ (3.20.)
This is a very efficient procedure, since the permutation matrix has little cost,
and the inverse of a triangular matrix is also a triangular matrix, so the forward
and back substitution methods can be used to very efficiently solve the
system.
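The forward and back substitution steps mentioned above can be sketched as follows (Python, illustrative; the triangular factors and the small test system are assumed values, and the permutation is omitted):

```python
def forward_sub(L, b):
    # Solve L*y = b for a lower triangular L with nonzero diagonal
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y

def back_sub(U, y):
    # Solve U*x = y for an upper triangular U with nonzero diagonal
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Assumed example factors of A = L*U = [[2, 4], [1, 5]]
L = [[2.0, 0.0], [1.0, 1.0]]
U = [[1.0, 2.0], [0.0, 3.0]]
b = [4.0, 5.0]
x = back_sub(U, forward_sub(L, b))  # solves A*x = b
```

Each substitution touches every matrix element once, so solving with ready-made factors costs only on the order of n² operations.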
This section deals with a set of methods where the solution of the linear system of equations is arrived at not by a deterministic, but by an iterative method. The benefit of an iterative method is its low computational cost per iteration, on the order of 𝑛², which for high n values is considerably smaller than, for example, the 2⁄3 𝑛³ operations for an LU factorization, which is one of the most efficient direct methods.
The general format of the iterative methods are the following:
𝐴̿𝑥̅ = [𝑄̿ − 𝑃̿]𝑥̅ = 𝑏̅ (3.21.)
Thus the original 𝐴̿ coefficient matrix is split into 𝑄̿ and 𝑃̿ matrix components.
With these the initial equation can be expressed as:
𝑄̿ 𝑥̅ = 𝑏̅ + 𝑃̿ 𝑥̅ (3.22.)
From here using a similar concept as for finding fixed points of a single
equation, an initial 𝑥̅ 0 estimation of the solution is assumed and then iterated
as:
𝑄̿ 𝑥̅ 𝑘+1 = 𝑏̅ + 𝑃̿𝑥̅ 𝑘 𝑘 = 0,1,2, … (3.23.)
And if 𝑄̿ is invertible (non-singular) the next approximation of the solution can
be expressed as:
𝑥̅ 𝑘+1 = 𝑄̿ −1 𝑏̅ + 𝑄̿ −1 𝑃̿𝑥̅ 𝑘 𝑘 = 0,1,2, … (3.24.)
As can be seen so far, no restrictions have been placed on the different matrices, except that 𝑄̿⁻¹ must exist. In theory any decomposition of the coefficient matrix works, however unless special types of matrices are used, the inversion of 𝑄̿ can be as costly as just directly solving the original equation, making the iterative procedure quite inefficient and thus pointless. The examples provided in this section will show specific, easily invertible matrix decompositions which enable efficient solving of the equations using the iterative method.
3.2.3.1 Vector and matrix norms
For the iterative solutions, the convergence of the solutions needs to be
tracked, as unlike the case of the direct methods, it is not known how many
iterations it will take to find a solution.
One way to monitor the convergence is to assess how close (or similar) two
vectors are to each other. This can be done using vector norms.
Consider a vector:
𝑏̅ = {𝑏1 , 𝑏2 , … , 𝑏𝑛 } (3.25.)
There are 3 different norms that are defined over the vector:
The 𝑙₁ norm, ‖𝑏̅‖₁, is the sum of the absolute values of all vector components:
‖𝑏̅‖₁ = ∑ᵢ₌₁ⁿ |𝑏ᵢ| (3.26.)
The 𝑙∞ norm, ‖𝑏̅‖∞, is the largest absolute value among the vector components:
‖𝑏̅‖∞ = max₁≤ᵢ≤ₙ |𝑏ᵢ| (3.27.)
The 𝑙₂ norm or Euclidean norm, ‖𝑏̅‖₂, is the square root of the sum of squares of the vector components:
‖𝑏̅‖₂ = (∑ᵢ₌₁ⁿ 𝑏ᵢ²)^(1/2) (3.28.)
The vector norms have the following properties:
‖𝑏̅‖ ≥ 0 and 0 is possible if and only if vector is the null vector
‖𝛼𝑏̅‖ = |𝛼|‖𝑏̅‖ where 𝛼 is a scalar multiplier
‖𝑏̅ + 𝑐̅‖ ≤ ‖𝑏̅‖ + ‖𝑐̅‖ for any two vectors
Three norms of a matrix can be defined in a similar manner. The 1 norm of the matrix is the equivalent of the column-wise maximum of the individual vector operations, the infinite norm is the row-wise maximum, and the 2 norm is calculated on all elements of the n x n square matrix 𝐴̿ = [𝑎ᵢⱼ]:
The 𝑙₁ norm, ‖𝐴̿‖₁, is the largest of the sums of absolute values of all vector components in the matrix columns:
‖𝐴̿‖₁ = max₁≤ⱼ≤ₙ ∑ᵢ₌₁ⁿ |𝑎ᵢⱼ| (3.29.)
The 𝑙∞ norm, ‖𝐴̿‖∞, is the largest of the sums of absolute values of all vector components in the matrix rows:
‖𝐴̿‖∞ = max₁≤ᵢ≤ₙ ∑ⱼ₌₁ⁿ |𝑎ᵢⱼ| (3.30.)
The 𝑙₂ norm or Euclidean norm, ‖𝐴̿‖₂, is the square root of the sum of squares of all matrix elements:
‖𝐴̿‖₂ = (∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ 𝑎ᵢⱼ²)^(1/2) (3.31.)
The matrix norms have also similar properties to the vector norm calculation:
‖𝐴̿‖ ≥ 0 and 0 is possible if and only if matrix is the null matrix
‖𝛼𝐴̿‖ = |𝛼|‖𝐴̿‖ where 𝛼 is a scalar multiplier
‖𝐴̿ + 𝐵̿‖ ≤ ‖𝐴̿‖ + ‖𝐵̿‖ for any two n x n matrices
‖𝐴̿𝐵̿‖ ≤ ‖𝐴̿‖‖𝐵̿‖ for any two n x n matrices
MATLAB and similar software have built in functions to calculate the various
norms of vectors and matrices. In MATLAB, the norm(v,p) method can be
used to calculate the various norms, where p can be given 1, 2 or Inf,
representing the 3 types of norms shown here.
In this section these norms will be used to track the convergence of the
iterative solutions and stop them when sufficient convergence is achieved.
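Written out directly from the definitions, the three vector norms are one-liners; a plain Python sketch for illustration (MATLAB's norm(v,p) computes the same values):

```python
def norm_1(v):
    # sum of the absolute values of the components
    return sum(abs(c) for c in v)

def norm_inf(v):
    # largest absolute component
    return max(abs(c) for c in v)

def norm_2(v):
    # Euclidean norm: square root of the sum of squares
    return sum(c * c for c in v) ** 0.5

b = [3.0, -4.0]
# norm_1(b) = 7.0, norm_inf(b) = 4.0, norm_2(b) = 5.0
```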
3.2.3.2 The Jacobi method
The Jacobi method first decomposes the 𝐴̿ coefficient matrix into diagonal, lower-triangular and upper-triangular matrices 𝐷̿, 𝐿̿ and 𝑈̿ in the following manner:
𝐷̿ = [𝑎₁₁  0   0 ]
     [ 0  𝑎₂₂  0 ] (3.32.)
     [ 0   0  𝑎ₙₙ]
𝐿̿ = [ 0   0   0]
     [𝑎₂₁  0   0] (3.33.)
     [𝑎ₙ₁ 𝑎ₙ₂  0]
𝑈̿ = [0  𝑎₁₂ 𝑎₁ₙ]
     [0   0  𝑎₂ₙ] (3.34.)
     [0   0   0 ]
This way, the 𝑄̿ and 𝑃̿ matrices can be expressed from 𝐴̿ = 𝐷̿ + 𝐿̿ + 𝑈̿, where the matrices used for the iteration are: 𝑄̿ = 𝐷̿ and 𝑃̿ = −[𝐿̿ + 𝑈̿], which gives the specific form of the Jacobi iteration equations:
𝑥̅ᵏ⁺¹ = 𝐷̿⁻¹𝑏̅ + 𝐷̿⁻¹(−(𝐿̿ + 𝑈̿))𝑥̅ᵏ,  𝑘 = 0,1,2,… (3.35.)
Example
The following MATLAB script implements the Jacobi iteration. This particular implementation defines the norm as the distance from the true solution, as opposed to the incremental change of the solution vector through the iterations. For a large problem, such as in the case of a very large CFD model, the direct calculation of the solution can be very expensive. Note that the convergence of the iteration strongly depends on the structure of the coefficient matrix, but this is not discussed here any further.
Also note, that the script serves demonstration purposes, so some features are sub-optimal, such as unnecessary matrix additions, or not pre-allocating the solution matrix size, but these have been left in for better readability.
% Script C2_System_Jacobi.m
% Script to solve linear system of equations using the
% Jacobi method

% Coefficient matrices:
A = [3 2 -4;2 4 -5;-2 3 2];
b = [3;-4;5];
% Try random matrices. Note the high values in the
% diagonal aid convergence
% A = rand(4,4)+diag([1,2,3,4]);
% b = rand(4,1);
% Tolerance
eps = 1e-3;
% Maximum limit on iterations
iMax = 100;
% Counter of current iteration number
iCounter = 1;
% Decompose A into diagonal, lower and upper triangular parts
D = diag(diag(A));
At = A-D;
L = tril(At);
U = triu(At);
% Initial estimate of the solution and initial error norm
x = zeros(length(b),1);
cNorm = norm(A*x(:,1)-b);
% The method will run until the error norm is below the
% accepted tolerance or the
% number of iterations exceed the set limit
while ~(cNorm < eps) && iCounter < iMax
    xNew = inv(D)*(b-(L+U)*x(:,iCounter));
    x = [x xNew];
    iCounter = iCounter + 1;
    cNorm = norm(A*x(:,iCounter)-b);
end
A
b
% Result of the iteration
result = x(:,end)
% Compare to direct solution result:
directResult = linsolve(A,b)
3.2.3.3 Gauss-Seidel method
The Gauss-Seidel method can be looked at as an improved version of the Jacobi method. It speeds up the solution, because it uses more information about the solution than the Jacobi. The Jacobi method relies only on the information from the kth iteration, but the Gauss-Seidel also uses information from the column vectors already computed in the (k+1)th iteration; and since it is assumed that the solution is converging, this gives a better estimation, thus a faster solution.
In the Gauss-Seidel iteration, the matrices are defined as the following:
𝑄̿ = 𝐷̿ + 𝐿̿ and 𝑃̿ = −𝑈̿ (3.36.)
Using these matrices, the specific iteration form can be written as:
𝑥̅ᵏ⁺¹ = [𝐷̿ + 𝐿̿]⁻¹𝑏̅ − [𝐷̿ + 𝐿̿]⁻¹𝑈̿𝑥̅ᵏ,  𝑘 = 0,1,2,… (3.37.)
Example

% Script to solve the linear system of equations using
% the Gauss-Seidel method
% Coefficient matrices:
A = [3 2 -4;2 4 -5;-2 3 2];
b = [3;-4;5];
% Try random
% A = rand(4,4)+diag([1,2,3,4]);
% b = rand(4,1);
% Tolerance
eps = 1e-3;
% Maximum limit on iterations
iMax = 200;
% Counter of current iteration number
iCounter = 1;
% The diagonal part is separated, then the triangular parts
% taken off
D = diag(diag(A));
At = A-D;
L = tril(At);
U = triu(At);
% Initial estimate of the solution and initial error norm
x = zeros(length(b),1);
cNorm = norm(A*x(:,1)-b);
% The method will run until the error norm is below the
% accepted tolerance or the number of iterations exceed
% the set limit
while ~(cNorm < eps) && iCounter < iMax
    xNew = inv(D+L)*(b-U*x(:,iCounter));
    x = [x xNew];
    iCounter = iCounter + 1;
    cNorm = norm(A*x(:,iCounter)-b);
end
% Result of the iteration
result = x(:,end)
For suitably formed problems and good initial estimates the Gauss-Seidel method achieves convergence in a lower number of iterations than the Jacobi method, such as 8 vs 13 iterations respectively. Choose a worse starting point and a worse structured matrix, and the iteration count of the Gauss-Seidel can be more than 100 higher than that of the Jacobi. The convergence of these methods is not investigated any further, but the reader is encouraged to read up on the appropriate literature.
So far, the chapter has only discussed the solution of linear systems of equations. Linear systems offer a great simplification, as due to the rules of linearity, the actual equations can be reduced to a matrix algebra problem, for which robust solutions exist.
In the case of a system of non-linear equations, such simplifications cannot be made, and thus they are more difficult to solve, both in terms of computational cost and from an algorithmic point of view.
This chapter illustrates how the Newton’s method demonstrated for solving single variable functions is applied to a more general case. First a system of 2 equations is solved, then the extension of the method to a higher number of variables is discussed.
Let’s define the two variables as 𝑥 and 𝑦 and the two functions in the
following format:
𝑓1 (𝑥, 𝑦) = 0 (3.38.)
𝑓2 (𝑥, 𝑦) = 0 (3.39.)
Similarly to the single variable solution, the concept of the Newton iteration relies on the Taylor-series expansion of the functions around a given (𝑥₁, 𝑦₁) point and an assumed (𝑥₂, 𝑦₂) point which is the solution of the system (if such a solution even exists!):
𝑓₁(𝑥₂, 𝑦₂) = 𝑓₁(𝑥₁, 𝑦₁) + ∂𝑓₁/∂𝑥|₍ₓ₁,ᵧ₁₎ (𝑥₂ − 𝑥₁) + ∂𝑓₁/∂𝑦|₍ₓ₁,ᵧ₁₎ (𝑦₂ − 𝑦₁) + ⋯ (3.40.)
𝑓₂(𝑥₂, 𝑦₂) = 𝑓₂(𝑥₁, 𝑦₁) + ∂𝑓₂/∂𝑥|₍ₓ₁,ᵧ₁₎ (𝑥₂ − 𝑥₁) + ∂𝑓₂/∂𝑦|₍ₓ₁,ᵧ₁₎ (𝑦₂ − 𝑦₁) + ⋯ (3.41.)
Essentially what we have achieved with this, is that the original non-linear
system of equation was linearized in the vicinity of the solution, by taking only
the first order term from the Taylor series. As we have seen during the
discussion on linear systems, the existence of a solution depends on the
singularity of the coefficient matrix. This coefficient matrix is referred to as the Jacobian of the non-linear system, and is written as:
𝐽(𝑓₁, 𝑓₂)₍ₓ₁,ᵧ₁₎ = [∂𝑓₁/∂𝑥  ∂𝑓₁/∂𝑦]
                  [∂𝑓₂/∂𝑥  ∂𝑓₂/∂𝑦]₍ₓ₁,ᵧ₁₎ (3.45.)
Example
% Script to solve the 2-variable nonlinear system with
% the Newton method

% Initial point:
x0 = 2.0;
y0 = -4.0;
% Step size used to calculate local differential
xStep = 1e-5;
yStep = 1e-5;
% Tolerance
eps = 1e-3;
% Maximum limit on iterations
iMax = 1e2;
% Counter of current iteration number
iCounter = 1;
% Initial point
xi = x0;
yi = y0;
% Initial evaluation of the functions
f1xiyi = f1_at_xy(xi,yi);
f2xiyi = f2_at_xy(xi,yi);
% Iterate: build the Jacobian from finite differences,
% then take the Newton step
while (abs(f1xiyi) >= eps || abs(f2xiyi) >= eps) && iCounter < iMax
    J = [(f1_at_xy(xi+xStep,yi)-f1xiyi)/xStep, (f1_at_xy(xi,yi+yStep)-f1xiyi)/yStep;
         (f2_at_xy(xi+xStep,yi)-f2xiyi)/xStep, (f2_at_xy(xi,yi+yStep)-f2xiyi)/yStep];
    dv = J\[-f1xiyi;-f2xiyi];
    xi = xi + dv(1);
    yi = yi + dv(2);
    f1xiyi = f1_at_xy(xi,yi);
    f2xiyi = f2_at_xy(xi,yi);
    iCounter = iCounter + 1;
end
% Function value at the solution
f1_at_xy(xi,yi)
f2_at_xy(xi,yi)

% Functions to be evaluated
% (f2_at_xy was not reproduced in this excerpt; the form
% below is only an assumed placeholder)
function fValue = f1_at_xy(x,y)
fValue = -6.1*x^3+2.3*y^2-11.25;
end
function fValue = f2_at_xy(x,y)
fValue = 1.2*x - 2.4*y + 3.5;
end
Figure 10 shows the iteration steps the Newton method took to solve the
provided nonlinear system of equations. It can be seen that although the
procedure started “near” the solution, the method went quite far from this
neighbourhood, and in this particular case, returned and provided a solution.
This particular run took 40 iterations to converge within 1E-4 tolerance on the
solution. All the benefits and drawbacks discussed in the single variable case
are still valid for the systems of equations.
The example above was shown for a two-variable system of nonlinear equations. Following the same logic, the method can be extended to any
number of variables. Using the Taylor-series approximation, each non-linear
equation needs to be differentiated with respect to each variable, and the
Jacobian matrix formed. Then provided the Jacobian is non-singular, the step
vector required to improve the estimate of the solution can be calculated the
usual way.
Figure 10: The Newton iteration steps solving the nonlinear systems of
equations
4 Curve Fitting Methods
In real life engineering problems, in most cases some kind of input data has to be processed, analysed and evaluated, based on which the behaviour of the given system, tool, unit or model can be concluded. These input data are mostly generated during some kind of experiment: some physical quantity has to be converted to a numerical value in some way. For example: we can measure the flow velocity, the pressure and the temperature of a fluid, and in possession of the material properties and the available flow domain the mass flow rate can be calculated. Similarly, measuring the voltage and current of an electrical circuit gives the circuit’s electrical resistance. Or the expansion/compression of a spring indicates the force applied to it, if the characteristics of the spring are known. From these examples we can draw two conclusions: it is really rare that the actually needed physical quantity can be measured directly; and because the experiments are mostly designed and executed by humans, there is generally a measurement error/scattering in the data (you know: one measurement is no measurement).
Without a doubt, the best start to evaluating the input data is to plot them in a suitably chosen diagram. In this way, their behaviour can be observed and their characteristics can be concluded: whether they are linear, quadratic or exponential, or maybe have a normal distribution or are completely stochastic. Indeed, their classification should be done on a scientific basis, so the scope of this chapter is to introduce the curve fitting methods applied to the initial input data and the mathematical theory behind them. The fitted curves can be linear or polynomial, since these are the most widespread characteristics used in engineering.
The most important message of the chapter is the following: we can either fit curves exactly onto the input points or only approximate them, so there are two main methods:
- regression
- interpolation.
4.1 Regression
Regression was first worked out by Francis Galton, and the word ‘regression’ itself means: return to the average. Essentially, regression is nothing else but inspecting how the result depends on the change of the explanatory variable. Many examples can be given, like how the traffic of an airport depends on the GDP per capita, or how the torque of a motor depends on the engine speed, or how the chance of getting a grade better than 2 depends on studying more in the Numerical Methods class. Indeed, these are just illustrative examples; in real life the traffic of an airport is influenced also by the location, the tourism, …, in the case of the motor the quality of the fuel or the load also influences the torque, …, but in the example of studying, well, that is actually a single variable problem.
The scope of regression analysis is that we want to find the connection between two or more variables and try to model the behaviour. There are two basic regression types: linear and nonlinear.
It is essential to understand that the regression process is only an approximation. Meaning, we have a set of points and we would like to fit a curve near to them, but not exactly onto them. Therefore, the fitted curve does not have to go through the input points. If we wanted to find a curve that actually fits every single input point, then we are talking about interpolation. The basic difference between interpolation and approximation can be seen in Figure 11.
Figure 11: Interpolation (passing through every input point) versus regression (approximating the trend of the points)
In general, we can state that regression should be used if a large number of initial points is given, or if they have scattering between them (for example due to an inaccurate experiment). Regression should also be used if the characteristics of the input data are well-known (like the motor torque).
4.1.1 Linear Regression
The simplest form of regression is linear regression, when a linear curve is fitted to the input points. This is an often encountered problem in engineering, because linear systems are widely used in every field. The goal of the linear regression is to find the equation of a curve in the following form:
𝑦 = 𝑎₁𝑥 + 𝑎₀ (4.1.)
which fits best the 𝑛 given points: (𝑥₁, 𝑦₁), …, (𝑥ₙ, 𝑦ₙ). As a first step, it is recommended to plot the data points in a Cartesian coordinate system, so we can observe whether they have linear characteristics. For example, if it can be easily seen that the data points have cubic characteristics, we can still fit a linear curve to them, but the result will probably be useless.
These error values have to be calculated in the whole interval in order to find
the optimally suitable 𝑎1 and 𝑎0 values.
Nevertheless, here comes the next question: how can we decide which one is the best solution? We cannot just sum the errors, since the negative and positive values would cancel each other, resulting in an inaccurate solution. Therefore, to eliminate this mistake, we can take the absolute value of each error, or the square of them, and complete the summation. In both cases, we do not have to be afraid of the negative signs, since the result is positive in every case. However, if we only take the absolute values of the errors, we will not get an exact solution by solving the following equation:
𝐸 = ∑ᵢ₌₁ⁿ |𝑒ᵢ| = ∑ᵢ₌₁ⁿ |𝑦ᵢ − (𝑎₁𝑥ᵢ + 𝑎₀)|
Figure 13: Two linear fittings with the same total error (absolute values are
summed)
To get rid of this effect, the least squares method has been invented, which has become the most widespread regression method. In this case, the sum of the error squares is calculated, as follows:
𝐸 = ∑ᵢ₌₁ⁿ 𝑒ᵢ² = ∑ᵢ₌₁ⁿ [𝑦ᵢ − (𝑎₁𝑥ᵢ + 𝑎₀)]² (4.4.)
- error values with different signs do not eliminate each other,
- the total error is always positive,
- there is always a unique solution for the a_1 and a_0 coefficients,
- small error values have a smaller weight in the summation than large
errors, due to the squaring.
The goal is still to find a linear curve with the equation y = a_1 x + a_0 that
best suits the input points (x_1, y_1), …, (x_n, y_n), and at the same time
minimises the total error calculated in equation (4.4.). The total error E is
influenced by a_1 and a_0 in a non-linear way, and it has a minimum point
where the values of ∂E/∂a_1 and ∂E/∂a_0 are zero:
∂E/∂a_1 = −2 Σ_{i=1}^{n} x_i [y_i − (a_1 x_i + a_0)] = 0

∂E/∂a_0 = −2 Σ_{i=1}^{n} [y_i − (a_1 x_i + a_0)] = 0

Rearranging these yields the normal equations:

(Σ_{i=1}^{n} x_i) a_0 + (Σ_{i=1}^{n} x_i²) a_1 = Σ_{i=1}^{n} x_i y_i
                                                                  (4.6.)
n a_0 + (Σ_{i=1}^{n} x_i) a_1 = Σ_{i=1}^{n} y_i
Example [3]
We can run our function in the Command Window:
>> x = 0:0.2:1.2;
>> y = [7.6 8.4 8.2 8.6 8.5 8.6 8.9];
>> [a1 a0] = LinearReg(x,y)
The equation of the linear: y = 8.214286e-01x + 7.907143e+00.
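Solving the normal equations in closed form gives a_1 and a_0 directly. The book's examples use MATLAB; as an illustrative cross-check on the same data, here is a minimal Python sketch (the helper name linear_reg is ours, not a library function):

```python
# Least-squares line y = a1*x + a0 via the normal equations of linear regression.
# linear_reg is a hypothetical helper name, not part of any library.
def linear_reg(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    sx2 = sum(u * u for u in x)
    a1 = (n * sxy - sx * sy) / (n * sx2 - sx * sx)
    a0 = (sy - a1 * sx) / n
    return a1, a0

x = [0.2 * i for i in range(7)]          # 0, 0.2, ..., 1.2
y = [7.6, 8.4, 8.2, 8.6, 8.5, 8.6, 8.9]
a1, a0 = linear_reg(x, y)
print(round(a1, 6), round(a0, 6))  # 0.821429 7.907143
```

The printed values agree with the MATLAB run above.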
𝑦 = 𝑎𝑒 𝑏𝑥 (4.8.)
where 𝑎 and 𝑏 are constants.
Since the derivative of an exponential function is the same exponential
function multiplied by a constant, this technique provides a way to describe
quantities whose rate of change is directly proportional to the quantity
itself, like radioactive decay, or the flame speed described by the Vibe
combustion model.
Figure 14: Three linearization techniques for the three function types: a,d –
exponential; b,e – power; c,f – saturation [3]
log 𝑦 = 𝑏 log 𝑥 + log 𝑎 (4.11.)
That is, log y has to be plotted as a function of log x to obtain a straight line
with slope b and intercept log a, as panels (b) and (e) of Figure 14 illustrate.
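The linearization can be sketched numerically as well. In this hedged Python example (fit_power is our own helper; the data are synthetic, generated from y = 2x^1.5), a least-squares fit on (log x, log y) recovers the known constants a and b:

```python
import math

# Power-law fit y = a*x^b by linearization: log y = b*log x + log a.
# fit_power is a hypothetical helper; the data are synthetic (y = 2*x^1.5),
# so the recovered constants are known in advance.
def fit_power(x, y):
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    sx, sy = sum(lx), sum(ly)
    sxy = sum(u * v for u, v in zip(lx, ly))
    sx2 = sum(u * u for u in lx)
    b = (n * sxy - sx * sy) / (n * sx2 - sx * sx)
    a = math.exp((sy - b * sx) / n)
    return a, b

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * v ** 1.5 for v in x]
a, b = fit_power(x, y)
print(round(a, 6), round(b, 6))  # 2.0 1.5
```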
∂E/∂a_1 = −2 Σ_{i=1}^{n} { x_i [y_i − (a_m x_i^m + a_{m−1} x_i^{m−1} + ⋯ + a_2 x_i² + a_1 x_i + a_0)] } = 0

∂E/∂a_2 = −2 Σ_{i=1}^{n} { x_i² [y_i − (a_m x_i^m + a_{m−1} x_i^{m−1} + ⋯ + a_2 x_i² + a_1 x_i + a_0)] } = 0

...

∂E/∂a_m = −2 Σ_{i=1}^{n} { x_i^m [y_i − (a_m x_i^m + a_{m−1} x_i^{m−1} + ⋯ + a_2 x_i² + a_1 x_i + a_0)] } = 0

Rearranging them gives the normal equations (all sums run from i = 1 to n):

(Σ x_i) a_0 + (Σ x_i²) a_1 + (Σ x_i³) a_2 + ⋯ + (Σ x_i^{m+1}) a_m = Σ x_i y_i

(Σ x_i²) a_0 + (Σ x_i³) a_1 + (Σ x_i⁴) a_2 + ⋯ + (Σ x_i^{m+2}) a_m = Σ x_i² y_i   (4.17.)

...

(Σ x_i^m) a_0 + (Σ x_i^{m+1}) a_1 + ⋯ + (Σ x_i^{2m}) a_m = Σ x_i^m y_i
Example [3]
% describe the quadratic fit.
n = length(x);
Sumx = sum(x); Sumy = sum(y);
Sumx2 = sum(x.^2); Sumx3 = sum(x.^3); Sumx4 = sum(x.^4);
Sumxy = sum(x.*y); Sumx2y = sum(x.*x.*y);
% Form the coefficient matrix and the vector of
% right-hand sides
A = [n Sumx Sumx2; Sumx Sumx2 Sumx3; Sumx2 Sumx3 Sumx4];
b = [Sumy; Sumxy; Sumx2y];
w = A\b; % Solve
a2 = w(3); a1 = w(2); a0 = w(1);
% Plot the data and the quadratic fit
xx = linspace(x(1),x(end)); % Generate 100 points for plotting purposes
p = zeros(100,1); % Pre-allocate
for i = 1:100
    p(i) = a2*xx(i)^2 + a1*xx(i) + a0; % Calculate 100 points
end
plot(x,y,'o')
hold on
plot(xx,p)
xlabel('x'); ylabel('y'); title('Quadratic Regression')
fprintf('The equation of the parabola: y = %gx^2 + %gx + %g.\n',a2,a1,a0)
Example [3]
a3 = w(4); a2 = w(3); a1 = w(2); a0 = w(1);
% Plotting
xx = linspace(x(1),x(end));
p = zeros(100,1);
for i = 1:100
    p(i) = a3*xx(i)^3 + a2*xx(i)^2 + a1*xx(i) + a0;
end
plot(xx,p,x,y,'o')
hold on
xlabel('x'); ylabel('y'); title('Cubic Regression')
fprintf('The equation of the polynomial: y = %gx^3 + %gx^2 + %gx + %g.\n',a3,a2,a1,a0)
end
Let’s see how large the difference between the two methods is for the
following points:
>> x3 = 0:0.4:1.6;
>> y3 = [2.95 3.2 3.42 4.8 6.91];
Built-in MATLAB functions
Indeed, MATLAB has its own built-in functions to make such problems
easier. The two functions used are:
9.5679
a2 =
-13.1746
a1 =
8.4484
a0 =
3.4586
It can be seen that the coefficients are the same in both cases; also, the curve
does not fit the input data perfectly, since there is a small deviation.
Table 2: The 𝑓(𝑥) function, interpreted in 𝑥0 ≤ 𝑥 ≤ 𝑥𝑛 domain
Example
Calculate the difference table up to the fourth difference for the following x
values: 0, 1, 3, 4, 7, 9, if the function is y = f(x) = x³.
Solution
Create the table below. The first column includes the x values, the second
column contains the f(x) function values, and from the third to the sixth
columns we write the differences. Using the former expression, we can
calculate a first difference:

(1 − 27)/(1 − 3) = 13

Then a second difference:

(37 − 93)/(3 − 7) = 14

Likewise, a third difference:

(4 − 8)/(0 − 4) = 1

Finally, the fourth difference:

(1 − 1)/(0 − 7) = 0
The difference table looks like the following:
It is interesting to observe that the third differences are all equal, and the
fourth differences are all zero.
Generally, the x values are located at equal distances from each other; they
are so-called equidistant. In this particular case the denominators are equal
to each other, so they can be eliminated. As a result, the differences can be
calculated simply as the differences of the function values.
So, if the spacing h between the x values is constant, then a general x_k value
can be written in the following way:
𝑥𝑘 = 𝑥0 + 𝑘ℎ 𝑤ℎ𝑒𝑟𝑒 𝑘 = ⋯ − 2, −1, 0, 1, 2, … (4.20.)
Based on that, the first differences can be expressed by introducing Δ
difference operator:
∆𝑓𝑘 = 𝑓𝑘+1 − 𝑓𝑘 (4.21.)
Similarly, the second differences are:
∆2 𝑓𝑘 = ∆(∆𝑓𝑘 ) = ∆𝑓𝑘+1 − ∆𝑓𝑘 (4.22.)
By generalising the solution to every positive 𝑛 value:
∆𝑛 𝑓𝑘 = ∆(∆𝑛−1 𝑓𝑘 ) = ∆𝑛−1 𝑓𝑘+1 − ∆𝑛−1 𝑓𝑘 (4.23.)
Of course, the difference operator obeys the rules of exponents:
∆𝑚 (∆𝑛 𝑓𝑘 ) = ∆𝑚+𝑛 𝑓𝑘 (4.24.)
Based on that, the new difference table can be created as follows:
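For equidistant points the table therefore reduces to repeated subtraction of neighbouring values. A minimal Python sketch (difference_table is our own helper name), using f(x) = x³ with step h = 1, for which the third differences are the constant 3!·h³ = 6:

```python
# Forward-difference table built with the Δ operator: each row is the
# difference of neighbouring entries of the previous row.
# difference_table is a hypothetical helper name.
def difference_table(f_values):
    table = [list(f_values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

fk = [x ** 3 for x in range(6)]  # 0, 1, 8, 27, 64, 125
table = difference_table(fk)
print(table[3])  # [6, 6, 6]  -- constant third differences
print(table[4])  # [0, 0]    -- vanishing fourth differences
```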
Example
Solution:
Create the difference table as in the former example. It can be seen that the
3rd differences are again equal to each other, and the 4th differences are
still zero.
4.2.1 Factorial Polynomials
Factorial polynomials can be defined as:

(x)^(n) = x(x − 1)(x − 2) … (x − n + 1)
                                                      (4.28.)
(x)^(−n) = 1 / [(x + 1)(x + 2) … (x + n)]

By using the Δ difference operator, their differences take a form analogous to
the derivatives of ordinary powers:

Δ(x)^(n) = n(x)^(n−1)
                                                      (4.29.)
Δ(x)^(−n) = −n(x)^(−(n+1))
Sometimes, creating a factorial polynomial is the best way to handle a
problem. To create one, we use an expansion analogous to the Maclaurin
series, written as:

p_n(x) = a_0 + a_1 (x)^(1) + a_2 (x)^(2) + ⋯ + a_n (x)^(n)   (4.30.)

where the a_k coefficients have to be determined. Substituting x = 0 into the
expression gives:
𝑎0 = 𝑝𝑛 (0) (4.31.)
The a_1 coefficient can be calculated by differencing equation (4.30.):

Δp_n(x) = a_1 + 2a_2 (x)^(1) + 3a_3 (x)^(2) + ⋯ + n a_n (x)^(n−1)   (4.32.)
Then, making 𝑥 equal to zero, we get:
𝑎1 = ∆𝑝𝑛 (0) (4.33.)
The next differencing gives:

Δ²p_n(x) = 2·1·a_2 + 3·2·a_3 (x)^(1) + ⋯ + n(n − 1) a_n (x)^(n−2)   (4.34.)

Setting x = 0, a_2 will be:

a_2 = Δ²p_n(0)/(2·1) = Δ²p_n(0)/2!   (4.35.)

The general form:

a_j = Δ^j p_n(0)/j!   ∀ j = 0, 1, 2, …, n   (4.36.)
After establishing the calculation process, we could rightly ask: what is this
good for? The answer is simple: it is one of the easiest ways to generate a
difference table.
There are 3 major steps in the method:
1. The p_n(x) polynomial – provided by the Maclaurin-type series – is divided
by x, which gives a quotient q_0(x) and a residue r_0; the latter is equal to
the a_0 coefficient. Thus:

p_n(x) = r_0 + x q_0(x)   (4.37.)
2. Dividing q_0(x) by (x − 1) to express the quotient q_1(x) and residue r_1,
we get:

q_0(x) = r_1 + (x − 1) q_1(x)   (4.38.)
Substituting this expression into equation (4.37.) and using the original
equation (4.30.), we can write:
p_n(x) = r_0 + x[r_1 + (x − 1) q_1(x)] = r_0 + r_1 (x)^(1) + x(x − 1) q_1(x)   (4.39.)
3. Dividing q_1(x) by (x − 2) we get q_2(x) and the residue r_2, which is now
equal to a_2, so:

q_1(x) = r_2 + (x − 2) q_2(x)   (4.40.)
Substituting back to the former equation:
p_n(x) = r_0 + r_1 (x)^(1) + x(x − 1)[r_2 + (x − 2) q_2(x)] =
= r_0 + r_1 (x)^(1) + r_2 (x)^(2) + x(x − 1)(x − 2) q_2(x)   (4.41.)

We then continue this process until the quotient becomes a constant; the
algorithm is repeated at most (n + 1) times.
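The three steps above amount to repeated synthetic division by x, (x − 1), (x − 2), and so on, collecting the residues r_0, r_1, … as the coefficients a_0, a_1, …. A Python sketch (helper names are ours), applied to the polynomial x⁴ − 5x³ + 3x + 4, whose factorial form is 4 − (x)^(1) − 8(x)^(2) + (x)^(3) + (x)^(4):

```python
# Converting an ordinary polynomial to factorial-polynomial coefficients by
# repeated synthetic division with the roots 0, 1, 2, ... (steps 1-3 above).
# synth_div and factorial_coeffs are hypothetical helper names;
# coefficients are listed highest degree first.
def synth_div(c, r):
    """Divide the polynomial c by (x - r); return quotient and remainder."""
    q = [c[0]]
    for coeff in c[1:]:
        q.append(coeff + q[-1] * r)
    return q[:-1], q[-1]

def factorial_coeffs(c):
    """Return [a0, a1, ..., an] with p(x) = a0 + a1*(x)^(1) + ... + an*(x)^(n)."""
    res, r = [], 0
    while c:
        c, rem = synth_div(c, r)
        res.append(rem)
        r += 1
    return res

# x^4 - 5x^3 + 0x^2 + 3x + 4  ->  4 - (x)^(1) - 8(x)^(2) + (x)^(3) + (x)^(4)
print(factorial_coeffs([1, -5, 0, 3, 4]))  # [4, -1, -8, 1, 1]
```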
4.2.2 Anti-differentiation
If only the first difference of a given function is known, then to recover the
function itself, the opposite of the differencing has to be carried out, which
is the so-called anti-differentiation. We might be suspicious, since this really
sounds like integration, and it actually is, but from a different point of
view. Our goal right now is to carry out an integration-like process without
the general integration rules, relying only on difference calculus. Going
back to equation (4.29.), the anti-differentiation can be given in the following
general form:
Δ⁻¹(x)^(n) = (x)^(n+1)/(n + 1)   (4.45.)
Let’s see, why this equation looks so familiar!
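As a quick numeric sanity check (a Python sketch with our own helper name): with the falling factorial (x)^(n) = x(x − 1)…(x − n + 1), the difference of (x)^(n+1)/(n + 1) is exactly (x)^(n), mirroring d/dx [x^{n+1}/(n + 1)] = x^n in ordinary calculus.

```python
# Numeric check of (4.45.): Δ[(x)^(n+1)/(n+1)] = (x)^(n).
# falling is a hypothetical helper computing the falling factorial.
def falling(x, n):
    p = 1
    for k in range(n):
        p *= (x - k)
    return p

n = 2
ok = all(falling(x + 1, n + 1) / (n + 1) - falling(x, n + 1) / (n + 1)
         == falling(x, n) for x in range(6))
print(ok)  # True
```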
Example
Solution:
Create the factorial polynomial as described earlier (division, quotient and
residue formation, repetition):
𝑝𝑛 (𝑥) = 4 − (𝑥)(1) − 8(𝑥)(2) + (𝑥)(3) + (𝑥)(4)
Then create the difference table:
∆0 𝑝(0) = 0! ∙ 4 = 4
∆1 𝑝(0) = 1! ∙ (−1) = −1
∆2 𝑝(0) = 2! ∙ (−8) = −16
∆3 𝑝(0) = 3! ∙ 1 = 6
∆4 𝑝(0) = 4! ∙ 1 = 24
∆5 𝑝(0) = 5! ∙ 0 = 0
Insert the values into the table, then continue with the next round, and so on:
Repeat the steps until the table is complete:
From the table the anti-differentiation formula can be written as:
Δ⁻¹p_n(x) = (x)^(5)/5 + (x)^(4)/4 − 8(x)^(3)/3 − (x)^(2)/2 + 4(x)^(1) + C
where 𝐶 is a constant.
In fact, we've actually implemented integration.
The 3.6 Example gives us a feeling that the sum of the series p_n(x) over the
interval α ≤ x ≤ α + (n − 1)h can be obtained from the anti-difference
evaluated at the endpoints of the interval.
4.3 Interpolation
So far we have discussed how to approximate a set of input points with a
function that does not necessarily go through every single point, but keeps
the error between the function values and the given points minimal. This was
reasonable, since in the case of experimental data there is usually a light
(or strong) scattering between the points; and for a large number of input
points it would be an unnecessarily hard constraint to make the function pass
through every point, as this would require a really large computational
capacity, and the results would probably not be in the expected range.
There are, however, many situations when the input points impose a constraint
on the function, because we need the curve to touch every point: for example,
when we draw a 3-dimensional model in CAD software, we define points and then
connect them with some kind of curve. In this case, an interpolation method is
applied. By definition, interpolation is a numerical method which describes a
connection between two sets of numbers; it provides the unknown values
of a function based on the known values. All interpolation methods
have to fulfil three conditions:
- the curve has to pass through the given input points,
- it must fulfil further special conditions, like slope or curvature,
- it should apply polynomials, for ease of use.
where 𝐿1 (𝑥) and 𝐿2 (𝑥) are the Lagrange coefficients. They can be given as:
L_1(x) = (x − x_2)/(x_1 − x_2),   L_2(x) = (x − x_1)/(x_2 − x_1)   (4.52.)
Since L_1(x_1) = 1 and L_1(x_2) = 0, and likewise L_2(x_1) = 0 and
L_2(x_2) = 1, it follows that p_1(x_1) = y_1 and p_1(x_2) = y_2; that is, the
interpolating polynomial is a straight line that passes through both points,
as can be seen in Figure 15.
The pattern can be figured out from these two examples: the n-th order
Lagrange polynomial that passes through the n + 1 points
(x_1, y_1), …, (x_{n+1}, y_{n+1}) can be written as follows:

p_n(x) = Σ_{i=1}^{n+1} L_i(x) y_i,   where   L_i(x) = Π_{j=1, j≠i}^{n+1} (x − x_j)/(x_i − x_j)
Example [3]
Using the Lagrange method, find the interpolating polynomial of the input data
in the following table, then use this polynomial to find the interpolated value
at x = 0,3.
𝒊 𝒙𝒊 𝒚𝒊
1 0,1 0,12
2 0,5 0,47
3 0,9 0,65
Solution:
The three Lagrange interpolation coefficients are:
L_1(x) = [(x − x_2)(x − x_3)] / [(x_1 − x_2)(x_1 − x_3)] = [(x − 0,5)(x − 0,9)] / [(0,1 − 0,5)(0,1 − 0,9)]

L_2(x) = [(x − x_1)(x − x_3)] / [(x_2 − x_1)(x_2 − x_3)] = [(x − 0,1)(x − 0,9)] / [(0,5 − 0,1)(0,5 − 0,9)]

L_3(x) = [(x − x_1)(x − x_2)] / [(x_3 − x_1)(x_3 − x_2)] = [(x − 0,1)(x − 0,5)] / [(0,9 − 0,1)(0,9 − 0,5)]
When the coefficients are known, the 2nd order polynomial can be written as:
p_2(x) = L_1(x) y_1 + L_2(x) y_2 + L_3(x) y_3 =

= [(x − 0,5)(x − 0,9)] / [(0,1 − 0,5)(0,1 − 0,9)] · 0,12 + [(x − 0,1)(x − 0,9)] / [(0,5 − 0,1)(0,5 − 0,9)] · 0,47 +
+ [(x − 0,1)(x − 0,5)] / [(0,9 − 0,1)(0,9 − 0,5)] · 0,65 =

= −0,5312x² + 1,1937x + 0,0059
By using this polynomial the interpolated value at 𝑥 = 0,3 is 𝑝2 (0,3) = 0,3162.
Example
Let’s define a MATLAB function that finds the Lagrange interpolating
polynomial fitting a set of data, and can also evaluate the polynomial’s value
at a specified point:
function yi = LagrangeInterp(x,y,xi)
%
% LagrangeInterp finds the Lagrange interpolating
% polynomial that fits the data (x,y) and uses it to
% find the interpolated value at xi.
% yi = LagrangeInterp(x,y,xi) where x, y are n-
% dimensional row or column vectors of data,
% xi is a specified point, yi is the interpolated
% value at xi.
n = length(x);
L = zeros(1,n);
for i = 1:n,
L(i) = 1;
for j = 1:n,
if j ~= i,
L(i) = L(i)* (xi - x(j))/(x(i) - x(j));
end
end
end
yi = sum(y.* L);
>> yi = LagrangeInterp(x,y,0.3)
yi =
    0.3037
a_3 = [ (y_3 − y_2)/(x_3 − x_2) − (y_2 − y_1)/(x_2 − x_1) ] / (x_3 − x_1)   (4.61.)
If we continue this process, all the missing coefficients can be determined
using Newton’s divided differences. For two points (x_1, y_1)
and (x_2, y_2), the first divided difference can be given as the slope of the
line connecting the two points, which is:
f[x_2, x_1] = (y_2 − y_1)/(x_2 − x_1) = a_2   (4.62.)
If three points were considered:
f[x_3, x_2, x_1] = (f[x_3, x_2] − f[x_2, x_1]) / (x_3 − x_1) =
= [ (y_3 − y_2)/(x_3 − x_2) − (y_2 − y_1)/(x_2 − x_1) ] / (x_3 − x_1) = a_3   (4.63.)
For a general case, the method can be written in the following form:
f[x_{k+1}, x_k, …, x_2, x_1] = (f[x_{k+1}, x_k, …, x_3, x_2] − f[x_k, x_{k−1}, …, x_2, x_1]) / (x_{k+1} − x_1) = a_{k+1}   (4.64.)
Therefore, we can also give Newton’s divided-difference interpolating
polynomial for n + 1 points (x_1, y_1), …, (x_{n+1}, y_{n+1}) as:
𝑝𝑛 (𝑥) = 𝑎1 + 𝑎2 (𝑥 − 𝑥1 ) + 𝑎3 (𝑥 − 𝑥1 )(𝑥 − 𝑥2 ) + ⋯ +
(4.65.)
+𝑎𝑛+1 (𝑥 − 𝑥1 )(𝑥 − 𝑥2 ) … (𝑥 − 𝑥𝑛 )
where the coefficients a_1, …, a_{n+1} are, in order, y_1, f[x_2, x_1], …,
f[x_{n+1}, …, x_1]; for determining their values, the best solution is to apply
difference tables.
Example
Find the 4th order Newton’s divided-difference polynomial for the following
input points, then use the interpolation polynomial to find the interpolated
value at 𝑥 = 0,7.
𝒊 𝒙𝒊 𝒚𝒊
1 0 0
2 0,1 0,1210
3 0,2 0,2258
4 0,5 0,4650
5 0,8 0,6249
Solution:
The first task is to create the difference table, which can be seen below.
Based on equation (4.65.), the 4th order Newton’s divided-difference
interpolating polynomial is:

p_4(x) = a_1 + a_2(x − x_1) + a_3(x − x_1)(x − x_2) + a_4(x − x_1)(x − x_2)(x − x_3) +
+ a_5(x − x_1)(x − x_2)(x − x_3)(x − x_4) =

= 0 + 1,21(x − 0) − 0,81(x − 0)(x − 0,1) + 0,3664(x − 0)(x − 0,1)(x − 0,2) −
− 0,1254(x − 0)(x − 0,1)(x − 0,2)(x − 0,5) =

= −0,1254x⁴ + 0,4667x³ − 0,9412x² + 1,2996x
By using the polynomial, the interpolated value can be calculated easily:
𝑝4 (0,7) = 0,5784
The difference table:
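The divided-difference coefficients and the evaluation of the Newton polynomial can be sketched in a few lines of Python (helper names are ours; because unrounded intermediate values are used, the result differs from the text's 0,5784 only in the last digit):

```python
# Newton divided-difference coefficients (computed in place) and
# Horner-style evaluation of the Newton form.
# divided_differences and newton_eval are hypothetical helper names.
def divided_differences(x, y):
    a = list(y)
    n = len(x)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            a[i] = (a[i] - a[i - 1]) / (x[i] - x[i - j])
    return a  # a[k] is the k-th order divided difference

def newton_eval(x, a, xi):
    val = a[-1]
    for k in range(len(a) - 2, -1, -1):
        val = val * (xi - x[k]) + a[k]
    return val

x = [0, 0.1, 0.2, 0.5, 0.8]
y = [0, 0.1210, 0.2258, 0.4650, 0.6249]
a = divided_differences(x, y)
p = newton_eval(x, a, 0.7)
print(round(p, 4))  # 0.5785 (the text's 0.5784 used rounded coefficients)
```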
In practice it works in the following way: MATLAB has the built-in interp1
function, whose supported methods include:
- nearest neighbour interpolation method – ‘nearest’
- linear interpolation method – ‘linear’
- cubic spline interpolation method – ‘spline’
- Shape-preserving piecewise cubic interpolation – ‘pchip’
Next, we will see some examples both for the built-in and for user-specified
functions and their operation mode.
function yi = Newtonpol(x, y, xi)
% Evaluates the Newton interpolating polynomial at xi;
% the divided-difference coefficients come from the
% divdiff function (defined separately).
a = divdiff(x, y);
n = length(a);
val = a(n)*ones(1,length(xi));
for m = n-1:-1:1
    val = (xi - x(m)).*val + a(m);
end
yi = val(:);
To see how they work, define the points (x,y) and the points where
the interpolant is evaluated (xi,yi):
>> x = 0:pi/5:pi;
>> y = sin(2.*x);
>> xi = 0:pi/100:pi;
>> yi = interp1(x,y,xi,'nearest');
>> plot(x, y, 'o', xi, yi), title('Piecewise constant interpolant of y = sin(2x)')
The script gives us the following result:
Now try out the Newtonpol function:
% Script showint.m
% Plot of the function 1/(1 + x^2) and its
% interpolating polynomial of degree n.
m = input('Enter number of interpolating polynomials ');
for k = 1:m
    n = input('Enter degree of the interpolating polynomial ');
    hold on
    x = linspace(-5,5,n+1);
    y = 1./(1 + x.*x);
    z = linspace(-5.5,5.5);
    t = 1./(1 + z.^2);
    h1_line = plot(z,t,'-.');
    set(h1_line,'LineWidth',1.25)
    t = Newtonpol(x,y,z);
    h2_line = plot(z,t,'r');
    set(h2_line,'LineWidth',1.3,'Color',[0 0 0])
    axis([-5.5 5.5 -.5 1])
    title(sprintf('Example of divergence (n = %2.0f)',n))
    xlabel('x')
    ylabel('y')
    legend('y = 1/(1+x^2)','interpolant')
    hold off
end
Run this script and try the parameters m = 1 and n = 9. The result is the
following:
function yi = NewtonInterp(x,y,xi)
%
% NewtonInterp finds the Newton divided-difference
% interpolating polynomial that fits the data (x,y)
% and uses it to find the interpolated value at xi.
% yi = NewtonInterp(x,y,xi) where x, y are n-
% dimensional row or column vectors of data,
% xi is a specified point, yi is the interpolated
% value at xi.
n = length(x);
a = zeros(1,n);
a(1) = y(1);
DivDiff = zeros(n-1,n-1);
for i = 1:n-1
    DivDiff(i,1) = (y(i+1) - y(i))/(x(i+1) - x(i));
end
for j = 2:n-1
    for i = 1:n-j
        DivDiff(i,j) = (DivDiff(i+1,j-1) - DivDiff(i,j-1))/(x(j+i) - x(i));
    end
end
for k = 2:n
    a(k) = DivDiff(1,k-1);
end
yi = a(1);
xprod = 1;
for m = 2:n
    xprod = xprod*(xi - x(m-1));
    yi = yi + a(m)*xprod;
end
Although the script looks messier, the function itself is less capable, since it
can handle only a scalar xi value, not a vector.
4.3.3 Spline Interpolation [3]
Up to this point, to interpolate n + 1 points, an n-th degree polynomial has
been used. For example, to interpolate 11 input points using Lagrange
interpolation, we get a 10th degree polynomial, which is not really practical,
and furthermore is really ugly. This means that the former interpolation
methods work well only for a low number of points; for 101 or 1001 input
points, it is not a good solution to use a 100th or 1000th degree polynomial.
Additionally, considering 11 points and the 10th degree polynomial, due to the
peaks generated by the high-degree polynomial, the interpolation can make
relevant errors, as can be seen in Figure 17. In short, the higher the number
of initial points, the higher the degree of polynomial required, and thus the
higher the error of the interpolation method.
In order to eliminate this phenomenon, the idea was born to avoid using one
global polynomial to interpolate a large number of initial points, and instead
use several low-degree polynomials, each interpreted over a sub-interval
including one or more data points. These low-degree polynomials are the
so-called splines.
Originally, splines were flexible, thin strips (rulers), mostly made from
special wood or steel, used by naval architects and drafters. The two
endpoints of the strip were fixed, and in the middle section weights were
applied, creating curves with different curvatures. At the time it was
mathematically challenging to describe these functions, but as soon as the
mathematical background became available, splines became perhaps the most
commonly used interpolation method.
Figure 17: Interpolating 11 points with a 10th degree polynomial and with using a
cubic spline
The most commonly applied spline type is the cubic spline, which gives an
accurate description and a smooth transition between two sub-intervals, as
can be seen in Figure 17. The same figure illustrates the advantages of the
cubic spline over high-degree polynomials.
Linear splines, however, have drawbacks, since they are not capable of
creating a smooth transition between sub-intervals; additionally, they create
a breakpoint at every data point.
Figure 18: Interpolating 4 points with linear splines and with a 3rd degree
polynomial
This behaviour comes from the fact that the linear splines’ first derivatives
(slopes) are not equal to each other at the data points (if this were a
constraint, the interpolation problem could not be carried out with linear
splines). That is why our next constraint will be the equality of the first
derivatives, even though first-degree polynomials can then no longer be used;
so, in the following, higher degree polynomials will be used over the
sub-intervals. Of course, the next problem will be the equality of the second
derivatives, but this will be discussed after we have dealt with the first
derivatives.
S_i(x) = a_i x² + b_i x + c_i,   i = 1, 2, …, n   (4.67.)

where a_i, b_i, c_i (i = 1, 2, …, n) are the unknown coefficients to be
determined. Since we have n polynomials, each with 3 coefficients, 3n
unknown coefficients have to be determined. In order to calculate every one
of them, exactly 3n equations have to be constructed.
The first polynomial S_1(x) must pass through the point (x_1, y_1), and the
last one S_n(x) has to pass through (x_{n+1}, y_{n+1}):

S_1(x_1) = y_1 = a_1 x_1² + b_1 x_1 + c_1
                                                          (4.68.)
S_n(x_{n+1}) = y_{n+1} = a_n x_{n+1}² + b_n x_{n+1} + c_n
Function values at intermediate points:
S_2(x_2) = y_2, S_3(x_3) = y_3, …, S_n(x_n) = y_n

These, together with the conditions at the right ends of the sub-intervals,
generate 2(n − 1) equations. The general form is the following:

S_i(x_{i+1}) = y_{i+1} = a_i x_{i+1}² + b_i x_{i+1} + c_i,   i = 1, 2, …, n − 1
                                                          (4.70.)
S_i(x_i) = y_i = a_i x_i² + b_i x_i + c_i,   i = 2, 3, …, n
Values of the first derivatives at the intermediate points:
At the intermediate points, the values of the first derivatives must agree,
which can be written mathematically in the following form (creating n − 1
equations):
S′_1(x_2) = S′_2(x_2), S′_2(x_3) = S′_3(x_3), …, S′_{n−1}(x_n) = S′_n(x_n)   (4.71.)

In general form:

S′_i(x_{i+1}) = S′_{i+1}(x_{i+1}),   i = 1, 2, …, n − 1   (4.72.)

This means:

2a_i x_{i+1} + b_i = 2a_{i+1} x_{i+1} + b_{i+1},   i = 1, 2, …, n − 1   (4.73.)
Example
𝒙𝒊    𝒚𝒊
2     5
3     2,3
5     5,1
7,5   1,5
Solution:
Because we have 4 input points, n = 3, thus 3 quadratic splines have to be
determined, with a total of 9 unknown coefficients. First of all, we know that
a_1 = 0. The remaining 8 equations are given by equations (4.68.)-(4.73.).
Based on equation (4.68.):
𝑎1 (2)2 + 𝑏1 (2) + 𝑐1 = 5
𝑎3 (7,5)2 + 𝑏3 (7,5) + 𝑐3 = 1,5
By using equation (4.70.):
𝑎1 (3)2 + 𝑏1 (3) + 𝑐1 = 2,3
𝑎2 (5)2 + 𝑏2 (5) + 𝑐2 = 5,1
𝑎2 (3)2 + 𝑏2 (3) + 𝑐2 = 2,3
𝑎3 (5)2 + 𝑏3 (5) + 𝑐3 = 5,1
Finally, applying equation (4.73.):
2𝑎1 (3) + 𝑏1 = 2𝑎2 (3) + 𝑏2
2𝑎2 (5) + 𝑏2 = 2𝑎3 (5) + 𝑏3
To simplify the problem, transform the linear system of equations into matrix
form (with the unknowns ordered as b_1, c_1, a_2, b_2, c_2, a_3, b_3, c_3):

[ 2  1   0   0  0    0     0   0 ] [b_1]   [ 5    ]
[ 0  0   0   0  0  56,25  7,5  1 ] [c_1]   [ 1,5  ]
[ 3  1   0   0  0    0     0   0 ] [a_2]   [ 2,3  ]
[ 0  0  25   5  1    0     0   0 ] [b_2] = [ 5,1  ]
[ 0  0   9   3  1    0     0   0 ] [c_2]   [ 2,3  ]
[ 0  0   0   0  0   25     5   1 ] [a_3]   [ 5,1  ]
[ 1  0  −6  −1  0    0     0   0 ] [b_3]   [ 0    ]
[ 0  0  10   1  0  −10   −1    0 ] [c_3]   [ 0    ]
Solving the system of equations, we get the following coefficients:
a_1 = 0;  a_2 = 2,05;  a_3 = −2,776;
b_1 = −2,7;  b_2 = −15;  b_3 = 33,26;
c_1 = 10,4;  c_2 = 28,85;  c_3 = −91,8
As the final step, construct the quadratic spline equations:
S_1(x) = −2,7x + 10,4,   2 ≤ x ≤ 3
S_2(x) = 2,05x² − 15x + 28,85,   3 ≤ x ≤ 5
S_3(x) = −2,776x² + 33,26x − 91,8,   5 ≤ x ≤ 7,5
By plotting the solution of the example:
The first derivatives also have the same values at the intermediate points
(creating n − 1 equations):

S′_i(x_{i+1}) = S′_{i+1}(x_{i+1}),   i = 1, 2, …, n − 1   (4.77.)

The second derivatives are also equal to each other at the intermediate points
(creating n − 1 equations):

S″_i(x_{i+1}) = S″_{i+1}(x_{i+1}),   i = 1, 2, …, n − 1   (4.78.)
So, in total 4n − 2 equations have been defined so far; two more are missing.
These missing equations can be written by defining boundary conditions: how
the spline starts out from the first point, and how it arrives at the
endpoint. Generally, there are two kinds of boundary conditions.
Clamped boundary conditions:
The first and last splines (𝑆1 and 𝑆𝑛 ) starting from (𝑥1 , 𝑦1 ) and getting to
(𝑥𝑛+1 , 𝑦𝑛+1 ) are clamped as:
𝑆1′ (𝑥1 ) = 𝑝, 𝑆𝑛′ (𝑥𝑛+1 ) = 𝑞 (4.79.)
Free boundary conditions:
This is a little bit tricky, because here the curvatures are defined as boundary
conditions:
𝑆1′′ (𝑥1 ) = 0, 𝑆𝑛′′ (𝑥𝑛+1 ) = 0 (4.80.)
Indeed, applying clamped boundary conditions gives a more accurate
approximation, because they carry more information about the spline than
free boundary conditions do. Of course, in order to use clamped boundary
conditions we need to know more about the splines, since we define the
boundary slopes manually in this case.
d_i = y_i,   i = 1, 2, …, n + 1   (4.81.)
Calculate the distances between the points (the steps): h_i = x_{i+1} − x_i
(i = 1, 2, …, n). Substituting this term into equation (4.76.), and keeping in
mind that S_i(x_{i+1}) = d_{i+1}, it can be written that:

d_{i+1} = (1/3)(2b_i + b_{i+1})h_i² + c_i h_i + d_i,   i = 1, 2, …, n   (4.88.)
and
c_i = (d_{i+1} − d_i)/h_i − (1/3)(2b_i + b_{i+1})h_i   (4.90.)

By changing i to i − 1:

c_{i−1} = (d_i − d_{i−1})/h_{i−1} − (1/3)(2b_{i−1} + b_i)h_{i−1}   (4.91.)
The same step is done to equation (4.89.):
c_1 = (d_2 − d_1)/h_1 − (1/3)(2b_1 + b_2)h_1   (4.94.)

Because it was determined earlier that c_1 = S′_1(x_1) = p, the equation can be
transformed to:

(2b_1 + b_2)h_1 = 3(d_2 − d_1)/h_1 − 3p   (4.95.)
From equation (4.89.):
c_n = (d_{n+1} − d_n)/h_n − (1/3)(2b_n + b_{n+1})h_n   (4.98.)

Substituting equation (4.97.) back into the former expression we get:

(2b_{n+1} + b_n)h_n = −3(d_{n+1} − d_n)/h_n + 3q   (4.99.)
Finally, by combining equations (4.99.), (4.95.) and (4.93.), a system of
equations is generated with 𝑛 + 1 equations and 𝑛 + 1 unknown coefficients
(𝑏𝑖 , 𝑖 = 1,2, … , 𝑛 + 1). This way 𝑏𝑖 can be expressed from the following
system:
(2b_1 + b_2)h_1 = 3(d_2 − d_1)/h_1 − 3p

b_{i−1}h_{i−1} + 2b_i(h_i + h_{i−1}) + b_{i+1}h_i = 3(d_{i+1} − d_i)/h_i − 3(d_i − d_{i−1})/h_{i−1},
where i = 2, 3, …, n                                       (4.100.)

(2b_{n+1} + b_n)h_n = −3(d_{n+1} − d_n)/h_n + 3q
A tridiagonal system (the coefficient matrix has non-zero elements only in
the main diagonal and in the first diagonals below and above it) has been
generated, with a clear solution. As soon as the b_i coefficients are
determined, the c_i values can be calculated with equation (4.90.), and
finally the a_i coefficients with equation (4.87.). With that, the task is
finished. Although this process seems complicated and long, for a large
number of input points it actually gives a significantly faster and more
accurate result than all the other interpolation methods.
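Tridiagonal systems like this can be solved in O(n) operations with the Thomas algorithm (forward elimination followed by back substitution). A Python sketch (the helper name thomas is ours), fed with the matrix and right-hand side that arise in the clamped-boundary example of this section:

```python
# Thomas algorithm for a tridiagonal system: sub/diag/sup are the three
# diagonals, rhs the right-hand side. thomas is a hypothetical helper name.
def thomas(sub, diag, sup, rhs):
    n = len(rhs)
    d, r = list(diag), list(rhs)
    for i in range(1, n):                 # forward elimination
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        r[i] -= m * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x

b = thomas([1, 2, 2.5], [2, 6, 9, 5], [1, 2, 2.5],
           [-5.1, 12.3, -8.52, 7.32])
print([round(v, 4) for v in b])  # [-4.3551, 3.6103, -2.5033, 2.7157]
```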
Example
Solve the interpolation problem in Example 4.12 using cubic splines with
clamped boundary conditions, if the boundaries are:
𝑝 = −1; 𝑞 = 1
Solution:
Since we have 4 input points, n = 3, so 3 cubic polynomials have to be
determined in the following form:
𝑆𝑖 (𝑥) = 𝑎𝑖 (𝑥 − 𝑥𝑖 )3 + 𝑏𝑖 (𝑥 − 𝑥𝑖 )2 + 𝑐𝑖 (𝑥 − 𝑥𝑖 ) + 𝑑𝑖 , 𝑖 = 1,2,3
Using the former deduction, at first 𝑏𝑖 coefficients are determined in the
following way:
(2b_1 + b_2)h_1 = 3(d_2 − d_1)/h_1 − 3p

b_1 h_1 + 2b_2(h_2 + h_1) + b_3 h_2 = 3(d_3 − d_2)/h_2 − 3(d_2 − d_1)/h_1

b_2 h_2 + 2b_3(h_3 + h_2) + b_4 h_3 = 3(d_4 − d_3)/h_3 − 3(d_3 − d_2)/h_2

(2b_4 + b_3)h_3 = −3(d_4 − d_3)/h_3 + 3q
Because 𝑑𝑖 values are equal to the input points:
𝑑1 = 5; 𝑑2 = 2,3; 𝑑3 = 5,1; 𝑑4 = 1,5
Similarly: h_1 = 1; h_2 = 2; h_3 = 2,5. Substituting these with p = −1 and
q = 1, the system of equations can be written as:
[ 2    1    0    0  ] [b_1]   [ −5,1  ]
[ 1    6    2    0  ] [b_2]   [ 12,3  ]
[ 0    2    9   2,5 ] [b_3] = [ −8,52 ]
[ 0    0   2,5   5  ] [b_4]   [ 7,32  ]
which is a tridiagonal system; solving it gives
b_1 = −4,3551;  b_2 = 3,6103;  b_3 = −2,5033;  b_4 = 2,7157
Next, 𝑐𝑖 values are calculated:
c_1 = (d_2 − d_1)/h_1 − (1/3)(2b_1 + b_2)h_1 = −1

c_2 = (d_3 − d_2)/h_2 − (1/3)(2b_2 + b_3)h_2 = −1,7449

c_3 = (d_4 − d_3)/h_3 − (1/3)(2b_3 + b_4)h_3 = 0,4691
By using the calculated coefficients, the 3 cubic splines can be written as:
𝑆1 (𝑥) = 2,6551(𝑥 − 2)3 − 4,3551(𝑥 − 2)2 − (𝑥 − 2) + 5, 2 ≤ 𝑥 ≤ 3
𝑆2 (𝑥) = −1,0189(𝑥 − 3)3 + 3,6103(𝑥 − 3)2 − 1,7449(𝑥 − 3) + 2,3, 3 ≤ 𝑥 ≤ 5
𝑆3 (𝑥) = 0,6959(𝑥 − 5)3 − 2,5033(𝑥 − 5)2 + 0,4691(𝑥 − 5) + 5,1, 5 ≤ 𝑥 ≤ 7,5
Look at the input points and the quadratic and cubic interpolating splines,
illustrated in the next diagram. It can be clearly seen that the cubic splines
pass through the points more smoothly, with smaller peaks, providing a more
accurate solution.
b_1 = 0

b_{i−1}h_{i−1} + 2b_i(h_i + h_{i−1}) + b_{i+1}h_i = 3(d_{i+1} − d_i)/h_i − 3(d_i − d_{i−1})/h_{i−1},
where i = 2, 3, …, n                                       (4.101.)

b_{n+1} = 0
Having calculated the b_i values, the rest of the unknown coefficients can be
calculated in the same way as in the case of clamped boundary conditions:
the d_i (i = 1, 2, …, n + 1) are equal to the input points, the h_i
(i = 1, 2, …, n) are determined by the spacing, the b_i are provided by
equation (4.101.), the c_i by equation (4.90.) and the a_i by equation (4.87.):

a_i = (b_{i+1} − b_i)/(3h_i),   i = 1, 2, …, n   (4.102.)
Example
The input points already known from Example 3.11 are used, but now we
interpolate with cubic splines with free boundary conditions.
Solution:
Because free boundary conditions are used, b_1 = 0 and b_4 = 0. Consequently,
equation (4.101.) is friendlier now:
𝑏1 = 0
6𝑏2 + 2𝑏3 = 12,3 → 𝑏2 = 2,5548
2𝑏2 + 9𝑏3 = −8,52 → 𝑏3 = −1,5144
𝑏4 = 0
Then calculate 𝑐𝑖 values:
c_1 = (d_2 − d_1)/h_1 − (1/3)(2b_1 + b_2)h_1 = −3,5516

c_2 = (d_3 − d_2)/h_2 − (1/3)(2b_2 + b_3)h_2 = −0,9968

c_3 = (d_4 − d_3)/h_3 − (1/3)(2b_3 + b_4)h_3 = 1,0840
Finally, the a_i values are determined:

a_1 = (b_2 − b_1)/(3h_1) = 0,8516

a_2 = (b_3 − b_2)/(3h_2) = −0,6782

a_3 = (b_4 − b_3)/(3h_3) = 0,2019
So, the 3 cubic splines are:
S_1(x) = 0,8516(x − 2)³ − 3,5516(x − 2) + 5;   2 ≤ x ≤ 3
S_2(x) = −0,6782(x − 3)³ + 2,5548(x − 3)² − 0,9968(x − 3) + 2,3;   3 ≤ x ≤ 5
S_3(x) = 0,2019(x − 5)³ − 1,5144(x − 5)² + 1,0840(x − 5) + 5,1;   5 ≤ x ≤ 7,5
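The free-boundary calculation can be reproduced end to end in a short Python sketch for this 4-point example: with b_1 = b_4 = 0 the interior equations of (4.101.) reduce to a 2×2 system, solved here by Cramer's rule, after which the c and a coefficients follow from (4.90.) and (4.102.).

```python
# Free-boundary (natural) cubic spline coefficients for the 4-point example.
x = [2, 3, 5, 7.5]
d = [5, 2.3, 5.1, 1.5]                      # d_i = y_i
h = [x[i + 1] - x[i] for i in range(3)]     # [1, 2, 2.5]

# Interior equations (i = 2, 3) with b1 = b4 = 0 form a 2x2 system.
A = [[2 * (h[1] + h[0]), h[1]],
     [h[1], 2 * (h[2] + h[1])]]
r = [3 * (d[2] - d[1]) / h[1] - 3 * (d[1] - d[0]) / h[0],
     3 * (d[3] - d[2]) / h[2] - 3 * (d[2] - d[1]) / h[1]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
b2 = (r[0] * A[1][1] - A[0][1] * r[1]) / det
b3 = (A[0][0] * r[1] - r[0] * A[1][0]) / det
b = [0.0, b2, b3, 0.0]

c = [(d[i + 1] - d[i]) / h[i] - (2 * b[i] + b[i + 1]) * h[i] / 3
     for i in range(3)]
a = [(b[i + 1] - b[i]) / (3 * h[i]) for i in range(3)]
print([round(v, 4) for v in b])  # [0.0, 2.5548, -1.5144, 0.0]
print([round(v, 4) for v in c])  # [-3.5516, -0.9968, 1.084]
print([round(v, 4) for v in a])  # [0.8516, -0.6782, 0.2019]
```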
The difference between the clamped and free boundary conditions can be
observed in the following diagram:
5 Numerical Derivation
In everyday engineering problems it is quite common that a known (or
interpolated) function’s slope or curvature must be determined. Of course,
higher derivatives could be needed too, but generally the first two
derivatives are sought. Contrary to integration, differentiation is relatively
easy, with fewer rules to apply; in most cases the calculation can even be
carried out analytically for simpler functions and polynomials. In the case of
more complex functions, on the other hand, the conventional method is not
always efficient, since numerical methods can provide results in a shorter
time with less computational effort. By discretizing the function, its
derivatives can be determined at certain points. This assumption is even more
reasonable when only input points, and not a function, are available. Of
course, the derivatives could be calculated by applying an interpolation
method to the input points and then differentiating the interpolated function,
but there are more efficient methods, too. In this subchapter, we focus on
these latter solutions.
The point of the finite difference formulas is that at a given point x_i the
derivatives are approximated by using the neighbouring function values. In
order to calculate the differences, Taylor series are used. Four examples are
introduced in this chapter, but the rest can be derived in the same way.
The differences between the formulas come from the utilization of different
function values, originating from the left- or right-hand side of the
point x_i. Accordingly, we call them forward, backward or central formulas.
The basis of numerical differentiation is calculus, the study of continuous
change. Generally speaking, differentiation concerns the rates of change of
curves, while integration concerns the summation of quantities under curves.
It has a wide range of applications.
The difference approximation can be defined by:

Δy/Δx = [f(x_i + Δx) − f(x_i)] / Δx   (5.1.)
where 𝑦 and 𝑓(𝑥) are alternative representations of the dependent variable
and x is the independent variable. If we let Δx approach zero, the difference
becomes a derivative:

dy/dx = lim_{Δx→0} [f(x_i + Δx) − f(x_i)] / Δx   (5.2.)

where dy/dx (sometimes designated as y′ or f′(x_i)) is the first derivative of
y with respect to x evaluated at x_i.
5.1 2-Point Backward Difference
The value of f(x_{i−1}) is approximated by expanding the Taylor series about
the point x_i, using the step h = x_i − x_{i−1}:

f(x_{i−1}) = f(x_i) − h f′(x_i) + (1/2!) h² f″(x_i) − (1/3!) h³ f‴(x_i) + ⋯   (5.3.)

Keeping only the linear term in the expression, we get:

f(x_{i−1}) = f(x_i) − h f′(x_i) + (1/2!) h² f″(ζ)   (5.4.)

where the third term is the remainder, with x_{i−1} ≤ ζ ≤ x_i. By solving
equation (5.4.) it can be written:

f′(x_i) = [f(x_i) − f(x_{i−1})]/h + (1/2!) h f″(ζ)   (5.5.)
where the second term gives the truncation error. By neglecting this term, the approximation of the first derivative can be expressed. Of course, do not forget that the truncation error is strongly dependent on the order of h and can be expressed as O(h), so the two-point backward difference equation can be written as:

f'(x_i) = [f(x_i) - f(x_{i-1})]/h + O(h)    (5.6.)
Since ζ is not known exactly, neither is O(h). But the smaller h becomes, the smaller and more negligible O(h) is.
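The O(h) behaviour is easy to verify numerically. The book's examples use MATLAB, but the idea is language-neutral; the following Python sketch (our own illustration, with sin x as an arbitrary test function) shows that halving h roughly halves the error:

```python
import math

def backward_diff(f, x, h):
    """2-point backward difference: [f(x) - f(x - h)] / h, error O(h)."""
    return (f(x) - f(x - h)) / h

# Halving h should roughly halve the error for a first-order method.
x = 1.0
exact = math.cos(x)                       # d/dx sin(x) = cos(x)
err_h = abs(backward_diff(math.sin, x, 0.10) - exact)
err_h2 = abs(backward_diff(math.sin, x, 0.05) - exact)
ratio = err_h / err_h2                    # close to 2 for an O(h) method
```

With h = 0.1 the error ratio comes out near 2, exactly as the O(h) truncation term predicts.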
Similarly, for the 2-point forward difference, the value of f(x_{i+1}) is approximated by the Taylor-series generated around x_i, with the step h = x_{i+1} - x_i:

f(x_{i+1}) = f(x_i) + h f'(x_i) + (1/2!) h^2 f''(x_i) + (1/3!) h^3 f'''(x_i) + ...    (5.7.)
Keeping only the linear term:

f(x_{i+1}) = f(x_i) + h f'(x_i) + (1/2!) h^2 f''(ζ)    (5.8.)

where the third term is still the remainder, with x_i ≤ ζ ≤ x_{i+1}. By solving the equation:
f'(x_i) = [f(x_{i+1}) - f(x_i)]/h - (1/2!) h f''(ζ)    (5.9.)
The second term is still the truncation error. By neglecting it, similarly to the case of the backward formula, the 2-point forward formula can be written as:

f'(x_i) = [f(x_{i+1}) - f(x_i)]/h + O(h)    (5.10.)
For the 2-point central difference, the Taylor-series with the remainder term is written in both directions:

f(x_{i-1}) = f(x_i) - h f'(x_i) + (1/2!) h^2 f''(x_i) - (1/3!) h^3 f'''(ζ);  x_{i-1} ≤ ζ ≤ x_i    (5.11.)

and

f(x_{i+1}) = f(x_i) + h f'(x_i) + (1/2!) h^2 f''(x_i) + (1/3!) h^3 f'''(φ);  x_i ≤ φ ≤ x_{i+1}    (5.12.)
By subtracting the first equation from the second we get:

f(x_{i+1}) - f(x_{i-1}) = 2h f'(x_i) + (1/3!) h^3 [f'''(ζ) + f'''(φ)]    (5.13.)
Finally, f'(x_i) can be expressed easily:

f'(x_i) = [f(x_{i+1}) - f(x_{i-1})]/(2h) + O(h^2)    (5.14.)
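The difference between O(h) and O(h^2) accuracy is visible even at a coarse step. A short Python sketch (our own, not from the text) compares the forward and central formulas on sin x:

```python
import math

def forward_diff(f, x, h):
    """2-point forward difference, truncation error O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """2-point central difference, truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x, h = 1.0, 0.1
exact = math.cos(x)
e_fwd = abs(forward_diff(math.sin, x, h) - exact)   # ~ 4e-2
e_cen = abs(central_diff(math.sin, x, h) - exact)   # ~ 9e-4
```

At h = 0.1 the central formula is already almost fifty times more accurate, in line with the error orders derived above.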
Okay, but what is this whole thing about? As it can be observed, the truncation error depends on h^2, so it will be significantly lower than in the case of the forward or backward difference formulas, making this a more accurate approximation. Let's suppose that x_1, x_2, ..., x_n are given. Then, of the already known methods, the 2-point backward formula cannot be used in the point x_1, since no data is available in x_0. But this method can be used in all the other points with O(h) truncation error. The same thing is true for the 2-point forward difference: it can be used in every point, except in x_n, since there is no x_{n+1}. Similarly, the 2-point central difference cannot be used in x_1 and x_n, but anywhere else, furthermore with O(h^2) truncation error. The properties of the difference formulas can be seen in Figure 20. These rules should be kept in mind in the case of this kind of problem.
For the 3-point backward difference, f(x_{i-1}) is again approximated from the point x_i:

f(x_{i-1}) = f(x_i) - h f'(x_i) + (1/2!) h^2 f''(x_i) - (1/3!) h^3 f'''(ζ);  x_{i-1} ≤ ζ ≤ x_i    (5.15.)

Then, f(x_{i-2}) is approximated from the point x_i:

f(x_{i-2}) = f(x_i) - (2h) f'(x_i) + (1/2!) (2h)^2 f''(x_i) - (1/3!) (2h)^3 f'''(φ);  x_{i-2} ≤ φ ≤ x_i    (5.16.)
By subtracting 4 times equation (5.15.) from equation (5.16.), the following expression can be written:

f(x_{i-2}) - 4f(x_{i-1}) = -3f(x_i) + 2h f'(x_i) + (4/3!) h^3 f'''(ζ) - (8/3!) h^3 f'''(φ)    (5.17.)

Then, f'(x_i) is expressed:

f'(x_i) = [f(x_{i-2}) - 4f(x_{i-1}) + 3f(x_i)]/(2h) + O(h^2)    (5.18.)
Second derivative, 3-points formula
Forward difference:  y''_i = (y_{i+2} - 2y_{i+1} + y_i)/h^2
Central difference:  y''_i = (y_{i+1} - 2y_i + y_{i-1})/h^2
Backward difference: y''_i = (y_i - 2y_{i-1} + y_{i-2})/h^2
Second derivative, 4-points formula
Forward difference:  y''_i = (2y_i - 5y_{i+1} + 4y_{i+2} - y_{i+3})/h^2
Second derivative, 5-points formula
Central difference:  y''_i = (-y_{i+2} + 16y_{i+1} - 30y_i + 16y_{i-1} - y_{i-2})/(12h^2)
Third derivative, 4-points formula
Forward difference:  y'''_i = (y_{i+3} - 3y_{i+2} + 3y_{i+1} - y_i)/h^3
Table 5: Numerical difference formulas
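The 3-point central second-derivative formula from Table 5 can be checked with a few lines of Python (our own sketch; the test functions are arbitrary):

```python
import math

def second_central(f, x, h):
    """3-point central formula for the second derivative:
    [f(x+h) - 2 f(x) + f(x-h)] / h^2, with O(h^2) error."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# d^2/dx^2 sin(x) = -sin(x); at x = 1, h = 0.01 the error is ~1e-5
approx = second_central(math.sin, 1.0, 0.01)
```

For a quadratic such as f(x) = x^2 the formula is exact (up to rounding), since the truncation error contains only fourth and higher derivatives.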
There are two functions in MATLAB that make our lives easier: diff and polyder.
If X is a vector, diff(X) returns the vector of differences between adjacent elements, which is one element shorter than X.
By entering diff(X,n), the function executes the differencing recursively n times, calculating the nth difference.
Example [3]
Approximate the first and second differences of a function given only at the discrete points x = 1.2, 1.4, 1.6, 1.8, using diff.
Solution:
>> h = 0.2;
>> x = 1.2:h:1.8;
>> y = [0.1701 0.1589 0.1448 0.1295];
% Values of f at the discrete x values
>> y_prime = diff(y)./h
y_prime =
   -0.0560 -0.0705 -0.0765
>> y_second = diff(y,2)
y_second =
   -0.0029 -0.0012
The first derivative can also be approximated by the central difference

f'(x) ≈ [f(x + h) - f(x - h)]/(2h)

where h is the initial stepsize.
The solution has two phases:
1: computing a sequence of approximations to f'(x) using several values of h;
2: utilising Richardson's extrapolation.
Consider the function f(x) = e^(-x^2); the comparison should be done against the exact values of f'(x) at x = 0.1, 0.2, ..., 1.0. Furthermore, h = 0.01 and n = 10.
Phase 1:
function der = numder(fun, x, h, n, varargin)
% Approximation der of the first order derivative, at
% the point x, of a function named by the string fun.
% Parameters h and n are user supplied values of the
% initial stepsize and the number of performed
% iterations in the Richardson extrapolation.
% For functions that depend on parameters their values
% must follow the parameter n.
d = [];
for i=1:n
  s = (feval(fun,x+h,varargin{:})-feval(fun,x-h,varargin{:}))/(2*h);
  d = [d;s];
  h = .5*h;
end
l = 4;
for j=2:n
  s = d(j:n) + diff(d(j-1:n))/(l - 1);
  d(j:n) = s;
  l = 4*l;
end
der = d(n);
Phase 2:
function testnder(h, n)
% Test file for the function numder. The initial
% stepsize is h and the number of iterations is n.
% Function to be tested is f(x) = exp(-x^2).
format long
disp('    x          numder          exact')
disp(sprintf('\n____________________________________________________'))
for x=.1:.1:1
  s1 = numder('exp2', x, h, n);
  s2 = derexp2(x);
  disp(sprintf('%1.14f  %1.14f  %1.14f',x,s1,s2))
end
6 Numerical Integration
The process of integration is a widely used mathematical tool, which occurs in a wide range of engineering problems, since calculating the area underneath a curve is fundamental to performance computations and numerous other applications (thermal state changes, determining torque from shear forces, etc.). At the same time, it is a common situation that the function determining the curve is itself unknown – as when evaluating experimental data –, so either an approximation can be applied and the computed curve used for the analyses, or a numerical integration method can be applied, which skips the approximation step and uses the input data directly to determine the searched area. Additionally, we should not forget that the integration of special functions is not easy at all, and there are also functions that cannot be integrated analytically.
As a reminder: the definite integral means the following problem: determine ∫_a^b f(x) dx, the area under the curve of f over [a, b].
Example: MATLAB Built-in functions
For example, consider the function f(x) = e^(-x^2) (ln x)^2.
6.1 Newton-Cotes Formulas
The most commonly used numerical integration formulas are the Newton-
Cotes formulas, which can be divided into two categories: opened and closed
formulas. The main difference between them is relatively easy-to-understand:
in the case of open formulas the endpoints of the interval are not used during
the computation, while the closed formulas incorporate the endpoints, too.
Open formulas are the Gauss-quadrature formula while the trapezoidal and
Simpson rules are closed formulas.
The initial idea of the Newton-Cotes formulas is really simple: instead of integrating the real function, it is replaced with a simple polynomial (zero, first, second or third degree), which is then integrated. This is possible in two ways: if the function itself is available, then input points are generated by discretization and a curve is fitted by interpolation, which can finally be integrated; or if the points are already available, then the first step can be skipped.
Figure 21: Illustration of the rectangular rule
1 http://www.inf.u-szeged.hu/~kgelle/sites/default/files/upload/11_numerikus_integralas_0.pdf
Without proving it mathematically, the composite rectangular rule has the following error for the left-hand (and right-hand) rule:

E = [(1/2)(b - a) f̄'] h = O(h)    (6.9.)

where f̄' stands for the average value of f' over the full interval, obtained from the Taylor-series.
The composite rectangular method has the following error for the midpoint rule:

E = [(1/24)(b - a) f̄''] h^2 = O(h^2)    (6.10.)

where f̄'' is the estimated value of f'' over the full [a, b] interval.
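The order gap between the endpoint and midpoint variants is easy to observe. A minimal Python sketch (our own, using ∫_1^2 (1/x) dx = ln 2 as the test integral):

```python
import math

def left_rect(f, a, b, n):
    """Composite left-hand rectangular rule, error O(h)."""
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def midpoint_rect(f, a, b, n):
    """Composite midpoint rectangular rule, error O(h^2)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

exact = math.log(2)                      # integral of 1/x over [1, 2]
e_left = abs(left_rect(lambda t: 1 / t, 1, 2, 100) - exact)
e_mid = abs(midpoint_rect(lambda t: 1 / t, 1, 2, 100) - exact)
```

With n = 100 subintervals the midpoint error is several orders of magnitude below the left-hand rule's, as the O(h^2) versus O(h) error terms predict.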
For example, to approximate ∫_1^2 (1/x) dx with n = 5 and n = 10, we use:
area10 =
0.693917602005837
Similarly, approximating f(t) = ∫_0^t e^(-τ^2) dτ, when n = 10 and t = 2:
>> t=linspace(0,2,10); y=exp(-t.^2); area=trapz(t,y)
area =
0.881782381062206
T_n = (L_n + R_n)/2    (6.11.)

From which the geometrical interpretation is the straight line connecting the endpoints:

A(x) = f(a) + [f(b) - f(a)]/(b - a) · (x - a)    (6.12.)
This could lead to the same problem as in the earlier case: looking for a global
solution over one big interval ([𝑎, 𝑏]) leads to a relatively big error (see Figure
23 left-hand side). The solution is also similar, like in the previous subchapter:
the same rule should be applied over more, narrower subintervals, so we can
write the rule for these individual pieces (Figure 23 right-hand side):
A_i = Δx_i [f(x_{i-1}) + f(x_i)]/2    (6.13.)
This way, the expression of the composite trapezoidal rule can be generated:

T_n = Σ_{i=1}^{n} Δx_i [f(x_i) + f(x_{i+1})]/2    (6.14.)

If the subdivision of the interval is done equidistantly, keeping in mind that x_{i+1} - x_i = h (i = 1, 2, ..., n), then the formula is:

T_n = (h/2) Σ_{i=1}^{n} [f(x_i) + f(x_{i+1})] = (h/2) [f(a) + 2f(x_2) + 2f(x_3) + ... + 2f(x_n) + f(b)]    (6.15.)
Figure 23: The simple (left) and composite (right) trapezoidal rules
The error of the composite trapezoidal rule is:

E = [-(1/12)(b - a) f̄''] h^2 = O(h^2)

where f̄'' is the estimated value of f'' over the interval [a, b]. Therefore, the O(h^2) error is comparable with the midpoint method and superior to the rectangular rule using the endpoints, whose error is O(h).
Example
Create a function, which uses the composite trapezoidal rule to estimate the
value of a definite integral.
Solution:
function I = TrapComp(f,a,b,n)
%
% TrapComp estimates the value of the integral of
% f(x) from a to b by using the composite trapezoidal
% rule applied to n equal-length subintervals.
% I = TrapComp(f,a,b,n) where f is an inline function
% representing the integrand, a and b are the limits
% of integration, n is the number of equal-length
% subintervals in [a,b], I is the integral estimate.
%
h = (b-a)/n; I = 0;
x = a:h:b;
for i = 2:n,
I = I + 2*f(x(i));
end
I = I + f(a) + f(b);
I = I*h/2;
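A rough Python equivalent of TrapComp (our own sketch, not the book's code) shows how little the composite rule needs:

```python
import math

def trap_comp(f, a, b, n):
    """Composite trapezoidal rule on n equal-length subintervals of [a, b]:
    (h/2) [f(a) + 2 f(x_2) + ... + 2 f(x_n) + f(b)]."""
    h = (b - a) / n
    s = f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, n))
    return s * h / 2
```

For example, trap_comp(lambda x: 1/x, 1, 2, 100) approximates ln 2 with an error of about 6e-6, consistent with the O(h^2) estimate.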
By fitting higher-degree polynomials on the input points, the accuracy can be higher as a consequence. In this chapter, the 1/3 and 3/8 Simpson methods are introduced, which apply quadratic and cubic polynomials fitted on the input points. Of course, the goal is still the same: to develop a fast, easy-to-use method for numerical integration.
Before constructing the Simpson rules, it is important to clarify: from now on
the used formulas are quadrature formulas. The definite integral of an 𝑓
function over [𝑎, 𝑏] interval can be approximated with a quadrature
formula, if:
- the input points (base points) are known, or can be calculated
(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), for which 𝑎 = 𝑥1 < 𝑥2 < ⋯ < 𝑥𝑛 = 𝑏,
- a 𝑤𝑖 weight is defined for each 𝑥𝑖 ,
- in this case the quadrature formula is:

Q_n(f) = Σ_{i=1}^{n} w_i f(x_i)

The quadrature formula is an interpolating quadrature formula if the weights come from integrating the Lagrange interpolating polynomial, where L_i(x) denotes the ith Lagrange base polynomial. The integral value is:

∫_a^b f(x) dx ≈ ∫_a^b Σ_{i=1}^{n} f(x_i) L_i(x) dx = Σ_{i=1}^{n} f(x_i) ∫_a^b L_i(x) dx

That is, the interpolating quadrature formula is nothing else than a quadrature formula with the weights w_i = ∫_a^b L_i(x) dx.
Newton-Cotes formulas
Since the quality and distribution of the base points are essential in the case of interpolation, the same is true for numerical integration. In the case of Lagrange interpolation the weights can be generated in a clear way.
As a note, this chapter could be much more detailed, since the theoretical fundamentals are really complex, but this book focuses rather on the engineering applications.
Figure 24: The simple (left) and composite (right) 1/3 Simpson rules
The simplest method to fit a parabolic curve on three points is to use the second degree Lagrange interpolating polynomial, which can be constructed as follows:

p_2(x) = [(x - x_2)(x - x_3)]/[(x_1 - x_2)(x_1 - x_3)] f(x_1) + [(x - x_1)(x - x_3)]/[(x_2 - x_1)(x_2 - x_3)] f(x_2) + [(x - x_1)(x - x_2)]/[(x_3 - x_1)(x_3 - x_2)] f(x_3)    (6.20.)
By substituting x_1 = a, x_2 = (a + b)/2, x_3 = b into the polynomial p_2(x) and completing the integration from a to b, we get the following expression, which is called the simple 1/3 Simpson rule:

∫_a^b f(x) dx ≅ (h/3) [f(x_1) + 4f(x_2) + f(x_3)]    (6.22.)

where h = (b - a)/2. The name of the method comes from the 1/3 constant, of course.
In the case of the composite 1/3 Simpson rule the interval [a, b] is divided into n subintervals, for which n + 1 (!) points are used: a = x_1, x_2, ..., x_n, x_{n+1} = b. Although the widths of the subintervals could differ, the equidistant division (h = (b - a)/n) makes the process much easier. Since three points are required to construct a parabolic curve, it is essential to know that the 1/3 Simpson rule can be applied only if the number of subintervals is even. This way the composite 1/3 Simpson rule can be written in the following way:
∫_a^b f(x) dx ≅ (h/3)[f(x_1) + 4f(x_2) + f(x_3)] + (h/3)[f(x_3) + 4f(x_4) + f(x_5)] + ... + (h/3)[f(x_{n-1}) + 4f(x_n) + f(x_{n+1})]    (6.23.)
As it can be seen, during the summation the function values with even indexes (x_2, x_4, ..., x_n) are added four times, the interior function values with odd indexes (x_3, x_5, ..., x_{n-1}) are added two times, while the first and last function values are added only once. So, the 1/3 Simpson rule can be expressed in a more compact form:
∫_a^b f(x) dx ≅ (h/3) {f(x_1) + 4 Σ_{i=2,4,6,...}^{n} f(x_i) + 2 Σ_{j=3,5,7,...}^{n-1} f(x_j) + f(x_{n+1})}    (6.24.)
Finally, let's figure out the main reason why the 1/3 Simpson rule has become so widespread! To answer this question, we have to determine the accuracy of this method, so let's see the error function:

E = [-(1/180)(b - a) f̄⁽⁴⁾] h^4 = O(h^4)    (6.25.)

where f̄⁽⁴⁾ is the estimated value of f⁽⁴⁾ over the full interval [a, b]. Meaning, the 1/3 Simpson rule has two orders higher accuracy (O(h^4)) compared to the composite trapezoidal rule (O(h^2)), which makes it really favourable.
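The compact form (6.24.) translates almost directly into code. This Python sketch is our own (note the 0-based indexing, in contrast with the book's 1-based node numbering):

```python
import math

def simpson13(f, a, b, n):
    """Composite 1/3 Simpson rule on n (even) equal subintervals:
    (h/3) [f(x_1) + 4*(odd-position interior) + 2*(even-position interior) + f(x_{n+1})]."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))  # book's x_2, x_4, ...
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))  # book's x_3, x_5, ...
    return s * h / 3
```

On ∫_{-1}^{1} 1/(x+2) dx with n = 8 it reproduces ln 3 ≈ 1.0986 to about 1e-4, and it is exact for cubic integrands, as the h^4 error term containing f⁽⁴⁾ suggests.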
Figure 25: The 3/8 Simpson Rule
The 3/8 Simpson method applies third-degree polynomials; four points are required to construct them, so during the division of the interval [a, b], a number of subintervals divisible by 3 has to be generated. After the division, the areas of the subintervals can be summarised as:
∫_a^b f(x) dx ≅ (3h/8)[f(x_1) + 3f(x_2) + 3f(x_3) + f(x_4)] + (3h/8)[f(x_4) + 3f(x_5) + 3f(x_6) + f(x_7)] + ... + (3h/8)[f(x_{n-2}) + 3f(x_{n-1}) + 3f(x_n) + f(x_{n+1})]    (6.29.)
If the terms are merged in the previous expression, it can be seen that the terms with indexes 2, 5, 8, ... and 3, 6, 9, ... are summed and multiplied by 3, while the terms with indexes 4, 7, 10, ... are summed and multiplied by 2. So, the expression in a more compact form can be written as follows:
∫_a^b f(x) dx ≅ (3h/8) {f(x_1) + 3 Σ_{i=2,5,8,...}^{n-1} [f(x_i) + f(x_{i+1})] + 2 Σ_{j=4,7,10,...}^{n-2} f(x_j) + f(x_{n+1})}    (6.30.)
This is the general form of the 3/8 Simpson method, from which the error of the method can be determined as:

E = [-(1/80)(b - a) f̄⁽⁴⁾] h^4 = O(h^4)    (6.31.)

where f̄⁽⁴⁾ is the estimated value of f⁽⁴⁾ over the full [a, b] interval. So, compared to the trapezoidal method, which has O(h^2) accuracy, the 3/8 Simpson method gives O(h^4) accuracy, which is comparable with the 1/3 Simpson method's accuracy.
Example [3]
Create the closed type Newton-Cotes formula as a MATLAB user function, and
define the different methods’ error.
Solution:
The nodes of the n-point formula are defined as x_k = a + (k - 1)h, k = 1, 2, ..., n, where h = (b - a)/(n - 1), n > 1. The weights of the quadrature formula are determined from the conditions that the following equations are satisfied for the monomials f(x) = 1, x, ..., x^(n-1):

∫_a^b f(x) dx = Σ_{k=1}^{n} w_k f(x_k)
So, create the functions:
function [s, w, x] = cNCqf(fun, a, b, n, varargin)
% Numerical approximation s of the definite integral
% of f(x). fun is a string containing the name of the
% integrand f(x).
% Integration is over the interval [a, b].
% Method used:
% n-point closed Newton-Cotes quadrature formula.
% The weights and the nodes of the quadrature formula
% are stored in vectors w and x, respectively.
if n < 2
error(' Number of nodes must be greater than 1')
end
x = (0:n-1)/(n-1);
f = 1./(1:n);
V = vander(x);
V = rot90(V);
w = V\f';
w = (b-a)*w;
x = a + (b-a)*x;
x = x';
s = feval(fun,x,varargin{:});
s = w'*s;
function w = exp2(x)
% The weight function w of the Gauss-Hermite
% quadrature formula.
w = exp(-x.^2);

Then:
>> approx_v = [];
>> for n =2:4
approx_v = [approx_v; (2/sqrt(pi))*cNCqf('exp2', 0, 1, n)];
end
>> approx_v
approx_v =
0.771743332258054
0.843102830042981
0.842890571431721
>> exact_v = erf(1)
exact_v =
0.842700792949715
Example [3]
Evaluate the following integral with the 1/3 Simpson rule, if n = 8:

∫_{-1}^{1} 1/(x + 2) dx
Solution:
function I = Simpson(f,a,b,n)
%
% Simpson estimates the value of the integral of f(x)
% from a to b by using the composite Simpson’s 1/3
% rule applied to n equal-length subintervals.
%
% I = Simpson(f,a,b,n) where
%
% f is an inline function representing the integrand,
% a, b are the limits of integration,
% n is the (even) number of subintervals,
%
% I is the integral estimate.
h = (b-a)/n;
x = a:h:b;
I = 0;
for i = 1:2:n,
  I = I + f(x(i)) + 4*f(x(i+1)) + f(x(i+2));
end
I = I*h/3;
Now create the analysed function, and use the defined function:
>> f = inline('1/(x+2)');
>> I = Simpson(f, -1,1,8)
I =
   1.098725348725349
I ≅ [(h_1/h_2)^2 I_{h_2} - I_{h_1}] / [(h_1/h_2)^2 - 1]    (6.34.)

This approximation has an error of O(h^4). By using the combination h_1 = h and h_2 = h/2, and also knowing that the error of the method is O(h^4), we can write:

I = (4/3) I_{h/2} - (1/3) I_h + O(h^4)    (6.35.)
The same deduction can be carried out for methods with different error orders. In general form, the extrapolation can be written as:

I_{i,j} = [4^(j-1) I_{i+1,j-1} - I_{i,j-1}] / [4^(j-1) - 1]    (6.36.)

Using this formula, a scheme can be created, where the entries I_{1,1}, I_{2,1}, ..., I_{m,1} are written in the first column and estimate the integral by the composite trapezoidal rule with n, 2n, ..., 2^(m-1) n subintervals. In the next column, the entries I_{1,2}, I_{2,2}, ..., I_{m,2} are located, which combine every two successive entries of the first column and give a more accurate approximation. The method is continued until the last column, where there is only one entry: I_{1,m}. This is the so-called Romberg iteration scheme.
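The scheme fits in a few lines of Python (our own sketch; the trapezoidal helper is inlined so the block is self-contained):

```python
def romberg(f, a, b, n, levels):
    """Romberg scheme: first column holds composite trapezoidal estimates
    with n, 2n, 4n, ... subintervals; later columns apply
    I_{i,j} = [4^(j-1) I_{i+1,j-1} - I_{i,j-1}] / [4^(j-1) - 1]."""
    def trap(m):
        h = (b - a) / m
        return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, m)))

    I = [[0.0] * levels for _ in range(levels)]
    for i in range(levels):
        I[i][0] = trap(2**i * n)          # doubled subinterval counts
    for j in range(1, levels):            # j here is the book's j - 1
        for i in range(levels - j):
            I[i][j] = (4**j * I[i + 1][j - 1] - I[i][j - 1]) / (4**j - 1)
    return I
```

On the book's test integrand f(x) = (x^2 + 3x)^2 over [0, 1] with n = 2 and three levels, the first entry of the last column comes out as 4.7, the exact value, matching the MATLAB table above.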
Example [3]
Solution:
The defined function is:
function I = Romberg(f,a,b,n,n_levels)
%
% Romberg uses the Romberg integration scheme to find
% integral estimates at different levels of accuracy.
%
% I = Romberg(f,a,b,n,n_levels) where
%
% f is an inline function representing the integrand,
% a and b are the limits of integration,
% n is the initial number of equal-length
% subintervals in [a,b],
% n_levels is the number of accuracy levels,
%
% I is the matrix of integral estimates.
%
I = zeros(n_levels,n_levels); % Pre-allocate
% Calculate the first-column entries by using the
% composite trapezoidal rule, where the number of
% subintervals is doubled going from one element
% to the next.
for i = 1:n_levels,
n_intervals = 2^(i-1)*n;
I(i,1) = TrapComp(f,a,b,n_intervals);
end
% Starting with the second level, use the Romberg
% scheme to generate the remaining entries of the
% table.
for j = 2:n_levels,
for i = 1:n_levels - j+1,
I(i,j) = (4^(j-1)*I(i+1,j-1)-I(i,j-1))/(4^(j-1)-1);
end
end
For testing use a simple function: f = (x2 + 3x)2 over the interval of [0,1], with n
= 2 and 3 levels of accuracy:
>> format long
>> f = inline('(x^2+3*x)^2');
>> I = Romberg(f,0,1,2,3)
I =
5.531250000000 4.700520833333 4.700000000000
4.908203125000 4.700032552083 0
4.752075195313 0 0
where p(x) is the weight function. The most common weight functions are the following:

weight p(x)      interval [a,b]   Quadrature name
1                [-1,1]           Gauss-Legendre
1/√(1 - x^2)     [-1,1]           Gauss-Chebyshev
e^(-x)           [0,∞)            Gauss-Laguerre
e^(-x^2)         (-∞,∞)           Gauss-Hermite
Table 6: The most commonly used Gaussian quadrature weight functions

The weights of the Gauss formulas are all positive, and the nodes are the roots of the class of polynomials that are orthogonal, with respect to the given weight function p(x), on the associated interval.
For example, for the Gauss-Legendre quadrature the weights can be written as:

w_k = ∫_{-1}^{1} Π_{j=1, j≠k}^{n} (x - x_j)/(x_k - x_j) dx    (6.39.)

The Gauss nodes x_1, x_2, ..., x_n are the zeros of the nth degree Legendre polynomial.
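As a minimal illustration (our own sketch, independent of the MATLAB routines below), the 2-point Gauss-Legendre rule uses the known nodes ±1/√3 with unit weights on [-1, 1], mapped linearly to [a, b]; despite using only two function evaluations, it is exact for polynomials up to degree 3:

```python
import math

def gauss_legendre_2pt(f, a, b):
    """2-point Gauss-Legendre quadrature: nodes +/- 1/sqrt(3), weights 1
    on [-1, 1], mapped to [a, b]; exact for polynomials of degree <= 3."""
    mid, half = (a + b) / 2, (b - a) / 2
    t = 1 / math.sqrt(3)
    return half * (f(mid - half * t) + f(mid + half * t))
```

For instance, gauss_legendre_2pt(lambda x: x**3, 0, 1) returns 1/4, the exact integral, while a 2-point closed Newton-Cotes rule (the trapezoid) would be exact only up to degree 1.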
Example [3]
Solution:
function [s, w, x] = Gquad1(fun, a, b, n, type, varargin)
% Numerical integration using either the Gauss-
% Legendre (type = 'L') or the Gauss-Chebyshev (type
% = 'C') quadrature with n (n > 0) nodes.
% fun is a string representing the name of the
% function that is integrated from a to b. For the
% Gauss - Chebyshev quadrature it is assumed that
% a = -1 and b = 1.
% The output parameters s, w, and x hold the computed
% approximation of the integral, list of weights, and
% the list of nodes, respectively.
d = zeros(1,n-1);
if type == 'L'
k = 1:n-1;
d = k./(2*k - 1).*sqrt((2*k - 1)./(2*k + 1));
fc = 2;
J = diag(d,-1) + diag(d,1);
[u,v] = eig(J);
[x,j] = sort(diag(v));
w = (fc*u(1,:).^2)';
w = w(j)';
w = 0.5*(b - a)*w;
x = 0.5*((b - a)*x + a + b);
else
x = cos((2*(1:n) - (2*n + 1))*pi/(2*n))';
w(1:n) = pi/n;
end
f = feval(fun,x,varargin{:});
138
s = w*f(:);
w = w';
The problem is caused by the integrand, which is singular at the lower limit. To overcome this problem, an open Newton-Cotes formula (like the composite midpoint rule) can be used, which avoids the utilization of the endpoints.
Example
Solution:
In MATLAB, by using h = 0.0125 to estimate the value, the solution is the
following:
To compute double integrals of the form

∬_D f(x, y) dx dy

where D = {(x, y): a ≤ x ≤ b, c ≤ y ≤ d} is the domain of integration, the integral2 function can be used in MATLAB.
q = integral2(fun,xmin,xmax,ymin,ymax) approximates the integral of the
function z = fun(x,y) over the planar region xmin ≤ x ≤ xmax and
ymin(x) ≤ y ≤ ymax(x).
q = integral2(fun,xmin,xmax,ymin,ymax,Name,Value) specifies additional
options with one or more Name,Value pair arguments.
Example
Solution:
function z = esin(x,y)
z = exp(-x.*y).*sin(x.*y);
Example [3]
Solution:
The MATLAB function for the integration is the following:
function [s, w, x] = Gquad2(fun, n, type, varargin)
% Numerical integration using either the Gauss-
% Laguerre(type = 'L') or the Gauss-Hermite (type =
% 'H') with n (n > 0) nodes.
% fun is a string containing the name of the function
% that is integrated.
% The output parameters s, w, and x hold the computed
% approximation of the integral, list of weights, and
% the list of nodes, respectively.
if type == 'L'
d = -(1:n-1);
f = 1:2:2*n-1;
fc = 1;
else
d = sqrt(.5*(1:n-1));
f = zeros(1,n);
fc = sqrt(pi);
end
J = diag(d,-1) + diag (f) + diag(d,1);
[u,v] = eig(J);
[x,j] = sort(diag(v));
w = (fc*u(1,:).^2)';
w = w(j);
f = feval(fun,x,varargin{:});
s = w'*f(:);
function y = mygamma(t)
% Value(s) y of the Euler's gamma function evaluated
% at t (t > -1).
td = t - fix(t);
if td == 0
n = ceil(t/2);
else
n = ceil(abs(t)) + 10;
end
y = Gquad2('pow',n,'L',t-1);
A simple power function:
function z = pow(x, e)
% Power function z = x^e
z = x.^e;
disp('    t          mygamma          gamma')
disp(sprintf('\n_____________________________________________________'))
for t=1:.1:2
s1 = mygamma(t);
s2 = gamma(t);
disp(sprintf('%1.14f %1.14f %1.14f',t,s1,s2))
end
7 Numerical Solution of Differential Equations – Initial
Value Problems
When solving ordinary differential equations (ODEs) numerically, the most common choice is to take a bottom-up approach. This involves evaluating the highest order of the differential and working back step by step to the final level of equations for each value of the independent variable (most commonly time). Once one state of the system is solved by this approach, all the other states can be calculated by varying the independent variable and implementing a time-marching solution. Note that it is referred to as time marching, although the independent variable might not actually be defined as time.
Due to the nature of the differentiation process and its reverse, the integration, these equations have an infinite number of solutions, as the integration constant can take an arbitrary value and still satisfy the differential equation. To ensure that a specific solution is arrived at, an nth order differential equation requires n additional boundary conditions to allow the definition of these integration constants. By far the most common approach is when all these boundary conditions are provided at the beginning of the system; these can be referred to as starting conditions. Note however, that the integration constants can be defined in any state of the system, and they need not all be defined in the same state. If they are all defined in the same state, we call the problem an initial-value problem (IVP); if they are defined in the first and last states of the system, we speak of a boundary-value problem (BVP); otherwise they are just general boundary conditions.
Generally, an nth order ordinary differential equation can be described as the
following:
𝑦 (𝑛) = 𝑓(𝑥, 𝑦, 𝑦 ′ , 𝑦 ′′ , … , 𝑦 (𝑛−1) ) (7.1.)
with the corresponding n boundary conditions (in this case for simplicity defined as starting conditions):

y^(i)(x_0) = y_i;  i = 0 ... n - 1    (7.2.)
and the independent variable x of the system is bound by both sides as:
𝑥0 ≤ 𝑥 ≤ 𝑥𝑚 (7.3.)
For many of the presented methods it is going to be assumed that there is a uniform spacing between the independent variable points, such that x_j = x_{j-1} + Δx for all m ≥ j > 0. When numerically solving the equation, the objective is to provide an acceptable estimate of the function value y and its derivatives at all points of the independent variable.
There are many available numerical applications which offer powerful solver algorithms. The sequential, repetitive time-marching nature of the solution process lends itself to a visually straightforward implementation in spreadsheet programs, such as Microsoft Excel. Packages such as MATLAB offer various forms of ODE solvers, catering for many levels of precision and performance requirements and for various system behaviours. There are various other packages available in many programming languages, some freely available, such as ODEPACK in Fortran, Boost in C++, various packages in Java, SciPy in Python, and similar.
Example [3]
Then, in order to test it, let's solve the following initial-value problem:

y' + y = 2x,  y(0) = 1,  0 ≤ x ≤ 1
So, we create the following script: Euler_01.m
% Testing of EulerODE function by solving
% y' + y = 2x, y(0) = 1, 0 <= x <= 1.
% The exact solution is y_exact(x)=2x+3*exp(-x)-2.
disp(' x yEuler yExact')
h = 0.1; x = 0:h:1; y0 = 1;
f = inline('-y+2*x','x','y');
yEuler = EulerODE(f,x,y0);
yExact = inline('2*x+3*exp(-x)-2');
for k = 1:length(x)
x_coord = x(k);
yE = yEuler(k);
yEx = yExact(x(k));
fprintf('%6.2f %11.6f %11.6f\n',x_coord,yE,yEx)
end
>> Euler_01
x yEuler yExact
0.00 1.000000 1.000000
0.10 0.900000 0.914512
0.20 0.830000 0.856192
0.30 0.787000 0.822455
0.40 0.768300 0.810960
0.50 0.771470 0.819592
0.60 0.794323 0.846435
0.70 0.834891 0.889756
0.80 0.891402 0.947987
0.90 0.962261 1.019709
1.00 1.046035 1.103638
It can be seen that the biggest relative error is approx. 6.2% at x = 0.7. By reducing the step size h this error can also be reduced: for example, at h = 0.05 the largest relative error is 3% at x = 0.65.
The percent relative errors at all x_i can be calculated by creating the next script: Euler_error
By executing it:
>> Euler_error
x yEuler yExact e_local e_global
0.00 1.000000 1.000000 0.00 0.00
0.10 0.900000 0.914512 1.59 1.59
0.20 0.830000 0.856192 1.53 3.06
0.30 0.787000 0.822455 1.44 4.31
0.40 0.768300 0.810960 1.33 5.26
0.50 0.771470 0.819592 1.19 5.87
0.60 0.794323 0.846435 1.04 6.16
0.70 0.834891 0.889756 0.90 6.17
0.80 0.891402 0.947987 0.76 5.97
0.90 0.962261 1.019709 0.64 5.63
1.00 1.046035 1.103638 0.53 5.22
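The Euler recursion y_{i+1} = y_i + h f(x_i, y_i) is easy to reproduce in any language. This Python sketch (our own, not the book's code) regenerates the table's values for the same test problem:

```python
import math

def euler_ode(f, x, y0):
    """Explicit Euler method for y' = f(x, y) on mesh x with y(x[0]) = y0."""
    y = [y0]
    for k in range(len(x) - 1):
        h = x[k + 1] - x[k]
        y.append(y[k] + h * f(x[k], y[k]))
    return y

# y' = -y + 2x, y(0) = 1; exact solution y = 2x + 3 e^{-x} - 2
xs = [i / 10 for i in range(11)]
ys = euler_ode(lambda x, y: -y + 2 * x, xs, 1.0)
```

The computed ys match the yEuler column above: ys[1] = 0.9, ys[10] ≈ 1.046035.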
The Second Order Taylor Method keeps the first two orders from the Taylor’s
series expansion, so the linear and the quadratic components. As such, the
estimation of the next system state can be written as:
y_{i+1} = y_i + Δx y_i' + (1/2!) Δx^2 y_i''    (7.6.)
Where
𝑦𝑖 ′ = 𝑓(𝑥𝑖 , 𝑦𝑖 ), 𝑦𝑖 ′′ = 𝑓 ′ (𝑥𝑖 , 𝑦𝑖 ) (7.7.)
It can be seen, that the method only differs from the Euler’s method (First
Order Taylor Method) by the additional second order term. Following the
analogy in the previous chapter, the second order term allows us to
approximate a curved line with a parabola (2nd order curve) rather than a
straight line (1st order curve). Again, for low curvatures, and small steps, the
parabola can be a good approximation, and approximates the solution well.
In the case of high and changing curvatures, the method either needs small
timesteps, or has to include the higher order terms for better fit.
For the sake of completeness, a general kth order Taylor method would be of the following form:

y_{i+1} = y_i + Δx y_i' + (1/2!) Δx^2 y_i'' + ... + (1/k!) Δx^k y_i^(k)    (7.8.)
Where
𝑦𝑖 ′ = 𝑓(𝑥𝑖 , 𝑦𝑖 ), 𝑦𝑖 ′′ = 𝑓 ′ (𝑥𝑖 , 𝑦𝑖 ) … 𝑦𝑖 (𝑘) = 𝑓 (𝑘−1) (𝑥𝑖 , 𝑦𝑖 ) (7.9.)
The following section will demonstrate the use of the Second Order Taylor
Method in MATLAB on an example problem.
Example [3]
Continuing the example from the previous sub-chapter, the following script
calculates and represents the relative error of the Second Order Taylor
Method and compares the values to the Euler Method solution:
Euler_Taylor.m
disp('    x       yEuler     yTaylor2   e_Euler   e_Taylor2')
h = 0.1; x = 0:h:1; y0 = 1;
f = inline('-y+2*x','x','y'); fp = inline('y-2*x+2','x','y');
yEuler = EulerODE(f,x,y0); yExact = inline('2*x+3*exp(-x)-2');
yTaylor2 = 0*x; yTaylor2(1) = y0;
for n = 1:length(x)-1
  yTaylor2(n+1) = yTaylor2(n)+h*(f(x(n),yTaylor2(n))+(1/2)*h*fp(x(n),yTaylor2(n)));
end
for k = 1:length(x)
x_coord = x(k);
yE = yEuler(k);
yEx = yExact(x(k));
yT = yTaylor2(k);
e_Euler = (yEx-yE)/yEx*100;
e_Taylor2 = (yEx-yT)/yEx*100;
fprintf('%6.2f %11.6f %11.6f %6.2f %6.2f\n',x_coord,yE,yT,e_Euler,e_Taylor2)
end
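The second-order Taylor step inside the loop above can be sketched in Python as well (our own illustration, using the same test problem, where y' = -y + 2x gives y'' = y - 2x + 2):

```python
import math

def taylor2_ode(f, fp, x, y0):
    """Second Order Taylor method for y' = f(x, y):
    y_{i+1} = y_i + h*f + (h^2/2)*f', where fp(x, y) supplies f' = y''."""
    y = [y0]
    for k in range(len(x) - 1):
        h = x[k + 1] - x[k]
        y.append(y[k] + h * f(x[k], y[k]) + 0.5 * h * h * fp(x[k], y[k]))
    return y

# Test problem: y' = -y + 2x, y(0) = 1, so y'' = y - 2x + 2
xs = [i / 10 for i in range(11)]
ys = taylor2_ode(lambda x, y: -y + 2 * x,
                 lambda x, y: y - 2 * x + 2, xs, 1.0)
```

At x = 1 the result is within about 0.2% of the exact value 3e^{-1} ≈ 1.103638, a large improvement over the roughly 5% error of the first-order Euler method at the same step size.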
y_{i+1} = y_i + Δx φ_i    (7.10.)

where, as before,

y' = f(x, y),  y(x_0) = y_0    (7.11.)

but the extrapolation term φ_i used to approximate the next state of the system is not the derivative f_i, rather some approximated slope function. The main difference between the methods shown in this chapter is how this function is defined.
It is by this choice that the various RK2 methods are differentiated. Neglecting the derivation, the constraints on the coefficients are expressed in the following form:

a_1 + a_2 = 1,  a_2 b_1 = 1/2,  a_2 c_11 = 1/2    (7.16.)

In the following sections, the most common RK2 methods and the constant values used are described.
7.1.3.3 Ralston's Method
Ralston's Method assumes a_2 = 2/3, from which a_1 = 1/3, b_1 = 3/4 and c_11 = 3/4, which gives the following specific implementation:

y_{i+1} = y_i + (1/3) Δx (k_1 + 2k_2)    (7.25.)

where

k_1 = f(x_i, y_i)    (7.26.)
k_2 = f(x_i + (3/4) Δx, y_i + (3/4) k_1 Δx)    (7.27.)
Table 7 provides a summary of the methods, and the constants used.
Method             a_1   a_2   b_1   c_11
Improved Euler's   0     1     1/2   1/2
Heun's             1/2   1/2   1     1
Ralston's          1/3   2/3   3/4   3/4
Table 7: Second Order Runge-Kutta coefficients
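A single generic RK2 step parameterized by the coefficients of Table 7 can be sketched in Python (our own illustration, not the book's code):

```python
def rk2_step(f, x, y, h, a1, a2, b1, c11):
    """One generic second-order Runge-Kutta step:
    y_{i+1} = y_i + h (a1 k1 + a2 k2), with
    k1 = f(x, y) and k2 = f(x + b1 h, y + c11 h k1)."""
    k1 = f(x, y)
    k2 = f(x + b1 * h, y + c11 * h * k1)
    return y + h * (a1 * k1 + a2 * k2)

# Coefficient sets (a1, a2, b1, c11) from Table 7
HEUN = (0.5, 0.5, 1.0, 1.0)
RALSTON = (1 / 3, 2 / 3, 0.75, 0.75)
```

For the earlier test problem y' = -y + 2x with y(0) = 1 and h = 0.1, one Heun step gives 0.915, matching the yHeun column in the example below; both coefficient sets satisfy the constraints a_1 + a_2 = 1, a_2 b_1 = 1/2, a_2 c_11 = 1/2.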
Problems where the mass matrix is singular are known as differential-algebraic equations (DAEs). Specify the mass matrix using the Mass option of odeset.
Example [3]
Find the numerical solution of y at t =0, 0.25, 0.5, 0.75, 1 to the following
problem:
𝑦 ′ = −2𝑡𝑦 2
with the initial condition of y(0) = 1. Use both ode23 and ode45 solvers.
Solution:
The exact solution of this problem is:
𝑦(𝑡) = 1/(1 + 𝑡²)
In MATLAB, let’s first create the equation:
function dy = eq1(t,y)
% The m-file for the ODE y' = -2ty^2.
dy = -2*t.*y(1).^2;
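ode23 and ode45 are adaptive Runge–Kutta solver pairs. As a rough fixed-step stand-in (a sketch, not the MATLAB solvers themselves), a classic fourth-order Runge–Kutta loop in Python reproduces the solution of this problem closely:

```python
def f(t, y):
    # The ODE y' = -2*t*y^2 with y(0) = 1; exact solution y(t) = 1/(1 + t^2)
    return -2.0 * t * y * y

# Classic RK4 with fixed step h = 0.05 from t = 0 to t = 1
t, y, h = 0.0, 1.0, 0.05
for _ in range(20):
    k1 = f(t, y)
    k2 = f(t + h/2, y + h/2 * k1)
    k3 = f(t + h/2, y + h/2 * k2)
    k4 = f(t + h, y + h * k3)
    y += h/6 * (k1 + 2*k2 + 2*k3 + k4)
    t += h
# y is now close to the exact value 1/(1 + 1^2) = 0.5
```

The adaptive MATLAB solvers additionally control the step size so that a requested error tolerance is met.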
Example [3]
Consider the following Initial-Value Problem and solve it with Heun’s Method:
𝑦′ − 𝑥²𝑦 = 2𝑥², 𝑦(0) = 1, 0 ≤ 𝑥 ≤ 1, ℎ = 0.1
We create a user defined function, which uses the Heun’s method to solve the
initial value problem:
Solution:
function y = HeunODE(f,x,y0)
%
% HeunODE uses Heun's method to solve a first-order
% ODE given in the form y' = f(x,y) subject to
% initial condition y0.
%
% y = HeunODE(f,x,y0) where
% f is an inline function representing f(x,y),
% x is a vector representing the mesh points,
% y0 is a scalar representing the initial value of y,
% y is the vector of solution estimates at the mesh
% points.
y = 0*x; % Pre-allocate
y(1) = y0; h = x(2)-x(1);
for n = 1:length(x)-1
    k1 = f(x(n),y(n));
    k2 = f(x(n)+h,y(n)+h*k1);
    y(n+1) = y(n)+h*(k1+k2)/2;
end
Let’s solve the same problem as earlier, this time using Heun’s Method:
Heun_01.m
% Testing of HeunODE function by solving
% y' + y = 2x, y(0) = 1, 0 <= x <= 1.
% The exact solution is y_exact(x) = 2x + 3*exp(-x) - 2.
disp('    x       yHeun      yExact')
h = 0.1; x = 0:h:1; y0 = 1;
f = inline('-y+2*x','x','y');
yHeun = HeunODE(f,x,y0);
yExact = inline('2*x+3*exp(-x)-2');
for k = 1:length(x)
    x_coord = x(k);
    yH = yHeun(k);
    yEx = yExact(x(k));
    fprintf('%6.2f %11.6f %11.6f\n',x_coord,yH,yEx)
end
0.00 1.000000 1.000000
0.10 0.915000 0.914512
0.20 0.857075 0.856192
0.30 0.823653 0.822455
0.40 0.812406 0.810960
0.50 0.821227 0.819592
0.60 0.848211 0.846435
0.70 0.891631 0.889756
0.80 0.949926 0.947987
0.90 1.021683 1.019709
1.00 1.105623 1.103638
It can be seen that while our simple implementation of Heun’s Method gives a good approximation of the result, it overestimates the exact solution in this example by 0.21%. Depending on the application, this relatively small error can be ignored, or different methods can be used. ode23, for example, solves the IVP with only 0.19% maximum error, and ode45 matches the exact solution within the requested precision. The trade-off is always between the precision and the computing resources allocated to the problem.
8 Partial Differential Equations
The problem of solving a partial differential equation (or more) occurs in many
fields of engineering applications, like thermodynamics, fluid mechanics,
applied mechanics (like elasticity), or electromagnetic theory. The most
commonly known PDE problems are the Laplace’s equation, wave equation
or heat equation.
The main challenge with partial differential equations (PDEs) is that their analytical solution requires advanced mathematical methods, and in some cases it is not even possible to find a closed-form solution, unlike for many ordinary differential equations. This is the main reason why it is generally easier to approximate the solutions by using a simple and efficient numerical method. Although numerous methods are available for solving PDEs, the finite-difference methods have become the most widely used.
The classification of PDEs is based on the sign of the discriminant:
∆𝑃𝐷𝐸 = 𝐵2 − 4𝐴𝐶 (8.4.)
There are three classes:
- if ∆𝑃𝐷𝐸 < 0, it is an elliptic PDE (e.g. Laplace’s or Poisson’s equation),
- if ∆𝑃𝐷𝐸 = 0, it is a parabolic PDE (e.g. the heat equation),
- if ∆𝑃𝐷𝐸 > 0, it is a hyperbolic PDE (e.g. the 1D wave equation).
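The classification rule is easy to express directly. A minimal Python sketch, with 𝐴, 𝐵, 𝐶 taken from the general second-order form 𝐴𝑢𝑥𝑥 + 𝐵𝑢𝑥𝑦 + 𝐶𝑢𝑦𝑦 + ⋯ = 0 (the helper name is an assumption for illustration):

```python
def classify_pde(A, B, C):
    # Discriminant of the general second-order PDE A*u_xx + B*u_xy + C*u_yy + ... = 0
    d = B**2 - 4*A*C
    if d < 0:
        return "elliptic"
    if d == 0:
        return "parabolic"
    return "hyperbolic"

# Laplace: u_xx + u_yy = 0   -> A = 1, B = 0, C = 1   (elliptic)
# Heat:    u_t = a^2 u_xx    -> A = a^2, B = 0, C = 0 (parabolic, no u_tt term)
# Wave:    u_tt = c^2 u_xx   -> A = c^2, B = 0, C = -1 (hyperbolic)
```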
158
Figure 27: Rectangular region grid with a 5-point molecule [3]
These kinds of linear equations are usually solved with indirect methods, such as the Gauss–Seidel iterative method.
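The Gauss–Seidel iteration updates each unknown from the current equation, immediately reusing the freshest values. A minimal Python sketch on a small diagonally dominant system of the kind produced by the 5-point molecule (the function and test system are illustrative assumptions):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, kmax=500):
    # x_i = (b_i - sum_{j<i} a_ij x_j^{new} - sum_{j>i} a_ij x_j^{old}) / a_ii,
    # using the newest components as soon as they are available.
    n = len(b)
    x = np.zeros(n)
    for _ in range(kmax):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x_old[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.max(np.abs(x - x_old)) <= tol:
            break
    return x

# A tridiagonal, diagonally dominant system (guarantees convergence)
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```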
Example [3]
Solution:
The exact solution is:
𝑢(𝑥, 𝑦) = (1/sinh(𝜋/2)) sin(𝜋𝑥/2) sinh(𝜋(1 − 𝑦)/2)
function U = DirichletPDE(x,y,f,uleft,uright,ubottom,utop)
% DirichletPDE numerically solves an elliptic PDE with
% Dirichlet boundary conditions over a rectangular
% region.
%
% U = DirichletPDE(x,y,f,uleft,uright,ubottom,utop) where
%
% x is the 1-by-m vector of mesh points in the x
% direction,
% y is the n-by-1 vector of mesh points in the y
% direction,
% f is the inline function defining the forcing
% function which is in terms of x and y, namely,
% f(x,y),
% ubottom(x),utop(x),uright(y),uleft(y) are the
% functions defining the boundary conditions,
%
% U is the solution at the interior mesh points.
m = size(x,2); n = size(y,1); N = (m-2)*(n-2);
A = diag(-4*ones(N,1)); % Create diagonal matrix
A = A + diag(diag(A,n-2)+1,n-2); % Add n-2 diagonal
A = A + diag(diag(A,2-n)+1,2-n); % Add 2-n diagonal
d1 = ones(N-1,1); % Create vector of ones
d1(n-2:n-2:end) = 0; % Insert zeros
A = A + diag(d1,1); % Add upper diagonal
A = A + diag(d1,-1); % Add lower diagonal
[X Y] = meshgrid(x(2:end-1),y(end-1:-1:2)); % Create mesh
h = x(2)-x(1);
%Define boundary conditions
for i = 2:m-1
utopv(i-1) = utop(x(i));
ubottomv(i-1) = ubottom(x(i));
end
for i = 1:n
uleftv(i) = uleft(y(n+1-i));
urightv(i) = uright(y(n+1-i));
end
% Build vector b
b = 0; % Initialize vector b
for i = 1:N
b(i) = h^2*f(X(i),Y(i));
end
b(1:n-2:N) = b(1:n-2:N)-utopv;
b(n-2:n-2:N) = b(n-2:n-2:N)-ubottomv;
b(1:n-2) = b(1:n-2)-uleftv(2:n-1);
b(N-(n-3):N) = b(N-n+3:N)-urightv(2:n-1);
u = A\b'; % Solve the system
U = reshape(u,n-2,m-2);
U = [utopv;U;ubottomv];
U = [uleftv' U urightv'];
[X Y] = meshgrid(x,y(end:-1:1));
surf(X,Y,U); % 3D plot of the numerical results
xlabel('x');ylabel('y');
U =
0 0 0 0 0
0 0.2735 0.3867 0.2735 0
0 0.7071 1.0000 0.7071 0
Let’s see what the difference is if the mesh size is decreased to ℎ = 0.1.
x = 0:0.1:2;
y = 0:0.1:1; y = y';
f = inline('0','x','y');
ubottom = inline('sin(pi*x/2)');
utop = inline('0','x'); uleft = inline('0','y');
uright = inline('0','y');
U = DirichletPDE(x,y,f,uleft,uright,ubottom,utop)
8.2.1.3 Peaceman-Rachford Alternating Direction Implicit
Method
During the PRADI method, as an initial step an arbitrary value u^(0)_(i,j) is used as a starting value at each interior mesh point. As the first sub-step, the u_(i,j) values are updated row by row using equation (8.9.) – in the jth row (j = 1,2,…,M):
u^(0.5)_(i−1,j) − 4u^(0.5)_(i,j) + u^(0.5)_(i+1,j) = −u^(0)_(i,j−1) − u^(0)_(i,j+1),  i = 1,2,…,N (8.11.)
Of course, the values that are given by the boundary conditions are not affected by the iteration steps and remain unchanged during the computation. For each of the 𝑀 rows, 𝑁 equations are generated by equation (8.11.), thus in total 𝑀 ∙ 𝑁 equations are constructed. In this case, the coefficient matrix is tridiagonal, which can be solved efficiently by using, for example, the Thomas method. At this moment, the iteration step is only halfway through; the second sub-step is still needed.
The second half of the iteration step solves equation (8.10.) for each column by using the u^(0.5)_(i,j) values as initial values and then updates them – for the ith column (i = 1,2,…,N):
u^(1)_(i,j−1) − 4u^(1)_(i,j) + u^(1)_(i,j+1) = −u^(0.5)_(i−1,j) − u^(0.5)_(i+1,j),  j = 1,2,…,M (8.12.)
Now, this equation generates 𝑀 equations for each column. Of course, the boundary conditions do not change during the method. In total, 𝑀 ∙ 𝑁 equations are generated, which can be solved similarly to the linear equation system generated for the rows. After finding the solution, the first iteration step is completed. The further iteration steps are executed in the same way. The method is continued until convergence has been reached, or until a terminating condition has been fulfilled, such as a minimum error (tolerance) between two successive solution matrices.
Example [3]
Solution:
Create the PRADI method’s MATLAB code.
function [U,k] = PRADI(x,y,f,uleft,uright,ubottom,utop,tol,kmax)
% PRADI numerically solves an elliptic PDE with
% Dirichlet boundary conditions over a rectangular
% region using the Peaceman-Rachford alternating
% direction implicit method.
%
% [U,k] = PRADI(x,y,f,uleft,uright,ubottom,utop,tol,
% kmax) where
% x is the 1-by-m vector of mesh points in the
% x direction,
% y is the n-by-1 vector of mesh points in the
% y direction,
% f is the inline function defining the forcing
% function,
% ubottom,uleft,utop,uright are the functions
% defining the boundary conditions,
% tol is the tolerance used for convergence
% (default = 1e-4),
% kmax is the maximum number of iterations
% (default = 50),
% U is the solution at the mesh points,
% k is the number of full iterations needed to meet
% the tolerance.
% Note: The default starting value at all mesh points
% is 0.5.
if nargin<9 || isempty(kmax), kmax = 50; end
if nargin<8 || isempty(tol), tol = 1e-4; end
[X Y] = meshgrid(x(2:end-1),y(2:end-1));
% Create mesh grid
m = size(X,2); n = size(X,1); N = m*n;
u = 0.5*ones(n,m); % Starting values
h = x(2)-x(1); % Mesh size
% Define boundary conditions
for i = 2:m+1
utopv(i-1) = utop(x(i));
ubottomv(i-1) = ubottom(x(i));
end
for i = 1:n+2
uleftv(i) = uleft(y(i));
urightv(i) = uright(y(i));
end
U = [ubottomv;u;utopv]; U = [uleftv' U urightv'];
% Generate matrix A1 (first half) and A2 (second half).
A = diag(-4*ones(N,1));
d1 = diag(A,1)+1; d1(m:m:N-1) = 0;
d2 = diag(A,-1)+1; d2(n:n:N-1) = 0;
A2 = diag(d2,1)+diag(d2,-1)+A;
A1 = diag(d1,1)+diag(d1,-1)+A;
U1 = U;
for i = 1:N % Initialize vector b
b0(i) = h^2*f(X(i),Y(i));
end
b0 = reshape(b0,n,m);
for k = 1:kmax
% First half
b = b0-U1(1:end-2,2:end-1)-U1(3:end,2:end-1);
b(:,1) = b(:,1)-U(2:end-1,1);
b(:,end) = b(:,end)-U(2:end-1,end);
b = reshape(b',N,1);
u = ThomasMethod(A1,b);
% Tridiagonal system - Thomas method
u = reshape(u,m,n);
U1 = [U(1,2:end-1);u';U(end,2:end-1)];
U1 = [U(:,1) U1 U(:,end)];
% second half
b = b0-U1(2:end-1,1:end-2)-U1(2:end-1,3:end);
b(1,:) = b(1,:)-U(1,2:end-1);
b(end,:) = b(end,:)-U(end,2:end-1);
b = reshape(b,N,1);
u = ThomasMethod(A2,b);
% Tridiagonal system - Thomas method
u = reshape(u,n,m);
U2 = [U(1,2:end-1);u;U(end,2:end-1)];
U2 = [U(:,1) U2 U(:,end)];
if norm(U2-U1,inf)<=tol, break, end;
U1 = U2;
end
[X Y] = meshgrid(x,y);
U = U1;
for i = 1:n+2
W(i,:) = U(n-i+3,:);
YY(i) = Y(n-i+3);
end
U = W; Y = YY;
surf(X,Y,U);
xlabel('x');ylabel('y');
To run this function, a function for the Thomas method is also needed, which can be constructed as follows:
function x = ThomasMethod(A,b)
% ThomasMethod uses Thomas method to find the solution
% vector x of a tridiagonal system Ax = b.
% x = ThomasMethod(A,b) where
% A is a tridiagonal n-by-n coefficient matrix,
% b is the n-by-1 vector of the right-hand sides,
% x is the n-by-1 solution vector.
n = size(A,1);
d = diag(A); % Vector of diagonal entries of A
l = [0;diag(A, -1)]; % Vector of lower diagonal elements
u = [diag(A,1);0]; % Vector of upper diagonal elements
u(1) = u(1)/d(1); b(1) = b(1)/d(1); % First equation
for i = 2:n-1 % The next n-2 equations
den = d(i) - u(i-1)*l(i);
if den == 0
x = 'failure, division by zero';
return
end
u(i)= u(i)/den; b(i) = (b(i)-b(i-1)*l(i))/den;
end
b(n) = (b(n)-b(n-1)*l(n))/(d(n)-u(n-1)*l(n));
% Last equation
x(n) = b(n);
for i = n-1: -1:1
x(i) = b(i) - u(i)*x(i+1);
end
x = x';
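For cross-checking, the same tridiagonal elimination can be transcribed into Python (an illustrative sketch of the algorithm above, not part of the book's MATLAB code):

```python
def thomas(l, d, u, b):
    # Solves a tridiagonal system: l is the lower diagonal (l[0] unused),
    # d the main diagonal, u the upper diagonal (u[-1] unused), b the RHS.
    n = len(d)
    u, b = list(u), list(b)
    u[0] /= d[0]
    b[0] /= d[0]
    for i in range(1, n - 1):
        den = d[i] - u[i-1] * l[i]       # forward elimination
        u[i] /= den
        b[i] = (b[i] - b[i-1] * l[i]) / den
    b[n-1] = (b[n-1] - b[n-2] * l[n-1]) / (d[n-1] - u[n-2] * l[n-1])
    x = [0.0] * n
    x[n-1] = b[n-1]
    for i in range(n - 2, -1, -1):       # back substitution
        x[i] = b[i] - u[i] * x[i+1]
    return x

# Example: the 3x3 system [[2,-1,0],[-1,2,-1],[0,-1,2]] x = [1,0,1]
x = thomas([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0], [1.0, 0.0, 1.0])
```

For this symmetric test system the solution is x = [1, 1, 1].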
U =
0 0 0 0
0 0.1083 0.1083 0
0 0.3248 0.3248 0
0 0.8660 0.8660 0
k =
5
8.2.2 Neumann Problem
Until this point, we assumed that at the boundary we have known values of
𝑢. The Neumann problem is based on the missing information on the
boundary, meaning the boundary values form an unknown vector.
Considering the same Laplace’s equation, the difference equation at the mesh
point (3,3) is the following:
𝑢23 + 𝑢32 − 4𝑢33 + 𝑢43 + 𝑢34 = 0 (8.13.)
where 𝑢23 , 𝑢32 , 𝑢33 are the interior mesh points, while 𝑢43 , 𝑢34 are the
boundary points, as Figure 28 shows.
From these points, the interior mesh points form part of the unknown vector, and since the boundary values are not available either, they are also part of the unknown vector. Actually, considering equation (8.9.) close to the boundary points, the same problem can occur. As the equation can be written for interior points only, for example in Figure 28 we can construct only 9 equations, despite the fact that there are 25 unknowns. Thus, more equations must be generated. In order to generate these equations, equation (8.9.) has to be applied to each marked boundary point. Regarding point (3,3), the equations for these two points are:
𝑢24 + 𝑢44 + 𝑢35 + 𝑢33 − 4𝑢34 = 0
(8.14.)
𝑢42 + 𝑢44 + 𝑢53 + 𝑢33 − 4𝑢43 = 0
This way, two quantities have been generated (𝑢35 and 𝑢53) that have to be eliminated, since they are not part of the grid. This can be done by using the information on the vertical and horizontal boundary segments and extending the grid beyond the boundary region. Since at point (3,4) the derivative 𝜕𝑢/𝜕𝑦 = 𝑔(𝑥) = 𝑔34 is available, by applying the two-point central difference formula we get:
𝜕𝑢/𝜕𝑦 |(3,4) = 𝑔34 = (𝑢35 − 𝑢33)/(2ℎ) → 𝑢35 = 𝑢33 + 2ℎ𝑔34 (8.15.)
𝜕𝑢/𝜕𝑥 |(4,3) = 𝑘43 = (𝑢53 − 𝑢33)/(2ℎ) → 𝑢53 = 𝑢33 + 2ℎ𝑘43 (8.16.)
By substituting these expressions into equation (8.14.), two more equations can be generated that contain only interior mesh points and boundary points. To continue this process, it has to be assumed that the Laplace equation is valid beyond the rectangular region, at least in the exterior area that contains the newly produced points (like 𝑢35 and 𝑢53). Using this method, the same number of equations can be generated as the total number of interior mesh points and boundary points together; in the example illustrated in Figure 28, 25 unknowns and 25 equations.
The normal derivatives prescribed on the different segments of the boundary determine whether a Neumann problem has a solution: if the line integral of the normal derivative around the boundary is zero, the solution exists, otherwise it does not:
∮𝐶 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 = 0 (8.17.)
Example [3]
Consider the next Neumann problem, illustrated in the figure below. The grid spacing is ℎ = 1. Is there a solution?
Solution:
The problem includes 2 interior points and 10 boundary points, 12 unknowns in total. In order to figure out if there is a solution, evaluate equation (8.17.),
where the bottom border is marked as 𝐿1, the left and right edges are 𝐿4, 𝐿2 and finally the top edge is 𝐿3 (going counterclockwise):
∮𝐶 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 = ∫𝐿1 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 + ∫𝐿2 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 + ∫𝐿3 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 + ∫𝐿4 (𝜕𝑢/𝜕𝑛) 𝑑𝑠 = 4 + 2 + 4 + 2 ≠ 0
so the line integral is not zero and the problem has no solution.
Example [3]
Consider the next problem and solve it with the grid size of ℎ = 1.
Solution:
The upper, lower, and left edges carry Dirichlet boundary conditions, so no extension is needed there; the only extension pertains to the (3,1) mesh point, where an unknown value (𝑢41) is created. Equation (8.8.) is solved first at the interior mesh points:
(1,1) 𝑢01 + 𝑢10 + 𝑢21 + 𝑢12 − 4𝑢11 = 0
(2,1) 𝑢11 + 𝑢20 + 𝑢31 + 𝑢22 − 4𝑢21 = 0
(3,1) 𝑢21 + 𝑢30 + 𝑢41 + 𝑢32 − 4𝑢31 = 0
With the two-point central difference formula 𝑢41 can be eliminated:
𝜕𝑢/𝜕𝑥 |(3,1) = [2𝑦](3,1) = 2 = (𝑢41 − 𝑢21)/(2ℎ) ⟹ 𝑢41 = 𝑢21 + 4
By using the boundary values and substituting with 𝑢41 = 𝑢21 + 4, we can
write:
Figure 29: Illustration of an irregular boundary at a region; left: grid, right:
mesh points [3]
So, if the distance between 𝐴 and 𝑃 is 𝛼ℎ, while the distance between 𝐵 and 𝑃 is 𝛽ℎ, the Taylor series expansion can be written for 𝑢 at four points (𝐴, 𝐵, 𝑀, 𝑁) about point 𝑃, in the following way for 𝑢(𝑀) and 𝑢(𝐴):
𝑢(𝑀) = 𝑢(𝑃) − ℎ(𝜕𝑢(𝑃)/𝜕𝑥) + (1/2!)ℎ²(𝜕²𝑢(𝑃)/𝜕𝑥²) − ⋯ (8.18.)
𝑢(𝐴) = 𝑢(𝑃) + (𝛼ℎ)(𝜕𝑢(𝑃)/𝜕𝑥) + (1/2!)(𝛼ℎ)²(𝜕²𝑢(𝑃)/𝜕𝑥²) + ⋯ (8.19.)
Multiplying equation (8.18.) by 𝛼 and adding it to equation (8.19.) cancels the first-derivative terms; neglecting any terms of higher order than ℎ²:
𝑢(𝐴) + 𝛼𝑢(𝑀) ≅ (𝛼 + 1)𝑢(𝑃) + 𝛼(𝛼 + 1)(ℎ²/2)(𝜕²𝑢(𝑃)/𝜕𝑥²) (8.20.)
Rearranging the equation:
𝜕²𝑢(𝑃)/𝜕𝑥² = (2/ℎ²)[ 𝑢(𝐴)/(𝛼(𝛼 + 1)) + 𝑢(𝑀)/(𝛼 + 1) − 𝑢(𝑃)/𝛼 ] (8.21.)
For 𝑢(𝐵) and 𝑢(𝑁) a similar equation can be derived:
𝜕²𝑢(𝑃)/𝜕𝑦² = (2/ℎ²)[ 𝑢(𝐵)/(𝛽(𝛽 + 1)) + 𝑢(𝑁)/(𝛽 + 1) − 𝑢(𝑃)/𝛽 ] (8.22.)
By adding these two equations we get:
𝑢𝑥𝑥(𝑃) + 𝑢𝑦𝑦(𝑃) = (2/ℎ²)[ 𝑢(𝐴)/(𝛼(𝛼 + 1)) + 𝑢(𝐵)/(𝛽(𝛽 + 1)) + 𝑢(𝑀)/(𝛼 + 1) + 𝑢(𝑁)/(𝛽 + 1) − (1/𝛼 + 1/𝛽)𝑢(𝑃) ] (8.23.)
For the Laplace equation the left-hand side is zero, so the bracketed expression must vanish:
𝑢(𝐴)/(𝛼(𝛼 + 1)) + 𝑢(𝐵)/(𝛽(𝛽 + 1)) + 𝑢(𝑀)/(𝛼 + 1) + 𝑢(𝑁)/(𝛽 + 1) − ((𝛼 + 𝛽)/(𝛼𝛽))𝑢(𝑃) = 0 (8.24.)
In order to generate the general form, using the right-hand side notations of Figure 29, the difference equation is:
𝑢𝑖+1,𝑗/(𝛼(𝛼 + 1)) + 𝑢𝑖,𝑗+1/(𝛽(𝛽 + 1)) + 𝑢𝑖−1,𝑗/(𝛼 + 1) + 𝑢𝑖,𝑗−1/(𝛽 + 1) − ((𝛼 + 𝛽)/(𝛼𝛽))𝑢𝑖,𝑗 = 0 (8.25.)
The same method can be applied to the Poisson’s equation. This final equation
can be used in every case (at any mesh point) when any of the neighbouring
points are not located on the grid.
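As a sanity check, for 𝛼 = 𝛽 = 1 the irregular-boundary formula must reduce to the standard 5-point formula. A small Python sketch with exact rational arithmetic (the helper name is an assumption for illustration):

```python
from fractions import Fraction as F

def irregular_coeffs(alpha, beta):
    # Coefficients of u_{i+1,j}, u_{i,j+1}, u_{i-1,j}, u_{i,j-1}, u_{i,j}
    # in the irregular-boundary difference equation (8.25.)
    a, b = F(alpha), F(beta)
    return (1/(a*(a+1)), 1/(b*(b+1)), 1/(a+1), 1/(b+1), -(a+b)/(a*b))

c = irregular_coeffs(1, 1)
# For a regular interior point (alpha = beta = 1), multiplying by 2 recovers
# u_{i+1,j} + u_{i,j+1} + u_{i-1,j} + u_{i,j-1} - 4*u_{i,j} = 0
```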
Example [3]
Solve the Laplace equation over the region illustrated below. The slanting part of the boundary is described as 𝑦 = −(2/3)𝑥 + 2.
Solution:
The method starts again with the Dirichlet problem’s solution at mesh points (1,1), (2,1), (1,2):
1 + 4 + 𝑢12 + 𝑢21 − 4𝑢11 = 0
𝑢11 + 7 + 9.5 + 𝑢22 − 4𝑢21 = 0
1 + 𝑢11 + 1/3 + 𝑢22 − 4𝑢12 = 0
For point (2,2) equation (8.25.) has to be used. From the equation of the slanting segment, the vertical distance of point 𝐴 from point (2,2) can be calculated, which is 1/3. So ℎ = 1/2 and 𝛽 = 2/3, while 𝛼 = 1. This way:
9/2 + 2/((2/3)(5/3)) + 𝑢12/2 + 𝑢21/(5/3) − ((1 + 2/3)/(2/3))𝑢22 = 0
Combining this with the three Dirichlet equations and simplifying, the linear system and its solution are:
[ −4    1    1    0 ] [ 𝑢11 ]   [ −5    ]        𝑢11 = 3.3354
[  1   −4    0    1 ] [ 𝑢21 ] = [ −16.5 ]   ⟹   𝑢21 = 6.0666
[  1    0   −4    1 ] [ 𝑢12 ]   [ −1.33 ]        𝑢12 = 2.2749
[  0   1.2   1   −5 ] [ 𝑢22 ]   [ −12.6 ]        𝑢22 = 4.4310
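The 4×4 system can be verified numerically, for instance with NumPy (an illustrative check, not part of the original text):

```python
import numpy as np

# Coefficient matrix and right-hand side of the system above
# (unknown ordering: u11, u21, u12, u22; -1.33 is -4/3 rounded)
A = np.array([[-4.0,  1.0,  1.0,  0.0],
              [ 1.0, -4.0,  0.0,  1.0],
              [ 1.0,  0.0, -4.0,  1.0],
              [ 0.0,  1.2,  1.0, -5.0]])
b = np.array([-5.0, -16.5, -4.0/3.0, -12.6])
u = np.linalg.solve(A, b)   # [u11, u21, u12, u22]
```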
8.3.1 Finite-Difference Method
The 1D heat equation here models a wire of length 𝐿 whose ends are kept at zero temperature and which is subjected to an initial temperature distribution along the wire prescribed by 𝑓(𝑥). In this case, the simplified initial value problem is:
𝑢𝑡 = 𝛼²𝑢𝑥𝑥 (𝛼 = 𝑐𝑜𝑛𝑠𝑡𝑎𝑛𝑡 > 0), 0 ≤ 𝑥 ≤ 𝐿, 𝑡 ≥ 0
𝑢(0, 𝑡) = 0 = 𝑢(𝐿, 𝑡) (8.27.)
𝑢(𝑥, 0) = 𝑓(𝑥)
As Figure 30 illustrates, the generated grid for solving the 1D heat equation
has a size of ℎ in 𝑥-direction and 𝑘 size in 𝑡-direction.
As a first step, in equation (8.27.) the partial derivatives have to be replaced with their Finite-Difference (FD) approximations. For the 𝑢𝑥𝑥 term a three-point central difference formula can be applied, while for the 𝑢𝑡 term a two-point forward difference formula is used: at 𝑡 = 0 only forward progression is possible, since there is no information available for 𝑡 < 0. The transformed equation is then:
(1/𝑘)(𝑢𝑖,𝑗+1 − 𝑢𝑖,𝑗) = (𝛼²/ℎ²)(𝑢𝑖−1,𝑗 − 2𝑢𝑖,𝑗 + 𝑢𝑖+1,𝑗) (8.28.)
Figure 30: The generated grid in order to solve the 1D heat equation [3]
If 𝑢𝑖−1,𝑗, 𝑢𝑖,𝑗 and 𝑢𝑖+1,𝑗 are all known, 𝑢𝑖,𝑗+1 can be computed at the next level of axis 𝑡 as:
𝑢𝑖,𝑗+1 = [1 − 2𝑘𝛼²/ℎ²]𝑢𝑖,𝑗 + (𝑘𝛼²/ℎ²)(𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗) (8.29.)
This equation can be written in a simplified form:
𝑢𝑖,𝑗+1 = (1 − 2𝑟)𝑢𝑖,𝑗 + 𝑟(𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗),  𝑟 = 𝑘𝛼²/ℎ² (8.30.)
which is known as the difference equation for the 1D heat equation applying
FD method.
The FD method’s accuracy improves as ℎ and 𝑘 tend to zero. Equation (8.30.) can be considered stable and convergent if:
𝑟 = 𝑘𝛼²/ℎ² ≤ 1/2 (8.31.)
otherwise it is divergent and unstable.
Example [3]
Solve the 1D heat equation for a wire which is laterally insulated, has length 𝐿 = 1 and 𝛼 = 0.5. The two ends of the wire are kept at zero temperature, while the initial temperature is 𝑓(𝑥) = 10 sin 𝜋𝑥. Compute the temperature 𝑢(𝑥, 𝑡) for 0 ≤ 𝑥 ≤ 1 and 0 ≤ 𝑡 ≤ 0.5, if the mesh points are generated by ℎ = 0.25 and 𝑘 = 0.1, considering all the parameter values to be in consistent physical units.
Solution:
The exact solution is:
𝑢(𝑥, 𝑡) = (10 sin 𝜋𝑥)𝑒^(−0.25𝜋²𝑡)
For the numerical solution the following user-defined function can be
constructed in MATLAB:
function u = Heat1DFD(t,x,u,alpha)
%
% Heat1DFD numerically solves the one-dimensional heat
% equation, with zero boundary conditions, using the
% finite-difference method.
%
% u = Heat1DFD(t,x,u,alpha) where
%
% t is the row vector of times to compute,
% x is the column vector of x positions to compute,
% u is the column vector of initial temperatures for
% each value in x,
% alpha is a given parameter of the PDE,
%
% u is the solution at the mesh points.
u = u(:); % u must be a column vector
k = t(2)-t(1);
h = x(2)-x(1);
r = (alpha/h)^2*k;
if r > 0.5
    warning('Method is unstable and divergent. Results will be inaccurate.')
end
i = 2:length(x)-1;
for j = 1:length(t)-1
u(i,j+1) = (1-2*r)*u(i,j) + r*(u(i-1,j)+u(i+1,j));
end
u =
       0         0         0         0         0         0
    7.0711    5.4142    4.1456    3.1742    2.4304    1.8610
   10.0000    7.6569    5.8627    4.4890    3.4372    2.6318
    7.0711    5.4142    4.1456    3.1742    2.4304    1.8610
    0.0000         0         0         0         0         0
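The first time step of the table above can be reproduced with a short Python sketch of equation (8.30.) (illustrative, mirroring the MATLAB function):

```python
import math

h, k, alpha = 0.25, 0.1, 0.5
r = k * alpha**2 / h**2             # r = 0.4 <= 0.5, so the scheme is stable
x = [i * h for i in range(5)]
u = [10 * math.sin(math.pi * xi) for xi in x]   # initial temperatures f(x)
# One step of u_{i,j+1} = (1 - 2r) u_{i,j} + r (u_{i-1,j} + u_{i+1,j}),
# with the zero boundary values kept fixed
u_next = [0.0] + [(1 - 2*r) * u[i] + r * (u[i-1] + u[i+1]) for i in range(1, 4)] + [0.0]
# u_next[1] ≈ 5.4142 and u_next[2] ≈ 7.6569, matching the second column above
```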
8.3.2 Crank-Nicolson Method
The main problem with the Finite-Difference Method is that the 𝑟 = 𝑘𝛼²/ℎ² ≤ 1/2 condition can cause significant computational cost. For example, when ℎ = 0.2 and 𝛼 = 1: since 𝑟 ≤ 1/2, 𝑘 has to be 0.02 or lower, requiring many time steps. Furthermore, if the mesh size is halved to ℎ = 0.1, the number of time steps increases by a factor of 4. So, in order to keep 𝑟 small, either ℎ must be increased, resulting in decreased accuracy, or 𝑘 must be decreased, increasing the computational requirements.
To overcome this problem, the Crank–Nicolson (CN) Method provides a solution without a restriction on 𝑟 = 𝑘𝛼²/ℎ². To achieve that, a six-point molecule is employed, as opposed to the four-point molecule used with the FD method. The starting point is equation (8.35.), from which the CN method’s difference equation can be derived. Two terms are written on the right-hand side, similar to the one inside the parentheses: one for the 𝑗th time row and one for the (𝑗 + 1)st time row, each multiplied by 𝛼²/(2ℎ²), thus:
(1/𝑘)(𝑢𝑖,𝑗+1 − 𝑢𝑖,𝑗) = (𝛼²/2ℎ²)(𝑢𝑖−1,𝑗 − 2𝑢𝑖,𝑗 + 𝑢𝑖+1,𝑗) + (𝛼²/2ℎ²)(𝑢𝑖−1,𝑗+1 − 2𝑢𝑖,𝑗+1 + 𝑢𝑖+1,𝑗+1) (8.32.)
Then, by multiplying by 2𝑘 and letting 𝑟 = 𝑘𝛼²/ℎ², after rearranging the higher time row terms to the left-hand side, we can write:
2(1 + 𝑟)𝑢𝑖,𝑗+1 − 𝑟(𝑢𝑖−1,𝑗+1 + 𝑢𝑖+1,𝑗+1) = 2(1 − 𝑟)𝑢𝑖,𝑗 + 𝑟(𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗),  𝑟 = 𝑘𝛼²/ℎ² (8.33.)
This is the form of the 1D heat equation using the CN method.
The process starts at the lowest time level, where the initial temperature 𝑓(𝑥) provides the three terms on the right-hand side: 𝑢𝑖−1,𝑗, 𝑢𝑖,𝑗 and 𝑢𝑖+1,𝑗. The values at the higher time level are not known. The 𝑛 non-boundary mesh points in each row provide an ensuing 𝑛 × 𝑛 system of equations to be solved with a tridiagonal coefficient matrix. By solving this system of equations, the temperature values in the next time row can be calculated. The process is then repeated to approximate all the desired mesh points.
Example [3]
The 1D heat equation considered earlier (Example 8.6) is used again. Find the approximate values of 𝑢(𝑥, 𝑡) at the mesh points by using the CN method and compare the results with the FD method.
Solution:
Create the following MATLAB function:
function u = Heat1DCN(t,x,u,alpha)
%
% Heat1DCN numerically solves the one-dimensional heat
% equation, with zero boundary conditions, using the
% Crank-Nicolson method.
%
% u = Heat1DCN(t,x,u,alpha) where
% t is the row vector of times to compute,
% x is the column vector of x positions to compute,
% u is the column vector of initial temperatures for
% each value in x,
% alpha is a given parameter of the PDE,
%
% u is the solution at the mesh points.
u = u(:); % u must be a column vector
k = t(2)-t(1); h = x(2)-x(1); r = (alpha/h)^2*k;
% Compute A
n = length(x);
A = diag(2* (1+r)*ones(n-2,1));
A = A + diag(diag(A,-1)-r,-1);
A = A + diag(diag(A,1)-r, 1);
% Compute B
B = diag(2*(1-r)*ones(n-2,1));
B = B + diag(diag(B,-1) +r,-1);
B = B + diag(diag(B,1) +r,1);
C = A\B;
i = 2:length(x)-1;
for j = 1:length(t)-1
u(i,j+1) = C*u(i,j);
end
0 0 0 0 0 0
7.0711 5.5880 4.4159 3.4897 2.7578 2.1794
10.0000 7.9026 6.2451 4.9352 3.9001 3.0821
7.0711 5.5880 4.4159 3.4897 2.7578 2.1794
0.0000 0 0 0 0 0
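Similarly, the first CN time step can be checked in Python by solving the tridiagonal system of equation (8.33.) directly (an illustrative sketch, not the MATLAB code):

```python
import math
import numpy as np

h, k, alpha = 0.25, 0.1, 0.5
r = k * alpha**2 / h**2              # r = 0.4; CN has no stability restriction on r
u0 = np.array([10 * math.sin(math.pi * i * h) for i in (1, 2, 3)])  # interior points
# 2(1+r) u_{i,j+1} - r(u_{i-1,j+1} + u_{i+1,j+1})
#   = 2(1-r) u_{i,j} + r(u_{i-1,j} + u_{i+1,j}),  zero boundary values
A = np.diag(2*(1 + r) * np.ones(3)) + np.diag(-r * np.ones(2), 1) + np.diag(-r * np.ones(2), -1)
B = np.diag(2*(1 - r) * np.ones(3)) + np.diag(r * np.ones(2), 1) + np.diag(r * np.ones(2), -1)
u1 = np.linalg.solve(A, B @ u0)      # temperatures at t = k
# u1 ≈ [5.5880, 7.9026, 5.5880], matching the second column of the table above
```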
8.4 Hyperbolic Partial Differential Equations
In order to introduce the numerical solution method of hyperbolic PDEs,
consider the following boundary-value problem:
𝑢𝑡𝑡 = 𝐶 2 𝑢𝑥𝑥
𝑢(𝑥, 0) = 𝑓(𝑥)
𝑢𝑡 (𝑥, 0) = 𝜙(𝑥) (8.34.)
𝑢(0, 𝑡) = 𝜓1 (𝑡)
𝑢(1, 𝑡) = 𝜓2 (𝑡)
when 0 ≤ 𝑡 ≤ 𝑇, which is actually the modelling of the transverse (1D)
vibrations of a stretched string. In the first equation, by replacing 𝑢𝑥𝑥 and 𝑢𝑡𝑡
terms with a 3-point central difference approximation, we get:
𝑢𝑥𝑥 = (1/ℎ²)(𝑢𝑖−1,𝑗 − 2𝑢𝑖,𝑗 + 𝑢𝑖+1,𝑗) + 𝑂(ℎ²),
𝑢𝑡𝑡 = (1/𝑘²)(𝑢𝑖,𝑗−1 − 2𝑢𝑖,𝑗 + 𝑢𝑖,𝑗+1) + 𝑂(𝑘²) (8.35.)
where 𝑥 = 𝑖ℎ, 𝑖 = 0,1,2, … and 𝑡 = 𝑗𝑘, 𝑗 = 0,1,2, ….
Figure 31: The applied grid for solving the 1D wave equation [3]
By substituting 𝑟̃ = (𝑘𝛼/ℎ)², multiplying by 𝑘² and rearranging the equation for 𝑢𝑖,𝑗+1, we can write:
𝑢𝑖,𝑗+1 = −𝑢𝑖,𝑗−1 + 2(1 − 𝑟̃)𝑢𝑖,𝑗 + 𝑟̃(𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗),  𝑟̃ = (𝑘𝛼/ℎ)² (8.36.)
According to this, the function values at the 𝑗th and (𝑗 − 1)th time levels are
needed to compute the (𝑗 + 1)th time level function values. This difference
equation is known as the 1D wave equation using finite difference method.
These kinds of methods are called three-level difference schemes. If the terms in equation (8.36.) were expanded as Taylor series and simplified, it could be seen that the truncation error is 𝑂(𝑘² + ℎ²). Additionally, the formula works well if 𝑟̃ ≤ 1, which is also the stability condition.
For the initial problem – the 1st equation of (8.34.) – there are two implicit finite difference schemes:
(𝑢𝑖,𝑗+1 − 2𝑢𝑖,𝑗 + 𝑢𝑖,𝑗−1)/𝑘² = (𝐶²/2ℎ²)[(𝑢𝑖+1,𝑗+1 − 2𝑢𝑖,𝑗+1 + 𝑢𝑖−1,𝑗+1) + (𝑢𝑖+1,𝑗−1 − 2𝑢𝑖,𝑗−1 + 𝑢𝑖−1,𝑗−1)] (8.37.)
and
(𝑢𝑖,𝑗+1 − 2𝑢𝑖,𝑗 + 𝑢𝑖,𝑗−1)/𝑘² = (𝐶²/4ℎ²)[(𝑢𝑖+1,𝑗+1 − 2𝑢𝑖,𝑗+1 + 𝑢𝑖−1,𝑗+1) + 2(𝑢𝑖+1,𝑗 − 2𝑢𝑖,𝑗 + 𝑢𝑖−1,𝑗) + (𝑢𝑖+1,𝑗−1 − 2𝑢𝑖,𝑗−1 + 𝑢𝑖−1,𝑗−1)] (8.38.)
These implicit formulas give good approximations for all 𝑟̃ values.
Example [3]
Solution:
This example has an exact solution which is:
𝑢(𝑥, 𝑡) = (3/4) sin 𝜋𝑥 cos 𝜋𝑡 − (1/4) sin 3𝜋𝑥 cos 3𝜋𝑡
By applying the explicit formula (equation (8.36.)), where 𝑟̃ < 1 is required, let ℎ = 0.25 and 𝑘 = 0.2. Thus 𝑟̃ = (0.2/0.25)² = 0.64, so the stability condition is satisfied. If 𝑢𝑖𝑗 = 𝑢(𝑖ℎ, 𝑗𝑘), then the boundary and initial conditions are:
𝑢0,𝑗 = 0
𝑢4,𝑗 = 0
𝑢𝑖,0 = sin3(𝜋𝑖ℎ) , 𝑖 = 1,2,3,4
𝑢𝑖,1 − 𝑢𝑖,−1 = 0 so 𝑢𝑖,−1 = 𝑢𝑖,1
By substituting the calculated 𝑟̃ we get:
𝑢𝑖,𝑗+1 = −𝑢𝑖,𝑗−1 + 0.64(𝑢𝑖−1,𝑗 + 𝑢𝑖+1,𝑗) + 2(0.36)𝑢𝑖,𝑗
For the first time row, because 𝑢𝑖,−1 = 𝑢𝑖,1, this reduces to 𝑢𝑖,1 = 0.32(𝑢𝑖−1,0 + 𝑢𝑖+1,0) + 0.36𝑢𝑖,0, so:
𝑢1,1 = 0.32(𝑢0,0 + 𝑢2,0) + 0.36𝑢1,0 = 0.32(0 + 1) + 0.36(0.3537) = 0.4473
The exact value of the function is: 𝑢(0.25,0.2) = 0.4838.
By repeating the same steps, we get:
𝑢2,1 = 0.32(0.3537 + 0.3537) + 0.36(1.0) = 0.5867
The exact value = 0.5296.
As a final step:
𝑢3,1 = 0.32(1.0 + 0) + 0.36(0.3537) = 0.4473
When the exact value = 0.4838.
The computations could be continued for 𝑗 = 1,2, …
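The hand computation of the first time row can be mirrored with a few lines of Python (an illustrative check, not the book's MATLAB code):

```python
import math

h, k = 0.25, 0.2
r = (k / h)**2                       # r~ = 0.64 (with C = 1), stability satisfied
u0 = [math.sin(math.pi * i * h)**3 for i in range(5)]   # initial displacements
# First time row with zero initial velocity (u_{i,-1} = u_{i,1}):
#   u_{i,1} = (r/2)(u_{i-1,0} + u_{i+1,0}) + (1 - r) u_{i,0}
#           = 0.32(u_{i-1,0} + u_{i+1,0}) + 0.36 u_{i,0}
u1 = [0.0] + [r/2 * (u0[i-1] + u0[i+1]) + (1 - r) * u0[i] for i in range(1, 4)] + [0.0]
# u1[1] ≈ 0.4473, as computed by hand above
```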
Example [3]
Take an elastic string of length 𝐿 = 2 with 𝛼 = 1, fixed at both ends. The initial displacement of the string is given as 𝑓(𝑥) = 5 sin(𝜋𝑥/2), while the initial velocity is zero (𝑔(𝑥) = 0). Find the displacement of the string 𝑢(𝑥, 𝑡) for 0 ≤ 𝑥 ≤ 𝐿 and 0 ≤ 𝑡 ≤ 2, when ℎ = 𝑘 = 0.4.
Solution:
The exact solution is the following:
𝑢(𝑥, 𝑡) = 5 sin(𝜋𝑥/2) cos(𝜋𝑡/2)
At first, calculate 𝑟̃ :
𝑟̃ = (𝑘𝛼/ℎ)² = (0.4 ∙ 1/0.4)² = 1
The FD approach for solving the wave equation with zero boundary conditions and prescribed initial displacement and velocity can be constructed as:
function u = Wave1DFD(t,x,u,ut,alpha)
%
% Wave1DFD numerically solves the one-dimensional wave
% equation, with zero boundary conditions, using the
% finite-difference method.
%
% u = Wave1DFD(t,x,u,ut,alpha) where
% t is the row vector of times to compute,
% x is the column vector of x positions to compute,
% u is the column vector of initial displacements
% for each value in x,
% ut is the column vector of initial velocities for
% each value in x,
% alpha is a given parameter of the PDE,
% u is the solution at the mesh points.
u = u(:); % u must be a column vector
ut = ut(:); % ut must be a column vector
k = t(2)-t(1);
h = x(2)-x(1);
r = (k*alpha/h)^2;
if r>1
    warning('Method is unstable and divergent. Results will be inaccurate.')
end
i = 2:length(x)-1;
u(i,2) = (1-r)*u(i,1) + r/2*(u(i-1,1) + u(i+1,1)) + k*ut(i);
for j = 2:length(t)-1
    u(i,j+1) = -u(i,j-1) + 2*(1-r)*u(i,j) + r*(u(i-1,j) + u(i+1,j));
end
Using the user-defined function, we can obtain and plot the results as follows:
t = 0:0.1:2;
x = 0:0.1:2; x = x';
u = 5.*sin(pi*x/2);
ut = zeros(length(x),1);
u = Wave1DFD(t,x,u,ut,1);
plot(x,u)
The displacement of the elastic string looks like the following:
References
1. Faragó, I. and R. Horváth, Numerikus Módszerek, F. Miklós, Editor. 2011,
BME TTK Matematika Intézet: Budapest.
2. Dr. Nyakóné dr. Juhász Katalin, Dr. Terdik György, Biró Piroska, and Dr.
Kátai Zoltán, Bevezetés az Informatikába. 2011, Debreceni Egyetem,
Informatikai Kar; Sapientia EMTE, Műszaki és Humántudományok Kar:
Kempelen Farkas Hallgatói Információs Központ.
3. Esfandiari, R.S., Numerical Methods for Engineers and Scientists Using
MATLAB. 2013: Taylor & Francis Group, LLC.
<<10–20-line synopsis for the back cover of the book>>
This book was written for the curriculum of the Budapest University of Technology and Economics, Faculty of Transportation and Vehicle Engineering, tailored to the Stipendium Hungaricum Scholarship Programme. The aim of developing and using numerical methods instead of relying on complicated and expensive analytical solution methods is to approximate the desired accurate solution using simpler, arithmetical equations, with the required level of accuracy. After discussing the different errors that affect a numerical method’s result, and their sources, the chapters introduce the solution of equations and systems of equations, then the two main curve-fitting approaches, approximation and interpolation. These chapters are followed by the numerical differentiation and integration methods, and then the numerical solution of ordinary and partial differential equations is outlined. Since MATLAB is a popular choice for engineering use, with its user-friendly yet powerful computing capability, it is practically an industry standard for prototyping and new developments. Many chapters in the notes show examples implemented in MATLAB.