MÄLARDALEN UNIVERSITY
Date: 2017-10-17
Project name:
Author: Rasha Altoumaimi
Supervisor(s): Karl Lundengård, Milica Rančić
Reviewer: Mats Bodin
Examiner: Sergei Silvestrov
Comprising: 30 ECTS credits
I would like to dedicate my thesis to my beloved husband Ahmed Sonba, who always gives me support and love.
Acknowledgements
I would like to express my thanks and gratitude to my supervisor Karl Lundengård for his positive and supportive guidance, and to my supervisor Milica Rančić. Special thanks to the reviewer Mats Bodin for giving very detailed feedback, and to Professor Sergei Silvestrov, the examiner of this thesis.
Last but not least, I want to thank my family and friends for their support and encouragement.
Abstract
This thesis examines how to find the best fit to a series of data points when curve fitting with power-exponential models. We describe numerical methods such as the Gauss-Newton and Levenberg-Marquardt methods and compare them for solving non-linear least-squares curve-fitting problems with different power-exponential functions. In addition, we show the results of numerical experiments that illustrate the effectiveness of this approach. Furthermore, we show its application to practical problems using different data sets, such as death rates and rocket-triggered lightning return strokes based on the transmission-line model.
Contents (fragment)
1 Introduction
2.3 Convergence
3.2.1 Trust-region strategy
7.4 Objective 4: Independently and Creatively Identify and Carry out Advanced Tasks
7.5 Objective 5: Present and Discuss Conclusions and Knowledge
A MATLAB Code
A.1 Calculating residuals and Jacobian for the first power-exponential function µ(b; x) using the Gauss-Newton and Levenberg-Marquardt algorithms
List of Figures
5.1 Death risk by age for men in the USA between 1995 and 2004.
5.2 Death risk for men in the USA for selected years between 1995 and 2004 using the Gauss-Newton algorithm, where x denotes age and y death risk.
5.3 Death risk for men in the USA for selected years between 1995 and 2004 using the Levenberg-Marquardt algorithm, where x denotes age and y death risk.
5.4 Equipment for measuring rocket-triggered return strokes; image originally appeared in [17].
5.5 Comparison of results for fitting rocket-triggered lightning return-stroke data using the Gauss-Newton and Levenberg-Marquardt algorithms.
List of Tables
5.1 The residual least squares ×10^(−8) using µ(c₁, c₂, a₁, a₂, a₃; x).
5.2 The residual least squares for each rocket-triggered stroke using the formula fᵢ = yᵢ − µ(xᵢ; a, b, c).
Chapter 1
Introduction
The goal of this thesis is to analyze a class of functions for curve fitting using power-exponential functions, to conduct several experiments with their properties, and to compare different fitting methods. Before setting up a model, researchers need measurements, so suitable sample points must be selected and a curve fitted to them. Here we will examine a few different methods and apply them to data from different applications. In general terms, models or mathematical formulas may be developed to estimate one variable in a data set from the other variables, with some residual error depending on model accuracy (Data = Model + Error). This research can be used in many areas, such as electromagnetic compatibility calculations, lightning strike protection and models of death risk.
As will be seen in Chapters 2 and 3, there is a body of papers and research on standard methods of analysis and standard measures of performance when fitting models to data: the Gauss-Newton algorithm, the trust-region algorithm and the Levenberg-Marquardt algorithm. Some of these methods are general in the sense that they can be used for any optimization problem, while others are especially adapted to least-squares problems. This thesis will explain the most common methods, choose two methods used in non-linear least-squares curve fitting, and explore them in detail, including when and how they converge.
In the following sections of the introduction, some of the terminology used in this thesis is described. In the remainder of Chapter 1 we will discuss the basic concepts of non-linear curve fitting. In Chapter 2 we will discuss the Jacobian and Hessian matrices, which are important building blocks in popular methods for non-linear curve fitting, as well as the concept of convergence. In Chapter 3 we will discuss several methods for non-linear curve fitting. In Chapter 4 two of the methods from Chapter 3 will be applied to real data.
1.2 Curve fitting
Curve fitting is used to find the "best fit" line or curve for a series of data points. Put differently, curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the aim of defining a "best fit" model of the relationship. The goal of analyzing the results of non-linear least-squares curve fitting is to find the parameters that fit the data best in the least-squares sense, how much uncertainty there is in the parameter values, and whether the model fits differently for a different set of data. Least squares minimizes the square of the error between the original data and the values predicted by the model, so the quality of the fit is determined by how near the data are to the model's predicted values. Most of the time, the curve fit will produce an equation that can be used to find points anywhere along the curve. In some cases we may not be concerned with finding an equation; instead, we may just want to use a curve fit to smooth the data and improve the appearance of our plot [18].
In a power function the independent variable x is raised to a constant power c; in its most basic form f(c, x) = x^c, where c is a coefficient. In this thesis, the analysis focuses on linear combinations of functions of the form µ(b, x) = (x·e^(1−x))^b. This function is called a power-exponential function, where b is a real-valued parameter, different for each term in the linear combination, that can be changed to adapt the shape of the curve.
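As a small illustration (a Python sketch; the thesis's own code in Appendix A is MATLAB), the power-exponential function can be evaluated directly. Its base x·e^(1−x) equals 1 exactly at x = 1 and lies in (0, 1) for all other x > 0, so µ(b, x) peaks at x = 1 with value 1 for any b > 0:

```python
import math

def mu(b, x):
    """Power-exponential function mu(b, x) = (x * e^(1-x))^b."""
    return (x * math.exp(1.0 - x)) ** b

# The base x*e^(1-x) equals 1 at x = 1, so mu(b, 1) = 1 for any b,
# and the function value is below 1 on either side of x = 1.
print(mu(2.0, 1.0))                       # 1.0
print(mu(2.0, 0.5) < 1.0, mu(2.0, 3.0) < 1.0)
```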
The choice of the parameters of the power-exponential function is crucial, as it affects how good the fit will be. There are several ways to measure the fit, such as considering the sum of squares of the residuals or the maximum residual. Least squares minimizes the square of the error between the original data and the values predicted by the model [18]; in this thesis, we want the sum of squared errors, or residuals, to be as small as possible. Least-squares curve fitting is a commonly applied method for choosing the parameters in a mathematical model so that the model approximates observed data well.
Both linear and non-linear regression must deal with the dependent variables being random in some sense. The difference is that in linear regression the value of the predicting model depends linearly on its parameters. For example, y(x; a, b) = ax + b is linear in the parameters, but y(x; a, b) = ax + b² is non-linear.
If the objective function ϕ is continuously differentiable at a point b, then any local minimum point b of ϕ must satisfy

g(b) = ∇ϕ(b) = 0 (1.2)

where g(b) is the gradient vector. This shows the close relationship between solving optimization problems and solving non-linear systems of equations.
If in form (1.1) there are m > n equations, we have an overdetermined non-linear system. Then a least-squares solution can be defined as a solution to

min_{b∈R^n} ϕ(b) = (1/2) ‖f(b)‖₂²

which is a non-linear least-squares problem. We describe this problem in the next section.
1.4 Nonlinear least squares problems
A widely used approach for the estimation of the unknown parameters in the non-linear re-
gression function is the approach of least squares. The problem of non-linear least-squares is
that of minimizing a sum of squares. Consider a vector function f : R^n → R^m where m ≥ n. The aim is to minimize ‖f(b)‖₂² as the non-linear least-squares objective function

min_b ϕ(b) ≡ min_{b∈R^n} (1/2) ‖f(b)‖₂² = min_{b∈R^n} (1/2) Σ_{i=1}^m fᵢ(b)², (1.3)

in which each fᵢ is a real-valued function with continuous second partial derivatives. The problem is presented as the minimization of the l₂ norm of a multivariate function, min_{b∈R^n} ‖f(b)‖₂², where

f(b) = (f₁(b), f₂(b), …, f_m(b))^T. (1.4)
A common instance is the choice of the parameter vector b within a non-linear model µ:

min_{b∈R^n} Σ_{i=1}^m (yᵢ − µ(xᵢ; b))² (1.5)

We must minimize some norm of the vector f(b); the non-linearity arises only from µ(xᵢ; b). The model µ(b, xᵢ) fits the data well if the residuals fᵢ are small.
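The residuals fᵢ = yᵢ − µ(xᵢ; b) and the objective (1.3) can be sketched in a few lines of Python. The data below are synthetic and purely illustrative, generated from the model itself with a known parameter b = 3, so the objective is exactly zero at the true parameter:

```python
import math

def mu(b, x):
    # power-exponential model (x * e^(1-x))^b
    return (x * math.exp(1.0 - x)) ** b

def residuals(b, xs, ys):
    """Residuals f_i(b) = y_i - mu(b, x_i)."""
    return [y - mu(b, x) for x, y in zip(xs, ys)]

def phi(b, xs, ys):
    """Objective phi(b) = 1/2 * sum of f_i(b)^2, cf. (1.3)."""
    return 0.5 * sum(f * f for f in residuals(b, xs, ys))

xs = [0.5, 1.0, 1.5, 2.0]
ys = [mu(3.0, x) for x in xs]     # synthetic data generated with b = 3
print(phi(3.0, xs, ys))           # zero at the true parameter
print(phi(2.0, xs, ys) > 0.0)     # positive away from it
```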
One of the major advantages of non-linear regression is the wide set of formulas that can be fitted. Several scientific and physical processes are inherently non-linear; trying a linear fit to such a structure would give inferior results. For instance, population studies very often follow exponential patterns that cannot be captured by a linear model. Non-linear regression can give good estimates of the unknown parameters in the model even with a small data set. Least-squares problems can be solved by general optimization methods. This study offers numerical results for a large and diverse group of problems using software that is in common use and has undergone extensive testing. The algorithms used in this software include Newton-based line-search and trust-region approaches for unconstrained optimization, in addition to the Gauss-Newton and Levenberg-Marquardt methods for non-linear least squares. Our purpose is to compare the underlying algorithms in order to identify classes of problems on which each method performs either quite well or quite poorly, and to provide benchmarks for further work in non-linear least squares and unconstrained optimization.
Chapter 2
The standard approaches for non-linear least-squares problems need derivative information about the component functions of f(b), as mentioned in [3] and [4].

Definition 1. If the function fᵢ : R^n → R is differentiable at a point b, then fᵢ is also continuous at b and the gradient vector ∇fᵢ(b) exists and is continuous [3]. A vector-valued function f(b) : R^n → R^n is said to be differentiable at the point b if each component fᵢ(b) is differentiable at b, that is, if all the first-order partial derivatives exist. Then the matrix

J(b) = ( ∇f₁(b)^T ; … ; ∇f_n(b)^T ) =
⎡ ∂f₁/∂b₁ … ∂f₁/∂b_n ⎤
⎢    ⋮         ⋮    ⎥
⎣ ∂f_n/∂b₁ … ∂f_n/∂b_n ⎦

is the Jacobian matrix of f.
We assume here that f(b) is twice continuously differentiable. It is easily shown that the gradient of ϕ(b) = (1/2) f^T(b) f(b) has components

[∇ϕ(b)]ⱼ = ∂ϕ(b)/∂bⱼ = Σ_{i=1}^m fᵢ(b) ∂fᵢ(b)/∂bⱼ ,  j = 1, …, n,
and it follows that the gradient is the vector
g(b) = ∇ϕ(b) = J(b)T f (b) (2.1)
where the Jacobian J(b) ∈ R^{m×n} of the vector function f(b) is the matrix containing the first partial derivatives of the function components,

[J(b)]ᵢⱼ = ∂fᵢ(b)/∂bⱼ = −∂µ(b, xᵢ)/∂bⱼ ,  i = 1, …, m, j = 1, …, n. (2.2)

The ith row of J(b) equals the transpose of the gradient of fᵢ(b):

[J(b)]ᵢ,: = ∇fᵢ(b)^T = −∇µ(b, xᵢ)^T ,  i = 1, …, m. (2.3)

As we can see, the Jacobian is a function of the independent variable and the parameters, and therefore it changes from one iteration to the next. In terms of the linearized model, ∂µ(b, xᵢ)/∂bⱼ = −[J(b)]ᵢⱼ, and the residual is given by Eq. (1.6).
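A standard sanity check for the analytic Jacobian (2.2) is to compare it against central finite differences. The Python sketch below does this for the single-parameter model µ(b, x) = (x·e^(1−x))^b, whose only Jacobian column has entries −∂µ/∂b = −µ(b, xᵢ)·ln(xᵢ·e^(1−xᵢ)); the sample points are chosen arbitrarily for illustration:

```python
import math

def mu(b, x):
    return (x * math.exp(1.0 - x)) ** b

def jac_analytic(b, xs):
    """Jacobian column [J]_i = df_i/db = -dmu(b, x_i)/db, cf. (2.2)."""
    return [-mu(b, x) * math.log(x * math.exp(1.0 - x)) for x in xs]

def jac_fd(b, xs, h=1e-6):
    """Central finite-difference approximation of the same column."""
    return [-(mu(b + h, x) - mu(b - h, x)) / (2 * h) for x in xs]

xs = [0.5, 1.5, 2.0]
for ja, jf in zip(jac_analytic(2.0, xs), jac_fd(2.0, xs)):
    assert abs(ja - jf) < 1e-6
print("Jacobian check passed")
```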
We shall also need the elements of the Hessian of ϕ(b), denoted ∇²ϕ(b), which are given by

[∇²ϕ(b)]ₖₗ = ∂²ϕ(b)/(∂bₖ∂bₗ) = Σ_{i=1}^m (∂fᵢ(b)/∂bₖ)(∂fᵢ(b)/∂bₗ) + Σ_{i=1}^m fᵢ(b) ∂²fᵢ(b)/(∂bₖ∂bₗ) (2.4)

where

[∇²fᵢ(b)]ₖₗ = −[∇²µ(b, xᵢ)]ₖₗ = −∂²µ(b, xᵢ)/(∂bₖ∂bₗ) ,  k, l = 1, …, n.
The summation term containing the second derivatives can often be ignored, so we can approximate the Hessian of ϕ(b) by

H(b) = ∇²ϕ(b) ≈ J(b)^T J(b) (2.6)
The special forms of the gradient g(b) and Hessian H(b) can be exploited by methods for the
non-linear least squares problem.
Now we consider how the Hessian matrix can be used to establish the existence of a local
minimum or maximum.
Theorem 1. Suppose that f(b) has continuous first and second partial derivatives on a set D ⊆ R^n. Let b∗ be an interior point of D that is a critical point of f(b). Then b∗ is a local minimum if the Hessian H_f(b∗) is positive definite, a local maximum if H_f(b∗) is negative definite, and a saddle point if H_f(b∗) is indefinite.

This theorem can be proved by using the continuity of the second partial derivatives to show that H_f(b) is positive definite for b sufficiently close to b∗, and then applying the multivariable generalization of Taylor's Formula [21].
We now consider the case where the Hessian of f(b) is indefinite at a critical point. Suppose that f(b) has continuous second partial derivatives on a set D ⊆ R^n, and let b∗ be an interior point of D that is a critical point of f(b). If H_f(b∗) is indefinite, then there exist vectors u, v such that u^T H_f(b∗) u > 0 and v^T H_f(b∗) v < 0. By continuity of the second partial derivatives, there exists an ε > 0 such that these inequalities also hold at every point within distance ε of b∗. Define U(t) = f(b∗ + tu) and V(t) = f(b∗ + tv). Then U′(0) = V′(0) = 0, whereas U″(0) > 0 and V″(0) < 0. Thus t = 0 is a strict local minimum of U(t) and a strict local maximum of V(t) [21].
Definition 2. A saddle point of f(b) is a critical point b∗ such that there are vectors u, v for which t = 0 is a strict local minimum of U(t) = f(b∗ + tu) and a strict local maximum of V(t) = f(b∗ + tv).
Theorem 2 (Sufficient condition for a local minimum). Suppose that bₛ is a stationary point of a twice continuously differentiable function ϕ and that the Hessian ∇²ϕ(bₛ) is positive definite. Then bₛ is a local minimum.

Similarly, if the Hessian H = ∇²ϕ(bₛ) is negative definite, then bₛ is a local maximum, and if ∇²ϕ(bₛ) is indefinite, i.e. has both positive and negative eigenvalues, then the stationary point bₛ is a saddle point [21].
Based on the theorems discussed above, there are two conditions for an optimal solution. The first-order necessary condition for b∗ to be a local minimum of ϕ(b) is that b∗ is a stationary point, i.e. that it satisfies ∇ϕ(b∗) = 0.
The second-order sufficient condition is that if b is a critical point of ϕ, with f(b) : R^n → R^m twice continuously differentiable, and the Hessian of ϕ at b is positive definite, then ϕ has a local minimum at b; in other words, in any direction away from b the value of ϕ increases, at least to first order. Thus

∇²ϕ(b) = J(b)^T J(b) + Σ_{i=1}^m fᵢ(b) ∇²fᵢ(b)

is positive definite. The first-order and often dominant term J(b)^T J(b) of the Hessian contains only the Jacobian matrix J(b), i.e., only first derivatives. The computational cost of storing the mn²/2 second derivatives ∂²fᵢ(b)/(∂bₖ∂bₗ) can be quite high. In the second term, the second derivatives are multiplied by the residuals; if the mathematical model is adequate, the residuals will be small near the solution and the second term will be less important. In this case the first term, which contains only the Jacobian matrix, can be considered the important part of the Hessian, and the second term can be ignored.
All these notions provide the model function of the non-linear least-squares problem and its partial derivatives with respect to each parameter; algorithms such as the Levenberg-Marquardt and Gauss-Newton algorithms construct all the necessary structures by themselves:

• J(b) = ∇f(b), the Jacobian matrix of first-order derivatives of the residuals with respect to each parameter and every measurement. It plays the role of the design matrix of the non-linear least-squares model.

• g(b) = ∇ϕ(b), the gradient vector of first-order derivatives of the objective function with respect to the model parameters. It describes the slope of the objective function ∇ϕ(b) at a point b.

• H(b) = ∇²ϕ(b), the Hessian matrix of second-order partial derivatives of the objective function with respect to every combination of parameters.
2.3 Convergence
One of the useful things that we will discuss is the rate of convergence in various numerical
methods. At this point, one of the most important criteria to consider is the speed or order of
convergence.
Definition 3. A convergent sequence {bₖ} with lim_{k→∞} bₖ = b∗ and bₖ ≠ b∗ is said to have order of convergence equal to p if

lim_{k→∞} |eₖ₊₁| / |eₖ|^p = lim_{k→∞} |bₖ₊₁ − b∗| / |bₖ − b∗|^p = C.

Here p ≥ 1 is called the order of convergence and the constant C the rate of convergence or asymptotic error constant; we then say that bₖ converges to b∗ with order p and constant C. This describes how rapidly the error eₖ = bₖ − b∗ converges to zero. Three common cases are: linear convergence, where p = 1 and 0 < C < 1; superlinear convergence, where p = 1 and

|eₖ₊₁| / |eₖ| → 0 for k → ∞;

and quadratic convergence, where p = 2.
The value of p measures how fast the sequence converges: the bigger p is, the faster the convergence. In numerical approaches, the sequence of approximate solutions converges to the root; when the iterative approach converges faster, a solution can be reached in a smaller number of iterations than with an approach of slower convergence. We will discuss the specific convergence properties of some methods in the following chapter.
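The order p can be estimated numerically from consecutive errors, since |eₖ₊₁| ≈ C|eₖ|^p gives p ≈ log(|eₖ₊₁|/|eₖ|) / log(|eₖ|/|eₖ₋₁|). A Python sketch on the scalar Newton iteration for b² − 2 = 0 (a toy example chosen for illustration, not taken from the thesis) recovers an estimate close to p = 2:

```python
import math

# Newton's method for the scalar root problem b^2 - 2 = 0,
# whose iterates converge quadratically to b* = sqrt(2).
b_star = math.sqrt(2.0)
b = 3.0
errors = []
for _ in range(5):
    errors.append(abs(b - b_star))
    b = b - (b * b - 2.0) / (2.0 * b)   # Newton update

# Estimate the order p from |e_{k+1}| ~ C |e_k|^p:
p = math.log(errors[3] / errors[2]) / math.log(errors[2] / errors[1])
print(p)   # close to 2, consistent with quadratic convergence
```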
Chapter 3
Newton's method is a root-finding algorithm that uses the first few terms of the Taylor series of the function f(b) in the neighbourhood of a suspected root. We will derive this method for finding local stationary points of ϕ(b), which satisfy ∇ϕ(b) = 0. The approximation of ϕ by its second-order Taylor expansion about the point bₖ₊₁ = bₖ + h is given by

ϕ(bₖ + h) ≈ ϕ(bₖ) + g(bₖ)^T h + (1/2) h^T H(bₖ) h
where the gradient is the vector g(bₖ) = ∇ϕ(bₖ) and the Hessian is the symmetric matrix

H(bₖ) =
⎡ ∂²ϕ/∂b₁²     …  ∂²ϕ/∂b₁∂b_N ⎤
⎢     ⋮        ⋱       ⋮      ⎥
⎣ ∂²ϕ/∂b₁∂b_N  …  ∂²ϕ/∂b_N²   ⎦

Minimizing ϕ(bₖ + h) over h gives a new direction towards a local stationary point b∗. When the Hessian matrix H(bₖ) is positive definite, the minimum is the point where ∇ϕ(bₖ + h) equals zero. Hence, we want to solve the linear equation system

H(bₖ) h = −g(bₖ)

where the Newton step h is the solution of this symmetric linear system.
This also gives the iterative update

bₖ₊₁ = bₖ + h
     = bₖ − (∇²ϕ(bₖ))⁻¹ ∇ϕ(bₖ)
     = bₖ − H(bₖ)⁻¹ g(bₖ)
     = bₖ − (J(bₖ)^T J(bₖ) + G(bₖ))⁻¹ J(bₖ)^T f(bₖ)

where G(bₖ) denotes the matrix

G(bₖ) = Σ_{i=1}^m fᵢ(bₖ) ∇²fᵢ(bₖ).

For the non-linear system f(b) = 0 the corresponding iteration can also be written

bₖ₊₁ = bₖ + h = bₖ − J(bₖ)⁻¹ f(bₖ) (3.2)
In general the inverse Jacobian matrix need not be computed explicitly. We now highlight the most important properties of Newton's approach [4]:

• Newton's method is quite efficient in the final phase of the iteration, where b is close to b∗.

• Newton's method converges quadratically to a local minimum b∗ as long as the Hessian H(b∗) is positive definite. On the other hand, if H(bₖ) is negative definite everywhere in a region containing b, the basic Newton approach converges quadratically towards a stationary point b∗ that is a maximizer.

• It is better to perform a line search for αₖ, which guarantees global convergence:

bₖ₊₁ = bₖ − αₖ Hₖ⁻¹ gₖ
We can construct a hybrid method based on Newton's method and the steepest-descent method. The solution hₖ = −H(bₖ)⁻¹ g(bₖ) is guaranteed to be a descent direction provided that Hₖ is positive definite. The central section of this hybrid algorithm can be sketched as

if ∇²ϕ(b) is positive definite
    h := hₖ
else
    h := h_sd
b := b + αh

where h_sd is the steepest-descent direction and α is obtained by a line search. Hybrid methods can be very efficient, but they are hardly ever used because they require computing ∇²ϕ(b), which is not available for complex application problems. There are other methods that avoid this by instead building a sequence of matrices approximating H∗ = ∇²ϕ(b∗) [4].
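The hybrid step selection above can be sketched in one dimension in Python. This is a minimal illustration with a fixed step length α for the steepest-descent branch rather than a real line search: use the Newton step where the second derivative (the 1-D "Hessian") is positive, and fall back to steepest descent where it is not:

```python
def hybrid_minimize(dphi, d2phi, b, alpha=0.1, iters=100):
    """Hybrid Newton / steepest-descent sketch in one dimension."""
    for _ in range(iters):
        g = dphi(b)
        H = d2phi(b)
        if H > 0:
            h = -g / H        # Newton step h = -H^(-1) g
        else:
            h = -alpha * g    # steepest-descent step (fixed alpha for illustration)
        b = b + h
    return b

# phi(b) = b^4 - 2 b^2 has a local maximum at 0 (negative Hessian there)
# and minima at b = +-1; pure Newton started near 0 would be attracted to
# the stationary point at 0, while the hybrid escapes towards b = 1.
b_min = hybrid_minimize(lambda b: 4*b**3 - 4*b,     # phi'(b)
                        lambda b: 12*b**2 - 4,      # phi''(b)
                        b=0.1)
print(b_min)   # converges to the minimizer at 1.0
```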
3.1.1 Line search
As in the case of solving a non-linear system, Newton's method needs to be modified when the initial value b₀ is not close to a minimizer: either a line search can be included or a trust-region technique used. In a line search we take the new iterate to be

bₖ₊₁ = bₖ + αₖ dₖ ,  αₖ ≥ 0 (3.3)

Here dₖ is a search direction and the step length αₖ > 0 is chosen so that ϕ(bₖ₊₁) < ϕ(bₖ), which is equivalent to f(αₖ) < f(0) for

f(α) = ϕ(bₖ + α dₖ). (3.4)

That dₖ is a descent direction ensures that f′(0) = dₖ^T g(bₖ) < 0, where g(bₖ) is the gradient at bₖ. It is usually not efficient to determine an accurate minimizer; rather, it is required that αₖ satisfy two conditions:

1. αₖ should be increased when αₖ is so small that the gain in objective value is too small.

2. αₖ must be decreased in order to satisfy the descent condition ϕ(bₖ₊₁) < ϕ(bₖ) when αₖ is too large.
We observe that the Newton step hₖ is not a descent direction if g(bₖ)^T H(bₖ)⁻¹ g(bₖ) ≤ 0. One reason to admit the gradient as an alternative search direction is the risk that the Newton direction leads to a saddle point.
We are interested here in analyzing the convergence of Newton's approach: Newton's algorithm converges quadratically if the approximation bₖ is sufficiently close to a root b∗ at which the Jacobian is non-singular.

Theorem 3. Suppose f : R^n → R^n is continuously differentiable and f(b∗) = 0. If

1. J(b∗) is non-singular, and

2. J is Lipschitz continuous on a neighborhood of b∗,

then, for all b⁽⁰⁾ sufficiently close to b∗, Newton's algorithm produces a sequence b⁽¹⁾, b⁽²⁾, … that converges quadratically to b∗. The proof of this theorem is described in [3].

Lipschitz continuity is a technical condition stronger than mere continuity of the Jacobian J but weaker than the condition that the function f(b) be twice continuously differentiable [3][9].
The biggest drawback of Newton's approach is that J(b) and its inverse must be computed at every iteration. Computing the Jacobian matrix and its inverse can be relatively hard and time-consuming, depending on the system size. Newton's approach requires solving several linear systems, which may become expensive when there are many variables. It converges rapidly when the Jacobian J(b∗) is well-conditioned; in the opposite case it may blow up.
Trust-region approaches were initially developed for non-linear least-squares problems [11] and differ in kind from general descent approaches. The fundamental concept of trust-region methods is to first decide the step size for each sub-problem and thereafter optimize the direction. The step size defines the radius ∆ₖ of the trust region, inside which the second-order Taylor expansion of the twice continuously differentiable function ϕ(b) is believed to behave similarly to the original function. In other words, within the radius ∆ₖ of maximal step size, the optimal direction is calculated with respect to the approximate model of ϕ at bₖ.
Consequently the quadratic model function mₖ used at each iterate bₖ is

mₖ(h) = ϕ(bₖ) + g(bₖ)^T h + (1/2) h^T H(bₖ) h ≈ ϕ(bₖ + h). (3.8)

At each iteration, we search for a solution hₖ of the sub-problem based on the quadratic model Eq. (3.7) subject to the trust region. Assuming we know a positive number ∆ₖ such that the model is sufficiently precise inside a ball with radius ∆ₖ centered at bₖ, we determine the step as

minimize_{h∈R^n} mₖ(h) = ϕₖ + h^T gₖ + (1/2) h^T Hₖ h (3.9)
subject to ‖h‖₂ ≤ ∆ₖ
Since mₖ(h) is supposed to be a good approximation of ϕ(bₖ + h) for h sufficiently small, one likely reason for a failed step is that h was too large and should be reduced. Moreover, if the step is accepted, it may be possible to use a larger step from the new iterate and thereby reduce the number of steps needed before b∗ is reached.
A basic element of a trust-region algorithm is the process of selecting the trust-region radius ∆ₖ at every iteration. The quality of the model with the computed step can be evaluated by the so-called gain ratio

ρₖ = (ϕ(bₖ) − ϕ(bₖ + hₖ)) / (mₖ(0) − mₖ(hₖ)) (3.10)
which gives the ratio between the actual reduction and the expected reduction. The actual decrease of the objective function for the trial step hₖ is given by the numerator; the predicted reduction in the denominator of Equation (3.10) is the reduction expected by the model function mₖ. The choice of ∆ₖ is at least partially decided by the ratio ρₖ at previous iterations. By construction the predicted reduction should always be positive. If the gain ratio ρₖ is negative, the new objective value ϕ(bₖ + hₖ) is bigger than the present value ϕ(bₖ), and the step has to be discarded. If the gain ratio ρₖ is close to one, there is good agreement between the model function mₖ and the objective function over the step, so it is safe to expand the trust region for the upcoming iteration. If the gain ratio ρₖ is positive but much smaller than one, the trust region is not updated, while if it is close to zero or negative, the trust region ∆ₖ is shrunk at the next iteration.
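Such a radius update driven by the gain ratio can be sketched in Python. The thresholds η₁ = 0.25, η₂ = 0.75 and the halving/doubling factors below are illustrative choices, not values prescribed by the thesis:

```python
def update_radius(rho, delta, eta1=0.25, eta2=0.75):
    """Trust-region radius update driven by the gain ratio rho of (3.10):
    shrink on poor agreement, keep on moderate, expand on good agreement."""
    if rho < eta1:           # poor (or negative) agreement: shrink the region
        return delta / 2.0
    elif rho > eta2:         # very good agreement: expand the region
        return 2.0 * delta
    return delta             # otherwise leave the radius unchanged

print(update_radius(-0.1, 1.0))   # 0.5 (step would also be rejected)
print(update_radius(0.5, 1.0))    # 1.0
print(update_radius(0.9, 1.0))    # 2.0
```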
The approximate function mₖ(h) can be minimized via a variety of approaches. With a trust-region method the step length is controlled by the size of the radius ∆ₖ. As in the case of a line search, the exact optimal solution is not necessarily needed. An uncomplicated approach is to minimize the linear approximation

min_{‖h‖≤∆ₖ} { ϕ(bₖ) + g(bₖ)^T h }

whose solution is a step of length ∆ₖ in the steepest-descent direction,

h = −∆ₖ g(bₖ) / ‖g(bₖ)‖ ,

so that only the step length has to be limited to the trust radius.
To check that the process is performing well, one determines whether the trust radius is adequate. Therefore, the expected reduction mₖ(bₖ) − mₖ(bₖ + hₖ) and the actual reduction ϕ(bₖ) − ϕ(bₖ + hₖ) are compared. The ratio between the actual reduction and the expected one, ρₖ = Aredₖ / Predₖ, plays a very significant part in the algorithm: it determines whether the trial step, subject to ‖h‖ ≤ ∆ₖ, is acceptable, and how to alter the radius of the new trust region [14].
3.2.3 The sub-problem of the trust-region
One of the most significant parts of a trust-region algorithm is the trust-region sub-problem. Since every iteration of a trust-region algorithm requires solving, exactly or inexactly, a trust-region sub-problem, finding an efficient solver for trust-region problems is very important. We consider sub-problem (3.9), which has been studied by many authors. At iteration k of a trust-region approach, the following sub-problem must be solved:

minimize_{h∈R^n} mₖ(h) = ϕₖ + h^T gₖ + (1/2) h^T Hₖ h
subject to ‖h‖₂ ≤ ∆ₖ
It can be shown that the solution h∗ of this constrained problem is the solution of the linear equation system

(Hₖ + λI) h∗ = −gₖ (3.11)

where gₖ ∈ R^n, Hₖ ∈ R^{n×n} is a symmetric matrix and ∆ₖ > 0, if and only if there exists λ ≥ 0 such that (Hₖ + λI) is positive semi-definite, ‖h∗‖₂ ≤ ∆ₖ and λ(∆ₖ − ‖h∗‖₂) = 0. Note that if Hₖ = ∇²ϕ(bₖ) is positive definite and ∆ₖ is large enough, the solution of the trust-region sub-problem is the solution of

∇²ϕ(bₖ) h = −∇ϕ(bₖ)
3.3 The approach of Gauss-Newton
This approach depends on a linear approximation of the components of f(b) in the neighbourhood of b. The idea of the approach is to approximate the Hessian matrix H(b) by its first part J^T(bₖ)J(bₖ). It is used for solving non-linear least-squares problems and can only be used to minimize a sum-of-squares objective function.
This method uses the approximation Q(bₖ) = 0 and determines the search direction as the solution of the Newton equations

∇²ϕ(bₖ) h_N = −∇ϕ(bₖ)

where the Gauss-Newton method takes the gradient and an approximate Hessian as

g(bₖ) = J^T(bₖ) f(bₖ) (3.13)
H(bₖ) ≈ J^T(bₖ) J(bₖ) (3.14)

The resulting method is referred to as the Gauss-Newton approach, where the computation of the search direction h_GN requires the solution of the linear system

(J(bₖ)^T J(bₖ)) h_GN = −J(bₖ)^T f(bₖ) (3.15)
Note that J(bₖ)^T J(bₖ) is always at least positive semi-definite. When J(bₖ) has full column rank and the gradient g(bₖ) is nonzero, the Gauss-Newton search direction is a descent direction and hence an appropriate direction for a line search; in this case (3.15) is exactly the normal equations of a linear least-squares problem. Otherwise J(bₖ)^T J(bₖ) is non-invertible and the equation does not have a unique solution.
The difference between Newton's approach and the Gauss-Newton approach lies in the search directions

H(b) h_N = −g(b)
J(b)^T J(b) h_GN = −g(b)

where h_GN is a descent direction since, as mentioned before, Jₖ^T Jₖ is a positive semi-definite matrix. It can be shown that for a wide range of instances, taking full steps of length one results in convergence.
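For the single-parameter model µ(b, x) = (x·e^(1−x))^b the linear system (3.15) is scalar, so the Gauss-Newton iteration can be sketched in a few lines of Python. The noise-free synthetic data with known b = 2.5 and the starting point are illustrative; a real implementation would add a stopping criterion and a line search:

```python
import math

def mu(b, x):
    """Power-exponential model (x * e^(1-x))^b."""
    return (x * math.exp(1.0 - x)) ** b

def gauss_newton(xs, ys, b0, iters=20):
    """One-parameter Gauss-Newton: solve (J^T J) h_GN = -J^T f, cf. (3.15)."""
    b = b0
    for _ in range(iters):
        f = [y - mu(b, x) for x, y in zip(xs, ys)]                     # residuals f_i
        J = [-mu(b, x) * math.log(x * math.exp(1.0 - x)) for x in xs]  # J_i = -dmu/db
        JtJ = sum(j * j for j in J)
        Jtf = sum(j * fi for j, fi in zip(J, f))
        b = b - Jtf / JtJ    # b_{k+1} = b_k + h_GN, h_GN = -(J^T J)^(-1) J^T f
    return b

xs = [0.3, 0.6, 1.4, 2.0, 2.5]      # illustrative sample points (x != 1)
ys = [mu(2.5, x) for x in xs]       # noise-free data with true b = 2.5
b_fit = gauss_newton(xs, ys, b0=1.0)
print(b_fit)                        # recovers the true parameter
```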
A sufficient condition for convergence of the Gauss-Newton approach is known when the normal equations for the linearized least-squares problem (3.16) are solved exactly in Step 1.1 at every iteration. The approach with line search αₖ has guaranteed convergence, provided that, first, there exists b∗ ∈ R^n such that J^T(b∗) f(b∗) = 0 and, second, the Jacobian matrix J(b∗) at b∗ has full rank n.
We will introduce the notation υ(A) to denote the spectral radius of an n × n matrix A, and
define
ρ = υ((J(b∗ )T J(b∗ ))−1 Q(b∗ )) (3.19)
The following theorem on local convergence of the Gauss-Newton approach then holds.
Theorem 4. Let the first and second assumptions hold. If ρ < 1, then the Gauss-Newton iter-
ation converges locally to b∗ ; that is, there exists ε > 0 such that the sequence {bk } generated
by the Gauss-Newton algorithm converges to b∗ for all b0 ∈ D ≡ {b| k b − b∗ k< ε}.
The Levenberg-Marquardt search direction h_LM is defined by the following modification of (3.15):

(J(bₖ)^T J(bₖ) + λₖ I) h_LM = −J(bₖ)^T f(bₖ) (3.22)

which uses the original approximation Hₖ ≈ Jₖ^T Jₖ and is solved for different values of the damping parameter λₖ. This approach can also be seen as Gauss-Newton with the use of a trust-region process. The problem is equivalent to minimizing the model function mₖ(h_LM) of (3.10), i.e. the trust-region model using the approximate Hessian and the gradient of ϕ(bₖ):

min_{‖h_LM‖≤∆} ϕ(bₖ) + ∇ϕ(bₖ)^T h_LM + (1/2) h_LM^T H(bₖ) h_LM (3.23)
and the iteration step itself is bₖ₊₁ = bₖ + h_LM. We use a trust-region strategy instead of a line-search technique to control the norm of the solution of (3.22); then we must solve at each iteration

min ‖J(bₖ) h_LM + f(bₖ)‖₂²  subject to ‖h_LM‖ ≤ ∆ₖ (3.24)

where the trust-region radius ∆ₖ is bigger than zero, producing a spherical trust region. Therefore, the Levenberg-Marquardt approach can be considered a trust-region approach.
The step in Levenberg-Marquardt's method is computed as

hₖ^LM = argmin_{h_LM} { ‖J(bₖ) h_LM + f(bₖ)‖₂² + λₖ ‖h_LM‖₂² } (3.25)

where λₖ > 0 is called the Lagrange parameter for the constraint at the kth iteration and is updated from iteration to iteration. Thus h_LM is calculated from the normal equations of the damped linear least-squares problem [7]

min_{h_LM} (1/2) ‖ [ J(bₖ) ; √λₖ I ] h_LM − [ −f(bₖ) ; 0 ] ‖₂² (3.26)
For an ill-conditioned Jacobian (condition number very large or infinite), this method is a more robust variant of Gauss-Newton. The key strategic decision of the Levenberg-Marquardt method is how to choose and update the damping λₖ at each iteration. Analogously to the trust-region strategy, the gain ratio ρ(h_LM) of (3.12) compares the actual reduction of the objective function in the numerator with the expected reduction of the quadratic model in the denominator. The gain ratio is used to control and update the damping parameter λₖ in the Levenberg-Marquardt method.
Using the quadratic model m of φ, the predicted reduction is

    m(0) − m(h_LM) = −h_LM^T J^T f(b) − (1/2) h_LM^T J^T J h_LM
                   = −h_LM^T (J^T f(b) + (1/2) J^T J h_LM)
                   = −(1/2) h_LM^T (2 J^T f(b) + (J^T J + λI − λI) h_LM)
                   = −(1/2) h_LM^T (J^T f(b) − λ h_LM)
                   = (1/2) h_LM^T (λ h_LM − J^T f(b)),

where we used that h_LM solves (J^T J + λI) h_LM = −J^T f(b). Both terms λ h_LM^T h_LM and −h_LM^T J^T f(b) are positive, the latter because J^T J + λI is positive definite, so m(0) − m(h_LM) is guaranteed to be positive. Finally we get the gain ratio

    ρ(h_LM) = (φ(b) − φ(b + h_LM)) / (m(0) − m(h_LM)).

In a damped method a small value of ρ(h_LM) indicates that we must increase the damping parameter, thereby increasing the penalty on large steps.
• If ρ(h_LM) is large, then m(h_LM) is a good approximation to φ(b + h_LM), and the damping parameter λ can be decreased towards 0 so that the next Levenberg-Marquardt step is closer to a Gauss-Newton step.
    b_{k+1} = b_k + h_LM,k   if ρ_k > η_0,
    b_{k+1} = b_k            otherwise.
Step 4. Choose λ_{k+1} as

    λ_{k+1} = 4λ_k             if ρ_k < η_1,
    λ_{k+1} = λ_k              if ρ_k ∈ [η_1, η_2],
    λ_{k+1} = max{λ_k/4, m}    if ρ_k > η_2;

then set k := k + 1 and go to Step 2.
From the algorithm, the Levenberg-Marquardt step h_LM,k is computed as

    h_LM,k = −(J(b_k)^T J(b_k) + λ_k I)^{−1} J(b_k)^T f(b_k)        (3.28)
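The step computation (3.28) combined with the Step 4 damping rule can be sketched in Python on the two-parameter power-exponential model used later in Chapter 4 (a minimal illustration; the threshold values η0 = 10⁻³, η1 = 0.25, η2 = 0.75 and the floor m are assumed example values, and the gradient-norm stopping test is an addition of this sketch):

```python
import numpy as np

def lm_fit(x, y, b0, eta0=1e-3, eta1=0.25, eta2=0.75, m=1e-8, lam=1e-2,
           n_iter=200):
    """Levenberg-Marquardt fit of mu(b; x) = b1*(x*exp(1-x))**b2,
    with the Step 4 damping update (factor 4 up/down, floor m)."""
    b = np.asarray(b0, dtype=float)
    u = x * np.exp(1.0 - x)
    for _ in range(n_iter):
        f = y - b[0] * u ** b[1]                            # residuals f_i
        J = np.column_stack([-u ** b[1],                    # df/db1
                             -b[0] * u ** b[1] * np.log(u)])  # df/db2
        g = J.T @ f                                         # grad of 0.5*||f||^2
        if np.linalg.norm(g) < 1e-12:
            break                                           # converged
        h = np.linalg.solve(J.T @ J + lam * np.eye(2), -g)  # eq. (3.28)
        f_new = y - (b[0] + h[0]) * u ** (b[1] + h[1])
        # gain ratio: actual reduction over predicted reduction
        pred = 0.5 * h @ (lam * h - g)
        rho = 0.5 * (f @ f - f_new @ f_new) / pred
        if rho > eta0:                                      # accept the step
            b = b + h
        if rho < eta1:
            lam = 4.0 * lam                                 # poor model: damp more
        elif rho > eta2:
            lam = max(lam / 4.0, m)                         # good model: damp less
    return b
```

A rejected step (ρ ≤ η0) leaves b unchanged and quadruples λ, so the next trial step is shorter and closer to the steepest-descent direction.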
Chapter 4
In this chapter we analyze and simulate the accuracy of fitting a power-exponential model µ(b, x) to a given set of points, and focus on the specifics of this model for curve fitting. We define our model as
• ∂µ/∂b1 = (x · e^(1−x))^{b2}, which is independent of the parameter b1.
• ∂µ/∂b2 = µ(b1, b2, x) · (ln(x) + (1 − x)) = µ(b1, b2, x) · ln(x · e^(1−x)), which depends on both b1 and b2.
When least squares is used to select the parameters, the error associated with the model b1 · (x · e^(1−x))^{b2} is given by
    min_{b1,b2} φ(b1, b2) = Σ_{i=1}^{m=9} (y_i − b1 · (x_i · e^(1−x_i))^{b2})²        (4.2)
This is just m times the variance of the residuals y_1 − b1 · (x_1 · e^(1−x_1))^{b2}, …, y_m − b1 · (x_m · e^(1−x_m))^{b2}. It makes no difference whether we take the variance or m times the variance as our error; observe that the error is a function of the two variables. Thus the i-th function would be

    f_i(b1, b2) = y_i − b1 · (x_i · e^(1−x_i))^{b2},

which is the residual for the i-th data point. The aim is to find values of the two variables b1 and b2 that minimize the error residuals. The majority of least-squares problems are of this type, where the functions f_i(b) are residuals and the index i identifies the specific data point. This is one way in which least-squares problems are distinctive: such problems typically include some assumptions about the error in the model. For instance, there might be
    y_i = b1 · (x_i · e^(1−x_i))^{b2} + ε_i        (4.4)
where the errors ε_i are assumed to come from a single probability distribution, often the normal distribution. Associated with this structure are the true parameters b1 and b2; however, every time we collect data and solve the least-squares problem, the results are only estimates b̂1 and b̂2 of those true parameters. After computing these estimates, we compare two common methods for non-linear least-squares problems, the Gauss-Newton and Levenberg-Marquardt methods, using their respective algorithms.
This is a consequence of the particular structure of the Hessian matrix ∇²φ(b) of the least-squares objective function. The Hessian H(b) in this case is the sum of two terms. The first involves only the gradients of the power-exponential functions µ_i and is therefore easy to calculate. The second involves second derivatives, but it vanishes when the errors ε_i are all zero, i.e. when the model fits the data perfectly. Several approaches to least squares, including the one used here, approximate this second term (2.5) of the Hessian.
We now compute the gradient and Hessian of this model for the given data points.
The formula for the least-squares objective function is

    φ(b1, b2) = (1/2) Σ_{i=1}^{9} (y_i − b1 · (x_i · e^(1−x_i))^{b2})² = (1/2) f(b)^T f(b).
The gradient and Hessian of the objective are

    ∇φ(b) = Σ_{i=1}^{9} f_i(b) ∇f_i(b) = J(b)^T f(b),

    ∇²φ(b) = J(b)^T J(b) + Σ_{i=1}^{9} f_i(b) ∇²f_i(b).
We observe that {x_i} and {y_i} are the data values of the model, while b1 and b2 are the variables in the model. As mentioned before, if f_i(b∗) = 0 then it is reasonable to expect that f(b) ≈ 0 for b ≈ b∗, implying that

    ∇²φ(b) ≈ J(b)^T J(b).

This final formula involves only the first derivatives of the functions f_i(b), and suggests that an approximation to the Hessian matrix can be found using first derivatives alone, at least in cases where the model is a good fit to the data.
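The small-residual approximation can be checked numerically. The sketch below compares J^T J against a finite-difference Hessian of the objective for the power-exponential model, using synthetic zero-residual data (model and data are illustrative choices of this sketch):

```python
import numpy as np

def phi(b, x, y):
    """Least-squares objective 0.5 * sum of squared residuals."""
    r = y - b[0] * (x * np.exp(1 - x)) ** b[1]
    return 0.5 * r @ r

def gauss_newton_hessian(b, x, y):
    """J^T J, where J is the Jacobian of the residuals f_i = y_i - mu."""
    u = x * np.exp(1 - x)
    J = np.column_stack([-u ** b[1], -b[0] * u ** b[1] * np.log(u)])
    return J.T @ J

def fd_hessian(b, x, y, h=1e-5):
    """Full Hessian of phi by central finite differences."""
    n = len(b)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            bpp = b.copy(); bpp[i] += h; bpp[j] += h
            bpm = b.copy(); bpm[i] += h; bpm[j] -= h
            bmp = b.copy(); bmp[i] -= h; bmp[j] += h
            bmm = b.copy(); bmm[i] -= h; bmm[j] -= h
            H[i, j] = (phi(bpp, x, y) - phi(bpm, x, y)
                       - phi(bmp, x, y) + phi(bmm, x, y)) / (4 * h * h)
    return H
```

With zero-residual data the second term of the Hessian vanishes exactly, so the finite-difference Hessian should agree with J^T J up to discretization error.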
One of the simplest methods that exploits this is the Gauss-Newton method, which uses the approximation to the Hessian matrix directly. It computes a search direction using the formula from Newton's method,

    ∇²φ(b) h_GN = −∇φ(b).        (4.5)

When f(b∗) in Eq. (3.15) is zero and the Jacobian J(b∗) has full rank, the Gauss-Newton method behaves like Newton's method near the solution, but without the cost of computing second derivatives. We now apply the Gauss-Newton method to a power-exponential model of the form
yi = µ(xi ; a, b) + εi
yi = a · (xi · e(1−xi ) )b + εi
with the given data, where the ε_i are measurement errors on the data ordinates, assumed to behave like random noise. We apply the GN method without a line search, using an initial guess that is close to the solution:

    a = 0.53,  b = 0.8137,
and thus

    h_GN = (−0.0004775, −0.0000036)^T

and the new estimate of the solution is

    (a, b) = (0.5273, 0.8052).
Since φ ≈ 0, an approximate global solution to the least-squares problem has been found; the least-squares objective function cannot be negative.
Figure 4.1: The power-exponential function µ(a, b, xi) fitted using the GN algorithm
In general, the GN method is only guaranteed to find a local solution; with an initial guess that is not close to a solution, the Gauss-Newton method may converge slowly. The purpose of the nonlinear regression is to minimize the sum of squared residuals, with entries f_i(b) = y_i − µ(a, b, x_i). Adding the squares of all the entries of f(b) gives the residual norm ‖f(b)‖ = 0.0246:
    f(b) = (0.0053, −0.0151, 0.0042, 0.0122, 0.0015, −0.0005, −0.0093, 0.0084, −0.0048)^T
Thus Σ_i (y_i − µ(a, b, x_i))² = 0.00060653, which shows that the model has become quite accurate. The advantage over Newton's method is that we do not need to calculate the second-order derivative term of the Hessian matrix. However, if any residual component f_i(b∗) is large, the approximation that sets the second-order term of the Hessian to zero will be poor, and the Gauss-Newton method will converge more slowly than Newton's method.
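The Gauss-Newton iteration with step halving used in this experiment can be sketched as follows (a minimal Python illustration on synthetic data; the step-halving loop mirrors the MATLAB implementation in Appendix A.1):

```python
import numpy as np

def gauss_newton_fit(x, y, a0, b0, n_iter=50):
    """Fit mu(x; a, b) = a*(x*exp(1-x))**b by Gauss-Newton with
    simple step halving on the direction h = (J^T J)^{-1} J^T d."""
    a, b = a0, b0
    u = x * np.exp(1.0 - x)
    e = np.sum((y - a * u ** b) ** 2)          # error at the initial guess
    for _ in range(n_iter):
        d = y - a * u ** b                     # residuals d_i = y_i - mu_i
        J = np.column_stack([u ** b,           # d mu / d a
                             a * u ** b * np.log(u)])  # d mu / d b
        h = np.linalg.solve(J.T @ J, J.T @ d)  # GN direction, cf. (4.5)
        step = 1.0
        for _ in range(15):                    # halve step until error drops
            a_new, b_new = a + step * h[0], b + step * h[1]
            e_new = np.sum((y - a_new * u ** b_new) ** 2)
            if e_new < e:
                a, b, e = a_new, b_new, e_new
                break
            step /= 2.0
    return a, b
```

Since J is the Jacobian of the model µ (not of the residual d = y − µ), the direction is +(J^T J)^{-1} J^T d; with the residual-Jacobian convention the sign flips.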
Another important method, the Levenberg-Marquardt method, can also be used to minimize this least-squares problem. It combines steepest descent and Gauss-Newton iteration: we can use a steepest-descent-type method until we approach a minimum, and then gradually switch to the quadratic rule. We can estimate how close we are to a minimum from how our error is changing. In particular, Levenberg's algorithm is formulated in terms of a damping parameter λ that determines the blend between steepest descent and Gauss-Newton iteration. Levenberg and Marquardt proposed an algorithm based on this observation, whose update rule is a mix of the two algorithms using the modified Hessian matrix
H(b, λ ) = HGN + λ I (4.6)
bi+1 = bi − (HGN + λ I)−1 ∇ϕ(bi ) (4.7)
Evaluate the new residual error at the point given by Eq. (4.7) and compute the cost at the new point, φ_new. This update rule is used as follows: if the residual error goes down after an update, our quadratic assumption on φ(b) is working. When the damping parameter λ is small, H approximates the Gauss-Newton Hessian and near-Gauss-Newton steps are taken; when λ is large, H is close to the identity, and steepest-descent steps are taken. On the other hand, if the error increases, we would like to follow the gradient more, and so the damping parameter λ is increased by the same factor.
Figure 4.2: The power-exponential function µ(a, b, xi)
and Σ_i (y_i − µ(a, b, x_i))² = 0.00060611. If the error has decreased as a result of the update, the step is accepted and the damping parameter λ is decreased by a factor of 10; otherwise the step is rejected and λ is increased by a factor of 10. A disadvantage of this algorithm is that if the damping parameter λ takes a large value, the computed Hessian matrix is not used at all.
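The role of λ as a blend between the two methods can be illustrated numerically: as λ → 0 the step approaches the Gauss-Newton step, and for large λ it approaches a short step along the steepest-descent direction. The 2×2 matrix and gradient below are hypothetical values chosen only for illustration:

```python
import numpy as np

H_gn = np.array([[4.0, 1.0], [1.0, 3.0]])   # stands in for J^T J
grad = np.array([1.0, 2.0])                 # stands in for J^T f

def lm_step(lam):
    """Step h = -(H_GN + lam*I)^{-1} grad, cf. Eq. (4.7)."""
    return -np.linalg.solve(H_gn + lam * np.eye(2), grad)

h_gn = -np.linalg.solve(H_gn, grad)   # pure Gauss-Newton step (lam = 0)
h_small = lm_step(1e-8)               # nearly the Gauss-Newton step
h_large = lm_step(1e8)                # nearly -grad/lam: tiny steepest-descent step

# cosine between the large-damping step and the steepest-descent direction
cos_sd = (h_large @ -grad) / (np.linalg.norm(h_large) * np.linalg.norm(grad))
```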
Chapter 5
This chapter presents a detailed analysis of two different data sets using the two non-linear least-squares curve-fitting methods. We then compare the two methods on each data set separately, to observe the differences between the methods and determine which gives the best results.
Mortality (death-rate) data record numbers of deaths by place, time and cause. That mortality increases strongly with age is anything but surprising, but it is worthwhile to look more closely at how mortality changes with age; there are patterns that might still surprise a little. Death risk is a simple way to relate mortality figures to the population [15].
We use data on the death rate for men in the USA between 1995 and 2004. Since it is very unlikely that people die at a young age, the structure of the data is easier to see when we look at the logarithm of the values instead of the values themselves [16].
Figure 5.1: Death risk by age for men in the USA between 1995 and 2004.
where the function µ gives the death rate and the variable x denotes age in each year. In this scenario a mathematical model is arranged from experimental data points: a smooth curve given by a theoretical equation is fitted to the data. By solving the system of non-linear equations, we obtain the best estimates of the parameters c1, c2, a1, a2, a3 of the function µ in the theoretical model (5.1). We can then plot this function along with the data points and see how well the data points fit the theoretical equation.
Here we examine the two-term power-exponential formula for death risk, first using the Gauss-Newton algorithm to fit the mortality data on the death rate in the USA between 1995 and 2004 to the theoretical model (5.1). We expect the experimental data to follow the theoretical model (5.1) closely. To apply the Gauss-Newton method, we must first calculate the partial derivatives for the Jacobian and then calculate the approximate Hessian matrix. Here, f_i is given by the equation:
fi = yi − µ(xi ; c1 , c2 , a1 , a2 , a3 ) (5.2)
Thus the entries of the Jacobian matrix are given by the partial derivatives of the model µ (those of f_i = y_i − µ differ only by a sign):

    ∂µ/∂c1 = e^(c2·xi) / xi
    ∂µ/∂c2 = c1 · e^(c2·xi)
    ∂µ/∂a1 = −e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3
    ∂µ/∂a2 = −e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3 · a3 · (a2·xi − 1) / a2
    ∂µ/∂a3 = e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3 · (ln(a2 · xi · e^(−a2·xi)) + 1)
where x_i is the age as mentioned before, y_i is the mortality at that age, and c1, c2, a1, a2, a3 describe the initial mortality and the rates of death. Once the partial derivatives for the Jacobian have been calculated, we can proceed with the Gauss-Newton algorithm and compute the residuals for the years 1995 to 2004 to see the differences in the graph. The code used for this procedure can be found in Appendix A.2.
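The Jacobian entries above can be collected into a small routine and verified against finite differences. This sketch assumes the two-term model µ = c1·e^(c2·x)/x + e^(a3−a1)·(a2·x·e^(−a2·x))^a3 implied by the derivatives; the parameter values in the test are the initial values from Appendix A.2:

```python
import numpy as np

def mu(p, x):
    """Two-term power-exponential mortality model (an assumption of this
    sketch): c1*exp(c2*x)/x + exp(a3 - a1)*(a2*x*exp(-a2*x))**a3."""
    c1, c2, a1, a2, a3 = p
    u = a2 * x * np.exp(-a2 * x)
    return c1 * np.exp(c2 * x) / x + np.exp(a3 - a1) * u ** a3

def jacobian(p, x):
    """Analytic Jacobian of mu w.r.t. (c1, c2, a1, a2, a3)."""
    c1, c2, a1, a2, a3 = p
    u = a2 * x * np.exp(-a2 * x)
    t = np.exp(a3 - a1) * u ** a3
    return np.column_stack([
        np.exp(c2 * x) / x,           # d mu / d c1
        c1 * np.exp(c2 * x),          # d mu / d c2 (the 1/x and x cancel)
        -t,                           # d mu / d a1
        -t * a3 * (a2 * x - 1) / a2,  # d mu / d a2
        t * (np.log(u) + 1.0),        # d mu / d a3
    ])
```

Checking an analytic Jacobian against central finite differences is a cheap safeguard before running either GN or LM.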
    f_i = (8.9810, 7.3979, 4.5882, 4.5544, 3.4147, 1.7581, 2.5582, 2.5465, 2.6603, 2.6603)^T · 10^(−8)
The implementation of the Gauss-Newton method executes a line search in the search direction h_GN, satisfying the step-length condition (3.18) discussed in Chapter 3.
(a) year 1995. (b) year 1998.
Figure 5.2: Death risk for men in the USA for selected years between 1995 and 2004 using the Gauss-Newton algorithm, where x denotes age and y death risk.
We can see from Figure 5.2 that the curve fits the data quite well. We will now try the more
sophisticated L-M method and see if we can get a better fit.
The LM algorithm requires an initial guess for the parameters to be estimated. We chose initial values for each of the variables c1, c2, a1, a2 and a3 for the death-rate data of the USA between 1995 and 2004.
(a) year 1995. (b) year 1998.
Figure 5.3: Death risk for men in the USA for selected years between 1995 and 2004 using the Levenberg-Marquardt algorithm, where x denotes age and y death risk.
Here we notice that the fits from the LM algorithm in Figure 5.3 are good, but not quite as good as those from the Gauss-Newton algorithm.
Table 5.1: The residual least-squares results (×10^(−8)) using µ(c1, c2, a1, a2, a3; x).
Table 5.1 shows the final error residuals, i.e. the sums of squared errors, for the GN and LM algorithms on the mortality data. The Levenberg-Marquardt method obtained a good fit for the year 1995, while the Gauss-Newton method obtained a good fit for the year 2000. So we can say that the GN algorithm fits well in some years, and the LM algorithm in others.
Here we will fit a model based on power-exponential functions to measured data for rocket-triggered lightning return strokes. This model could then be used to calculate electric and magnetic fields using techniques similar to those in [17].
Figure 5.4: Equipment for measuring rocket-triggered return strokes; image originally appeared in [17].
Here we study a different linear combination of power-exponential functions than the model used previously:
µ(a, b, c, x) = (p − a) · (x · e(1−x) )b + a · (x · e(1−x) )c (5.3)
where the mathematical model µ describes the rocket-triggered return strokes 8725, 8726 and 8705, with a different fixed value of the peak p for each, as can be seen in Table 5.2. The variable x is time, rescaled so that the peak occurs at x = 1, and a, b and c are the three parameters in the power-exponential expression above, for which initial guesses are also required. As a final application of non-linear least squares for curve fitting, we find the best fit of this function, Eq. (5.3), to the data sets; the code is in Appendix A.3. As always, we must first derive, analytically, the partial derivatives required for the Jacobian. The function to be minimized, f_i, is given by:
fi = yi − µ(xi ; a, b, c) (5.4)
where f_i is the residual at a particular point of the current waveform of the rocket-triggered stroke, y_i is the measured electric field, x_i is time as mentioned before, and the parameters a, b, c take the values of the initial guess. Since the Gauss-Newton and Levenberg-Marquardt methods require the Jacobian of f_i, the three partial derivatives with respect to the three variables a, b and c must be calculated and entered into the function df in the MATLAB code (see Appendix A.3). The three partial derivatives, used for each of the strokes 8725, 8726 and 8705 with their respective peak values p, are:
    ∂µ/∂a = −(x · e^(1−x))^b + (x · e^(1−x))^c
    ∂µ/∂b = (p − a) · (ln(x) − x + 1) · (x · e^(1−x))^b
    ∂µ/∂c = a · (ln(x) − x + 1) · (x · e^(1−x))^c

(these are the partial derivatives of the model µ; those of f_i = y_i − µ differ only by a sign, and ln(x) − x + 1 = ln(x · e^(1−x)))
Using these three partial derivatives, the Jacobian of f_i and its transpose can be calculated, allowing us to apply both the Gauss-Newton and the Levenberg-Marquardt algorithms. Once the three variables a, b, c in the function have been determined, we can plot the mathematical model along with the measured experimental data points to see whether the formula is actually a good fit to the data, for both the GN and LM algorithms. The graphs below show the plots of the mathematical model, Eq. (5.3), together with the collected data points.
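These partial derivatives can likewise be checked against finite differences before fitting (a minimal sketch; the parameter values, peak p and time grid used in the test are hypothetical):

```python
import numpy as np

def mu(a, b, c, p, x):
    """Stroke model (5.3): (p - a)*(x*exp(1-x))**b + a*(x*exp(1-x))**c."""
    u = x * np.exp(1.0 - x)
    return (p - a) * u ** b + a * u ** c

def dmu(a, b, c, p, x):
    """Analytic partials of mu w.r.t. a, b, c (the Jacobian columns;
    the partials of f_i = y_i - mu differ only by sign)."""
    u = x * np.exp(1.0 - x)
    lu = np.log(x) - x + 1.0          # = log(x*exp(1-x))
    return np.column_stack([-u ** b + u ** c,
                            (p - a) * lu * u ** b,
                            a * lu * u ** c])
```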
(a) Result for stroke 8725 using the Gauss- (b) Result for stroke 8725 using the Levenberg-
Newton algorithm. Marquardt algorithm.
(c) Result for stroke 8726 using the Gauss- (d) Result for stroke 8726 using the Levenberg-
Newton algorithm. Marquardt algorithm.
(e) Result for Stroke 8705 using the Gauss- (f) Result for stroke 8705 using the Levenberg-
Newton algorithm. Marquardt algorithm.
Figure 5.5: Comparison of results for fitting rocket-triggered lightning return strokes data
using the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm
In the plots shown above, we can see that the power-exponential model is actually a rather good approximation of the collected data. Moreover, a good way to test whether the fit is truly good is to look at the sum of the squares of the residuals.

Table 5.2: The residual least-squares results for each rocket-triggered stroke using the formula f_i = y_i − µ(x_i; a, b, c).

As can be seen in Table 5.2, the sum of the squared residuals has been reduced after implementing the Gauss-Newton and Levenberg-Marquardt algorithms for each stroke, and we can also see a large difference in the least-squares errors between them. Stroke 8705 obtained the best fit of the rocket-triggered data sets for both algorithms, while stroke 8725 was the worst fit obtained with the GN algorithm and stroke 8726 the worst with the LM algorithm. According to the figures, the Levenberg-Marquardt fits are better than the Gauss-Newton fits; for this problem the Levenberg-Marquardt method proved the more reliable of the two.
Chapter 6
The results from this thesis indicate that non-linear least-squares approximation is a useful tool for analyzing sets of data and finding the parameters that give the best fit for a model.
Two different data sets have been compared using the Gauss-Newton and Levenberg-Marquardt methods to see which method is more effective on these data sets. In the first experiment we applied both methods to mortality-rate data; the final residual least-squares results are shown in Table 5.1. For the year 2000 the Gauss-Newton algorithm gave the better curve fit, while for 1995 the Levenberg-Marquardt algorithm fit better.
In the second part of the experiment we ran tests using rocket-triggered lightning return-stroke data, with a different power-exponential fit for each of the strokes 8725, 8726 and 8705. Stroke 8705 showed the best curve fit, while stroke 8726 gave the worst curve-fitting errors. Figure 5.5 illustrates the different curve-fitting results for both methods. To summarize, Levenberg-Marquardt showed much better results when fitting the power-exponential functions for each stroke.
We can conclude from these experiments that the efficiency of the methods depends on the input data set used.
In this thesis we compared two numerical methods for solving non-linear least-squares curve fitting. For future research, different methods could be explored, such as Broyden's method, the hybrid LM-quasi-Newton method, and Powell's dog-leg method. Furthermore, more example applications of non-linear least-squares fitting algorithms could be explored. Lastly, the data sets presented here could be studied and analyzed with different models than the ones used in this thesis, which might extract additional information from the modelled data. If it were possible to predict which method will work best for a data set without trying and comparing the results of different methods, that would also be a useful result.
Chapter 7
A summary of the objectives accomplished in this thesis will be presented in this chapter.
The theoretical description of the methods for non-linear least-squares fitting presented in this thesis is collected from many sources, and during the writing of the thesis I encountered several other methods that there was not enough time to discuss. I also did not have time to fully implement and test all the methods that I described in applications, so I chose to focus on two of the methods in order to understand them properly.
I have learned a lot about the methodology used in the theoretical description of mathematics and how to analyze and present results using a MATLAB program. One of the major challenges when writing mathematics is to be specific and rigorous without obscuring the intuitive aspects of the examined ideas and methods. I feel that I have developed a lot in this regard. I have also developed my programming skills a great deal by implementing the Gauss-Newton and Levenberg-Marquardt methods. Discussions with, and advice from, my supervisor were also very valuable in this regard.
7.3 Objective 3: Critically and Systematically Integrate Knowl-
edge
While working on the thesis I have found many sources apart from those suggested by my supervisors. I also found that different sources and ways of describing the same thing were useful in different parts of the thesis. The sources with the clearest description of the theory were not always the most helpful ones for implementation, and vice versa.
While working on the project I made the majority of the choices related to the theoretical content of the thesis. The applied part was based on suggestions by my supervisors, but the details of how to reach the results were mostly worked out independently. Discussing and refining my ideas with my supervisor Karl Lundengård was very valuable and greatly improved the quality of the final result.
The theory in this thesis consists of previously known results collected from many different sources and should be accessible to any reader familiar with linear algebra and calculus. The complexity of the topics shows mostly in the application of the methods. I have provided figures and tables to make the comparisons easier to understand, and several aspects of the results are discussed. I learned a lot about how to use a computer to present results well using figures. The algorithms are presented in a way that is independent of any particular programming language or software, so the reader is not required to be familiar with MATLAB or other software for scientific computation.
The original part of this thesis is not the algorithms used or the mathematical theory presented; rather, it is the models and data, which have not been analyzed previously. The analysis was suggested by my supervisor Karl Lundengård, who also supported me during the application and presentation of the results. When implementing the algorithms I examined implementations made by others, but all code in the appendix was written by myself, though on several occasions my supervisor's help with debugging and refinement was very valuable. I have also made sure to give clear references to the sources that I found most useful or most important.
Bibliography
[1] Alfonso C., Lindsey P., Winnie R. Solving nonlinear least squares problems with Gauss-Newton and Levenberg-Marquardt methods. Department of Mathematics, Louisiana State University, Baton Rouge, LA, and Department of Mathematics, University of Mississippi, Oxford, MS, July 6, 2012.
[2] Henri P. Gavin. The Levenberg-Marquardt method for nonlinear least squares curve-
fitting problems. Department of Civil and Environmental Engineering, Duke University,
May 4, 2016.
[3] Åke Björck and Germund Dahlquist. Numerical Methods in Scientific Computing, Volume II. Linköping University and Royal Institute of Technology, 557–580, April 10, 2008.
[4] K. Madsen, H.B. Nielsen, O. Tingleff, Methods for non-linear least squares problems,
Informatics and Mathematical Modelling Technical University of Denmark, April 2004.
[5] Marquardt, Donald W. "An algorithm for least-squares estimation of nonlinear parameters." Journal of the Society for Industrial and Applied Mathematics 11.2: 431–441 (1963).
[6] Levenberg, Kenneth. "A Method for the Solution of Certain Non-Linear Problems in Least Squares." The Quarterly of Applied Mathematics 2: 164–168 (1944).
[7] Hansen, Per Christian, Godela Scherer and V. Pereyra. Least Squares Data Fitting with
Applications, Johns Hopkins University Press (2013).
[8] Pradit Mittrapiyanuruk, A Memo on How to Use the Levenberg-Marquardt Algorithm for
Refining Camera Calibration Parameters, Robot Vision Laboratory, Purdue University,
West Lafayette, IN, USA. Oct 24, 2014.
[10] Raphael Hauser. Line Search Methods for Unconstrained Optimisation, Numerical Linear Algebra and Optimisation, Oxford University Computing Laboratory, May 2007.
[11] Shidong Shan, A Levenberg-Marquardt Method For Large-Scale Bound-Constrained
Nonlinear Least-Squares, Acadia University, July 2008.
[12] Niclas Börlin. Trust-Region and the Levenberg-Marquardt method, 5DA001 Non-linear
Optimization. Department of Computing Science Umeå University, November 22, 2007
[16] Human Mortality Database. University of California, Berkeley (USA), and Max Planck
Institute for Demographic Research (Germany). Available at
http://www.mortality.org or
http://www.humanmortality.de (2017-06-14)
[17] J. C. Willett, J. C. Bailey, V. P. Idone, A. Eybert-Berard and L. Barret. "Submicrosecond Intercomparison of Radiation Fields and Currents in Triggered Lightning Return Strokes Based on the Transmission-Line Model", Journal of Geophysical Research, 13,275–13,286 (1989).
[18] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. Numerical Recipes
in C. Cambridge University Press, New York (1988).
[19] Lawson, C. L., and Hanson, R. Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice-Hall (1974).
[20] S. Gratton, A. S. Lawless and N. K. Nichols. Approximate Gauss-Newton methods for nonlinear least squares problems, Department of Mathematics, The University of Reading, Berkshire RG6 6AX, UK (2004).
[21] Jim Lambers. Positive and Negative Definite Matrices and Optimization. Lecture 3 Notes
MAT 419/519, Summer Session (2011)
Appendix A
MATLAB Code
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing the Gauss-Newton algorithm      %
% -------------------------------------------- %
% Generate the synthetic data from the curve
% function with some additional noise
x = (0.1:0.1:0.9)';
a = 0.53;
b = 0.8137;
rng(10);
fun = a*(x.*exp(1-x)).^b;
noise = randn(size(x));
y = fun + 0.01*noise;
% The main code for the GN algorithm for
% estimating a and b from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values; I choose them
% different from the expected result so that we can see
% that the method works
a0 = 2;
b0 = 1;
y_init = a0*(x.*exp(1-x)).^b0;
Ndata = length(y);
Nparams = 2;   % a and b are the parameters to be estimated
n_itirs = 15;
a_est = a0;
b_est = b0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: solve the normal equations
    % Evaluate the Jacobian matrix at the
    % current parameters (a_est, b_est)
    J = zeros(Ndata, Nparams);
    for i = 1:length(x)
        J(i,:) = [(x(i)*exp(1-x(i)))^b_est, ...
                  a_est*(x(i)*exp(1-x(i)))^b_est*log(x(i)*exp(1-x(i)))];
    end
    % Compute the approximated Hessian matrix;
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the residuals at the current parameters
    y_est = a_est*(x.*exp(1-x)).^b_est;
    d = y - y_est;
    % On the first iteration: compute the total error
    if it == 1
        e = dot(d, d);
    end
    % Compute the GN direction; since J is the Jacobian of the
    % model (not of the residual y - model) there is no minus sign
    dp = H\(J'*d(:));
    a_k = 1;  % initial step length for the backtracking
    for i = 1:15
        % Compute the updated parameters
        a_gn = a_est + a_k*dp(1);
        b_gn = b_est + a_k*dp(2);
        % Evaluate the total error at the updated parameters
        y_est_gn = a_gn*(x.*exp(1-x)).^b_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn, d_gn);
        % If the total error of the updated parameters is less
        % than the previous one, make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            a_est = a_gn;
            b_est = b_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minimum')
        break
    end
end
plot(x, y, 'r*');
hold on
plot(x, y_init, 'g');
plot(x, y_est, 'b');
plot(x, y_est_gn, 'ko');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points', 'With a0, b0', 'With fitted a, b', 'GN points')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing the Levenberg-Marquardt         %
% algorithm                                    %
% -------------------------------------------- %
% Generate the synthetic data from the curve
% function with some additional noise
x = (0.1:0.1:0.9)';
a = 0.53;
b = 0.8137;
rng(10);
fun = a*(x.*exp(1-x)).^b;
noise = randn(size(x));
y = fun + 0.01*noise;
% The main code for the LM algorithm for
% estimating a and b from the above data
% Initial guess for the parameters
a0 = 2;
b0 = 1;
y_init = a0*(x.*exp(1-x)).^b0;
Ndata = length(y);
Nparams = 2;   % a and b are the parameters to be estimated
n_itirs = 15;  % set the number of iterations for the LM
updateJ = 1;
a_est = a0;
b_est = b0;
lamda = 0.01;  % initial value of the damping factor for the LM
               % (kept across the outer iterations)
% Step 1: repeat until convergence
for it = 1:n_itirs
    if updateJ == 1
        % Evaluate the Jacobian matrix at the
        % current parameters (a_est, b_est)
        J = zeros(Ndata, Nparams);
        for i = 1:length(x)
            J(i,:) = [(x(i)*exp(1-x(i)))^b_est, ...
                      a_est*(x(i)*exp(1-x(i)))^b_est*log(x(i)*exp(1-x(i)))];
        end
        % Compute the approximated Hessian matrix;
        % J' is the transpose of J
        H = J'*J;
        % Evaluate the distance error at the current parameters
        y_est = a_est*(x.*exp(1-x)).^b_est;
        d = y - y_est;
        % On the first iteration: compute the total error
        if it == 1
            e = dot(d, d);
        end
    end
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams, Nparams);
        % Compute the updated parameters
        h_lm = H_LM\(J'*d(:));
        a_lm = a_est + h_lm(1);
        b_lm = b_est + h_lm(2);
        % Evaluate the total distance error at
        % the updated parameters
        y_est_lm = a_lm*(x.*exp(1-x)).^b_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm, d_lm);
        % If the total distance error of the updated parameters
        % is less than the previous one, make the updated
        % parameters the current parameters and decrease
        % the value of the damping factor
        if e_lm < e
            lamda = lamda/10;
            a_est = a_lm;
            b_est = b_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minimum')
        break
    end
end
plot(x, y, 'r*');
hold on
plot(x, y_init, 'g');
plot(x, y_est, 'b');
plot(x, y_est_lm, 'ko');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points', 'With a0, b0', 'With fitted a, b', 'LM points')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Gauss-Newton algorithms         %
% -------------------------------------------- %
% Load the death rate data
load('test_data_usa');
c1 = 0.005;
c2 = 0.0087;
a1 = 7;
a2 = 1/20;
a3 = 20;
years = 1:40;
x = ages(years)';
y = test_data_usa(years,1); % Note: test other columns
% The main code for the GN algorithm for
% estimating the parameters from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
c10 = 0.0008; %0.04;
c20 = 0.111;  %0.095;
a10 = 8.9;
a20 = 1/17;
a30 = 30;
y_init = c10*exp(c20*x)./x ...
       + exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
Ndata = length(y);
Nparams = 5;
n_itirs = 100;
figure(1)
plot(x,y,'b',x,y_init,'r')
xlim([0 30])
%%
updateJ = 1;
c1_est = c10;
c2_est = c20;
a1_est = a10;
a2_est = a20;
a3_est = a30;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x)
        J(i,:) = [exp(c2_est*x(i))./x(i), ...
                  c1_est*exp(c2_est*x(i)), ...
                  -exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est, ...
                  -a3_est*exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(a2_est*x(i)-1)./a2_est, ...
                  exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(log(a2_est*x(i)*exp(-a2_est*x(i)))+1)];
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the model at the current parameters
    y_est = c1_est*exp(c2_est*x)./x ...
          + exp(a3_est-a1_est)*(a2_est*x.*exp(-a2_est*x)).^a3_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    a_k = 1; % initial step length
    for i = 1:15
        % Compute the updated parameters
        dp = inv(H)*(J'*d(:)); % According to the report there should be a
        % minus sign here, but then the results are all wrong.
        % Double-check this!
        c1_gn = c1_est + a_k*dp(1);
        c2_gn = c2_est + a_k*dp(2);
        a1_gn = a1_est + a_k*dp(3);
        a2_gn = a2_est + a_k*dp(4);
        a3_gn = a3_est + a_k*dp(5);
        % Evaluate the total distance error at the updated parameters
        y_est_gn = c1_gn*exp(c2_gn*x)./x ...
                 + exp(a3_gn-a1_gn)*(a2_gn*x.*exp(-a2_gn*x)).^a3_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn,d_gn);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            c1_est = c1_gn;
            c2_est = c2_gn;
            a1_est = a1_gn;
            a2_est = a2_gn;
            a3_est = a3_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minima')
        break
    end
end
figure(2)
plot(x,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x,y_est,'b');
plot(x,y_est_gn,'ko');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points','With c10, c20, a10, a20, a30', ...
       'With fitted c1, c2, a1, a2, a3')
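The Gauss-Newton step-halving pattern used above (solve the normal equations, then shrink the step length until the error decreases) can be sketched in Python on a simpler hypothetical model c1*exp(c2*x); all names, data and starting values below are illustrative and do not come from the thesis.

```python
import numpy as np

def gn_fit(x, y, c1, c2, n_iters=50):
    """Gauss-Newton with step halving for y ~ c1*exp(c2*x) (sketch)."""
    e = np.sum((y - c1 * np.exp(c2 * x)) ** 2)
    for _ in range(n_iters):
        f = c1 * np.exp(c2 * x)
        # Jacobian columns: d/dc1 and d/dc2 of the model
        J = np.column_stack([np.exp(c2 * x), c1 * x * np.exp(c2 * x)])
        d = y - f
        dp = np.linalg.solve(J.T @ J, J.T @ d)   # Gauss-Newton direction
        a_k = 1.0                                # try the full step first
        improved = False
        for _ in range(15):
            c1_new = c1 + a_k * dp[0]
            c2_new = c2 + a_k * dp[1]
            e_new = np.sum((y - c1_new * np.exp(c2_new * x)) ** 2)
            if e_new < e:                        # enough descent: accept
                c1, c2, e = c1_new, c2_new, e_new
                improved = True
                break
            a_k /= 2                             # otherwise halve the step
        if not improved:
            break                                # give up: no descent found
    return c1, c2, e

# hypothetical noiseless data generated with c1 = 0.5, c2 = 0.8
x = np.linspace(0.0, 4.0, 40)
y = 0.5 * np.exp(0.8 * x)
c1, c2, e = gn_fit(x, y, c1=1.0, c2=0.5)
```

Because the Gauss-Newton direction is a descent direction for the least-squares objective, repeated halving always finds an improving step away from a stationary point.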
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Levenberg-Marquardt algorithms  %
% -------------------------------------------- %
% Load the death rate data
load('test_data_usa');
c1 = 0.005;
c2 = 0.0087;
a1 = 7;
a2 = 1/20;
a3 = 20;
years = 1:40;
x = ages(years)';
y = test_data_usa(years,8);
% The main code for the LM algorithm for
% estimating the parameters from the above data
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
c10 = 0.0005; %0.04;
c20 = 0.115;  %0.095;
a10 = 8.5;
a20 = 1/20;
a30 = 30;
y_init = c10*exp(c20*x)./x + exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
%y_init = exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
Ndata = length(y);
Nparams = 5;
n_itirs = 100;
figure(1)
plot(x,y,'b',x,y_init,'r')
xlim([0 30])
%%
updateJ = 1;
c1_est = c10;
c2_est = c20;
a1_est = a10;
a2_est = a20;
a3_est = a30;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x)
        J(i,:) = [exp(c2_est*x(i))./x(i), ...
                  c1_est*exp(c2_est*x(i)), ...
                  -exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est, ...
                  -a3_est*exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(a2_est*x(i)-1)./a2_est, ...
                  exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(log(a2_est*x(i)*exp(-a2_est*x(i)))+1)];
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the model at the current parameters
    y_est = c1_est*exp(c2_est*x)./x ...
          + exp(a3_est-a1_est)*(a2_est*x.*exp(-a2_est*x)).^a3_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    lamda = 0.1; % set an initial value of the damping factor for the LM
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams,Nparams);
        % Compute the updated parameters
        h_lm = inv(H_LM)*(J'*d(:));
        c1_lm = c1_est + h_lm(1);
        c2_lm = c2_est + h_lm(2);
        a1_lm = a1_est + h_lm(3);
        a2_lm = a2_est + h_lm(4);
        a3_lm = a3_est + h_lm(5);
        % Evaluate the total distance error at the updated parameters
        y_est_lm = c1_lm*exp(c2_lm*x)./x ...
                 + exp(a3_lm-a1_lm)*(a2_lm*x.*exp(-a2_lm*x)).^a3_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm,d_lm);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters and decrease the damping factor
        if e_lm < e
            lamda = lamda/10;
            c1_est = c1_lm;
            c2_est = c2_lm;
            a1_est = a1_lm;
            a2_est = a2_lm;
            a3_est = a3_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better')
        break
    end
end
figure(2)
plot(x,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x,y_est,'b');
plot(x,y_est_lm,'ko');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points','With c10, c20, a10, a20, a30', ...
       'With fitted c1, c2, a1, a2, a3')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Gauss-Newton algorithms         %
% -------------------------------------------- %
% Load the measured return stroke data
load('Rockettrig.mat');
x = Stroke8725(1:40,1);
%x = Stroke8726(1:40,1);
%x = Stroke8705(1:40,1);
x_1 = x/x(end);
y = Stroke8725(1:40,2);
%y = Stroke8726(1:40,2);
%y = Stroke8705(1:40,2);
% The main code for the GN algorithm
% for estimating a, b and c from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
a0 = 6; %18
b0 = 4;
c0 = 10;
% For stroke (8725) with peak = 20;
y_init = (20-a0)*(x_1.*exp(1-x_1)).^b0 ...
       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8726) with peak = 35.3;
%y_init = (35.3-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8705) with peak = 8;
%y_init = (8-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
Ndata = length(y);
Nparams = 3;
n_itirs = 15;
updateJ = 1;
a_est = a0;
b_est = b0;
c_est = c0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x_1)
        J(i,:) = [-(x_1(i).*exp(1-x_1(i))).^b_est + (x_1(i).*exp(1-x_1(i))).^c_est, ...
                  (20-a_est)*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^b_est, ...
                  a_est*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^c_est];
        % For other strokes replace 20 by the corresponding peak value
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % For stroke (8725) with peak = 20
    y_est = (20-a_est)*(x_1.*exp(1-x_1)).^b_est ...
          + a_est*(x_1.*exp(1-x_1)).^c_est;
    % Stroke (8705)
    %y_est = (8-a_est)*(x_1.*exp(1-x_1)).^b_est ...
    %      + a_est*(x_1.*exp(1-x_1)).^c_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    % Step 1.2: Choose step length a_k so that there is enough descent.
    % Here I start with a_k = 100 and make it smaller if we do not get a
    % better result.
    % If we have tried making it smaller many times we give up.
    a_k = 100;
    for i = 1:n_itirs
        % Compute the updated parameters
        dp = inv(H)*(J'*d(:));
        a_gn = a_est + a_k*dp(1);
        b_gn = b_est + a_k*dp(2);
        c_gn = c_est + a_k*dp(3);
        % For stroke (8725) with peak = 20
        y_est_gn = (20-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
                 + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        % For stroke (8726)
        %y_est_gn = (35.3-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
        %         + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        % For stroke (8705)
        %y_est_gn = (8-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
        %         + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn,d_gn);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            a_est = a_gn;
            b_est = b_gn;
            c_est = c_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
end
a_gn
b_gn
c_gn
e_gn
plot(x_1,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x_1,y_est,'b');
plot(x_1,y_est_gn,'ko');
plot(Stroke8725(:,1)/x(end),Stroke8725(:,2),'c');
%plot(Stroke8726(:,1)/x(end),Stroke8726(:,2),'c');
%plot(Stroke8705(:,1)/x(end),Stroke8705(:,2),'c');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points','With a0, b0, c0', ...
       'With fitted a, b, c')
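A quick sanity check on the waveform model used above: because the time axis is normalized by x(end), the factor x*exp(1-x) equals 1 at the normalized peak x = 1, so the two-term model (peak-a)*(x*exp(1-x))^b + a*(x*exp(1-x))^c reproduces the prescribed peak value exactly there, for any a, b, c. A small Python sketch with hypothetical parameter values:

```python
import numpy as np

def two_term_model(x1, a, b, c, peak):
    """Two-term power-exponential waveform on normalized time x1."""
    u = x1 * np.exp(1 - x1)          # equals 1 at the normalized peak x1 = 1
    return (peak - a) * u**b + a * u**c

x1 = np.linspace(0.05, 1.0, 40)      # normalized time, peak at the last sample
y = two_term_model(x1, a=6.0, b=4.0, c=10.0, peak=20.0)
# the prescribed peak value is reproduced exactly at x1 = 1
print(y[-1])                          # → 20.0
```

This is why the peak (20, 35.3 or 8 depending on the stroke) appears as a fixed constant in the model and the Jacobian rather than as a fitted parameter.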
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Levenberg-Marquardt algorithms  %
% -------------------------------------------- %
% Load the measured return stroke data
load('Rockettrig.mat');
x = Stroke8725(1:40,1);
%x = Stroke8726(1:40,1);
%x = Stroke8705(1:40,1);
x_1 = x/x(end);
y = Stroke8725(1:40,2);
%y = Stroke8726(1:40,2);
%y = Stroke8705(1:40,2);
% The main code for the LM algorithm for
% estimating a, b and c from the above data
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
a0 = 6; %18;
b0 = 4;
c0 = 10;
% For stroke (8725) with peak = 20;
y_init = (20-a0)*(x_1.*exp(1-x_1)).^b0 ...
       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8726) with peak = 35.3;
%y_init = (35.3-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8705) with peak = 8;
%y_init = (8-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
Ndata = length(y);
Nparams = 3;
n_itirs = 15;
updateJ = 1;
a_est = a0;
b_est = b0;
c_est = c0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    if updateJ == 1
        % Evaluate the Jacobian matrix at the current parameters
        J = zeros(Ndata,Nparams);
        for i = 1:length(x_1)
            J(i,:) = [-(x_1(i).*exp(1-x_1(i))).^b_est + (x_1(i).*exp(1-x_1(i))).^c_est, ...
                      (20-a_est)*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^b_est, ...
                      a_est*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^c_est];
        end
        % Compute the approximated Hessian matrix,
        % J' is the transpose of J
        H = J'*J;
        % For stroke (8725) with peak = 20
        y_est = (20-a_est)*(x_1.*exp(1-x_1)).^b_est ...
              + a_est*(x_1.*exp(1-x_1)).^c_est;
        % Stroke (8705)
        %y_est = (8-a_est)*(x_1.*exp(1-x_1)).^b_est ...
        %      + a_est*(x_1.*exp(1-x_1)).^c_est;
        % The first iteration: compute the total error
        d = y - y_est;
        if it == 1
            e = dot(d,d);
        end
    end
    lamda = 0.1; % set an initial value of the damping factor for the LM
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams,Nparams);
        % Compute the updated parameters
        h_lm = inv(H_LM)*(J'*d(:));
        a_lm = a_est + h_lm(1);
        b_lm = b_est + h_lm(2);
        c_lm = c_est + h_lm(3);
        % Evaluate the total distance error at the updated parameters
        % (for stroke (8725) with peak = 20)
        y_est_lm = (20-a_lm)*(x_1.*exp(1-x_1)).^b_lm ...
                 + a_lm*(x_1.*exp(1-x_1)).^c_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm,d_lm);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters and decrease the damping factor
        if e_lm < e
            lamda = lamda/10;
            a_est = a_lm;
            b_est = b_lm;
            c_est = c_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better')
        break
    end
end
a_lm
b_lm
c_lm
e_lm
plot(x_1,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x_1,y_est,'b');
plot(x_1,y_est_lm,'ko');
plot(Stroke8725(:,1)/x(end),Stroke8725(:,2),'c');
%plot(Stroke8726(:,1)/x(end),Stroke8726(:,2),'c');
%plot(Stroke8705(:,1)/x(end),Stroke8705(:,2),'c');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points','With a0, b0, c0', ...
       'With fitted a, b, c')
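All of the listings above form the update as inv(H)*(J'*d(:)). Solving the linear system directly (H\(J'*d(:)) in MATLAB, or a linear solver elsewhere) yields the same step more cheaply and with better numerical stability. A minimal Python illustration on a hypothetical random Jacobian and residual vector:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((40, 3))     # hypothetical Jacobian (40 data, 3 params)
d = rng.standard_normal(40)          # hypothetical residual vector

H = J.T @ J                          # approximated Hessian matrix
g = J.T @ d                          # right-hand side of the normal equations

step_inv = np.linalg.inv(H) @ g      # explicit inverse, as in the listings
step_solve = np.linalg.solve(H, g)   # preferred: solve the system directly

# both give the same Gauss-Newton step, but solve() avoids forming the inverse
print(np.allclose(step_inv, step_solve))   # → True
```

For the small 2, 3 and 5 parameter systems fitted here the difference is negligible, but the solver form is the safer habit when H is ill-conditioned.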