MÄLARDALEN UNIVERSITY
Date: 2017-10-17
Project name:
Author: Rasha Altoumaimi
Supervisor(s): Karl Lundengård, Milica Rančić
Reviewer: Mats Bodin
Examiner: Sergei Silvestrov
Comprising: 30 ECTS credits
I would like to dedicate my thesis to my beloved husband Ahmed Sonba, who always gives me support and love.
Acknowledgements
I would like to express my thanks and gratitude to my supervisor Karl Lundengård for his positive and supportive guidance, and to my supervisor Milica Rančić. Special thanks to the reviewer Mats Bodin for giving very detailed feedback, and to Professor Sergei Silvestrov, the examiner of this thesis.
Last but not least, I want to thank my family and friends for their support and encouragement.
Abstract
This thesis examines how to find the best fit to a series of data points when curve fitting with power-exponential models. We describe numerical methods such as the Gauss-Newton and Levenberg-Marquardt methods and compare them for solving non-linear least-squares curve-fitting problems with different power-exponential functions. In addition, we show the results of numerical experiments that illustrate the effectiveness of this approach. Furthermore, we show its application to practical problems using different data sets, such as death rates and rocket-triggered lightning return strokes based on the transmission-line model.
Contents (fragment)
1 Introduction
2.3 Convergence
3.2.1 Trust-region strategy
7.4 Objective 4: Independently and Creatively Identify and Carry out Advanced Tasks
7.5 Objective 5: Present and Discuss Conclusions and Knowledge
A MATLAB Code
A.1 Calculating residuals and Jacobian for the first power-exponential function µ(b; x) using the Gauss-Newton and Levenberg-Marquardt algorithms
List of Figures
5.1 Death risk by age for men in the USA between 1995 and 2004.
5.2 Death risk for men in the USA for selected years between 1995 and 2004 using the Gauss-Newton algorithm, where x denotes age and y death risk.
5.3 Death risk for men in the USA for selected years between 1995 and 2004 using the Levenberg-Marquardt algorithm, where x denotes age and y death risk.
5.4 Equipment for measuring rocket-triggered return strokes; image originally appeared in [17].
5.5 Comparison of results for fitting rocket-triggered lightning return-stroke data using the Gauss-Newton and Levenberg-Marquardt algorithms.
List of Tables
5.1 The residual least squares ×10^(−8) using µ(c₁, c₂, a₁, a₂, a₃; x).
5.2 The residual least squares for each rocket-triggered stroke using the formula fᵢ = yᵢ − µ(xᵢ; a, b, c).
Chapter 1
Introduction
The goal of this thesis is to analyze a class of functions for curve fitting using power-exponential functions, to conduct several experiments with their properties, and to compare different fitting methods. Before setting up a model, researchers need measurements, so suitable sample points must be selected and a curve fitted to them. Here we will examine a few different methods and apply them to data from different applications. In general terms, models or mathematical formulas may be developed to estimate one variable in a data set from the other variables, with some residual error depending on model accuracy (Data = Model + Error). This research can be used in many areas, such as electromagnetic compatibility calculations, lightning strike protection and models of death risk.
As will be seen in Chapters 2 and 3, there is a body of papers and research on standard methods of analysis and standard measures of performance when fitting models to data: the Gauss-Newton algorithm, the trust-region algorithm and the Levenberg-Marquardt algorithm. Some of these methods are general in the sense that they can be used for any optimization problem, while others are especially adapted to least-squares problems. This thesis will explain the most common methods, choose two methods used in non-linear least-squares curve fitting, and explore them in detail, including when and how they converge.
In the following sections of the introduction, some of the terminology used in this thesis is described. In the remainder of Chapter 1 we will discuss the basic concepts of non-linear curve fitting. In Chapter 2 we will discuss the Jacobian and Hessian matrices, which are important building blocks in popular methods for non-linear curve fitting, as well as the concept of convergence. In Chapter 3 we will discuss several methods for non-linear curve fitting. In Chapter 4 two of the methods from Chapter 3 will be applied to real data.
1.2 Curve fitting
Curve fitting is used to find the "best fit" line or curve for a series of data points. Put differently, curve fitting examines the relationship between one or more predictors (independent variables) and a response variable (dependent variable), with the aim of defining a "best fit" model of the relationship. The goal of analyzing the results of non-linear least-squares curve fitting is to find the parameters that fit the data best in the least-squares sense, how much uncertainty there is in the parameter values, and whether the model fits differently for a different set of data. Least squares minimizes the square of the error between the original data and the values predicted by the model, so the quality of the fit is determined by how near the data are to the model's predicted values. Most of the time, the curve fit will produce an equation that can be used to find points anywhere along the curve. In some cases we may not be concerned with finding an equation; instead, we may just want to use a curve fit to smooth the data and improve the appearance of our plot [18].
In a power function the independent variable x is raised to a constant power c; in its most basic form f(c, x) = x^c, where c is a coefficient. In this thesis, the analysis focuses on linear combinations of functions of the form µ(b, x) = (x·e^(1−x))^b. This function is called a power-exponential function, where b is a real-valued parameter, different for each term in the linear combination, that can be changed to adapt the shape of the curve.
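As a small illustration (a Python sketch; the thesis's own code in Appendix A is MATLAB), the power-exponential function can be evaluated directly. Its base x·e^(1−x) equals 1 exactly at x = 1 and lies in (0, 1) for all other x > 0, so µ(b, x) peaks at x = 1 with value 1 for any b > 0:

```python
import math

def mu(b, x):
    """Power-exponential function mu(b, x) = (x * e^(1-x))^b."""
    return (x * math.exp(1.0 - x)) ** b

# The base x*e^(1-x) equals 1 at x = 1, so mu(b, 1) = 1 for any b,
# and the function value is below 1 on either side of x = 1.
print(mu(2.0, 1.0))                       # 1.0
print(mu(2.0, 0.5) < 1.0, mu(2.0, 3.0) < 1.0)
```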
The choice of the parameters of the power-exponential function is crucial, as it affects how good the fit will be. There are several ways to measure the fit, such as considering the sum of squares of the residuals or the maximum residual. Least squares minimizes the square of the error between the original data and the values predicted by the model [18]; in this thesis, we want the sum of squared errors, or residuals, to be as small as possible. Least-squares curve fitting is a commonly applied method for choosing the parameters in a mathematical model so that the model approximates observed data well.
Both linear and non-linear regression must deal with the dependent variables being random in some sense. The difference is that in linear regression the value of the predicting model depends linearly on its parameters. For example, y(x; a, b) = ax + b is linear in the parameters, but y(x; a, b) = ax + b² is non-linear.
If the objective function ϕ is continuously differentiable at a point b, then any local minimum point b of ϕ must satisfy

g(b) = ∇ϕ(b) = 0 (1.2)

where g(b) is the gradient vector. This shows the close relationship between solving optimization problems and solving non-linear systems of equations.
If in form (1.1) there are m > n equations, we have an overdetermined non-linear system. Then a least-squares solution can be defined as a solution to

min_{b∈R^n} ϕ(b) = (1/2) ‖f(b)‖₂²

which is a non-linear least-squares problem. We describe this problem in the next section.
1.4 Nonlinear least squares problems
A widely used approach for the estimation of the unknown parameters in the non-linear re-
gression function is the approach of least squares. The problem of non-linear least-squares is
that of minimizing a sum of squares. Consider a vector function f : R^n → R^m where m ≥ n. The aim is to minimize ‖f(b)‖₂² as the non-linear least-squares objective function

min_b ϕ(b) ≡ min_{b∈R^n} (1/2) ‖f(b)‖₂² = min_{b∈R^n} (1/2) Σ_{i=1}^m fᵢ(b)², (1.3)

in which each fᵢ is a real-valued function with continuous second partial derivatives. The problem is presented as the minimization of the l₂ norm of a multivariate function, min_{b∈R^n} ‖f(b)‖₂², where

f(b) = (f₁(b), f₂(b), …, f_m(b))^T. (1.4)
A common instance is the choice of the parameter vector b within a non-linear model µ:

min_{b∈R^n} Σ_{i=1}^m (yᵢ − µ(xᵢ; b))² (1.5)

We must minimize some norm of the vector f(b); the non-linearity arises only from µ(xᵢ; b). The model µ(b, xᵢ) fits the data well if the residuals fᵢ are small.
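The residuals fᵢ = yᵢ − µ(xᵢ; b) and the objective (1.3) can be sketched in a few lines of Python. The data below are synthetic and purely illustrative, generated from the model itself with a known parameter b = 3, so the objective is exactly zero at the true parameter:

```python
import math

def mu(b, x):
    # power-exponential model (x * e^(1-x))^b
    return (x * math.exp(1.0 - x)) ** b

def residuals(b, xs, ys):
    """Residuals f_i(b) = y_i - mu(b, x_i)."""
    return [y - mu(b, x) for x, y in zip(xs, ys)]

def phi(b, xs, ys):
    """Objective phi(b) = 1/2 * sum of f_i(b)^2, cf. (1.3)."""
    return 0.5 * sum(f * f for f in residuals(b, xs, ys))

xs = [0.5, 1.0, 1.5, 2.0]
ys = [mu(3.0, x) for x in xs]     # synthetic data generated with b = 3
print(phi(3.0, xs, ys))           # zero at the true parameter
print(phi(2.0, xs, ys) > 0.0)     # positive away from it
```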
One of the major advantages of non-linear regression is the wide set of formulas that can be fitted. Several scientific and physical processes are inherently non-linear; trying a linear fit to such a structure would give inferior results. For instance, population studies very often follow exponential patterns that cannot be captured by a linear model. Non-linear regression can give good estimates of the unknown parameters in the model even with a small data set. Least-squares problems can be solved by general optimization methods. This study offers numerical results for a large and diverse group of problems using software that is in common use and has undergone extensive testing. The algorithms used in this software include Newton-based line-search and trust-region approaches for unconstrained optimization, in addition to the Gauss-Newton and Levenberg-Marquardt methods for non-linear least squares. Our purpose is to compare the underlying algorithms in order to identify classes of problems on which each method performs either quite well or quite poorly, and to provide benchmarks for further work in non-linear least squares and unconstrained optimization.
Chapter 2
The standard approaches for non-linear least-squares problems need derivative information about the component functions of f(b), as mentioned in [3] and [4].

Definition 1. If the function fᵢ : R^n → R is differentiable at a point b, then fᵢ is also continuous at b and the gradient vector ∇fᵢ(b) exists and is continuous [3]. A vector-valued function f(b) : R^n → R^n is said to be differentiable at the point b if each component fᵢ(b) is differentiable at b, that is, if all the first-order partial derivatives exist. Then the matrix

J(b) = ( ∇f₁(b)^T ; … ; ∇f_n(b)^T ) =
⎡ ∂f₁/∂b₁ … ∂f₁/∂b_n ⎤
⎢    ⋮         ⋮    ⎥
⎣ ∂f_n/∂b₁ … ∂f_n/∂b_n ⎦

is the Jacobian matrix of f.
We assume here that f(b) is twice continuously differentiable. It is easily shown that the gradient of ϕ(b) = (1/2) f^T(b) f(b) has components

[∇ϕ(b)]ⱼ = ∂ϕ(b)/∂bⱼ = Σ_{i=1}^m fᵢ(b) ∂fᵢ(b)/∂bⱼ ,  j = 1, …, n,
and it follows that the gradient is the vector
g(b) = ∇ϕ(b) = J(b)T f (b) (2.1)
where the Jacobian J(b) ∈ R^{m×n} of the vector function f(b) is the matrix containing the first partial derivatives of the function components,

[J(b)]ᵢⱼ = ∂fᵢ(b)/∂bⱼ = −∂µ(b, xᵢ)/∂bⱼ ,  i = 1, …, m, j = 1, …, n. (2.2)

The ith row of J(b) equals the transpose of the gradient of fᵢ(b):

[J(b)]ᵢ,: = ∇fᵢ(b)^T = −∇µ(b, xᵢ)^T ,  i = 1, …, m. (2.3)

As we can see, the Jacobian is a function of the independent variable and the parameters, and therefore it changes from one iteration to the next. In terms of the linearized model, ∂µ(b, xᵢ)/∂bⱼ = −[J(b)]ᵢⱼ, and the residual is given by Eq. (1.6).
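A standard sanity check for the analytic Jacobian (2.2) is to compare it against central finite differences. The Python sketch below does this for the single-parameter model µ(b, x) = (x·e^(1−x))^b, whose only Jacobian column has entries −∂µ/∂b = −µ(b, xᵢ)·ln(xᵢ·e^(1−xᵢ)); the sample points are chosen arbitrarily for illustration:

```python
import math

def mu(b, x):
    return (x * math.exp(1.0 - x)) ** b

def jac_analytic(b, xs):
    """Jacobian column [J]_i = df_i/db = -dmu(b, x_i)/db, cf. (2.2)."""
    return [-mu(b, x) * math.log(x * math.exp(1.0 - x)) for x in xs]

def jac_fd(b, xs, h=1e-6):
    """Central finite-difference approximation of the same column."""
    return [-(mu(b + h, x) - mu(b - h, x)) / (2 * h) for x in xs]

xs = [0.5, 1.5, 2.0]
for ja, jf in zip(jac_analytic(2.0, xs), jac_fd(2.0, xs)):
    assert abs(ja - jf) < 1e-6
print("Jacobian check passed")
```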
We shall also need the elements of the Hessian of ϕ(b), denoted ∇²ϕ(b), which are given by

[∇²ϕ(b)]ₖₗ = ∂²ϕ(b)/(∂bₖ∂bₗ) = Σ_{i=1}^m (∂fᵢ(b)/∂bₖ)(∂fᵢ(b)/∂bₗ) + Σ_{i=1}^m fᵢ(b) ∂²fᵢ(b)/(∂bₖ∂bₗ) (2.4)

where

[∇²fᵢ(b)]ₖₗ = −[∇²µ(b, xᵢ)]ₖₗ = −∂²µ(b, xᵢ)/(∂bₖ∂bₗ) ,  k, l = 1, …, n.
The summation term containing the second derivatives can often be ignored, so we can approximate the Hessian of ϕ(b) by

H(b) = ∇²ϕ(b) ≈ J(b)^T J(b) (2.6)
The special forms of the gradient g(b) and Hessian H(b) can be exploited by methods for the
non-linear least squares problem.
Now we consider how the Hessian matrix can be used to establish the existence of a local
minimum or maximum.
Theorem 1. Suppose that f(b) has continuous first and second partial derivatives on a set D ⊆ R^n. Let b∗ be an interior point of D that is a critical point of f(b). Then b∗ is a local minimum if the Hessian H_f(b∗) is positive definite, a local maximum if H_f(b∗) is negative definite, and a saddle point if H_f(b∗) is indefinite.

This theorem can be proved by using the continuity of the second partial derivatives to show that H_f(b) is positive definite for b sufficiently close to b∗, and then applying the multivariable generalization of Taylor's Formula [21].
We now consider the case where the Hessian of f(b) is indefinite at a critical point. Suppose that f(b) has continuous second partial derivatives on a set D ⊆ R^n, and let b∗ be an interior point of D that is a critical point of f(b). If H_f(b∗) is indefinite, then there exist vectors u, v such that u^T H_f(b∗) u > 0 and v^T H_f(b∗) v < 0. By continuity of the second partial derivatives, there exists an ε > 0 such that these inequalities also hold at every point within distance ε of b∗. Define U(t) = f(b∗ + tu) and V(t) = f(b∗ + tv). Then U′(0) = V′(0) = 0, whereas U″(0) > 0 and V″(0) < 0. Thus t = 0 is a strict local minimum of U(t) and a strict local maximum of V(t) [21].
Definition 2. A saddle point of f(b) is a critical point b∗ such that there are vectors u, v for which t = 0 is a strict local minimum of U(t) = f(b∗ + tu) and a strict local maximum of V(t) = f(b∗ + tv).
Theorem 2 (Sufficient condition for a local minimum). Suppose that bₛ is a stationary point of a twice continuously differentiable function ϕ and that the Hessian ∇²ϕ(bₛ) is positive definite. Then bₛ is a local minimum.

Similarly, if the Hessian H = ∇²ϕ(bₛ) is negative definite, then bₛ is a local maximum, and if ∇²ϕ(bₛ) is indefinite, i.e. has both positive and negative eigenvalues, then the stationary point bₛ is a saddle point [21].
Based on the theorems discussed above, there are two conditions for an optimal solution. The first-order necessary condition for b∗ to be a local minimum of ϕ(b) is that b∗ is a stationary point, i.e. that it satisfies ∇ϕ(b∗) = 0.
The second-order sufficient condition is that if b is a critical point of ϕ, with f(b) : R^n → R^m twice continuously differentiable, and the Hessian of ϕ at b is positive definite, then ϕ has a local minimum at b; in other words, in any direction away from b the value of ϕ increases, at least to first order. Thus

∇²ϕ(b) = J(b)^T J(b) + Σ_{i=1}^m fᵢ(b) ∇²fᵢ(b)

is positive definite. The first-order and often dominant term J(b)^T J(b) of the Hessian contains only the Jacobian matrix J(b), i.e., only first derivatives. The computational cost of storing the mn²/2 second derivatives ∂²fᵢ(b)/(∂bₖ∂bₗ) can be quite high. In the second term, the second derivatives are multiplied by the residuals; if the mathematical model is adequate, the residuals will be small near the solution and the second term will be less important. In this case the first term, which contains only the Jacobian matrix, can be considered the important part of the Hessian, and the second term can be ignored.
All these notions provide the model function of the non-linear least-squares problem and its partial derivatives with respect to each parameter; algorithms such as the Levenberg-Marquardt and Gauss-Newton algorithms construct all the necessary structures by themselves:

• J(b) = ∇f(b), the Jacobian matrix of first-order derivatives of the residuals with respect to each parameter and every measurement. It plays the role of the design matrix of the non-linear least-squares model.

• g(b) = ∇ϕ(b), the gradient vector of first-order derivatives of the objective function with respect to the model parameters. It describes the slope of the objective function ∇ϕ(b) at a point b.

• H(b) = ∇²ϕ(b), the Hessian matrix of second-order partial derivatives of the objective function with respect to every combination of parameters.
2.3 Convergence
One of the useful things that we will discuss is the rate of convergence in various numerical
methods. At this point, one of the most important criteria to consider is the speed or order of
convergence.
Definition 3. A convergent sequence {bₖ} with lim_{k→∞} bₖ = b∗ and bₖ ≠ b∗ is said to have order of convergence equal to p if

lim_{k→∞} |eₖ₊₁| / |eₖ|^p = lim_{k→∞} |bₖ₊₁ − b∗| / |bₖ − b∗|^p = C.

Here p ≥ 1 is called the order of convergence and the constant C the rate of convergence or asymptotic error constant; we then say that bₖ converges to b∗ with order p and constant C. This describes how rapidly the error eₖ = bₖ − b∗ converges to zero. Three common cases are: linear convergence, where p = 1 and 0 < C < 1; superlinear convergence, where p = 1 and

|eₖ₊₁| / |eₖ| → 0 for k → ∞;

and quadratic convergence, where p = 2.
The value of p measures how fast the sequence converges: the bigger p is, the faster the convergence. In numerical approaches, the sequence of approximate solutions converges to the root; when the iterative approach converges faster, a solution can be reached in a smaller number of iterations than with an approach of slower convergence. We will discuss the specific convergence properties of some methods in the following chapter.
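The order p can be estimated numerically from consecutive errors, since |eₖ₊₁| ≈ C|eₖ|^p gives p ≈ log(|eₖ₊₁|/|eₖ|) / log(|eₖ|/|eₖ₋₁|). A Python sketch on the scalar Newton iteration for b² − 2 = 0 (a toy example chosen for illustration, not taken from the thesis) recovers an estimate close to p = 2:

```python
import math

# Newton's method for the scalar root problem b^2 - 2 = 0,
# whose iterates converge quadratically to b* = sqrt(2).
b_star = math.sqrt(2.0)
b = 3.0
errors = []
for _ in range(5):
    errors.append(abs(b - b_star))
    b = b - (b * b - 2.0) / (2.0 * b)   # Newton update

# Estimate the order p from |e_{k+1}| ~ C |e_k|^p:
p = math.log(errors[3] / errors[2]) / math.log(errors[2] / errors[1])
print(p)   # close to 2, consistent with quadratic convergence
```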
Chapter 3
Newton's method is a root-finding algorithm that uses the first few terms of the Taylor series of the function f(b) in the neighbourhood of a suspected root. We will derive this method for finding local stationary points of ϕ(b), which satisfy ∇ϕ(b) = 0. The approximation of ϕ by its second-order Taylor expansion about the point bₖ₊₁ = bₖ + h is given by

ϕ(bₖ + h) ≈ ϕ(bₖ) + g(bₖ)^T h + (1/2) h^T H(bₖ) h
where the gradient is the vector g(bₖ) = ∇ϕ(bₖ) and the Hessian is the symmetric matrix

H(bₖ) =
⎡ ∂²ϕ/∂b₁²     …  ∂²ϕ/∂b₁∂b_N ⎤
⎢     ⋮        ⋱       ⋮      ⎥
⎣ ∂²ϕ/∂b₁∂b_N  …  ∂²ϕ/∂b_N²   ⎦

Minimizing ϕ(bₖ + h) over h gives a new direction towards a local stationary point b∗. When the Hessian matrix H(bₖ) is positive definite, the minimum is the point where ∇ϕ(bₖ + h) equals zero. Hence, we want to solve the linear equation system

H(bₖ) h = −g(bₖ)

where the Newton step h is the solution of this symmetric linear system.
This also gives the iterative update

bₖ₊₁ = bₖ + h
     = bₖ − (∇²ϕ(bₖ))⁻¹ ∇ϕ(bₖ)
     = bₖ − H(bₖ)⁻¹ g(bₖ)
     = bₖ − (J(bₖ)^T J(bₖ) + G(bₖ))⁻¹ J(bₖ)^T f(bₖ)

where G(bₖ) denotes the matrix

G(bₖ) = Σ_{i=1}^m fᵢ(bₖ) ∇²fᵢ(bₖ).

For the non-linear system f(b) = 0 the corresponding iteration can also be written

bₖ₊₁ = bₖ + h = bₖ − J(bₖ)⁻¹ f(bₖ) (3.2)
In general the inverse Jacobian matrix need not be computed explicitly. We now highlight the most important properties of Newton's approach [4]:

• Newton's method is quite efficient in the final phase of the iteration, where b is close to b∗.

• Newton's method converges quadratically to a local minimum b∗ as long as the Hessian H(b∗) is positive definite. On the other hand, if H(bₖ) is negative definite everywhere in a region containing b, the basic Newton approach converges quadratically towards a stationary point b∗ that is a maximizer.

• It is better to perform a line search for αₖ, which guarantees global convergence:

bₖ₊₁ = bₖ − αₖ Hₖ⁻¹ gₖ
We can construct a hybrid method based on Newton's method and the steepest-descent method. The solution hₖ = −H(bₖ)⁻¹ g(bₖ) is guaranteed to be a descent direction provided that Hₖ is positive definite. The central section of this hybrid algorithm can be sketched as

if ∇²ϕ(b) is positive definite
    h := hₖ
else
    h := h_sd
b := b + αh

where h_sd is the steepest-descent direction and α is obtained by a line search. Hybrid methods can be very efficient, but they are hardly ever used because they require computing ∇²ϕ(b), which is not available for complex application problems. There are other methods that avoid this by instead building a sequence of matrices approximating H∗ = ∇²ϕ(b∗) [4].
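The hybrid step selection above can be sketched in one dimension in Python. This is a minimal illustration with a fixed step length α for the steepest-descent branch rather than a real line search: use the Newton step where the second derivative (the 1-D "Hessian") is positive, and fall back to steepest descent where it is not:

```python
def hybrid_minimize(dphi, d2phi, b, alpha=0.1, iters=100):
    """Hybrid Newton / steepest-descent sketch in one dimension."""
    for _ in range(iters):
        g = dphi(b)
        H = d2phi(b)
        if H > 0:
            h = -g / H        # Newton step h = -H^(-1) g
        else:
            h = -alpha * g    # steepest-descent step (fixed alpha for illustration)
        b = b + h
    return b

# phi(b) = b^4 - 2 b^2 has a local maximum at 0 (negative Hessian there)
# and minima at b = +-1; pure Newton started near 0 would be attracted to
# the stationary point at 0, while the hybrid escapes towards b = 1.
b_min = hybrid_minimize(lambda b: 4*b**3 - 4*b,     # phi'(b)
                        lambda b: 12*b**2 - 4,      # phi''(b)
                        b=0.1)
print(b_min)   # converges to the minimizer at 1.0
```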
3.1.1 Line search
As in the case of solving a non-linear system, Newton's method needs to be modified when the initial value b₀ is not close to a minimizer: either a line search can be included or a trust-region technique used. In a line search we take the new iterate to be

bₖ₊₁ = bₖ + αₖ dₖ ,  αₖ ≥ 0 (3.3)

Here dₖ is a search direction and the step length αₖ > 0 is chosen so that ϕ(bₖ₊₁) < ϕ(bₖ), which is equivalent to f(αₖ) < f(0) for

f(α) = ϕ(bₖ + α dₖ). (3.4)

That dₖ is a descent direction ensures that f′(0) = dₖ^T g(bₖ) < 0, where g(bₖ) is the gradient at bₖ. It is usually not efficient to determine an accurate minimizer; rather, it is required that αₖ satisfy two conditions:

1. αₖ should be increased when αₖ is so small that the gain in objective value is too small.

2. αₖ must be decreased in order to satisfy the descent condition ϕ(bₖ₊₁) < ϕ(bₖ) when αₖ is too large.
We observe that the Newton step hₖ is not a descent direction if g(bₖ)^T H(bₖ)⁻¹ g(bₖ) ≤ 0. One reason to admit the gradient as an alternative search direction is the risk that the Newton direction leads to a saddle point.
We are interested here in analyzing the convergence of Newton's approach: Newton's algorithm converges quadratically if the approximation bₖ is sufficiently close to a root b∗ at which the Jacobian is non-singular.

Theorem 3. Suppose f : R^n → R^n is continuously differentiable and f(b∗) = 0. If

1. J(b∗) is non-singular, and

2. J is Lipschitz continuous on a neighborhood of b∗,

then, for all b⁽⁰⁾ sufficiently close to b∗, Newton's algorithm produces a sequence b⁽¹⁾, b⁽²⁾, … that converges quadratically to b∗. The proof of this theorem is described in [3].

Lipschitz continuity is a technical condition stronger than mere continuity of the Jacobian J but weaker than the condition that the function f(b) be twice continuously differentiable [3][9].
The biggest drawback of Newton's approach is that J(b) and its inverse must be computed at every iteration. Computing the Jacobian matrix and its inverse can be relatively hard and time-consuming, depending on the system size. Newton's approach requires solving several linear systems, which may become expensive when there are many variables. It converges rapidly when the Jacobian J(b∗) is well-conditioned; in the opposite case it may blow up.
Trust-region approaches were initially developed for non-linear least-squares problems [11] and differ in kind from general descent approaches. The fundamental concept of trust-region methods is to first decide the step size for each sub-problem and thereafter optimize the direction. The step size defines the radius ∆ₖ of the trust region, inside which the second-order Taylor expansion of the twice continuously differentiable function ϕ(b) is believed to behave similarly to the original function. In other words, within the radius ∆ₖ of maximal step size, the optimal direction is calculated with respect to the approximate model of ϕ at bₖ.
Consequently the quadratic model function mₖ used at each iterate bₖ is

mₖ(h) = ϕ(bₖ) + g(bₖ)^T h + (1/2) h^T H(bₖ) h ≈ ϕ(bₖ + h). (3.8)

At each iteration, we search for a solution hₖ of the sub-problem based on the quadratic model Eq. (3.7) subject to the trust region. Assuming we know a positive number ∆ₖ such that the model is sufficiently precise inside a ball with radius ∆ₖ centered at bₖ, we determine the step as

minimize_{h∈R^n} mₖ(h) = ϕₖ + h^T gₖ + (1/2) h^T Hₖ h (3.9)
subject to ‖h‖₂ ≤ ∆ₖ
Since mₖ(h) is supposed to be a good approximation of ϕ(bₖ + h) for h sufficiently small, one likely reason for a failed step is that h was too large and should be reduced. Moreover, if the step is accepted, it may be possible to use a larger step from the new iterate and thereby reduce the number of steps needed before b∗ is reached.
A basic element of a trust-region algorithm is the process of selecting the trust-region radius ∆ₖ at every iteration. The quality of the model with the computed step can be evaluated by the so-called gain ratio

ρₖ = (ϕ(bₖ) − ϕ(bₖ + hₖ)) / (mₖ(0) − mₖ(hₖ)) (3.10)
which gives the ratio between the actual reduction and the expected reduction. The actual decrease of the objective function for the trial step hₖ is given by the numerator; the predicted reduction in the denominator of Equation (3.10) is the reduction expected by the model function mₖ. The choice of ∆ₖ is at least partially decided by the ratio ρₖ at previous iterations. By construction the predicted reduction should always be positive. If the gain ratio ρₖ is negative, the new objective value ϕ(bₖ + hₖ) is bigger than the present value ϕ(bₖ), and the step has to be discarded. If the gain ratio ρₖ is close to one, there is good agreement between the model function mₖ and the objective function over the step, so it is safe to expand the trust region for the upcoming iteration. If the gain ratio ρₖ is positive but much smaller than one, the trust region is not updated, while if it is close to zero or negative, the trust region ∆ₖ is shrunk at the next iteration.
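Such a radius update driven by the gain ratio can be sketched in Python. The thresholds η₁ = 0.25, η₂ = 0.75 and the halving/doubling factors below are illustrative choices, not values prescribed by the thesis:

```python
def update_radius(rho, delta, eta1=0.25, eta2=0.75):
    """Trust-region radius update driven by the gain ratio rho of (3.10):
    shrink on poor agreement, keep on moderate, expand on good agreement."""
    if rho < eta1:           # poor (or negative) agreement: shrink the region
        return delta / 2.0
    elif rho > eta2:         # very good agreement: expand the region
        return 2.0 * delta
    return delta             # otherwise leave the radius unchanged

print(update_radius(-0.1, 1.0))   # 0.5 (step would also be rejected)
print(update_radius(0.5, 1.0))    # 1.0
print(update_radius(0.9, 1.0))    # 2.0
```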
The approximate function mₖ(h) can be minimized via a variety of approaches. With a trust-region method the step length is controlled by the size of the radius ∆ₖ. As in the case of a line search, the exact optimal solution is not necessarily needed. An uncomplicated approach is to minimize the linear approximation

min_{‖h‖≤∆ₖ} { ϕ(bₖ) + g(bₖ)^T h }

whose solution is a step of length ∆ₖ in the steepest-descent direction,

h = −∆ₖ g(bₖ) / ‖g(bₖ)‖ ,

so that only the step length has to be limited to the trust radius.
To check that the process is performing well, one determines whether the trust radius is adequate. Therefore, the expected reduction mₖ(bₖ) − mₖ(bₖ + hₖ) and the actual reduction ϕ(bₖ) − ϕ(bₖ + hₖ) are compared. The ratio between the actual reduction and the expected one, ρₖ = Aredₖ / Predₖ, plays a very significant part in the algorithm: it determines whether the trial step, subject to ‖h‖ ≤ ∆ₖ, is acceptable, and how to alter the radius of the new trust region [14].
3.2.3 The sub-problem of the trust-region
One of the most significant parts of a trust-region algorithm is the trust-region sub-problem. Since every iteration of a trust-region algorithm requires solving, exactly or inexactly, a trust-region sub-problem, finding an efficient solver for trust-region problems is very important. We consider sub-problem (3.9), which has been studied by many authors. At iteration k of a trust-region approach, the following sub-problem must be solved:

minimize_{h∈R^n} mₖ(h) = ϕₖ + h^T gₖ + (1/2) h^T Hₖ h
subject to ‖h‖₂ ≤ ∆ₖ
It can be shown that the solution h∗ of this constrained problem is the solution of the linear equation system

(Hₖ + λI) h∗ = −gₖ (3.11)

where gₖ ∈ R^n, Hₖ ∈ R^{n×n} is a symmetric matrix and ∆ₖ > 0, if and only if there exists λ ≥ 0 such that (Hₖ + λI) is positive semi-definite, ‖h∗‖₂ ≤ ∆ₖ and λ(∆ₖ − ‖h∗‖₂) = 0. Note that if Hₖ = ∇²ϕ(bₖ) is positive definite and ∆ₖ is large enough, the solution of the trust-region sub-problem is the solution of

∇²ϕ(bₖ) h = −∇ϕ(bₖ)
3.3 The approach of Gauss-Newton
This approach depends on a linear approximation of the components of f(b) in the neighbourhood of b. The idea of the approach is to approximate the Hessian matrix H(b) by its first part J^T(bₖ)J(bₖ). It is used for solving non-linear least-squares problems and can only be used to minimize a sum-of-squares objective function.
This method uses the approximation Q(bₖ) = 0 and determines the search direction as the solution of the Newton equations

∇²ϕ(bₖ) h_N = −∇ϕ(bₖ)

where the Gauss-Newton method takes the gradient and an approximate Hessian as

g(bₖ) = J^T(bₖ) f(bₖ) (3.13)
H(bₖ) ≈ J^T(bₖ) J(bₖ) (3.14)

The resulting method is referred to as the Gauss-Newton approach, where the computation of the search direction h_GN requires the solution of the linear system

(J(bₖ)^T J(bₖ)) h_GN = −J(bₖ)^T f(bₖ) (3.15)
Note that J(bₖ)^T J(bₖ) is always at least positive semi-definite. When J(bₖ) has full column rank and the gradient g(bₖ) is nonzero, the Gauss-Newton search direction is a descent direction and hence an appropriate direction for a line search; in this case (3.15) is exactly the normal equations of a linear least-squares problem. Otherwise J(bₖ)^T J(bₖ) is non-invertible and the equation does not have a unique solution.
The difference between Newton's approach and the Gauss-Newton approach lies in the search directions

H(b) h_N = −g(b)
J(b)^T J(b) h_GN = −g(b)

where h_GN is a descent direction since, as mentioned before, Jₖ^T Jₖ is a positive semi-definite matrix. It can be shown that for a wide range of instances, taking full steps of length one results in convergence.
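For the single-parameter model µ(b, x) = (x·e^(1−x))^b the linear system (3.15) is scalar, so the Gauss-Newton iteration can be sketched in a few lines of Python. The noise-free synthetic data with known b = 2.5 and the starting point are illustrative; a real implementation would add a stopping criterion and a line search:

```python
import math

def mu(b, x):
    """Power-exponential model (x * e^(1-x))^b."""
    return (x * math.exp(1.0 - x)) ** b

def gauss_newton(xs, ys, b0, iters=20):
    """One-parameter Gauss-Newton: solve (J^T J) h_GN = -J^T f, cf. (3.15)."""
    b = b0
    for _ in range(iters):
        f = [y - mu(b, x) for x, y in zip(xs, ys)]                     # residuals f_i
        J = [-mu(b, x) * math.log(x * math.exp(1.0 - x)) for x in xs]  # J_i = -dmu/db
        JtJ = sum(j * j for j in J)
        Jtf = sum(j * fi for j, fi in zip(J, f))
        b = b - Jtf / JtJ    # b_{k+1} = b_k + h_GN, h_GN = -(J^T J)^(-1) J^T f
    return b

xs = [0.3, 0.6, 1.4, 2.0, 2.5]      # illustrative sample points (x != 1)
ys = [mu(2.5, x) for x in xs]       # noise-free data with true b = 2.5
b_fit = gauss_newton(xs, ys, b0=1.0)
print(b_fit)                        # recovers the true parameter
```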
A sufficient condition for convergence of the Gauss-Newton approach is known when the normal equations for the linearized least-squares problem (3.16) are solved exactly in Step 1.1 at every iteration. The approach with line search αₖ has guaranteed convergence, provided that, first, there exists b∗ ∈ R^n such that J^T(b∗) f(b∗) = 0 and, second, the Jacobian matrix J(b∗) at b∗ has full rank n.
We will introduce the notation υ(A) to denote the spectral radius of an n × n matrix A, and
define
ρ = υ((J(b∗ )T J(b∗ ))−1 Q(b∗ )) (3.19)
The following theorem on local convergence of the Gauss-Newton approach then holds.
Theorem 4. Let the first and second assumptions hold. If ρ < 1, then the Gauss-Newton iter-
ation converges locally to b∗ ; that is, there exists ε > 0 such that the sequence {bk } generated
by the Gauss-Newton algorithm converges to b∗ for all b0 ∈ D ≡ {b| k b − b∗ k< ε}.
The Levenberg-Marquardt search direction h_LM is defined by the following modification of (3.15):

(J(bₖ)^T J(bₖ) + λₖ I) h_LM = −J(bₖ)^T f(bₖ) (3.22)

which uses the original approximation Hₖ ≈ Jₖ^T Jₖ and is solved for different values of the damping parameter λₖ. This approach can also be seen as Gauss-Newton with the use of a trust-region process. The problem is equivalent to minimizing the model function mₖ(h_LM) of (3.10), i.e. the trust-region model using the approximate Hessian and the gradient of ϕ(bₖ):

min_{‖h_LM‖≤∆} ϕ(bₖ) + ∇ϕ(bₖ)^T h_LM + (1/2) h_LM^T H(bₖ) h_LM (3.23)
and the iteration step itself is bₖ₊₁ = bₖ + h_LM. We use a trust-region strategy instead of a line-search technique to control the norm of the solution of (3.22); then we must solve at each iteration

min ‖J(bₖ) h_LM + f(bₖ)‖₂²  subject to ‖h_LM‖ ≤ ∆ₖ (3.24)

where the trust-region radius ∆ₖ is bigger than zero, producing a spherical trust region. Therefore, the Levenberg-Marquardt approach can be considered a trust-region approach.
The step in Levenberg-Marquardt's method is computed as

hₖ^LM = argmin_{h_LM} { ‖J(bₖ) h_LM + f(bₖ)‖₂² + λₖ ‖h_LM‖₂² } (3.25)

where λₖ > 0 is called the Lagrange parameter for the constraint at the kth iteration and is updated from iteration to iteration. Thus h_LM is calculated from the normal equations of the damped linear least-squares problem [7]

min_{h_LM} (1/2) ‖ [ J(bₖ) ; √λₖ I ] h_LM − [ −f(bₖ) ; 0 ] ‖₂² (3.26)
For an ill-conditioned Jacobian (condition number very large or infinite), this method is a more robust variant of Gauss-Newton. The key strategic decision of the Levenberg-Marquardt method is how to choose and update the damping λₖ at each iteration. Analogously to the trust-region strategy, the gain ratio ρ(h_LM) of (3.12) compares the actual reduction of the objective function in the numerator with the expected reduction of the quadratic model in the denominator. The gain ratio is used to control and update the damping parameter λₖ in the Levenberg-Marquardt method.
Using the quadratic model m of φ, the predicted reduction is

    m(0) − m(h_LM) = −h_LM^T J^T f(b) − (1/2) h_LM^T J^T J h_LM
                   = −h_LM^T (J^T f(b) + (1/2) J^T J h_LM)
                   = −(1/2) h_LM^T (2 J^T f(b) + (J^T J + λI − λI) h_LM)
                   = −(1/2) h_LM^T (J^T f(b) − λ h_LM)
                   = (1/2) h_LM^T (λ h_LM − J^T f(b)),

where we used that h_LM solves (J^T J + λI) h_LM = −J^T f(b). Both terms λ h_LM^T h_LM and −h_LM^T J^T f(b) are positive, the latter because J^T J + λI is positive definite, so m(0) − m(h_LM) is guaranteed to be positive. Finally we get the gain ratio

    ρ(h_LM) = (φ(b) − φ(b + h_LM)) / (m(0) − m(h_LM)).

In a damped method a small value of ρ(h_LM) indicates that we must increase the damping parameter, thereby increasing the penalty on large steps.
• If ρ(h_LM) is large, then m(h_LM) is a good approximation to φ(b + h_LM), and the damping parameter λ can be decreased towards 0 so that the next Levenberg-Marquardt step is closer to a Gauss-Newton step.
    b_{k+1} = b_k + h_LM,k   if ρ_k > η_0,
    b_{k+1} = b_k            otherwise.
Step 4. Choose λ_{k+1} as

    λ_{k+1} = 4λ_k             if ρ_k < η_1,
    λ_{k+1} = λ_k              if ρ_k ∈ [η_1, η_2],
    λ_{k+1} = max{λ_k/4, m}    if ρ_k > η_2;

then set k := k + 1 and go to Step 2.
From the algorithm, the Levenberg-Marquardt step h_LM,k is computed as

    h_LM,k = −(J(b_k)^T J(b_k) + λ_k I)^{−1} J(b_k)^T f(b_k)        (3.28)
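The step computation (3.28) combined with the Step 4 damping rule can be sketched in Python on the two-parameter power-exponential model used later in Chapter 4 (a minimal illustration; the threshold values η0 = 10⁻³, η1 = 0.25, η2 = 0.75 and the floor m are assumed example values, and the gradient-norm stopping test is an addition of this sketch):

```python
import numpy as np

def lm_fit(x, y, b0, eta0=1e-3, eta1=0.25, eta2=0.75, m=1e-8, lam=1e-2,
           n_iter=200):
    """Levenberg-Marquardt fit of mu(b; x) = b1*(x*exp(1-x))**b2,
    with the Step 4 damping update (factor 4 up/down, floor m)."""
    b = np.asarray(b0, dtype=float)
    u = x * np.exp(1.0 - x)
    for _ in range(n_iter):
        f = y - b[0] * u ** b[1]                            # residuals f_i
        J = np.column_stack([-u ** b[1],                    # df/db1
                             -b[0] * u ** b[1] * np.log(u)])  # df/db2
        g = J.T @ f                                         # grad of 0.5*||f||^2
        if np.linalg.norm(g) < 1e-12:
            break                                           # converged
        h = np.linalg.solve(J.T @ J + lam * np.eye(2), -g)  # eq. (3.28)
        f_new = y - (b[0] + h[0]) * u ** (b[1] + h[1])
        # gain ratio: actual reduction over predicted reduction
        pred = 0.5 * h @ (lam * h - g)
        rho = 0.5 * (f @ f - f_new @ f_new) / pred
        if rho > eta0:                                      # accept the step
            b = b + h
        if rho < eta1:
            lam = 4.0 * lam                                 # poor model: damp more
        elif rho > eta2:
            lam = max(lam / 4.0, m)                         # good model: damp less
    return b
```

A rejected step (ρ ≤ η0) leaves b unchanged and quadruples λ, so the next trial step is shorter and closer to the steepest-descent direction.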
Chapter 4
In this chapter we analyze and simulate the accuracy of fitting a power-exponential model µ(b, x) to a given set of points, and focus on the specifics of this model for curve fitting. We define our model as
• ∂µ/∂b1 = (x · e^(1−x))^{b2}, which is independent of the parameter b1.
• ∂µ/∂b2 = µ(b1, b2, x) · (ln(x) + (1 − x)) = µ(b1, b2, x) · ln(x · e^(1−x)), which depends on both b1 and b2.
When least squares is used to select the parameters, the error associated with the model b1 · (x · e^(1−x))^{b2} is given by
    min_{b1,b2} φ(b1, b2) = Σ_{i=1}^{m=9} (y_i − b1 · (x_i · e^(1−x_i))^{b2})²        (4.2)
This is just m times the variance of the residuals y_1 − b1 · (x_1 · e^(1−x_1))^{b2}, …, y_m − b1 · (x_m · e^(1−x_m))^{b2}. It makes no difference whether we take the variance or m times the variance as our error; observe that the error is a function of the two variables. Thus the i-th function would be

    f_i(b1, b2) = y_i − b1 · (x_i · e^(1−x_i))^{b2},

which is the residual for the i-th data point. The aim is to find values of the two variables b1 and b2 that minimize the error residuals. The majority of least-squares problems are of this type, where the functions f_i(b) are residuals and the index i identifies the specific data point. This is one way in which least-squares problems are distinctive: such problems typically include some assumptions about the error in the model. For instance, there might be
    y_i = b1 · (x_i · e^(1−x_i))^{b2} + ε_i        (4.4)
where the errors ε_i are assumed to come from a single probability distribution, often the normal distribution. Associated with this structure are the true parameters b1 and b2; however, every time we collect data and solve the least-squares problem, the results are only estimates b̂1 and b̂2 of those true parameters. After computing these estimates, we compare two common methods for non-linear least-squares problems, the Gauss-Newton and Levenberg-Marquardt methods, using their respective algorithms.
This is a consequence of the particular structure of the Hessian matrix ∇²φ(b) of the least-squares objective function. The Hessian H(b) in this case is the sum of two terms. The first involves only the gradients of the power-exponential functions µ_i and is therefore easy to calculate. The second involves second derivatives, but it vanishes when the errors ε_i are all zero, i.e. when the model fits the data perfectly. Several approaches to least squares, including the one used here, approximate this second term (2.5) of the Hessian.
We now compute the gradient and Hessian of this model for the given data points.
The formula for the least-squares objective function is

    φ(b1, b2) = (1/2) Σ_{i=1}^{9} (y_i − b1 · (x_i · e^(1−x_i))^{b2})² = (1/2) f(b)^T f(b).
The gradient and Hessian of the objective are

    ∇φ(b) = Σ_{i=1}^{9} f_i(b) ∇f_i(b) = J(b)^T f(b),

    ∇²φ(b) = J(b)^T J(b) + Σ_{i=1}^{9} f_i(b) ∇²f_i(b).
We observe that {x_i} and {y_i} are the data values of the model, while b1 and b2 are the variables in the model. As mentioned before, if f_i(b∗) = 0 then it is reasonable to expect that f(b) ≈ 0 for b ≈ b∗, implying that

    ∇²φ(b) ≈ J(b)^T J(b).

This final formula involves only the first derivatives of the functions f_i(b), and suggests that an approximation to the Hessian matrix can be found using first derivatives alone, at least in cases where the model is a good fit to the data.
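The small-residual approximation can be checked numerically. The sketch below compares J^T J against a finite-difference Hessian of the objective for the power-exponential model, using synthetic zero-residual data (model and data are illustrative choices of this sketch):

```python
import numpy as np

def phi(b, x, y):
    """Least-squares objective 0.5 * sum of squared residuals."""
    r = y - b[0] * (x * np.exp(1 - x)) ** b[1]
    return 0.5 * r @ r

def gauss_newton_hessian(b, x, y):
    """J^T J, where J is the Jacobian of the residuals f_i = y_i - mu."""
    u = x * np.exp(1 - x)
    J = np.column_stack([-u ** b[1], -b[0] * u ** b[1] * np.log(u)])
    return J.T @ J

def fd_hessian(b, x, y, h=1e-5):
    """Full Hessian of phi by central finite differences."""
    n = len(b)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            bpp = b.copy(); bpp[i] += h; bpp[j] += h
            bpm = b.copy(); bpm[i] += h; bpm[j] -= h
            bmp = b.copy(); bmp[i] -= h; bmp[j] += h
            bmm = b.copy(); bmm[i] -= h; bmm[j] -= h
            H[i, j] = (phi(bpp, x, y) - phi(bpm, x, y)
                       - phi(bmp, x, y) + phi(bmm, x, y)) / (4 * h * h)
    return H
```

With zero-residual data the second term of the Hessian vanishes exactly, so the finite-difference Hessian should agree with J^T J up to discretization error.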
One of the simplest methods that exploits this is the Gauss-Newton method, which uses the approximation to the Hessian matrix directly. It computes a search direction using the formula from Newton's method,

    ∇²φ(b) h_GN = −∇φ(b).        (4.5)

When f(b∗) in Eq. (3.15) is zero and the Jacobian J(b∗) has full rank, the Gauss-Newton method behaves like Newton's method near the solution, but without the cost of computing second derivatives. We now apply the Gauss-Newton method to a power-exponential model of the form
yi = µ(xi ; a, b) + εi
yi = a · (xi · e(1−xi ) )b + εi
with the given data, where the ε_i are measurement errors on the data ordinates, assumed to behave like random noise. We apply the GN method without a line search, using an initial guess that is close to the solution:

    a = 0.53,  b = 0.8137,
and thus

    h_GN = (−0.0004775, −0.0000036)^T

and the new estimate of the solution is

    (a, b) = (0.5273, 0.8052).
Since φ ≈ 0, an approximate global solution to the least-squares problem has been found; the least-squares objective function cannot be negative.
Figure 4.1: The power-exponential function µ(a, b, xi) fitted using the GN algorithm
In general, the GN method is only guaranteed to find a local solution; with an initial guess that is not close to a solution, the Gauss-Newton method may converge slowly. The purpose of the nonlinear regression is to minimize the sum of squared residuals, with entries f_i(b) = y_i − µ(a, b, x_i). Adding the squares of all the entries of f(b) gives the residual norm ‖f(b)‖ = 0.0246:
    f(b) = (0.0053, −0.0151, 0.0042, 0.0122, 0.0015, −0.0005, −0.0093, 0.0084, −0.0048)^T
Thus Σ_i (y_i − µ(a, b, x_i))² = 0.00060653, which shows that the model has become quite accurate. The advantage over Newton's method is that we do not need to calculate the second-order derivative term of the Hessian matrix. However, if any residual component f_i(b∗) is large, the approximation that sets the second-order term of the Hessian to zero will be poor, and the Gauss-Newton method will converge more slowly than Newton's method.
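The Gauss-Newton iteration with step halving used in this experiment can be sketched as follows (a minimal Python illustration on synthetic data; the step-halving loop mirrors the MATLAB implementation in Appendix A.1):

```python
import numpy as np

def gauss_newton_fit(x, y, a0, b0, n_iter=50):
    """Fit mu(x; a, b) = a*(x*exp(1-x))**b by Gauss-Newton with
    simple step halving on the direction h = (J^T J)^{-1} J^T d."""
    a, b = a0, b0
    u = x * np.exp(1.0 - x)
    e = np.sum((y - a * u ** b) ** 2)          # error at the initial guess
    for _ in range(n_iter):
        d = y - a * u ** b                     # residuals d_i = y_i - mu_i
        J = np.column_stack([u ** b,           # d mu / d a
                             a * u ** b * np.log(u)])  # d mu / d b
        h = np.linalg.solve(J.T @ J, J.T @ d)  # GN direction, cf. (4.5)
        step = 1.0
        for _ in range(15):                    # halve step until error drops
            a_new, b_new = a + step * h[0], b + step * h[1]
            e_new = np.sum((y - a_new * u ** b_new) ** 2)
            if e_new < e:
                a, b, e = a_new, b_new, e_new
                break
            step /= 2.0
    return a, b
```

Since J is the Jacobian of the model µ (not of the residual d = y − µ), the direction is +(J^T J)^{-1} J^T d; with the residual-Jacobian convention the sign flips.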
Another important method, the Levenberg-Marquardt method, can also be used to minimize this least-squares problem. It combines steepest descent and Gauss-Newton iteration: we can use a steepest-descent-type method until we approach a minimum, and then gradually switch to the quadratic rule. We can estimate how close we are to a minimum from how our error is changing. In particular, Levenberg's algorithm is formulated in terms of a damping parameter λ that determines the blend between steepest descent and Gauss-Newton iteration. Levenberg and Marquardt proposed an algorithm based on this observation, whose update rule is a mix of the two algorithms using the modified Hessian matrix
H(b, λ ) = HGN + λ I (4.6)
bi+1 = bi − (HGN + λ I)−1 ∇ϕ(bi ) (4.7)
Evaluate the new residual error at the point given by Eq. (4.7) and compute the cost at the new point, φ_new. This update rule is used as follows: if the residual error goes down after an update, our quadratic assumption on φ(b) is working. When the damping parameter λ is small, H approximates the Gauss-Newton Hessian and near-Gauss-Newton steps are taken; when λ is large, H is close to the identity, and steepest-descent steps are taken. On the other hand, if the error increases, we would like to follow the gradient more, and so the damping parameter λ is increased by the same factor.
Figure 4.2: The power-exponential function µ(a, b, xi)
and Σ_i (y_i − µ(a, b, x_i))² = 0.00060611. If the error has decreased as a result of the update, the step is accepted and the damping parameter λ is decreased by a factor of 10; otherwise the step is rejected and λ is increased by a factor of 10. A disadvantage of this algorithm is that if the damping parameter λ takes a large value, the computed Hessian matrix is not used at all.
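The role of λ as a blend between the two methods can be illustrated numerically: as λ → 0 the step approaches the Gauss-Newton step, and for large λ it approaches a short step along the steepest-descent direction. The 2×2 matrix and gradient below are hypothetical values chosen only for illustration:

```python
import numpy as np

H_gn = np.array([[4.0, 1.0], [1.0, 3.0]])   # stands in for J^T J
grad = np.array([1.0, 2.0])                 # stands in for J^T f

def lm_step(lam):
    """Step h = -(H_GN + lam*I)^{-1} grad, cf. Eq. (4.7)."""
    return -np.linalg.solve(H_gn + lam * np.eye(2), grad)

h_gn = -np.linalg.solve(H_gn, grad)   # pure Gauss-Newton step (lam = 0)
h_small = lm_step(1e-8)               # nearly the Gauss-Newton step
h_large = lm_step(1e8)                # nearly -grad/lam: tiny steepest-descent step

# cosine between the large-damping step and the steepest-descent direction
cos_sd = (h_large @ -grad) / (np.linalg.norm(h_large) * np.linalg.norm(grad))
```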
Chapter 5
This chapter presents a detailed analysis of two different data sets using the two non-linear least-squares curve-fitting methods. We then compare the two methods on each data set separately, to observe the differences between the methods and determine which gives the best results.
Mortality (death-rate) data record numbers of deaths by place, time and cause. That mortality increases strongly with age is anything but surprising, but it is worthwhile to look more closely at how mortality changes with age; there are patterns that might still surprise a little. Death risk is a simple way to relate mortality figures to the population [15].
We use data on the death rate for men in the USA between 1995 and 2004. Since it is very unlikely that people die at a young age, the structure of the data is easier to see when we look at the logarithm of the values instead of the values themselves [16].
Figure 5.1: Death risk by age for men in the USA between 1995 and 2004.
where the function µ gives the death rate and the variable x denotes age in each year. In this scenario a mathematical model is arranged from experimental data points: a smooth curve given by a theoretical equation is fitted to the data. By solving the system of non-linear equations, we obtain the best estimates of the parameters c1, c2, a1, a2, a3 of the function µ in the theoretical model (5.1). We can then plot this function along with the data points and see how well the data points fit the theoretical equation.
Here we examine the two-term power-exponential formula for death risk, first using the Gauss-Newton algorithm to fit the mortality data on the death rate in the USA between 1995 and 2004 to the theoretical model (5.1). We expect the experimental data to follow the theoretical model (5.1) closely. To apply the Gauss-Newton method, we must first calculate the partial derivatives for the Jacobian and then calculate the approximate Hessian matrix. Here, f_i is given by the equation:
fi = yi − µ(xi ; c1 , c2 , a1 , a2 , a3 ) (5.2)
Thus the entries of the Jacobian matrix are given by the partial derivatives of the model µ (those of f_i = y_i − µ differ only by a sign):

    ∂µ/∂c1 = e^(c2·xi) / xi
    ∂µ/∂c2 = c1 · e^(c2·xi)
    ∂µ/∂a1 = −e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3
    ∂µ/∂a2 = −e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3 · a3 · (a2·xi − 1) / a2
    ∂µ/∂a3 = e^(a3−a1) · (a2 · xi · e^(−a2·xi))^a3 · (ln(a2 · xi · e^(−a2·xi)) + 1)
where x_i is the age as mentioned before, y_i is the mortality at that age, and c1, c2, a1, a2, a3 describe the initial mortality and the rates of death. Once the partial derivatives for the Jacobian have been calculated, we can proceed with the Gauss-Newton algorithm and compute the residuals for the years 1995 to 2004 to see the differences in the graph. The code used for this procedure can be found in Appendix A.2.
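The Jacobian entries above can be collected into a small routine and verified against finite differences. This sketch assumes the two-term model µ = c1·e^(c2·x)/x + e^(a3−a1)·(a2·x·e^(−a2·x))^a3 implied by the derivatives; the parameter values in the test are the initial values from Appendix A.2:

```python
import numpy as np

def mu(p, x):
    """Two-term power-exponential mortality model (an assumption of this
    sketch): c1*exp(c2*x)/x + exp(a3 - a1)*(a2*x*exp(-a2*x))**a3."""
    c1, c2, a1, a2, a3 = p
    u = a2 * x * np.exp(-a2 * x)
    return c1 * np.exp(c2 * x) / x + np.exp(a3 - a1) * u ** a3

def jacobian(p, x):
    """Analytic Jacobian of mu w.r.t. (c1, c2, a1, a2, a3)."""
    c1, c2, a1, a2, a3 = p
    u = a2 * x * np.exp(-a2 * x)
    t = np.exp(a3 - a1) * u ** a3
    return np.column_stack([
        np.exp(c2 * x) / x,           # d mu / d c1
        c1 * np.exp(c2 * x),          # d mu / d c2 (the 1/x and x cancel)
        -t,                           # d mu / d a1
        -t * a3 * (a2 * x - 1) / a2,  # d mu / d a2
        t * (np.log(u) + 1.0),        # d mu / d a3
    ])
```

Checking an analytic Jacobian against central finite differences is a cheap safeguard before running either GN or LM.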
    f_i = (8.9810, 7.3979, 4.5882, 4.5544, 3.4147, 1.7581, 2.5582, 2.5465, 2.6603, 2.6603)^T · 10^(−8)
The implementation of the Gauss-Newton method executes a line search in the search direction h_GN, satisfying the step-length condition (3.18) discussed in Chapter 3.
(a) year 1995. (b) year 1998.
Figure 5.2: Death risk for men in the USA for selected years between 1995 and 2004 using the Gauss-Newton algorithm, where x denotes age and y death risk.
We can see from Figure 5.2 that the curve fits the data quite well. We will now try the more
sophisticated L-M method and see if we can get a better fit.
The LM algorithm requires an initial guess for the parameters to be estimated. We chose initial values for each of the variables c1, c2, a1, a2 and a3 for the death-rate data of the USA between 1995 and 2004.
(a) year 1995. (b) year 1998.
Figure 5.3: Death risk for men in the USA for selected years between 1995 and 2004 using the Levenberg-Marquardt algorithm, where x denotes age and y death risk.
Here we notice that the fits from the LM algorithm in Figure 5.3 are good, but not quite as good as those from the Gauss-Newton algorithm.
Table 5.1: The residual least-squares results (×10^(−8)) using µ(c1, c2, a1, a2, a3; x).
Table 5.1 shows the final error residuals, i.e. the sums of squared errors, for the GN and LM algorithms on the mortality data. The Levenberg-Marquardt method obtained a good fit for the year 1995, while the Gauss-Newton method obtained a good fit for the year 2000. So we can say that the GN algorithm fits well in some years, and the LM algorithm in others.
Here we will fit a model based on power-exponential functions to measured data for rocket-triggered lightning return strokes. This model could then be used to calculate electric and magnetic fields using techniques similar to those in [17].
Figure 5.4: Equipment for measuring rocket-triggered return strokes; image originally appeared in [17].
Here we study a different linear combination of power-exponential functions than the model used previously:
µ(a, b, c, x) = (p − a) · (x · e(1−x) )b + a · (x · e(1−x) )c (5.3)
where the mathematical model µ describes the rocket-triggered return strokes 8725, 8726 and 8705, with a different fixed value of the peak p for each, as can be seen in Table 5.2. The variable x is time, rescaled so that the peak occurs at x = 1, and a, b and c are the three parameters in the power-exponential expression above, for which initial guesses are also required. As a final application of non-linear least squares for curve fitting, we find the best fit of this function, Eq. (5.3), to the data sets; the code is in Appendix A.3. As always, we must first derive, analytically, the partial derivatives required for the Jacobian. The function to be minimized, f_i, is given by:
fi = yi − µ(xi ; a, b, c) (5.4)
where f_i is the residual at a particular point of the current waveform of the rocket-triggered stroke, y_i is the measured electric field, x_i is time as mentioned before, and the parameters a, b, c take the values of the initial guess. Since the Gauss-Newton and Levenberg-Marquardt methods require the Jacobian of f_i, the three partial derivatives with respect to the three variables a, b and c must be calculated and entered into the function df in the MATLAB code (see Appendix A.3). The three partial derivatives, used for each of the strokes 8725, 8726 and 8705 with their respective peak values p, are:
    ∂µ/∂a = −(x · e^(1−x))^b + (x · e^(1−x))^c
    ∂µ/∂b = (p − a) · (ln(x) − x + 1) · (x · e^(1−x))^b
    ∂µ/∂c = a · (ln(x) − x + 1) · (x · e^(1−x))^c

(these are the partial derivatives of the model µ; those of f_i = y_i − µ differ only by a sign, and ln(x) − x + 1 = ln(x · e^(1−x)))
Using these three partial derivatives, the Jacobian of f_i and its transpose can be calculated, allowing us to apply both the Gauss-Newton and the Levenberg-Marquardt algorithms. Once the three variables a, b, c in the function have been determined, we can plot the mathematical model along with the measured experimental data points to see whether the formula is actually a good fit to the data, for both the GN and LM algorithms. The graphs below show the plots of the mathematical model, Eq. (5.3), together with the collected data points.
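These partial derivatives can likewise be checked against finite differences before fitting (a minimal sketch; the parameter values, peak p and time grid used in the test are hypothetical):

```python
import numpy as np

def mu(a, b, c, p, x):
    """Stroke model (5.3): (p - a)*(x*exp(1-x))**b + a*(x*exp(1-x))**c."""
    u = x * np.exp(1.0 - x)
    return (p - a) * u ** b + a * u ** c

def dmu(a, b, c, p, x):
    """Analytic partials of mu w.r.t. a, b, c (the Jacobian columns;
    the partials of f_i = y_i - mu differ only by sign)."""
    u = x * np.exp(1.0 - x)
    lu = np.log(x) - x + 1.0          # = log(x*exp(1-x))
    return np.column_stack([-u ** b + u ** c,
                            (p - a) * lu * u ** b,
                            a * lu * u ** c])
```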
(a) Result for stroke 8725 using the Gauss- (b) Result for stroke 8725 using the Levenberg-
Newton algorithm. Marquardt algorithm.
(c) Result for stroke 8726 using the Gauss- (d) Result for stroke 8726 using the Levenberg-
Newton algorithm. Marquardt algorithm.
(e) Result for Stroke 8705 using the Gauss- (f) Result for stroke 8705 using the Levenberg-
Newton algorithm. Marquardt algorithm.
Figure 5.5: Comparison of results for fitting rocket-triggered lightning return strokes data
using the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm
In the plots shown above, we can see that the power-exponential model is actually a rather good approximation of the collected data. Moreover, a good way to test whether the fit is truly good is to look at the sum of the squares of the residuals.

Table 5.2: The residual least-squares results for each rocket-triggered stroke using the formula f_i = y_i − µ(x_i; a, b, c).

As can be seen in Table 5.2, the sum of the squared residuals has been reduced after implementing the Gauss-Newton and Levenberg-Marquardt algorithms for each stroke, and we can also see a large difference in the least-squares errors between them. Stroke 8705 obtained the best fit of the rocket-triggered data sets for both algorithms, while stroke 8725 was the worst fit obtained with the GN algorithm and stroke 8726 the worst with the LM algorithm. According to the figures, the Levenberg-Marquardt fits are better than the Gauss-Newton fits; for this problem the Levenberg-Marquardt method proved the more reliable of the two.
Chapter 6
The results from this thesis indicate that non-linear least-squares approximation is a useful tool for analyzing sets of data and finding the parameters that give the best fit for a model.
Two different data sets have been compared using the Gauss-Newton and Levenberg-Marquardt methods to see which method is more effective on these data sets. In the first experiment we applied both methods to mortality-rate data; the final residual least-squares results are shown in Table 5.1. For the year 2000 the Gauss-Newton algorithm gave the better curve fit, while for 1995 the Levenberg-Marquardt algorithm fit better.
In the second part of the experiment we ran tests using rocket-triggered lightning return-stroke data, with a different power-exponential fit for each of the strokes 8725, 8726 and 8705. Stroke 8705 showed the best curve fit, while stroke 8726 gave the worst curve-fitting errors. Figure 5.5 illustrates the different curve-fitting results for both methods. To summarize, Levenberg-Marquardt showed much better results when fitting the power-exponential functions for each stroke.
We can conclude from these experiments that the efficiency of the methods depends on the input data set used.
In this thesis we compared two numerical methods for solving non-linear least-squares curve fitting. For future research, different methods could be explored, such as Broyden's method, the hybrid LM-quasi-Newton method, and Powell's dog-leg method. Furthermore, more example applications of non-linear least-squares fitting algorithms could be explored. Lastly, the data sets presented here could be studied and analyzed with different models than the ones used in this thesis, which might extract additional information from the modelled data. If it were possible to predict which method will work best for a data set without trying and comparing the results of different methods, that would also be a useful result.
Chapter 7
A summary of the objectives accomplished in this thesis will be presented in this chapter.
The theoretical description of the methods for non-linear least-squares fitting presented in this thesis is collected from many sources, and during the writing of the thesis I encountered several other methods that there was not enough time to discuss. I also did not have time to fully implement and test all the methods that I described in applications, so I chose to focus on two of the methods in order to understand them properly.
I have learned a lot about the methodology used in the theoretical description of mathematics and how to analyze and present results using a MATLAB program. One of the major challenges when writing mathematics is to be specific and rigorous without obscuring the intuitive aspects of the examined ideas and methods. I feel that I have developed a lot in this regard. I have also developed my programming skills a great deal by implementing the Gauss-Newton and Levenberg-Marquardt methods. Discussions with, and advice from, my supervisor were also very valuable in this regard.
7.3 Objective 3: Critically and Systematically Integrate Knowl-
edge
While working on the thesis I have found many sources apart from those suggested by my supervisors. I also found that different sources and ways of describing the same thing were useful in different parts of the thesis. The sources with the clearest description of the theory were not always the most helpful ones for implementation, and vice versa.
While working on the project I made the majority of the choices related to the theoretical content of the thesis. The applied part was based on suggestions by my supervisors, but the details of how to reach the results were mostly worked out independently. Discussing and refining my ideas with my supervisor Karl Lundengård was very valuable and greatly improved the quality of the final result.
The theory in this thesis consists of previously known results collected from many different sources and should be accessible to any reader familiar with linear algebra and calculus. The complexity of the topics shows mostly in the application of the methods. I have provided figures and tables to make the comparisons easier to understand, and several aspects of the results are discussed. I learned a lot about how to use a computer to present results well using figures. The algorithms are presented in a way that is independent of any particular programming language or software, so the reader is not required to be familiar with MATLAB or other software for scientific computation.
The original part of this thesis is not the algorithms used or the mathematical theory presented; rather, it is the models and data, which have not been analyzed previously. The analysis was suggested by my supervisor Karl Lundengård, who also supported me during the application and presentation of the results. When implementing the algorithms I examined implementations made by others, but all code in the appendix was written by myself, though on several occasions my supervisor's help with debugging and refinement was very valuable. I have also made sure to give clear references to the sources that I found most useful or most important.
Bibliography
[1] Alfonso C., Lindsey P., Winnie R. Solving nonlinear least squares problems with Gauss-Newton and Levenberg-Marquardt methods. Department of Mathematics, Louisiana State University, Baton Rouge, LA, and Department of Mathematics, University of Mississippi, Oxford, MS, July 6, 2012.
[2] Henri P. Gavin. The Levenberg-Marquardt method for nonlinear least squares curve-
fitting problems. Department of Civil and Environmental Engineering, Duke University,
May 4, 2016.
[3] Åke Björck and Germund Dahlquist. Numerical Methods in Scientific Computing, Volume II. Linköping University and Royal Institute of Technology, 557–580, April 10, 2008.
[4] K. Madsen, H.B. Nielsen, O. Tingleff, Methods for non-linear least squares problems,
Informatics and Mathematical Modelling Technical University of Denmark, April 2004.
[5] Marquardt, Donald W. "An algorithm for least-squares estimation of nonlinear parameters." Journal of the Society for Industrial and Applied Mathematics 11.2: 431–441 (1963).
[6] Levenberg, Kenneth. "A Method for the Solution of Certain Non-Linear Problems in Least Squares." The Quarterly of Applied Mathematics 2: 164–168 (1944).
[7] Hansen, Per Christian, Godela Scherer and V. Pereyra. Least Squares Data Fitting with
Applications, Johns Hopkins University Press (2013).
[8] Pradit Mittrapiyanuruk, A Memo on How to Use the Levenberg-Marquardt Algorithm for
Refining Camera Calibration Parameters, Robot Vision Laboratory, Purdue University,
West Lafayette, IN, USA. Oct 24, 2014.
[10] Raphael Hauser. Line Search Methods for Unconstrained Optimisation, Numerical Linear Algebra and Optimisation, Oxford University Computing Laboratory, May 2007.
[11] Shidong Shan, A Levenberg-Marquardt Method For Large-Scale Bound-Constrained
Nonlinear Least-Squares, Acadia University, July 2008.
[12] Niclas Börlin. Trust-Region and the Levenberg-Marquardt method, 5DA001 Non-linear
Optimization. Department of Computing Science Umeå University, November 22, 2007
[16] Human Mortality Database. University of California, Berkeley (USA), and Max Planck
Institute for Demographic Research (Germany). Available at
http://www.mortality.org or
http://www.humanmortality.de (2017-06-14)
[17] J. C. Willett, J. C. Bailey, V. P. Idone, A. Eybert-Berard and L. Barret. "Submicrosecond Intercomparison of Radiation Fields and Currents in Triggered Lightning Return Strokes Based on the Transmission-Line Model", Journal of Geophysical Research, 13,275–13,286 (1989).
[18] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. Numerical Recipes
in C. Cambridge University Press, New York (1988).
[19] Lawson, C. L., and Hanson, R. Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice-Hall (1974).
[20] S. Gratton, A. S. Lawless and N. K. Nichols. Approximate Gauss-Newton methods for nonlinear least squares problems, Department of Mathematics, The University of Reading, Berkshire RG6 6AX, UK (2004).
[21] Jim Lambers. Positive and Negative Definite Matrices and Optimization. Lecture 3 Notes
MAT 419/519, Summer Session (2011)
Appendix A
MATLAB Code
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing the Gauss-Newton algorithm      %
% -------------------------------------------- %
% Generate the synthetic data from the curve
% function with some additional noise
x = (0.1:0.1:0.9)';
a = 0.53;
b = 0.8137;
rng(10);
fun = a*(x.*exp(1-x)).^b;
noise = randn(size(x));
y = fun + 0.01*noise;
% The main code for the GN algorithm for
% estimating a and b from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values; I choose them
% different from the expected result so that we can see
% that the method works
a0 = 2;
b0 = 1;
y_init = a0*(x.*exp(1-x)).^b0;
Ndata = length(y);
Nparams = 2;   % a and b are the parameters to be estimated
n_itirs = 15;
a_est = a0;
b_est = b0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: solve the normal equations
    % Evaluate the Jacobian matrix at the
    % current parameters (a_est, b_est)
    J = zeros(Ndata, Nparams);
    for i = 1:length(x)
        J(i,:) = [(x(i)*exp(1-x(i)))^b_est, ...
                  a_est*(x(i)*exp(1-x(i)))^b_est*log(x(i)*exp(1-x(i)))];
    end
    % Compute the approximated Hessian matrix;
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the residuals at the current parameters
    y_est = a_est*(x.*exp(1-x)).^b_est;
    d = y - y_est;
    % On the first iteration: compute the total error
    if it == 1
        e = dot(d, d);
    end
    % Compute the GN direction; since J is the Jacobian of the
    % model (not of the residual y - model) there is no minus sign
    dp = H\(J'*d(:));
    a_k = 1;  % initial step length for the backtracking
    for i = 1:15
        % Compute the updated parameters
        a_gn = a_est + a_k*dp(1);
        b_gn = b_est + a_k*dp(2);
        % Evaluate the total error at the updated parameters
        y_est_gn = a_gn*(x.*exp(1-x)).^b_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn, d_gn);
        % If the total error of the updated parameters is less
        % than the previous one, make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            a_est = a_gn;
            b_est = b_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minimum')
        break
    end
end
plot(x, y, 'r*');
hold on
plot(x, y_init, 'g');
plot(x, y_est, 'b');
plot(x, y_est_gn, 'ko');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points', 'With a0, b0', 'With fitted a, b', 'GN points')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing the Levenberg-Marquardt         %
% algorithm                                    %
% -------------------------------------------- %
% Generate the synthetic data from the curve
% function with some additional noise
x = (0.1:0.1:0.9)';
a = 0.53;
b = 0.8137;
rng(10);
fun = a*(x.*exp(1-x)).^b;
noise = randn(size(x));
y = fun + 0.01*noise;
% The main code for the LM algorithm for
% estimating a and b from the above data
% Initial guess for the parameters
a0 = 2;
b0 = 1;
y_init = a0*(x.*exp(1-x)).^b0;
Ndata = length(y);
Nparams = 2;   % a and b are the parameters to be estimated
n_itirs = 15;  % set the number of iterations for the LM
updateJ = 1;
a_est = a0;
b_est = b0;
lamda = 0.01;  % initial value of the damping factor for the LM
               % (kept across the outer iterations)
% Step 1: repeat until convergence
for it = 1:n_itirs
    if updateJ == 1
        % Evaluate the Jacobian matrix at the
        % current parameters (a_est, b_est)
        J = zeros(Ndata, Nparams);
        for i = 1:length(x)
            J(i,:) = [(x(i)*exp(1-x(i)))^b_est, ...
                      a_est*(x(i)*exp(1-x(i)))^b_est*log(x(i)*exp(1-x(i)))];
        end
        % Compute the approximated Hessian matrix;
        % J' is the transpose of J
        H = J'*J;
        % Evaluate the distance error at the current parameters
        y_est = a_est*(x.*exp(1-x)).^b_est;
        d = y - y_est;
        % On the first iteration: compute the total error
        if it == 1
            e = dot(d, d);
        end
    end
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams, Nparams);
        % Compute the updated parameters
        h_lm = H_LM\(J'*d(:));
        a_lm = a_est + h_lm(1);
        b_lm = b_est + h_lm(2);
        % Evaluate the total distance error at
        % the updated parameters
        y_est_lm = a_lm*(x.*exp(1-x)).^b_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm, d_lm);
        % If the total distance error of the updated parameters
        % is less than the previous one, make the updated
        % parameters the current parameters and decrease
        % the value of the damping factor
        if e_lm < e
            lamda = lamda/10;
            a_est = a_lm;
            b_est = b_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minimum')
        break
    end
end
plot(x, y, 'r*');
hold on
plot(x, y_init, 'g');
plot(x, y_est, 'b');
plot(x, y_est_lm, 'ko');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points', 'With a0, b0', 'With fitted a, b', 'LM points')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Gauss-Newton algorithms         %
% -------------------------------------------- %
% Load the death rate data
load('test_data_usa');
c1 = 0.005;
c2 = 0.0087;
a1 = 7;
a2 = 1/20;
a3 = 20;
years = 1:40;
x = ages(years)';
y = test_data_usa(years,1); % Note: test other columns
% The main code for the GN algorithm for
% estimating the parameters from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
c10 = 0.0008; %0.04;
c20 = 0.111;  %0.095;
a10 = 8.9;
a20 = 1/17;
a30 = 30;
y_init = c10*exp(c20*x)./x ...
       + exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
Ndata = length(y);
Nparams = 5;
n_itirs = 100;
figure(1)
plot(x,y,'b',x,y_init,'r')
xlim([0 30])
%%
updateJ = 1;
c1_est = c10;
c2_est = c20;
a1_est = a10;
a2_est = a20;
a3_est = a30;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x)
        J(i,:) = [exp(c2_est*x(i))./x(i), ...
                  c1_est*exp(c2_est*x(i)), ...
                  -exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est, ...
                  -a3_est*exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(a2_est*x(i)-1)./a2_est, ...
                  exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(log(a2_est*x(i)*exp(-a2_est*x(i)))+1)];
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the model at the current parameters
    y_est = c1_est*exp(c2_est*x)./x ...
          + exp(a3_est-a1_est)*(a2_est*x.*exp(-a2_est*x)).^a3_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    a_k = 1; % initial step length
    for i = 1:15
        % Compute the updated parameters
        dp = inv(H)*(J'*d(:)); % According to the report there should be a
        % minus sign here, but then the results are all wrong.
        % Double-check this!
        c1_gn = c1_est + a_k*dp(1);
        c2_gn = c2_est + a_k*dp(2);
        a1_gn = a1_est + a_k*dp(3);
        a2_gn = a2_est + a_k*dp(4);
        a3_gn = a3_est + a_k*dp(5);
        % Evaluate the total distance error at the updated parameters
        y_est_gn = c1_gn*exp(c2_gn*x)./x ...
                 + exp(a3_gn-a1_gn)*(a2_gn*x.*exp(-a2_gn*x)).^a3_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn,d_gn);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            c1_est = c1_gn;
            c2_est = c2_gn;
            a1_est = a1_gn;
            a2_est = a2_gn;
            a3_est = a3_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
    if i == 15
        disp('cannot find better values, possible local minima')
        break
    end
end
figure(2)
plot(x,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x,y_est,'b');
plot(x,y_est_gn,'ko');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points','With c10, c20, a10, a20, a30', ...
       'With fitted c1, c2, a1, a2, a3')
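The Gauss-Newton step-halving pattern used above (solve the normal equations, then shrink the step length until the error decreases) can be sketched in Python on a simpler hypothetical model c1*exp(c2*x); all names, data and starting values below are illustrative and do not come from the thesis.

```python
import numpy as np

def gn_fit(x, y, c1, c2, n_iters=50):
    """Gauss-Newton with step halving for y ~ c1*exp(c2*x) (sketch)."""
    e = np.sum((y - c1 * np.exp(c2 * x)) ** 2)
    for _ in range(n_iters):
        f = c1 * np.exp(c2 * x)
        # Jacobian columns: d/dc1 and d/dc2 of the model
        J = np.column_stack([np.exp(c2 * x), c1 * x * np.exp(c2 * x)])
        d = y - f
        dp = np.linalg.solve(J.T @ J, J.T @ d)   # Gauss-Newton direction
        a_k = 1.0                                # try the full step first
        improved = False
        for _ in range(15):
            c1_new = c1 + a_k * dp[0]
            c2_new = c2 + a_k * dp[1]
            e_new = np.sum((y - c1_new * np.exp(c2_new * x)) ** 2)
            if e_new < e:                        # enough descent: accept
                c1, c2, e = c1_new, c2_new, e_new
                improved = True
                break
            a_k /= 2                             # otherwise halve the step
        if not improved:
            break                                # give up: no descent found
    return c1, c2, e

# hypothetical noiseless data generated with c1 = 0.5, c2 = 0.8
x = np.linspace(0.0, 4.0, 40)
y = 0.5 * np.exp(0.8 * x)
c1, c2, e = gn_fit(x, y, c1=1.0, c2=0.5)
```

Because the Gauss-Newton direction is a descent direction for the least-squares objective, repeated halving always finds an improving step away from a stationary point.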
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Levenberg-Marquardt algorithms  %
% -------------------------------------------- %
% Load the death rate data
load('test_data_usa');
c1 = 0.005;
c2 = 0.0087;
a1 = 7;
a2 = 1/20;
a3 = 20;
years = 1:40;
x = ages(years)';
y = test_data_usa(years,8);
% The main code for the LM algorithm for
% estimating the parameters from the above data
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
c10 = 0.0005; %0.04;
c20 = 0.115;  %0.095;
a10 = 8.5;
a20 = 1/20;
a30 = 30;
y_init = c10*exp(c20*x)./x + exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
%y_init = exp(a30-a10)*(a20*x.*exp(-a20*x)).^a30;
Ndata = length(y);
Nparams = 5;
n_itirs = 100;
figure(1)
plot(x,y,'b',x,y_init,'r')
xlim([0 30])
%%
updateJ = 1;
c1_est = c10;
c2_est = c20;
a1_est = a10;
a2_est = a20;
a3_est = a30;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x)
        J(i,:) = [exp(c2_est*x(i))./x(i), ...
                  c1_est*exp(c2_est*x(i)), ...
                  -exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est, ...
                  -a3_est*exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(a2_est*x(i)-1)./a2_est, ...
                  exp(a3_est-a1_est)*(a2_est*x(i).*exp(-a2_est*x(i))).^a3_est*(log(a2_est*x(i)*exp(-a2_est*x(i)))+1)];
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % Evaluate the model at the current parameters
    y_est = c1_est*exp(c2_est*x)./x ...
          + exp(a3_est-a1_est)*(a2_est*x.*exp(-a2_est*x)).^a3_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    lamda = 0.1; % set an initial value of the damping factor for the LM
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams,Nparams);
        % Compute the updated parameters
        h_lm = inv(H_LM)*(J'*d(:));
        c1_lm = c1_est + h_lm(1);
        c2_lm = c2_est + h_lm(2);
        a1_lm = a1_est + h_lm(3);
        a2_lm = a2_est + h_lm(4);
        a3_lm = a3_est + h_lm(5);
        % Evaluate the total distance error at the updated parameters
        y_est_lm = c1_lm*exp(c2_lm*x)./x ...
                 + exp(a3_lm-a1_lm)*(a2_lm*x.*exp(-a2_lm*x)).^a3_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm,d_lm);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters and decrease the damping factor
        if e_lm < e
            lamda = lamda/10;
            c1_est = c1_lm;
            c2_est = c2_lm;
            a1_est = a1_lm;
            a2_est = a2_lm;
            a3_est = a3_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better')
        break
    end
end
figure(2)
plot(x,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x,y_est,'b');
plot(x,y_est_lm,'ko');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points','With c10, c20, a10, a20, a30', ...
       'With fitted c1, c2, a1, a2, a3')
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Gauss-Newton algorithms         %
% -------------------------------------------- %
% Load the measured return stroke data
load('Rockettrig.mat');
x = Stroke8725(1:40,1);
%x = Stroke8726(1:40,1);
%x = Stroke8705(1:40,1);
x_1 = x/x(end);
y = Stroke8725(1:40,2);
%y = Stroke8726(1:40,2);
%y = Stroke8705(1:40,2);
% The main code for the GN algorithm
% for estimating a, b and c from the above data
% Following algorithm 2.2
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
a0 = 6; %18
b0 = 4;
c0 = 10;
% For stroke (8725) with peak = 20;
y_init = (20-a0)*(x_1.*exp(1-x_1)).^b0 ...
       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8726) with peak = 35.3;
%y_init = (35.3-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8705) with peak = 8;
%y_init = (8-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
Ndata = length(y);
Nparams = 3;
n_itirs = 15;
updateJ = 1;
a_est = a0;
b_est = b0;
c_est = c0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    % Step 1.1: Solve normal equations
    % Evaluate the Jacobian matrix at the current parameters
    J = zeros(Ndata,Nparams);
    for i = 1:length(x_1)
        J(i,:) = [-(x_1(i).*exp(1-x_1(i))).^b_est + (x_1(i).*exp(1-x_1(i))).^c_est, ...
                  (20-a_est)*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^b_est, ...
                  a_est*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^c_est];
        % For other strokes replace 20 by the corresponding peak value
    end
    % Compute the approximated Hessian matrix,
    % J' is the transpose of J
    H = J'*J;
    % For stroke (8725) with peak = 20
    y_est = (20-a_est)*(x_1.*exp(1-x_1)).^b_est ...
          + a_est*(x_1.*exp(1-x_1)).^c_est;
    % Stroke (8705)
    %y_est = (8-a_est)*(x_1.*exp(1-x_1)).^b_est ...
    %      + a_est*(x_1.*exp(1-x_1)).^c_est;
    d = y - y_est;
    % The first iteration: compute the total error
    if it == 1
        e = dot(d,d);
    end
    % Step 1.2: Choose step length a_k so that there is enough descent.
    % Here I start with a_k = 100 and make it smaller if we do not get a
    % better result.
    % If we have tried making it smaller many times we give up.
    a_k = 100;
    for i = 1:n_itirs
        % Compute the updated parameters
        dp = inv(H)*(J'*d(:));
        a_gn = a_est + a_k*dp(1);
        b_gn = b_est + a_k*dp(2);
        c_gn = c_est + a_k*dp(3);
        % For stroke (8725) with peak = 20
        y_est_gn = (20-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
                 + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        % For stroke (8726)
        %y_est_gn = (35.3-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
        %         + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        % For stroke (8705)
        %y_est_gn = (8-a_gn)*(x_1.*exp(1-x_1)).^b_gn ...
        %         + a_gn*(x_1.*exp(1-x_1)).^c_gn;
        d_gn = y - y_est_gn;
        e_gn = dot(d_gn,d_gn);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters; otherwise halve the step length
        if e_gn < e
            a_est = a_gn;
            b_est = b_gn;
            c_est = c_gn;
            e = e_gn;
            break
        else
            a_k = a_k/2;
        end
    end
end
a_gn
b_gn
c_gn
e_gn
plot(x_1,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x_1,y_est,'b');
plot(x_1,y_est_gn,'ko');
plot(Stroke8725(:,1)/x(end),Stroke8725(:,2),'c');
%plot(Stroke8726(:,1)/x(end),Stroke8726(:,2),'c');
%plot(Stroke8705(:,1)/x(end),Stroke8705(:,2),'c');
xlabel('x')
ylabel('y')
grid on
title('Use GN-algorithm')
legend('Data points','With a0, b0, c0', ...
       'With fitted a, b, c')
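A quick sanity check on the waveform model used above: because the time axis is normalized by x(end), the factor x*exp(1-x) equals 1 at the normalized peak x = 1, so the two-term model (peak-a)*(x*exp(1-x))^b + a*(x*exp(1-x))^c reproduces the prescribed peak value exactly there, for any a, b, c. A small Python sketch with hypothetical parameter values:

```python
import numpy as np

def two_term_model(x1, a, b, c, peak):
    """Two-term power-exponential waveform on normalized time x1."""
    u = x1 * np.exp(1 - x1)          # equals 1 at the normalized peak x1 = 1
    return (peak - a) * u**b + a * u**c

x1 = np.linspace(0.05, 1.0, 40)      # normalized time, peak at the last sample
y = two_term_model(x1, a=6.0, b=4.0, c=10.0, peak=20.0)
# the prescribed peak value is reproduced exactly at x1 = 1
print(y[-1])                          # → 20.0
```

This is why the peak (20, 35.3 or 8 depending on the stroke) appears as a fixed constant in the model and the Jacobian rather than as a fitted parameter.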
% -------------------------------------------- %
% Generate power exponential function by       %
% implementing Levenberg-Marquardt algorithms  %
% -------------------------------------------- %
% Load the measured return stroke data
load('Rockettrig.mat');
x = Stroke8725(1:40,1);
%x = Stroke8726(1:40,1);
%x = Stroke8705(1:40,1);
x_1 = x/x(end);
y = Stroke8725(1:40,2);
%y = Stroke8726(1:40,2);
%y = Stroke8705(1:40,2);
% The main code for the LM algorithm for
% estimating a, b and c from the above data
% Step 0: choose initial parameter values,
% chosen different from the expected result
% so that we can see that the method works
a0 = 6; %18;
b0 = 4;
c0 = 10;
% For stroke (8725) with peak = 20;
y_init = (20-a0)*(x_1.*exp(1-x_1)).^b0 ...
       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8726) with peak = 35.3;
%y_init = (35.3-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
% For stroke (8705) with peak = 8;
%y_init = (8-a0)*(x_1.*exp(1-x_1)).^b0 ...
%       + a0*(x_1.*exp(1-x_1)).^c0;
Ndata = length(y);
Nparams = 3;
n_itirs = 15;
updateJ = 1;
a_est = a0;
b_est = b0;
c_est = c0;
% Step 1: repeat until convergence
for it = 1:n_itirs
    if updateJ == 1
        % Evaluate the Jacobian matrix at the current parameters
        J = zeros(Ndata,Nparams);
        for i = 1:length(x_1)
            J(i,:) = [-(x_1(i).*exp(1-x_1(i))).^b_est + (x_1(i).*exp(1-x_1(i))).^c_est, ...
                      (20-a_est)*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^b_est, ...
                      a_est*(log(x_1(i))-x_1(i)+1)*(x_1(i).*exp(1-x_1(i))).^c_est];
        end
        % Compute the approximated Hessian matrix,
        % J' is the transpose of J
        H = J'*J;
        % For stroke (8725) with peak = 20
        y_est = (20-a_est)*(x_1.*exp(1-x_1)).^b_est ...
              + a_est*(x_1.*exp(1-x_1)).^c_est;
        % Stroke (8705)
        %y_est = (8-a_est)*(x_1.*exp(1-x_1)).^b_est ...
        %      + a_est*(x_1.*exp(1-x_1)).^c_est;
        % The first iteration: compute the total error
        d = y - y_est;
        if it == 1
            e = dot(d,d);
        end
    end
    lamda = 0.1; % set an initial value of the damping factor for the LM
    for i = 1:15
        % Apply the damping factor to the Hessian matrix
        H_LM = H + lamda*eye(Nparams,Nparams);
        % Compute the updated parameters
        h_lm = inv(H_LM)*(J'*d(:));
        a_lm = a_est + h_lm(1);
        b_lm = b_est + h_lm(2);
        c_lm = c_est + h_lm(3);
        % Evaluate the total distance error at the updated parameters
        % (for stroke (8725) with peak = 20)
        y_est_lm = (20-a_lm)*(x_1.*exp(1-x_1)).^b_lm ...
                 + a_lm*(x_1.*exp(1-x_1)).^c_lm;
        d_lm = y - y_est_lm;
        e_lm = dot(d_lm,d_lm);
        % If the total distance error of the updated parameters is less
        % than the previous one, then make the updated parameters the
        % current parameters and decrease the damping factor
        if e_lm < e
            lamda = lamda/10;
            a_est = a_lm;
            b_est = b_lm;
            c_est = c_lm;
            e = e_lm;
            updateJ = 1;
            break
        else
            updateJ = 0;
            lamda = lamda*10;
        end
    end
    if i == 15
        disp('cannot find better')
        break
    end
end
a_lm
b_lm
c_lm
e_lm
plot(x_1,y,'r*');
hold on
%plot(x,y_init,'g');
plot(x_1,y_est,'b');
plot(x_1,y_est_lm,'ko');
plot(Stroke8725(:,1)/x(end),Stroke8725(:,2),'c');
%plot(Stroke8726(:,1)/x(end),Stroke8726(:,2),'c');
%plot(Stroke8705(:,1)/x(end),Stroke8705(:,2),'c');
xlabel('x')
ylabel('y')
grid on
title('Use LM-algorithm')
legend('Data points','With a0, b0, c0', ...
       'With fitted a, b, c')
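All of the listings above form the update as inv(H)*(J'*d(:)). Solving the linear system directly (H\(J'*d(:)) in MATLAB, or a linear solver elsewhere) yields the same step more cheaply and with better numerical stability. A minimal Python illustration on a hypothetical random Jacobian and residual vector:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((40, 3))     # hypothetical Jacobian (40 data, 3 params)
d = rng.standard_normal(40)          # hypothetical residual vector

H = J.T @ J                          # approximated Hessian matrix
g = J.T @ d                          # right-hand side of the normal equations

step_inv = np.linalg.inv(H) @ g      # explicit inverse, as in the listings
step_solve = np.linalg.solve(H, g)   # preferred: solve the system directly

# both give the same Gauss-Newton step, but solve() avoids forming the inverse
print(np.allclose(step_inv, step_solve))   # → True
```

For the small 2, 3 and 5 parameter systems fitted here the difference is negligible, but the solver form is the safer habit when H is ill-conditioned.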