This chapter concentrates on the Gauss-Newton method for the estimation of unknown parameters in models described by a set of ordinary differential equations (ODEs).
6.1 FORMULATION OF THE PROBLEM
$$\frac{dx(t)}{dt} = f(x(t), u, k) \; ; \quad x(t_0) = x_0 \tag{6.1}$$
$$y(t) = Cx(t) \tag{6.2}$$
or more generally
$$y(t) = h(x(t), k) \tag{6.3}$$
where
$k = [k_1, k_2, \ldots, k_p]^T$ is a p-dimensional vector of parameters whose numerical values are unknown;
linear combinations of state variables) that are measured experimentally. Experimental data are available as measurements of the output vector as a function of time, i.e., $[\hat{y}_i, t_i]$, $i=1,\ldots,N$, where $\hat{y}_i$ denotes the measurement of the output vector at time $t_i$. These are to be matched to the values calculated by the model at the same time, $y(t_i)$, in some optimal fashion. Based on the statistical properties of the experimental error involved in the measurement of the output vector, we determine the weighting matrices $Q_i$ ($i=1,\ldots,N$) that should be used in the objective function to be minimized, as mentioned earlier in Chapter 2. The objective function is of the form,
$$S(k) = \sum_{i=1}^{N} \left[\hat{y}_i - y(t_i,k)\right]^T Q_i \left[\hat{y}_i - y(t_i,k)\right] \tag{6.4}$$
Minimization of S(k) can be accomplished by using almost any technique available from optimization theory; however, since each objective function evaluation requires the integration of the state equations, the use of quadratically convergent algorithms is highly recommended. The Gauss-Newton method is the most appropriate one for ODE models (Bard, 1970) and it is presented in detail below.
6.2
available at the jth iteration. Linearization of the output vector around $k^{(j)}$ and retaining first order terms yields
$$y(t,k^{(j+1)}) = y(t,k^{(j)}) + \left(\frac{\partial y^T}{\partial k}\right)^T \Delta k^{(j+1)} \tag{6.5}$$
Assuming a linear relationship between the output vector and the state variables (y = Cx), the above equation becomes
$$y(t,k^{(j+1)}) = Cx(t,k^{(j)}) + C\left(\frac{\partial x^T}{\partial k}\right)^T \Delta k^{(j+1)} \tag{6.6}$$
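In the case of ODE models, the sensitivity matrix $G(t) = (\partial x^T/\partial k)^T$ cannot be obtained by a simple differentiation. However, we can find a differential equation that G(t) satisfies and hence, the sensitivity matrix G(t) can be determined as a function of time by solving simultaneously with the state ODEs another set of differential equations. This set of ODEs is obtained by differentiating both sides of Equation 6.1 (the state equations) with respect to k, namely
$$\frac{\partial}{\partial k}\left(\frac{dx}{dt}\right) = \frac{\partial}{\partial k} f(x,u,k) \tag{6.7}$$
Reversing the order of differentiation on the left-hand side of Equation 6.7 and performing the implicit differentiation of the right-hand side, we obtain
$$\frac{d}{dt}\left(\frac{\partial x^T}{\partial k}\right)^T = \left(\frac{\partial f^T}{\partial x}\right)^T \left(\frac{\partial x^T}{\partial k}\right)^T + \left(\frac{\partial f^T}{\partial k}\right)^T \tag{6.8}$$
or better
$$\frac{dG(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T G(t) + \left(\frac{\partial f^T}{\partial k}\right)^T \tag{6.9}$$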
The initial condition $G(t_0)$ is obtained by differentiating the initial condition, $x(t_0)=x_0$, with respect to k and, since the initial state is independent of the parameters, we have:
$$G(t_0) = 0 \tag{6.10}$$
Copyright 2001 by Taylor & Francis Group, LLC
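As a small worked illustration of Equations 6.9 and 6.10, consider the scalar first-order decay model below. The model is an assumption chosen only because its sensitivity can be checked by hand; it is not one of the examples of this chapter.

```latex
% Assumed illustrative model: dx/dt = -k x, x(0) = x_0 (n = 1, p = 1)
\begin{align*}
f(x,k) &= -kx, \qquad \frac{\partial f}{\partial x} = -k, \qquad \frac{\partial f}{\partial k} = -x \\
\frac{dG}{dt} &= -k\,G(t) - x(t), \qquad G(0) = 0 \\
x(t) &= x_0 e^{-kt} \quad\Longrightarrow\quad G(t) = -t\,x_0 e^{-kt} \\
\text{check:}\quad \frac{dG}{dt} &= -x_0 e^{-kt} + k\,t\,x_0 e^{-kt} = -x(t) - k\,G(t)
\end{align*}
```

Here the analytical solution is available, so G(t) could also be found by differentiating x(t) directly with respect to k; for models without a closed-form solution only the differential equation for G(t) is available.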
Equation 6.9 is a matrix differential equation and represents a set of n×p ODEs. Once the sensitivity coefficients are obtained by solving numerically the above ODEs, the output vector, $y(t,k^{(j+1)})$, can be computed.
Substitution of the latter into the objective function and use of the stationary condition $\partial S(k^{(j+1)})/\partial k^{(j+1)} = 0$ yields a linear equation for $\Delta k^{(j+1)}$
$$A\,\Delta k^{(j+1)} = b \tag{6.11}$$
where
$$A = \sum_{i=1}^{N} G^T(t_i)\, C^T Q_i\, C\, G(t_i) \tag{6.12}$$
and
$$b = \sum_{i=1}^{N} G^T(t_i)\, C^T Q_i \left[\hat{y}_i - Cx(t_i,k^{(j)})\right] \tag{6.13}$$
Solution of the above equation yields $\Delta k^{(j+1)}$ and hence, $k^{(j+1)}$ is obtained from
$$k^{(j+1)} = k^{(j)} + \mu\,\Delta k^{(j+1)} \tag{6.14}$$
where $\mu$ is a stepping parameter ($0 < \mu \le 1$) to be determined by the bisection rule. The simple bisection rule is presented later in this chapter whereas optimal step-size determination procedures are presented in detail in Chapter 8.
In summary, at each iteration, given the current estimate of the parameters, $k^{(j)}$, we obtain x(t) and G(t) by integrating the state and sensitivity differential equations. Using these values we compute the model output, $y(t_i,k^{(j)})$, and the sensitivity coefficients, $G(t_i)$, for each data point $i=1,\ldots,N$, which are subsequently used to set up matrix A and vector b. Solution of the linear equation yields $\Delta k^{(j+1)}$ and hence $k^{(j+1)}$ is obtained. Thus, a sequence of parameter estimates is generated, $k^{(1)}, k^{(2)}, \ldots$, which often converges to the optimum, $k^*$, if the initial guess, $k^{(0)}$, is sufficiently close. The converged parameter values represent the Least Squares (LS), Weighted Least Squares (WLS) or Generalized Least Squares (GLS) estimates depending on the choice of the weighting matrices $Q_i$. Furthermore, if certain assumptions regarding the statistical distribution of the residuals hold, these parameter values could also be the Maximum Likelihood (ML) estimates.
6.2.1
1. Input the initial guess for the parameters, $k^{(0)}$, and NSIG.
2. [...]
3. Integrate state and sensitivity equations to obtain x(t) and G(t). At each sampling period, $t_i$, $i=1,\ldots,N$, compute $y(t_i,k^{(j)})$ and $G(t_i)$ to set up matrix A and vector b.
4. Solve the linear equation $A\,\Delta k^{(j+1)} = b$ and obtain $\Delta k^{(j+1)}$.
5. Obtain $k^{(j+1)} = k^{(j)} + \mu\,\Delta k^{(j+1)}$.
6. Continue until the maximum number of iterations is reached or convergence is achieved (i.e., the change $\Delta k^{(j+1)}$ becomes smaller than $10^{-NSIG}$).
7. [...]
The above method is the well-known Gauss-Newton method for differential equation systems and it exhibits quadratic convergence to the optimum.
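The iteration described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this book: the model is the hypothetical scalar decay dx/dt = -kx with y = x and Q = 1, the data are synthetic and noise-free, the integrator is a hand-written fixed-step Runge-Kutta routine, and the convergence test |Δk| ≤ 10^(-NSIG)|k| is one plausible reading of the NSIG criterion. All function names are illustrative.

```python
def rhs(z, k):
    """State equation dx/dt = -k*x and its sensitivity equation (Eq. 6.9)."""
    x, G = z
    return [-k * x, -k * G - x]        # dG/dt = (df/dx) G + df/dk

def rk4_step(z, h, k):
    """One classical fourth-order Runge-Kutta step (assumed integrator)."""
    s1 = rhs(z, k)
    s2 = rhs([a + 0.5 * h * b for a, b in zip(z, s1)], k)
    s3 = rhs([a + 0.5 * h * b for a, b in zip(z, s2)], k)
    s4 = rhs([a + h * b for a, b in zip(z, s3)], k)
    return [a + h / 6.0 * (p + 2 * q + 2 * r + s)
            for a, p, q, r, s in zip(z, s1, s2, s3, s4)]

def integrate(k, times, x0=1.0, substeps=50):
    """Return [(x(t_i), G(t_i))] at the sampling times, with G(t0) = 0."""
    t, z, out = 0.0, [x0, 0.0], []
    for ti in times:
        h = (ti - t) / substeps
        for _ in range(substeps):
            z = rk4_step(z, h, k)
        t = ti
        out.append((z[0], z[1]))
    return out

def objective(k, times, data):
    """Least squares objective S(k) of Eq. 6.4 with Q_i = 1."""
    return sum((y - x) ** 2 for y, (x, _) in zip(data, integrate(k, times)))

def gauss_newton(k, times, data, nsig=8, max_iter=50):
    for _ in range(max_iter):
        sol = integrate(k, times)
        A = sum(G * G for _, G in sol)                         # Eq. 6.12 (C = Q = 1)
        b = sum(G * (y - x) for y, (x, G) in zip(data, sol))   # Eq. 6.13
        dk = b / A                                             # Eq. 6.11
        S_old = sum((y - x) ** 2 for y, (x, _) in zip(data, sol))
        mu = 1.0
        while mu > 1e-10 and objective(k + mu * dk, times, data) > S_old:
            mu *= 0.5                                          # bisection rule
        k += mu * dk                                           # Eq. 6.14
        if abs(dk) <= 10.0 ** (-nsig) * abs(k):                # assumed NSIG test
            break
    return k

times = [1.0, 2.0, 3.0, 4.0, 5.0]
data = [x for x, _ in integrate(0.5, times)]   # synthetic data generated with k = 0.5
k_hat = gauss_newton(1.0, times, data)         # deliberately poor initial guess
print(k_hat)
```

With this initial guess the first full Gauss-Newton step overshoots and is halved once before the iteration settles, which is the situation the bisection rule is meant to handle.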
Computational modifications to the above algorithm for the incorporation of prior knowledge about the parameters (Bayesian estimation) are discussed in detail in Chapter 8.
6.2.2 Implementation Guidelines for ODE Models
If the dimensionality of the problem is not excessively high, simultaneous integration of the state and sensitivity equations is the easiest approach to implement the Gauss-Newton method without the need to store x(t) as a function of time. The latter is required in the evaluation of the Jacobians in Equation 6.9 during the solution of this differential equation to obtain G(t). Let us rewrite G(t) as
$$G(t) = \left(\frac{\partial x^T}{\partial k}\right)^T = \left[\frac{\partial x}{\partial k_1}, \frac{\partial x}{\partial k_2}, \ldots, \frac{\partial x}{\partial k_p}\right] = [g_1, g_2, \ldots, g_p] \tag{6.15}$$
In this case the n-dimensional vector $g_1$ represents the sensitivity coefficients of the state variables with respect to parameter $k_1$ and satisfies the following ODE,
$$\frac{dg_1(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_1(t) + \frac{\partial f}{\partial k_1} \; ; \quad g_1(t_0) = 0 \tag{6.16a}$$
Similarly, the n-dimensional vector $g_2$ represents the sensitivity coefficients of the state variables with respect to parameter $k_2$ and satisfies the following ODE,
$$\frac{dg_2(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_2(t) + \frac{\partial f}{\partial k_2} \; ; \quad g_2(t_0) = 0 \tag{6.16b}$$
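Finally, for the last parameter, $k_p$, we have the corresponding sensitivity vector $g_p$
$$\frac{dg_p(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_p(t) + \frac{\partial f}{\partial k_p} \; ; \quad g_p(t_0) = 0 \tag{6.16c}$$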
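Since most of the numerical differential equation solvers require the equations to be integrated to be of the form
$$\frac{dz}{dt} = \varphi(z) \; ; \quad z(t_0) = \text{given} \tag{6.17}$$
the state and sensitivity vectors are stacked into the n×(p+1)-dimensional vector
$$z = \begin{bmatrix} x(t) \\ \partial x/\partial k_1 \\ \partial x/\partial k_2 \\ \vdots \\ \partial x/\partial k_p \end{bmatrix} = \begin{bmatrix} x(t) \\ g_1(t) \\ g_2(t) \\ \vdots \\ g_p(t) \end{bmatrix} \tag{6.18}$$
with the corresponding right-hand side
$$\varphi(z) = \begin{bmatrix} f(x,u,k) \\ \left(\dfrac{\partial f^T}{\partial x}\right)^T g_1(t) + \dfrac{\partial f}{\partial k_1} \\ \left(\dfrac{\partial f^T}{\partial x}\right)^T g_2(t) + \dfrac{\partial f}{\partial k_2} \\ \vdots \\ \left(\dfrac{\partial f^T}{\partial x}\right)^T g_p(t) + \dfrac{\partial f}{\partial k_p} \end{bmatrix} \tag{6.19}$$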
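If the equation solver permits it, information can also be provided about the Jacobian of $\varphi(z)$, particularly when we are dealing with stiff differential equations,
$$\left(\frac{\partial \varphi^T}{\partial z}\right)^T = \begin{bmatrix} \left(\dfrac{\partial f^T}{\partial x}\right)^T & 0 & \cdots & 0 \\ * & \left(\dfrac{\partial f^T}{\partial x}\right)^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ * & 0 & \cdots & \left(\dfrac{\partial f^T}{\partial x}\right)^T \end{bmatrix} \tag{6.20}$$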
where the "*" in the first column represents terms that have second order derivatives of f with respect to x. In most practical situations these terms can be neglected and hence, this Jacobian can be considered as a block diagonal matrix as far as the ODE solver is concerned. This results in significant savings in terms of memory requirements and improves the robustness of the numerical integration.
In a typical implementation, the numerical integration routine is requested to provide z(t) at each sampling point, $t_i$, $i=1,\ldots,N$ and hence, $x(t_i)$ and $G(t_i)$ become available for the computation of $y(t_i,k^{(j)})$ as well as for adding the appropriate terms in matrix A and vector b.
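The stacking of Equations 6.18 and 6.19 lends itself to a small generic helper. The sketch below is one possible arrangement in plain Python; the function name `make_augmented_rhs`, the list-of-lists layout of the Jacobians and the two-state test model are all assumptions made for illustration.

```python
def make_augmented_rhs(f, dfdx, dfdk, n, p):
    """Build phi(z, k) for z = [x, g_1, ..., g_p] (Equations 6.18 and 6.19)."""
    def phi(z, k):
        x = z[:n]
        J = dfdx(x, k)      # J[i][l]  = d f_i / d x_l   (n x n)
        Fk = dfdk(x, k)     # Fk[i][j] = d f_i / d k_j   (n x p)
        out = list(f(x, k))                     # first n entries: dx/dt
        for j in range(p):                      # one block per parameter
            g = z[n * (j + 1): n * (j + 2)]     # sensitivity vector g_j
            for i in range(n):
                out.append(sum(J[i][l] * g[l] for l in range(n)) + Fk[i][j])
        return out
    return phi

# Hypothetical two-state model: consecutive first-order steps A -> B -> C.
def f(x, k):
    return [-k[0] * x[0], k[0] * x[0] - k[1] * x[1]]

def dfdx(x, k):
    return [[-k[0], 0.0], [k[0], -k[1]]]

def dfdk(x, k):
    return [[-x[0], 0.0], [x[0], -x[1]]]

phi = make_augmented_rhs(f, dfdx, dfdk, n=2, p=2)
z0 = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]     # x(t0) followed by G(t0) = 0
dz0 = phi(z0, [2.0, 1.0])
print(dz0)
```

The returned function can be handed to any solver that expects dz/dt = φ(z); at the initial point only the ∂f/∂k terms contribute to the sensitivity part because G(t0) = 0.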
In the case of ODE models, evaluation of the objective function, $S(k^{(j)}+\mu\Delta k^{(j+1)})$, for a particular value of $\mu$ implies the integration of the state equations. It should be emphasized here that it is unnecessary to integrate the state equations for the entire data length $[t_0, t_N]$ for each trial value of $\mu$. Once the objective function becomes greater than $S(k^{(j)})$, a smaller value of $\mu$ can be chosen. By this procedure, besides the savings in computation time, numerical instability is also avoided since the objective function becomes large quickly and the integration is often stopped well before computer overflow is threatened (Kalogerakis and Luus, 1983a).
The importance of using a good integration routine should also be emphasized. When $\Delta k^{(j+1)}$ is excessively large (severe overstepping) during the determination of an acceptable value for $\mu$, numerical instability may cause computer overflow well before we have a chance to compute the output vector at the first data point and compare the objective functions. In this case, the use of a good integration routine is of great importance to provide a message indicating that the tolerance requirements cannot be met. At that moment we can stop the integration, simply halve $\mu$ and start integration of the state equations again. Several restarts may be necessary before an acceptable value for $\mu$ is obtained.
Furthermore, when $k^{(j)}+\mu\Delta k^{(j+1)}$ is used at the next iteration as the current estimate, we do not anticipate any problems in the integration of both the state and sensitivity equations. This is simply due to the fact that the eigenvalues of the Jacobian of the sensitivity equations (inversely related to the governing time constants) are the same as those in the state equations where the integration was performed successfully. These considerations are of particular importance when the model is described by a set of stiff differential equations where the wide range of the prevailing time constants creates additional numerical difficulties that tend to shrink the region of convergence (Kalogerakis and Luus, 1983a).
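The early termination of the trial integrations can be sketched as follows. This is a hedged illustration only: the scalar model dx/dt = -kx, the fixed-step Runge-Kutta integrator, the synthetic data, the trial step `dk = -0.79` and the use of `OverflowError` as a stand-in for the integrator's failure message are all assumptions.

```python
import math

def rk4_state(x, h, k):
    """One RK4 step for the state equation dx/dt = -k*x only (no sensitivities)."""
    f = lambda v: -k * v
    s1 = f(x)
    s2 = f(x + 0.5 * h * s1)
    s3 = f(x + 0.5 * h * s2)
    s4 = f(x + h * s3)
    return x + h / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)

def trial_objective(k, times, data, S_limit, x0=1.0, substeps=50):
    """Accumulate S data point by data point; stop as soon as it exceeds S_limit.
    Returns (S, number of data points reached)."""
    t, x, S = 0.0, x0, 0.0
    for n, (ti, yi) in enumerate(zip(times, data), start=1):
        h = (ti - t) / substeps
        for _ in range(substeps):
            x = rk4_state(x, h, k)
        t = ti
        S += (yi - x) ** 2
        if S > S_limit:
            return S, n              # this trial value of mu is already rejected
    return S, len(times)

def bisection_rule(k, dk, times, data, max_halvings=30):
    """Halve mu, starting from 1, until S(k + mu*dk) does not exceed S(k)."""
    S_old, _ = trial_objective(k, times, data, float("inf"))
    mu, points_used = 1.0, []
    for _ in range(max_halvings):
        try:
            S_new, n = trial_objective(k + mu * dk, times, data, S_old)
        except OverflowError:        # treated like an integration failure
            S_new, n = float("inf"), 0
        points_used.append(n)
        if S_new <= S_old:
            return mu, points_used
        mu *= 0.5
    return mu, points_used

times = [1.0, 2.0, 3.0, 4.0, 5.0]
data = [math.exp(-0.5 * t) for t in times]     # synthetic, noise-free data
mu, points_used = bisection_rule(1.0, -0.79, times, data)
print(mu, points_used)
```

In this run the full step is abandoned after the third data point, so only part of the data length is integrated before the step is halved.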
6.3
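When the output vector (measured variables) is related to the state variables (and possibly to the parameters) through a nonlinear relationship of the form y(t) = h(x(t),k), we need to make some additional minor modifications. The sensitivity of the output vector to the parameters can be obtained by performing the implicit differentiation to yield:
$$\left(\frac{\partial y^T}{\partial k}\right)^T = \left(\frac{\partial h^T}{\partial x}\right)^T \left(\frac{\partial x^T}{\partial k}\right)^T + \left(\frac{\partial h^T}{\partial k}\right)^T$$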
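Substitution into the linearized output vector (Equation 6.5) yields
$$y(t,k^{(j+1)}) = h(x(t,k^{(j)})) + W(t)\,\Delta k^{(j+1)}$$
where
$$W(t) = \left(\frac{\partial h^T}{\partial x}\right)^T G(t) + \left(\frac{\partial h^T}{\partial k}\right)^T$$
and hence the linear equation for $\Delta k^{(j+1)}$ becomes
$$A\,\Delta k^{(j+1)} = b \tag{6.25}$$
where
$$A = \sum_{i=1}^{N} W^T(t_i)\, Q_i\, W(t_i) \tag{6.26}$$
and
$$b = \sum_{i=1}^{N} W^T(t_i)\, Q_i \left[\hat{y}_i - h(x(t_i,k^{(j)}))\right] \tag{6.27}$$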
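If the nonlinear output relationship does not depend explicitly on the parameters, i.e.,
$$y(t) = h(x(t)) \tag{6.28}$$
then W(t) reduces to
$$W(t_i) = \left(\frac{\partial h^T}{\partial x}\right)_i^T G(t_i) \tag{6.29}$$
and
$$A = \sum_{i=1}^{N} G^T(t_i) \left(\frac{\partial h^T}{\partial x}\right)_i Q_i \left(\frac{\partial h^T}{\partial x}\right)_i^T G(t_i) \tag{6.30}$$
$$b = \sum_{i=1}^{N} G^T(t_i) \left(\frac{\partial h^T}{\partial x}\right)_i Q_i \left[\hat{y}_i - h(x(t_i,k^{(j)}))\right] \tag{6.31}$$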
In other words, the observation matrix C from the case of a linear output relationship is substituted with the Jacobian matrix $(\partial h^T/\partial x)^T$ in setting up matrix A and vector b.
6.4 THE GAUSS-NEWTON METHOD - SYSTEMS WITH UNKNOWN INITIAL CONDITIONS
$$\frac{dx(t)}{dt} = f(x(t), u, k) \; ; \quad x(t_0) = x_0 \tag{6.32}$$
$$y(t) = h(x(t), k) \tag{6.33}$$
The only difference here is that it is further assumed that some or all of the components of the initial state vector $x_0$ are unknown. Let the q-dimensional vector p ($0 < q \le n$) denote the unknown components of the vector $x_0$. In this class of parameter estimation problems, the objective is to determine not only the parameter vector k but also the unknown vector p containing the unknown elements of the initial state vector $x(t_0)$. Again, we assume that experimental data are available as measurements of the output vector at various points in time, i.e., $[\hat{y}_i, t_i]$, $i=1,\ldots,N$. The objective function that should be minimized is the same as before. The only difference is that the minimization is carried out over k and p, namely the objective function is viewed as
$$S(k,p) = \sum_{i=1}^{N} \left[\hat{y}_i - y(t_i,k,p)\right]^T Q_i \left[\hat{y}_i - y(t_i,k,p)\right] \tag{6.34}$$
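Let us suppose that an estimate $k^{(j)}$ and $p^{(j)}$ of the unknown parameter and initial state vectors is available at the jth iteration. Linearization of the output vector around $k^{(j)}$ and $p^{(j)}$ yields,
$$y(t_i,k^{(j+1)},p^{(j+1)}) = y(t_i,k^{(j)},p^{(j)}) + \left(\frac{\partial y^T}{\partial k}\right)^T \Delta k^{(j+1)} + \left(\frac{\partial y^T}{\partial p}\right)^T \Delta p^{(j+1)} \tag{6.35}$$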
Assuming a linear output relationship (i.e., y(t) = Cx(t)), the above equation becomes
$$y(t_i,k^{(j+1)},p^{(j+1)}) = Cx(t_i,k^{(j)},p^{(j)}) + CG(t_i)\,\Delta k^{(j+1)} + CP(t_i)\,\Delta p^{(j+1)} \tag{6.36}$$
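where G(t) is the usual n×p parameter sensitivity matrix $(\partial x^T/\partial k)^T$ and P(t) is the n×q initial state sensitivity matrix $(\partial x^T/\partial p)^T$. The parameter sensitivity matrix G(t) can be obtained as shown in the previous section by solving the matrix differential equation,
$$\frac{dG(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T G(t) + \left(\frac{\partial f^T}{\partial k}\right)^T \tag{6.37}$$
with the initial condition,
$$G(t_0) = 0 \tag{6.38}$$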
Similar to the parameter sensitivity matrix, the initial state sensitivity matrix, P(t), cannot be obtained by a simple differentiation. P(t) is determined by solving a matrix differential equation that is obtained by differentiating both sides of Equation 6.1 (state equation) with respect to p. Reversing the order of differentiation and performing implicit differentiation on the right-hand side, we arrive at
$$\frac{d}{dt}\left(\frac{\partial x^T}{\partial p}\right)^T = \left(\frac{\partial f^T}{\partial x}\right)^T \left(\frac{\partial x^T}{\partial p}\right)^T \tag{6.39}$$
or better
$$\frac{dP(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T P(t) \tag{6.40}$$
The initial condition is obtained by differentiating both sides of the initial condition, $x(t_0)=x_0$, with respect to p, yielding
$$P(t_0) = \begin{bmatrix} I_{q \times q} \\ 0_{(n-q) \times q} \end{bmatrix} \tag{6.41}$$
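Without any loss of generality, it has been assumed that the unknown initial states correspond to state variables that are placed as the first elements of the state vector x(t); hence the structure of the initial condition in Equation 6.41. Thus, integrating the state and sensitivity equations (Equations 6.1, 6.9 and 6.40), a total of n×(p+q+1) differential equations, the output vector, $y(t,k^{(j+1)},p^{(j+1)})$, is obtained as a linear function of $k^{(j+1)}$ and $p^{(j+1)}$. Next, substitution of $y(t_i,k^{(j+1)},p^{(j+1)})$ into the objective function and use of the stationary criteria
$$\frac{\partial S}{\partial k^{(j+1)}} = 0 \tag{6.42a}$$
and
$$\frac{\partial S}{\partial p^{(j+1)}} = 0 \tag{6.42b}$$
yields the following linear equation
$$\begin{bmatrix} \sum\limits_{i=1}^{N} G^T(t_i)C^TQ_iCG(t_i) & \sum\limits_{i=1}^{N} G^T(t_i)C^TQ_iCP(t_i) \\ \sum\limits_{i=1}^{N} P^T(t_i)C^TQ_iCG(t_i) & \sum\limits_{i=1}^{N} P^T(t_i)C^TQ_iCP(t_i) \end{bmatrix} \begin{bmatrix} \Delta k^{(j+1)} \\ \Delta p^{(j+1)} \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^{N} G^T(t_i)C^TQ_i\left[\hat{y}_i - Cx(t_i,k^{(j)},p^{(j)})\right] \\ \sum\limits_{i=1}^{N} P^T(t_i)C^TQ_i\left[\hat{y}_i - Cx(t_i,k^{(j)},p^{(j)})\right] \end{bmatrix} \tag{6.43}$$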
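Solution of the above equation yields $\Delta k^{(j+1)}$ and $\Delta p^{(j+1)}$. The estimates $k^{(j+1)}$ and $p^{(j+1)}$ are obtained next as
$$\begin{bmatrix} k^{(j+1)} \\ p^{(j+1)} \end{bmatrix} = \begin{bmatrix} k^{(j)} \\ p^{(j)} \end{bmatrix} + \mu \begin{bmatrix} \Delta k^{(j+1)} \\ \Delta p^{(j+1)} \end{bmatrix} \tag{6.44}$$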
where a stepping parameter $\mu$ (to be determined by the bisection rule) is also used. If the initial guess $k^{(0)}$, $p^{(0)}$ is sufficiently close to the optimum, this procedure yields quadratic convergence to the optimum. However, the same difficulties as those discussed earlier arise whenever the initial estimates are far from the optimum.
If we consider the limiting case where p = 0 and q ≠ 0, i.e., the case where there are no unknown parameters and only some of the initial states are to be estimated, the previously outlined procedure represents a quadratically convergent method for the solution of two-point boundary value problems. Obviously in this case, we need to compute only the sensitivity matrix P(t). It can be shown that under these conditions the Gauss-Newton method is a typical quadratically convergent "shooting method." As such it can be used to solve optimal control problems using the Boundary Condition Iteration approach (Kalogerakis, 1983).
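The limiting case with no unknown parameters can be illustrated with a very small sketch. Everything in it is assumed for illustration: the scalar model dx/dt = -kx with known k, the single unknown initial state p = x(t0), synthetic noise-free data, C = Q = 1 and a fixed-step Runge-Kutta integrator. Because this model is linear in the initial state, a single Gauss-Newton step already recovers it.

```python
def rhs(z, k):
    """State equation dx/dt = -k*x and Eq. 6.40, dP/dt = (df/dx) P, with df/dx = -k."""
    x, P = z
    return [-k * x, -k * P]

def integrate(p, k, times, substeps=50):
    """Return [(x(t_i), P(t_i))]; x(t0) = p is the unknown, P(t0) = 1 (Eq. 6.41, n = q = 1)."""
    t, z, out = 0.0, [p, 1.0], []
    for ti in times:
        h = (ti - t) / substeps
        for _ in range(substeps):
            s1 = rhs(z, k)
            s2 = rhs([a + 0.5 * h * b for a, b in zip(z, s1)], k)
            s3 = rhs([a + 0.5 * h * b for a, b in zip(z, s2)], k)
            s4 = rhs([a + h * b for a, b in zip(z, s3)], k)
            z = [a + h / 6.0 * (c + 2 * d + 2 * e + g)
                 for a, c, d, e, g in zip(z, s1, s2, s3, s4)]
        t = ti
        out.append((z[0], z[1]))
    return out

def gauss_newton_step(p, k, times, data):
    """One full step (mu = 1) of Eq. 6.43 reduced to the P-block only."""
    sol = integrate(p, k, times)
    A = sum(P * P for _, P in sol)
    b = sum(P * (y - x) for y, (x, P) in zip(data, sol))
    return p + b / A

k_known = 0.5
times = [1.0, 2.0, 3.0, 4.0]
data = [x for x, _ in integrate(2.0, k_known, times)]   # synthetic data, true x(t0) = 2
p_hat = gauss_newton_step(1.0, k_known, times, data)
print(p_hat)
```

For a nonlinear state equation the same step would be repeated, with the bisection rule, until the change in p is negligible.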
6.5 EXAMPLES
6.5.1
Bellman et al. (1967) have considered the estimation of the two rate constants $k_1$ and $k_2$ in the Bodenstein-Linder model for the homogeneous gas phase reaction of NO with $O_2$:
$$2NO + O_2 \rightleftharpoons 2NO_2$$
The model is
$$\frac{dx}{dt} = k_1(\alpha - x)(\beta - x)^2 - k_2 x^2 \tag{6.45}$$
where α = 126.2, β = 91.9 and x is the concentration of $NO_2$. The concentration of $NO_2$ was measured experimentally as a function of time and the data are given in Table 6.1. The model is of the form $dx/dt = f(x,k_1,k_2)$ where $f(x,k_1,k_2) = k_1(\alpha-x)(\beta-x)^2 - k_2x^2$. The single state variable x is also the measured variable (i.e., y(t) = x(t)). The sensitivity matrix, G(t), is a (1×2)-dimensional matrix with elements:
$$G(t) = [G_1(t), G_2(t)] = \left[\frac{\partial x}{\partial k_1}, \frac{\partial x}{\partial k_2}\right] \tag{6.46}$$
Table 6.1 Data for the Homogeneous Gas Phase Reaction of NO with O2.
Time: 0 1 2 3 4 5 6 7 9 11 14
Concentration of NO2: 0 1.4 6.3 10.5 14.2 17.6 21.4 23.0 27.0 30.5 34.4 38.8 41.6 43.5 45.3
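The sensitivity equations are
$$\frac{dG_1}{dt} = \left(\frac{\partial f}{\partial x}\right) G_1 + \frac{\partial f}{\partial k_1} \; ; \quad G_1(0) = 0 \tag{6.47a}$$
$$\frac{dG_2}{dt} = \left(\frac{\partial f}{\partial x}\right) G_2 + \frac{\partial f}{\partial k_2} \; ; \quad G_2(0) = 0 \tag{6.47b}$$
where
$$\frac{\partial f}{\partial x} = -k_1(\beta-x)^2 - 2k_1(\alpha-x)(\beta-x) - 2k_2x \tag{6.48a}$$
$$\frac{\partial f}{\partial k_1} = (\alpha-x)(\beta-x)^2 \tag{6.48b}$$
$$\frac{\partial f}{\partial k_2} = -x^2 \tag{6.48c}$$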
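Equations 6.47a and 6.47b should be solved simultaneously with the state equation (Equation 6.45). The three ODEs are put into the standard form ($dz/dt = \varphi(z)$) used by differential equation solvers by setting
$$z(t) = \begin{bmatrix} x(t) \\ G_1(t) \\ G_2(t) \end{bmatrix} \tag{6.49a}$$
and
$$\varphi(z) = \begin{bmatrix} k_1(\alpha-x)(\beta-x)^2 - k_2x^2 \\ \left[-k_1(\beta-x)^2 - 2k_1(\alpha-x)(\beta-x) - 2k_2x\right]G_1 + (\alpha-x)(\beta-x)^2 \\ \left[-k_1(\beta-x)^2 - 2k_1(\alpha-x)(\beta-x) - 2k_2x\right]G_2 - x^2 \end{bmatrix} \tag{6.49b}$$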
Integration of the above equation yields x(t) and G(t) which are used in setting up matrix A and vector b at each iteration of the Gauss-Newton method.
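Equations 6.49a and 6.49b translate almost line by line into code. The sketch below integrates the three ODEs with a fixed-step Runge-Kutta routine and compares the resulting sensitivities against central finite differences as a sanity check. The parameter values `k1 = 4.5e-6` and `k2 = 2.8e-4` are assumed trial values chosen only for illustration, the integrator and step count are arbitrary choices, and x(0) = 0 is taken from the first entry of Table 6.1.

```python
ALPHA, BETA = 126.2, 91.9            # constants quoted in the text

def phi(z, k1, k2):
    """Right-hand side of Eq. 6.49b for z = [x, G1, G2]."""
    x, G1, G2 = z
    f = k1 * (ALPHA - x) * (BETA - x) ** 2 - k2 * x ** 2
    dfdx = (-k1 * (BETA - x) ** 2
            - 2.0 * k1 * (ALPHA - x) * (BETA - x) - 2.0 * k2 * x)   # Eq. 6.48a
    dfdk1 = (ALPHA - x) * (BETA - x) ** 2                            # Eq. 6.48b
    dfdk2 = -x ** 2                                                  # Eq. 6.48c
    return [f, dfdx * G1 + dfdk1, dfdx * G2 + dfdk2]

def solve(k1, k2, t_end=5.0, steps=500):
    """Integrate from t = 0 to t_end with classical RK4; returns [x, G1, G2]."""
    z, h = [0.0, 0.0, 0.0], t_end / steps      # x(0) = 0, G1(0) = G2(0) = 0
    for _ in range(steps):
        s1 = phi(z, k1, k2)
        s2 = phi([a + 0.5 * h * b for a, b in zip(z, s1)], k1, k2)
        s3 = phi([a + 0.5 * h * b for a, b in zip(z, s2)], k1, k2)
        s4 = phi([a + h * b for a, b in zip(z, s3)], k1, k2)
        z = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(z, s1, s2, s3, s4)]
    return z

k1, k2 = 4.5e-6, 2.8e-4              # assumed trial values, not fitted estimates
x, G1, G2 = solve(k1, k2)

# Central finite differences of x(t_end) with respect to each parameter.
rel = 1e-4
fd1 = (solve(k1 * (1 + rel), k2)[0] - solve(k1 * (1 - rel), k2)[0]) / (2 * rel * k1)
fd2 = (solve(k1, k2 * (1 + rel))[0] - solve(k1, k2 * (1 - rel))[0]) / (2 * rel * k2)
print(x, G1, fd1, G2, fd2)
```

The very different magnitudes of G1 and G2 reflect the different magnitudes of the two rate constants, which is worth keeping in mind when matrix A is set up.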
6.5.2
Let us now consider the pyrolytic dehydrogenation of benzene to diphenyl and triphenyl (Seinfeld and Gavalas, 1970; Hougen and Watson, 1948):
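$$2C_6H_6 \rightleftharpoons C_{12}H_{10} + H_2$$
$$C_6H_6 + C_{12}H_{10} \rightleftharpoons C_{18}H_{14} + H_2$$
$$\frac{dx_1}{dt} = -r_1 - r_2 \tag{6.50a}$$
$$\frac{dx_2}{dt} = \frac{r_1}{2} - r_2 \tag{6.50b}$$
where
$$r_1 = k_1\left[x_1^2 - x_2(2 - 2x_1 - x_2)/3K_1\right] \tag{6.51a}$$
$$r_2 = k_2\left[x_1x_2 - (1 - x_1 - 2x_2)(2 - 2x_1 - x_2)/9K_2\right] \tag{6.51b}$$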
where $x_1$ denotes lb-mole of benzene per lb-mole of pure benzene feed and $x_2$ denotes lb-mole of diphenyl per lb-mole of pure benzene feed. The parameters $k_1$ and $k_2$ are unknown reaction rate constants whereas $K_1$ and $K_2$ are equilibrium constants. The data consist of measurements of $x_1$ and $x_2$ in a flow reactor at eight values of the reciprocal space velocity t and are given below.
Table 6.2 Data for the Pyrolytic Dehydrogenation of Benzene

Reciprocal Space Velocity (t) x 10^4      x1        x2
  5.63                                    0.828     0.0737
 11.32                                    0.704     0.113
 16.97                                    0.622     0.1322
 22.62                                    0.565     0.1400
 34.0                                     0.499     0.1468
 39.7                                     0.482     0.1477
 45.2                                     0.470     0.1477
169.7                                     0.443     0.1476

Source: Seinfeld and Gavalas (1970); Hougen and Watson (1948).
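As both state variables are measured, the output vector is the same as the state vector, i.e., $y_1=x_1$ and $y_2=x_2$. The feed to the reactor was pure benzene. The equilibrium constants $K_1$ and $K_2$ were determined from the run at the lowest space velocity to be 0.242 and 0.428, respectively. Using our standard notation, the above problem is written as follows:
$$\frac{dx_1}{dt} = f_1(x_1,x_2;k_1,k_2) \tag{6.52a}$$
$$\frac{dx_2}{dt} = f_2(x_1,x_2;k_1,k_2) \tag{6.52b}$$
where $f_1 = (-r_1 - r_2)$ and $f_2 = r_1/2 - r_2$. The sensitivity matrix, G(t), is a (2×2)-dimensional matrix with elements:
$$G(t) = [g_1(t), g_2(t)] = \begin{bmatrix} G_{11}(t) & G_{12}(t) \\ G_{21}(t) & G_{22}(t) \end{bmatrix} = \begin{bmatrix} \partial x_1/\partial k_1 & \partial x_1/\partial k_2 \\ \partial x_2/\partial k_1 & \partial x_2/\partial k_2 \end{bmatrix} \tag{6.53}$$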
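The columns $g_1(t)$ and $g_2(t)$ of G(t) satisfy
$$\frac{dg_1(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_1(t) + \frac{\partial f}{\partial k_1} \; ; \quad g_1(t_0) = 0 \tag{6.54a}$$
and
$$\frac{dg_2(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_2(t) + \frac{\partial f}{\partial k_2} \; ; \quad g_2(t_0) = 0 \tag{6.54b}$$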
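Taking into account Equation 6.53, the above equations can also be written as follows:
$$\begin{bmatrix} dG_{11}/dt \\ dG_{21}/dt \end{bmatrix} = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{bmatrix} \begin{bmatrix} G_{11} \\ G_{21} \end{bmatrix} + \begin{bmatrix} \partial f_1/\partial k_1 \\ \partial f_2/\partial k_1 \end{bmatrix} \; ; \quad G_{11}(t_0)=0, \; G_{21}(t_0)=0 \tag{6.55a}$$
and
$$\begin{bmatrix} dG_{12}/dt \\ dG_{22}/dt \end{bmatrix} = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 \end{bmatrix} \begin{bmatrix} G_{12} \\ G_{22} \end{bmatrix} + \begin{bmatrix} \partial f_1/\partial k_2 \\ \partial f_2/\partial k_2 \end{bmatrix} \; ; \quad G_{12}(t_0)=0, \; G_{22}(t_0)=0 \tag{6.55b}$$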
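Written out element by element, these are
$$\frac{dG_{11}}{dt} = \left(\frac{\partial f_1}{\partial x_1}\right)G_{11} + \left(\frac{\partial f_1}{\partial x_2}\right)G_{21} + \frac{\partial f_1}{\partial k_1} \; ; \quad G_{11}(0) = 0 \tag{6.56a}$$
$$\frac{dG_{21}}{dt} = \left(\frac{\partial f_2}{\partial x_1}\right)G_{11} + \left(\frac{\partial f_2}{\partial x_2}\right)G_{21} + \frac{\partial f_2}{\partial k_1} \; ; \quad G_{21}(0) = 0 \tag{6.56b}$$
$$\frac{dG_{12}}{dt} = \left(\frac{\partial f_1}{\partial x_1}\right)G_{12} + \left(\frac{\partial f_1}{\partial x_2}\right)G_{22} + \frac{\partial f_1}{\partial k_2} \; ; \quad G_{12}(0) = 0 \tag{6.56c}$$
$$\frac{dG_{22}}{dt} = \left(\frac{\partial f_2}{\partial x_1}\right)G_{12} + \left(\frac{\partial f_2}{\partial x_2}\right)G_{22} + \frac{\partial f_2}{\partial k_2} \; ; \quad G_{22}(0) = 0 \tag{6.56d}$$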
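where
$$\frac{\partial f_1}{\partial x_1} = -k_1\left[2x_1 + \frac{2x_2}{3K_1}\right] - k_2\left[x_2 - \frac{4x_1 - 4 + 5x_2}{9K_2}\right] \tag{6.57a}$$
$$\frac{\partial f_1}{\partial x_2} = -\frac{k_1}{3K_1}\left(2x_1 + 2x_2 - 2\right) - k_2\left[x_1 - \frac{5x_1 - 5 + 4x_2}{9K_2}\right] \tag{6.57b}$$
$$\frac{\partial f_2}{\partial x_1} = \frac{k_1}{2}\left[2x_1 + \frac{2x_2}{3K_1}\right] - k_2\left[x_2 - \frac{4x_1 - 4 + 5x_2}{9K_2}\right] \tag{6.57c}$$
$$\frac{\partial f_2}{\partial x_2} = \frac{k_1}{6K_1}\left(2x_1 + 2x_2 - 2\right) - k_2\left[x_1 - \frac{5x_1 - 5 + 4x_2}{9K_2}\right] \tag{6.57d}$$
$$\frac{\partial f_1}{\partial k_1} = -\left[x_1^2 + \frac{x_2^2 + 2x_1x_2 - 2x_2}{3K_1}\right] \tag{6.57e}$$
$$\frac{\partial f_2}{\partial k_1} = \frac{1}{2}\left[x_1^2 + \frac{x_2^2 + 2x_1x_2 - 2x_2}{3K_1}\right] \tag{6.57f}$$
$$\frac{\partial f_1}{\partial k_2} = -\left[x_1x_2 - \frac{2x_1^2 - 4x_1 + 5x_1x_2 - 5x_2 + 2x_2^2 + 2}{9K_2}\right] \tag{6.57g}$$
$$\frac{\partial f_2}{\partial k_2} = -\left[x_1x_2 - \frac{2x_1^2 - 4x_1 + 5x_1x_2 - 5x_2 + 2x_2^2 + 2}{9K_2}\right] \tag{6.57h}$$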
The four sensitivity equations (Equations 6.56a-d) should be solved simultaneously with the two state equations (Equation 6.52). Integration of these six [= n×(p+1) = 2×(2+1)] equations yields x(t) and G(t) which are used in setting up matrix A and vector b at each iteration of the Gauss-Newton method. The ordinary differential equation that a particular element, $G_{ij}$, of the (n×p)-dimensional sensitivity matrix satisfies can be written directly using the following expression,
$$\frac{dG_{ij}}{dt} = \sum_{l=1}^{n} \left(\frac{\partial f_i}{\partial x_l}\right) G_{lj} + \frac{\partial f_i}{\partial k_j} \; ; \quad i=1,\ldots,n, \; j=1,\ldots,p \tag{6.58}$$
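The element-wise form of Equation 6.58 can be coded directly for the benzene example. The sketch below is illustrative only: the rate constants `k = [350.0, 400.0]` are assumed trial values (not estimates reported here), the initial state x(0) = [1, 0] follows from the pure benzene feed, the integration is carried to the first reciprocal space velocity in Table 6.2 with a fixed-step Runge-Kutta routine, and the finite-difference comparison at the end is added purely as a check on the analytical derivatives.

```python
K1, K2 = 0.242, 0.428                 # equilibrium constants quoted in the text

def rates(x, k):
    x1, x2 = x
    r1 = k[0] * (x1 ** 2 - x2 * (2 - 2 * x1 - x2) / (3 * K1))
    r2 = k[1] * (x1 * x2 - (1 - x1 - 2 * x2) * (2 - 2 * x1 - x2) / (9 * K2))
    return r1, r2

def f(x, k):
    r1, r2 = rates(x, k)
    return [-r1 - r2, r1 / 2 - r2]

def dfdx(x, k):
    x1, x2 = x
    dr1 = [k[0] * (2 * x1 + 2 * x2 / (3 * K1)), k[0] * (2 * x1 + 2 * x2 - 2) / (3 * K1)]
    dr2 = [k[1] * (x2 - (4 * x1 - 4 + 5 * x2) / (9 * K2)),
           k[1] * (x1 - (5 * x1 - 5 + 4 * x2) / (9 * K2))]
    return [[-dr1[0] - dr2[0], -dr1[1] - dr2[1]],
            [dr1[0] / 2 - dr2[0], dr1[1] / 2 - dr2[1]]]

def dfdk(x, k):
    r1, r2 = rates(x, k)              # both rates are linear in their rate constant
    return [[-r1 / k[0], -r2 / k[1]], [r1 / (2 * k[0]), -r2 / k[1]]]

def phi(z, k):
    """z = [x1, x2, G11, G21, G12, G22]; sensitivity rows follow Eq. 6.58."""
    x = z[:2]
    J, Fk = dfdx(x, k), dfdk(x, k)
    out = f(x, k)
    for j in range(2):                            # parameter index
        g = z[2 + 2 * j: 4 + 2 * j]               # j-th column of G
        for i in range(2):                        # state index
            out.append(J[i][0] * g[0] + J[i][1] * g[1] + Fk[i][j])
    return out

def integrate(k, t_end=5.63e-4, steps=200):
    z, h = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], t_end / steps   # pure benzene feed, G(0) = 0
    for _ in range(steps):
        s1 = phi(z, k)
        s2 = phi([a + 0.5 * h * b for a, b in zip(z, s1)], k)
        s3 = phi([a + 0.5 * h * b for a, b in zip(z, s2)], k)
        s4 = phi([a + h * b for a, b in zip(z, s3)], k)
        z = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(z, s1, s2, s3, s4)]
    return z

k = [350.0, 400.0]                    # assumed trial values
x1, x2, G11, G21, G12, G22 = integrate(k)

def fd(i, j, rel=1e-4):
    """Central finite difference of state i with respect to parameter j."""
    kp, km = list(k), list(k)
    kp[j] *= 1 + rel
    km[j] *= 1 - rel
    return (integrate(kp)[i] - integrate(km)[i]) / (2 * rel * k[j])

checks = [(G11, fd(0, 0)), (G21, fd(1, 0)), (G12, fd(0, 1)), (G22, fd(1, 1))]
print(x1, x2, checks)
```

A check of this kind is a cheap way to catch sign or indexing mistakes in hand-derived expressions such as Equations 6.57a-h before they are used inside the Gauss-Newton iteration.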
The complete data set will be given in the case studies section. In this chapter, we will discuss how we set up the equations for the regression of an isothermal data set given in Tables 6.3 or 6.4.
The same group also proposed a reaction scheme and a mathematical model that describe the rates of HPA consumption, PD formation as well as the formation of acrolein (Ac). The model is as follows
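$$\frac{dC_{HPA}}{dt} = -[r_1 + r_2]C_k - [r_3 + r_4 - r_{-3}] \tag{6.59a}$$
$$\frac{dC_{PD}}{dt} = [r_1 - r_2]C_k \tag{6.59b}$$
$$\frac{dC_{Ac}}{dt} = r_3 - r_4 - r_{-3} \tag{6.59c}$$
where $C_k$ is the concentration of the catalyst (10 g/L). The reaction rates are given below
$$r_1 = \frac{k_1 \dfrac{P}{H} C_{HPA}}{\left[1 + \left(\dfrac{K_1P}{H}\right)^{0.5} + K_2C_{HPA}\right]^3} \tag{6.60a}$$
$$r_2 = \frac{k_2 C_{PD} C_{HPA}}{1 + \left(\dfrac{K_1P}{H}\right)^{0.5} + K_2C_{HPA}} \tag{6.60b}$$
$$r_3 = k_3 C_{HPA} \tag{6.60c}$$
$$r_{-3} = k_{-3} C_{Ac} \tag{6.60d}$$
$$r_4 = k_4 C_{Ac} C_{HPA} \tag{6.60e}$$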
In the above equations, $k_j$ (j = 1, 2, 3, -3, 4) are rate constants (L/(mol min g)), $K_1$ and $K_2$ are the adsorption equilibrium constants (L/mol) for $H_2$ and HPA respectively. P is the hydrogen pressure (MPa) in the reactor and H is the Henry's law constant with a value equal to 1379 (L bar/mol) at 298 K. The seven parameters ($k_1$, $k_2$, $k_3$, $k_{-3}$, $k_4$, $K_1$ and $K_2$) are to be determined from the measured concentrations of HPA and PD.
Table 6.3 Data for the Catalytic Hydrogenation of 3-Hydroxypropanal (HPA) to 1,3-Propanediol (PD) at 5.15 MPa and 45 °C
C_HPA (mol/L): 1.34953 1.36324 1.25882 1.17918 0.972102 0.825203 0.697109 0.421451 0.232296 0.128095 0.0289817
C_PD (mol/L): 0.0 0.00262812 0.00962368 0.0700394 0.184363 0.354008 0.469777 0.607359 0.852431 1.03535 1.16413 1.30053 1.31971

Table 6.4 Data for the Catalytic Hydrogenation of 3-Hydroxypropanal (HPA) to 1,3-Propanediol (PD) at 5.15 MPa and 80 °C
t (min): 0.0 5 10 15 20 25 30
C_PD (mol/L):
Source: Zhu et al. (1997).
$$k = [k_1, k_2, k_3, k_4, k_5, k_6, k_7]^T = [k_1, k_2, k_3, k_{-3}, k_4, K_1, K_2]^T$$
$$\frac{dx_1}{dt} = f_1(x_1,x_2,x_3;k_1,k_2,\ldots,k_7;u_1,u_2) \tag{6.61a}$$
$$\frac{dx_2}{dt} = f_2(x_1,x_2,x_3;k_1,k_2,\ldots,k_7;u_1,u_2) \tag{6.61b}$$
$$\frac{dx_3}{dt} = f_3(x_1,x_2,x_3;k_1,k_2,\ldots,k_7;u_1,u_2) \tag{6.61c}$$
and the observation matrix is simply
$$C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \tag{6.62}$$
In Equations 6.61, $u_1$ denotes the concentration of catalyst present in the reactor ($C_k$) and $u_2$ the hydrogen pressure (P). As far as the estimation problem is concerned, both these variables are assumed to be known precisely. Actually, as will be discussed later on experimental design (Chapter 12), the value of such variables is chosen by the experimentalist and can have a paramount effect on the quality of the parameter estimates. Equations 6.61 are rewritten as follows
$$\frac{dx_1}{dt} = -u_1(r_1 + r_2) - (r_3 + r_4 - r_{-3}) \tag{6.63a}$$
$$\frac{dx_2}{dt} = u_1(r_1 - r_2) \tag{6.63b}$$
$$\frac{dx_3}{dt} = r_3 - r_4 - r_{-3} \tag{6.63c}$$
where
$$r_1 = \frac{k_1 \dfrac{u_2}{H} x_1}{\left[1 + \left(\dfrac{k_6u_2}{H}\right)^{0.5} + k_7x_1\right]^3} \tag{6.64a}$$
$$r_2 = \frac{k_2 x_2 x_1}{1 + \left(\dfrac{k_6u_2}{H}\right)^{0.5} + k_7x_1} \tag{6.64b}$$
$$r_3 = k_3 x_1 \tag{6.64c}$$
$$r_{-3} = k_4 x_3 \tag{6.64d}$$
$$r_4 = k_5 x_3 x_1 \tag{6.64e}$$
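The sensitivity matrix is
$$G(t) = \left(\frac{\partial x^T}{\partial k}\right)^T \tag{6.65a}$$
$$G(t) = [g_1(t), \ldots, g_7(t)] = \begin{bmatrix} G_{11}(t) & \cdots & G_{17}(t) \\ G_{21}(t) & \cdots & G_{27}(t) \\ G_{31}(t) & \cdots & G_{37}(t) \end{bmatrix} \; ; \quad G_{ij} = \frac{\partial x_i}{\partial k_j} \tag{6.65b}$$
whose columns satisfy
$$\frac{dg_1(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_1(t) + \frac{\partial f}{\partial k_1} \; ; \quad g_1(t_0) = 0 \tag{6.66a}$$
$$\frac{dg_2(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_2(t) + \frac{\partial f}{\partial k_2} \; ; \quad g_2(t_0) = 0 \tag{6.66b}$$
$$\vdots$$
$$\frac{dg_7(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T g_7(t) + \frac{\partial f}{\partial k_7} \; ; \quad g_7(t_0) = 0 \tag{6.66c}$$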
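where
$$\left(\frac{\partial f^T}{\partial x}\right)^T = \begin{bmatrix} \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \partial f_1/\partial x_3 \\ \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \partial f_2/\partial x_3 \\ \partial f_3/\partial x_1 & \partial f_3/\partial x_2 & \partial f_3/\partial x_3 \end{bmatrix} \tag{6.67a}$$
and
$$\frac{\partial f}{\partial k_j} = \begin{bmatrix} \partial f_1/\partial k_j \\ \partial f_2/\partial k_j \\ \partial f_3/\partial k_j \end{bmatrix} \; ; \quad j=1,2,\ldots,7 \tag{6.67b}$$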
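Written out element by element, the sensitivity equations are
$$\begin{aligned}
\frac{dG_{11}}{dt} &= \left(\frac{\partial f_1}{\partial x_1}\right)G_{11} + \left(\frac{\partial f_1}{\partial x_2}\right)G_{21} + \left(\frac{\partial f_1}{\partial x_3}\right)G_{31} + \frac{\partial f_1}{\partial k_1} \; ; & G_{11}(t_0) &= 0 \\
\frac{dG_{21}}{dt} &= \left(\frac{\partial f_2}{\partial x_1}\right)G_{11} + \left(\frac{\partial f_2}{\partial x_2}\right)G_{21} + \left(\frac{\partial f_2}{\partial x_3}\right)G_{31} + \frac{\partial f_2}{\partial k_1} \; ; & G_{21}(t_0) &= 0 \\
\frac{dG_{31}}{dt} &= \left(\frac{\partial f_3}{\partial x_1}\right)G_{11} + \left(\frac{\partial f_3}{\partial x_2}\right)G_{21} + \left(\frac{\partial f_3}{\partial x_3}\right)G_{31} + \frac{\partial f_3}{\partial k_1} \; ; & G_{31}(t_0) &= 0 \\
&\;\;\vdots \\
\frac{dG_{17}}{dt} &= \left(\frac{\partial f_1}{\partial x_1}\right)G_{17} + \left(\frac{\partial f_1}{\partial x_2}\right)G_{27} + \left(\frac{\partial f_1}{\partial x_3}\right)G_{37} + \frac{\partial f_1}{\partial k_7} \; ; & G_{17}(t_0) &= 0 \\
\frac{dG_{27}}{dt} &= \left(\frac{\partial f_2}{\partial x_1}\right)G_{17} + \left(\frac{\partial f_2}{\partial x_2}\right)G_{27} + \left(\frac{\partial f_2}{\partial x_3}\right)G_{37} + \frac{\partial f_2}{\partial k_7} \; ; & G_{27}(t_0) &= 0 \\
\frac{dG_{37}}{dt} &= \left(\frac{\partial f_3}{\partial x_1}\right)G_{17} + \left(\frac{\partial f_3}{\partial x_2}\right)G_{27} + \left(\frac{\partial f_3}{\partial x_3}\right)G_{37} + \frac{\partial f_3}{\partial k_7} \; ; & G_{37}(t_0) &= 0
\end{aligned} \tag{6.68}$$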
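The partial derivatives with respect to the state variables in Equation 6.67a that are needed in the above ODEs follow from Equations 6.63 and 6.64:
$$\frac{\partial f_1}{\partial x_1} = -u_1\left(\frac{\partial r_1}{\partial x_1} + \frac{\partial r_2}{\partial x_1}\right) - k_3 - k_5x_3 \tag{6.69a}$$
$$\frac{\partial f_1}{\partial x_2} = -u_1\frac{\partial r_2}{\partial x_2} \tag{6.69b}$$
$$\frac{\partial f_1}{\partial x_3} = k_4 - k_5x_1 \tag{6.69c}$$
$$\frac{\partial f_2}{\partial x_1} = u_1\left(\frac{\partial r_1}{\partial x_1} - \frac{\partial r_2}{\partial x_1}\right) \tag{6.69d}$$
$$\frac{\partial f_2}{\partial x_2} = -u_1\frac{\partial r_2}{\partial x_2} \tag{6.69e}$$
$$\frac{\partial f_2}{\partial x_3} = 0 \tag{6.69f}$$
$$\frac{\partial f_3}{\partial x_1} = k_3 - k_5x_3 \tag{6.69g}$$
$$\frac{\partial f_3}{\partial x_2} = 0 \tag{6.69h}$$
$$\frac{\partial f_3}{\partial x_3} = -k_5x_1 - k_4 \tag{6.69i}$$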
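The partial derivatives with respect to the parameters in Equation 6.67b that are needed in the above ODEs are given next
$$\frac{\partial f}{\partial k_1} = \begin{bmatrix} -u_1\,\partial r_1/\partial k_1 \\ u_1\,\partial r_1/\partial k_1 \\ 0 \end{bmatrix} \tag{6.70a}$$
$$\frac{\partial f}{\partial k_2} = \begin{bmatrix} -u_1\,\partial r_2/\partial k_2 \\ -u_1\,\partial r_2/\partial k_2 \\ 0 \end{bmatrix} \tag{6.70b}$$
$$\frac{\partial f}{\partial k_3} = \begin{bmatrix} -x_1 \\ 0 \\ x_1 \end{bmatrix} \tag{6.70c}$$
$$\frac{\partial f}{\partial k_4} = \begin{bmatrix} x_3 \\ 0 \\ -x_3 \end{bmatrix} \tag{6.70d}$$
$$\frac{\partial f}{\partial k_5} = \begin{bmatrix} -x_3x_1 \\ 0 \\ -x_3x_1 \end{bmatrix} \tag{6.70e}$$
$$\frac{\partial f}{\partial k_6} = \begin{bmatrix} -u_1\left(\partial r_1/\partial k_6 + \partial r_2/\partial k_6\right) \\ u_1\left(\partial r_1/\partial k_6 - \partial r_2/\partial k_6\right) \\ 0 \end{bmatrix} \tag{6.70f}$$
$$\frac{\partial f}{\partial k_7} = \begin{bmatrix} -u_1\left(\partial r_1/\partial k_7 + \partial r_2/\partial k_7\right) \\ u_1\left(\partial r_1/\partial k_7 - \partial r_2/\partial k_7\right) \\ 0 \end{bmatrix} \tag{6.70g}$$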
Integration of the state and sensitivity equations yields x(t) and G(t) which are used in setting up matrix A and vector b at each iteration of the Gauss-Newton method. Given the complexity of the ODEs when the dimensionality of the problem increases, it is quite helpful to have a general purpose computer program that sets up the sensitivity equations automatically. Furthermore, since analytical derivatives are subject to user input error, numerical evaluation of the derivatives can also be used in a typical computer implementation of the Gauss-Newton method. Details for a successful implementation of the method are given in Chapter 8.
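Numerical evaluation of the derivatives can be as simple as a central-difference routine. The sketch below is an assumption about how such a helper might look, not the program referred to above; the function name `numerical_jacobian`, the relative step of 1e-6 and the two-state test model are all hypothetical.

```python
def numerical_jacobian(fun, v, rel_step=1e-6):
    """Central-difference approximation of jac[i][j] = d fun_i / d v_j."""
    base = fun(v)
    jac = [[0.0] * len(v) for _ in base]
    for j, vj in enumerate(v):
        h = rel_step * max(abs(vj), 1.0)     # step scaled to the size of v_j
        up = list(v)
        dn = list(v)
        up[j] = vj + h
        dn[j] = vj - h
        f_up, f_dn = fun(up), fun(dn)
        for i in range(len(base)):
            jac[i][j] = (f_up[i] - f_dn[i]) / (2.0 * h)
    return jac

# Hypothetical model used only to compare against hand-derived derivatives.
def f(x, k):
    return [-k[0] * x[0], k[0] * x[0] - k[1] * x[1] ** 2]

x, k = [1.5, 0.7], [2.0, 3.0]
dfdx_num = numerical_jacobian(lambda v: f(v, k), x)   # (df^T/dx)^T in Eq. 6.9
dfdk_num = numerical_jacobian(lambda v: f(x, v), k)   # (df^T/dk)^T in Eq. 6.9

dfdx_exact = [[-k[0], 0.0], [k[0], -2.0 * k[1] * x[1]]]
dfdk_exact = [[-x[0], 0.0], [x[0], -x[1] ** 2]]
print(dfdx_num, dfdk_num)
```

The same routine can serve either as a replacement for analytical derivatives or as a check on them; its cost is two extra evaluations of f per state variable and per parameter at every evaluation of the right-hand side.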
The quasilinearization method (QM) is another method for solving off-line parameter estimation problems described by Equations 6.1, 6.2 and 6.3 (Bellman and Kalaba, 1965). Quasilinearization converges quadratically to the optimum but has a small region of convergence (Seinfeld and Gavalas, 1970). Kalogerakis and Luus (1983b) presented an alternative development of the QM that enables a more efficient implementation of the algorithm. Furthermore, they showed that this simplified QM is very similar to the Gauss-Newton method. Next, the quasilinearization method as well as the simplified quasilinearization method are described and the equivalence of QM to the Gauss-Newton method is demonstrated.
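6.6.1 The Quasilinearization Method and its Simplification
An estimate $k^{(j)}$ of the unknown parameter vector is available at the jth iteration. Equation 6.1 then becomes
$$\frac{dx^{(j)}(t)}{dt} = f(x^{(j)}(t), k^{(j)}) \tag{6.71}$$
Using the parameter estimate $k^{(j+1)}$ from the next iteration we obtain from Equation 6.1
$$\frac{dx^{(j+1)}(t)}{dt} = f(x^{(j+1)}(t), k^{(j+1)}) \tag{6.72}$$
By using a Taylor series expansion on the right-hand side of Equation 6.72 and keeping only the linear terms we obtain the following equation
$$\frac{dx^{(j+1)}(t)}{dt} = f(x^{(j)}(t),k^{(j)}) + \left(\frac{\partial f^T}{\partial x}\right)^T \left[x^{(j+1)}(t) - x^{(j)}(t)\right] + \left(\frac{\partial f^T}{\partial k}\right)^T \left[k^{(j+1)} - k^{(j)}\right] \tag{6.73}$$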
where the partial derivatives are evaluated at $x^{(j)}(t)$. The above equation is linear in $x^{(j+1)}$ and $k^{(j+1)}$. Integration of Equation 6.73 will result in the following equation
$$x^{(j+1)}(t) = g(t) + G(t)\,k^{(j+1)} \tag{6.74}$$
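where g(t) is an n-dimensional vector and G(t) is an n×p matrix. Equation 6.74 is differentiated and the RHS of the resultant equation is equated with the RHS of Equation 6.73 to yield
$$\frac{dg(t)}{dt} = f(x^{(j)}(t),k^{(j)}) + \left(\frac{\partial f^T}{\partial x}\right)^T \left[g(t) - x^{(j)}(t)\right] - \left(\frac{\partial f^T}{\partial k}\right)^T k^{(j)} \tag{6.75}$$
and
$$\frac{dG(t)}{dt} = \left(\frac{\partial f^T}{\partial x}\right)^T G(t) + \left(\frac{\partial f^T}{\partial k}\right)^T \tag{6.76}$$
The initial conditions for Equations 6.75 and 6.76 are as follows
$$g(t_0) = x_0 \tag{6.77a}$$
$$G(t_0) = 0 \tag{6.77b}$$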
Equations 6.71, 6.75 and 6.76 can be solved simultaneously to yield g(t) and G(t) when the initial state vector $x_0$ and the parameter estimate vector $k^{(j)}$ are given. In order to determine $k^{(j+1)}$ the output vector (given by Equation 6.2) is inserted into the objective function (Equation 6.4) and the stationary condition yields,
$$\frac{\partial S(k^{(j+1)})}{\partial k^{(j+1)}} = 0 \tag{6.78}$$
The case of a nonlinear observational relationship (Equation 6.3) will be examined later. Equation 6.78 yields the following linear equation which is solved by LU decomposition (or any other technique) to obtain $k^{(j+1)}$
$$\left[\sum_{i=1}^{N} G^T(t_i)\, C^T Q_i\, C\, G(t_i)\right] k^{(j+1)} = \sum_{i=1}^{N} G^T(t_i)\, C^T Q_i \left[\hat{y}_i - Cg(t_i)\right] \tag{6.79}$$
As matrix $Q_i$ is positive definite, the above equation gives the minimum of the objective function.
Since linearization of the differential Equation 6.1 around the trajectory $x^{(j)}(t)$, resulting from the choice of $k^{(j)}$, has been used, the above method gives $k^{(j+1)}$ which is an approximation to the best parameter vector. Using this value as $k^{(j)}$, a new $k^{(j+1)}$ can be obtained and thus a sequence of vectors $k^{(0)}, k^{(1)}, k^{(2)}, \ldots$ is obtained. This sequence converges rapidly to the optimum provided that the initial guess is sufficiently good. The above described methodology constitutes the Quasilinearization Method (QM). The total number of differential equations which must be integrated at each iteration step is n×(p+2).
Kalogerakis and Luus (1983b) noticed that Equation 6.75 is redundant. Since Equation 6.74 is obtained by linearization around the nominal trajectory $x^{(j)}(t)$ resulting from $k^{(j)}$, if we let $k^{(j+1)}$ be $k^{(j)}$ then Equation 6.74 becomes
$$x^{(j)}(t) = g(t) + G(t)\,k^{(j)} \tag{6.80}$$
Equation 6.80 is exact rather than a first order approximation as Equation 6.74 is. This is simply because Equation 6.80 is Equation 6.74 evaluated at the point of linearization, $k^{(j)}$. Thus Equation 6.80 can be used to compute g(t) as
$$g(t) = x^{(j)}(t) - G(t)\,k^{(j)} \tag{6.81}$$
It is obvious that the use of Equation 6.81 leads to a simplification because the number of differential equations that now need to be integrated is n×(p+1). Kalogerakis and Luus (1983b) then proposed the following algorithm for the QM.
Step 1. Select an initial guess $k^{(0)}$. Hence j=0.
Step 2. Integrate Equations 6.71 and 6.76 simultaneously to obtain $x^{(j)}(t)$ and G(t).
Step 3. Use Equation 6.81 to obtain $g(t_i)$, i=1,2,...,N and set up matrix A and vector b in Equation 6.79.
Step 4. Solve Equation 6.79 to obtain $k^{(j+1)}$ and test for convergence,
$$[\ldots] < TOL \tag{6.82}$$
where TOL is a preset small number to ensure termination of the iterations. If the above inequality is not satisfied then we set $k^{(j)} = k^{(j+1)}$, increase j by one and go to Step 2 to repeat the calculations.
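The redundancy of Equation 6.75 can be checked numerically. The sketch below integrates x(t), g(t) and G(t) together for the hypothetical scalar model dx/dt = -kx, verifies that g(t) coincides with x(t) - G(t)k as in Equation 6.81, and compares the estimate from Equation 6.79 with a full Gauss-Newton step (Equation 6.11 with μ = 1). The model, the synthetic data, C = Q = 1 and the fixed-step Runge-Kutta integrator are all assumptions made for illustration.

```python
import math

def rhs(z, k):
    """dx/dt (Eq. 6.71), dg/dt (Eq. 6.75) and dG/dt (Eq. 6.76) for f = -k*x."""
    x, g, G = z
    f, fx, fk = -k * x, -k, -x
    return [f, f + fx * (g - x) - fk * k, fx * G + fk]

def integrate(k, times, x0=1.0, substeps=50):
    """Return [(x, g, G)] at the sampling times with g(t0) = x0 and G(t0) = 0."""
    t, z, out = 0.0, [x0, x0, 0.0], []
    for ti in times:
        h = (ti - t) / substeps
        for _ in range(substeps):
            s1 = rhs(z, k)
            s2 = rhs([a + 0.5 * h * b for a, b in zip(z, s1)], k)
            s3 = rhs([a + 0.5 * h * b for a, b in zip(z, s2)], k)
            s4 = rhs([a + h * b for a, b in zip(z, s3)], k)
            z = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
                 for a, p, q, r, s in zip(z, s1, s2, s3, s4)]
        t = ti
        out.append(tuple(z))
    return out

times = [1.0, 2.0, 3.0, 4.0, 5.0]
data = [math.exp(-0.5 * t) for t in times]     # synthetic, noise-free data
k_j = 0.8                                      # current estimate k^(j)
sol = integrate(k_j, times)

# Largest deviation between the integrated g(t_i) and Eq. 6.81.
gap = max(abs(g - (x - G * k_j)) for x, g, G in sol)

A = sum(G * G for _, _, G in sol)
k_qm = sum(G * (y - g) for y, (_, g, G) in zip(data, sol)) / A         # Eq. 6.79
k_gn = k_j + sum(G * (y - x) for y, (x, _, G) in zip(data, sol)) / A   # Eq. 6.11, mu = 1
print(gap, k_qm, k_gn)
```

Since g(t) can be recovered from x(t) and G(t), the middle equation in `rhs` could be dropped, which is the saving from n×(p+2) to n×(p+1) differential equations noted in the text.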
6.6.2
If we compare Equations 6.79 and 6.11 we notice that the only difference between the quasilinearization method and the Gauss-Newton method is the nature of the equation that yields the parameter estimate vector $k^{(j+1)}$. If one substitutes Equation 6.81 into Equation 6.79, one obtains the following equation
$$\left[\sum_{i=1}^{N} G^T(t_i) C^T Q_i C G(t_i)\right] k^{(j+1)} = \sum_{i=1}^{N} G^T(t_i) C^T Q_i \left[\hat{y}_i - Cx^{(j)}(t_i)\right] + \left[\sum_{i=1}^{N} G^T(t_i) C^T Q_i C G(t_i)\right] k^{(j)} \tag{6.83}$$
By taking the last term on the right-hand side of Equation 6.83 to the left-hand side, and noting that $\Delta k^{(j+1)} = k^{(j+1)} - k^{(j)}$, one obtains Equation 6.11 that is used for the Gauss-Newton method. Hence, when the output vector is linearly related to the state vector (Equation 6.2), the simplified quasilinearization method is computationally identical to the Gauss-Newton method.
Kalogerakis and Luus (1983b) compared the computational effort required by the Gauss-Newton, simplified quasilinearization and standard quasilinearization methods. They found that all methods produced the same new estimates at each iteration, as expected. Furthermore, the required computational time for the Gauss-Newton and the simplified quasilinearization methods was the same and about 90% of that required by the standard quasilinearization method.
6.6.3 Nonlinear Output Relationship
When the output vector is nonlinearly related to the state vector (Equation 6.3) then substitution of $x^{(j+1)}$ from Equation 6.74 into Equation 6.3 followed by substitution of the resulting equation into the objective function (Equation 6.4) yields the following equation after application of the stationary condition (Equation 6.78)
two methods. First, by employing Newton's method or alternatively by linearizing the output vector around the trajectory $x^{(j)}(t)$. Kalogerakis and Luus (1983b) showed that when linearization of the output vector is used, the quasilinearization computational algorithm and the Gauss-Newton method yield the same results.