A Globalization Scheme
for the Generalized Gauss-Newton Method
O. Knoth
Sektion Mathematik, Martin-Luther-Universität Halle-Wittenberg, DDR-4010 Halle, PF,
German Democratic Republic
1 Introduction
We consider the nonlinear least squares problem with equality constraints (NLQEC)

min ||F(x)||   subject to   G(x) = 0,

where F: R^n → R^m, G: R^n → R^l with m + l ≥ n ≥ 1, and ||·|| denotes the Euclidean norm. Problems of this type arise, e.g., in implicit parameter estimation problems, see [4], or in discretized control and identification problems, cf. [2].
Local solutions x_* of NLQEC can often be computed in an efficient way by the generalized Gauss-Newton method. In this method, the current iterate x_k is improved by solving the corresponding linearized problem in which F and G are replaced by their first order Taylor expansions at x_k. Under natural conditions, that method is locally convergent with a linear rate of convergence if ||F(x_*)|| is sufficiently small and if the nonlinearities in the problem functions are moderate. The rate of convergence is ultimately quadratic if F(x_*) = 0. Though the zero-residual case is an idealized situation in real applications such as nonlinear parameter estimation, it should be taken into consideration as the limit case of error-free observations. Moreover, the small residual condition just mentioned guarantees that a stationary point is a strong local minimizer and stable under small perturbations. These properties, however, are in a certain sense necessary for a parameter estimation problem to be well posed, compare [3] for a detailed discussion.
Some ways of globalizing the generalized Gauss-Newton method have been proposed and tested in [6] and in [11]. Both methods work well in practice. Whereas there is no strong theoretical justification of the method from [6], the global convergence of the method of [11] has recently been proved in the reports [13] and [25].
The aim of this paper is to describe an alternative globalization that allows us to prove strong global as well as local convergence results. As in the papers mentioned above, the damped Gauss-Newton method is taken as the basic method, but here the damping factor is determined by using the exact nondifferentiable penalty function

φ(x, μ) = ||F(x)|| + μ ||G(x)||,   μ > 0.

This penalty function is taken as a line search function, as has been done in existing sequential quadratic programming algorithms for solving general nonlinear optimization problems, see [9, 15, 20, 21]. In contrast to these authors, who in general use the l1-norm for the penalty term, we choose the Euclidean norm, which seems more natural in the least squares context.
Two problems are connected with the use of φ as a line search function. The first concerns rules for choosing the penalty parameter μ which should ensure that the solution of the linearized problem is a descent direction for the penalty function and that global convergence is guaranteed. An updating scheme for μ proposed in [14] can be adapted to our case in a natural manner. Secondly, the algorithm should share the good convergence properties which the undamped Gauss-Newton method has in the small residual case. For the new method the undamped Gauss-Newton step is shown to be accepted after a finite number of steps in case of convergence toward a local minimizer with sufficiently small ||F(x_*)||. This assertion is not true when φ is replaced by φ_2.
The second and third chapters are introductory and give some background material on local optimality conditions and the generalized Gauss-Newton method. The main part of the paper is Chap. 4. There the globalization scheme is described, a model algorithm is proposed, and a proof of its global convergence is given. The behaviour of the model algorithm near a local solution of NLQEC is analyzed in Chap. 5 and, by a counterexample, the difference between φ(x, μ) and φ_2(x, μ) is demonstrated. Implementational details and computational results are the content of Chap. 6.
In the following, subscripts will denote the iteration index. The Jacobians of F(x) and G(x) are denoted by F'(x) and G'(x), resp. For brevity we often omit the arguments and write F = F(x), F' = F'(x), F_k = F(x_k), and so on. As matrix norm we always take the spectral norm, and A^+ denotes the Moore-Penrose inverse of a matrix A.
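Both notational ingredients are directly available in standard linear algebra software; as a small illustration (numpy assumed, not part of the algorithmic development), the spectral norm and the Moore-Penrose inverse of a rectangular matrix can be obtained as follows:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

spec = np.linalg.norm(A, 2)   # spectral norm = largest singular value of A
Ap = np.linalg.pinv(A)        # Moore-Penrose inverse A^+

# A^+ satisfies the Penrose conditions, e.g. A A^+ A = A
residual = np.linalg.norm(A @ Ap @ A - A)
```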
2 Optimality Conditions
rank(G'(x)) = l,   rank( [F'(x); G'(x)] ) = n   (A)

hold.
From (2.3a) the optimal Lagrange parameter λ_* can be explicitly expressed as

λ_* = −(G'_*^+)^T F'_*^T F_*.   (2.4)
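For a numerical sanity check of (2.4), one can verify that this expression solves the multiplier least squares problem G'^T λ = −F'^T F. The sketch below uses numpy and hypothetical variable names:

```python
import numpy as np

def lagrange_multiplier(Fv, Jf, Jg):
    """Multiplier estimate lambda_* = -(G'^+)^T F'^T F as in (2.4).

    Fv: residual vector F(x); Jf: Jacobian F'(x); Jg: Jacobian G'(x).
    """
    return -np.linalg.pinv(Jg).T @ (Jf.T @ Fv)
```

When G' has full row rank, the normal-equations residual G'(F'^T F + G'^T λ) vanishes, which is easy to test on random data.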
then x_* is a strict local solution of NLQEC, i.e., the inequality (2.1) holds in the strict sense for x ≠ x_*.
As usual, we have denoted by ∇_x L and ∇_λ L the gradients of the Lagrangian L with respect to x and λ, resp., and by ∇_x^2 L the Hessian with respect to x.
The solution d is unique because of the full rank assumption (A), cf. [10]. The improved iterate x^+ is then given by

x^+ = x + d.   (3.1)

If we substitute this expression into (3.2a) and multiply from the left by the orthoprojector E = I − G'^+ G', we obtain the equation

d = −G'^+ G − (F'E)^+ (F − F' G'^+ G),   (3.3)

where we have used the identity E(F'E)^+ = (F'E)^+. Let us still remark that if the full rank assumption (A) is not satisfied, the correction d defined by (3.3) is the unique solution of the more general linear least squares problem
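Assuming dense matrices and using pseudoinverses directly (rather than the numerically preferable orthogonal factorizations of the LSE algorithm of [10]), the correction d can be sketched as follows; the helper name ggn_step is ours:

```python
import numpy as np

def ggn_step(F, G, Fp, Gp):
    """Generalized Gauss-Newton correction d for the linearized problem
        min ||F + F'd||   subject to   G + G'd = 0,
    computed via the orthoprojector E = I - G'^+ G' onto ker(G')."""
    n = Fp.shape[1]
    E = np.eye(n) - np.linalg.pinv(Gp) @ Gp      # orthoprojector onto ker(G')
    Cp = np.linalg.pinv(Fp @ E)                  # (F'E)^+
    # d = -C^+ F - (I - C^+ F') G'^+ G, equivalent to the formula above
    return -Cp @ F - (np.eye(n) - Cp @ Fp) @ np.linalg.pinv(Gp) @ G
```

Under the full rank assumption (A), the returned d satisfies the linearized constraint exactly and is stationary for the residual along ker(G').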
and apply the Ostrowski theorem to (3.5), cf. [19] for a formulation of this theorem. Let us note that, under the assumption (A), x_* is a fixed point of T if and only if x_* is a stationary point of NLQEC. Since the Ostrowski theorem requires T to be continuously differentiable in a neighbourhood of the fixed point x_*, we have to show that d is a continuously differentiable function of x. This property, however, follows immediately from the fact that the pair {d, v} solves the linear system (3.2) and that the matrix of that system is regular and continuously differentiable due to the assumption above. Moreover, v = v(x) has the same smoothness properties as d(x). Therefore we can apply Ostrowski's results, which lead to the following theorem. Since Ostrowski's results are only local ones, it suffices to require smoothness and the full rank assumption only in a neighbourhood of a stationary point x_*.
Theorem 3.1. Suppose that F and G are twice continuously differentiable in an open neighbourhood of a point x_*, let x_* satisfy the necessary first order optimality conditions and the full rank assumption (A). Then the generalized Gauss-Newton method

x_{k+1} = x_k + d(x_k),   k = 0, 1, 2, …,
[ F'_*^T F'_*, G'_*^T ; G'_*, 0 ] [ T'_* ; v'_* ] + [ F''_* ∘ F_* + G''_* ∘ v_* ; 0 ] = 0   (3.6)
where

F''(x) ∘ F(x) = Σ_{j=1}^m F_j(x) F''_j(x)

and

G''(x) ∘ v(x) = Σ_{i=1}^l v_i(x) G''_i(x),
resp. The coefficient matrix of (3.6) coincides with that of (3.2) for x = x_*, so that the solution can be obtained in the same way as d and v. Therefore, with

v_* = λ_* = −(G'_*^+)^T F'_*^T F_*,

the local convergence condition reads

ρ_* = ρ(T'(x_*)) < 1.
Let us remark that the eigenvalues τ of T'_* are just the eigenvalues of the generalized eigenvalue problem
Now, ρ_* < 1 is equivalent to |τ| < 1 and, therefore, implies 1 + τ > 0. Since R(E_*) = ker(G'_*) and R(E_*) ∩ ker(F'_*) = {0} because of (A), we see that the convergence condition ρ_* < 1 implies that the second order sufficient condition is satisfied. As in the unconstrained case, the converse assertion is, unfortunately, not true: the second order sufficient condition does not, in general, imply that ρ_* < 1 holds. In order to guarantee local convergence toward a stationary point x_* at which ρ_* < 1 is not satisfied, a sufficiently good approximation of the second order terms F'' ∘ F and G'' ∘ v must be incorporated into the algorithm. This situation corresponds to the "large residual case" for unconstrained problems.
holds, see the comment below. Then choose α ∈ (0, 1] sufficiently large such that
lim_{α↓0} [φ(x + αd, μ) − φ(x, μ)]/α = lim_{α↓0} [Φ(x, αd, μ) − φ(x, μ)]/α ≤ Φ(x, d, μ) − φ(x, μ) < 0.
Now we discuss how the penalty parameter μ has to be chosen such that (4.2) is valid. The derivation of a lower estimate for μ goes in the spirit of [21] but uses the special structure of our penalty function and the explicit representation of the correction d, whereas in [21] the necessary optimality condition of the linearized problem, expressed in terms of d and the corresponding Lagrange parameter v, is used. Let us suppose for the moment that
When we define

ω(x) = (||F'd + F||^2 − ||F||^2) / [(||F|| + ||F'd + F||) ||G||]   if (||F|| + ||F'd + F||) ||G|| ≠ 0,
ω(x) = 0   otherwise,
then, from the identity (4.5), we obtain the following desired result.
Lemma 4.1. Assume that x is not a stationary point of NLQEC, that

φ(x, μ) − Φ(x, d, μ) > 0

holds, and that F'(x) and G'(x) are Lipschitz continuous with constants L_F and L_G, resp. Then there exists a number ᾱ ∈ (0, 1] such that
Model Algorithm
Step 0: Choose δ ∈ (0, 1), x_0 ∈ R^n, γ ∈ (0, 1), β_1, β_2 ∈ (0, ∞) with β_2 > β_1. Set α_{−1} = 1, μ_{−1} = 0, k = 0. Calculate F_0 = F(x_0) and G_0 = G(x_0).
Step 1: Calculate F'_k = F'(x_k) and G'_k = G'(x_k).
Step 2: Compute d_k as the solution of LLQEC for x = x_k.
do α_k = γ α_k.
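The model algorithm can be sketched in Python as follows. This is a schematic reading of Steps 0 to 5 under the full rank assumption (A): the linearized subproblem is solved densely via pseudoinverses, and the penalty update keeps μ above the lower bound ω(x) with placeholder constants β_1, β_2. It is an illustration, not the implemented version of Chap. 6:

```python
import numpy as np

def damped_ggn(F, G, Fp, Gp, x0, delta=0.25, gamma=0.5,
               beta1=1.0, beta2=2.0, tol=1e-10, kmax=100):
    """Sketch of the model algorithm: damped generalized Gauss-Newton
    with a line search on the exact penalty phi(x, mu) = ||F|| + mu*||G||."""
    phi = lambda x, mu: np.linalg.norm(F(x)) + mu * np.linalg.norm(G(x))
    x = np.asarray(x0, float)
    mu = 0.0
    for _ in range(kmax):
        Fx, Gx, J, K = F(x), G(x), Fp(x), Gp(x)
        # Step 2: correction d from the linearized problem LLQEC
        E = np.eye(x.size) - np.linalg.pinv(K) @ K
        Cp = np.linalg.pinv(J @ E)
        d = -Cp @ Fx - (np.eye(x.size) - Cp @ J) @ np.linalg.pinv(K) @ Gx
        if np.linalg.norm(d) < tol:
            break
        # Step 3 (schematic): keep mu safely above the lower bound omega(x)
        r = np.linalg.norm(Fx + J @ d)
        den = (np.linalg.norm(Fx) + r) * np.linalg.norm(Gx)
        omega = (r**2 - np.linalg.norm(Fx)**2) / den if den > 0 else 0.0
        if mu < omega + beta1:
            mu = omega + beta2
        # Steps 4/5: Armijo-type backtracking on phi
        pred = phi(x, mu) - (r + mu * np.linalg.norm(Gx + K @ d))
        alpha = 1.0
        while phi(x, mu) - phi(x + alpha * d, mu) < delta * alpha * pred:
            alpha *= gamma
            if alpha < 1e-12:
                break
        x = x + alpha * d
    return x
```

On a tiny zero-residual example with a linear constraint, the sketch takes the full step and terminates at the constrained minimizer.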
(ii) Let {x_k} be infinite and belonging to a compact set W. Then d_k ≠ 0 for all k. By definition, |ω(x)| is bounded by Ω(x) and, therefore, |ω(x)| ≤ max{Ω(x): x ∈ W}, since Ω is continuous on the compact set W. As a consequence, μ_k = μ̄ holds for all k ≥ k_0 with k_0 sufficiently large, due to the updating algorithm for the penalty parameter μ_k. This implies
To simplify the proof we restrict ourselves to the case n > l > 0, ||F(x_k)|| ≠ 0, and ||G(x_k)|| ≠ 0 for all k. Otherwise some obvious modifications are necessary.
From inequality (4.8) we obtain φ(x_k, μ) − φ(x_k + α_k d_k, μ) ≥ δ α_k [φ(x_k, μ) − Φ(x_k, d_k, μ)]. Now we are going to show that the α_k are bounded below by some positive constant. Then, obviously, lim_{k→∞} [φ(x_k, μ) − Φ(x_k, d_k, μ)] = 0 holds. In order to prove α_k ≥ ᾱ > 0 for k ≥ k_0 we use Theorem 4.1 to derive a lower bound for the ratio [φ(x_k, μ) − Φ(x_k, d_k, μ)]/||d_k||^2. Assumption (A) together with the continuity of F' and G' implies the validity of
m_1 ||z|| ≤ || [F'(x); G'(x)] z || ≤ M_1 ||z||,   ∀ z ∈ R^n, ∀ x ∈ W,   (4.9)

and

m_2 ||y|| ≤ ||G'(x)^T y|| ≤ M_2 ||y||,   ∀ y ∈ R^l, ∀ x ∈ W,   (4.10)
with positive constants m_i, M_i. For simplicity of notation let us drop the index k. Since I − G'^+ G' is an orthogonal projector, (4.9) yields ||C^+|| ≤ 1/m_1 with C = F'(I − G'^+ G'). So we can conclude further

||d||^2 = ||C^+ F + (I − C^+ F') G'^+ G||^2 ≤ 2 ( ||C^+ C^{+T} C^T F||^2 + ||(I − C^+ F') G'^+ G||^2 )
        ≤ 2/m_1^4 ||C^T F||^2 + 2 (1 + M_1/m_1)^2 / m_2^2 ||G||^2,

||F'd + F|| = ||(I − P)(F − F' G'^+ G)|| ≤ ||F|| + M_1/m_2 ||G||,

and

||PF|| = ||C^{+T} C^T F|| ≥ ||C^T F|| / ||C|| ≥ ||C^T F|| / M_1.
Combining these three inequalities with the identity (4.5) and (4.9) we obtain

[φ(x, μ) − Φ(x, d, μ)] / ||d||^2 ≥ (1/2) min{ m_1^4 / [M_1^2 (2 ||F|| + M_1/m_2 ||G||)],  μ / [(1 + M_1/m_1)^2 / m_2^2 ||G||] }.   (4.11)
α_k ≥ min{ 1, 2 (1 − δ) β γ / (L_F + μ̄ L_G) }.
Therefore, lim_{k→∞} [φ(x_k, μ̄) − Φ(x_k, d_k, μ̄)] = 0 holds. If we use again the identity (4.5) we obtain

lim_{k→∞} ( || [I − G'(x_k)^+ G'(x_k)] F'(x_k)^T F(x_k) || + ||G(x_k)|| ) = 0.
We now investigate the behaviour of the model algorithm if the sequence {x_k} converges to a point x_* which is necessarily stationary, i.e., which satisfies the first order conditions.
Theorem 5.1. Assume that the sequence {x_k} generated by the model algorithm converges to a point x_* and that the assumptions of Theorem 4.2(ii) are fulfilled. If now ||F(x_*)|| is sufficiently small, then α_k = 1 for all sufficiently large k, and {x_k} converges Q-linearly toward x_*.
Proof. Suppose that x ≠ x_* lies in a sufficiently small neighbourhood of x_*. We consider the quotient

Q = [φ(x + αd, μ) − Φ(x, αd, μ)] / [φ(x, μ) − Φ(x, αd, μ)].   (5.1)
From this inequality we obtain, together with the convexity of Φ with respect to α and (4.11),

Q ≤ (1/2) (L_F + μ L_G) ||d||^2 / [φ(x, μ) − Φ(x, d, μ)]
  ≤ (L_F + μ L_G) max{ M_1^2 (2 ||F|| + M_1/m_2 ||G||) / m_1^4,  [(1 + M_1/m_1)^2 / m_2^2] ||G|| / μ }.   (5.2)
The last inequality is only another form of the test (4.8). It is now evident from Step 2 that α_k = 1 for all k ≥ k_2 ≥ k_1. Let us remark that in the unconstrained case (l = 0) Theorem 5.1 can even be proved if ρ_* < 1 − δ, where ρ_* is the spectral radius of the matrix T'(x_*) from (3.7), see [22].
In the following we will discuss another penalty function, namely

φ_2(x, μ) = ||F(x)||^2 + μ ||G(x)||.
n = 2, m = l = 1, x = (x_1, x_2)^T ∈ R^2,

d_1 = −x_1,
d_2 = −(x_2^3 + x_2) / (3 x_2^2 + 1) = −x_2 + 2 x_2^3 / (3 x_2^2 + 1),
and the values φ_2(x, μ) and

φ_2(x + d, μ) = ||F(x + d)||^2 + μ ||G(x + d)||

can be compared explicitly; the ratio of these two expressions tends to 1/2 if x_2 tends to zero. Therefore it is impossible to prove that even φ_2(x + d, μ) < φ_2(x, μ) holds for all feasible x sufficiently near x_* if μ is chosen greater than one.
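The two closed forms of d_2 given above, −(x_2^3 + x_2)/(3 x_2^2 + 1) and −x_2 + 2 x_2^3/(3 x_2^2 + 1), are algebraically identical, as a quick numerical check at an arbitrary sample point confirms:

```python
# Both closed forms of the second correction component d2 from the
# counterexample; x2 = 0.37 is an arbitrary sample point.
x2 = 0.37
d2_a = -(x2**3 + x2) / (3 * x2**2 + 1)
d2_b = -x2 + 2 * x2**3 / (3 * x2**2 + 1)
diff = abs(d2_a - d2_b)
```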
We will now further specify our model algorithm described in Chap. 4. The solution of the linearized problem LLQEC in Step 3 is obtained by the algorithm LSE described in [10, p. 139], whereby a generalized solution in the sense of (3.4) is computed if the matrices are rank deficient. The calculation of the value ω(x) can be done simultaneously, as will be sketched in the following.
As is seen from the solution formula (3.3), d can be represented in the form d = d_0 + d_G with d_0 = −(F'E)^+ F and d_G = −[I − (F'E)^+ F'] G'^+ G. Then

ω(x) = (d_G^T F'^T F' d_G + 2 F^T F' d_G − d_0^T F'^T F' d_0) / [(||F|| + ||F'd + F||)(||G|| − ||G'd + G||)]   if the denominator is not zero,
ω(x) = 0   otherwise.
The quantities arising on the right hand side can be calculated as a byproduct
of the algorithm LSE mentioned above with only small additional cost.
Instead of the simple stepsize strategy used in the model algorithm, a more sophisticated strategy has been incorporated in the implemented version. The ideas for this refinement come from the trust region approach, another well known globalization scheme in nonlinear optimization, cf. [8]. Steps 4 to 6 of the model algorithm now read as follows, where additional constants δ_1, δ_2 and γ_1, …, γ_4 are used.
While
φ(x_k, μ_k) − φ(x_k + α_N d_k, μ_k) ≥ δ_2 [φ(x_k, μ_k) − Φ(x_k, α_N d_k, μ_k)]
do α_O = α_N; α_N ∈ [min{γ_3 α_N, 1}, min{γ_4 α_N, 1}]
if
φ(x_k, μ_k) − φ(x_k + α_N d_k, μ_k) ≥ δ_1 [φ(x_k, μ_k) − Φ(x_k, α_N d_k, μ_k)]
then α_k = α_N
else α_k = α_O
While
φ(x_k, μ_k) − φ(x_k + α_k d_k, μ_k) < δ_1 [φ(x_k, μ_k) − Φ(x_k, α_k d_k, μ_k)]
do α_k ∈ [γ_1 α_k, γ_2 α_k]
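The expand/accept/shrink logic of these steps can be sketched as follows; merit and model stand for the actual and predicted decrease of φ at a trial stepsize, and the concrete points chosen from the intervals (here simply midpoints) as well as the default constants are our assumptions:

```python
def refined_stepsize(merit, model, alpha_N, delta1=0.1, delta2=0.7,
                     g1=0.1, g2=0.5, g3=1.5, g4=4.0):
    """Sketch of the trust-region-like refinement of Steps 4-6.

    merit(a) -> phi(x_k, mu_k) - phi(x_k + a*d_k, mu_k)  (actual decrease)
    model(a) -> phi(x_k, mu_k) - Phi(x_k, a*d_k, mu_k)   (predicted decrease)
    """
    a_O = alpha_N
    # expansion phase: enlarge the trial step while the stronger test holds
    while merit(alpha_N) >= delta2 * model(alpha_N) and alpha_N < 1.0:
        a_O = alpha_N
        alpha_N = min(0.5 * (g3 + g4) * alpha_N, 1.0)  # a point of the interval
    # acceptance: take alpha_N if the weaker delta1-test holds, else fall back
    a = alpha_N if merit(alpha_N) >= delta1 * model(alpha_N) else a_O
    # reduction phase: shrink until the delta1-test is met
    while merit(a) < delta1 * model(a) and a > 1e-12:
        a = 0.5 * (g1 + g2) * a
    return a
```

For a merit decrease behaving like a − a^2 against a linear model a, the routine expands the step until the strong test fails and returns a stepsize that passes the δ_1-test.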
Here the α_N and α_k in the intervals specified above are determined by minimizing an approximation of the scalar function
F_3(x) = (x_4 − 1)^2
F_4(x) = (x_5 − 1)^3
G_1(x) = x_4 x_5 + sin(x_4 − x_5) − 1
G_2(x) = x_2 + x_1^2 x_2 − 2
x_0 = (0.5, 1.75, 0.5, 2, 2)^T
P5 (see [23]):
n = 4, m = 4, l = 2
F_1(x) = x_1 − 2.5
F_2(x) = x_2 − 2.5
F_4(x) = x_4 + 3.5
G_1(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1 − x_2 + x_3 − x_4 − 8
G_2(x) = 2 x_1^2 + x_2^2 + x_3^2 + 2 x_1 − x_2 − x_4 − 5
x_0 = (0, 0, 0, 0)^T
In [23] this problem is actually an inequality constrained problem; here we have only used the constraints that are active at the solution point as equality constraints.
The numerical results, computed on an EC 1040 in double precision, are listed in the following table.
In the table, the problem number P, the starting and final values of F and G, the final value of the penalty parameter μ, the number NI of iterations, the number NF of evaluations of the functions F and G, and the reason T for terminating the algorithm are given. The iteration is terminated if one of the following criteria is satisfied:
References
1. Al-Baali, M., Fletcher, R.: An efficient line search for nonlinear least squares. J. Optimization Theory Appl. 48, 359-377 (1986)
2. Bock, H.G.: Recent advances in parameter identification techniques for o.d.e. In: Deuflhard, P., Hairer, E. (eds.) Numerical treatment of inverse problems in differential and integral equations. Prog. Sci. Comput., pp. 95-121. Boston: Birkhäuser 1983
3. Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften, Nr. 183. Bonn 1987
4. Britt, H.I., Luecke, R.H.: The estimation of parameters in nonlinear implicit models. Technometrics 15, 233-247 (1973)
5. Campbell, S.L., Meyer, C.D.: Generalized inverses of linear transformations. London: Pitman 1979
6. Deuflhard, P., Apostolescu, V.: An underrelaxed Gauss-Newton method for equality constrained nonlinear least squares problems. In: Balakrishnan, A.V., Thoma, M. (eds.) Proc. IFIP Conf. Optimization Techniques, Part 2. Lect. Notes Control Inf. Sci., Vol. 7, pp. 22-32. Berlin Heidelberg New York: Springer 1978
7. Fletcher, R.: Practical methods of optimization, Vol. 2: Constrained optimization. New York Toronto: Wiley 1982
8. Fletcher, R.: An l1 penalty method for nonlinear constraints. In: Boggs, P.T., Byrd, R.H., Schnabel, R.B. (eds.) Numerical optimization 1984, pp. 26-40. Philadelphia: SIAM Publications 1985
9. Han, S.-P.: A globally convergent method for nonlinear programming. J. Optimization Theory Appl. 22, 297-309 (1977)
10. Lawson, C.L., Hanson, R.J.: Solving least squares problems. Englewood Cliffs: Prentice-Hall 1974
11. Lindström, P.: A general purpose algorithm for nonlinear least squares problems with nonlinear constraints. University of Umeå, Report UMINF-103.83 (1983)
12. Lindström, P., Wedin, P.-Å.: A new linesearch algorithm for nonlinear least squares problems. Math. Program. 29, 268-296 (1984)
13. Lindström, P., Wedin, P.-Å.: Methods and software for nonlinear least squares problems. University of Umeå, Report UMINF-133.87 (1987)
14. Mayne, D.Q., Maratos, N.: A first order, exact penalty function algorithm for equality constrained optimization problems. Math. Program. 16, 303-324 (1979)
15. Mayne, D.Q., Polak, E.: A superlinearly convergent algorithm for constrained optimization problems. In: Algorithms for constrained minimization of smooth nonlinear functions. Math. Program. Study, Vol. 16, pp. 45-61. Amsterdam: North Holland
16. Miele, A., Huang, H.Y., Heidemann, J.C.: Sequential gradient restoration algorithm for the minimization of constrained functions - ordinary and conjugate gradient versions. J. Optimization Theory Appl. 4, 213-243 (1969)
17. Mifflin, R.: Stationarity and superlinear convergence of an algorithm for univariate locally Lipschitz constrained minimization. Math. Program. 28, 50-71 (1984)
18. Mukai, H., Polak, E.: On the use of approximations in algorithms for optimization problems with equality and inequality constraints. SIAM J. Numer. Anal. 15, 674-693 (1978)
19. Ortega, J.M., Rheinboldt, W.C.: Iterative solution of nonlinear equations in several variables. New York: Academic Press 1970
20. Powell, M.J.D.: A fast algorithm for nonlinearly constrained optimization calculations. In: Watson, G.A. (ed.) Numerical Analysis, Proceedings Dundee 1977. Lecture Notes in Mathematics 630. Berlin Heidelberg New York: Springer 1978
21. Powell, M.J.D.: Variable metric methods for constrained optimization. In: Bachem, A., Grötschel, M., Korte, B. (eds.) Mathematical Programming: The State of the Art, pp. 288-311. Berlin Heidelberg New York: Springer 1983
22. Ramsin, H., Wedin, P.-Å.: A comparison of some algorithms for the nonlinear least squares problem. BIT 17, 72-90 (1977)
23. Rosen, J.B., Suzuki, S.: Construction of nonlinear programming test problems. Commun. ACM 8, 113 (1965)
24. Schwetlick, H.: Numerische Lösung nichtlinearer Gleichungen. Berlin: VEB Deutscher Verlag der Wissenschaften 1979
25. Wedin, P.-Å.: On the use of a quadratic merit function for constrained nonlinear least squares. University of Umeå, Report UMINF-135.87 (1987)