A Globalization Scheme
for the Generalized Gauss-Newton Method
O. Knoth
Sektion Mathematik, Martin-Luther-Universität Halle-Wittenberg, DDR-4010 Halle, PF,
German Democratic Republic
1 Introduction
We consider the nonlinear least squares problem with equality constraints (NLQEC)

min ||F(x)||   subject to   G(x) = 0,

where F: R^n → R^m, G: R^n → R^l with m + l ≥ n ≥ 1, and ||·|| denotes the Euclidean norm. Problems of this type arise, e.g., in implicit parameter estimation problems, see [4], or in discretized control and identification problems, cf. [2].
Local solutions x_* of NLQEC can often be computed in an efficient way by the generalized Gauss-Newton method. In this method, the current iterate x_k is improved by solving the corresponding linearized problem in which F and G are replaced by their first order Taylor expansions at x_k. Under natural conditions, that method is locally convergent with a linear rate of convergence if ||F(x_*)|| is sufficiently small and if the nonlinearities in the problem functions are moderate. The rate of convergence is ultimately quadratic if F(x_*) = 0. Though the zero-residual case is an idealized situation in real applications such as nonlinear parameter estimation, it should be taken into consideration as the limit case of error-free observations. Moreover, the small residual condition just mentioned guarantees that a stationary point is a strong local minimizer and stable under small perturbations. These properties, however, are in a certain sense necessary for a parameter estimation problem to be well posed, compare [3] for a detailed discussion.
Some ways of globalizing the generalized Gauss-Newton method have been proposed and tested in [6] and in [11]. Both methods work well in practice. Whereas there is no strong theoretical justification of the method from [6], the global convergence of the method of [11] has recently been proved in the reports [13] and [25].
The aim of this paper is to describe an alternative globalization that allows us to prove strong global as well as local convergence results. As in the papers mentioned above, the damped Gauss-Newton method is taken as the basic method, but here the damping factor is determined by using the exact nondifferentiable penalty function

φ(x, μ) = ||F(x)|| + μ ||G(x)||,   μ > 0.

This penalty function is taken as a line search function, as has been done in existing sequential quadratic programming algorithms for solving general nonlinear optimization problems, see [9, 15, 20, 21]. In contrast to these authors, who in general use the l1-norm for the penalty term, we choose the Euclidean norm, which seems more natural in the least squares context.
Two problems are connected with the use of φ as a line search function. The first concerns rules for choosing the penalty parameter μ which should ensure that the solution of the linearized problem is a descent direction for the penalty function and that global convergence is guaranteed. An updating scheme for μ proposed in [14] can be adapted to our case in a natural manner. Secondly, the algorithm should share the good convergence properties which the undamped Gauss-Newton method has in the small residual case. For the new method the undamped Gauss-Newton step is shown to be accepted after a finite number of steps in case of convergence toward a local minimizer with sufficiently small ||F(x_*)||. This assertion is not true when φ is replaced by φ_2.
The second and third chapters are introductory and give some background material on local optimality conditions and the generalized Gauss-Newton method. The main part of the paper is Chap. 4. There the globalization scheme is described, a model algorithm is proposed, and a proof of its global convergence is given. The behaviour of the model algorithm near a local solution of NLQEC is analyzed in Chap. 5 and, by a counterexample, the difference between φ(x, μ) and φ_2(x, μ) is demonstrated. Implementational details and computational results are the content of Chap. 6.
In the following, subscripts will denote the iteration index. The Jacobians of F(x) and G(x) are denoted by F'(x) and G'(x), resp. For brevity we often omit the arguments and write F = F(x), F' = F'(x), F_k = F(x_k), and so on. As matrix norm we always take the spectral norm, and A^+ denotes the Moore-Penrose inverse of a matrix A.
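Both notational ingredients are directly available in standard linear algebra software; as a small illustration (numpy assumed, not part of the algorithmic development), the spectral norm and the Moore-Penrose inverse of a rectangular matrix can be obtained as follows:

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0]])

spec = np.linalg.norm(A, 2)   # spectral norm = largest singular value of A
Ap = np.linalg.pinv(A)        # Moore-Penrose inverse A^+

# A^+ satisfies the Penrose conditions, e.g. A A^+ A = A
residual = np.linalg.norm(A @ Ap @ A - A)
```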
2 Optimality Conditions
rank(G'(x)) = l,   rank( [F'(x); G'(x)] ) = n   (A)

hold.
From (2.3a) the optimal Lagrange parameter λ_* can be explicitly expressed as

λ_* = −(G'_*^+)^T F'_*^T F_*.   (2.4)
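For a numerical sanity check of (2.4), one can verify that this expression solves the multiplier least squares problem G'^T λ = −F'^T F. The sketch below uses numpy and hypothetical variable names:

```python
import numpy as np

def lagrange_multiplier(Fv, Jf, Jg):
    """Multiplier estimate lambda_* = -(G'^+)^T F'^T F as in (2.4).

    Fv: residual vector F(x); Jf: Jacobian F'(x); Jg: Jacobian G'(x).
    """
    return -np.linalg.pinv(Jg).T @ (Jf.T @ Fv)
```

When G' has full row rank, the normal-equations residual G'(F'^T F + G'^T λ) vanishes, which is easy to test on random data.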
then x_* is a strict local solution of NLQEC, i.e., the inequality (2.1) holds in the strict sense for x ≠ x_*.
As usual, we have denoted by ∇_x L and ∇_λ L the gradients of the Lagrangian L with respect to x and λ, resp., and by ∇_x^2 L the Hessian with respect to x.
The solution d is unique because of the full rank assumption (A), cf. [10]. The improved iterate x^+ is then given by

x^+ = x + d.   (3.1)

If we substitute this expression into (3.2a) and multiply from the left by the orthoprojector E = I − G'^+ G', we obtain the equation

d = −G'^+ G − (F'E)^+ (F − F' G'^+ G),   (3.3)

where we have used the identity E(F'E)^+ = (F'E)^+. Let us still remark that if the full rank assumption (A) is not satisfied, the correction d defined by (3.3) is the unique solution of the more general linear least squares problem
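Assuming dense matrices and using pseudoinverses directly (rather than the numerically preferable orthogonal factorizations of the LSE algorithm of [10]), the correction d can be sketched as follows; the helper name ggn_step is ours:

```python
import numpy as np

def ggn_step(F, G, Fp, Gp):
    """Generalized Gauss-Newton correction d for the linearized problem
        min ||F + F'd||   subject to   G + G'd = 0,
    computed via the orthoprojector E = I - G'^+ G' onto ker(G')."""
    n = Fp.shape[1]
    E = np.eye(n) - np.linalg.pinv(Gp) @ Gp      # orthoprojector onto ker(G')
    Cp = np.linalg.pinv(Fp @ E)                  # (F'E)^+
    # d = -C^+ F - (I - C^+ F') G'^+ G, equivalent to the formula above
    return -Cp @ F - (np.eye(n) - Cp @ Fp) @ np.linalg.pinv(Gp) @ G
```

Under the full rank assumption (A), the returned d satisfies the linearized constraint exactly and is stationary for the residual along ker(G').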
and apply the Ostrowski theorem to (3.5), cf. [19] for a formulation of this theorem. Let us note that, under the assumption (A), x_* is a fixed point of T if and only if x_* is a stationary point of NLQEC. Since the Ostrowski theorem requires T to be continuously differentiable in a neighbourhood of the fixed point x_*, we have to show that d is a continuously differentiable function of x. This property, however, follows immediately from the fact that the pair {d, v} solves the linear system (3.2) and that the matrix of that system is regular and continuously differentiable due to the assumption above. Moreover, v = v(x) has the same smoothness properties as d(x). Therefore we can apply Ostrowski's results, which lead to the following theorem. Since Ostrowski's results are only local ones, it suffices to require smoothness and the full rank assumption only in a neighbourhood of a stationary point x_*.
Theorem 3.1. Suppose that F and G are twice continuously differentiable in an open neighbourhood of a point x_*, let x_* satisfy the necessary first order optimality conditions and the full rank assumption (A). Then the generalized Gauss-Newton method

x_{k+1} = x_k + d(x_k),   k = 0, 1, 2, …,
[ F'_*^T F'_*, G'_*^T ; G'_*, 0 ] [ T'_* ; v'_* ] + [ F''_* ∘ F_* + G''_* ∘ v_* ; 0 ] = 0   (3.6)
where

F''(x) ∘ F(x) = Σ_{j=1}^m F_j(x) F''_j(x)

and

G''(x) ∘ v(x) = Σ_{i=1}^l v_i(x) G''_i(x),
resp. The coefficient matrix of (3.6) coincides with that of (3.2) for x = x_*, so that the solution can be obtained in the same way as d and v. Therefore, with

v_* = λ_* = −(G'_*^+)^T F'_*^T F_*,

the local convergence condition reads

ρ_* = ρ(T'(x_*)) < 1.
Let us remark that the eigenvalues τ of T'_* are just the eigenvalues of the generalized eigenvalue problem
Now, ρ_* < 1 is equivalent to |τ| < 1 and, therefore, implies 1 + τ > 0. Since R(E_*) = ker(G'_*) and R(E_*) ∩ ker(F'_*) = {0} because of (A), we see that the convergence condition ρ_* < 1 implies that the second order sufficient condition is satisfied. As in the unconstrained case, the converse assertion is, unfortunately, not true: the second order sufficient condition does not, in general, imply that ρ_* < 1 holds. In order to guarantee local convergence toward a stationary point x_* at which ρ_* < 1 is not satisfied, a sufficiently good approximation of the second order terms F'' ∘ F and G'' ∘ v must be incorporated into the algorithm. This situation corresponds to the "large residual case" for unconstrained problems.
holds, see the comment below. Then choose α ∈ (0, 1] sufficiently large such that
lim_{α↓0} [φ(x + αd, μ) − φ(x, μ)]/α = lim_{α↓0} [Φ(x, αd, μ) − φ(x, μ)]/α ≤ Φ(x, d, μ) − φ(x, μ) < 0.
Now we discuss how the penalty parameter μ has to be chosen such that (4.2) is valid. The derivation of a lower estimate for μ goes in the spirit of [21] but uses the special structure of our penalty function and the explicit representation of the correction d, whereas in [21] the necessary optimality condition of the linearized problem, expressed in terms of d and the corresponding Lagrange parameter v, is used. Let us suppose for the moment that
When we define

ω(x) = (||F'd + F||^2 − ||F||^2) / [(||F|| + ||F'd + F||) ||G||]   if (||F|| + ||F'd + F||) ||G|| ≠ 0,
ω(x) = 0   otherwise,
then, from the identity (4.5), we obtain the following desired result.
Lemma 4.1. Assume that x is not a stationary point of NLQEC, that

φ(x, μ) − Φ(x, d, μ) > 0

holds, and that F'(x) and G'(x) are Lipschitz continuous with constants L_F and L_G, resp. Then there exists a number ᾱ ∈ (0, 1] such that
Model Algorithm
Step 0: Choose δ ∈ (0, 1), x_0 ∈ R^n, γ ∈ (0, 1), β_1, β_2 ∈ (0, ∞) with β_2 > β_1. Set α_{−1} = 1, μ_{−1} = 0, k = 0. Calculate F_0 = F(x_0) and G_0 = G(x_0).
Step 1: Calculate F'_k = F'(x_k) and G'_k = G'(x_k).
Step 2: Compute d_k as the solution of LLQEC for x = x_k.
do α_k = γ α_k.
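The model algorithm can be sketched in Python as follows. This is a schematic reading of Steps 0 to 5 under the full rank assumption (A): the linearized subproblem is solved densely via pseudoinverses, and the penalty update keeps μ above the lower bound ω(x) with placeholder constants β_1, β_2. It is an illustration, not the implemented version of Chap. 6:

```python
import numpy as np

def damped_ggn(F, G, Fp, Gp, x0, delta=0.25, gamma=0.5,
               beta1=1.0, beta2=2.0, tol=1e-10, kmax=100):
    """Sketch of the model algorithm: damped generalized Gauss-Newton
    with a line search on the exact penalty phi(x, mu) = ||F|| + mu*||G||."""
    phi = lambda x, mu: np.linalg.norm(F(x)) + mu * np.linalg.norm(G(x))
    x = np.asarray(x0, float)
    mu = 0.0
    for _ in range(kmax):
        Fx, Gx, J, K = F(x), G(x), Fp(x), Gp(x)
        # Step 2: correction d from the linearized problem LLQEC
        E = np.eye(x.size) - np.linalg.pinv(K) @ K
        Cp = np.linalg.pinv(J @ E)
        d = -Cp @ Fx - (np.eye(x.size) - Cp @ J) @ np.linalg.pinv(K) @ Gx
        if np.linalg.norm(d) < tol:
            break
        # Step 3 (schematic): keep mu safely above the lower bound omega(x)
        r = np.linalg.norm(Fx + J @ d)
        den = (np.linalg.norm(Fx) + r) * np.linalg.norm(Gx)
        omega = (r**2 - np.linalg.norm(Fx)**2) / den if den > 0 else 0.0
        if mu < omega + beta1:
            mu = omega + beta2
        # Steps 4/5: Armijo-type backtracking on phi
        pred = phi(x, mu) - (r + mu * np.linalg.norm(Gx + K @ d))
        alpha = 1.0
        while phi(x, mu) - phi(x + alpha * d, mu) < delta * alpha * pred:
            alpha *= gamma
            if alpha < 1e-12:
                break
        x = x + alpha * d
    return x
```

On a tiny zero-residual example with a linear constraint, the sketch takes the full step and terminates at the constrained minimizer.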
(ii) Let {x_k} be infinite and belonging to a compact set W. Then d_k ≠ 0 for all k. By definition, |ω(x)| is bounded by Ω(x) and, therefore, |ω(x)| ≤ max{Ω(x): x ∈ W}, since Ω is continuous on the compact set W. As a consequence, μ_k = μ̄ holds for all k ≥ k_0 with k_0 sufficiently large, due to the updating algorithm for the penalty parameter μ_k. This implies
To simplify the proof we restrict ourselves to the case n > l > 0, ||F(x_k)|| ≠ 0, and ||G(x_k)|| ≠ 0 for all k. Otherwise some obvious modifications are necessary.
From inequality (4.8) we obtain φ(x_k, μ) − φ(x_k + α_k d_k, μ) ≥ δ α_k [φ(x_k, μ) − Φ(x_k, d_k, μ)]. Now we are going to show that the α_k are bounded below by some positive constant. Then, obviously, lim_{k→∞} [φ(x_k, μ) − Φ(x_k, d_k, μ)] = 0 holds. In order to prove α_k ≥ ᾱ > 0 for k ≥ k_0 we use Theorem 4.1 to derive a lower bound for the ratio [φ(x_k, μ) − Φ(x_k, d_k, μ)]/||d_k||^2. Assumption (A) together with the continuity of F' and G' implies the validity of
m_1 ||z|| ≤ || [F'(x); G'(x)] z || ≤ M_1 ||z||,   ∀ z ∈ R^n, ∀ x ∈ W,   (4.9)

and

m_2 ||y|| ≤ ||G'(x)^T y|| ≤ M_2 ||y||,   ∀ y ∈ R^l, ∀ x ∈ W,   (4.10)
with positive constants m_i, M_i. For simplicity of notation let us drop the index k. Since I − G'^+ G' is an orthogonal projector, (4.9) yields ||C^+|| ≤ 1/m_1 with C = F'(I − G'^+ G'). So we can conclude further

||d||^2 = ||C^+ F + (I − C^+ F') G'^+ G||^2 ≤ 2 ( ||C^+ C^{+T} C^T F||^2 + ||(I − C^+ F') G'^+ G||^2 )
        ≤ 2/m_1^4 ||C^T F||^2 + 2 (1 + M_1/m_1)^2 / m_2^2 ||G||^2,

||F'd + F|| = ||(I − P)(F − F' G'^+ G)|| ≤ ||F|| + M_1/m_2 ||G||,

and

||PF|| = ||C^{+T} C^T F|| ≥ ||C^T F|| / ||C|| ≥ ||C^T F|| / M_1.
Combining these three inequalities with the identity (4.5) and (4.9) we obtain

[φ(x, μ) − Φ(x, d, μ)] / ||d||^2 ≥ (1/2) min{ m_1^4 / [M_1^2 (2 ||F|| + M_1/m_2 ||G||)],  μ / [(1 + M_1/m_1)^2 / m_2^2 ||G||] }.   (4.11)
α_k ≥ min{ 1, 2 (1 − δ) β γ / (L_F + μ̄ L_G) }.
Therefore, lim_{k→∞} [φ(x_k, μ̄) − Φ(x_k, d_k, μ̄)] = 0 holds. If we use again the identity (4.5) we obtain

lim_{k→∞} ( || [I − G'(x_k)^+ G'(x_k)] F'(x_k)^T F(x_k) || + ||G(x_k)|| ) = 0.
We now investigate the behaviour of the model algorithm if the sequence {x_k} converges to a point x_* which is necessarily stationary, i.e., which satisfies the first order conditions.
Theorem 5.1. Assume that the sequence {x_k} generated by the model algorithm converges to a point x_* and that the assumptions of Theorem 4.2(ii) are fulfilled. If now ||F(x_*)|| is sufficiently small, then α_k = 1 for all sufficiently large k, and {x_k} converges Q-linearly toward x_*.
Proof. Suppose that x ≠ x_* lies in a sufficiently small neighbourhood of x_*. We consider the quotient

Q = [φ(x + αd, μ) − Φ(x, αd, μ)] / [φ(x, μ) − Φ(x, αd, μ)].   (5.1)
From this inequality we obtain, together with the convexity of Φ with respect to α and (4.11),

Q ≤ (1/2) (L_F + μ L_G) ||d||^2 / [φ(x, μ) − Φ(x, d, μ)]
  ≤ (L_F + μ L_G) max{ M_1^2 (2 ||F|| + M_1/m_2 ||G||) / m_1^4,  [(1 + M_1/m_1)^2 / m_2^2] ||G|| / μ }.   (5.2)
The last inequality is only another form of the test (4.8). It is now evident from Step 2 that α_k = 1 for all k ≥ k_2 ≥ k_1. Let us remark that in the unconstrained case (l = 0) Theorem 5.1 can even be proved if ρ_* < 1 − δ, where ρ_* is the spectral radius of the matrix T'(x_*) from (3.7), see [22].
In the following we will discuss another penalty function, namely

φ_2(x, μ) = ||F(x)||^2 + μ ||G(x)||.
n = 2, m = l = 1, x = (x_1, x_2)^T ∈ R^2,

d_1 = −x_1,
d_2 = −(x_2^3 + x_2) / (3 x_2^2 + 1) = −x_2 + 2 x_2^3 / (3 x_2^2 + 1),
and the values φ_2(x, μ) and

φ_2(x + d, μ) = ||F(x + d)||^2 + μ ||G(x + d)||

can be compared explicitly; the ratio of these two expressions tends to 1/2 if x_2 tends to zero. Therefore it is impossible to prove that even φ_2(x + d, μ) < φ_2(x, μ) holds for all feasible x sufficiently near x_* if μ is chosen greater than one.
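The two closed forms of d_2 given above, −(x_2^3 + x_2)/(3 x_2^2 + 1) and −x_2 + 2 x_2^3/(3 x_2^2 + 1), are algebraically identical, as a quick numerical check at an arbitrary sample point confirms:

```python
# Both closed forms of the second correction component d2 from the
# counterexample; x2 = 0.37 is an arbitrary sample point.
x2 = 0.37
d2_a = -(x2**3 + x2) / (3 * x2**2 + 1)
d2_b = -x2 + 2 * x2**3 / (3 * x2**2 + 1)
diff = abs(d2_a - d2_b)
```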
We will now further specify our model algorithm described in Chap. 4. The solution of the linearized problem LLQEC in Step 3 is obtained by the algorithm LSE described in [10, p. 139], whereby a generalized solution in the sense of (3.4) is computed if the matrices are rank deficient. The calculation of the value ω(x) can be done simultaneously, as will be sketched in the following.
As is seen from the solution formula (3.3), d can be represented in the form d = d_0 + d_G with d_0 = −(F'E)^+ F and d_G = −[I − (F'E)^+ F'] G'^+ G. Then

ω(x) = (d_G^T F'^T F' d_G + 2 F^T F' d_G − d_0^T F'^T F' d_0) / [(||F|| + ||F'd + F||)(||G|| − ||G'd + G||)]   if the denominator is not zero,
ω(x) = 0   otherwise.
The quantities arising on the right hand side can be calculated as a byproduct
of the algorithm LSE mentioned above with only small additional cost.
Instead of the simple stepsize strategy used in the model algorithm, a more sophisticated strategy has been incorporated in the implemented version. The ideas for this refinement come from the trust region approach, another well known globalization scheme in nonlinear optimization, cf. [8]. Steps 4 to 6 of the model algorithm now read as follows, where additional constants δ_1, δ_2 and γ_1, …, γ_4 are used.
While
φ(x_k, μ_k) − φ(x_k + α_N d_k, μ_k) ≥ δ_2 [φ(x_k, μ_k) − Φ(x_k, α_N d_k, μ_k)]
do α_O = α_N; α_N ∈ [min{γ_3 α_N, 1}, min{γ_4 α_N, 1}]
if
φ(x_k, μ_k) − φ(x_k + α_N d_k, μ_k) ≥ δ_1 [φ(x_k, μ_k) − Φ(x_k, α_N d_k, μ_k)]
then α_k = α_N
else α_k = α_O
While
φ(x_k, μ_k) − φ(x_k + α_k d_k, μ_k) < δ_1 [φ(x_k, μ_k) − Φ(x_k, α_k d_k, μ_k)]
do α_k ∈ [γ_1 α_k, γ_2 α_k]
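The expand/accept/shrink logic of these steps can be sketched as follows; merit and model stand for the actual and predicted decrease of φ at a trial stepsize, and the concrete points chosen from the intervals (here simply midpoints) as well as the default constants are our assumptions:

```python
def refined_stepsize(merit, model, alpha_N, delta1=0.1, delta2=0.7,
                     g1=0.1, g2=0.5, g3=1.5, g4=4.0):
    """Sketch of the trust-region-like refinement of Steps 4-6.

    merit(a) -> phi(x_k, mu_k) - phi(x_k + a*d_k, mu_k)  (actual decrease)
    model(a) -> phi(x_k, mu_k) - Phi(x_k, a*d_k, mu_k)   (predicted decrease)
    """
    a_O = alpha_N
    # expansion phase: enlarge the trial step while the stronger test holds
    while merit(alpha_N) >= delta2 * model(alpha_N) and alpha_N < 1.0:
        a_O = alpha_N
        alpha_N = min(0.5 * (g3 + g4) * alpha_N, 1.0)  # a point of the interval
    # acceptance: take alpha_N if the weaker delta1-test holds, else fall back
    a = alpha_N if merit(alpha_N) >= delta1 * model(alpha_N) else a_O
    # reduction phase: shrink until the delta1-test is met
    while merit(a) < delta1 * model(a) and a > 1e-12:
        a = 0.5 * (g1 + g2) * a
    return a
```

For a merit decrease behaving like a − a^2 against a linear model a, the routine expands the step until the strong test fails and returns a stepsize that passes the δ_1-test.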
Here the α_N and α_k in the intervals specified above are determined by minimizing an approximation of the scalar function
F_3(x) = (x_4 − 1)^2
F_4(x) = (x_5 − 1)^3
G_1(x) = x_4 x_5 + sin(x_4 − x_5) − 1
G_2(x) = x_2 + x_1^2 x_2 − 2
x_0 = (0.5, 1.75, 0.5, 2, 2)^T
P5 (see [23]):
n = 4, m = 4, l = 2
F_1(x) = x_1 − 2.5
F_2(x) = x_2 − 2.5
F_4(x) = x_4 + 3.5
G_1(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_1 − x_2 + x_3 − x_4 − 8
G_2(x) = 2 x_1^2 + x_2^2 + x_3^2 + 2 x_1 − x_2 − x_4 − 5
x_0 = (0, 0, 0, 0)^T
In [23] this problem is actually an inequality constrained problem; here we have only used the constraints that are active at the solution point as equality constraints.
The numerical results, computed on an EC 1040 in double precision, are listed in the following table.
In the table, the problem number P, the starting and final values of F and G, the final value of the penalty parameter μ, the number NI of iterations, the number NF of evaluations of the functions F and G, and the reason T for terminating the algorithm are given. The iteration is terminated if one of the following criteria is satisfied:
References
1. Al-Baali, M., Fletcher, R.: An efficient line search for nonlinear least squares. J. Optimization Theory Appl. 48, 359-377 (1986)
2. Bock, H.G.: Recent advances in parameter identification techniques for o.d.e. In: Deuflhard, P., Hairer, E. (eds.) Numerical treatment of inverse problems in differential and integral equations. Prog. Sci. Comput., pp. 95-121. Boston: Birkhäuser 1983
3. Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen. Bonner Mathematische Schriften, Nr. 183. Bonn 1987
4. Britt, H.I., Luecke, R.H.: The estimation of parameters in nonlinear implicit models. Technometrics 15, 233-247 (1973)
5. Campbell, S.L., Meyer, C.D.: Generalized inverses of linear transformations. London: Pitman 1979
6. Deuflhard, P., Apostolescu, V.: An underrelaxed Gauss-Newton method for equality constrained nonlinear least squares problems. In: Balakrishnan, A.V., Thoma, M. (eds.) Proc. IFIP Conf. Optimization Techniques, Part 2. Lect. Notes Control Inf. Sci., Vol. 7, pp. 22-32. Berlin Heidelberg New York: Springer 1978
7. Fletcher, R.: Practical methods of optimization, Vol. 2: Constrained optimization. New York Toronto: Wiley 1982
8. Fletcher, R.: An l1 penalty method for nonlinear constraints. In: Boggs, P.T., Byrd, R.H., Schnabel, R.B. (eds.) Numerical optimization 1984, pp. 26-40. Philadelphia: SIAM Publications 1985
9. Han, S.-P.: A globally convergent method for nonlinear programming. J. Optimization Theory Appl. 22, 297-309 (1977)
10. Lawson, C.L., Hanson, R.J.: Solving least squares problems. Englewood Cliffs: Prentice-Hall 1974
11. Lindström, P.: A general purpose algorithm for nonlinear least squares problems with nonlinear constraints. University of Umeå, Report UMINF-103.83 (1983)
12. Lindström, P., Wedin, P.-Å.: A new linesearch algorithm for nonlinear least squares problems. Math. Program. 29, 268-296 (1984)
13. Lindström, P., Wedin, P.-Å.: Methods and software for nonlinear least squares problems. University of Umeå, Report UMINF-133.87 (1987)
14. Mayne, D.Q., Maratos, N.: A first order, exact penalty function algorithm for equality constrained optimization problems. Math. Program. 16, 303-324 (1979)
15. Mayne, D.Q., Polak, E.: A superlinearly convergent algorithm for constrained optimization problems. In: Algorithms for constrained minimization of smooth nonlinear functions. Math. Program. Study, Vol. 16, pp. 45-61. Amsterdam: North Holland
16. Miele, A., Huang, H.Y., Heidemann, J.C.: Sequential gradient restoration algorithm for the minimization of constrained functions - ordinary and conjugate gradient versions. J. Optimization Theory Appl. 4, 213-243 (1969)
17. Mifflin, R.: Stationarity and superlinear convergence of an algorithm for univariate locally Lipschitz constrained minimization. Math. Program. 28, 50-71 (1984)
18. Mukai, H., Polak, E.: On the use of approximations in algorithms for optimization problems with equality and inequality constraints. SIAM J. Numer. Anal. 15, 674-693 (1978)
19. Ortega, J.M., Rheinboldt, W.C.: Iterative solution of nonlinear equations in several variables. New York: Academic Press 1970
20. Powell, M.J.D.: A fast algorithm for nonlinearly constrained optimization calculations. In: Watson, G.A. (ed.) Numerical Analysis, Proceedings Dundee 1977. Lecture Notes in Mathematics 630. Berlin Heidelberg New York: Springer 1978
21. Powell, M.J.D.: Variable metric methods for constrained optimization. In: Bachem, A., Grötschel, M., Korte, B. (eds.) Mathematical Programming: The State of the Art, pp. 288-311. Berlin Heidelberg New York: Springer 1983
22. Ramsin, H., Wedin, P.-Å.: A comparison of some algorithms for the nonlinear least squares problem. BIT 17, 72-90 (1977)
23. Rosen, J.B., Suzuki, S.: Construction of nonlinear programming test problems. Commun. ACM 8, 113 (1965)
24. Schwetlick, H.: Numerische Lösung nichtlinearer Gleichungen. Berlin: VEB Deutscher Verlag der Wissenschaften 1979
25. Wedin, P.-Å.: On the use of a quadratic merit function for constrained nonlinear least squares. University of Umeå, Report UMINF-135.87 (1987)