JOTA: VOL. 20, NO. 1, SEPTEMBER 1976
TECHNICAL NOTE
Conjugate Gradient
Versus Steepest Descent 1
J. C. ALLWRIGHT 2
Communicated by D. Q. Mayne
1. Introduction
1 Thanks are due to Professor R. W. Sargent, Imperial College, London, England, for suggestions concerning presentation.
2 Lecturer, Department of Computing and Control, Imperial College, London, England.
© 1976 Plenum Publishing Corporation, 227 West 17th Street, New York, N.Y. 10011. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission of the publisher.
Steepest descent is the same, but with $\beta_k \equiv 0$. The sequences so generated are denoted $\{x_k\}$, $\{g_k\}$, $\{\alpha_k\}$.
The algorithms will be assumed to start at the same point $x_0$, so $\hat{x}_0 = x_0$. It can be shown that both $x_k$ (generated by steepest descent) and $\hat{x}_k$ (generated by conjugate gradient) belong to the same linear variety $x_0 + B_k$, for all $k \ge 1$. The variety is the translation along $x_0$ of the linear subspace spanned by the initial gradient $g_0$, $Qg_0, \ldots$, and $Q^{k-1}g_0$. It is known (Ref. 1, Sections 8.2 and 8.3) that $\hat{x}_k$ is optimal in $x_0 + B_k$, and so the steepest-descent algorithm cannot give a lower cost $J$ at each iteration $k$ than that given by the conjugate-gradient algorithm, i.e.,

$J[x_k] \ge J[\hat{x}_k]$.
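This ordering of the two cost sequences can be checked numerically. The sketch below is not the paper's construction: it runs textbook steepest-descent and conjugate-gradient iterations on an illustrative quadratic $J[x] = \frac{1}{2}\langle x, Qx\rangle - \langle b, x\rangle$ (the matrix $Q$, vector $b$, and starting point are invented for the example) and verifies $J[\hat{x}_k] \le J[x_k]$ at every iteration.

```python
# Illustrative check of J[x_hat_k] <= J[x_k]; Q, b, x0 are invented, not from the paper.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def J(Q, b, x):
    # J[x] = 0.5 <x, Qx> - <b, x>
    return 0.5 * dot(x, matvec(Q, x)) - dot(b, x)

Q = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # symmetric positive definite
b = [1.0, 2.0, 3.0]
x0 = [0.0, 0.0, 0.0]
iters = 3

# Steepest descent with exact line search: x_{k+1} = x_k - alpha_k g_k.
x = x0[:]
sd_costs = [J(Q, b, x)]
for _ in range(iters):
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]   # g = Qx - b
    alpha = dot(g, g) / dot(g, matvec(Q, g))
    x = [xi - alpha * gi for xi, gi in zip(x, g)]
    sd_costs.append(J(Q, b, x))

# Conjugate gradient on the same quadratic, in residual form (r = b - Qx = -g).
x = x0[:]
r = [bi - qi for bi, qi in zip(b, matvec(Q, x))]
d = r[:]
cg_costs = [J(Q, b, x)]
for _ in range(iters):
    Qd = matvec(Q, d)
    alpha = dot(r, r) / dot(d, Qd)
    x = [xi + alpha * di for xi, di in zip(x, d)]
    r_new = [ri - alpha * qi for ri, qi in zip(r, Qd)]
    beta = dot(r_new, r_new) / dot(r, r)
    d = [ri + beta * di for ri, di in zip(r_new, d)]
    r = r_new
    cg_costs.append(J(Q, b, x))

# CG is optimal over x0 + B_k, so its cost can never exceed steepest descent's.
assert all(c <= s + 1e-12 for c, s in zip(cg_costs, sd_costs))
print(sd_costs)
print(cg_costs)
```

Because $\hat{x}_k$ minimizes $J$ over all of $x_0 + B_k$ while $x_k$ merely lies in that variety, the final assertion cannot fail in exact arithmetic; the small tolerance only absorbs floating-point rounding.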
It is known also (Ref. 1, Section 7.6 and Exercise 10 of Section 8.8) that

$E[x_k] \le \theta^{2k} E[x_0]$,  (3)

$E[\hat{x}_k] \le 4\hat{\theta}^{2k} E[x_0]$,  (4)

where

$E[x] = J[x] - J^*$,
$J^* = \min_{x \in R^n} J[x]$,
$\theta = (1 - \gamma)/(1 + \gamma)$,
$\hat{\theta} = (1 - \sqrt{\gamma})/(1 + \sqrt{\gamma})$,
$\gamma = a/A$.

Here, $a$ is the smallest eigenvalue of $Q$ and $A$ is the largest eigenvalue.
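A quick numerical reading of these rate constants (the eigenvalues $a$ and $A$ below are illustrative, not from the paper): since $\sqrt{\gamma} > \gamma$ for $0 < \gamma < 1$, the conjugate-gradient constant $\hat{\theta}$ is always the smaller of the two.

```python
import math

# Rate constants from (3)-(4): theta = (1-g)/(1+g), theta_hat = (1-sqrt(g))/(1+sqrt(g)),
# with g = a/A the ratio of extreme eigenvalues of Q. Sample eigenvalues are illustrative.
a, A = 1.0, 100.0
gamma = a / A                                                # 0.01
theta = (1 - gamma) / (1 + gamma)                            # steepest-descent rate
theta_hat = (1 - math.sqrt(gamma)) / (1 + math.sqrt(gamma))  # conjugate-gradient rate
print(theta, theta_hat)

# theta_hat < theta for every gamma strictly between 0 and 1,
# because sqrt(gamma) > gamma there.
for g in (0.001, 0.1, 0.5, 0.9, 0.999):
    th = (1 - g) / (1 + g)
    th_hat = (1 - math.sqrt(g)) / (1 + math.sqrt(g))
    assert th_hat < th
```

For a condition number of 100, steepest descent contracts the excess cost by about $\theta \approx 0.98$ per iteration while the conjugate-gradient bound contracts by about $\hat{\theta} \approx 0.82$, which is the asymptotic advantage the text refers to.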
Now, $\hat{\theta} < \theta$, so (3) and (4) suggest that, asymptotically, $J[\hat{x}_k]$ converges to $J^*$ faster, with increasing iterations $k$, than $J[x_k]$ converges. They do not prove, however, that

$J[\hat{x}_k] < J[x_k]$,

as they give only upper bounds on the cost sequences. Such an inequality is established here.
2. Results
and so, as

$g_j = Qx_j - b \in R[Q] = N[Q]^{\perp}$,
$(g_j, g_{j+1}) = 0$.
Hence, $g_{j+1}$ cannot be an eigenvector and $g_{j+1} \ne 0$ [from (8)] if $g_j \ne 0$ and is not an eigenvector. Using this result iteratively when $g_0 \ne 0$ and is not an eigenvector establishes sufficiency and completes the proof.
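The two facts driving this argument can be seen in a small computation: with exact line search, consecutive steepest-descent gradients are orthogonal, and an eigenvector gradient is annihilated in a single step, while a non-eigenvector $g_0$ keeps every gradient nonzero. The sketch below uses an invented diagonal $Q$ and starting points, not anything from the paper.

```python
# Consecutive steepest-descent gradients under exact line search are orthogonal;
# an eigenvector gradient dies in one step. Q and the starting points are invented.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

Q = [[2.0, 0.0],
     [0.0, 5.0]]        # eigenvectors are the coordinate axes
b = [0.0, 0.0]          # minimizer is the origin, so g = Qx

def sd_step(x):
    """One exact-line-search steepest-descent step; returns (new point, gradient at x)."""
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]
    alpha = dot(g, g) / dot(g, matvec(Q, g))
    return [xi - alpha * gi for xi, gi in zip(x, g)], g

# Case 1: g0 is an eigenvector of Q -> one exact step reaches the minimum.
x1, g0 = sd_step([1.0, 0.0])            # g0 = [2, 0], an eigenvector
g1 = matvec(Q, x1)
assert max(abs(v) for v in g1) < 1e-12  # gradient annihilated

# Case 2: g0 is not an eigenvector -> gradients stay nonzero, but
# consecutive ones satisfy (g_j, g_{j+1}) = 0.
x = [1.0, 1.0]
prev_g = None
for _ in range(5):
    x, g = sd_step(x)
    assert dot(g, g) > 0.0              # never vanishes (Lemma 2.1)
    if prev_g is not None:
        assert abs(dot(prev_g, g)) < 1e-12
    prev_g = g
```

The orthogonality is immediate from the line search: $g_{j+1} = g_j - \alpha_j Q g_j$ with $\alpha_j = \langle g_j, g_j\rangle / \langle g_j, Qg_j\rangle$, so $\langle g_j, g_{j+1}\rangle = 0$.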
Now, let $H_k = H_k^T$ project $R^n$ orthogonally onto $B_k$, and consider applying the steepest-descent algorithm to minimizing

$J_k[x] \triangleq \langle g_0, [x - x_0] \rangle + \tfrac{1}{2} \langle [x - x_0], H_k Q H_k [x - x_0] \rangle$  (10)

on $R^n$, starting at $x_0^k = x_0$. Denote the sequences so constructed by $\{x_i^k\}$, $\{\alpha_i^k\}$, and $\{g_i^k\}$, where $g_i^k$ is the gradient of $J_k[x]$ at $x_i^k$. For the first $k$ iterations, relationships between vectors associated with the steepest-descent algorithm for minimizing $J[x]$, generating a sequence $\{x_k\}$ with gradients $\{g_k\}$, and the steepest-descent algorithm for minimizing $J_k[x]$ are provided by the following lemma.
Lemma 2.2 (i) follows immediately from this and the fact that, for steepest descent,

$x_k = x_0 + \sum_{i=0}^{k-1} \alpha_i g_i$,

from which Lemma 2.2 (ii) follows inductively, as $x_0^k = x_0$ and, from (10), $g_0^k = g_0$. Lemma 2.2 (iii) is established by (14) when $i = k - 1$.
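The relations used in this proof sketch can be exercised numerically. The sketch below (with an invented $Q$, $b$, $x_0$, and $k$, and with $B_k$ built by Gram-Schmidt on $g_0, Qg_0, \ldots, Q^{k-1}g_0$) runs steepest descent on both $J$ and the projected cost $J_k$ of (10), and checks that $x_i^k = x_i$ for $i \le k$, that $g_0^k = g_0$, and that $g_k^k = H_k g_k$, as read off above.

```python
# Exercise the Lemma 2.2-style relations between steepest descent on J and on
# the projected cost J_k of (10). Q, b, x0, and k are invented for illustration.

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def sub(u, v):
    return [a - c for a, c in zip(u, v)]

Q = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x0 = [0.2, -0.1, 0.5]
n, k = 3, 2

g0 = sub(matvec(Q, x0), b)                 # g0 = Q x0 - b

# Orthonormal basis of B_k = span{g0, Q g0, ..., Q^{k-1} g0} by Gram-Schmidt.
basis, w = [], g0[:]
for _ in range(k):
    v = w[:]
    for u in basis:
        c = dot(u, v)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    norm = dot(v, v) ** 0.5
    basis.append([vi / norm for vi in v])
    w = matvec(Q, w)

def H(v):                                  # orthogonal projection onto B_k
    out = [0.0] * n
    for u in basis:
        c = dot(u, v)
        out = [oi + c * ui for oi, ui in zip(out, u)]
    return out

def grad_J(x):
    return sub(matvec(Q, x), b)

def grad_Jk(x):                            # gradient of (10): g0 + Hk Q Hk (x - x0)
    return [a + c for a, c in zip(g0, H(matvec(Q, H(sub(x, x0)))))]

def sd(grad, curve, x, steps):
    """Exact-line-search steepest descent; `curve(g)` applies the Hessian to g."""
    xs, gs = [x[:]], [grad(x)]
    for _ in range(steps):
        g = gs[-1]
        alpha = dot(g, g) / dot(g, curve(g))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        xs.append(x[:])
        gs.append(grad(x))
    return xs, gs

xs, gs = sd(grad_J, lambda v: matvec(Q, v), x0, k)
xs_k, gs_k = sd(grad_Jk, lambda v: H(matvec(Q, H(v))), x0, k)

close = lambda u, v: max(abs(a - c) for a, c in zip(u, v)) < 1e-9
assert close(gs_k[0], g0)                  # g0^k = g0
for i in range(k + 1):
    assert close(xs_k[i], xs[i])           # x_i^k = x_i for i <= k
assert close(gs_k[k], H(gs[k]))            # g_k^k = Hk g_k
print("Lemma 2.2 relations hold for this example")
```

The agreement of the first $k$ iterates follows because each gradient $g_i$ with $i < k$ already lies in $B_k$, so projecting by $H_k$ changes neither the search direction nor the step length until iteration $k$.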
Consider next the generic case, with $g_0$ nonzero and not an eigenvector of $Q$. Then, Lemma 2.1 for the steepest-descent algorithm (applied to $J$) gives $g_j \ne 0$ for all finite $j$. This does not show directly that $g_k^k \ne 0$ (desired later), as $g_k^k = H_k g_k$ [Lemma 2.2 (iii)], and so could be zero even though $g_k \ne 0$. By Lemma 2.2 (ii), however, $g_1^k = g_1 \ne 0$, $\forall k > 1$, so application of
References