10.8 Combination of Steepest Descent and Newton's Method

Analysis of Quadratic Case


Since the method is not a full Newton method, we can conclude that it possesses only
linear convergence and that the dominating aspects of convergence will be revealed
by an analysis of the method as applied to a quadratic function. Furthermore, as
might be intuitively anticipated, the rate of convergence is determined by the
steepest descent part of algorithm (57), and that rate is governed by a
Kantorovich-like ratio defined over the subspace orthogonal to $N$.
Theorem. (Combined method). Let $Q$ be an $n \times n$ symmetric positive definite
matrix, and let $x^* \in E^n$. Define the function

$$E(x) = \tfrac{1}{2}(x - x^*)^T Q (x - x^*)$$

and let $b = Qx^*$. Let $B$ be an $n \times m$ matrix of rank $m$. Starting at an arbitrary
point $x_0$, define the iterative process

a) $u_k = -(B^T Q B)^{-1} B^T g_k$, where $g_k = Qx_k - b$.

b) $z_k = x_k + Bu_k$.

c) $p_k = b - Qz_k$.

d) $x_{k+1} = z_k + \alpha_k p_k$, where $\alpha_k = \dfrac{p_k^T p_k}{p_k^T Q p_k}$.

This process converges to $x^*$ and satisfies

$$E(x_{k+1}) \le (1 - \delta)E(x_k), \tag{58}$$

where $\delta$, $0 < \delta \le 1$, is the minimum of

$$\frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)}$$

over all vectors $p$ in the nullspace of $B^T$.
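Before turning to the proof, it may help to see steps (a)–(d) in executable form. Below is a minimal NumPy sketch of the process; the particular matrices, starting point, and iteration count are illustrative assumptions, not part of the text.

```python
import numpy as np

def combined_step(Q, b, B, x):
    """One iteration of the combined process: steps (a)-(d) of the theorem."""
    g = Q @ x - b                               # gradient of E at x_k
    u = -np.linalg.solve(B.T @ Q @ B, B.T @ g)  # (a) Newton step over N
    z = x + B @ u                               # (b) intermediate point z_k
    p = b - Q @ z                               # (c) negative gradient at z_k
    alpha = (p @ p) / (p @ Q @ p)               # (d) exact line-search step size
    return z + alpha * p

# Illustrative data: a random 4 x 4 positive definite Q and a rank-2 B.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4.0 * np.eye(4)
x_star = rng.standard_normal(4)
b = Q @ x_star
B = rng.standard_normal((4, 2))

x = np.zeros(4)
for _ in range(50):
    x = combined_step(Q, b, B, x)
print(np.allclose(x, x_star))  # True: the iterates converge to x*
```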


Proof. The algorithm given in the theorem statement is exactly the general
combined algorithm specialized to the quadratic situation. Next we note that

$$B^T p_k = B^T Q(x^* - z_k) = B^T Q(x^* - x_k) - B^T Q B u_k = -B^T g_k + B^T Q B (B^T Q B)^{-1} B^T g_k = 0, \tag{59}$$

which merely proves that the gradient at $z_k$ is orthogonal to $N$. Next we calculate

$$\begin{aligned}
2[E(x_k) - E(z_k)] &= (x_k - x^*)^T Q(x_k - x^*) - (z_k - x^*)^T Q(z_k - x^*) \\
&= -2u_k^T B^T Q(x_k - x^*) - u_k^T B^T Q B u_k \\
&= -2u_k^T B^T g_k + u_k^T B^T Q B (B^T Q B)^{-1} B^T g_k \\
&= -u_k^T B^T g_k = g_k^T B (B^T Q B)^{-1} B^T g_k.
\end{aligned} \tag{60}$$

Then we compute

$$\begin{aligned}
2[E(z_k) - E(x_{k+1})] &= (z_k - x^*)^T Q(z_k - x^*) - (x_{k+1} - x^*)^T Q(x_{k+1} - x^*) \\
&= -2\alpha_k p_k^T Q(z_k - x^*) - \alpha_k^2 p_k^T Q p_k \\
&= 2\alpha_k p_k^T p_k - \alpha_k^2 p_k^T Q p_k \\
&= \alpha_k p_k^T p_k = \frac{(p_k^T p_k)^2}{p_k^T Q p_k}.
\end{aligned} \tag{61}$$

Now, using (59) and $p_k = -g_k - QBu_k$, we have

$$\begin{aligned}
2E(x_k) &= (x_k - x^*)^T Q(x_k - x^*) = g_k^T Q^{-1} g_k \\
&= (p_k^T + u_k^T B^T Q)\, Q^{-1} (p_k + QBu_k) \\
&= p_k^T Q^{-1} p_k + u_k^T B^T Q B u_k \\
&= p_k^T Q^{-1} p_k + g_k^T B (B^T Q B)^{-1} B^T g_k.
\end{aligned} \tag{62}$$

Adding (60) and (61) and dividing by (62), there results

$$\frac{E(x_k) - E(x_{k+1})}{E(x_k)} = \frac{g_k^T B (B^T Q B)^{-1} B^T g_k + (p_k^T p_k)^2 / p_k^T Q p_k}{p_k^T Q^{-1} p_k + g_k^T B (B^T Q B)^{-1} B^T g_k} = \frac{q + p_k^T p_k / p_k^T Q p_k}{q + p_k^T Q^{-1} p_k / p_k^T p_k},$$

where $q = g_k^T B (B^T Q B)^{-1} B^T g_k / p_k^T p_k \ge 0$ (the second equality follows on dividing numerator and denominator by $p_k^T p_k$). This has the form $(q + a)/(q + b)$ with

$$a = \frac{p_k^T p_k}{p_k^T Q p_k}, \qquad b = \frac{p_k^T Q^{-1} p_k}{p_k^T p_k}.$$

But for any $p_k$ it follows that $a \le b$, since $(p_k^T p_k)^2 \le (p_k^T Q p_k)(p_k^T Q^{-1} p_k)$ by the Cauchy–Schwarz inequality. Hence, because $q \ge 0$,

$$\frac{q + a}{q + b} \ge \frac{a}{b},$$

and thus

$$\frac{E(x_k) - E(x_{k+1})}{E(x_k)} \ge \frac{(p_k^T p_k)^2}{(p_k^T Q p_k)(p_k^T Q^{-1} p_k)}.$$

Finally,

$$E(x_{k+1}) \le \left[1 - \frac{(p_k^T p_k)^2}{(p_k^T Q p_k)(p_k^T Q^{-1} p_k)}\right] E(x_k) \le (1 - \delta)E(x_k),$$

since $B^T p_k = 0$ by (59), so that $p_k$ lies in the nullspace of $B^T$ over which $\delta$ is the minimum of this ratio.

The value $\delta$ associated with the above theorem is related to the eigenvalue
structure of $Q$. If $p$ were allowed to vary over the whole space, then the Kantorovich
inequality

$$\frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)} \ge \frac{4aA}{(a + A)^2}, \tag{63}$$

where $a$ and $A$ are, respectively, the smallest and largest eigenvalues of $Q$, gives
explicitly

$$\delta = \frac{4aA}{(a + A)^2}.$$

When $p$ is restricted to the nullspace of $B^T$, the corresponding value of $\delta$ is larger.
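As a quick numerical sanity check, inequality (63) can be verified on random vectors; the diagonal matrix and number of trials below are illustrative assumptions.

```python
import numpy as np

# Numerically check the Kantorovich inequality (63) on random vectors p.
eigs = np.array([1.0, 2.0, 5.0, 10.0])
Q, Q_inv = np.diag(eigs), np.diag(1.0 / eigs)
a, A = eigs.min(), eigs.max()
bound = 4 * a * A / (a + A) ** 2           # 40/121, about 0.331

rng = np.random.default_rng(1)
for _ in range(1000):
    p = rng.standard_normal(4)
    ratio = (p @ p) ** 2 / ((p @ Q @ p) * (p @ Q_inv @ p))
    assert ratio >= bound - 1e-12          # (63) holds for every p
```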


In some special cases it is possible to obtain a fairly explicit estimate of $\delta$. Suppose,
for example, that the subspace $N$ were the subspace spanned by $m$ eigenvectors of
$Q$. Then the subspace in which $p$ is allowed to vary is the space orthogonal to $N$
and is thus, in this case, the space generated by the other $n - m$ eigenvectors of $Q$.
In this case, since for $p$ in $N^\perp$ (the space orthogonal to $N$) both $Qp$ and $Q^{-1}p$ are
also in $N^\perp$, the ratio $\delta$ satisfies

$$\delta = \min_{p \in N^\perp} \frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)} = \frac{4aA}{(a + A)^2},$$

where now $a$ and $A$ are, respectively, the smallest and largest of the $n - m$ eigenvalues
of $Q$ corresponding to $N^\perp$. Thus the convergence ratio (58) reduces to the
familiar form

$$E(x_{k+1}) \le \left(\frac{A - a}{A + a}\right)^2 E(x_k),$$

where $a$ and $A$ are these special eigenvalues. Thus, if $B$, or equivalently $N$, is chosen
to include the eigenvectors corresponding to the most undesirable eigenvalues of
$Q$, the convergence rate of the combined method will be quite attractive.
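The effect of this choice is easy to demonstrate numerically. In the sketch below (an illustrative construction, not from the text), $Q$ is diagonal, $N$ is spanned by the eigenvectors belonging to the extreme eigenvalues 1 and 100, and the observed per-step ratio settles near the bound computed from the remaining eigenvalues 4 and 9.

```python
import numpy as np

eigs = np.array([1.0, 100.0, 4.0, 9.0])   # 1 and 100 are the "undesirable" ones
Q = np.diag(eigs)
x_star = np.ones(4)
b = Q @ x_star
B = np.eye(4)[:, :2]          # N = span of the eigenvectors for 1 and 100

def E(x):
    return 0.5 * (x - x_star) @ Q @ (x - x_star)

x = np.zeros(4)
for _ in range(6):
    g = Q @ x - b
    u = -np.linalg.solve(B.T @ Q @ B, B.T @ g)
    z = x + B @ u
    p = b - Q @ z
    x_new = z + (p @ p) / (p @ Q @ p) * p
    print(E(x_new) / E(x))     # per-step ratio; bounded by ((A-a)/(A+a))^2
    x = x_new

print(((9.0 - 4.0) / (9.0 + 4.0)) ** 2)   # the bound, about 0.148
```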

Applications
The combination of steepest descent and Newton's method can be applied usefully
in a number of important situations. Suppose, for example, we are faced with a
problem of the form

$$\text{minimize} \quad f(x, y),$$

where $x \in E^n$, $y \in E^m$, and where the second partial derivatives with respect to $x$
are easily computable but those with respect to $y$ are not. We may then employ
Newton steps with respect to $x$ and steepest descent with respect to $y$.
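A minimal sketch of this idea follows, under stated assumptions: the hypothetical objective is $f(x, y) = \tfrac{1}{2}x^T H x + x^T y + \sum_i \cosh(y_i)$, so the Hessian in $x$ is the constant matrix $H$, while the Hessian in $y$ is treated as unavailable; the alternating update order, step size, and iteration count are illustrative choices, not the text's prescription.

```python
import numpy as np

# Hypothetical objective: f(x, y) = 0.5 x^T H x + x^T y + sum(cosh(y)).
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])    # easily computable Hessian with respect to x

def grad_x(x, y):
    return H @ x + y

def grad_y(x, y):
    return x + np.sinh(y)     # Hessian with respect to y is never formed

x, y = np.array([1.0, -1.0]), np.array([0.5, 0.5])
step = 0.2                    # illustrative fixed step size for the y-steps
for _ in range(200):
    x = x - np.linalg.solve(H, grad_x(x, y))  # Newton step in x (exact here,
                                              # since f is quadratic in x)
    y = y - step * grad_y(x, y)               # steepest-descent step in y

print(grad_x(x, y), grad_y(x, y))  # both near zero at the joint minimizer
```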
