Professional Documents
Culture Documents
Lecture10 PDF
Lecture10 PDF
MAT 419/519
Summer Session 2011-12
Lecture 10 Notes
where u is a unit vector; that is, kuk = 1. Then, by the Chain Rule,
∂f ∂x1 ∂f ∂xn
ϕ0 (t) = + ··· +
∂x1 ∂t ∂xn ∂t
∂f ∂f
= u1 + · · · + un
∂x1 ∂xn
= ∇f (x0 + tu) · u,
and therefore
ϕ0 (0) = ∇f (x0 ) · u = k∇f (x0 )k cos θ,
where θ is the angle between ∇f (x0 ) and u. It follows that ϕ0 (0) is minimized when θ = π, which
yields
∇f (x0 )
u=− , ϕ0 (0) = −k∇f (x0 )k.
k∇f (x0 )k
We can therefore reduce the problem of minimizing a function of several variables to a single-
variable minimization problem, by finding the minimum of ϕ(t) for this choice of u. That is, we
find the value of t, for t > 0, that minimizes
x1 = x0 − t0 ∇f (x0 )
1
and continue the process, by searching from x1 in the direction of −∇f (x1 ) to obtain x2 by
minimizing ϕ1 (t) = f (x1 − t∇f (x1 ), and so on.
This is the Method of Steepest Descent: given an initial guess x0 , the method computes a
sequence of iterates {xk }, where
xk+1 = xk − tk ∇f (xk ), k = 0, 1, 2, . . . ,
with initial point x0 = (2, 3). We first compute the steepest descent direction from
to obtain
∇f (x0 ) = ∇f (2, 3) = (4, 4).
We then minimize the function
by computing
This strictly convex function has a strict global minimum when ϕ0 (t) = 64t − 32, or t = 1/2, as can
be seen by noting that ϕ00 (t) = 64 > 0. We therefore set
1 1
x1 = x0 − ∇f (x0 ) = (2, 3) − (4, 4) = (0, 1).
2 2
Continuing the process, we have
2
and by defining
ϕ(t) = f ((0, 1) − t(−4, 4)) = f (4t, 1 − 4t)
we obtain
ϕ0 (t) = −(8(4t) − 4(1 − 4t), 4(1 − 4t) − 4(4t)) · (−4, 4) = −(48t − 4, −32t + 4) · (−4, 4) = 320t − 32.
We have ϕ0 (t) = 0 when t = 1/10, and because ϕ00 (t) = 320, this critical point is a strict global
minimizer. We therefore set
1 1 2 3
x2 = x1 − ∇f (x1 ) = (0, 1) − (−4, 4) = , .
10 10 5 5
2
Repeating this process yields x3 = (0, 10 ). We can see that the Method of Steepest Descent
produces a sequence of iterates xk that is converging to the strict global minimizer of f (x, y) at
x∗ = (0, 0). 2
The following theorems describe some important properties of the Method of Steepest Descent.
Theorem Let f : Rn → R be continuously differentiable on Rn , and let x0 ∈ D. Let t∗ > 0 be the
minimizer of the function
ϕ(t) = f (x0 − t∇f (x0 )), t ≥ 0
and let x1 = x0 − t∗ ∇f (x0 ). Then
f (x1 ) < f (x0 ).
That is, the Method of Steepest Descent is guaranteed to make at least some progress toward a
minimizer x∗ during each iteration. This theorem can be proven by showing that ϕ0 (0) < 0, which
guarantees the existence of t̄ > 0 such that ϕ(t) < ϕ(0).
Theorem Let f : Rn → R be continuously differentiable on Rn , and let xk and xk+1 , for k ≥ 0, be
two consecutive iterates produced by the Method of Steepest Descent. Then the steepest descent
directions from xk and xk+1 are orthogonal; that is,
∇f (xk ) · ∇f (xk+1 ) = 0.
This theorem can be proven by noting that xk+1 is obtained by finding a critical point t∗ of
ϕ(t) = f (xk − t∇f (xk )), and therefore
That is, the Method of Steepest Descent pursues completely independent search directions from
one iteration to the next. However, in some cases this causes the method to “zig-zag” from the
initial iterate x0 to the minimizer x∗ .
3
We have seen that Newton’s Method can fail to converge to a solution if the initial iterate is
not chosen wisely. For certain functions, however, the Method of Steepest Descent can be shown
to be much more reliable.
Theorem Let f : Rn → R be a coercive function with continuous first partial derivatives on Rn .
Then, for any initial guess x0 , the sequence of iterates produced by the Method of Steepest Descent
from x0 contains a subsequence that converges to a critical point of f .
This result can be proved by applying the Bolzano-Weierstrauss Theorem, which states that any
bounded sequence contains a convergent subsequence. The sequence {f (xk )}∞ k=0 is a decreasing
sequence, as indicated by a previous theorem, and it is a bounded sequence, because f (x) is
continuous and coercive and therefore has a global minimum f (x∗ ). It follows that the sequence
{xk } is also bounded, for a coercive function cannot be bounded on an unbounded set.
By the Bolzano-Weierstrauss Theorem, {xk } has a convergent subsequence {xkp }, which can
be shown to converge to a critical point of f (x). Intuitively, as xk+1 = xk − t∗ ∇f (xk ) for some
t∗ > 0, convergence of {xkp } implies that
kp+1 −1
X
0 = lim xkp+1 − xkp = − t∗i ∇f (xi ), t∗i > 0,
p→∞
i=kp
Exercises
1. Chapter 3, Exercise 8
2. Chapter 3, Exercise 11
3. Chapter 3, Exercise 12