Numerical Solutions of Nonlinear Systems of Equations (I)

- Optimization is the foundation of machine learning.
- Setup: global search for an optimal strategy:

    \min_{x \in \mathbb{R}^n} f(x).

- Approach: an optimal strategy satisfies the nonlinear equation

    \nabla_x f(x) = 0 \in \mathbb{R}^n.

- Solving such equations is the focus of Ch. 10; practical problems are far harder.


Numerical Solutions of Nonlinear Systems of Equations (II)
- Ex: Support Vector Machine (SVM) for classification:

    \min_{w \in \mathbb{R}^n,\, b \in \mathbb{R}} \; \frac{1}{m} \sum_{i=1}^{m} \max\left(0,\; 1 - y_i \left(w^T x_i - b\right)\right) + \lambda \|w\|_2^2.

- Labels: y_i = 1 for blue, y_i = -1 for red.
- Its solution requires more than nonlinear equations.
§10.1 Fixed Points for Functions of Several Variables
A system of nonlinear equations has the form

    f_1(x_1, x_2, \dots, x_n) = 0,
    f_2(x_1, x_2, \dots, x_n) = 0,
        \vdots
    f_n(x_1, x_2, \dots, x_n) = 0.

In vector form,

    F(x) = 0, \quad \text{where } x = (x_1, x_2, \dots, x_n)^T.
Ex: Nonlinear equations

    3x_1 - \cos(x_2 x_3) = \frac{1}{2},
    x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 = -1.06,
    e^{-x_1 x_2} + 20 x_3 = -\frac{10\pi - 3}{3}.

In vector form, with x = (x_1, x_2, x_3)^T,

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.
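Transcribed directly, the example system can be evaluated numerically; a minimal sketch in Python/NumPy (the function name F is just the slides' notation):

```python
import numpy as np

def F(x):
    """The example system F(x) = 0 from the slides, x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return np.array([
        3*x1 - np.cos(x2*x3) - 0.5,
        x1**2 - 81*(x2 + 0.1)**2 + np.sin(x3) + 1.06,
        np.exp(-x1*x2) + 20*x3 + (10*np.pi - 3)/3,
    ])

# One exact root is (1/2, 0, -pi/6); F there is zero up to rounding.
residual = F(np.array([0.5, 0.0, -np.pi/6]))
```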
Solution Methods

- Fixed Point Method
- Newton's Method
- Quasi-Newton Methods
- Steepest Descent Methods
Definitions

Limit: Let f be defined on a set D ⊂ \mathbb{R}^n, mapping into \mathbb{R}. Then \lim_{x \to x_0} f(x) = L if, given any number \epsilon > 0, a number \delta > 0 exists with |f(x) - L| < \epsilon whenever x ∈ D and 0 < \|x - x_0\| < \delta.

Continuous: Function f is continuous at x_0 ∈ D if \lim_{x \to x_0} f(x) exists and equals f(x_0).

Continuous: Function f is continuous on D, or f ∈ C(D), if f is continuous at every point of D.

Continuous: F(x) \overset{\text{def}}{=} (f_1(x), \dots, f_n(x))^T ∈ \mathbb{R}^n is continuous on D, or F ∈ C(D), if f_j(x) ∈ C(D) for j = 1, \dots, n.
Fixed Point in \mathbb{R}^n

Def: A function G(x): D ⊂ \mathbb{R}^n \to \mathbb{R}^n has a fixed point at p ∈ D if G(p) = p.

Thm: Let D = \{ (x_1, \dots, x_n)^T \mid \alpha_j \le x_j \le \beta_j,\ j = 1, \dots, n \}. Suppose G(x) ∈ C(D) with the property that G(x) ∈ D whenever x ∈ D. Then G has a fixed point in D.
- Ex: nonlinear equations

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.

- FPI 1: x = G_1(x):

    x_1 = \frac{1}{3} \cos(x_2 x_3) + \frac{1}{6},
    x_2 = \frac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1,
    x_3 = -\frac{1}{20} \left( e^{-x_1 x_2} + \frac{10\pi - 3}{3} \right).

- FPI: x^{(k+1)} = G_1\left(x^{(k)}\right), k = 0, 1, \dots, with x^{(0)} = (0, 0, 0)^T.
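As a sketch (the tolerance and iteration cap are my choices, not from the slides), FPI 1 can be run directly:

```python
import numpy as np

def G1(x):
    """Fixed-point map G1 for the example system."""
    x1, x2, x3 = x
    return np.array([
        np.cos(x2*x3)/3 + 1/6,
        np.sqrt(x1**2 + np.sin(x3) + 1.06)/9 - 0.1,
        -(np.exp(-x1*x2) + (10*np.pi - 3)/3)/20,
    ])

x = np.zeros(3)                       # x^(0) = (0, 0, 0)^T
for k in range(100):
    x_new = G1(x)
    if np.linalg.norm(x_new - x, np.inf) < 1e-12:
        x = x_new
        break
    x = x_new
# The iterates approach the fixed point (1/2, 0, -pi/6).
```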
- Ex: nonlinear equations

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.

- FPI 2: x = G_2(x):

    x_1 = \frac{1}{3} \cos(x_2 x_3) + \frac{1}{6},
    x_2 = -\frac{1}{9} \sqrt{x_1^2 + \sin x_3 + 1.06} - 0.1,
    x_3 = -\frac{1}{20} \left( e^{-x_1 x_2} + \frac{10\pi - 3}{3} \right).

- FPI: x^{(k+1)} = G_2\left(x^{(k)}\right), k = 0, 1, \dots, with x^{(0)} = (0, 0, 0)^T.
Let D = \{ (x_1, \dots, x_n)^T \mid \alpha_j \le x_j \le \beta_j,\ j = 1, \dots, n \}. Suppose G(x) ∈ C(D) with the property that G(x) ∈ D whenever x ∈ D.

Thm: Let J(x) \overset{\text{def}}{=} \left( \frac{\partial g_i(x)}{\partial x_j} \right). Assume that there exists a constant \kappa < 1 so that \|J(x)\|_\infty \le \kappa for all x ∈ D. Then the FPI

    x^{(k+1)} = G\left(x^{(k)}\right), \quad k = 0, 1, \dots

with x^{(0)} ∈ D converges to the unique fixed point p ∈ D, and

    \left\| x^{(k)} - p \right\|_\infty \le \frac{\kappa^k}{1 - \kappa} \left\| x^{(0)} - p \right\|_\infty.

Thm: If J(p) = 0 and \left| \frac{\partial^2 g_i(x)}{\partial x_j \partial x_t} \right| \le M for all 1 \le i, j, t \le n, then for sufficiently large k,

    \left\| x^{(k)} - p \right\|_\infty \le \frac{n^2 M}{2} \left\| x^{(k-1)} - p \right\|_\infty^2.
§10.2 Newton's Method: one-dimensional case review
To solve f(x) = 0, consider a fixed point function built from some function \phi(x):

    g(x) = x - \phi(x) f(x).

- Let p be a root of f(x): f(p) = 0.
- Then p is a fixed point of g(x): p = g(p).
- At the fixed point p: g'(p) = 1 - \phi(p) f'(p).
- For quadratic convergence, choose \phi(x) so that g'(p) = 0, i.e. \phi(p) = \frac{1}{f'(p)}.
- Newton's method:

    x^{(k+1)} = g\left(x^{(k)}\right), \quad k = 0, 1, \dots, \quad \text{with } g(x) = x - \frac{f(x)}{f'(x)}.
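The one-dimensional iteration is a few lines of code; a minimal sketch (the example function f(x) = x² − 2 is my choice for illustration):

```python
def newton_1d(f, fprime, x0, tol=1e-12, max_iter=50):
    """One-dimensional Newton iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:      # stop when the Newton step is tiny
            break
    return x

# Illustrative example (not from the slides): f(x) = x^2 - 2, root sqrt(2).
root = newton_1d(lambda x: x*x - 2.0, lambda x: 2.0*x, x0=1.0)
```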
§10.2 Newton's Method: n-dimensional case
To solve F(x) = 0 ∈ \mathbb{R}^n, consider a fixed point function built from some matrix function A(x) ∈ \mathbb{R}^{n \times n}:

    G(x) = x - A(x)^{-1} F(x).

- Let p be a root of F(x): F(p) = 0.
- Then p is a fixed point of G(x): p = G(p).
- At the fixed point p: J_x(G(p)) = I - A(p)^{-1} J_x(F(p)).
- For quadratic convergence, choose A(x) so that J_x(G(p)) = 0, i.e. A(p) = J_x(F(p)).
- Newton's method:

    x^{(k+1)} = G\left(x^{(k)}\right), \quad k = 0, 1, \dots, \quad \text{with } G(x) = x - J_x^{-1}(F(x))\, F(x).


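The n-dimensional update can be sketched generically; solving the linear system J s = −F instead of forming J⁻¹ explicitly is standard practice. The demo system below is my own illustration, not from the slides:

```python
import numpy as np

def newton_nd(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0: solve J(x_k) s = -F(x_k), then x_{k+1} = x_k + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        s = np.linalg.solve(J(x), -F(x))   # avoids forming J^{-1} explicitly
        x = x + s
        if np.linalg.norm(s, np.inf) < tol:
            break
    return x

# Demo system (illustrative): x1^2 + x2^2 = 1 and x1 = x2, root (1/sqrt(2), 1/sqrt(2)).
root = newton_nd(lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]]),
                 lambda x: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]]),
                 [1.0, 0.0])
```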
Ex: Nonlinear equations with x = (x_1, x_2, x_3)^T,

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.

The Jacobian matrix has the analytic form

    J_x(F(x)) = \begin{pmatrix} 3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\ 2 x_1 & -162 (x_2 + 0.1) & \cos x_3 \\ -x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20 \end{pmatrix}.

Newton's method with x^{(0)} = (0.1, 0.1, -0.1)^T.


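Putting the pieces together for this example (the iteration cap is my choice), Newton's method with the analytic Jacobian converges rapidly from this starting point:

```python
import numpy as np

def F(x):
    x1, x2, x3 = x
    return np.array([
        3*x1 - np.cos(x2*x3) - 0.5,
        x1**2 - 81*(x2 + 0.1)**2 + np.sin(x3) + 1.06,
        np.exp(-x1*x2) + 20*x3 + (10*np.pi - 3)/3,
    ])

def J(x):
    """Analytic Jacobian from the slides."""
    x1, x2, x3 = x
    return np.array([
        [3.0,                  x3*np.sin(x2*x3),    x2*np.sin(x2*x3)],
        [2*x1,                 -162*(x2 + 0.1),     np.cos(x3)],
        [-x2*np.exp(-x1*x2),   -x1*np.exp(-x1*x2),  20.0],
    ])

x = np.array([0.1, 0.1, -0.1])        # x^(0) from the slides
for _ in range(20):
    x = x + np.linalg.solve(J(x), -F(x))
# x is now (1/2, 0, -pi/6) to machine precision.
```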
§10.2 Quasi-Newton Method: Broyden Method

Motivation: Newton's method costs too much for large n.
Quasi-Newton methods: a poor man's alternative.

- Newton's method:

    x^{(k+1)} = x^{(k)} - A\left(x^{(k)}\right)^{-1} F\left(x^{(k)}\right), \quad \text{where } A(x) = \left( \frac{\partial f_i(x)}{\partial x_j} \right).

- Problem I with Newton: requires n^2 partial derivatives.
- Problem II with Newton: needs to factorize A\left(x^{(k)}\right) for each k.

Goal: something cheaper, trading slower convergence for less overall computation.

Broyden Method: Motivation

- Broyden Method:

    x^{(k+1)} = x^{(k)} - A_k^{-1} F\left(x^{(k)}\right), \quad \text{where } A_k ∈ \mathbb{R}^{n \times n}.

- Desired Property I: A_k "mimics" A\left(x^{(k)}\right) in some sense.
- Desired Property II: A_{k+1}^{-1} is "easy" to compute from A_k^{-1}.
Broyden Method: Derivation

- Assume for some step k ≥ 0, we have available x^{(k)} ∈ \mathbb{R}^n and A_k ∈ \mathbb{R}^{n \times n}. By the Broyden method:

    x^{(k+1)} = x^{(k)} - A_k^{-1} F\left(x^{(k)}\right).

- Secant equation:

    F\left(x^{(k+1)}\right) - F\left(x^{(k)}\right) ≈ A\left(x^{(k)}\right) \left( x^{(k+1)} - x^{(k)} \right).

- Broyden's ideas:
  - Approximate secant equation: A_{k+1} s_{k+1} = y_{k+1}, where

        y_{k+1} \overset{\text{def}}{=} F\left(x^{(k+1)}\right) - F\left(x^{(k)}\right), \quad s_{k+1} \overset{\text{def}}{=} x^{(k+1)} - x^{(k)}.

  - Choose a special A_{k+1} that does not differ much from A_k:

        A_{k+1} = A_k + \frac{y_{k+1} - A_k s_{k+1}}{\|s_{k+1}\|_2^2}\, s_{k+1}^T.
Broyden Method

- Initialization: given x^{(0)} ∈ \mathbb{R}^n, choose A_0 ∈ \mathbb{R}^{n \times n}.
- For k = 0, 1, \dots:

    x^{(k+1)} = x^{(k)} - A_k^{-1} F\left(x^{(k)}\right),
    A_{k+1} = A_k + \frac{y_{k+1} - A_k s_{k+1}}{\|s_{k+1}\|_2^2}\, s_{k+1}^T, \quad \text{with}
    y_{k+1} = F\left(x^{(k+1)}\right) - F\left(x^{(k)}\right), \quad s_{k+1} = x^{(k+1)} - x^{(k)}.

Practical details:

- A_k^{-1} may not exist. (The Broyden method then fails.)
- Does an LU factorization of A_k help when we compute the LU factorization of A_{k+1}? (Yes, but that is another story.)
- If A_k^{-1} does exist and is available,

    A_{k+1}^{-1} = A_k^{-1} + \frac{s_{k+1} - A_k^{-1} y_{k+1}}{s_{k+1}^T A_k^{-1} y_{k+1}}\, s_{k+1}^T A_k^{-1}.
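The rank-one inverse update above makes each iteration cost O(n²) once A₀⁻¹ is formed. A minimal sketch (the demo system, tolerance, and iteration cap are my choices for illustration):

```python
import numpy as np

def broyden(F, x0, A0, tol=1e-10, max_iter=100):
    """Broyden's method, maintaining B = A_k^{-1} via the rank-one inverse update."""
    x = np.asarray(x0, dtype=float)
    B = np.linalg.inv(A0)              # B approximates A_0^{-1}
    Fx = F(x)
    for _ in range(max_iter):
        s = -B @ Fx                    # step: x_{k+1} = x_k - A_k^{-1} F(x_k)
        x = x + s
        if np.linalg.norm(s, np.inf) < tol:
            break
        Fx_new = F(x)
        y = Fx_new - Fx
        By = B @ y
        # Inverse update: B <- B + (s - B y) s^T B / (s^T B y)
        B = B + np.outer(s - By, s @ B) / (s @ By)
        Fx = Fx_new
    return x

# Demo (illustrative system, not from the slides): x1^2 + x2^2 = 1, x1 = x2.
Fd = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
A0 = np.array([[2.0, 0.0], [1.0, -1.0]])   # exact Jacobian at x0 = (1, 0)
root = broyden(Fd, [1.0, 0.0], A0)
```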
Ex: Nonlinear equations with x = (x_1, x_2, x_3)^T,

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.

The Jacobian matrix has the analytic form

    J_x(F(x)) = \begin{pmatrix} 3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\ 2 x_1 & -162 (x_2 + 0.1) & \cos x_3 \\ -x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20 \end{pmatrix}.

Quasi-Newton method with x^{(0)} = (0.1, 0.1, -0.1)^T, A_0 = J_x\left(F\left(x^{(0)}\right)\right).
§10.4 Steepest Descent Techniques (I)
A system of nonlinear equations has the form

    f_1(x_1, x_2, \dots, x_n) = 0,
    f_2(x_1, x_2, \dots, x_n) = 0,
        \vdots
    f_n(x_1, x_2, \dots, x_n) = 0.

In vector form, with x = (x_1, x_2, \dots, x_n)^T:

    F(x) \overset{\text{def}}{=} \begin{pmatrix} f_1(x_1, x_2, \dots, x_n) \\ f_2(x_1, x_2, \dots, x_n) \\ \vdots \\ f_n(x_1, x_2, \dots, x_n) \end{pmatrix} = 0.

Def: g(x) \overset{\text{def}}{=} F(x)^T F(x) = \sum_{j=1}^{n} f_j^2(x_1, x_2, \dots, x_n). Then

    F(x) = 0 \iff \min_{x \in \mathbb{R}^n} g(x) = 0.
Steepest Descent Techniques (II)

g(x) \overset{\text{def}}{=} F(x)^T F(x) = \sum_{j=1}^{n} f_j^2(x_1, x_2, \dots, x_n). Then

    F(x) = 0 \iff \min_{x \in \mathbb{R}^n} g(x) = 0.

Algorithm 1: Generic Descent Algorithm

    Evaluate g(x) at an initial approximation vector x^{(0)} ∈ \mathbb{R}^n. Set k = 0.
    while not yet converged do
        Determine a direction d^{(k)} from x^{(k)} that results in a decrease in the value of g(x) (a descent direction).
        Move an appropriate amount α (step-size) in this direction:
            x^{(k+1)} = x^{(k)} + α d^{(k)}.
        k = k + 1
    end while
Gradient and descent directions

Gradient: \nabla_x g(x) = \left( \frac{\partial g(x)}{\partial x_1}, \dots, \frac{\partial g(x)}{\partial x_n} \right)^T.

Directional derivative:

    D_v g(x) \overset{\text{def}}{=} \lim_{h \to 0} \frac{g(x + h v) - g(x)}{h} = v^T \nabla_x g(x).

Steepest descent: for any \|v\|_2 = 1 and tiny α > 0:

    g(x - α v) = g(x) - α\, v^T \nabla_x g(x) + O(α^2)
               ≥ g(x) - α\, \|\nabla_x g(x)\|_2 + O(α^2).

g(x - α v) decreases asymptotically the most with v = \frac{\nabla_x g(x)}{\|\nabla_x g(x)\|_2}.
Steepest descent:

    g(x - α \nabla_x g(x)) ≈ g(x) - α\, \|\nabla_x g(x)\|_2^2.


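Since g(x) = ‖F(x)‖₂², its gradient is ∇g = 2 J(F)ᵀ F, and the descent idea above can be sketched with a crude halving line search (the demo system, tolerances, and the halving rule are my simplifications, not the slides' step-size algorithms):

```python
import numpy as np

def steepest_descent(F, J, x0, max_iter=2000, tol=1e-8):
    """Minimize g(x) = ||F(x)||_2^2 along d = -grad g = -2 J(x)^T F(x),
    halving the step until g decreases (a crude line search)."""
    x = np.asarray(x0, dtype=float)
    g = lambda z: float(F(z) @ F(z))
    for _ in range(max_iter):
        d = -2.0 * J(x).T @ F(x)           # steepest-descent direction
        if np.linalg.norm(d, np.inf) < tol:
            break
        alpha = 1.0
        while g(x + alpha * d) >= g(x):    # ensure a reduction in g
            alpha *= 0.5
            if alpha < 1e-16:
                break
        x = x + alpha * d
    return x

# Demo (illustrative linear system, not from the slides): 2*x1 = 2, x2 = 1.
Fd = lambda x: np.array([2*x[0] - 2.0, x[1] - 1.0])
Jd = lambda x: np.array([[2.0, 0.0], [0.0, 1.0]])
xmin = steepest_descent(Fd, Jd, [0.0, 0.0])
```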
Steepest Descent Algorithm for solving F(x) = 0 (I)

Algorithm 2: Steepest Descent Algorithm

    Evaluate g(x) \overset{\text{def}}{=} \|F(x)\|_2^2 at initial vector x^{(0)} ∈ \mathbb{R}^n. Set k = 0.
    while not yet converged do
        Set d^{(k)} = -\nabla_x g\left(x^{(k)}\right) = -2 \left( J\left(F\left(x^{(k)}\right)\right) \right)^T F\left(x^{(k)}\right).
        Move an appropriate amount α (step-size) in this direction:
            x^{(k+1)} = x^{(k)} + α d^{(k)}.
        k = k + 1
    end while

Algorithm 2 allows many different choices for α^{(k)}:

- α should be "cheap" to compute.
- α should ensure "sufficient" reduction in g(x).
Step-size Selection

- Ensure reduction in g(x): Algorithm 3 will halt with an α_3 > 0 so that g\left(x^{(k)} + α_3 d^{(k)}\right) < g\left(x^{(k)}\right).

Algorithm 3: g(x) Reduction

    Set α_3 = 1.
    while g\left(x^{(k)} + α_3 d^{(k)}\right) ≥ g\left(x^{(k)}\right) do
        Set α_3 = \frac{α_3}{2}.
    end while

- Ensure sufficient reduction in g(x) with α^{(k)}:

Algorithm 4: g(x) Sufficient Reduction

    Set α_2 = α_3 / 2. Compute coefficients h_0, h_1, h_2 so that P(α) = h_0 + h_1 α + h_2 α (α - α_2) interpolates g\left(x^{(k)} + α d^{(k)}\right) at α = 0, α_2, α_3. Set α_0 = \frac{1}{2} \left( α_2 - h_1 / h_2 \right).
    α^{(k)} = \operatorname{argmin}_{α ∈ \{α_0, α_2, α_3\}}\; g\left(x^{(k)} + α d^{(k)}\right).
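Algorithms 3-4 can be sketched for a one-dimensional slice φ(α) = g(x⁽ᵏ⁾ + α d⁽ᵏ⁾); h₀, h₁, h₂ are Newton divided differences, and the underflow guard and non-convex fallback are my safeguards, not part of the slides:

```python
def choose_step(phi):
    """Pick a step-size for phi(a) = g(x_k + a*d_k), following Algorithms 3-4:
    halve a3 until phi decreases, then minimize a quadratic interpolant."""
    a3 = 1.0
    while phi(a3) >= phi(0.0):           # Algorithm 3: guarantee a reduction
        a3 *= 0.5
        if a3 < 1e-16:                   # safeguard (my addition)
            return a3
    a2 = a3 / 2.0
    # Newton divided differences for P(a) = h0 + h1*a + h2*a*(a - a2)
    h0 = phi(0.0)
    h1 = (phi(a2) - h0) / a2
    h2 = ((phi(a3) - phi(a2)) / (a3 - a2) - h1) / a3
    if h2 <= 0:                          # interpolant not convex: fall back (my addition)
        return a3
    a0 = 0.5 * (a2 - h1 / h2)            # critical point of P
    return min([a0, a2, a3], key=phi)    # Algorithm 4: best of the candidates

# For an exactly quadratic phi the interpolant is exact, so a0 is the true minimizer.
alpha = choose_step(lambda a: (a - 0.3)**2 + 1.0)
```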
Steepest Descent Algorithm for solving F(x) = 0 (II)

Algorithm 5: Steepest Descent Algorithm

    Evaluate g(x) \overset{\text{def}}{=} \|F(x)\|_2^2 at initial vector x^{(0)} ∈ \mathbb{R}^n. Set k = 0.
    while not yet converged do
        Set d^{(k)} = -\nabla_x g\left(x^{(k)}\right) = -2 \left( J\left(F\left(x^{(k)}\right)\right) \right)^T F\left(x^{(k)}\right),
            d^{(k)} = d^{(k)} / \left\| d^{(k)} \right\|_2.  (bad move)
        Compute α^{(k)} with Algorithms 3 and 4.
            x^{(k+1)} = x^{(k)} + α^{(k)} d^{(k)}.
        k = k + 1
    end while
Ex: Nonlinear equations with x = (x_1, x_2, x_3)^T,

    F(x) \overset{\text{def}}{=} \begin{pmatrix} 3x_1 - \cos(x_2 x_3) - \frac{1}{2} \\ x_1^2 - 81(x_2 + 0.1)^2 + \sin x_3 + 1.06 \\ e^{-x_1 x_2} + 20 x_3 + \frac{10\pi - 3}{3} \end{pmatrix} = 0.

The Jacobian matrix has the analytic form

    J_x(F(x)) = \begin{pmatrix} 3 & x_3 \sin(x_2 x_3) & x_2 \sin(x_2 x_3) \\ 2 x_1 & -162 (x_2 + 0.1) & \cos x_3 \\ -x_2 e^{-x_1 x_2} & -x_1 e^{-x_1 x_2} & 20 \end{pmatrix}.

- Newton's method with x^{(0)} = (0, 0, 0)^T
[Figure: Newton iteration errors. Computed solution ≈ (5.0000e-01, -7.35e-18, -5.2360e-01)^T, i.e. (1/2, 0, -π/6); the errors fall to about 1e-15 within roughly 18 iterations.]
- Newton's method with x^{(0)} = -(2, 2, 2)^T

[Figure: Newton iteration errors. Computed solution ≈ (0.49814468, -0.1996059, -0.52882598)^T, a second root; the errors fall to about 1e-15 within roughly 18 iterations.]

- Steepest Descent with x^{(0)} = (0, 0, 0)^T

[Figure: Steepest descent iteration errors, decreasing from about 1e0 to about 1e-8 over roughly 500 iterations.]

- Step-sizes

[Figure: Steepest descent step-sizes over the same 500 iterations, ranging between about 1e0 and 1e-8.]


- Steepest Descent with x^{(0)} = -(10, 10, 10)^T (Newton's method diverges from this starting point.)

[Figure: Steepest descent iteration errors, decreasing from about 1e1 to about 1e-7 over roughly 1200 iterations.]

- Step-sizes

[Figure: Steepest descent step-sizes over the same 1200 iterations, ranging between about 1e0 and 1e-8.]
